PAR_MEM(8)			    LMBENCH			    PAR_MEM(8)

NAME
       par_mem - memory parallelism benchmark

SYNOPSIS
       par_mem [ -L <line size> ] [ -M <len> ] [ -W <warmups> ]
               [ -N <repetitions> ]

DESCRIPTION
       par_mem measures the available parallelism in the memory hierarchy,  up
       to  len	bytes.	 Modern	 processors  can often service multiple memory
       requests in parallel, while older processors typically blocked on  LOAD
       instructions and had no available parallelism (other than that provided
       by cache prefetching).  par_mem measures the available parallelism at a
       variety	of points, since the available parallelism is often a function
       of the data location in the memory hierarchy.

       To measure the available parallelism, par_mem conducts a set of
       experiments at each memory size, one for each level of parallelism.
       It builds a pointer chain of the desired length, then creates an array
       of pointers to chain entries that are evenly spaced across the chain,
       and then runs those pointers forward through the chain in parallel.
       From this it measures the average memory latency for each level of
       parallelism.  The available parallelism is the average memory latency
       for parallelism 1 divided by the minimum average memory latency across
       all levels of parallelism.
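
       The chain setup described above can be sketched roughly as follows.
       This is an illustration only, not the lmbench source: the helper
       names build_chain, place_starts and chase are hypothetical, and the
       fixed-stride layout is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: link n slots into a cycle in which slot i points at
 * slot (i + stride) % n.  The caller must pick a stride that is coprime
 * with n, otherwise the cycle does not visit every slot. */
void build_chain(char **slots, size_t n, size_t stride)
{
    assert(n > 0);
    for (size_t i = 0; i < n; ++i)
        slots[i] = (char *)&slots[(i + stride) % n];
}

/* Follow the chain for the given number of dependent loads. */
char **chase(char **p, size_t loads)
{
    while (loads-- > 0)
        p = (char **)*p;
    return p;
}

/* Hypothetical helper: fill starts[0..width-1] with pointers to chain
 * entries that are n / width steps apart in chain order. */
void place_starts(char **slots, size_t n, char ***starts, size_t width)
{
    assert(width > 0 && width <= n);
    char **p = slots;
    size_t gap = n / width;
    for (size_t k = 0; k < width; ++k) {
        starts[k] = p;
        p = chase(p, gap);
    }
}
```

       With 8 slots, a stride of 3 and a width of 2, the two start pointers
       land 4 chain steps apart, so each of them can be advanced
       independently of the other.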

       For example, the inner loop which measures  parallelism	2  would  look
       something like:

       for (i = 0; i < N; ++i) {
           p0 = (char **)*p0;
           p1 = (char **)*p1;
       }

       The overhead of the for loop is not significant, since the actual loop
       is unrolled to 100 loads long.  In this case, if the hardware can
       process two LOAD operations in parallel, then the overall latency of
       the loop should be equivalent to that of a single pointer chain, so
       the measured parallelism would be roughly two.  If, however, the
       hardware can only process a single LOAD operation at a time, or if
       there is significant resource contention between the two LOAD
       operations, then the loop will be much slower than a loop with a
       single pointer chain, so the measured parallelism will be less than
       two, but probably no smaller than one.
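
       As a worked illustration of the ratio, the sketch below assumes that
       "average memory latency" means time per load, so that two chains
       which overlap perfectly halve the per-load latency.  The function
       name estimate_parallelism and the floor of 1.0 are assumptions made
       for this example, not details taken from the lmbench source.

```c
#include <stddef.h>

/* lat[k] is the average per-load latency measured with k + 1 chains in
 * flight, so lat[0] is the single-chain baseline.  Returns the baseline
 * divided by the smallest latency seen, and never less than 1.0. */
double estimate_parallelism(const double *lat, size_t levels)
{
    double best = 1.0;
    for (size_t k = 1; k < levels; ++k) {
        /* Skip non-positive samples so a failed run cannot divide by zero. */
        if (lat[k] > 0.0 && lat[0] / lat[k] > best)
            best = lat[0] / lat[k];
    }
    return best;
}
```

       For per-load latencies of 100, 50, 50 and 60 time units at
       parallelism 1 through 4, the estimate is 100 / 50 = 2.  If every
       multi-chain run is slower per load than the baseline, the estimate
       stays at 1.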

OUTPUT
       Output  format  is  intended as input to xgraph or some similar program
       (we use a perl script that produces pic input).	There is a set of data
       produced	 for  each  stride.  The data set title is the stride size and
       the data points are the array size in megabytes (floating point	value)
       and the load latency over all points in that array.

SEE ALSO
       lmbench(8), line(8), cache(8), tlb(8), par_ops(8).

AUTHOR
       Carl Staelin and Larry McVoy

       Comments, suggestions, and bug reports are always welcome.

(c)2000 Carl Staelin and Larry McVoy                               PAR_MEM(8)