pfm man page on Tru64

pfm(7)									pfm(7)

NAME
       pfm - The on-chip performance counter pseudo-device

SYNOPSIS
       pseudo-device pfm

DESCRIPTION
       The pfm pseudo-device is the interface to Alpha implementation-specific
       on-chip performance counters.  A set of ioctl calls form the interface,
       as defined in the <sys/pfcntr.h> header file.

       The  kernel  in use must have the pfm pseudo-device configured into it.
       To do this, use one of the following methods:

       Add the following line to the kernel configuration file and rebuild
       the kernel. Do not use this method if CPU hot-swap is supported by the
       system, because it does not allow pfm to be easily unconfigured as
       required for a hot-swap; instead, use the sysconfig method below.
            pseudo-device   pfm

       Enter the following command from the root account. Do not configure
       pfm if CPU hot-swap is anticipated.
            # sysconfig -c pfm

	      If  pfm  is configured, the CPU hot-swap procedure requires that
	      it be unconfigured, using the following command, before any  CPU
	      is swapped.
		   # sysconfig -u pfm

	      The  autosysconfig program can be used to automatically load the
	      configurable pfm device at each system startup.

EV4 INTERFACE DESCRIPTION
       The EV4 implementations (21064, 21064A,	21066,	and  21068)  have  two
       counters,  each	of which can be independently programmed to count cer‐
       tain internal or external events. Each counter  interrupts  the	system
       when a certain number of the selected events have been counted. Any one
       of the following three actions can happen at each interrupt (tick):

              Counters (PFM_COUNTERS)
              IPL histogramming (PFM_IPL)
              User or kernel PC profiling (PFM_PROFILING)

       These values are defined in <sys/pfcntr.h> and can be selected orthogo‐
       nally  by  bitwise ORing the selections together and passing the result
       to the PCNTSETITEMS ioctl request.

       If counters are enabled, the interrupt count for this event  is	incre‐
       mented.	 This  records the number of times each event has happened, in
       multiples of the interrupt frequency selected (PCNTSETMUX).  Note  that
       the driver can only count the interrupts generated; no direct access to
       the EV4 on-chip counter values is provided.
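
       Because only interrupts are counted, an event total has to be
       reconstructed from the interrupt count and the interrupt frequency.
       The following is a minimal sketch of that arithmetic, not part of the
       driver interface: the function name is hypothetical, and log2_freq is
       assumed to be the power of two selected through PCNTSETMUX (12 or 16
       for counter 0, 8 or 12 for counter 1).

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate how many events a counter has seen,
 * given the number of interrupts the driver recorded and the interrupt
 * frequency expressed as a power of two (one interrupt every
 * 2^log2_freq events).
 */
uint64_t pfm_estimate_events(uint64_t interrupts, unsigned log2_freq)
{
    /* Each interrupt stands for 2^log2_freq counted events. */
    return interrupts << log2_freq;
}
```

       The result is a lower bound, since events counted after the most
       recent interrupt are not visible through the interrupt count.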

       If IPL histogramming is enabled, the appropriate entry in the IPL array
       is incremented. The entries are:

              0-5    Refer to IPL0-IPL5.
              6      Unused. (IPL6 is the level of the performance counter
                     interrupts.)
              7      Counts “idle” ticks (IPL = 0 and current_thread =
                     idle_thread).
              8      Counts user mode ticks.
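
       When printing an IPL histogram it can help to attach a label to each
       entry. The sketch below encodes the entry meanings listed above; the
       macro and function names are hypothetical and do not come from
       <sys/pfcntr.h>.

```c
#include <stddef.h>

#define PFM_IPL_SLOTS 9   /* hypothetical name: entries 0 through 8 */

/*
 * Hypothetical helper: return a printable label for an IPL histogram
 * entry, or NULL if the index is outside the documented range.
 */
const char *pfm_ipl_slot_name(size_t slot)
{
    static const char *const names[PFM_IPL_SLOTS] = {
        "IPL0", "IPL1", "IPL2", "IPL3", "IPL4", "IPL5",
        "unused",   /* entry 6: IPL6 is the counter interrupt level */
        "idle",     /* entry 7: IPL = 0 in the idle thread */
        "user"      /* entry 8: user mode ticks */
    };

    return slot < PFM_IPL_SLOTS ? names[slot] : NULL;
}
```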

       If profiling is enabled, a PC sample is added to the profile  histogram
       if the mode is correct (kernel or user).

       Each CPU in a multiprocessor platform has separate counters, and the
       device can be opened in three different ways:

              PCNTOPENONE
                     Opens and collects data on only the CPU that the program
                     is running on.
              PCNTOPENEACH
                     Opens all CPUs but keeps data for each one separately.
              PCNTOPENALL
                     Opens all CPUs, aggregating the data for all CPUs into
                     one collection.

       These values are defined in <sys/pfcntr.h> and are  bitwise  ORed  into
       the  mode  passed  to the device open call. Note that if PCNTOPENONE is
       selected, the opening thread/process must be bound to  that  processor;
       otherwise,  the	open will fail. It must also remain bound to that pro‐
       cessor for the duration of the driver usage or extremely	 unpredictable
       results will occur.

       The  following  ioctl  calls  apply  to the performance counter pseudo-
       device. Note that most of the EV4 ioctls can also be used on EV5,  EV6,
       and EV7:

       PCNTDISABLE
              Disables performance counter interrupts on the CPU. Takes no
              arguments.

       Enables performance counter interrupts on the CPU. Takes no arguments.

       PCNTSETMUX
              Selects the statistics to be counted by each performance counter
              and the interrupt frequency. Takes a pointer to a struct iccsr
              that contains the MUX register values desired. The fields in
              this register are:

                     Controls the interrupt frequency of performance counter
                     0. If set, interrupt frequency is every 2^12 events. If
                     clear, interrupt frequency is every 2^16 events.

                     Controls the interrupt frequency of performance counter
                     1. If set, interrupt frequency is every 2^8 events. If
                     clear, interrupt frequency is every 2^12 events.

                     Selects the event counted by counter 0. One of:
                     PF_ISSUES, PF_PIPEDRY, PF_LOADI, PF_PIPEFROZEN,
                     PF_BRANCHI, PF_CYCLES, PF_PALMODE, PF_NONISSUES,
                     PF_EXTPIN0

                     Selects the event counted by counter 1. One of:
                     PF_DCACHE, PF_ICACHE, PF_DUAL, PF_BRANCHMISS, PF_FPINST,
                     PF_INTOPS, PF_STOREI, PF_EXTPIN1

                     Contains two bits, each of which disables data collection
                     on the specified counter. For example, set to 2 to
                     disable counter 1 and enable counter 0. Cannot be set to
                     3 (which disables both counters, causing PCNTSETMUX to
                     return EINVAL).

                     Do not set these fields. Must be zero.

       PCNTSETITEMS
              Selects the data items to be collected at each tick:

                     Counters (PFM_COUNTERS)
                     IPL histogramming (PFM_IPL)
                     User or kernel PC profiling (PFM_PROFILING - see
                     PCNTSETUADDR, PCNTSETURANGE, PCNTSETKADDR, and
                     PCNTSETKRANGE)

              These values are defined in <sys/pfcntr.h> and can be selected
              orthogonally by bitwise ORing the selections together into the
              integer argument. If no items are selected, returns EINVAL.
       PCNTLOGALL
              Sets the on-chip counters to count all system activity. Takes no
              arguments and returns no errors.

       Sets the on-chip counters to count only those threads/processes with
       the PCB_PME_BIT set in their PCBs, and sets the PCB_PME_BIT for this
       process. This bit is inherited across fork/exec, setting it for all
       children. Takes no arguments and returns no errors.

       Clears the PCB_PME_BIT in the PCB of the current process. Takes no
       arguments and returns no errors.

       Clears the driver's internal counters appropriate to the actions
       selected. If PFM_COUNTERS is enabled, the interrupt counters and cycle
       counter value are reset. If PFM_IPL is enabled, the IPL histogram is
       reset. If neither is enabled (PFM_PROFILING only), returns EINVAL and
       nothing is cleared. Takes no arguments.

       PCNTGETCNT
              Returns the driver's counter values and the pcc value(s). Takes
              a pointer to an array of struct pfcntrs; the array is filled in
              with the values. Sample usage of this ioctl is:

                     struct pfcntrs cntrs[NUM_OF_CPUS];
                     struct pfcntrs *pfcntrs = cntrs;
                     ioctl(fd, PCNTGETCNT, &pfcntrs);

              If the driver is opened in mode PCNTOPENEACH, the underlying
              array must be big enough to hold all of the data for each CPU;
              otherwise, EFAULT is returned. If the driver is opened in mode
              PCNTOPENONE or PCNTOPENALL, the array can be one element. If
              PFM_COUNTERS is not enabled, returns EINVAL.

       PCNTGETRSIZE
              Returns the number of bytes of data available to read for
              getting the PC profiling samples. By default this will be equal
              to one fourth of the address range being profiled. (By default,
              profiling data is kept as one bucket per four instructions,
              which corresponds to a default profiling stride of 4
              instructions per sample count.) If the driver is opened in mode
              PCNTOPENEACH, this number of bytes will be multiplied by the
              number of CPUs.

	      To set the profiling address range and stride (and  select  user
	      or  kernel  profiling),  use  the PCNTSETURANGE or PCNTSETKRANGE
	      ioctl, respectively. To set the address range  without  changing
	      the  stride,  you	 can also use the PCNTSETUADDR or PCNTSETKADDR
	      ioctl.

              The PCNTGETRSIZE ioctl takes a pointer to a long and returns no
              errors. The returned value will be 0 if profiling is not
              currently selected or if the address range and mode have not
              been specified.

       PCNTGETIPLHIS
              Returns the current IPL histogram(s). Takes a pointer to an
              array of struct pfipls; the array is filled in with the values.
              Sample usage of this ioctl is:

                     struct pfipls ipls[NUM_OF_CPUS];
                     struct pfipls *pfipls = ipls;
                     ioctl(fd, PCNTGETIPLHIS, &pfipls);

	      If  the  driver  is  opened in mode PCNTOPENEACH, the underlying
	      array must be big enough to hold all of the data for  each  CPU.
	      If  the  underlying  array  is  not  big enough, EFAULT might be
	      returned or other data in the program might be overwritten.

              If the driver is opened in mode PCNTOPENONE or PCNTOPENALL, the
              array can be one element. If PFM_IPL is not enabled, returns
              EINVAL.

       PCNTCALLER
              If kernel mode profiling is turned on (with PCNTSETKADDR or
              PCNTSETKRANGE), directs the profiler to collect data on the
              caller of certain system utility routines (for example, bcopy,
              bzero, simple_lock). If kernel mode profiling is not turned on,
              returns EINVAL. (See also the descriptions of PCNTSETKADDR and
              PCNTSETKRANGE for information about their use in PCNTCALLER
              mode.)

       PCNTSETKADDR
              Sets the kernel address range to profile and turns on kernel
              mode PC profiling. If the device is not open for profiling,
              returns EINVAL. If memory cannot be obtained for the sample
              data, returns ENOMEM.

	      If  PCNTCALLER  kernel  profiling	 mode is engaged, specifies an
	      additional address range to collect profiling data on the caller
	      of  a  routine, instead of the routine itself. Takes a start and
	      end address range. Up to 4  additional  address  ranges  may  be
	      added;  additional attempts will return ENOSPC. If the addresses
	      are out of range of  kernel  text,  not  aligned,	 or  otherwise
	      invalid, returns EFAULT.

              Note that PCNTSETKRANGE performs the same functions as
              PCNTSETKADDR and, in addition, lets you set the profiling
              stride.

       PCNTSETKRANGE
              Sets the kernel address range to profile and sets the profile
              stride (the number of consecutive instructions grouped together
              for each sample count). The stride must be zero or a power of
              two (for example, 0, 1, 2, 4, 8). A zero stride means there
              should be only one counter for the whole address range. This
              ioctl also turns on kernel mode PC profiling. If the device is
              not open for profiling, returns EINVAL. If memory cannot be
              obtained for the sample data, returns ENOMEM.

              If PCNTCALLER kernel profiling mode is engaged, specifies an
              additional address range to collect profiling data on the caller
              of a routine, instead of the routine itself. Takes a start and
              end address range, and ignores the stride. Up to 4 additional
              address ranges may be added; additional attempts will return
              ENOSPC. If the addresses are out of range of kernel text, not
              aligned, or otherwise invalid, returns EFAULT.

       PCNTSETUADDR
              Sets the user address range to profile and turns on user mode PC
              profiling. If the device is not open for profiling, returns
              EINVAL. If memory cannot be obtained for the sample data,
              returns ENOMEM. Note that PCNTSETURANGE performs the same
              functions as PCNTSETUADDR and, in addition, lets you set the
              profiling stride.

       PCNTSETURANGE
              Sets the user address range to profile and sets the profile
              stride (the number of consecutive instructions grouped together
              for each sample count). The stride must be zero or a power of
              two (for example, 0, 1, 2, 4, 8). A zero stride means there
              should be only one counter for the whole address range. This
              ioctl also turns on user mode PC profiling. If the device is not
              open for profiling, returns EINVAL. If memory cannot be obtained
              for the sample data, returns ENOMEM.
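
       The relationship between address range, stride, and the size reported
       by PCNTGETRSIZE can be sketched as follows. This is an illustration
       under stated assumptions, not the driver's code: the function names
       are hypothetical, Alpha instructions are taken to be 4 bytes, and the
       4-byte bucket size is inferred from the statement that the default
       stride of 4 yields one fourth of the address range.

```c
#include <stdint.h>

#define ALPHA_INSN_BYTES 4u  /* Alpha instructions are 4 bytes long */
#define BUCKET_BYTES     4u  /* assumption inferred from the default size */

/* A stride is acceptable if it is zero or a power of two. */
int pfm_stride_ok(uint64_t stride)
{
    return (stride & (stride - 1)) == 0;
}

/*
 * Hypothetical helper: expected profile buffer size in bytes for an
 * address range of range_bytes, a stride in instructions, and ncpus
 * separately collected CPUs (1 unless opened with PCNTOPENEACH).
 * Returns 0 for an invalid stride or a CPU count of zero.
 */
uint64_t pfm_profile_bytes(uint64_t range_bytes, uint64_t stride,
                           unsigned ncpus)
{
    uint64_t buckets;

    if (!pfm_stride_ok(stride) || ncpus == 0)
        return 0;

    if (stride == 0) {
        buckets = 1;                       /* one counter for the range */
    } else {
        uint64_t span = stride * ALPHA_INSN_BYTES;
        buckets = (range_bytes + span - 1) / span;   /* round up */
    }
    return buckets * BUCKET_BYTES * ncpus;
}
```

       For a 64 KB range with the default stride of 4, this gives 16384
       bytes, which is one fourth of the range.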

       Only  one process can have the pfm device open at any point in time. If
       the device is opened with PCNTOPENONE, only the specified CPU  is  con‐
       sidered open; subsequent open attempts will return EBUSY. If the device
       is opened with PCNTOPENALL or PCNTOPENEACH, all CPUs must be available;
       otherwise, returns EBUSY.

       EBUSY  will  also  be returned if another tool is using the performance
       counters (or has used them but has not restored the default performance
       counter	interrupt  handler).  In  this	case, if you are sure no other
       users are using the performance counters, re-execute the open call with
       superuser privilege. This will reset the busy status and proceed to use
       the counters.

       It is sufficient to open the device read-only. Opening the device  will
       disable	interrupts  (PCNTDISABLE) and log all system activity (PCNTLO‐
       GALL), generating simple counters only. The counters are	 not  cleared.
       Closing	the  device  automatically  disables interrupts and resets the
       service routines (PCNTDISABLE).

EV4 DETAILED STAT DESCRIPTIONS
       Following are more detailed descriptions of each of the events that can
       be  counted  by the two on-chip counters associated with the EV4 imple‐
       mentations.  For more information, consult the  21064  chip  specifica‐
       tion.

       Counter 0:

              PF_ISSUES
                     This counter is incremented by one for each cycle in
                     which two instructions are issued and is incremented by
                     1/2 for each cycle in which one instruction is issued.
                     The number of cycles in which one instruction is issued
                     can be found by using the Dual Issues field and the
                     equation S = (I - D) * 2, where S = Single Issues, D =
                     Dual Issues, and I = Issues.

              PF_PIPEDRY
                     This counter is incremented by one for each cycle in
                     which nothing is issued due to the lack of valid
                     instruction stream data. The causes could be instruction
                     cache refill operations (due to normal sequential
                     operation or delays while fetching the target of a
                     branch) or delays caused by the draining of the pipeline
                     in response to an exception.

              PF_LOADI
                     This counter is incremented for each load instruction.
                     Note: If a load misses in the primary data cache, the
                     replay of the instruction will cause the load counter to
                     be incremented again.

              PF_PIPEFROZEN
                     This counter is incremented for each cycle in which
                     nothing is issued due to a resource conflict within the
                     pipeline. Examples are:

                            Not all source and destination registers are
                            available
                            A load miss or write buffer overflow occurs
                            A conditional branch cannot be issued in the cycle
                            following a jump
                            Memory Barrier instruction processing can cause
                            the pipe to freeze

              PF_BRANCHI
                     This counter is incremented for each branch instruction.

              PF_CYCLES
                     This counter is incremented for each cycle.

              PF_PALMODE
                     This counter is incremented for each cycle spent in
                     PALmode.

              PF_NONISSUES
                     This counter is incremented by one for each cycle in
                     which no instructions are issued and is incremented by
                     1/2 for each cycle in which only one instruction is
                     issued. This counter is the inverse of the Issues
                     counter: Non-issues = 1 - Issues.

              PF_EXTPIN0
                     This counter is incremented for each external event
                     supplied to external pin 0. On the DEC 3000/500 and DEC
                     3000/400, this pin is connected to logic that indicates
                     external cache misses with victims. A victim is a data
                     block that must be written back to main memory before it
                     is reused.

       Counter 1:

              PF_DCACHE
                     This counter is incremented for each primary data cache
                     miss. Note: this counter actually is incremented each
                     time a primary data cache probe does not complete in one
                     cycle. This includes all misses, but also includes hits
                     that are stalled for other reasons such as bus traffic
                     holding previous misses pending.

              PF_ICACHE
                     This counter is incremented for each primary instruction
                     cache miss.

              PF_DUAL
                     This counter is incremented for each cycle in which two
                     instructions are dual-issued.

              PF_BRANCHMISS
                     This counter is incremented for each incorrectly
                     predicted branch.

              PF_FPINST
                     This counter is incremented for each floating-point
                     operate instruction. The floating-point operate
                     instructions do not include the floating-point load,
                     floating-point branch and floating-point store
                     instructions.

              PF_INTOPS
                     This counter is incremented for each integer operate
                     instruction as well as for each Load Address and Load
                     Address High instruction.

              PF_STOREI
                     This counter is incremented for each store instruction.

              PF_EXTPIN1
                     This counter is incremented for each external event
                     supplied to external pin 1. On the DEC 3000/500 and DEC
                     3000/400, this pin is connected to logic that indicates
                     external cache misses without victims.

       Most  items  count  the	instances  of different types of instructions.
       These counters are incremented for each occurrence,  and	 they  do  not
       give  information about the cost of executing the instruction. The Pipe
       Frozen/Dry counter increments for each frozen or	 dry  cycle,  not  for
       each instance of pipe freeze or pipe dry.
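
       The issue-counter arithmetic given under Counter 0 can be written out
       as a short sketch. The function names are hypothetical. The second
       function assumes that the per-cycle relation Non-issues = 1 - Issues
       can be summed over a run, so that the Non-issues total is the Cycles
       total minus the Issues total; that summation is an inference from the
       text, not a statement from the chip specification.

```c
/*
 * Single-issue cycles from the Issues and Dual Issues totals:
 * S = (I - D) * 2. Doubles are used because the Issues counter
 * advances by 1/2 for each single-issue cycle.
 */
double ev4_single_issue_cycles(double issues, double dual_issues)
{
    return (issues - dual_issues) * 2.0;
}

/*
 * Non-issues total implied by Non-issues = 1 - Issues per cycle,
 * summed over all cycles of the run (assumption, see above).
 */
double ev4_non_issues(double cycles, double issues)
{
    return cycles - issues;
}
```

       For example, with 70 dual-issue cycles and an Issues total of 100,
       the sketch yields 60 single-issue cycles, which is consistent with
       70 + 60/2 = 100.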

EV5 INTERFACE DESCRIPTION
       The EV5 implementations (21164, 21164A, and 21164PC) have three
       counters, each of which can be independently programmed to count
       certain internal or external events. They operate in much the same way
       as on EV4. Most of the EV4 ioctl calls can also be used on EV5. Here
       are some descriptions for EV5-specific ioctl calls:

              Selects the events counted by all three counters. The argument
              is a bitwise OR of one event name for each counter. See
              <sys/pfcntr.h> for the identifiers for the events: PF5_MUX0_*,
              PF5_MUX1_*, PF5_MUX2_*.

              Selects the sampling interrupt frequency for all three counters.
              The argument is a bitwise OR of one frequency indicator for each
              counter. A frequency of 256 requires superuser privilege because
              it can place an extremely heavy load on the system. Only
              carefully selected rare events should be counted with such a
              high frequency. A lower frequency is usually advisable, for
              example:

                     PF5_C0_INT_EVERY_65536
                     PF5_C1_INT_EVERY_65536
                     PF5_C2_INT_EVERY_16384

              Enables selected counters. (PCNT5RESTART zeroes them first.) The
              argument is the address of the pmctrs_ev5_long member of a union
              pmctrs_ev5, with the following additional field-member
              assignments:

                     pmctrs_ev5_cpu = PMCTRS_ALL_CPUS
                     pmctrs_ev5_select = any combination of PF5_SEL_COUNTER_0,
                     PF5_SEL_COUNTER_1, and PF5_SEL_COUNTER_2 using a bitwise
                     OR operator

              Disables selected counters.

              Clears or writes selected counters on selected CPUs. The
              argument is the address of the pmctrs_ev5_long member of a union
              pmctrs_ev5. See <sys/pfcntr.h> for more information.

              Sets contexts in which to count. The argument is a bitwise OR of
              selected PF5_CTXT_* values.

              Similar to EV4's PCNTGETCNT except that the argument is a
              pointer to an array of struct pfcntrs_ev5.

              Similar to PCNT5GETCNT except that the driver's counter values
              (i.e., the number of interrupts from each counter) are shifted
              left by the counter width. The current raw hardware counters are
              read and added to the tally.

              Reads the hardware counters from the selected CPU. The argument
              is the address of the pmctrs_ev5_long member of a union
              pmctrs_ev5. See <sys/pfcntr.h> for more information.
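
       The raw-count variant of PCNT5GETCNT described above combines the
       interrupt tally with the live hardware counter. A minimal sketch of
       that combination follows. The function name is hypothetical, and the
       counter width is passed in as a parameter because this page does not
       state it; consult the chip specification for the actual width of each
       counter.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine the driver's interrupt count with the
 * current hardware counter value. The interrupt count is shifted left
 * by the counter width (in bits) and the raw hardware value is added.
 */
uint64_t pfm_raw_total(uint64_t interrupts, unsigned counter_width_bits,
                       uint64_t hw_counter)
{
    return (interrupts << counter_width_bits) + hw_counter;
}
```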

EV5 DETAILED STAT DESCRIPTIONS
       Following are more detailed descriptions of each of the events that can
       be counted by the three on-chip counters associated with the EV5 imple‐
       mentations.  For more information, see the 21164 or 21164PC chip speci‐
       fication.

   All EV5 Implementations (EV5, EV56, PCA56)
       Counter 0:

              This counter is incremented for each cycle. (Note that counter 2
              also has a cycles counter.)

              This counter is incremented for each instruction.

       Counter 1:

              This counter is incremented for each cycle in which valid
              instructions are ready for issue, but none are issued because of
              a pipeline stall or because the resources they need are not
              available.

              This counter is incremented for each cycle in which some but not
              all of the maximum of four instructions are issued.

              This counter is incremented for each cycle in which no
              instructions are ready to issue.

              This counter is incremented for each time an instruction has to
              be executed again (instead of those behind it in the pipeline)
              because resources it needed were found to be unavailable the
              first time it executed.

              This counter is incremented for each cycle in which one
              instruction is issued.

              This counter is incremented for each cycle in which two
              instructions are issued.

              This counter is incremented for each cycle in which three
              instructions are issued.

              This counter is incremented for each cycle in which four
              instructions are issued.

              This counter is incremented for each branch, jump, or return
              instruction.

              This counter is incremented for each integer operation.

              This counter is incremented for each floating-point operation.

              This counter is incremented for each load operation.

              This counter is incremented for each store operation.

              This counter is incremented for each Instruction Cache access.

              This counter is incremented for each Data Cache access.

       Counter 2:

              This counter is incremented for each long pipeline stall (over
              15 cycles).

              This counter is incremented for each PC misprediction.

              This counter is incremented for each branch misprediction.

              This counter is incremented for each instruction not found in
              either the Instruction Cache or the associated Refill Buffer.

              This counter is incremented for each Instruction Cache miss for
              which the instruction's page entry is not stored in the
              Instruction Translation Buffer.

              This counter is incremented for each load of a value that is not
              in the Data Cache.

              This counter is incremented for each Data Cache miss for which
              the data page entry is not stored in the Data Translation
              Buffer.

              This counter is incremented for each load from an address that
              misses in the Data Cache but is merged with another load from
              the same address that is already in the Missed Address File.

              This counter is incremented for each Data Cache miss (for a
              load) that causes the replay of a later instruction that uses
              the loaded value.

              This counter is incremented for each store that is replayed
              because the Write Buffer is full and for each load that is
              replayed because the Missed Address File is full.

              This counter is incremented for each cycle for which the
              perf_mon_h External Input pin is true.

              This counter is incremented for each cycle. (Note that counter 0
              also has a cycles counter.)

              This counter is incremented for each stall cycle resulting from
              a Memory Barrier.

              This counter is incremented for each Locked Load instruction.

   EV5 and EV56 Implementations Only
       Counter 1:

              This counter is incremented for each Secondary Cache access (for
              either instructions or data).

              This counter is incremented for each read from the Secondary
              Cache.

              This counter is incremented for each write to the Secondary
              Cache. (Note that counter 2 also has a scachewrites counter.)

              This counter is incremented for each time a data block in the
              Secondary Cache must be written back to main memory before it is
              reused.

              This counter is incremented for each access to the optional,
              board-level Backup Cache.

              This counter is incremented for each time a data block in the
              Backup Cache must be written back to main memory before it is
              reused.

              This counter is incremented for each system request.

       Counter 2:

              This counter is incremented for each Secondary Cache miss.

              This counter is incremented for each Secondary Cache Read miss.

              This counter is incremented for each Secondary Cache Write miss.

              This counter is incremented for each Secondary Cache Shared
              Write operation.

              This counter is incremented for each Secondary Cache Write
              operation. (Note that counter 1 also has a scachewrites
              counter.)

              This counter is incremented for each miss in the optional
              board-level Backup Cache.

              This counter is incremented for each System Invalidate
              operation.

              This counter is incremented for each System Read Request.

   PCA56 Implementation Only
       Counter 1:

              This counter is incremented for each read request from the MBOX.

              This counter is incremented for each Dstream read request that
              hits in the Bcache.

              This counter is incremented for each Dstream read fill to the
              Bcache.

              This counter is incremented for each write request from the
              MBOX.

              This counter is incremented for each write that hits a clean
              block in the Bcache.

              This counter is incremented for each VICTIM command issued by
              the 21164PC.

              This counter is incremented each time a second READ_MISS is sent
              to the system while an earlier READ_MISS command is still
              outstanding.

       Counter 2:

              This counter is incremented for each Dstream read request from
              the MBOX.

              This counter is incremented for each read request that hits in
              the Bcache.

              This counter is incremented for each read fill to the Bcache.

              This counter is incremented for each write that hits in the
              Bcache.

              This counter is incremented for each write fill to the Bcache.

              This counter is incremented for each system READ or FLUSH hit in
              the Bcache.

              This counter is incremented for each system READ or FLUSH
              request.

              This counter is incremented each time a third READ_MISS is sent
              to the system while two earlier READ_MISS commands are still
              outstanding.

EV6 INTERFACE DESCRIPTION
       The EV6 implementation (21264) has two counters, each of which can be
       programmed to count certain internal or external events. They operate
       in much the same way as the counters on EV4 and EV5. Most of the EV4
       ioctl calls can also be used on EV6. Below are some descriptions for
       EV6-specific ioctl calls. Note that the EV6 interface should also be
       used on EV7 systems.

              Selects the events counted by the two counters. The argument is
              a bitwise OR of one event name for each counter. See
              <sys/pfcntr.h> for the identifiers for the events: PF6_MUX0_*,
              PF6_MUX1_*.

              Enables selected counters. PCNT6RESTART zeros them first.
              PCNT6ENABWRITE sets them to specified values. The argument is
              the address of the pmctrs_ev6_long member of a union pmctrs_ev6,
              with the following additional field-member assignments:

                     pmctrs_ev6_cpu = PMCTRS_ALL_CPUS
                     pmctrs_ev6_select = any combination of PF6_SEL_COUNTER_0
                     and PF6_SEL_COUNTER_1 using a bitwise OR operator

              Disables selected counters.

              Clears or writes selected counters on selected CPUs. The
              argument is the address of the pmctrs_ev6_long member of a union
              pmctrs_ev6. See <sys/pfcntr.h> for more information.

              Similar to EV4's PCNTGETCNT except that the argument is a
              pointer to an array of struct pfcntrs_ev6.

              Reads the hardware counters from the selected CPU. The argument
              is the address of the pmctrs_ev6_long member of a union
              pmctrs_ev6. See <sys/pfcntr.h> for more information.

              Similar to PCNT6GETCNT except that the driver's counter values
              (i.e., the number of interrupts from each counter) are shifted
              left by the counter width. The current raw hardware counters are
              read and added to the tally.

EV6 DETAILED STAT DESCRIPTIONS
       Following are more detailed descriptions of each of the events that can
       be counted by the two on-chip counters associated with the  EV6	imple‐
       mentation.  For more information, see the 21264 chip specification.

       Counter 0:

              This counter is incremented for each cycle. (Note that counter 1
              also has a cycles counter.)

              This counter is incremented for every retired instruction.

       Counter 1:

              This counter is incremented for each cycle. (Note that counter 0
              also has a cycles counter.)

              This counter is incremented for each retired conditional branch.

              This counter is incremented twice for each retired single
              dstream translation buffer (DTB) miss.

              This counter is incremented for each retired double DTB miss.

              This counter is incremented for each retired instruction
              translation buffer (ITB) miss.

              This counter is incremented for each retired unaligned trap.

              This counter is incremented for each replay trap.

EV67 AND EV7 DETAILED STAT DESCRIPTIONS
       Following are some descriptions of events that can be  counted  by  the
       on-chip	counters  associated  with  the	 EV67 implementation. The EV67
       counters may be used  in	 two  mutually	exclusive  modes:  traditional
       aggregate  and profile-me.  The EV67 traditional aggregate counters are
       not completely independent. Any one statistic may be selected,  or  one
       of  the	following  pairs may be selected: (cycles0, replay); (retinst,
       cycles1); (retinst, bcachemisses). EV7  provides	 the  same  statistics
       that EV67 does.
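
       The pairing restriction on the traditional aggregate counters can be
       checked before a request is issued. The sketch below encodes the
       three permitted pairs listed above. The enum constants and function
       name are hypothetical stand-ins for illustration; they are not the
       identifiers defined in <sys/pfcntr.h>.

```c
#include <stddef.h>

/* Hypothetical names for the EV67 aggregate statistics. */
enum ev67_stat {
    EV67_CYCLES0,       /* counter 0: cycles */
    EV67_RETINST,       /* counter 0: retired instructions */
    EV67_CYCLES1,       /* counter 1: cycles */
    EV67_BCACHEMISSES,  /* counter 1: Backup Cache misses */
    EV67_REPLAY         /* counter 1: replay traps */
};

/*
 * Return 1 if the counter 0 statistic c0 may be collected together
 * with the counter 1 statistic c1, and 0 otherwise. Selecting a single
 * statistic on its own is always allowed and is not modeled here.
 */
int ev67_pair_ok(enum ev67_stat c0, enum ev67_stat c1)
{
    static const enum ev67_stat ok[][2] = {
        { EV67_CYCLES0, EV67_REPLAY },
        { EV67_RETINST, EV67_CYCLES1 },
        { EV67_RETINST, EV67_BCACHEMISSES },
    };
    size_t i;

    for (i = 0; i < sizeof ok / sizeof ok[0]; i++)
        if (ok[i][0] == c0 && ok[i][1] == c1)
            return 1;
    return 0;
}
```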

       Counter 0:

              This counter is incremented for each cycle. (Note that counter 1
              also has a cycles counter.)

              This counter is incremented for every retired instruction.

       Counter 1:

              This counter is incremented for each cycle. (Note that counter 0
              also has a cycles counter.)

              This counter is incremented for each miss in the Backup Cache.

              This counter is incremented for each replay trap.

       EV67 profile-me mode and traditional aggregate  counters	 work  differ‐
       ently:  instead	of  counting  events  as done by traditional aggregate
       counters, instructions in profile-me mode are  uniformly	 selected  and
       various	events	are  recorded  during  the  execution of each selected
       instruction.

       The descriptions below are written from the perspective of a uprofile
       or kprofile user. For example, the *_per_ret statistics actually cause
       the pfm driver to return (statistic, retired) pairs, which are later
       processed by uprofile or kprofile. Similarly, the freq statistic is
       the same as the retired statistic until uprofile or kprofile
       postprocesses it.
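
       The postprocessing of a (statistic, retired) pair into one of the
       ratios scaled by 100 can be sketched as follows. This only
       illustrates the arithmetic stated in the descriptions in this
       section; it is not the uprofile or kprofile implementation, and the
       function name is hypothetical. Returning zero when nothing retired is
       also an assumption of this sketch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn a (statistic, retired) pair into a ratio
 * scaled by 100, as described for the scaled ratio statistics.
 * Returns 0.0 if no profiled executions retired.
 */
double per_ret_percent(uint64_t statistic, uint64_t retired)
{
    if (retired == 0)
        return 0.0;
    return (double)statistic * 100.0 / (double)retired;
}
```

       The cycles ratio is described as a plain quotient without the
       factor of 100, so it would omit the scaling step.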

       Any one of the following profile-me statistics may be selected:

              This statistic is incremented if the profiled execution is
                aborted.
              This ratio is the abort statistic scaled by 100 and divided by
                the retired statistic.
              This statistic is incremented if the profiled execution causes
                an arithmetic trap.
              This statistic is incremented if the profiled execution is a
                taken conditional branch.
              This ratio is the cbr_taken statistic scaled by 100 and divided
                by the retired statistic.
              This statistic is incremented by the approximate number of
                cycles the execution was in flight.
              This ratio is the cycles statistic divided by the retired
                statistic.
              This statistic is incremented by the approximate retire delay
                of the profiled execution.
              This ratio is the delay statistic scaled by 100 and divided by
                the retired statistic.
              This statistic is incremented if the profiled execution causes
                a Dstream fault.
              This statistic is incremented if the profiled execution causes
                a DTB single miss.
              This ratio is the dtb_miss statistic scaled by 100 and divided
                by the retired statistic.
              This statistic is incremented if the profiled execution causes
                a DTB double miss (3-level page tables).
              This statistic is incremented if the profiled execution causes
                a DTB double miss (4-level page tables).
              This statistic is incremented if the profiled execution is
                killed early in the pipeline.
              This ratio is the early_kill statistic scaled by 100 and
                divided by the retired statistic.
              This statistic is incremented if the profiled execution causes
                a floating-point disabled trap.
              This statistic is incremented if the profiled execution
                retires. uprofile and kprofile average this statistic within
                basic blocks to provide instruction execution frequency
                estimates.
              This statistic is incremented if the profiled execution was not
                yet prefetched for the cache. Note the profiled instruction
                may experience an unrecorded icache miss if the fetch is in
                progress.
              This ratio is the icache_miss statistic scaled by 100 and
                divided by the retired statistic.
              This statistic is incremented if the profiled execution
                experienced an icache parity error.
              This statistic is incremented by the approximate number of
                bcache misses during the profiled execution.
              This statistic is incremented by the approximate number of
                replay traps during the profiled execution.
              This statistic is incremented by the approximate number of
                instruction retires during the profiled execution.
              This statistic is incremented if the profiled execution is
                pre-empted by an interrupt.
              This statistic is incremented if the profiled execution causes
                an istream access violation.
              This statistic is incremented if the profiled execution causes
                an ITB miss.
              This statistic is incremented if the profiled execution causes
                a load-store order trap.
              This statistic is incremented if the profiled execution causes
                an unaligned load or store.
              This statistic is incremented if the profiled execution stalled
                before it was mapped.
              This ratio is the map_stall statistic scaled by 100 and divided
                by the retired statistic.
              This statistic is incremented if the profiled execution
                experiences a misprediction.
              This ratio is the mispredict statistic scaled by 100 and
                divided by the retired statistic.
              This statistic is incremented if the profiled execution causes
                a reserved opcode trap.
              This statistic is incremented if the profiled execution causes
                a replay trap.
              This ratio is the replay_trap statistic scaled by 100 and
                divided by the retired statistic.
              This statistic is incremented if the profiled execution
                retires.
              This statistic is incremented if the profiled execution causes
                a trap.
              This ratio is the trap statistic scaled by 100 and divided by
                the retired statistic.
              This statistic is incremented if the profiled execution is
                valid.
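
       The basic-block averaging that uprofile and kprofile apply to the
       retired statistic can be illustrated with a rough sketch. The text
       does not say how the average is computed; the arithmetic mean used
       here is an assumption, and average_block() is a hypothetical
       function name, not part of either tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Given per-instruction retired counts, write the mean of the counts in
 * the basic block [first, last) to every freq[] entry in that block.
 * Using the arithmetic mean is an assumption of this sketch.
 * An empty block (first >= last) leaves freq[] untouched. */
void average_block(const uint64_t *retired, double *freq,
                   size_t first, size_t last)
{
    double sum = 0.0;

    if (first >= last)
        return;

    for (size_t i = first; i < last; i++)
        sum += (double)retired[i];

    double mean = sum / (double)(last - first);

    for (size_t i = first; i < last; i++)
        freq[i] = mean;
}
```

       Because every instruction in a basic block executes the same number
       of times, averaging the sampled counts across the block smooths out
       sampling noise in the individual per-instruction counts.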

       For more information, see the 21264A chip specification.

NOTES
       The notes in this section pertain only to EV4 processors.

       Disabling an EV4 counter does not actually stop it from interrupting
       the CPU. However, the interrupt is dismissed without recording any
       data.

       Connections of the CPU's External Input pins to external events are
       platform-dependent. The DEC 3000/400, /500, /600, and /800
       workstations have these connections; they count BCache Misses and
       BCache Misses with Victims.

       Generating statistics on a per-process basis is possible only on 21064
       Pass 3 or later processors. Attempts to do this on a Pass 2 or earlier
       processor gather statistics for the entire system instead.

FILES
       The device entry (character, dev# 26/0)

       <sys/pfcntr.h>
              Structure definitions

SEE ALSO
       Commands: kprofile(1), uprofile(1), prof(1), sysconfig(8),
       autosysconfig(8)

									pfm(7)
