pfm(7)
NAME
pfm - The on-chip performance counter pseudo-device
SYNOPSIS
pseudo-device pfm
DESCRIPTION
The pfm pseudo-device is the interface to Alpha implementation-specific
on-chip performance counters. A set of ioctl calls form the interface,
as defined in the <sys/pfcntr.h> header file.
The kernel in use must have the pfm pseudo-device configured into it.
To do this, use one of the following methods:
Add the following line to the kernel configuration file and rebuild the
kernel. Do not use this method if CPU hot-swap is supported by the
system, because it does not allow pfm to be easily unconfigured as
required for a hot-swap; instead, use the sysconfig method below.
pseudo-device pfm
Enter the following command from the root account. Do not configure pfm
if CPU hot-swap is anticipated.
# sysconfig -c pfm
If pfm is configured, the CPU hot-swap procedure requires that it be
unconfigured, using the following command, before any CPU is swapped.
# sysconfig -u pfm
The autosysconfig program can be used to automatically load the
configurable pfm device at each system startup.
EV4 INTERFACE DESCRIPTION
The EV4 implementations (21064, 21064A, 21066, and 21068) have two
counters, each of which can be independently programmed to count cer‐
tain internal or external events. Each counter interrupts the system
when a certain number of the selected events have been counted. Any
combination of the following three actions can happen at each interrupt
(tick):
Counters (PFM_COUNTERS)
IPL histogramming (PFM_IPL)
User or kernel PC profiling (PFM_PROFILING)
These values are defined in <sys/pfcntr.h> and can be selected orthogo‐
nally by bitwise ORing the selections together and passing the result
to the PCNTSETITEMS ioctl request.
If counters are enabled, the interrupt count for this event is incre‐
mented. This records the number of times each event has happened, in
multiples of the interrupt frequency selected (PCNTSETMUX). Note that
the driver can only count the interrupts generated; no direct access to
the EV4 on-chip counter values is provided.
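Because the driver records only interrupts, an approximate event count
has to be reconstructed from the tick count and the interrupt period.
The following C sketch shows that arithmetic, using the periods given in
the PCNTSETMUX field descriptions on this page. The function names are
hypothetical and are not part of <sys/pfcntr.h>.

```c
#include <stdint.h>

/* Events per interrupt for an EV4 counter, as described for PCNTSETMUX:
 * counter 0 interrupts every 2^12 events when its frequency bit is set
 * and every 2^16 events when it is clear; counter 1 interrupts every 2^8
 * events when set and every 2^12 when clear.
 * Returns 0 for a counter number other than 0 or 1. */
uint64_t ev4_events_per_tick(int counter, int freq_bit_set)
{
    if (counter == 0)
        return freq_bit_set ? (1ULL << 12) : (1ULL << 16);
    if (counter == 1)
        return freq_bit_set ? (1ULL << 8) : (1ULL << 12);
    return 0;
}

/* Approximate number of events represented by a driver tick count.
 * The result is always a multiple of the interrupt period, because the
 * on-chip counter value itself cannot be read on EV4. */
uint64_t ev4_estimate_events(uint64_t ticks, int counter, int freq_bit_set)
{
    return ticks * ev4_events_per_tick(counter, freq_bit_set);
}
```

For example, 3 ticks on counter 1 with its frequency bit clear stand for
roughly 3 * 4096 events; up to one period's worth of events may still be
pending in the hardware counter.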
If IPL histogramming is enabled, the appropriate entry in the IPL array
is incremented. The entries are:
0-5 refer to IPL0-IPL5.
6 is unused. (IPL6 is the level of the performance counter interrupts.)
7 counts “idle” ticks (IPL = 0 and current_thread = idle_thread).
8 counts user mode ticks.
If profiling is enabled, a PC sample is added to the profile histogram
if the mode is correct (kernel or user).
Each CPU in a multiprocessor platform has separate counters, and the
device can be opened in three different ways:
PCNTOPENONE opens and collects data on only the CPU that the program is
running on.
PCNTOPENEACH opens all CPUs but keeps data for each one separately.
PCNTOPENALL opens all CPUs, aggregating the data for all CPUs into one
collection.
These values are defined in <sys/pfcntr.h> and are bitwise ORed into
the mode passed to the device open call. Note that if PCNTOPENONE is
selected, the opening thread/process must be bound to that processor;
otherwise, the open will fail. It must also remain bound to that
processor for as long as the driver is in use; otherwise, results are
unpredictable.
The following ioctl calls apply to the performance counter pseudo-
device. Note that most of the EV4 ioctls can also be used on EV5, EV6,
and EV7:
PCNTDISABLE
Disables performance counter interrupts on the CPU. Takes no arguments.

Enables performance counter interrupts on the CPU. Takes no arguments.

PCNTSETMUX
Selects the statistics to be counted by each performance counter and
the interrupt frequency. Takes a pointer to a struct iccsr that
contains the MUX register values desired. The fields in this register
are:

Controls the interrupt frequency of performance counter 0. If set,
interrupt frequency is every 2^12 events. If clear, interrupt frequency
is every 2^16 events.

Controls the interrupt frequency of performance counter 1. If set,
interrupt frequency is every 2^8 events. If clear, interrupt frequency
is every 2^12 events.

Selects the event counted by counter 0. One of: PF_ISSUES, PF_PIPEDRY,
PF_LOADI, PF_PIPEFROZEN, PF_BRANCHI, PF_CYCLES, PF_PALMODE,
PF_NONISSUES, PF_EXTPIN0

Selects the event counted by counter 1. One of: PF_DCACHE, PF_ICACHE,
PF_DUAL, PF_BRANCHMISS, PF_FPINST, PF_INTOPS, PF_STOREI, PF_EXTPIN1

Contains two bits, each of which disables data collection on the
specified counter. For example, set to 2 to disable counter 1 and
enable counter 0. Cannot be set to 3 (which disables both counters,
causing PCNTSETMUX to return EINVAL).

Do not set these fields. Must be zero.

PCNTSETITEMS
Selects the data items to be collected at each tick:
Counters (PFM_COUNTERS)
IPL histogramming (PFM_IPL)
User or kernel PC profiling (PFM_PROFILING; see PCNTSETUADDR,
PCNTSETURANGE, PCNTSETKADDR, and PCNTSETKRANGE)
These values are defined in <sys/pfcntr.h> and can be selected
orthogonally by bitwise ORing the selections together into the integer
argument. If no items are selected, returns EINVAL.

PCNTLOGALL
Sets the on-chip counters to count all system activity. Takes no
arguments and returns no errors.

Sets the on-chip counters to count only those threads/processes with
the PCB_PME_BIT set in their PCBs, and sets the PCB_PME_BIT for this
process. This bit is inherited across fork/exec, setting it for all
children. Takes no arguments and returns no errors.

Clears the PCB_PME_BIT in the PCB of the current process. Takes no
arguments and returns no errors.

Clears the driver's internal counters appropriate to the actions
selected. If PFM_COUNTERS is enabled, the interrupt counters and cycle
counter value are reset. If PFM_IPL is enabled, the IPL histogram is
reset. If neither is enabled (PFM_PROFILING only), returns EINVAL and
nothing is cleared. Takes no arguments.

PCNTGETCNT
Returns the driver's counter values and the pcc value(s). Takes a
pointer to an array of struct pfcntrs; the array is filled in with the
values. Sample usage of this ioctl is:
struct pfcntrs cntrs[NUM_OF_CPUS];
struct pfcntrs *pfcntrs = cntrs;
ioctl (fd, PCNTGETCNT, &pfcntrs);
If the driver is opened in mode PCNTOPENEACH, the underlying
array must be big enough to hold all of the data for each CPU;
otherwise, EFAULT is returned. If the driver is opened in mode
PCNTOPENONE or PCNTOPENALL, the array can be one element. If
PFM_COUNTERS is not enabled, returns EINVAL.

PCNTGETRSIZE
Returns the number of bytes of data available to read for getting the
PC profiling samples. By default this will be equal to one fourth of the
address range being profiled. (By default, profiling data is
kept as one bucket per four instructions, which corresponds to a
default profiling stride of 4 instructions per sample count.) If
the driver is opened in mode PCNTOPENEACH, this number of bytes
will be multiplied by the number of CPUs.
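As a rough cross-check of the returned value, the expected size can be
computed from the address range, the stride, and the number of CPUs. The
sketch below is illustrative only. The function name is hypothetical, the
4-byte bucket size is an inference from the statement that the default
size is one fourth of the range at a stride of 4 (Alpha instructions are
4 bytes long), and the driver's rounding for ranges that are not a
multiple of the stride is not documented here.

```c
#include <stddef.h>

#define ALPHA_INSN_BYTES 4  /* Alpha instructions are 4 bytes long */
#define BUCKET_BYTES     4  /* ASSUMPTION: inferred, not taken from <sys/pfcntr.h> */

/* Expected number of profiling bytes for a range of range_bytes bytes.
 * stride is the number of instructions per bucket; 0 means one bucket
 * for the whole range. ncpus is 1 for PCNTOPENONE or PCNTOPENALL and
 * the number of CPUs for PCNTOPENEACH. */
size_t expected_profile_bytes(size_t range_bytes, unsigned stride,
                              unsigned ncpus)
{
    size_t insns = range_bytes / ALPHA_INSN_BYTES;
    size_t buckets = (stride == 0) ? 1 : insns / stride;

    return buckets * BUCKET_BYTES * ncpus;
}
```

With the default stride of 4, a 4096-byte range gives 1024 bytes on one
CPU, which matches the one-fourth rule stated above. Treat the value
returned by PCNTGETRSIZE as authoritative when sizing a read buffer.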
To set the profiling address range and stride (and select user
or kernel profiling), use the PCNTSETURANGE or PCNTSETKRANGE
ioctl, respectively. To set the address range without changing
the stride, you can also use the PCNTSETUADDR or PCNTSETKADDR
ioctl.
The PCNTGETRSIZE ioctl takes a pointer to a long and returns no
errors. The returned value will be 0 if profiling is not cur‐
rently selected or if the address range and mode have not been
specified.

PCNTGETIPLHIS
Returns the current IPL histogram(s). Takes a pointer to an array of
struct pfipls; the array is filled in with the values. Sample usage of
this ioctl is:
struct pfipls ipls[NUM_OF_CPUS];
struct pfipls *pfipls = ipls;
ioctl (fd, PCNTGETIPLHIS, &pfipls);
If the driver is opened in mode PCNTOPENEACH, the underlying
array must be big enough to hold all of the data for each CPU.
If the underlying array is not big enough, EFAULT might be
returned or other data in the program might be overwritten.
If the driver is opened in mode PCNTOPENONE or PCNTOPENALL, the
array can be one element. If PFM_IPL is not enabled, returns
EINVAL.

PCNTCALLER
If kernel mode profiling is turned on (with PCNTSETKADDR or
PCNTSETKRANGE), directs the profiler to collect data on the caller of
certain system utility routines (for example, bcopy, bzero,
simple_lock). If kernel mode profiling is not
turned on, returns EINVAL. (See also the descriptions of
PCNTSETKADDR and PCNTSETKRANGE for information about their use
in PCNTCALLER mode.)

PCNTSETKADDR
Sets the kernel address range to profile
and turns on kernel mode PC profiling. If the device is not open
for profiling, returns EINVAL. If memory cannot be obtained for
the sample data, returns ENOMEM.
If PCNTCALLER kernel profiling mode is engaged, specifies an
additional address range to collect profiling data on the caller
of a routine, instead of the routine itself. Takes a start and
end address range. Up to 4 additional address ranges may be
added; additional attempts will return ENOSPC. If the addresses
are out of range of kernel text, not aligned, or otherwise
invalid, returns EFAULT.
Note that PCNTSETKRANGE performs the same functions as
PCNTSETKADDR and, in addition, lets you set the profiling
stride.

PCNTSETKRANGE
Sets the kernel address range to profile and sets the profile stride
(the number of consecutive instructions grouped together for each
sample count). The stride must be zero or a power of two (for example,
0, 1, 2, 4, 8). A zero stride means there
should be only one counter for the whole address range. This
ioctl also turns on kernel mode PC profiling. If the device is
not open for profiling, returns EINVAL. If memory cannot be
obtained for the sample data, returns ENOMEM.
If PCNTCALLER kernel profiling mode is engaged, specifies an
additional address range to collect profiling data on the caller
of a routine, instead of the routine itself. Takes a start and
end address range, and ignores the stride. Up to 4 additional
address ranges may be added; additional attempts will return
ENOSPC. If the addresses are out of range of kernel text, not
aligned, or otherwise invalid, returns EFAULT.

PCNTSETUADDR
Sets the user
address range to profile and turns on user mode PC profiling. If
the device is not open for profiling, returns EINVAL. If memory
cannot be obtained for the sample data, returns ENOMEM. Note
that PCNTSETURANGE performs the same functions as PCNTSETUADDR
and, in addition, lets you set the profiling stride.

PCNTSETURANGE
Sets the user address range to profile and sets the profile stride (the
number of consecutive instructions grouped together for each sample
count). The stride must be zero or a power of two (for example,
0, 1, 2, 4, 8). A zero stride means there should be only one
counter for the whole address range. This ioctl also turns on
user mode PC profiling. If the device is not open for profiling,
returns EINVAL. If memory cannot be obtained for the sample
data, returns ENOMEM.
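
A caller may want to check a stride before passing it to PCNTSETKRANGE
or PCNTSETURANGE. The following is a minimal sketch of such a check,
accepting zero and any power of two as described above. The function name
is hypothetical, and this page does not say which error the driver
returns for an invalid stride.

```c
#include <stdbool.h>

/* Returns true if stride is 0 (one counter for the whole range) or a
 * power of two. A power of two has exactly one bit set, so clearing its
 * lowest set bit with stride & (stride - 1) leaves zero; the same
 * expression is also zero when stride is 0. */
bool stride_is_valid(unsigned long stride)
{
    return (stride & (stride - 1)) == 0;
}
```

For example, strides of 0, 1, 2, 4, and 8 pass this check, while 3, 6,
and 12 do not.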
Only one process can have the pfm device open at any point in time. If
the device is opened with PCNTOPENONE, only the specified CPU is con‐
sidered open; subsequent open attempts will return EBUSY. If the device
is opened with PCNTOPENALL or PCNTOPENEACH, all CPUs must be available;
otherwise, returns EBUSY.
EBUSY will also be returned if another tool is using the performance
counters (or has used them but has not restored the default performance
counter interrupt handler). In this case, if you are sure no other
users are using the performance counters, re-execute the open call with
superuser privilege. This will reset the busy status and proceed to use
the counters.
It is sufficient to open the device read-only. Opening the device will
disable interrupts (PCNTDISABLE) and log all system activity
(PCNTLOGALL), generating simple counters only. The counters are not
cleared.
Closing the device automatically disables interrupts and resets the
service routines (PCNTDISABLE).
EV4 DETAILED STAT DESCRIPTIONS
Following are more detailed descriptions of each of the events that can
be counted by the two on-chip counters associated with the EV4 imple‐
mentations. For more information, consult the 21064 chip specifica‐
tion.
Counter 0:

PF_ISSUES
This counter is incremented by one for each cycle in which two
instructions are issued and is incremented by 1/2 for each cycle in
which one instruction is issued. The number of cycles in which one
instruction is issued can be found by using the Dual Issues field and
the equation S = (I - D) * 2, where S = Single Issues, D = Dual Issues,
and I = Issues.

PF_PIPEDRY
This counter is incremented by one for each cycle in which nothing is
issued due to the lack of valid instruction stream data. The causes
could be instruction cache refill operations (due to normal sequential
operation or delays while fetching the target of a branch) or delays
caused by the draining of the pipeline in response to an exception.

PF_LOADI
This counter is incremented for each load instruction. Note: If a load
misses in the primary data cache, the replay of the instruction will
cause the load counter to be incremented again.

PF_PIPEFROZEN
This counter is incremented for each cycle in which nothing is issued
due to a resource conflict within the pipeline. Examples are:
Not all source and destination registers are available
A load miss or write buffer overflow occurs
A conditional branch cannot be issued in the cycle following a jump
Memory Barrier instruction processing can cause the pipe to freeze

PF_BRANCHI
This counter is incremented for each branch instruction.

PF_CYCLES
This counter is incremented for each cycle.

PF_PALMODE
This counter is incremented for each cycle spent in PALmode.

PF_NONISSUES
This counter is incremented by one for each cycle in which no
instructions are issued and is incremented by 1/2 for each cycle in
which only one instruction is issued. This counter is the inverse of
the Issues counter: Non-issues = 1 - Issues.

PF_EXTPIN0
This counter is incremented for each external event supplied to
external pin 0. On the DEC 3000/500 and DEC 3000/400, this pin is
connected to logic that indicates external cache misses with victims. A
victim is a data block that must be written back to main memory before
it is reused.
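
The Issues equation given for counter 0 can be applied to collected
totals as sketched below. Since the Issues counter adds 1 for each
dual-issue cycle and 1/2 for each single-issue cycle, I = D + S/2, which
rearranges to S = (I - D) * 2. The function names are hypothetical, and
the inputs are assumed to be event totals already reconstructed from the
driver's interrupt counts.

```c
/* Number of cycles in which exactly one instruction was issued,
 * from the Issues total (I) and the Dual Issues total (D). */
double single_issue_cycles(double issues, double dual_issues)
{
    return (issues - dual_issues) * 2.0;
}

/* Total instructions issued. Each dual-issue cycle issues two
 * instructions and each single-issue cycle issues one, so the total is
 * 2*D + S = 2*D + 2*(I - D) = 2*I. */
double instructions_issued(double issues)
{
    return 2.0 * issues;
}
```

For example, 4 dual-issue cycles and 12 single-issue cycles give an
Issues total of 4 + 12/2 = 10, and single_issue_cycles(10, 4) recovers
the 12 single-issue cycles.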
Counter 1:

PF_DCACHE
This counter is incremented for each primary data cache miss. Note:
this counter actually is incremented each time a primary data cache
probe does not complete in one cycle. This includes all misses, but
also includes hits that are stalled for other reasons such as bus
traffic holding previous misses pending.

PF_ICACHE
This counter is incremented for each primary instruction cache miss.

PF_DUAL
This counter is incremented for each cycle in which two instructions
are dual-issued.

PF_BRANCHMISS
This counter is incremented for each incorrectly predicted branch.

PF_FPINST
This counter is incremented for each floating-point operate
instruction. The floating-point operate instructions do not include the
floating-point load, floating-point branch and floating-point store
instructions.

PF_INTOPS
This counter is incremented for each integer operate instruction as
well as for each Load Address and Load Address High instruction.

PF_STOREI
This counter is incremented for each store instruction.

PF_EXTPIN1
This counter is incremented for each external event supplied to
external pin 1. On the DEC 3000/500 and DEC 3000/400, this pin is
connected to logic that indicates external cache misses without
victims.

Most items count the instances of different types of instructions.
These counters are incremented for each occurrence, and they do not
give information about the cost of executing the instruction. The Pipe
Frozen/Dry counter increments for each frozen or dry cycle, not for
each instance of pipe freeze or pipe dry.
EV5 INTERFACE DESCRIPTION
The EV5 implementations (21164, 21164A, and 21164PC) have three coun‐
ters, each of which can be independently programmed to count certain
internal or external events. They operate in much the same way as on
EV4. Most of the EV4 ioctl calls can also be used on EV5. Here are some
descriptions for EV5-specific ioctl calls:

Selects the events counted by all three counters. The argument is a
bitwise OR of one event name for each counter. See <sys/pfcntr.h> for
the identifiers for the events: PF5_MUX0_*, PF5_MUX1_*, PF5_MUX2_*.

Selects the sampling interrupt frequency for all three counters. The
argument is a bitwise OR of one frequency indicator for each counter. A
frequency of 256 requires superuser privilege because it can place an
extremely heavy load on the system. Only carefully selected rare events
should be counted with such a high frequency. A lower frequency is
usually advisable, for example:
PF5_C0_INT_EVERY_65536
PF5_C1_INT_EVERY_65536
PF5_C2_INT_EVERY_16384

Enables selected counters. (PCNT5RESTART zeroes them first.) The
argument is the address of the pmctrs_ev5_long member of a union
pmctrs_ev5, with the following additional field-member assignments:
pmctrs_ev5_cpu = PMCTRS_ALL_CPUS
pmctrs_ev5_select = any combination of PF5_SEL_COUNTER_0,
PF5_SEL_COUNTER_1, and PF5_SEL_COUNTER_2 using a bitwise OR operator

Disables selected counters.

Clears or writes selected counters on selected CPUs. The argument is
the address of the pmctrs_ev5_long member of a union pmctrs_ev5. See
<sys/pfcntr.h> for more information.

Sets contexts in which to count. The argument is a bitwise OR of
selected PF5_CTXT_* values.

Similar to EV4's PCNTGETCNT except that the argument is a pointer to an
array of struct pfcntrs_ev5.

Similar to PCNT5GETCNT except that the driver's counter values (i.e.,
the number of interrupts from each counter) are shifted left by the
counter width. The current raw hardware counters are read and added to
the tally.

Reads the hardware counters from the selected CPU. The argument is the
address of the pmctrs_ev5_long member of a union pmctrs_ev5. See
<sys/pfcntr.h> for more information.
EV5 DETAILED STAT DESCRIPTIONS
Following are more detailed descriptions of each of the events that can
be counted by the three on-chip counters associated with the EV5 imple‐
mentations. For more information, see the 21164 or 21164PC chip speci‐
fication.
All EV5 Implementations (EV5, EV56, PCA56)
Counter 0:

This counter is incremented for each cycle. (Note that counter 2 also
has a cycles counter.)

This counter is incremented for each instruction.

Counter 1:

This counter is incremented for each cycle in which valid instructions
are ready for issue, but none are issued because of a pipeline stall or
because the resources they need are not available.

This counter is incremented for each cycle in which some but not all of
the maximum of four instructions are issued.

This counter is incremented for each cycle in which no instructions are
ready to issue.

This counter is incremented for each time an instruction has to be
executed again (instead of those behind it in the pipeline) because
resources it needed were found to be unavailable the first time it
executed.

This counter is incremented for each cycle in which one instruction is
issued.

This counter is incremented for each cycle in which two instructions
are issued.

This counter is incremented for each cycle in which three instructions
are issued.

This counter is incremented for each cycle in which four instructions
are issued.

This counter is incremented for each branch, jump, or return
instruction.

This counter is incremented for each integer operation.

This counter is incremented for each floating-point operation.

This counter is incremented for each load operation.

This counter is incremented for each store operation.

This counter is incremented for each Instruction Cache access.

This counter is incremented for each Data Cache access.

Counter 2:

This counter is incremented for each long pipeline stall (over 15
cycles).

This counter is incremented for each PC misprediction.

This counter is incremented for each branch misprediction.

This counter is incremented for each instruction not found in either
the Instruction Cache or the associated Refill Buffer.

This counter is incremented for each Instruction Cache miss for which
the instruction's page entry is not stored in the Instruction
Translation Buffer.

This counter is incremented for each load of a value that is not in the
Data Cache.

This counter is incremented for each Data Cache miss for which the data
page entry is not stored in the Data Translation Buffer.

This counter is incremented for each load from an address that misses
in the Data Cache but is merged with another load from the same address
that is already in the Missed Address File.

This counter is incremented for each Data Cache miss (for a load) that
causes the replay of a later instruction that uses the loaded value.

This counter is incremented for each store that is replayed because the
Write Buffer is full and for each load that is replayed because the
Missed Address File is full.

This counter is incremented for each cycle for which the perf_mon_h
External Input pin is true.

This counter is incremented for each cycle. (Note that counter 0 also
has a cycles counter.)

This counter is incremented for each stall cycle resulting from a
Memory Barrier.

This counter is incremented for each Locked Load instruction.

EV5 and EV56 Implementations Only
Counter 1:

This counter is incremented for each Secondary Cache access (for either
instructions or data).

This counter is incremented for each read from the Secondary Cache.

This counter is incremented for each write to the Secondary Cache.
(Note that counter 2 also has a scachewrites counter.)

This counter is incremented for each time a data block in the Secondary
Cache must be written back to main memory before it is reused.

This counter is incremented for each access to the optional,
board-level Backup Cache.

This counter is incremented for each time a data block in the Backup
Cache must be written back to main memory before it is reused.

This counter is incremented for each system request.

Counter 2:

This counter is incremented for each Secondary Cache miss.

This counter is incremented for each Secondary Cache Read miss.

This counter is incremented for each Secondary Cache Write miss.

This counter is incremented for each Secondary Cache Shared Write
operation.

This counter is incremented for each Secondary Cache Write operation.
(Note that counter 1 also has a scachewrites counter.)

This counter is incremented for each miss in the optional board-level
Backup Cache.

This counter is incremented for each System Invalidate operation.

This counter is incremented for each System Read Request.

PCA56 Implementation Only
Counter 1:

This counter is incremented for each read request from the MBOX.

This counter is incremented for each Dstream read request that hits in
the Bcache.

This counter is incremented for each Dstream read fill to the Bcache.

This counter is incremented for each write request from the MBOX.

This counter is incremented for each write that hits a clean block in
the Bcache.

This counter is incremented for each VICTIM command issued by the
21164PC.

This counter is incremented each time a second READ_MISS is sent to the
system while an earlier READ_MISS command is still outstanding.

Counter 2:

This counter is incremented for each Dstream read request from the
MBOX.

This counter is incremented for each read request that hits in the
Bcache.

This counter is incremented for each read fill to the Bcache.

This counter is incremented for each write that hits in the Bcache.

This counter is incremented for each write fill to the Bcache.

This counter is incremented for each system READ or FLUSH hit in the
Bcache.

This counter is incremented for each system READ or FLUSH request.

This counter is incremented each time a third READ_MISS is sent to the
system while two earlier READ_MISS commands are still outstanding.

EV6 INTERFACE DESCRIPTION
The EV6 implementation (21264) has two counters, each of which can be
programmed to count certain internal or external events. They operate
in much the same way as the counters on EV4 and EV5. Most of the EV4
ioctl calls can also be used on EV6. Below are some descriptions for
EV6-specific ioctl calls. Note that the EV6 interface should also be
used on EV7 systems.

Selects the events counted by the two counters. The argument is a
bitwise OR of one event name for each counter. See <sys/pfcntr.h> for
the identifiers for the events: PF6_MUX0_*, PF6_MUX1_*.

Enables selected counters. PCNT6RESTART zeros them first.
PCNT6ENABWRITE sets them to specified values. The argument is the
address of the pmctrs_ev6_long member of a union pmctrs_ev6, with the
following additional field-member assignments:
pmctrs_ev6_cpu = PMCTRS_ALL_CPUS
pmctrs_ev6_select = any combination of PF6_SEL_COUNTER_0 and
PF6_SEL_COUNTER_1 using a bitwise OR operator.

Disables selected counters.

Clears or writes selected counters on selected CPUs. The argument is
the address of the pmctrs_ev6_long member of a union pmctrs_ev6. See
<sys/pfcntr.h> for more information.

Similar to EV4's PCNTGETCNT except that the argument is a pointer to an
array of struct pfcntrs_ev6.

Reads the hardware counters from the selected CPU. The argument is the
address of the pmctrs_ev6_long member of a union pmctrs_ev6. See
<sys/pfcntr.h> for more information.

Similar to PCNT6GETCNT except that the driver's counter values (i.e.,
the number of interrupts from each counter) are shifted left by the
counter width. The current raw hardware counters are read and added to
the tally.
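
The shifted tally described above (for both the EV5 and EV6 variants)
amounts to the arithmetic sketched below. This illustrates the
description only and is not the driver's code. The function name is
hypothetical, and the counter width is passed as a parameter because
this page does not state the width of each hardware counter.

```c
#include <stdint.h>

/* Combine the driver's interrupt count with the raw hardware counter:
 * the interrupt count is shifted left by the counter width and the
 * current raw counter value is added.
 * ASSUMPTION: width_bits is less than 64; for larger values the shift
 * is undefined in C, so only the raw value is returned. */
uint64_t combined_count(uint64_t interrupts, uint64_t raw_hw,
                        unsigned width_bits)
{
    if (width_bits >= 64)
        return raw_hw;
    return (interrupts << width_bits) + raw_hw;
}
```

For example, with a hypothetical 16-bit counter, 3 interrupts and a raw
value of 5 combine to 3 * 65536 + 5.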
EV6 DETAILED STAT DESCRIPTIONS
Following are more detailed descriptions of each of the events that can
be counted by the two on-chip counters associated with the EV6 imple‐
mentation. For more information, see the 21264 chip specification.
Counter 0:

This counter is incremented for each cycle. (Note that counter 1 also
has a cycles counter.)

This counter is incremented for every retired instruction.

Counter 1:

This counter is incremented for each cycle. (Note that counter 0 also
has a cycles counter.)

This counter is incremented for each retired conditional branch.

This counter is incremented twice for each retired single dstream
translation buffer (DTB) miss.

This counter is incremented for each retired double DTB miss.

This counter is incremented for each retired instruction translation
buffer (ITB) miss.

This counter is incremented for each retired unaligned trap.

This counter is incremented for each replay trap.

EV67 AND EV7 DETAILED STAT DESCRIPTIONS
Following are some descriptions of events that can be counted by the
on-chip counters associated with the EV67 implementation. The EV67
counters may be used in two mutually exclusive modes: traditional
aggregate and profile-me. The EV67 traditional aggregate counters are
not completely independent. Any one statistic may be selected, or one
of the following pairs may be selected: (cycles0, replay); (retinst,
cycles1); (retinst, bcachemisses). EV7 provides the same statistics
that EV67 does.
Counter 0:

This counter is incremented for each cycle. (Note that counter 1 also
has a cycles counter.)

This counter is incremented for every retired instruction.

Counter 1:

This counter is incremented for each cycle. (Note that counter 0 also
has a cycles counter.)

This counter is incremented for each miss in the Backup Cache.

This counter is incremented for each replay trap.

EV67 profile-me mode and traditional aggregate counters work differ‐
ently: instead of counting events as done by traditional aggregate
counters, instructions in profile-me mode are uniformly selected and
various events are recorded during the execution of each selected
instruction.
The descriptions below are written from the perspective of a uprofile or
kprofile user. For example, the *_per_ret statistics actually cause the
pfm driver to return (statistic, retired) pairs which are later pro‐
cessed by uprofile or kprofile. Similarly, the freq statistic is
merely the same as the retired statistic until uprofile or kprofile
postprocesses it.
Any one of the following profile-me statistics may be selected.

This statistic is incremented if the profiled execution is aborted.

This ratio is the abort statistic scaled by 100 and divided by the
retired statistic.

This statistic is incremented if the profiled execution causes an
arithmetic trap.

This statistic is incremented if the profiled execution is a taken
conditional branch.

This ratio is the cbr_taken statistic scaled by 100 and divided by the
retired statistic.

This statistic is incremented by the approximate number of cycles the
execution was in flight.

This ratio is the cycles statistic divided by the retired statistic.

This statistic is incremented by the approximate retire delay of the
profiled execution.

This ratio is the delay statistic scaled by 100 and divided by the
retired statistic.

This statistic is incremented if the profiled execution causes a
Dstream fault.

This statistic is incremented if the profiled execution causes a DTB
single miss.

This ratio is the dtb_miss statistic scaled by 100 and divided by the
retired statistic.

This statistic is incremented if the profiled execution causes a DTB
double miss (3 level page tables).

This statistic is incremented if the profiled execution causes a DTB
double miss (4 level page tables).

This statistic is incremented if the profiled execution is killed early
in the pipeline.

This ratio is the early_kill statistic scaled by 100 and divided by the
retired statistic.

This statistic is incremented if the profiled execution causes a
floating-point disabled trap.

This statistic is incremented if the profiled execution retires.
uprofile and kprofile average this statistic within basic blocks to
provide instruction execution frequency estimates.

This statistic is incremented if the profiled execution was not yet
prefetched for the cache. Note the profiled instruction may experience
an unrecorded icache miss if the fetch is in progress.

This ratio is the icache_miss statistic scaled by 100 and divided by
the retired statistic.

This statistic is incremented if the profiled execution experienced an
icache parity error.

This statistic is incremented by the approximate number of bcache
misses during the profiled execution.

This statistic is incremented by the approximate number of replay traps
during the profiled execution.

This statistic is incremented by the approximate number of instruction
retires during the profiled execution.

This statistic is incremented if the profiled execution is pre-empted
by an interrupt.

This statistic is incremented if the profiled execution causes an
istream access violation.

This statistic is incremented if the profiled execution causes an ITB
miss.

This statistic is incremented if the profiled execution causes a
load-store order trap.

This statistic is incremented if the profiled execution causes an
unaligned load or store.

This statistic is incremented if the profiled execution stalled before
it was mapped.

This ratio is the map_stall statistic scaled by 100 and divided by the
retired statistic.

This statistic is incremented if the profiled execution experiences a
misprediction.

This ratio is the mispredict statistic scaled by 100 and divided by the
retired statistic.

This statistic is incremented if the profiled execution causes a
reserved opcode trap.

This statistic is incremented if the profiled execution causes a replay
trap.

This ratio is the replay_trap statistic scaled by 100 and divided by
the retired statistic.

This statistic is incremented if the profiled execution retires.

This statistic is incremented if the profiled execution causes a trap.

This ratio is the trap statistic scaled by 100 and divided by the
retired statistic.

This statistic is incremented if the profiled execution is valid.

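The ratio statistics above are derived from a (statistic, retired) pair.
The sketch below shows that arithmetic in C as a reading of the
descriptions, not as the code used by uprofile or kprofile. The function
names are hypothetical, and returning 0 when the retired count is zero
is an assumption made only to avoid a division by zero.

```c
/* Ratio statistics that are "scaled by 100 and divided by the retired
 * statistic", for example the abort, cbr_taken, or dtb_miss ratios. */
double per_retired_scaled(unsigned long stat, unsigned long retired)
{
    if (retired == 0)
        return 0.0;  /* ASSUMPTION: the tools' behavior here is not documented */
    return 100.0 * (double)stat / (double)retired;
}

/* The cycles ratio is the one ratio described without the factor of
 * 100: the cycles statistic divided by the retired statistic. */
double cycles_per_retired(unsigned long cycles, unsigned long retired)
{
    if (retired == 0)
        return 0.0;  /* ASSUMPTION: same convention as above */
    return (double)cycles / (double)retired;
}
```

For example, 5 aborted samples against 200 retired samples give a ratio
of 2.5, and 300 in-flight cycles against 100 retired samples give 3.0.
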
For more information, see the 21264a chip specification.
NOTES
The notes in this section pertain only to EV4 processors.
Disabling an EV4 counter cannot actually disable it from interrupting
the CPU. However, the interrupt will be dismissed without recording any
data.
Connections of the CPU's External Input pins to external events are
platform dependent. The DEC 3000/400, /500, /600, /800 workstations
have these connections; they count BCache Misses and BCache Misses with
Victims.
Generating statistics on a per-process basis is only possible on 21064
Pass 3 or later processors. Attempts to do this on a Pass 2 or earlier
processor will gather statistics for the entire system.
FILES
The device entry (character, dev# 26/0)
Structure definitions
SEE ALSO
Commands: kprofile(1), uprofile(1), prof(1), sysconfig(8),
autosysconfig(8)