er_kernel(1)                                                  er_kernel(1)
NAME
er_kernel - generate an Analyzer experiment on the Solaris kernel
SYNOPSIS
er_kernel args [load-command]
AVAILABILITY
Solaris systems with DTrace supported
DESCRIPTION
The er_kernel command can generate an experiment from the Solaris kernel,
using the DTrace functionality provided with some Solaris releases. The
data may be examined with the GUI program, analyzer, or with its
command-line counterpart, er_print.
The er_kernel command may be used only by a user with DTrace privileges.
If an optional load command is given, er_kernel forks, and the child
sleeps for a quiet period, then executes the command to provide a load.
When the child exits, er_kernel continues for another quiet period, and
then exits. The duration of the quiet period may be specified by a -q
argument. The load command is launched as specified, and may be either a
command or a shell script. If it is a script, it should wait for any
commands it spawns to terminate before exiting; otherwise, the experiment
may be terminated prematurely.
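The timeline described above can be sketched in Python. This is a hypothetical illustration rather than er_kernel's code: the function run_load_with_quiet_periods is invented, kernel data collection is assumed to run for the entire call, and for simplicity the leading quiet period is spent in the parent before the load is started (er_kernel itself forks and lets the child sleep).

```python
import subprocess
import sys
import time


def run_load_with_quiet_periods(load_cmd, quiet_s=3.0):
    """Hypothetical sketch: quiet period, load command, quiet period.

    Kernel data collection is assumed to span the whole call.
    Returns the exit status of the load command.
    """
    time.sleep(quiet_s)                  # leading quiet period (-q)
    child = subprocess.Popen(load_cmd)   # start the load command
    status = child.wait()                # block until the load itself exits
    time.sleep(quiet_s)                  # trailing quiet period
    return status


if __name__ == "__main__":
    # Stand-in load: a short Python child process.
    rc = run_load_with_quiet_periods([sys.executable, "-c", "print('load')"],
                                     quiet_s=0.5)
    print("load exited with status", rc)
```

Because wait() returns as soon as the load command itself exits, a load script that starts background jobs without waiting for them shortens the measured window, which is the premature termination described above.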
If an optional -t argument is given, er_kernel will collect data
according to the -t argument, and then exit.
If neither is specified, er_kernel will run until terminated. It may
always be terminated by ctrl-C (SIGINT), or by using the kill command
and sending SIGINT, SIGQUIT, or SIGTERM to the er_kernel process.
ARGUMENTS
If invoked with no arguments, print a usage message.
If invoked with -h without any other arguments, and if the processor
supports hardware counter overflow profiling, print two lists containing
information about hardware counters. The first list contains "aliased"
hardware counters; the second list contains raw hardware counters. For
more details, see the "Hardware Counter Overflow Profiling" section in
the collect(1) man page.
-p option
Collect clock-based profiles. The allowed values of option are:
Value     Meaning
off       turn off clock-based profiling
on        turn on clock-based profiling with the default
          profiling interval of approximately 10 milliseconds
lo[w]     turn on clock-based profiling with the low-resolution
          profiling interval of approximately 100 milliseconds
hi[gh]    turn on clock-based profiling with the high-resolution
          profiling interval of approximately 1 millisecond
n         turn on clock-based profiling with a profiling
          interval of n. The value may be an integer or
          floating-point number, with a suffix of u specifying
          microseconds, or m specifying milliseconds. If no
          suffix is used, the value is assumed to be in
          milliseconds.
If the value is smaller than the system clock profiling minimum, it is
set to the minimum; if it is not a multiple of the clock profiling
resolution, it is rounded down to the nearest multiple of that
resolution. If it exceeds the clock profiling maximum, or if it is
negative, an error is reported. If it is zero, clock profiling is
turned off.
The DTrace profile provider, used to obtain the data, accepts only
integer rates in ticks per second. The specified value is converted to
an integer rate, and then converted back to the interval corresponding
to the rate actually used.
If no explicit -p off argument is given, and hardware-counter
overflow profiling is not turned on, clock-based profiling is
turned on by default.
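The interval rules for -p can be sketched as follows. The function names parse_p_option and dtrace_rate are hypothetical, and the minimum, resolution, and maximum used here are placeholder assumptions, since the real limits are system-dependent and not given in this page. The conversion to an integer rate assumes rounding to the nearest integer.

```python
import math

# ASSUMED placeholder limits in microseconds; the real clock profiling
# minimum, resolution, and maximum depend on the system.
MIN_US = 500.0
RES_US = 10.0
MAX_US = 1_000_000.0

# Named settings from the -p table, converted to microseconds.
NAMED_US = {"on": 10_000.0, "lo": 100_000.0, "low": 100_000.0,
            "hi": 1_000.0, "high": 1_000.0}


def parse_p_option(option, min_us=MIN_US, res_us=RES_US, max_us=MAX_US):
    """Return the clock profiling interval in microseconds, or None for off."""
    if option == "off":
        return None
    if option in NAMED_US:
        us = NAMED_US[option]
    elif option.endswith("u"):
        us = float(option[:-1])               # microseconds
    elif option.endswith("m"):
        us = float(option[:-1]) * 1000.0      # milliseconds
    else:
        us = float(option) * 1000.0           # no suffix: milliseconds
    if us < 0:
        raise ValueError("negative profiling interval")
    if us == 0:
        return None                           # zero turns clock profiling off
    us = max(us, min_us)                      # raise to the system minimum
    us = math.floor(us / res_us) * res_us     # round down to the resolution
    if us > max_us:
        raise ValueError("profiling interval exceeds the maximum")
    return us


def dtrace_rate(us):
    """Convert an interval to integer ticks per second, and back.

    ASSUMPTION: nearest-integer rounding; the exact rule is not documented.
    """
    hz = max(1, round(1_000_000 / us))
    return hz, 1_000_000 / hz
```

With these placeholder limits, -p 1234u becomes 1230 microseconds, and -p 3 (3 milliseconds) becomes a rate of 333 ticks per second, or about 3.003 milliseconds, once converted for the profile provider.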
-h option Collect hardware-counter overflow profiles (using the DTrace
cpc provider). The option is specified as for the collect command.
Hardware-counter profiling is not available on systems prior to Oracle
Solaris 11. If the overflow mechanism on the chip allows the kernel to
tell which counter overflowed, as many counters as the chip provides may
be used; otherwise, only one counter may be specified. Dataspace
profiling is not supported, and dataspace requests are ignored.
The system hardware-counter mechanism can be used by multiple processes
for user profiling, but it cannot be used for kernel profiling if any
user process, cputrack, or another er_kernel is using the mechanism. In
that case, er_kernel will report "HW counter profiling is not supported
on this system."
-F option Provide system-wide profiling, including the kernel and
applications. Control whether or not user processes should have their
data recorded. The allowed values of option are:
Value      Meaning
off        Do not record experiments on application processes;
           record on the kernel only (default).
on         Record experiments on all application processes as
           well as the kernel.
all        Same as on: record experiments on all application
           processes as well as the kernel.
=<regexp>  Record experiments on processes whose name or PID
           matches the regular expression. See "SYSTEM-WIDE
           PROFILING", below.
-T { pid/tid | 0/did }
-T is no longer supported.
-t duration
Collect data for the specified duration. duration may be a single
number, optionally followed by m, specifying minutes, or s, specifying
seconds (the default), or two such numbers separated by a - sign. If
one number is given, data will be collected from the start of the run
until the given time; if two numbers are given, data will be collected
from the first time to the second. If the second time is zero, data
will be collected until the end of the run. If two non-zero numbers are
given, the first must be less than the second.
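A minimal sketch of how a -t value is interpreted, following the rules above; parse_t_option and its (start, stop) return convention are hypothetical.

```python
def _seconds(token):
    """Convert one duration token to seconds; s is the default unit."""
    if token.endswith("m"):
        return float(token[:-1]) * 60.0
    if token.endswith("s"):
        return float(token[:-1])
    return float(token)


def parse_t_option(spec):
    """Return (start_s, stop_s); stop_s is None for 'until the end of the run'."""
    parts = spec.split("-")
    if len(parts) == 1:
        return 0.0, _seconds(parts[0])        # from the start until this time
    if len(parts) != 2:
        raise ValueError(f"malformed duration: {spec!r}")
    start, stop = _seconds(parts[0]), _seconds(parts[1])
    if stop == 0:
        return start, None                    # second time zero: run to the end
    if start >= stop:
        raise ValueError("the first time must be less than the second")
    return start, stop
```

For example, -t 2m collects for the first 120 seconds, -t 10-1m collects from 10 to 60 seconds, and -t 5s-0 collects from 5 seconds until the end of the run.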
-q duration
Enforce a quiet period of length duration (seconds) before and after
running the specified load. Default duration is 3 seconds. The quiet
period is ignored if no load is specified.
-S interval
Collect periodic samples at the interval specified (in seconds). If
interval is zero, do not collect periodic samples. By default, enable
periodic sampling at 1-second intervals. The data recorded in the
samples is data for the er_kernel process, and includes a timestamp and
execution statistics from the kernel, among other things. Samples are
markers within the data, and can be used for filtering.
-C comment
Put the comment, either a single token, or a quoted string,
into the experiment. Up to ten comments may be provided.
-o experiment_name
Use experiment_name as the name of the experiment to be
recorded. The experiment_name string must end in the string
.er; if not, report an error, and do not run the experiment.
If -o is not specified, choose a name of the form stem.n.er,
where stem is a string, and n is a number. If a -g argument
is given, use the string appearing before the .erg suffix in
the group name as the stem prefix; if no -g argument is
given, set the stem prefix to the string ktest.
If the name is not of the form stem.n.er, and the given name is in use,
print an error message and do not run the experiment. If the name is of
that form, and the name is in use, record the experiment under a name
corresponding to the first available value of n that is not in use;
issue a warning if the name is changed.
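The naming rules for -o and -g can be sketched as below. choose_experiment_name is hypothetical, existing names are passed in as a set instead of being read from the directory, and two details are assumptions not stated on this page: the default name starts at n = 1, and the search for the first free n also starts at 1.

```python
import re
import warnings

# stem.n.er, where n is a decimal number
STEM_N_ER = re.compile(r"^(?P<stem>.+)\.(?P<n>\d+)\.er$")


def choose_experiment_name(in_use, name=None, group=None):
    """Pick an experiment name following the -o/-g rules (sketch).

    in_use is a set of experiment names that already exist.
    """
    if group is not None and not group.endswith(".erg"):
        raise ValueError(f"group name must end in .erg: {group!r}")
    if name is None:
        stem = group[:-len(".erg")] if group else "ktest"
        name = f"{stem}.1.er"                 # ASSUMED starting value of n
    if not name.endswith(".er"):
        raise ValueError(f"experiment name must end in .er: {name!r}")
    if name not in in_use:
        return name
    match = STEM_N_ER.match(name)
    if match is None:
        # Not of the form stem.n.er and already in use: refuse to run.
        raise FileExistsError(f"experiment {name!r} already exists")
    stem = match.group("stem")
    n = 1                                      # ASSUMED: search starts at 1
    while f"{stem}.{n}.er" in in_use:
        n += 1
    new_name = f"{stem}.{n}.er"
    warnings.warn(f"experiment name changed from {name} to {new_name}")
    return new_name
```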
-l signal Record a sample point whenever the given signal is delivered
to the er_kernel process.
-y signal[,r]
Control recording of data with signal. Whenever the given
signal is delivered to the er_kernel process, switch between
paused (no data is recorded) and resumed (data is recorded)
states. er_kernel is started in the resumed state if the
optional ,r flag is given, otherwise it is started in the
paused state. This option does not affect the recording of sample
points.
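The -y semantics amount to a two-state toggle. The PauseResume class below is a hypothetical sketch that models the paused and resumed states, the effect of the ,r flag, and the fact that sample points are unaffected; install() simply registers the toggle with Python's signal module on a POSIX system.

```python
import signal


class PauseResume:
    """Hypothetical model of -y signal[,r]."""

    def __init__(self, start_resumed=False):
        # Without ,r the collector starts paused; with ,r it starts resumed.
        self.resumed = start_resumed

    def toggle(self, signum=None, frame=None):
        # Each delivery of the signal flips paused <-> resumed.
        self.resumed = not self.resumed

    def records(self, is_sample_point=False):
        # -y does not affect sample points; they are always recorded.
        return is_sample_point or self.resumed

    def install(self, signum):
        # Use toggle() as the handler for the chosen signal (POSIX only).
        signal.signal(signum, self.toggle)
```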
-d directory_name
Place the experiment in directory directory_name. If none is given,
record into the current working directory.
-g group_name
Consider the experiment to be part of experiment group
group_name. The group_name string must end in the string
.erg; if not, report an error, and do not run the experiment.
-L size Limit the amount of profiling and tracing data recorded to size
megabytes. The limit applies to the sum of all profiling data and
tracing data, but not to sample points. The limit is only approximate,
and can be exceeded. Terminate the experiment when the limit is
reached. The allowed values of size are:
Value      Meaning
unlimited or none
           Do not impose a size limit on the experiment.
n          Impose a limit of n MB; n must be greater than zero.
There is no default limit on the amount of data recorded.
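A sketch of the -L rules. parse_size_limit and should_stop are hypothetical; treating a megabyte as 2**20 bytes and requiring n to be an integer are assumptions. Checking the total only after data has been written is one way the limit can end up exceeded, consistent with the description above.

```python
# ASSUMPTION: 1 MB = 2**20 bytes; the page does not define the unit size.
BYTES_PER_MB = 2 ** 20


def parse_size_limit(size):
    """Return the -L limit in bytes, or None when no limit is imposed."""
    if size in ("unlimited", "none"):
        return None
    mb = int(size)                            # ASSUMED: integer megabytes
    if mb <= 0:
        raise ValueError("size must be greater than zero")
    return mb * BYTES_PER_MB


def should_stop(profile_bytes, trace_bytes, limit_bytes):
    """Decide whether to terminate; sample points are not counted."""
    if limit_bytes is None:
        return False
    return profile_bytes + trace_bytes >= limit_bytes
```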
-A option Control whether or not the kernel modules used during the run
are copied into the recorded experiment. The allowed values of option
are:
Value      Meaning
on         Archive the kernel modules.
off        Do not archive the kernel modules into the experiment.
copy       Copy the kernel modules into the experiment and archive
           them.
To copy experiments onto a different machine, or read them from a
different machine, the user should specify -A copy.
The default setting for -A is copy.
-n Dry run: do not collect data, but print all the details of the
experiment that would be run. This option also turns on -v.
-V Print the current version. No further arguments are examined, and
no further processing is done.
-v Print detailed information about the experiment being run,
including the current version.
SYSTEM-WIDE PROFILING
If the -F argument is used to follow user processes detected during an
er_kernel experiment, a subexperiment is created for each such user
process. The user process will record data only when the process is in
user mode, and will record only the user callstack.
The subexperiments are named as follows:
_process-name_PID_process-pid.1.er
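The -F filtering and the subexperiment naming can be sketched as follows. Both functions are hypothetical; whether er_kernel anchors the regular expression, and exactly how it matches it against the PID, are not specified here, so an unanchored search on the name and on the decimal PID is assumed.

```python
import re


def follows_process(f_option, name, pid):
    """Would -F f_option record a subexperiment for this process? (sketch)"""
    if f_option == "off":
        return False
    if f_option in ("on", "all"):
        return True
    if f_option.startswith("="):
        # ASSUMPTION: unanchored search, tried on the name and on the PID.
        pattern = re.compile(f_option[1:])
        return bool(pattern.search(name) or pattern.search(str(pid)))
    raise ValueError(f"unrecognized -F value: {f_option!r}")


def subexperiment_name(name, pid):
    """Build the name _process-name_PID_process-pid.1.er."""
    return f"_{name}_PID_{pid}.1.er"
```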
DATA RECORDED
Clock Profiling
Clock profiling experiments support two metrics, labeled "KCPU Cycles"
(metric name kcycles), for clock profile events recorded in the kernel
founder experiment, and "KUCPU Cycles" (metric name kucycles), for
clock profile events recorded in user process subexperiments when the
CPU is in user mode. Data is recorded on a per-CPU basis, with the CPU
number recorded as the CPU, the PID of the process on behalf of which
the kernel is running recorded as the LWPID, and the kernel thread ID
recorded as the thread in the raw data.
The kernel founder experiment will contain data for the kcycles
metric. When the CPU is in system-mode, the kernel callstacks are
recorded; when the CPU is idle, a single-frame callstack for the
pseudo-function named:
<IDLE>
is recorded; when the CPU is in user-mode, a single-frame callstack
attributed to the pseudo-function named:
<process-name_PID_process-pid>
is recorded. In the kernel experiment, no callstack information
on the user processes is recorded.
If -F is used to follow user processes, the subexperiments for each
followed process will contain data for the kucycles metric. User-level
callstacks will be recorded for all clock profile events where that
process was running in user mode.
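The attribution rules for a single clock-profile tick can be summarized in a short sketch. The function names and the string CPU-state labels are hypothetical; the frame names <IDLE> and <process-name_PID_process-pid> come from this page.

```python
def founder_callstack(cpu_state, kernel_stack=None, name=None, pid=None):
    """Callstack the founder experiment records for one tick (sketch)."""
    if cpu_state == "system":
        return list(kernel_stack)             # full kernel callstack
    if cpu_state == "idle":
        return ["<IDLE>"]                     # single pseudo-function frame
    if cpu_state == "user":
        return [f"<{name}_PID_{pid}>"]        # no user frames in the founder
    raise ValueError(f"unknown CPU state: {cpu_state!r}")


def subexperiment_records_tick(cpu_state, followed):
    """A followed process's subexperiment gets kucycles only in user mode."""
    return followed and cpu_state == "user"
```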
Hardware Counter Profiling
Hardware counter profiles are recorded with the metric for the
named counter, using the system callstacks as described above for
clock-profiling experiments, in the founder experiment, and user
callstacks in the user-process subexperiments.
NOTE: Because the same metric is used in both the founder experiment
and the user-process subexperiments, the HW counter metric will double
count when the CPU is in user mode. It will count against the user
callstacks in the user subexperiments, and against the pseudo-function
representing that process in the founder experiment. To avoid the
double counting, filter the data to see only the kernel experiment or
only one or more user experiments.
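A worked example with made-up numbers shows the double count. Suppose a counter records 100 events while the CPU is in system mode and 40 events while it runs bash (PID 4242) in user mode, with bash followed by -F. The founder experiment holds all 140, and the subexperiment holds the same 40 user-mode events again.

```python
# Hypothetical event counts for one hardware-counter metric.
founder = {
    "kernel callstacks": 100,        # CPU in system mode
    "<bash_PID_4242>": 40,           # CPU in user mode, one pseudo-function
}
subexperiments = {
    "_bash_PID_4242.1.er": 40,       # the same 40 user-mode events again
}

naive_total = sum(founder.values()) + sum(subexperiments.values())
kernel_view = sum(founder.values())                  # filter: founder only
user_view = subexperiments["_bash_PID_4242.1.er"]    # filter: one user experiment

print(naive_total, kernel_view, user_view)           # 180 140 40
```

Summing across both experiments overstates the total by exactly the user-mode events; either filtered view counts each event once.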
PROFILING STATISTICS
When kernel profiling terminates, er_kernel will write several lines of
statistics for the driver, including any counts for run time errors.
RUN TIME ERRORS
While er_kernel is running, it processes DTrace events. Some of those
events are delivered with inconsistent data. Specifically, each event
has the process PID in two places, and they should be the same.
However, for reasons not yet understood, sometimes they are different.
Such events are recorded in the founder kernel experiment, against the
pseudo-function <INCONSISTENT_PID>. When these events occur, er_kernel
will also record an event in the subexperiment corresponding to the PID
in the reported user callstack. The errors are also counted and, if
verbose mode is set, a message will be written to stderr.
DTrace also sometimes reports various errors. The most common of these
is an invalid address, which appears to be harmless. These errors are
counted, and, if verbose mode is set, they are logged to stderr.
The stack unwind done by DTrace may be incorrect, and, especially on
x86/amd64 code, may omit the caller of the current leaf frame. These
errors may occur on either the kernel stack or the user stack.
SYSTEM SETUP FOR DTRACE
Normally, the DTrace driver is restricted to user root. To allow a
regular user, username, to use it, that user must have privileges
assigned, and be in group sys.
To give privileges to the user, add a line:
username::::defaultpriv=basic,dtrace_kernel,dtrace_proc
to the file /etc/user_attr.
To put the user in group sys, add username to the sys line in file
/etc/group.
You must log out, and then log in again after making these changes.
SEE ALSO
dtrace(1M) (Solaris 10 or later), analyzer(1), collect(1), er_archive(1),
er_cp(1), er_export(1), er_mv(1), er_print(1), er_rm(1), er_src(1), and
the Performance Analyzer manual.
September 2011 er_kernel(1)