Look for stuck channels or bits.
BitTest monitors specified channels for bits that remain in one state and for channels that repeat a value twice or more in succession. The channels to be monitored may produce either integer or floating point data, but the bit checking facility functions only for the integer channels. BitTest accumulates statistics on all requested channels for a fixed number of frames, until a specified time or until an external interrupt is received. Any detected problems are reported at the end of the period and the accumulation cycle is restarted.
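The stuck-bit and repetition checks described above can be illustrated with a minimal sketch. It is not BitTest's actual implementation: the class name `BitStats`, the default 16-bit sample width, and the convention that a run of N equal samples counts as a repetition count of N are all assumptions made for illustration.

```python
class BitStats:
    """Running stuck-bit and repetition statistics for one integer channel.

    Hypothetical helper; the 16-bit default width is an assumption.
    """

    def __init__(self, width=16):
        self.mask = (1 << width) - 1
        self.on_bits = self.mask    # bits that were 1 in every sample so far
        self.off_bits = self.mask   # bits that were 0 in every sample so far
        self.last = None            # previous sample (masked)
        self.run = 0                # length of the current run of equal samples
        self.max_run = 0            # longest run seen in this period

    def add(self, sample):
        word = sample & self.mask   # two's-complement view of the sample
        self.on_bits &= word
        self.off_bits &= ~word & self.mask
        if word == self.last:
            self.run += 1
        else:
            self.run = 1
            self.last = word
        self.max_run = max(self.max_run, self.run)


stats = BitStats(width=8)
for value in [4, 5, 5, 5, 6, 7]:
    stats.add(value)
print(hex(stats.on_bits), hex(stats.off_bits), stats.max_run)
```

A bit that survives in either mask at the end of the accumulation period never changed state during that period.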
The BitTest error reports are produced in two sections with each section written to a separate file. The first section lists the channels found to be missing or to have data errors. The second section gives detailed statistics on each channel.
Triggers may be generated for each erroneous channel found. One trigger is generated for each flagged channel in each statistics accumulation period. The generation of all triggers may be enabled or disabled with a command line argument.
The BitTest Configuration File
The BitTest configuration file specifies the channels to be monitored and parameters affecting the generation of triggers for each channel. Each line of the configuration file specifies the parameters for a single channel and contains the following fields:
<channel-name> <bit-range> <max-repetition>
The meaning of each field is as follows:
<channel-name>    Name of the channel to be monitored.
<bit-range>       Mask of bits to be monitored.
<max-repetition>  Maximum number of times a value may be repeated.
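As an illustration of the three-field layout, the following sketch parses configuration text into records. Several details are assumptions not stated in this document: the fields are taken to be separated by whitespace, the mask is read with Python's base-0 integer syntax (so both `0xffff` and `65535` are accepted), and the channel names in the example are placeholders rather than real channels.

```python
from typing import List, NamedTuple


class ChannelConfig(NamedTuple):
    name: str        # <channel-name>
    bit_mask: int    # <bit-range>; zero selects automatic detection
    max_repeat: int  # <max-repetition>; zero disables the repetition test


def parse_config(text: str) -> List[ChannelConfig]:
    """Parse '<channel-name> <bit-range> <max-repetition>' lines."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        name, mask, max_repeat = fields
        # Assumption: the mask may be written in decimal or 0x-prefixed hex.
        entries.append(ChannelConfig(name, int(mask, 0), int(max_repeat)))
    return entries


# Placeholder channel names, for illustration only.
EXAMPLE = """\
X1:EXAMPLE-CHAN_A 0 10
X1:EXAMPLE-CHAN_B 0xffff 0
"""

for entry in parse_config(EXAMPLE):
    print(entry)
```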
The bits in the mask need not be adjacent. If the bit mask is zero, an automatic error detection algorithm is used that tolerates stuck bits only when they are adjacent high-order bits. Thus, any bit that is stuck in one state, is not the most significant bit, and is not adjacent to a higher-order stuck bit will generate a trigger. This gives the desired result when the channel measures a signal that varies smoothly over a given range, a condition satisfied by most LIGO signals.
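The automatic rule can be written as a short bit manipulation. This sketch is a literal reading of the sentence above, not code taken from BitTest: the function name, the 16-bit default width, and the choice to treat a bit as stuck if it appears in either the always-one or the always-zero mask are assumptions.

```python
def auto_flagged_bits(on_bits: int, off_bits: int, width: int = 16) -> int:
    """Return a mask of stuck bits that would be flagged in automatic mode.

    A bit is flagged when it is stuck, is not the most significant bit,
    and the next higher-order bit is not stuck.
    """
    word_mask = (1 << width) - 1
    stuck = (on_bits | off_bits) & word_mask
    msb = 1 << (width - 1)
    higher_neighbor_stuck = stuck >> 1  # bit i is set when bit i+1 is stuck
    return stuck & ~higher_neighbor_stuck & ~msb


# High-order bits 15..13 always zero: tolerated, nothing is flagged.
print(hex(auto_flagged_bits(0x0000, 0xE000)))
# Bits 15..13 and bits 5..4 always zero: bit 5 starts an isolated run.
print(hex(auto_flagged_bits(0x0000, 0xE030)))
```

For a small signal that varies smoothly, the unused high-order bits all sit at the sign value, so they form one run starting at the most significant bit and are ignored.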
Repetition count triggers may be disabled by setting the maximum repetition count to zero. The configuration file is reread each time a SIGUSR1 signal is received. This means that the monitor process need not be restarted in order to change the configuration. Instead, the configuration file can be modified, and the process reconfigured with a "kill -USR1 <pid>" command.
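A common way to implement this kind of reread-on-signal behavior is to have the signal handler set a flag that the main loop checks between frames. The sketch below shows that pattern in Python under stated assumptions: `ReloadRequest`, `poll`, and the `load_config` callable are hypothetical names, and nothing here is taken from the BitTest source.

```python
import signal


class ReloadRequest:
    """Records that a reconfiguration signal has arrived."""

    def __init__(self):
        self.pending = False

    def handler(self, signum, frame):
        # Keep the handler trivial: only note the request.
        self.pending = True

    def install(self):
        # Call once from the main thread of a Unix process.
        signal.signal(signal.SIGUSR1, self.handler)


def poll(request, load_config, current_config):
    """Return a freshly loaded configuration if a reload was requested."""
    if request.pending:
        request.pending = False
        return load_config()
    return current_config
```

Doing the actual file read in the main loop rather than in the handler avoids re-entering the parser while a frame is being processed.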
The syntax of the BitTest command is as follows:
BitTest [-partition <pname>] [-infile <file>] [cfile <config>] \
        [ofile <out-file>] [reset <nsec>] [synch hh[:mm[:ss]]] \
        [-debug <dbg-level>] [+trig[ger]] [-toc]
Where the arguments have the following meaning:
<pname>      Shared memory partition name with data to be read
<file>       Input frame file(s) (exclusive of <pname>)
<dbg-level>  Debug level
<config>     Configuration file name
<nsec>       Accumulation time in seconds
hh:mm:ss     Time (in current UTC day) to generate first report
<out-file>   Root output file name (defaults to "BitTest.junk")
The partition name is mutually exclusive with the input frame file name. If both are specified, data are read from the specified shared memory partition. The debug level defaults to 0 (no debug messages). Any other value for <dbg-level> will cause debugging messages to be printed to cout/cerr. Reports are produced at hh:mm:ss and every <nsec> seconds after that. If <nsec> is not specified, BitTest will continue to accumulate statistics until it catches either a SIGTERM or SIGUSR1 signal. If hh:mm:ss is not specified, BitTest will produce a report after <nsec> frames have been received.
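The report schedule implied by hh[:mm[:ss]] and <nsec> can be worked out with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, times are expressed as seconds after 00:00:00 UTC of the current day, and the behavior when the requested time has already passed is not addressed because this document does not specify it.

```python
def parse_synch(text: str) -> int:
    """Convert 'hh', 'hh:mm' or 'hh:mm:ss' to seconds after midnight."""
    parts = [int(p) for p in text.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"bad time: {text!r}")
    hh, mm, ss = parts + [0] * (3 - len(parts))
    if not (0 <= hh < 24 and 0 <= mm < 60 and 0 <= ss < 60):
        raise ValueError(f"time out of range: {text!r}")
    return hh * 3600 + mm * 60 + ss


def report_times(synch: str, nsec: int, count: int) -> list:
    """First `count` report times, in seconds after midnight UTC."""
    first = parse_synch(synch)
    return [first + k * nsec for k in range(count)]


# First report at 00:20 UTC, then one every 1200 seconds.
print(report_times("00:20", 1200, 3))
```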
Modifying the online configuration
The BitTest configuration file can be modified while BitTest is running. This is accomplished by editing the current BitTest configuration file, usually ~ops/pars/BitTest.conf. Once the file has been modified, BitTest can be made to read in the new configuration by signaling the running process with SIGUSR1. When the signal is caught by the process, BitTest will write out the status files with whatever statistics have already been collected, read the configuration file and restart processing. The SIGUSR1 signal is delivered with the following command:
kill -USR1 <pid>
where <pid> is the ID of the BitTest process. The process ID(s) can be found with, e.g.
ps -eopid,comm | grep BitTest
BitTest Output
BitTest generates a trigger for each flagged channel at the end of each statistics accumulation period (nominally 20 minutes). Triggers will not be produced unless the "+trig[ger]" option is specified on the command line. The trigger has a trigger ID of BitTest and a sub-ID of the channel name. The trigger user data contains the following double precision float fields:
- Number of words read
- Maximum repetition count
- Number of readout errors
- Mask of bits always set
- Mask of bits always zero
- Number of overflows/underflows
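Because the bit masks are carried as double precision floats, a consumer has to convert them back to integers before displaying them in hex. The sketch below shows one way to hold and print the six values. The record type, its field names, and the assumption that the fields arrive in the order listed above are illustrative only; the trigger's actual storage format is not described here.

```python
from typing import NamedTuple


class TriggerUserData(NamedTuple):
    """Hypothetical record for the six user-data fields, in listed order."""
    words_read: float
    max_repeat: float
    readout_errors: float
    on_mask: float       # mask of bits always set
    off_mask: float      # mask of bits always zero
    overflows: float


def summarize(data: TriggerUserData) -> str:
    # A double represents every integer up to 2**53 exactly, so 16-bit
    # and 32-bit masks survive the round trip through float unchanged.
    return (f"words={int(data.words_read)} max_repeat={int(data.max_repeat)} "
            f"on=0x{int(data.on_mask):x} off=0x{int(data.off_mask):x}")


example = TriggerUserData(16384.0, 3.0, 0.0, float(0x0004), float(0xF800), 0.0)
print(summarize(example))
```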
BitTest generates an alarm for each channel with a bit error or data overflow error at the end of each statistics accumulation period. Bit errors are indicated by a Bit_is_Stuck alarm and data overflows are indicated by an Overflow alarm. The alarms' short descriptions (displayed by holding your pointer over the severity ball) give the channel names and, for stuck bit errors, the hex masks of the stuck bits.
In general, BitTest alarms do not indicate serious problems. Nevertheless, channels with consistent alarms should be investigated when time permits.
Reports and Other Output
The BitTest reports are divided into two sections. The first section is a list of all channels that were found to have errors. This list is stored in the file "<out-file>.Errors", where the root file name <out-file> is specified on the command line. The error list is further divided into the following categories:
- Channels with errors: List of channels that either have one or more stuck bits or have a repetition count greater than the maximum specified.
- Channels with all one value: List of channels that never changed value during the accumulation period. Channels are listed here even if the repetition count test is disabled by setting the maximum to zero.
- Channels not read out: List of channels that were requested in the configuration file, but never found in the data frames.
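The three categories can be expressed as a small classification function over the per-channel statistics. This is a sketch under assumptions: the function and parameter names are hypothetical, a channel is allowed to appear in more than one list, and "never changed value" is tested by comparing the minimum and maximum seen in the period.

```python
def classify(frames, flagged_bits, max_run, max_repeat, minimum, maximum):
    """Return the error-list categories that one channel falls into."""
    if frames == 0:
        # Requested in the configuration but never seen in the data.
        return ["Channels not read out"]
    categories = []
    # A maximum repetition count of zero disables the repetition test.
    repeat_error = max_repeat > 0 and max_run > max_repeat
    if flagged_bits != 0 or repeat_error:
        categories.append("Channels with errors")
    if minimum == maximum:
        # Listed even when the repetition test is disabled.
        categories.append("Channels with all one value")
    return categories


print(classify(frames=100, flagged_bits=0x20, max_run=2,
               max_repeat=10, minimum=-5, maximum=7))
```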
The second section is a table with statistics and status information for all configured channels. The table is written to a file named "<out-file>.Statistics" and contains the following information for each channel:
- Error Flag: A flag (****) is printed for channels failing the bit or repetition test.
- Channel Name: The name of the channel.
- Frames: The number of frames in which the channel was seen.
- On Bits: A hex mask containing a 1 in each bit position in which the data was always 1.
- Off Bits: A hex mask containing a 1 in each bit position in which the data was always 0.
- Repeat Count: Largest number of consecutive samples containing the same data value.
- Average value: Average of all accumulated samples.
- Sigma: Standard deviation from the mean of all samples during the accumulation period.
- Minimum: The smallest (signed) value found during the accumulation period.
- Maximum: The largest (signed) value found during the accumulation period.
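The value columns of the table (average, sigma, minimum, maximum) can be accumulated in a single pass with running sums. The sketch below is illustrative only: `ValueStats` is a hypothetical name, and the use of the population form of the standard deviation (dividing by N rather than N-1) is an assumption, since this document does not say which form BitTest uses.

```python
import math


class ValueStats:
    """Single-pass mean, sigma, minimum and maximum for one channel."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def mean(self):
        return self.total / self.n

    @property
    def sigma(self):
        # Population standard deviation; clamp tiny negative rounding errors.
        variance = self.total_sq / self.n - self.mean ** 2
        return math.sqrt(max(variance, 0.0))


stats = ValueStats()
for sample in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(sample)
print(stats.mean, stats.sigma, stats.minimum, stats.maximum)
```

Running sums keep memory use constant over a long accumulation period, at the cost of some precision when the mean is large compared with the spread.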