Monitors
Introduction
"Monitors" is a proposed feature for Fuego to execution separate processes (or use external hardware) to perform measurement before, after and/or during a test.This data can be used to stress the system, or to gather more information about the execution of the target board related to the test. However, the primary use is intended to be to allow adding an extra dimension to testing, but allowing the data from a monitor to be used for results determination.
For example, imagine that you wish to capture data from a power measuring device (a separate piece of hardware attached to the device under test) during test execution. Suppose the measuring device reported data via a serial connection to the host. A test could start a monitor, which would collect data during the test, and then the test could check that data to see if the power exceeded some threshold. In this case, the test "log" data would come from the monitor, rather than the program being executed on the target.
Ideas
There needs to be a mechanism to start a monitor, stop a monitor, and tell the monitor where to place its data (probably the log directory). There also needs to be a mechanism for a test to check the monitor data after the test (either by converting the monitor data into the test data, or by telling the parser the alternate data stream to check).
There could be multiple monitors running.
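As a rough sketch of how a test might use such a mechanism (start_monitor, stop_monitor, the 'power' monitor name, and the data format are all hypothetical at this point):

    # hypothetical test snippet; none of these functions exist yet
    start_monitor power              # begin collecting data into the log dir
    run_test_payload                 # the actual test runs here (illustrative name)
    stop_monitor power               # stop collection

    # post-test: check the monitor data (format assumed to be
    # "timestamp value" per line; see the output-format section below)
    max=$(awk '$2 > m { m = $2 } END { print m }' "$LOGDIR/power.log")
    echo "peak power during test: $max"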
Details
- add a function 'start_monitor' to Fuego core
- function takes a monitor name
- monitors are shell scripts located in fuego-core/monitors
- monitors are run in test context
- they may need to communicate with board control to find the ports or data for external hardware
- e.g. the control_dev for SDB attached to the board, for power
- it would be nice to be able to run monitors separate from the test system
- this means that there needs to be a monitor abstraction layer
- monitors should be runnable outside test context
- the name 'monitor' implies that something is being watched.
- is it being reported continuously, or is it being watched for a trigger?
- you could imagine a "report-on-condition" monitor, where something was watched, and a report made if the condition was ever met
- similar in spirit to 'ttc wait-for', but with condition logic
- could you use ttc wait-for for 'report-on-condition'? (maybe so)
- the initial implementation concept was to report a stream of data at a periodic interval
- where does the interval come from?
- two possibilities:
- defined by the monitor (it reports when it can)
- defined by the test (the test requests the monitor frequency, like 'watch')
- probably allow requests, but if the monitor ignores them, the test has to deal with it
- some monitors don't have a frequency - they just gather data
- e.g. video or audio
- a monitor could be 'audio-pops' to monitor an audio stream for pops
- this would only report errors, not provide the whole stream
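Pulling the points above together, a periodic monitor could itself be a small shell script, launched in the background by start_monitor. This is only a sketch; sdb_power_monitor.sh, MONITOR_STOP_FILE, MONITOR_INTERVAL, and read_power_sample are all invented names:

    # fuego-core/monitors/sdb_power_monitor.sh (illustrative)
    # Emit one sample per interval until asked to stop.
    INTERVAL=${MONITOR_INTERVAL:-1}    # test may request a frequency;
                                       # the monitor is free to ignore it
    while [ ! -f "$MONITOR_STOP_FILE" ] ; do
        read_power_sample              # hypothetical helper; port/device info
                                       # would come from board control
        sleep "$INTERVAL"
    done

    # in Fuego core (illustrative):
    function start_monitor {
        local name=$1
        export MONITOR_STOP_FILE="$LOGDIR/${name}.stop"
        sh "$FUEGO_CORE/monitors/${name}.sh" >"$LOGDIR/${name}.log" 2>&1 &
        echo $! >"$LOGDIR/${name}.pid"   # remember the pid so it can be stopped
    }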
Where is meta-data for the monitor setup kept?
- the test shouldn't have to know about the type of hardware used for monitoring
- this should be in the board control layer
- if there's no board control layer, the data should be in the board file?
- no, this should only have information about the board, not the farm
- the test should be able to call a generic monitor of a particular kind:
- serial monitor, power monitor, audio monitor, video monitor
- the generic monitor should call the particular board's monitor
- e.g. sdb_power_monitor
- this should go into ttc?
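One way to realize that mapping is a thin generic wrapper that consults the board control layer (a sketch; the BOARD_POWER_MONITOR variable and the fallback default are assumptions):

    # fuego-core/monitors/power_monitor.sh (illustrative generic monitor)
    # Dispatch to whichever board-specific monitor the board control
    # layer declares for this board.
    impl=${BOARD_POWER_MONITOR:-sdb_power_monitor}
    exec sh "$FUEGO_CORE/monitors/${impl}.sh" "$@"

The test then only ever asks for 'power_monitor', and the farm-specific choice stays out of the test and out of the board file.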
What should the monitor output?
- 0day doesn't worry about this because 0day monitors track things on the local system
- what does workload automation do?
- does the monitor output have to be well-defined for Fuego to work on it?
- yes.
- different power monitors will provide power data in different formats
- there needs to be a single format for tests to use to analyze power data
- this means there needs to be a converter (or translator), possibly one per monitor, and a common format spec. Ugh - another one.
- a test should be able to ask for a 'power' monitor, without knowing the details
- this means that some layer has to map that request to 'sdb_power_monitor' or 'acme_power_monitor'
- a monitor should return a stream of data (in real time), or a file
- maybe it should be a stream of data, and the system should save it to a file if needed
- what about for audio or video?
- what about hardware trace data?
- what about digital analyzer data?
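As a sketch of such a per-monitor translator (the device's input format here is invented purely for illustration), each monitor could convert its data into one agreed common format, say "epoch-seconds watts" per line:

    # sdb_power_to_common.sh (hypothetical translator)
    # Input  (device-specific, invented): lines like "V=5.01 I=0.42"
    # Output (common format):             lines like "<epoch> <watts>"
    while read -r line ; do
        v=$(echo "$line" | sed -n 's/.*V=\([0-9.]*\).*/\1/p')
        i=$(echo "$line" | sed -n 's/.*I=\([0-9.]*\).*/\1/p')
        if [ -n "$v" ] && [ -n "$i" ] ; then
            echo "$(date +%s) $(echo "$v * $i" | bc)"
        fi
    done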
How should the monitor be defined?
- 0day has some very simple monitors, where they just mention a filename in /proc, and the system apparently snapshots it periodically during the test
- that's a handy framework
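A Fuego version of that idea could be a single generic sampler, parameterized by the file to watch (sketch only; the variable names are made up):

    # proc_snapshot_monitor.sh (illustrative): snapshot one /proc file
    # at a regular interval, in the spirit of 0day's simple monitors
    PROC_FILE=${1:-/proc/meminfo}
    INTERVAL=${MONITOR_INTERVAL:-5}
    while [ ! -f "$MONITOR_STOP_FILE" ] ; do
        echo "=== $(date +%s) ==="
        cat "$PROC_FILE"
        sleep "$INTERVAL"
    done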
Aligning monitor data with testlog data
There will need to be some kind of system to synchronize the monitor data with the testlog data. Solution: time annotations per line.
For now, ignore this issue, but here are some ideas:
- add time-of-day data to the monitor data (in a uniform format)
- add time-of-day data to the test log data (in a uniform format)
- coalesce (or co-chart) the data, so that tests can see the effect of test operation on the monitored data. (This seems like something that snmp would have had to deal with - does something like this already exist?)
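For the first two ideas, a sketch of stamping each monitor line with time-of-day as it is written (the monitor name and log path are placeholders):

    # prefix every line of monitor output with a uniform timestamp, so it
    # can later be lined up against a similarly-stamped test log
    some_monitor | while read -r line ; do
        echo "$(date +%s) $line"
    done > "$LOGDIR/power.log"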
How do monitors stop?
The test can terminate them manually. Fuego should have the capability to automatically stop them when the test_run is complete. This means that Fuego tracks monitors and manages their lifecycles.
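A sketch of the stop side, matching the hypothetical start_monitor above (which recorded a pid file and a stop file):

    function stop_monitor {
        local name=$1
        touch "$LOGDIR/${name}.stop"    # polite request: the monitor's loop exits
        sleep 1
        kill "$(cat "$LOGDIR/${name}.pid")" 2>/dev/null  # force it if still running
        rm -f "$LOGDIR/${name}.pid" "$LOGDIR/${name}.stop"
    }

    # Fuego could also sweep any leftover .pid files when the test_run
    # completes, so that monitors never outlive the test.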
Use of monitor data for test results
A monitor could be used for diagnostic data only (not used by the test, but only for human analysis after the test), or the post-processing step of the test might use the data to determine the result. For example, here are two different scenarios:
- run the nuttx prime test, and measure power output (just for information)
- run a power test, with prime as the stress application. In this case the test is actually examining the power log as the 'test log' for the test.
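In the second scenario, the post-processing step parses the power log instead of the program's own output. A sketch, assuming the common "epoch watts" format suggested earlier and an arbitrary threshold:

    # pass/fail decided from monitor data: fail if any sample exceeds 5.0 W
    THRESHOLD=5.0
    if awk -v t="$THRESHOLD" '$2 > t { bad = 1 } END { exit bad }' \
           "$LOGDIR/power.log" ; then
        echo "power test PASSED"
    else
        echo "power test FAILED: sample(s) above $THRESHOLD W"
    fi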
Resources
0day
0day has monitors. They are implemented as shell scripts. See https://github.com/fengguang/lkp-tests/tree/master/monitors
The framework has features for automatically adding the sampling loop.
A monitor can be as simple as a mention of a file in /proc. Apparently, the framework will sample that file at a regular frequency.
There is some kind of timeout for a monitor.
(I'm not sure whether monitors stop themselves, or are stopped by the system)
There is also some mechanism where a monitor will do setup, then wait for a signal (not a Unix signal, but some other indication) that it should start.
Workload Automation
wa has a feature called "augmentations" that is similar to monitors (I believe). It has a plugin type called an "instrument", which can be used to gather data during a run (?). See https://workload-automation.readthedocs.io/en/latest/plugins/instruments.html