Fuego wiki

Monitors

= Introduction =
"Monitors" is a proposed feature for Fuego to execute separate
processes (or use external hardware) to perform measurements
before, after and/or during a test.

A monitor can be used to stress the system, or to gather more information
about the execution of the target board related to the test.
However, the primary intended use is to add an extra
dimension to testing, by allowing the data from a monitor to be
used for results determination.

For example, imagine that you wish to capture data from a power measuring
device (a separate piece of hardware attached to the device under test)
during test execution. Suppose the measuring device reported data via
a serial connection to the host. A test could start a monitor, which would
collect data during the test, and then the test could check that data
to see if the power exceeded some threshold. In this case, the test "log"
data would come from the monitor, rather than the program being executed
on the target.
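
The power example above could be sketched as follows. This is only an illustration, not Fuego code: the log format (one "<seconds> <milliwatts>" reading per line), the function name max_power_exceeds and the 600 mW threshold are all assumptions made for the example.

```python
import io


def max_power_exceeds(log_lines, threshold_mw):
    """Return True if any reading in the monitor log is above threshold_mw.

    Assumed (hypothetical) log format: "<seconds> <milliwatts>" per line.
    Blank lines and lines starting with '#' are ignored.
    """
    for line in log_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        _timestamp, milliwatts = line.split()
        if float(milliwatts) > threshold_mw:
            return True
    return False


# Stand-in for the file a power monitor would have written during the test.
sample_log = io.StringIO("# time power_mw\n0.0 410.5\n0.5 655.0\n1.0 498.2\n")

# The test result is decided from the monitor data, not from target output.
result = "FAIL" if max_power_exceeds(sample_log, threshold_mw=600.0) else "PASS"
print(result)
```

Here the monitor log plays the role of the test log: the result is derived entirely from data collected on the host side.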

= Ideas =
There needs to be a mechanism to start a monitor, stop a monitor, and tell the monitor where to place data (probably the log directory).

There needs to be a mechanism for a test to check the monitor data after
the test (either by converting the monitor data into the test data, or
by telling the parser the alternate data stream to check).

There could be multiple monitors running.

= Details =
* add a function 'start_monitor' to Fuego core
  * function takes a monitor name
  * monitors are shell scripts located in fuego-core/monitors
  * monitors are run in test context
    * they may need to communicate with board control to find the ports or data for external hardware
      * e.g. the control_dev for SDB attached to the board, for power
* it would be nice to be able to run monitors separate from the test system
  * this means that there needs to be a monitor abstraction layer
  * monitors should be runnable outside test context
* the name 'monitor' implies that something is being watched
  * is it being reported continuously, or is it being watched for a trigger?
  * you could imagine a "report-on-condition" monitor, where something was watched, and a report made if the condition was ever met
    * similar in spirit to 'ttc wait-for', but with condition logic
    * could you use ttc wait-for for 'report-on-condition'? (maybe so)
  * the initial implementation concept was to report a stream of data at a periodic interval
    * where does the interval come from?
      * two possibilities:
        * defined by the monitor (it reports when it can)
        * defined by the test (test requests the monitor frequency (e.g. watch))
      * probably allow requests, but if the monitor ignores it, the test has to deal with it
  * some monitors don't have a frequency - they just gather data
    * e.g. video or audio
  * a monitor could be 'audio-pops' to monitor an audio stream for pops
    * this would only report errors, not provide the whole stream
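
The two reporting styles discussed above (a periodic stream, and "report-on-condition") can be sketched like this. All names are hypothetical. The interval handling assumes the simplest policy mentioned in the notes: the test may request an interval, but the monitor is free to use a slower one.

```python
import time


def effective_interval(requested, minimum_supported):
    """Return the interval the monitor will actually use.

    The test may request an interval, but the monitor ignores requests that
    are faster than it can sample. None means "let the monitor decide".
    """
    if requested is None:
        return minimum_supported
    return max(requested, minimum_supported)


def sample_periodically(read_value, interval, count, sleep=time.sleep):
    """Yield (elapsed_seconds, value) 'count' times, 'interval' seconds apart."""
    start = time.monotonic()
    for i in range(count):
        yield (time.monotonic() - start, read_value())
        if i < count - 1:
            sleep(interval)


def report_on_condition(samples, condition):
    """Pass through only the samples for which condition(value) is true."""
    for elapsed, value in samples:
        if condition(value):
            yield (elapsed, value)


# Stand-in data source; a real monitor would read hardware or a file.
readings = iter([3.1, 3.3, 5.2, 3.0])
interval = effective_interval(requested=0.01, minimum_supported=0.05)
samples = sample_periodically(lambda: next(readings), interval, count=4)
for elapsed, value in report_on_condition(samples, lambda v: v > 5.0):
    print(f"{elapsed:.2f}s: {value}")
```

Because the test cannot assume its requested interval was honored, it should take the interval from the timestamps in the data rather than from its own request.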

== Where is meta-data for the monitor setup kept? ==
* the test shouldn't need to know the type of hardware used for monitoring
  * this should be in the board control layer
  * if there's no board control layer, the data should be in the board file?
    * no, this should only have information about the board, not the farm
* the test should be able to call a generic monitor of a particular kind:
  * serial monitor, power monitor, audio monitor, video monitor
  * the generic monitor should call the particular board's monitor
    * e.g. sdb_power_monitor
  * this should go into ttc?

== what should the monitor output? ==
* 0day doesn't worry about this because 0day monitors track things on the local system
* what does workload automation do?
* does the monitor output have to be well-defined for Fuego to work on it?
  * yes.
  * different power monitors will provide power data in different formats
  * there needs to be a single format for tests to use to analyze power data
    * this means there needs to be a converter (or translator) (possibly for each monitor), and a common format spec. Ugh - another one.
  * a test should be able to ask for a 'power' monitor, without knowing the details
    * this means that some layer has to map that request to 'sdb_power_monitor' or 'acme_power_monitor'
* a monitor should return a stream of data (in real time), or a file
  * it should be a stream of data? and the system should save it to a file if needed
  * what about for audio or video?
  * what about hardware trace data?
  * what about digital analyzer data?
== how should the monitor be defined? ==
* 0day has some very simple monitors, where they just mention a filename in /proc, and the system apparently snapshots it periodically during the test
  * that's a handy framework
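
A file-snapshot monitor of that kind might be as small as the following sketch. It is not taken from 0day. The function name, its parameters and the "time:" header line written before each snapshot are assumptions for this example.

```python
import sys
import time


def snapshot_file(path, interval, count, out, sleep=time.sleep):
    """Write 'count' timestamped snapshots of 'path' to the open file 'out'."""
    for i in range(count):
        with open(path) as src:
            content = src.read()
        # Each snapshot is preceded by a wall-clock timestamp line.
        out.write(f"time: {time.time():.3f}\n{content}")
        if not content.endswith("\n"):
            out.write("\n")
        if i < count - 1:
            sleep(interval)


if __name__ == "__main__":
    # /proc/loadavg exists on Linux only; any readable file works.
    snapshot_file("/proc/loadavg", interval=1.0, count=3, out=sys.stdout)
```

With a framework like this, defining a new monitor amounts to naming a file and an interval.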

== aligning monitor data with testlog data ==
There will need to be some kind of system to synchronize the monitor data
with the testlog data.

Solution: time annotations per line.

For now, ignore this issue, but here are some ideas:
* add time-of-day data to the monitor data (in a uniform format)
* add time-of-day data to the test log data (in a uniform format)
* coalesce (or co-chart) the data, so that tests can see the effect of test operation on the monitored data. (This seems like something that SNMP would have had to deal with - does something like this already exist?)
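
If both logs carried a per-line time annotation, coalescing them could be as simple as a merge of two sorted streams. The sketch below assumes a hypothetical "<seconds> <text>" line format in both logs and assumes each log is already in time order.

```python
import heapq


def parse(line, source):
    """Split '<seconds> <text>' into (seconds, source, text)."""
    stamp, _, text = line.partition(" ")
    return (float(stamp), source, text)


def co_chart(testlog_lines, monitor_lines):
    """Merge two time-ordered logs into one time-ordered list of events."""
    test_events = (parse(line, "test") for line in testlog_lines)
    monitor_events = (parse(line, "monitor") for line in monitor_lines)
    # heapq.merge keeps the combined sequence sorted by the leading timestamp.
    return list(heapq.merge(test_events, monitor_events))


testlog = ["10.0 start prime", "12.0 end prime"]
monitor = ["9.5 410", "10.5 655", "12.5 420"]
for seconds, source, text in co_chart(testlog, monitor):
    print(f"{seconds:6.1f} {source:8s} {text}")
```

In the merged view, the 655 reading falls between the start and end of the test operation, which is the kind of correlation the time annotations are meant to expose.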

== how do monitors stop ==
The test can terminate them manually.

Fuego should also be able to stop them automatically when the test_run is
complete. This means that Fuego tracks monitors and manages their
lifecycles.
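
Automatic cleanup of this kind can be sketched with a context manager that owns the monitor processes for the duration of a test run. This is a hypothetical illustration, not Fuego's implementation, and the "monitors" here are stand-in commands that just sleep.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def monitored_test_run(monitor_commands):
    """Start every monitor command, and stop all of them when the run ends.

    The 'finally' clause runs even if the test body raises, so no monitor
    process outlives the test run.
    """
    procs = []
    try:
        for command in monitor_commands:
            procs.append(subprocess.Popen(command))
        yield procs
    finally:
        for proc in procs:
            if proc.poll() is None:  # still running: the test did not stop it
                proc.terminate()
            proc.wait(timeout=10)


# Stand-in monitor that would otherwise keep running for a minute.
sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
with monitored_test_run([sleeper, sleeper]) as monitors:
    monitors[0].terminate()  # a test may still stop a monitor manually
    # ... the test body would run here ...
print([m.returncode is not None for m in monitors])
```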

== use of monitor data for test results ==
A monitor could be used for diagnostic data only (not used by the test,
but only for human use for analysis after the test), or the post-processing
step of the test might use the data to indicate the result.

For example, here are two different scenarios:
* run the nuttx prime test, and measure power output (just for information)
* run a power test, with prime as the stress application. In this case the test is actually examining the power log as the 'test log' for the test.

= Resources =
== 0day ==
0day has monitors. They are implemented as shell scripts.

See https://github.com/fengguang/lkp-tests/tree/master/monitors

They have features for automatically adding the loop.

A monitor can be as simple as a mention of a file in /proc.
Apparently, the framework will sample that file at a regular frequency.

There is some kind of timeout for a monitor.

(I'm not sure whether monitors stop themselves, or are stopped by the system)

There is also some mechanism where the monitor will do setup, then wait
for a signal (not a Unix signal, but some indication) that it should start.

== workload automation ==
Workload Automation (wa) has a feature called "augmentations" that is similar to monitors (I believe).
It has a kind of plugin called an "instrument" that can be used to gather data during a run (?).

See https://workload-automation.readthedocs.io/en/latest/plugins/instruments.html