Test log output notes

These are notes about common test log output formats.

See also [[Other test systems]] and [[Test results formats]]

== Discussion from Fuego list ==
Victor Rodriguez wrote (on November 8, 2016):{{BR}}
(see [[https://lists.linuxfoundation.org/pipermail/fuego/2016-November/000103.html|here]] for discussion thread.)

This week I presented a case study of the problem of the lack of test
log output standardization in the majority of packages that are used
to build current Linux distributions. This was presented as a BOF
( https://www.linuxplumbersconf.org/2016/ocw/proposals/3555 ) during
the Linux Plumbers Conference.

It was a productive discussion that let us share the problem we have
with the projects we use every day to build a distribution (whether an
embedded or a cloud-based distribution). Open source projects don't
follow a standard log output format for reporting the passing and
failing tests they run at packaging time ("make test" or "make check").

The Clear Linux project uses a simple Perl script to count the number
of passing and failing tests (which would be trivial if we had a single
standard output format across all projects, but we don't):

https://github.com/clearlinux/autospec/blob/master/autospec/count.pl

# perl count.pl <build.log>

Examples of real package build logs:

https://kojipkgs.fedoraproject.org//packages/gcc/6.2.1/2.fc25/data/logs/x86_64/build.log
https://kojipkgs.fedoraproject.org//packages/acl/2.2.52/11.fc24/data/logs/x86_64/build.log
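
For illustration, here is a minimal Python sketch of the kind of counting such a parser does. This is not count.pl itself (which is written in Perl), and the single regex below only covers Automake-style "PASS: name" / "FAIL: name" lines, one of the many formats the real script recognizes.

{{{
#!/usr/bin/env python3
# Illustrative sketch only: a greatly simplified analog of count.pl.
# It only recognizes Automake-style result lines ("PASS: name", "FAIL: name");
# the real script handles roughly 26 different formats.
import re
import sys

RESULT_RE = re.compile(r'^(PASS|FAIL|SKIP|XFAIL|XPASS|ERROR): (\S+)')

def count_results(path):
    counts = {}
    with open(path, errors='replace') as f:
        for line in f:
            m = RESULT_RE.match(line)
            if m:
                status = m.group(1)
                counts[status] = counts.get(status, 0) + 1
    return counts

if __name__ == '__main__':
    # usage: python3 count_results.py <build.log>
    print(count_results(sys.argv[1]))
}}}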

So far that simple (and not well-engineered) parser has found 26
"standard" output formats (and counting). The script has the flaw that
it does not recognize the names of the tests, so it cannot detect
regressions: a test that passed in the previous release may fail in the
new one while the number of failing tests stays the same.

To be honest, before presenting at LPC I was very confident that this
script (or a smarter version of it) could be the beginning of a
solution to the problem we have. However, during the discussion at LPC
I came to understand that solving the nightmare we already have might
be a huge effort, and possibly a bigger one than I thought.

----

Tim Bird responded:
A few remarks about this.  This will be something of a stream of ideas, not
very well organized.  I'd like to avoid requiring too many different
language skills in Fuego.  To write a test for Fuego, we already require
knowledge of shell script, Python (for the benchmark parsers), and JSON
(for the test specs and plans).  I'd be hesitant to adopt something in Perl,
but maybe there's a way to leverage the expertise embedded in your script.

I'm not that fond of the idea of integrating all the parsers into a single program.
I think it's conceptually simpler to have a parser per log file format.  However,
I haven't looked in detail at your parser, so I can't really comment on its
complexity.  I note that 0day has a parser per test (but I haven't checked to
see if they re-use common parsers between tests.)  Possibly some combination
of code-driven and data-driven parsers is best, but I don't have the experience
you guys do with your parser.

If I understood your presentation correctly, you are currently parsing
logs for thousands of packages.  I thought you said that about half of the
20,000 packages in a distro have unit tests, and that your parser covers
about half of those (so, about 5000 packages currently).
And this is with 26 log formats parsed so far.

I'm guessing that packages have a "long tail" of formats, with the formats
getting weirder and weirder the farther out on the tail you go.

Please correct my numbers if I'm mistaken.

{{{#!IndentPre
indent=2
> So far that simple (and not well-engineered) parser has found 26
> "standard" output formats (and counting).
}}}

This is actually remarkable, as Fuego is only handling the formats for the
standalone tests we ship with Fuego.  As I stated in the BOF, we have two
mechanisms: one for functional tests that uses shell, grep, and diff, and
one for benchmark tests that uses a very small Python program built on
regexes.  So, currently we only have 50 tests covered, but many of these
parsers use very simple one-line grep regexes.

Neither of these Fuego log parsing methods supports tracking individual
subtest results.
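
As a rough illustration (this is not the parser code shipped with Fuego), a benchmark parser in that style amounts to a regex applied to the log to extract one measure; the metric name and line format below are invented for the example.

{{{
# Illustrative sketch only, not the parser shipped with Fuego.
# A benchmark parser in this style is essentially one regex that pulls a
# numeric measure out of the test log.
import re

def parse_measure(log_text):
    m = re.search(r'Bandwidth:\s*([\d.]+)\s*MB/s', log_text)
    return float(m.group(1)) if m else None

print(parse_measure("... Bandwidth: 123.4 MB/s ..."))   # -> 123.4
}}}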

{{{#!IndentPre
indent=2
> The script has the flaw that it
> does not recognize the names of the tests, so it cannot detect
> regressions: a test that passed in the previous release may fail in the
> new one while the number of failing tests stays the same.
}}}

This is a concern with the Fuego log parsing as well.

I would like to modify Fuego's parser to not just parse out counts, but to
also convert the results into something where individual sub-tests can be
tracked over time.  Daniel Sangorrin's recent work converting the output
of LTP into Excel format might be one way to do this (although I'm not
that comfortable with using a proprietary format - I would prefer CSV
or JSON, but I think Daniel is going for ease of use first).
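
For example (a sketch with invented testcase names, not Fuego code), once results are kept per testcase rather than as bare counts, spotting a regression between two runs is a simple comparison:

{{{
# Sketch: with per-testcase results (name -> status) instead of counts,
# finding regressions between two runs is a simple dictionary comparison,
# which bare pass/fail counts cannot do.  Values are invented for illustration.
def find_regressions(previous, current):
    return [name for name, status in current.items()
            if status == "FAIL" and previous.get(name) == "PASS"]

prev_run = {"fcntl01": "PASS", "fcntl02": "PASS"}
this_run = {"fcntl01": "PASS", "fcntl02": "FAIL"}
print(find_regressions(prev_run, this_run))   # -> ['fcntl02']
}}}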

I need to do some more research, but I'm hoping that there are Jenkins
plugins (maybe xUnit) that will provide tools to automatically handle 
visualization of test and sub-test results over time.  If so, I might
try converting the Fuego parsers to produce that format.
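
If the xUnit plugin accepts JUnit-style XML reports (which would need verifying against the plugin's actual requirements), a parser could emit that layout; a minimal Python sketch with made-up test names:

{{{
# Sketch: emit JUnit-style XML, which Jenkins result plugins can generally read.
# Element and attribute names follow the common JUnit report layout; whether
# this satisfies a particular xUnit plugin configuration would need checking.
import xml.etree.ElementTree as ET

suite = ET.Element("testsuite", name="LTP", tests="2", failures="1")
ET.SubElement(suite, "testcase", classname="LTP", name="fcntl01", time="0.12")
failing = ET.SubElement(suite, "testcase", classname="LTP", name="fcntl02", time="0.30")
ET.SubElement(failing, "failure", message="fcntl02 returned non-zero")
ET.ElementTree(suite).write("results.xml", encoding="utf-8", xml_declaration=True)
}}}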

...

I do think we share the goal of producing a standard, or at least a recommendation,
for a common test log output format.  This would help the industry going forward.
Even if individual tests don't produce the standard format, it will help third parties
write parsers that convert test output into the format, as well as encourage the
development of tools that utilize the format for visualization or regression checking.

Do you feel confident enough to propose a format?  I don't at the moment.
I'd like to survey the industry for 1) existing formats produced by tests (which you have good experience
with, and which may already be captured well by your Perl script), and 2) existing tools
that use common formats as input (e.g. the Jenkins xUnit plugin).  From this I'd like
to develop some ideas about the fields that are most commonly used, and a good language to
express those fields. My preference would be JSON - I'm something of an XML naysayer, but
I could be talked into YAML.  Under no circumstances do I want to invent a new language for
this.

...

Here is how I propose moving forward on this.  I'd like to get a group together to study this
issue.  I wrote down a list of people at LPC who seem to be working on test issues.  I'd like to
do the following:
 * perform a survey of the areas I mentioned above
 * write up a draft spec
 * send it around for comments (to which individuals and lists is an open issue)
 * discuss it at a future face-to-face meeting (probably at ELC or maybe next year's plumbers)
 * publish it as a standard endorsed by the Linux Foundation

----
Victor wrote later:

After talking with Guillermo, we came to the idea of moving our parsers
into Fuego modules.

We are going to attack this problem with two solutions; we are happy to hear feedback:

 * 1) Merge the parsers we have into the Fuego infrastructure
 * 2) Provide an API for new developers (and current maintainers of existing packages) to check whether their logs are easy to track (a sketch of such a check follows this list)
     * 'easy to track' means that we can get the status and name of each test
     * if the parser can't read the log file, we suggest that the developer adapt their test to a standard (such as CMake or autotools)
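
A minimal sketch of what such a check could look like (the function and regex below are hypothetical, not an existing Fuego or autospec API):

{{{
# Hypothetical sketch of an "easy to track" check; this is not an existing API.
# A log counts as trackable if a (name, status) pair can be recovered for each
# test; the single regex here stands in for a whole set of parsers.
import re

RESULT_RE = re.compile(r'^(PASS|FAIL|SKIP): (\S+)', re.MULTILINE)

def is_easy_to_track(log_text):
    testcases = [(name, status) for status, name in RESULT_RE.findall(log_text)]
    return bool(testcases), testcases
}}}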

To be honest, it seems like a titanic amount of work to change all the packages
to a standard log output (especially since there are things from the
'80s), but we can make the new ones fit the standards we have and
suggest that the maintainers adopt one.

Tim, I think that we should make a call to action to the Linux
community. Do you think a publication might be useful? Maybe LWN or
someplace else?

----

= Discussion summary =
 * the Clear Linux project has a program, count.pl (a Perl script), which has about 26 different parsers for log formats embedded in it, and can produce counts of passing and failing
tests, based on build logs and test logs (produced using 'make' and 'make test' for the packages)
   * it produces text output with a comma-separated list of numbers
     * something like '<package>,100,80,20,0,0'
   * visualization is done by combining the CSV files and creating graphs from the data
 * Fuego 1.0 does not provide counts or fancy visualization at the moment (only pass/fail at the level of a Jenkins job (Fuego test), and plots for some benchmark measures)
 * There are some existing systems for testing packages in debian and yocto:
   * https://packages.debian.org/sid/autopkgtest
   * https://wiki.yoctoproject.org/wiki/Ptest

 * essential elements of a good output format are (a minimal example record is sketched below):
   * per testcase:
     * status
     * test identifier (string)
     * duration (?)
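
As a strawman only, a per-testcase record carrying just those fields might look like the following (emitted from Python as JSON; the field names are placeholders, not a proposed standard):

{{{
# Strawman only: a record carrying just the essential per-testcase fields
# listed above.  Field names are placeholders, not a proposed standard.
import json

record = {
    "testcase": "fcntl01",   # test identifier (string)
    "status": "PASS",        # PASS / FAIL / SKIP
    "duration_s": 0.12,      # optional duration, in seconds
}
print(json.dumps(record))
}}}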


