Issue 0073 in 'raw' format
; Summary: Add partial results (by calling post-test) when a test times out
; Owner: Tim
; Reporter: Tim
; Status: open
; Priority: high

= Description =
As of Fuego version 1.3, when a job times out, ftc kills it immediately. As a result, the test log is never gathered from the target board, and we miss an opportunity to show diagnostic information that would help determine the cause of the timeout.

Maybe the timeout just needs to be a bit longer. Maybe the board hung. Maybe the test program hung. Without a log, or any attempt to gather additional information (for example: is the board still up? what does the serial console say?), it is hard to diagnose the problem.

Modify the code in ftc to send a signal (SIGTERM) to the test, rather than killing it immediately, so that it can call post_test and possibly provide some partial testcase results or help diagnose the problem.

= Notes =
I tried this in the 1.4 release, but the partial results ended up producing a "SUCCESS" test result. This can happen when you have a log_compare($TESTDIR, "not ok", 0, "n") and the partial results are all successful but incomplete (e.g. 5 out of 6 testcase results are in the log, all successful, but testcase 6 was not reported due to the timeout).

Here is the code I tried:
{{{
$ git diff
diff --git a/engine/scripts/ftc b/engine/scripts/ftc
index 92e1d3f..a029a10 100755
--- a/engine/scripts/ftc
+++ b/engine/scripts/ftc
@@ -3161,12 +3161,24 @@ def ftc_exec_command(command, timeout):
             time.sleep(.1)
 
     except FTC_INTERRUPT:
-        print "Job interrupted!"
+        print("Job interrupted! Timeout of '%s' was exceeded." % timeout)
+        # try graceful termination of test - this may allow post_test to run
+        p.send_signal(signal.SIGTERM)
+        for i in range(3):
+            if p.poll() == None:
+                time.sleep(10)
+                p.send_signal(signal.SIGTERM)
+        # abort with prejudice...
-        p.kill()
+        if p.poll() == None:
+            p.kill()
+
+    finally:
+        # cancel the timeout alarm
         signal.alarm(0)
 
+    dprint("test return code is %d" % p.returncode)
     return p.returncode
}}}

We probably need to modify the post_test operations in functions.sh so that, when a job is interrupted, main.sh cannot return success (even if the parser or log_compare shows no failed testcase results).

; backlink: [[Fuego Issues List]]
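
The two ideas in the notes above (escalate from SIGTERM to kill, and never report success for a run that timed out) can be sketched in isolation as follows. This is only an illustration, not the ftc implementation: it uses Popen.wait(timeout=...) instead of ftc's signal.alarm and FTC_INTERRUPT mechanism, and the names terminate_gracefully, run_with_timeout, final_result and TIMEOUT_RESULT are hypothetical. The value 124 is an assumption borrowed from the exit status of the coreutils timeout command; Fuego may want a different result code.

```python
import signal
import subprocess
import time

# Hypothetical result code for "timed out"; 124 mirrors coreutils `timeout`.
TIMEOUT_RESULT = 124


def terminate_gracefully(proc, attempts=3, grace_seconds=10.0, poll_interval=0.1):
    """Send SIGTERM up to `attempts` times, then fall back to SIGKILL.

    Returns True if the process exited without being killed, False otherwise.
    """
    for _ in range(attempts):
        if proc.poll() is not None:
            return True
        # Give the test a chance to run its cleanup (e.g. post_test).
        proc.send_signal(signal.SIGTERM)
        deadline = time.monotonic() + grace_seconds
        while time.monotonic() < deadline:
            if proc.poll() is not None:
                return True
            time.sleep(poll_interval)
    if proc.poll() is None:
        # Abort with prejudice: the process ignored every SIGTERM.
        proc.kill()
        proc.wait()
        return False
    return True


def run_with_timeout(command, timeout, attempts=3, grace_seconds=10.0):
    """Run `command`; return (returncode, timed_out)."""
    proc = subprocess.Popen(command)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        terminate_gracefully(proc, attempts=attempts, grace_seconds=grace_seconds)
        return proc.returncode, True


def final_result(returncode, timed_out):
    """Never let a timed-out run be reported as success."""
    if timed_out and returncode == 0:
        return TIMEOUT_RESULT
    return returncode
```

With this shape, a test that handles SIGTERM, writes its partial results and exits 0 still ends up with a non-zero final result, which is the behavior the last paragraph of the notes asks for. The same rule would still have to be enforced on the shell side (post_test in functions.sh), which this sketch does not cover.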