FrontPage 

Fuego wiki

Login or create account

Issue 0073 in 'raw' format

; Summary: Add partial results (by calling post-test) when a test times out
; Owner: Tim
; Reporter: Tim
; Status: open
; Priority: high

= Description =
As of Fuego version 1.3, when a job times out, ftc will just kill it
immediately.  This ends up not gathering the test log from the target
board, and misses an opportunity to show diagnostic information
which would help determine the cause of the timeout.

Maybe the timeout just needs to be a bit longer.  Maybe the board hung.
Maybe the test program hung.  Without a log, or any attempt to gather
additional information (like - is the board still up?, or what does the
serial console say?), then it's hard to diagnose the problem.

Modify the code in ftc to send a signal (SIGTERM) to the test, rather
than kill it immediately, to allow it to call post_test and possibly
provide some partial testcase results, or diagnose the problem.

= Notes =
I tried this in the 1.4 release, but it ended up that the test showed
partial results that resulted up in a "SUCCESS" test result.   This can
happen when you have a log_compare($TESTDIR, "not ok", 0, "n") and partial
results are all successful (but incomplete) (e.g. 5 out of 6 testcase results
are in the log, all successful, but testcase 6 was not reported due to a
timeout).

Here is the code I tried:
{{{
$ git diff
diff --git a/engine/scripts/ftc b/engine/scripts/ftc
index 92e1d3f..a029a10 100755
--- a/engine/scripts/ftc
+++ b/engine/scripts/ftc
@@ -3161,12 +3161,24 @@ def ftc_exec_command(command, timeout):
             time.sleep(.1)
 
     except FTC_INTERRUPT:
-        print "Job interrupted!"
+        print("Job interrupted! Timeout of '%s' was exceeded." % timeout)
+        # try graceful termination of test - this may allow post_test to run
+        p.send_signal(signal.SIGTERM)
+        for i in range(3):
+            if p.poll() == None:
+                time.sleep(10)
+                p.send_signal(signal.SIGTERM)
+
         # abort with prejudice...
-        p.kill()
+        if p.poll() == None:
+            p.kill()
+
+
     finally:
+        # cancel the timeout alarm
         signal.alarm(0)
 
+    dprint("test return code is %d" % p.returncode)
     return p.returncode
}}}

Probably, we need to modify post_test operations in functions.sh, so
that when a job is interrupted it is not possible to return success
from main.sh (even if the parser or log_compare show no testcase
fail results).

; backlink: [[Fuego Issues List]]


TBWiki engine 1.8.3 by Tim Bird