Message ID | 167149446946.332657.17186597494532662986.stgit@magnolia (mailing list archive)
---|---
State | Superseded
Series | fstests: improve junit xml reporting
On 2022/12/20 08:01, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Generate the section report between tests so that the summary report
> always reflects the outcome of the most recent test. Two use cases are
> envisioned here -- if a cluster-based test runner anticipates that the
> test run could crash the VM, they can set REPORT_DIR to (say) an NFS
> mount to preserve the intermediate results. If the VM does indeed
> crash, the scheduler can examine the state of the crashed VM and move
> the tests to another VM. The second use case is a reporting agent that
> runs in the VM to upload live results to a test dashboard.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  check | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/check b/check
> index 1ff0f44af1..70a0b537b1 100755
> --- a/check
> +++ b/check
> @@ -842,6 +842,16 @@ function run_section()
>  		fi
>  		seqres="$REPORT_DIR/$seqnum"
>
> +		# Generate the entire section report with whatever test results
> +		# we have so far. Leave the $sect_time parameter empty so that
> +		# it's a little more obvious that this test run is incomplete.
> +		if $do_report; then
> +			local sect_now=`_wallclock`
> +			_make_section_report "$section" "${#try[*]}" \
> +					"${#bad[*]}" "${#notrun[*]}" \
> +					"" &> /dev/null
> +		fi
> +

In fact that's exactly what I just wanted to do next...

However one problem I can't work around is the timestamp.

Currently our timestamp is the finish time; with this change our
timestamp is the start time.

I'm not sure if there is any strong definition of the timestamp, but
personally speaking, a timestamp for the start time makes more sense to
me. But there may be some users relying on the timestamp to be the
ending time?

Thanks,
Qu

> 	mkdir -p $RESULT_DIR
> 	rm -f ${RESULT_DIR}/require_scratch*
> 	rm -f ${RESULT_DIR}/require_test*
>
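One consequence of the patch is that a consumer can tell an intermediate
section report from a final one by its empty time field. A minimal sketch of
how a scheduler or dashboard might use that convention (the file contents and
attribute names here are assumptions based on this thread, not a verbatim
description of the fstests output format):

```python
# Sketch: classify a section report as complete or in-progress, relying
# on the patch's convention that $sect_time is left empty until the
# section finishes.  Attribute names follow common junit conventions.
import xml.etree.ElementTree as ET

def section_status(xml_text):
    """Return (state, failures) for a <testsuite> report string."""
    suite = ET.fromstring(xml_text)
    # An empty or missing time attribute marks an unfinished run.
    state = "in-progress" if not suite.get("time") else "complete"
    return state, int(suite.get("failures", "0"))

partial = '<testsuite name="xfs" tests="3" failures="1" time=""></testsuite>'
print(section_status(partial))  # → ('in-progress', 1)
```

A cluster scheduler could poll REPORT_DIR for such files and reschedule any
section still marked in-progress after its VM disappears.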
On Mon, Dec 19, 2022 at 04:01:09PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Generate the section report between tests so that the summary report
> always reflects the outcome of the most recent test. Two use cases are
> envisioned here -- if a cluster-based test runner anticipates that the
> test run could crash the VM, they can set REPORT_DIR to (say) an NFS
> mount to preserve the intermediate results. If the VM does indeed
> crash, the scheduler can examine the state of the crashed VM and move
> the tests to another VM. The second use case is a reporting agent that
> runs in the VM to upload live results to a test dashboard.

Leah has been working on adding crash recovery for gce-xfstests. It'll
be interesting to see how her work dovetails with your patches.

The basic design we've worked out has the test framework recognize
whether the VM had previously been running tests. We keep track of the
last test that was run by hooking into $LOGGER_PROG. We then use a
python script[1] to append to the xunit file a test result for the test
that was running at the time of the crash, set the test result to
"error", and then resume running tests from where we had left off.

[1] https://github.com/lrumancik/xfstests-bld/blob/ltm-auto-resume-new/test-appliance/files/usr/local/bin/add_error_xunit

To deal with cases where the kernel has deadlocked, when the test VM is
launched by the LTM server, the LTM server will monitor the test VM; if
the LTM server notices that the test VM has failed to make forward
progress within a set time, it will force the test VM to reboot, at
which point the recovery process described above kicks in.

Eventually, we'll have the LTM server examine the serial console of the
test VM, looking for indications of kernel panics and RCU / soft lockup
warnings, so we can more quickly force a reboot when the system under
test is clearly unhappy.
The advantage of this design is that it doesn't require using NFS to
store the results, and in theory we don't even need to use a separate
monitoring VM; we could just use software and kernel watchdogs to
notice when the tests have stopped making forward progress.

- Ted

P.S. We're not using section reporting since we generally launch
separate VMs for each "section" so we can speed up the test run time by
sharding across those VMs. And then we have the LTM server merge the
results together into a single test run report.
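The append step of the recovery flow described above might look roughly like
this (a sketch of the idea only, not the actual add_error_xunit script linked
in [1]; the test name and attribute handling are illustrative assumptions):

```python
# Sketch of crash recovery bookkeeping: after a reboot, append a testcase
# with result "error" for the test that was running when the VM died, so
# the merged report accounts for it.  Illustration only -- not the real
# add_error_xunit script.
import xml.etree.ElementTree as ET

def add_error_case(xml_text, test_name, message="VM crashed during test"):
    suite = ET.fromstring(xml_text)
    case = ET.SubElement(suite, "testcase", name=test_name)
    err = ET.SubElement(case, "error", message=message)
    err.text = message
    # Keep the suite's counters consistent with the appended case.
    suite.set("tests", str(int(suite.get("tests", "0")) + 1))
    suite.set("errors", str(int(suite.get("errors", "0")) + 1))
    return ET.tostring(suite, encoding="unicode")

before = '<testsuite name="xfs" tests="2" errors="0"></testsuite>'
print(add_error_case(before, "generic/475"))
```

After this, the runner can resume from the test following the recorded one,
which is how the $LOGGER_PROG hook described above gets used.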
This was actually an alternative proposed instead of moving /tmp to
/var/tmp [1]. I applied this patch last week in place of the /var/tmp
changes and it is working well with my setup! It is actually nicer than
the /var/tmp approach as I don't have to do as much xml formatting in
gce-xfstests.

- Leah

[1] https://lore.kernel.org/fstests/20221212230646.4022294-1-leah.rumancik@gmail.com/
On Tue, Dec 20, 2022 at 09:14:07AM +0800, Qu Wenruo wrote:
> On 2022/12/20 08:01, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Generate the section report between tests so that the summary report
> > always reflects the outcome of the most recent test. Two use cases are
> > envisioned here -- if a cluster-based test runner anticipates that the
> > test run could crash the VM, they can set REPORT_DIR to (say) an NFS
> > mount to preserve the intermediate results. If the VM does indeed
> > crash, the scheduler can examine the state of the crashed VM and move
> > the tests to another VM. The second use case is a reporting agent that
> > runs in the VM to upload live results to a test dashboard.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  check | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/check b/check
> > index 1ff0f44af1..70a0b537b1 100755
> > --- a/check
> > +++ b/check
> > @@ -842,6 +842,16 @@ function run_section()
> >  		fi
> >  		seqres="$REPORT_DIR/$seqnum"
> >
> > +		# Generate the entire section report with whatever test results
> > +		# we have so far. Leave the $sect_time parameter empty so that
> > +		# it's a little more obvious that this test run is incomplete.
> > +		if $do_report; then
> > +			local sect_now=`_wallclock`
> > +			_make_section_report "$section" "${#try[*]}" \
> > +					"${#bad[*]}" "${#notrun[*]}" \
> > +					"" &> /dev/null
> > +		fi
> > +
>
> In fact that's exactly what I just wanted to do next...
>
> However one problem I can't work around is the timestamp.
>
> Currently our timestamp is the finish time; with this change our
> timestamp is the start time.

Er... what do you mean exactly by start and finish time? The most
recent test? The entire run?

From check:

	if [ -w /dev/kmsg ]; then
		export date_time=`date +"%F %T"`
		echo "run fstests $seqnum at $date_time" > /dev/kmsg
		...
	fi

From common/report:

	if [ -z "$date_time" ]; then
		date_time=$(date +"%F %T")
	fi

	...timestamp="${date_time/ /T}">

The "date_time" variable can be set by ./check just prior to starting
each test, but only if /dev/kmsg is a writable file. If it's set, then
the timestamp in the xml file reflects the start time of the most
recent test.

If date_time is not set, then the timestamp in the xml file records the
point in time when the report is generated... which I suppose is a good
enough estimate if we failed to record date_time when the test was
started.

> I'm not sure if there is any strong definition of the timestamp, but
> personally speaking, a timestamp for the start time makes more sense
> to me. But there may be some users relying on the timestamp to be the
> ending time?

The junit xml schema says that the timestamp should be "when the test
was executed". I think ./check ought to be setting date_time regardless
of /dev/kmsg. ATM my test dashboard computes the difference between now
and the timestamp= attribute of the last report received to decide if
it should flag a test VM as possibly stalled.

Though to undercut everything I just wrote, the junit xml schema
defines testsuite and testcase as distinct entities and then refers to
"the test".

Given that fstests doesn't strictly follow that schema anyway, I guess
we can define the timestamp as we want, or add more timestamp
attributes?

<testsuite suite_timestamp="..." timestamp="..." report_timestamp="..."...>

--D

> Thanks,
> Qu
>
> > 	mkdir -p $RESULT_DIR
> > 	rm -f ${RESULT_DIR}/require_scratch*
> > 	rm -f ${RESULT_DIR}/require_test*
> >
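The stall heuristic mentioned for the dashboard could be sketched like so
(the 300-second threshold is made up for illustration; the timestamp format
mirrors the `${date_time/ /T}` substitution quoted above, where date_time
comes from `date +"%F %T"`):

```python
# Sketch: flag a test VM as possibly stalled when the timestamp= of its
# last received section report is too old.  The limit_seconds default is
# a hypothetical value, not something fstests or the dashboard defines.
from datetime import datetime

def possibly_stalled(timestamp, now, limit_seconds=300):
    # "%F %T" with the space replaced by "T" gives e.g. 2022-12-20T09:14:07
    last = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
    return (now - last).total_seconds() > limit_seconds

now = datetime(2022, 12, 20, 9, 30, 0)
print(possibly_stalled("2022-12-20T09:14:07", now))  # 953s ago → True
print(possibly_stalled("2022-12-20T09:28:00", now))  # 120s ago → False
```

This is also why per-test regeneration of the report matters: the heuristic
only works if the timestamp advances as each test starts.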
On 2023/2/15 02:46, Darrick J. Wong wrote:
> On Tue, Dec 20, 2022 at 09:14:07AM +0800, Qu Wenruo wrote:
>> On 2022/12/20 08:01, Darrick J. Wong wrote:
>>> From: Darrick J. Wong <djwong@kernel.org>
>>>
>>> Generate the section report between tests so that the summary report
>>> always reflects the outcome of the most recent test. Two use cases are
>>> envisioned here -- if a cluster-based test runner anticipates that the
>>> test run could crash the VM, they can set REPORT_DIR to (say) an NFS
>>> mount to preserve the intermediate results. If the VM does indeed
>>> crash, the scheduler can examine the state of the crashed VM and move
>>> the tests to another VM. The second use case is a reporting agent that
>>> runs in the VM to upload live results to a test dashboard.
>>>
>>> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
>>> ---
>>>  check | 10 ++++++++++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/check b/check
>>> index 1ff0f44af1..70a0b537b1 100755
>>> --- a/check
>>> +++ b/check
>>> @@ -842,6 +842,16 @@ function run_section()
>>>  		fi
>>>  		seqres="$REPORT_DIR/$seqnum"
>>>
>>> +		# Generate the entire section report with whatever test results
>>> +		# we have so far. Leave the $sect_time parameter empty so that
>>> +		# it's a little more obvious that this test run is incomplete.
>>> +		if $do_report; then
>>> +			local sect_now=`_wallclock`
>>> +			_make_section_report "$section" "${#try[*]}" \
>>> +					"${#bad[*]}" "${#notrun[*]}" \
>>> +					"" &> /dev/null
>>> +		fi
>>> +
>>
>> In fact that's exactly what I just wanted to do next...
>>
>> However one problem I can't work around is the timestamp.
>>
>> Currently our timestamp is the finish time; with this change our
>> timestamp is the start time.
>
> Er... what do you mean exactly by start and finish time? The most
> recent test? The entire run?

Personally speaking, I'd prefer the start/finish time of the whole
suite/section at least.

> From check:
>
> 	if [ -w /dev/kmsg ]; then
> 		export date_time=`date +"%F %T"`
> 		echo "run fstests $seqnum at $date_time" > /dev/kmsg
> 		...
> 	fi
>
> From common/report:
>
> 	if [ -z "$date_time" ]; then
> 		date_time=$(date +"%F %T")
> 	fi
>
> 	...timestamp="${date_time/ /T}">
>
> The "date_time" variable can be set by ./check just prior to starting
> each test, but only if /dev/kmsg is a writable file. If it's set, then
> the timestamp in the xml file reflects the start time of the most
> recent test.
>
> If date_time is not set, then the timestamp in the xml file records
> the point in time when the report is generated... which I suppose is a
> good enough estimate if we failed to record date_time when the test
> was started.
>
>> I'm not sure if there is any strong definition of the timestamp, but
>> personally speaking, a timestamp for the start time makes more sense
>> to me. But there may be some users relying on the timestamp to be the
>> ending time?
>
> The junit xml schema says that the timestamp should be "when the test
> was executed". I think ./check ought to be setting date_time
> regardless of /dev/kmsg. ATM my test dashboard computes the difference
> between now and the timestamp= attribute of the last report received
> to decide if it should flag a test VM as possibly stalled.
>
> Though to undercut everything I just wrote, the junit xml schema
> defines testsuite and testcase as distinct entities and then refers to
> "the test".
>
> Given that fstests doesn't strictly follow that schema anyway, I guess
> we can define the timestamp as we want, or add more timestamp
> attributes?
>
> <testsuite suite_timestamp="..." timestamp="..." report_timestamp="..."...>

OK, then this looks much better.

Thanks,
Qu

> --D
>
>> Thanks,
>> Qu
>>
>>> 	mkdir -p $RESULT_DIR
>>> 	rm -f ${RESULT_DIR}/require_scratch*
>>> 	rm -f ${RESULT_DIR}/require_test*
>>>
diff --git a/check b/check
index 1ff0f44af1..70a0b537b1 100755
--- a/check
+++ b/check
@@ -842,6 +842,16 @@ function run_section()
 		fi
 		seqres="$REPORT_DIR/$seqnum"
 
+		# Generate the entire section report with whatever test results
+		# we have so far. Leave the $sect_time parameter empty so that
+		# it's a little more obvious that this test run is incomplete.
+		if $do_report; then
+			local sect_now=`_wallclock`
+			_make_section_report "$section" "${#try[*]}" \
+					"${#bad[*]}" "${#notrun[*]}" \
+					"" &> /dev/null
+		fi
+
 	mkdir -p $RESULT_DIR
 	rm -f ${RESULT_DIR}/require_scratch*
 	rm -f ${RESULT_DIR}/require_test*