
[1/8] check: generate section reports between tests

Message ID 167149446946.332657.17186597494532662986.stgit@magnolia (mailing list archive)
State Superseded
Series: fstests: improve junit xml reporting

Commit Message

Darrick J. Wong Dec. 20, 2022, 12:01 a.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Generate the section report between tests so that the summary report
always reflects the outcome of the most recent test.  Two use cases are
envisioned here -- if a cluster-based test runner anticipates that the
test run could crash the VM, it can set REPORT_DIR to (say) an NFS
mount to preserve the intermediate results.  If the VM does indeed
crash, the scheduler can examine the state of the crashed VM and move
the tests to another VM.  The second use case is a reporting agent that
runs in the VM to upload live results to a test dashboard.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 check |   10 ++++++++++
 1 file changed, 10 insertions(+)
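(For illustration, a minimal sketch of the second use case above: a reporting agent polling the section report that check now regenerates between tests.  The element and attribute names are assumptions modeled on common junit conventions, not necessarily what fstests emits verbatim.)

```python
# Hypothetical reporting-agent sketch: read the intermediate section
# report and summarize progress.  Per the patch, the time attribute is
# left empty while the run is still in progress.
import xml.etree.ElementTree as ET

def summarize_section(xml_text):
    """Return (tests, failures, skipped, complete) from a section report."""
    suite = ET.fromstring(xml_text)
    complete = bool(suite.get("time"))   # empty time => run still going
    return (int(suite.get("tests", 0)),
            int(suite.get("failures", 0)),
            int(suite.get("skipped", 0)),
            complete)

# An in-progress report: time="" signals an incomplete run.
in_progress = '<testsuite name="xfs" tests="12" failures="1" skipped="2" time=""/>'
print(summarize_section(in_progress))   # → (12, 1, 2, False)
```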

Comments

Qu Wenruo Dec. 20, 2022, 1:14 a.m. UTC | #1
On 2022/12/20 08:01, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Generate the section report between tests so that the summary report
> always reflects the outcome of the most recent test.  Two usecases are
> envisioned here -- if a cluster-based test runner anticipates that the
> testrun could crash the VM, they can set REPORT_DIR to (say) an NFS
> mount to preserve the intermediate results.  If the VM does indeed
> crash, the scheduler can examine the state of the crashed VM and move
> the tests to another VM.  The second usecase is a reporting agent that
> runs in the VM to upload live results to a test dashboard.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>   check |   10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> 
> diff --git a/check b/check
> index 1ff0f44af1..70a0b537b1 100755
> --- a/check
> +++ b/check
> @@ -842,6 +842,16 @@ function run_section()
>   		fi
>   		seqres="$REPORT_DIR/$seqnum"
>   
> +		# Generate the entire section report with whatever test results
> +		# we have so far.  Leave the $sect_time parameter empty so that
> +		# it's a little more obvious that this test run is incomplete.
> +		if $do_report; then
> +			local sect_now=`_wallclock`
> +			_make_section_report "$section" "${#try[*]}" \
> +					     "${#bad[*]}" "${#notrun[*]}" \
> +					     "" &> /dev/null
> +		fi
> +

In fact that's exactly what I just wanted to do next...

However one problem I can't work around is the timestamp.

Currently our timestamp is the finish time; with this change our 
timestamp is the start time.

I'm not sure if there is any strong definition on the timestamp, but 
personally speaking, timestamp for the start time makes more sense to me.
But there may be some users relying on the timestamp to be the ending time?

Thanks,
Qu

>   		mkdir -p $RESULT_DIR
>   		rm -f ${RESULT_DIR}/require_scratch*
>   		rm -f ${RESULT_DIR}/require_test*
>
Theodore Ts'o Dec. 20, 2022, 3:16 a.m. UTC | #2
On Mon, Dec 19, 2022 at 04:01:09PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Generate the section report between tests so that the summary report
> always reflects the outcome of the most recent test.  Two usecases are
> envisioned here -- if a cluster-based test runner anticipates that the
> testrun could crash the VM, they can set REPORT_DIR to (say) an NFS
> mount to preserve the intermediate results.  If the VM does indeed
> crash, the scheduler can examine the state of the crashed VM and move
> the tests to another VM.  The second usecase is a reporting agent that
> runs in the VM to upload live results to a test dashboard.

Leah has been working on adding crash recovery for gce-xfstests.
It'll be interesting to see how her work dovetails with your patches.

The basic design we've worked out has the test framework
recognize whether the VM had previously been running
tests.  We keep track of the last test that was run by hooking into
$LOGGER_PROG.  We then use a python script[1] to append to the xunit file
a test result for the test that was running at the time of the crash,
set that test's result to "error", and then resume running
tests from where we had left off.

[1] https://github.com/lrumancik/xfstests-bld/blob/ltm-auto-resume-new/test-appliance/files/usr/local/bin/add_error_xunit
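(A minimal sketch of that recovery step: append a testcase with an error element for the test that was running when the VM went down.  This illustrates the idea only; it is not the actual add_error_xunit script, and the attribute names and counter handling are assumptions.)

```python
# Hypothetical crash-recovery step: record the interrupted test as an
# "error" in the xunit file and keep the summary counters consistent.
import xml.etree.ElementTree as ET

def append_crash_error(xml_text, test_name):
    suite = ET.fromstring(xml_text)
    case = ET.SubElement(suite, "testcase", name=test_name)
    err = ET.SubElement(case, "error")
    err.text = "VM crashed or was rebooted while this test was running"
    # Bump the suite-level counters to match the new entry.
    suite.set("tests", str(int(suite.get("tests", 0)) + 1))
    suite.set("errors", str(int(suite.get("errors", 0)) + 1))
    return ET.tostring(suite, encoding="unicode")

report = '<testsuite tests="2" errors="0"><testcase name="xfs/001"/></testsuite>'
patched = append_crash_error(report, "xfs/002")
```

After patching, the run can resume from the test that follows `xfs/002`.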

To deal with cases where the kernel has deadlocked, the LTM server
monitors any test VM that it launches.  If the LTM server notices that
the test VM has failed to make forward progress within a set time, it
forces the test VM to reboot, at which point the recovery process
described above kicks in.

Eventually, we'll have the LTM server examine the serial console of
the test VM, looking for indications of kernel panics and RCU / soft
lockup warnings, so we can more quickly force a reboot when the system
under test is clearly unhappy.

The advantage of this design is that it doesn't require using NFS to
store the results, and in theory we don't even need a separate
monitoring VM; we could just use software and kernel watchdogs to
notice when the tests have stopped making forward progress.

       	    	      	   	   - Ted

P.S.  We're not using section reporting since we generally launch
separate VMs for each "section" so we can speed up the test run time
by sharding across those VMs.  And then we have the LTM server merge
the results together into a single test run report.
Leah Rumancik Dec. 20, 2022, 6:18 p.m. UTC | #3
This was actually an alternative proposed instead of moving /tmp to
/var/tmp [1]. I applied this patch last week in place of the /var/tmp
changes and it is working well with my setup! It is actually nicer than
the /var/tmp approach as I don't have to do as much xml formatting in
gce-xfstests.

- Leah

[1] https://lore.kernel.org/fstests/20221212230646.4022294-1-leah.rumancik@gmail.com/
Darrick J. Wong Feb. 14, 2023, 6:46 p.m. UTC | #4
On Tue, Dec 20, 2022 at 09:14:07AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/12/20 08:01, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Generate the section report between tests so that the summary report
> > always reflects the outcome of the most recent test.  Two usecases are
> > envisioned here -- if a cluster-based test runner anticipates that the
> > testrun could crash the VM, they can set REPORT_DIR to (say) an NFS
> > mount to preserve the intermediate results.  If the VM does indeed
> > crash, the scheduler can examine the state of the crashed VM and move
> > the tests to another VM.  The second usecase is a reporting agent that
> > runs in the VM to upload live results to a test dashboard.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >   check |   10 ++++++++++
> >   1 file changed, 10 insertions(+)
> > 
> > 
> > diff --git a/check b/check
> > index 1ff0f44af1..70a0b537b1 100755
> > --- a/check
> > +++ b/check
> > @@ -842,6 +842,16 @@ function run_section()
> >   		fi
> >   		seqres="$REPORT_DIR/$seqnum"
> > +		# Generate the entire section report with whatever test results
> > +		# we have so far.  Leave the $sect_time parameter empty so that
> > +		# it's a little more obvious that this test run is incomplete.
> > +		if $do_report; then
> > +			local sect_now=`_wallclock`
> > +			_make_section_report "$section" "${#try[*]}" \
> > +					     "${#bad[*]}" "${#notrun[*]}" \
> > +					     "" &> /dev/null
> > +		fi
> > +
> 
> In fact that's exactly what I just wanted to do next...
> 
> However one problem I can't work around is the timestamp.
> 
> Currently our timestamp is the finish time, with this change our timestamp
> is the start time.

Er... what do you mean exactly by start and finish time?  The most
recent test?  The entire run?

From check:

	if [ -w /dev/kmsg ]; then
		export date_time=`date +"%F %T"`
		echo "run fstests $seqnum at $date_time" > /dev/kmsg
		...
	fi

From common/report:

	if [ -z "$date_time" ]; then
		date_time=$(date +"%F %T")
	fi

	...timestamp="${date_time/ /T}">

The "date_time" variable can be set by ./check just prior to starting
each test, but only if /dev/kmsg is a writable file.  If it's set, then
the timestamp in the xml file reflects the start time of the most recent
test.

If date_time is not set, then the timestamp in the xml file records the
point in time when the report is generated... which I suppose is a good
enough estimate if we failed to record date_time when the test was
started.

> I'm not sure if there is any strong definition on the timestamp, but
> personally speaking, timestamp for the start time makes more sense to me.
> But there may be some users relying on the timestamp to be the ending time?

The junit xml schema says that the timestamp should be "when the test
was executed".  I think ./check ought to be setting date_time regardless
of /dev/kmsg.  ATM my test dashboard computes the difference between now
and the timestamp= attribute of the last report received to decide if it
should flag a test VM as possibly stalled.
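(The dashboard-side staleness check could be sketched as below; the 30-minute threshold and the element names are assumptions, but the timestamp format follows common/report's "%F %T" with the space replaced by "T".)

```python
# Hypothetical stall detector: compare the report's timestamp= attribute
# against the current time and flag the VM if the gap exceeds a limit.
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

def vm_seems_stalled(xml_text, now, limit=timedelta(minutes=30)):
    suite = ET.fromstring(xml_text)
    last = datetime.strptime(suite.get("timestamp"), "%Y-%m-%dT%H:%M:%S")
    return now - last > limit

report = '<testsuite name="xfs" timestamp="2022-12-20T00:01:09"/>'
print(vm_seems_stalled(report, datetime(2022, 12, 20, 1, 0, 0)))   # → True
```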

Though to undercut everything I just wrote, the junit xml schema defines
testsuite and testcase as distinct entities and then refers to "the
test".

Given that fstests doesn't strictly follow that schema anyway, I guess
we can define the timestamp as we want, or add more timestamp
attributes?

<testsuite suite_timestamp="..." timestamp="..." report_timestamp="..."...>

--D

> Thanks,
> Qu
> 
> >   		mkdir -p $RESULT_DIR
> >   		rm -f ${RESULT_DIR}/require_scratch*
> >   		rm -f ${RESULT_DIR}/require_test*
> >
Qu Wenruo Feb. 15, 2023, 2:06 a.m. UTC | #5
On 2023/2/15 02:46, Darrick J. Wong wrote:
> On Tue, Dec 20, 2022 at 09:14:07AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2022/12/20 08:01, Darrick J. Wong wrote:
>>> From: Darrick J. Wong <djwong@kernel.org>
>>>
>>> Generate the section report between tests so that the summary report
>>> always reflects the outcome of the most recent test.  Two usecases are
>>> envisioned here -- if a cluster-based test runner anticipates that the
>>> testrun could crash the VM, they can set REPORT_DIR to (say) an NFS
>>> mount to preserve the intermediate results.  If the VM does indeed
>>> crash, the scheduler can examine the state of the crashed VM and move
>>> the tests to another VM.  The second usecase is a reporting agent that
>>> runs in the VM to upload live results to a test dashboard.
>>>
>>> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
>>> ---
>>>    check |   10 ++++++++++
>>>    1 file changed, 10 insertions(+)
>>>
>>>
>>> diff --git a/check b/check
>>> index 1ff0f44af1..70a0b537b1 100755
>>> --- a/check
>>> +++ b/check
>>> @@ -842,6 +842,16 @@ function run_section()
>>>    		fi
>>>    		seqres="$REPORT_DIR/$seqnum"
>>> +		# Generate the entire section report with whatever test results
>>> +		# we have so far.  Leave the $sect_time parameter empty so that
>>> +		# it's a little more obvious that this test run is incomplete.
>>> +		if $do_report; then
>>> +			local sect_now=`_wallclock`
>>> +			_make_section_report "$section" "${#try[*]}" \
>>> +					     "${#bad[*]}" "${#notrun[*]}" \
>>> +					     "" &> /dev/null
>>> +		fi
>>> +
>>
>> In fact that's exactly what I just wanted to do next...
>>
>> However one problem I can't work around is the timestamp.
>>
>> Currently our timestamp is the finish time, with this change our timestamp
>> is the start time.
> 
> Er... what do you mean exactly by start and finish time?  The most
> recent test?  The entire run?

Personally speaking, I'd prefer the start/finish time of the whole 
suite/section at least.

> 
>  From check:
> 
> 	if [ -w /dev/kmsg ]; then
> 		export date_time=`date +"%F %T"`
> 		echo "run fstests $seqnum at $date_time" > /dev/kmsg
> 		...
> 	fi
> 
>  From common/report:
> 
> 	if [ -z "$date_time" ]; then
> 		date_time=$(date +"%F %T")
> 	fi
> 
> 	...timestamp="${date_time/ /T}">
> 
> The "date_time" variable can be set by ./check just prior to starting
> each test, but only if /dev/kmsg is a writable file.  If it's set, then
> the timestamp in the xml file reflects the start time of the most recent
> test.
> 
> If date_time is not set, then the timestamp in the xml file records the
> point in time when the report is generated... which I suppose is a good
> enough estimate if we failed to record date_time when the test was
> started.
>
>> I'm not sure if there is any strong definition on the timestamp, but
>> personally speaking, timestamp for the start time makes more sense to me.
>> But there may be some users relying on the timestamp to be the ending time?
> 
> The junit xml schema says that the timestamp should be "when the test
> was executed".  I think ./check ought to be setting date_time regardless
> of /dev/kmsg.  ATM my test dashboard computes the difference between now
> and the timestamp= attribute of the last report received to decide if it
> should flag a test VM as possibly stalled.
> 
> Though to undercut everything I just wrote, the junit xml schema defines
> testsuite and testcase as distinct entities and then refers to "the
> test".
> 
> Given that fstests doesn't strictly follow that schema anyway, I guess
> we can define the timestamp as we want, or add more timestamp
> attributes?
> 
> <testsuite suite_timestamp="..." timestamp="..." report_timestamp="..."...>

OK, then this looks much better.

Thanks,
Qu

> 
> --D
> 
>> Thanks,
>> Qu
>>
>>>    		mkdir -p $RESULT_DIR
>>>    		rm -f ${RESULT_DIR}/require_scratch*
>>>    		rm -f ${RESULT_DIR}/require_test*
>>>

Patch

diff --git a/check b/check
index 1ff0f44af1..70a0b537b1 100755
--- a/check
+++ b/check
@@ -842,6 +842,16 @@ function run_section()
 		fi
 		seqres="$REPORT_DIR/$seqnum"
 
+		# Generate the entire section report with whatever test results
+		# we have so far.  Leave the $sect_time parameter empty so that
+		# it's a little more obvious that this test run is incomplete.
+		if $do_report; then
+			local sect_now=`_wallclock`
+			_make_section_report "$section" "${#try[*]}" \
+					     "${#bad[*]}" "${#notrun[*]}" \
+					     "" &> /dev/null
+		fi
+
 		mkdir -p $RESULT_DIR
 		rm -f ${RESULT_DIR}/require_scratch*
 		rm -f ${RESULT_DIR}/require_test*