diff mbox series

[3/3] check: add --print-start-done to enhance watchdogs

Message ID 20240216181859.788521-4-mcgrof@kernel.org (mailing list archive)
State New, archived
Headers show
Series few enhancements | expand

Commit Message

Luis Chamberlain Feb. 16, 2024, 6:18 p.m. UTC
fstests specific watchdogs want to know when the full test suite will
start and end. Right now the kernel ring buffer can get augmented but we
can't know for sure if it was due to a test or some odd hardware issue
after fstests ran. This is specially true for systems left running tests in
loops in automation where we are not running things ourselves but rather just
get access to kernel logs, or for filesystem runner watdogs such as the one
in kdevops [0]. It is also often not easy to determine for sure based on
just logs when fstests check really has completed unless we have a
matching log of who spawned that test runner. Although we could keep track of
this ourselves by an annotation locally on the test runner, it is useful to
have independent tools which are not attached to the process which spawned
check to just peak into a system and verify the system's progress with
fstests by just using the kernel log. Keeping this in the test target kernel
ring buffer enables these use cases.

This is useful for example for filesyste checker specific watchdogs like the
one in kdevops so that the watchdog knows when to start hunting for crashes
based just on the kernel ring buffer, and so it also knows when the show is
over.

[0] https://github.com/linux-kdevops/kdevops/blob/master/scripts/workflows/fstests/fstests_watchdog.py

Demo:

root@demo-xfs-reflink /var/lib/xfstests # dmesg -c > /dev/null
root@demo-xfs-reflink /var/lib/xfstests # ./check -s xfs_reflink --print-start-done generic/002
SECTION       -- xfs_reflink
RECREATING    -- xfs on /dev/loop16
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 demo-xfs-reflink 6.5.0-5-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.13-1 (2023-11-29)
MKFS_OPTIONS  -- -f -f -m reflink=1,rmapbt=1, -i sparse=1, /dev/loop5
MOUNT_OPTIONS -- /dev/loop5 /media/scratch

generic/002        1s
Ran: generic/002
Passed all 1 tests

SECTION       -- xfs_reflink
=========================
Ran: generic/002
Passed all 1 tests

root@demo-xfs-reflink /var/lib/xfstests # dmesg -c
[61392.525562] XFS (loop16): Mounting V5 Filesystem 20c3bdf7-69d8-42b2-8a7c-fccff0949dcc
[61392.536059] XFS (loop16): Ending clean mount
[61392.684417] run fstests fstestsstart/000 at 2024-02-16 17:51:28
[61392.726709] XFS (loop16): Unmounting Filesystem 20c3bdf7-69d8-42b2-8a7c-fccff0949dcc
[61392.779791] XFS (loop16): Mounting V5 Filesystem ce4188e8-8da8-474a-acb4-b1d8c3e7edb7
[61392.791217] XFS (loop16): Ending clean mount
[61393.328386] XFS (loop5): Mounting V5 Filesystem 6e3f6c64-5b48-41b6-8810-d2a41fbcd125
[61393.340019] XFS (loop5): Ending clean mount
[61393.347636] XFS (loop5): Unmounting Filesystem 6e3f6c64-5b48-41b6-8810-d2a41fbcd125
[61393.403519] XFS (loop16): Unmounting Filesystem ce4188e8-8da8-474a-acb4-b1d8c3e7edb7
[61393.456945] XFS (loop16): Mounting V5 Filesystem ce4188e8-8da8-474a-acb4-b1d8c3e7edb7
[61393.466506] XFS (loop16): Ending clean mount
[61393.504926] run fstests generic/002 at 2024-02-16 17:51:29
[61394.579638] XFS (loop16): Unmounting Filesystem ce4188e8-8da8-474a-acb4-b1d8c3e7edb7
[61394.637296] XFS (loop16): Mounting V5 Filesystem ce4188e8-8da8-474a-acb4-b1d8c3e7edb7
[61394.646365] XFS (loop16): Ending clean mount
[61394.762667] XFS (loop16): Unmounting Filesystem ce4188e8-8da8-474a-acb4-b1d8c3e7edb7
[61394.790581] run fstests fstestsdone/000 at 2024-02-16 17:51:30

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 check | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Dave Chinner Feb. 19, 2024, 3:42 a.m. UTC | #1
On Fri, Feb 16, 2024 at 10:18:59AM -0800, Luis Chamberlain wrote:
> fstests specific watchdogs want to know when the full test suite will
> start and end. Right now the kernel ring buffer can get augmented but we
> can't know for sure if it was due to a test or some odd hardware issue
> after fstests ran. This is specially true for systems left running tests in
> loops in automation where we are not running things ourselves but rather just
> get access to kernel logs, or for filesystem runner watdogs such as the one
> in kdevops [0]. It is also often not easy to determine for sure based on
> just logs when fstests check really has completed unless we have a
> matching log of who spawned that test runner. Although we could keep track of
> this ourselves by an annotation locally on the test runner, it is useful to
> have independent tools which are not attached to the process which spawned
> check to just peak into a system and verify the system's progress with
> fstests by just using the kernel log. Keeping this in the test target kernel
> ring buffer enables these use cases.
> 
> This is useful for example for filesyste checker specific watchdogs like the
> one in kdevops so that the watchdog knows when to start hunting for crashes
> based just on the kernel ring buffer, and so it also knows when the show is
> over.

Why can't the runner that requires timing information in the
kernel log just emit a message to the kernel log before it
runs check and again immediately after completion of the check
script?

-Dave.
Luis Chamberlain Feb. 21, 2024, 4:46 p.m. UTC | #2
On Mon, Feb 19, 2024 at 02:42:32PM +1100, Dave Chinner wrote:
> On Fri, Feb 16, 2024 at 10:18:59AM -0800, Luis Chamberlain wrote:
> > fstests specific watchdogs want to know when the full test suite will
> > start and end. Right now the kernel ring buffer can get augmented but we
> > can't know for sure if it was due to a test or some odd hardware issue
> > after fstests ran. This is specially true for systems left running tests in
> > loops in automation where we are not running things ourselves but rather just
> > get access to kernel logs, or for filesystem runner watdogs such as the one
> > in kdevops [0]. It is also often not easy to determine for sure based on
> > just logs when fstests check really has completed unless we have a
> > matching log of who spawned that test runner. Although we could keep track of
> > this ourselves by an annotation locally on the test runner, it is useful to
> > have independent tools which are not attached to the process which spawned
> > check to just peak into a system and verify the system's progress with
> > fstests by just using the kernel log. Keeping this in the test target kernel
> > ring buffer enables these use cases.
> > 
> > This is useful for example for filesyste checker specific watchdogs like the
> > one in kdevops so that the watchdog knows when to start hunting for crashes
> > based just on the kernel ring buffer, and so it also knows when the show is
> > over.
> 
> Why can't the runner that requires timing information in the
> kernel log just emit a message to the kernel log before it
> runs check and again immediately after completion of the check
> script?

That's exactly what is done today, it just seemed to me that since this
has been useful to a test runner now for years, it might make sense to
generalize this.
 
  Luis
diff mbox series

Patch

diff --git a/check b/check
index 523cf024c139..b3fdda57f665 100755
--- a/check
+++ b/check
@@ -19,6 +19,7 @@  have_test_arg=false
 randomize=false
 exact_order=false
 start_after_test=""
+print_start_done=false
 list_group_tests=false
 export here=`pwd`
 xfile=""
@@ -84,6 +85,7 @@  check options
     --large-fs		optimise scratch device for large filesystems
     --list-group-tests	only list tests part of the groups you specified, do not run the tests
     --start-after	only start testing after the test specified
+    --print-start-done  append to /dev/kmsg when available test start and end time
     -s section		run only specified section from config file
     -S section		exclude the specified section from the config file
     -L <n>		loop tests <n> times following a failure, measuring aggregate pass/fail metrics
@@ -379,6 +381,11 @@  while [ $# -gt 0 ]; do
 	--list-group-tests)
 		list_group_tests=true
 		;;
+	--print-start-done)
+		if [ -w /dev/kmsg ]; then
+			print_start_done=true
+		fi
+		;;
 	-s)	RUN_SECTION="$RUN_SECTION $2"; shift ;;
 	-S)	EXCLUDE_SECTION="$EXCLUDE_SECTION $2"; shift ;;
 	-l)	diff="diff" ;;
@@ -1161,6 +1168,11 @@  function run_section()
 	_scratch_unmount 2> /dev/null
 }
 
+if $print_start_done; then
+	start_time=`date +"%F %T"`
+	echo "run fstests fstestsstart/000 at $start_time" > /dev/kmsg
+fi
+
 for ((iters = 0; iters < $iterations; iters++)) do
 	for section in $HOST_OPTIONS_SECTIONS; do
 		run_section $section
@@ -1172,6 +1184,11 @@  for ((iters = 0; iters < $iterations; iters++)) do
 	done
 done
 
+if $print_start_done; then
+	end_time=`date +"%F %T"`
+	echo "run fstests fstestsdone/000 at $end_time" > /dev/kmsg
+fi
+
 interrupt=false
 status=`expr $sum_bad != 0`
 exit