diff mbox

[RFC] check: try to fix the test device if it gets corrupted

Message ID 20170302232050.31125-1-tytso@mit.edu (mailing list archive)
State New, archived
Headers show

Commit Message

Theodore Ts'o March 2, 2017, 11:20 p.m. UTC
If the test device gets corrupted all subsequent tests will fail.  To
prevent this from causing all subsequent tests to be useless, try
repair the file system on TEST_DEV if possible.  We don't need to do
this with the scratch device since that file system gets recreated
each time anyway.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---

This is a quick hack that I needed while debubgging some research
code[1].  It turns out that when the grad student is up against a
paper deadline is an, this becomes amazing evolutionary process which
creates file system modifications which are optimized for running
postmark and file bench --- and falls over very easily otherwise.  So
when TEST_DEV is getting corrupted very frequently, it's nice to be
able to continue running other tests in the quick or auto group.

So please consider this a proof-concept-patch; would people consider
it worthwhile to have this in xfstests upstream?

 check     |  6 +++++-
 common/rc | 22 ++++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

Comments

Eryu Guan March 3, 2017, 9:03 a.m. UTC | #1
On Thu, Mar 02, 2017 at 06:20:50PM -0500, Theodore Ts'o wrote:
> If the test device gets corrupted all subsequent tests will fail.  To
> prevent this from causing all subsequent tests to be useless, try
> repair the file system on TEST_DEV if possible.  We don't need to do
> this with the scratch device since that file system gets recreated
> each time anyway.
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
> 
> This is a quick hack that I needed while debubgging some research
> code[1].  It turns out that when the grad student is up against a
> paper deadline is an, this becomes amazing evolutionary process which
> creates file system modifications which are optimized for running
> postmark and file bench --- and falls over very easily otherwise.  So
> when TEST_DEV is getting corrupted very frequently, it's nice to be
> able to continue running other tests in the quick or auto group.
> 
> So please consider this a proof-concept-patch; would people consider
> it worthwhile to have this in xfstests upstream?

This idea looks reasonable to me, and TEST_DEV is supposed to be aging,
perhaps being currupted & repaired is kind of aging too :)

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong March 3, 2017, 5:21 p.m. UTC | #2
On Fri, Mar 03, 2017 at 05:03:32PM +0800, Eryu Guan wrote:
> On Thu, Mar 02, 2017 at 06:20:50PM -0500, Theodore Ts'o wrote:
> > If the test device gets corrupted all subsequent tests will fail.  To
> > prevent this from causing all subsequent tests to be useless, try
> > repair the file system on TEST_DEV if possible.  We don't need to do
> > this with the scratch device since that file system gets recreated
> > each time anyway.
> > 
> > Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> > ---
> > 
> > This is a quick hack that I needed while debubgging some research
> > code[1].  It turns out that when the grad student is up against a
> > paper deadline is an, this becomes amazing evolutionary process which
> > creates file system modifications which are optimized for running
> > postmark and file bench --- and falls over very easily otherwise.  So
> > when TEST_DEV is getting corrupted very frequently, it's nice to be
> > able to continue running other tests in the quick or auto group.
> > 
> > So please consider this a proof-concept-patch; would people consider
> > it worthwhile to have this in xfstests upstream?
> 
> This idea looks reasonable to me, and TEST_DEV is supposed to be aging,
> perhaps being currupted & repaired is kind of aging too :)

<shrug> The test device isn't supposed to get corrupted, since it (at
least in theory) should be an old filesystem.  That said, I suppose
there's little point in banging around with a corrupt test fs.  Maybe we
could go further and stop running if there's unfixable corruption?

--D

> 
> Thanks,
> Eryu
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o March 3, 2017, 11:01 p.m. UTC | #3
On Fri, Mar 03, 2017 at 09:21:57AM -0800, Darrick J. Wong wrote:
> 
> <shrug> The test device isn't supposed to get corrupted, since it (at
> least in theory) should be an old filesystem.  That said, I suppose
> there's little point in banging around with a corrupt test fs.  Maybe we
> could go further and stop running if there's unfixable corruption?

Yes, that was the other alternative I considered.  In my case, though,
since I'm trying to get a sense of how many failures I have to deal
with, I really wanted a "make -k" behavior that would continue after
the first failure.  After all, all I was going to do was manually run
fsck, and then continue the run --- so we might as well have the check
script do it automatically and then allow things to continue.

We could make it be configurable, via a command-line option.  The -k
option isn't taken so we could have check -k that works like make -k
if you think that's better.  OTOH, perhaps making -k the default
behaviour is actually the better way to go, and in that case, maybe
it's not worth having the command-line flag?

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/check b/check
index 5d7f75c4..62f8ff41 100755
--- a/check
+++ b/check
@@ -455,7 +455,11 @@  _summary()
 _check_filesystems()
 {
 	if [ -f ${RESULT_DIR}/require_test ]; then
-		_check_test_fs || err=true
+		if ! _check_test_fs ; then
+		    err=true
+		    echo "Trying to repair broken TEST_DEV file system"
+		    _repair_test_fs
+		fi
 		rm -f ${RESULT_DIR}/require_test*
 	fi
 	if [ -f ${RESULT_DIR}/require_scratch ]; then
diff --git a/common/rc b/common/rc
index 2be55e6f..60af86fc 100644
--- a/common/rc
+++ b/common/rc
@@ -1098,6 +1098,28 @@  _repair_scratch_fs()
     esac
 }
 
+_repair_test_fs()
+{
+    case $FSTYP in
+    ext2|ext3|ext4)
+        fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1
+	if test $? -ge 4 ; then
+	    echo "_repair_test_fs: couldn't repair filesystem on $device (see $seqres.full)"
+
+	    echo "_repair_test_fs: couldn't repair filesystem on $device" >>$seqres.full
+	    echo "*** fsck.$FSTYP output ***"	>>$seqres.full
+            cat $tmp.repair			>>$seqres.full
+	    echo "*** end fsck.$FSTYP output"	>>$seqres.full
+	    return 1
+	fi
+	return 0
+	;;
+    *)
+	return 1
+	;;
+    esac
+}
+
 _get_pids_by_name()
 {
     if [ $# -ne 1 ]