diff mbox

[V2] xfs: Test infinite loop while searching for a free inode slot

Message ID 20170817133943.10774-1-cmaiolino@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Carlos Maiolino Aug. 17, 2017, 1:39 p.m. UTC
Tests the search algorithm for a free inode slot in a specific AG, done
in xfs_dialloc_ag_inobt().

When finobt is not used, and agi->freecount is not 0, XFS will scan the AG inode
tree looking for a free inode slot, but if agi->freecount is corrupted,
and there is no free slot at all, it will end up in an infinite loop.

This test checks for the infinite loop fix.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
---

Changelog:

V2:
	- Small clean up
	- use _disable_dmesg_check (the test will generate error
				    messages which will appear in dmesg)
	- Get the amount of free inodes from the AGI instead of assuming
	  61 free inodes
	- Send a few outputs to $seqres.full
	- Use xfs_db `write` without -d option once this will create
	  incompatibility with previous xfsprogs, and is not strictly
	  required for the test to work.
	- Check for finobt, and explicitly disable it if it is
	  supported by the kernel being tested.

 tests/xfs/057     | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/057.out |  2 ++
 tests/xfs/group   |  1 +
 3 files changed, 89 insertions(+)
 create mode 100755 tests/xfs/057
 create mode 100644 tests/xfs/057.out

Comments

Eryu Guan Aug. 18, 2017, 5:48 a.m. UTC | #1
On Thu, Aug 17, 2017 at 03:39:43PM +0200, Carlos Maiolino wrote:
> Tests the search algorithm for a free inode slot in a specific AG, done
> in xfs_dialloc_ag_inobt().
> 
> When finobt is not used, and agi->freecount is not 0, XFS will scan the AG inode
> tree looking for a free inode slot, but if agi->freecount is corrupted,
> and there is no free slot at all, it will end up in an infinite loop.
> 
> This test checks for the infinite loop fix.
> 
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
> 
> Changelog:
> 
> V2:
> 	- Small clean up
> 	- use _disable_dmesg_check (the test will generate error
> 				    messages which will appear in dmesg)
> 	- Get the amount of free inodes from the AGI instead of assuming
> 	  61 free inodes
> 	- Send a few outputs to $seqres.full
> 	- Use xfs_db `write` without -d option once this will create
> 	  incompatibility with previous xfsprogs, and is not strictly
> 	  required for the test to work.
> 	- Check for finobt, and explicitly disable it if it is
> 	  supported by the kernel being tested.

Thanks a lot for the updated version! It looks much better than v1 :)

Though I hit another issue while testing this patch, please see below.

> 
>  tests/xfs/057     | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/057.out |  2 ++
>  tests/xfs/group   |  1 +
>  3 files changed, 89 insertions(+)
>  create mode 100755 tests/xfs/057
>  create mode 100644 tests/xfs/057.out
> 
> diff --git a/tests/xfs/057 b/tests/xfs/057
> new file mode 100755
> index 00000000..9833ba25
> --- /dev/null
> +++ b/tests/xfs/057
> @@ -0,0 +1,86 @@
> +#! /bin/bash
> +# FS QA Test 057
> +#
> +# Check if the filesystem will lockup when trying to allocate a new inode in
> +# an AG with no free inodes but with a corrupted agi->freecount showing free inodes.
> +#
> +# At the end of the test, the scratch device will purposely be in a corrupted
> +# state, so there is no need for checking that.
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2017 Red Hat, Inc.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch_nocheck
> +_disable_dmesg_check
> +
> +# Make sure we disable finobt if the filesystem supports it, otherwise, just
> +# initialize it with default options.
> +_scratch_mkfs | grep -q "finobt=1" && _scratch_mkfs -m "finobt=0" >/dev/null 2>&1
> +
> +# Get the amount of free inodes from the AGI 0, so we can fill up the freecount
> +# structure.
> +freecount=`_scratch_xfs_db -c "agi 0" -c "p freecount" | cut -d' ' -f 3`
> +
> +_scratch_mount
> +
> +# At the end of filesystem's initialization, AG 0 will have $freecount free
> +# inodes in the agi->freecount, create $freecount extra dummy files to fill it.
> +for i in `seq 1 $freecount`; do
> +	touch $SCRATCH_MNT/dummy_file$i
> +done
> +
> +_scratch_unmount >/dev/null 2>&1
> +
> +# agi->freecount is 0 here, corrupt it to show extra free inodes
> +$XFS_DB_PROG -x -c "agi 0" -c "write freecount 10" $SCRATCH_DEV >> $seqres.full 2>&1

Minor nit, please use _scratch_xfs_db helper.

> +
> +_scratch_mount
> +
> +# Lock up a buggy kernel
> +touch $SCRATCH_MNT/lockupfile >> $seqres.full 2>&1

I applied your patch

Subject: [PATCH] Stop searching for free slots in an inode chunk when there are none

on top of 4.13-rc5 kernel, test passed with a non-debug built xfs, but
with debug built xfs, creating new inode on corrupted freecount xfs
crashes the kernel.

XFS: Assertion failed: freecount == be32_to_cpu(agi->agi_freecount), file: fs/xfs/libxfs/xfs_ialloc.c, line: 246

Perhaps test should _notrun with CONFIG_XFS_DEBUG on?

Thanks,
Eryu

> +
> +# If we reach this point, the filesystem is fixed
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/xfs/057.out b/tests/xfs/057.out
> new file mode 100644
> index 00000000..185023c7
> --- /dev/null
> +++ b/tests/xfs/057.out
> @@ -0,0 +1,2 @@
> +QA output created by 057
> +Silence is golden
> diff --git a/tests/xfs/group b/tests/xfs/group
> index cf876a29..37e4e641 100644
> --- a/tests/xfs/group
> +++ b/tests/xfs/group
> @@ -54,6 +54,7 @@
>  054 auto quick
>  055 dump ioctl remote tape
>  056 dump ioctl auto quick
> +057 auto quick fuzzers dangerous
>  059 dump ioctl auto quick
>  060 dump ioctl auto quick
>  061 dump ioctl auto quick
> -- 
> 2.13.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Carlos Maiolino Aug. 18, 2017, 7:35 a.m. UTC | #2
Hi Eryu,

> Thanks a lot for the updated version! It looks much better than v1 :)
> 
> Though I hit another issue while testing this patch, please see below.
> 
> > +
> > +# agi->freecount is 0 here, corrupt it to show extra free inodes
> > +$XFS_DB_PROG -x -c "agi 0" -c "write freecount 10" $SCRATCH_DEV >> $seqres.full 2>&1
> 
> Minor nit, please use _scratch_xfs_db helper.
> 

NP at all
> > +
> > +_scratch_mount
> > +
> > +# Lock up a buggy kernel
> > +touch $SCRATCH_MNT/lockupfile >> $seqres.full 2>&1
> 
> 
> XFS: Assertion failed: freecount == be32_to_cpu(agi->agi_freecount), file: fs/xfs/libxfs/xfs_ialloc.c, line: 246
> 
> Perhaps test should _notrun with CONFIG_XFS_DEBUG on?
> 

The Assert is expected.

We have debug code to catch the agi->agi_freecount corruption which will explode
in this assert, such code is debug only though because it will search the
whole btree and 'manually' count how many free slots are in the tree, then
compare with agi_freecount.

Although, I'm not sure if you notice, but even you hit this assert, you will
not be able to unmount the filesystem.

I'm not sure now if simply disable this test on XFS_DEBUG is the best approach,
hitting the bug in XFS_DEBUG is less catastrophic because it will not lockup,
but will make the FS unmountable.

I'd simply disable it in XFS_DEBUG, once the bug is exactly the same as with non
debug code, but with a different behavior. Not sure though if there are people
running xfstests exclusively on XFS_DEBUG.

Any thoughts?

Cheers

> Thanks,
> Eryu
> 
> > +
> > +# If we reach this point, the filesystem is fixed
> > +echo "Silence is golden"
> > +status=0
> > +exit
> > diff --git a/tests/xfs/057.out b/tests/xfs/057.out
> > new file mode 100644
> > index 00000000..185023c7
> > --- /dev/null
> > +++ b/tests/xfs/057.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 057
> > +Silence is golden
> > diff --git a/tests/xfs/group b/tests/xfs/group
> > index cf876a29..37e4e641 100644
> > --- a/tests/xfs/group
> > +++ b/tests/xfs/group
> > @@ -54,6 +54,7 @@
> >  054 auto quick
> >  055 dump ioctl remote tape
> >  056 dump ioctl auto quick
> > +057 auto quick fuzzers dangerous
> >  059 dump ioctl auto quick
> >  060 dump ioctl auto quick
> >  061 dump ioctl auto quick
> > -- 
> > 2.13.5
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eryu Guan Aug. 18, 2017, 7:55 a.m. UTC | #3
On Fri, Aug 18, 2017 at 09:35:25AM +0200, Carlos Maiolino wrote:
> Hi Eryu,
> 
> > Thanks a lot for the updated version! It looks much better than v1 :)
> > 
> > Though I hit another issue while testing this patch, please see below.
> > 
> > > +
> > > +# agi->freecount is 0 here, corrupt it to show extra free inodes
> > > +$XFS_DB_PROG -x -c "agi 0" -c "write freecount 10" $SCRATCH_DEV >> $seqres.full 2>&1
> > 
> > Minor nit, please use _scratch_xfs_db helper.
> > 
> 
> NP at all
> > > +
> > > +_scratch_mount
> > > +
> > > +# Lock up a buggy kernel
> > > +touch $SCRATCH_MNT/lockupfile >> $seqres.full 2>&1
> > 
> > 
> > XFS: Assertion failed: freecount == be32_to_cpu(agi->agi_freecount), file: fs/xfs/libxfs/xfs_ialloc.c, line: 246
> > 
> > Perhaps test should _notrun with CONFIG_XFS_DEBUG on?
> > 
> 
> The Assert is expected.
> 
> We have debug code to catch the agi->agi_freecount corruption which will explode
> in this assert, such code is debug only though because it will search the
> whole btree and 'manually' count how many free slots are in the tree, then
> compare with agi_freecount.

Yeah, I know it's expected, we turn on fatal assert in debug mode.

> 
> Although, I'm not sure if you notice, but even you hit this assert, you will
> not be able to unmount the filesystem.
> 
> I'm not sure now if simply disable this test on XFS_DEBUG is the best approach,
> hitting the bug in XFS_DEBUG is less catastrophic because it will not lockup,
> but will make the FS unmountable.

It crashed my host immediately, because I have
/proc/sys/kernel/panic_on_oops set to 1.

Anyway, an unmountable fs is as bad as a crash, they all prevent
subsequent tests from running. The underlying bug is fixed, we expect
test to pass, not crash the host or block further testings.

> 
> I'd simply disable it in XFS_DEBUG, once the bug is exactly the same as with non
> debug code, but with a different behavior. Not sure though if there are people
> running xfstests exclusively on XFS_DEBUG.

I believe running xfstests with XFS_DEBUG is pretty common.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Carlos Maiolino Aug. 18, 2017, 8:10 a.m. UTC | #4
Hi!

> 
> > 
> > Although, I'm not sure if you notice, but even you hit this assert, you will
> > not be able to unmount the filesystem.
> > 
> > I'm not sure now if simply disable this test on XFS_DEBUG is the best approach,
> > hitting the bug in XFS_DEBUG is less catastrophic because it will not lockup,
> > but will make the FS unmountable.
> 
> It crashed my host immediately, because I have
> /proc/sys/kernel/panic_on_oops set to 1.
> 
> Anyway, an unmountable fs is as bad as a crash, they all prevent
> subsequent tests from running. The underlying bug is fixed, we expect
> test to pass, not crash the host or block further testings.
> 

Do you have any suggestion about it? The test will purposely corrupt the FS,
which will certainly trigger this assert (it's a NO-OP without XFS_DEBUG btw),
I've already added it to dangerous group though.
> > 
> > I'd simply disable it in XFS_DEBUG, once the bug is exactly the same as with non
> > debug code, but with a different behavior. Not sure though if there are people
> > running xfstests exclusively on XFS_DEBUG.
> 
> I believe running xfstests with XFS_DEBUG is pretty common.
> 

The assert is triggered even before the fixed code can catch the corruption, so,
I think if you are not comfortable with it crashing the system, unless we remove
the assert from the code (which I believe might be done once the fix is merged),
I'de suggest not running this test with XFS_DEBUG.

cheers.

> Thanks,
> Eryu
Eryu Guan Aug. 18, 2017, 8:26 a.m. UTC | #5
On Fri, Aug 18, 2017 at 10:10:44AM +0200, Carlos Maiolino wrote:
> Hi!
> 
> > 
> > > 
> > > Although, I'm not sure if you notice, but even you hit this assert, you will
> > > not be able to unmount the filesystem.
> > > 
> > > I'm not sure now if simply disable this test on XFS_DEBUG is the best approach,
> > > hitting the bug in XFS_DEBUG is less catastrophic because it will not lockup,
> > > but will make the FS unmountable.
> > 
> > It crashed my host immediately, because I have
> > /proc/sys/kernel/panic_on_oops set to 1.
> > 
> > Anyway, an unmountable fs is as bad as a crash, they all prevent
> > subsequent tests from running. The underlying bug is fixed, we expect
> > test to pass, not crash the host or block further testings.
> > 
> 
> Do you have any suggestion about it? The test will purposely corrupt the FS,
> which will certainly trigger this assert (it's a NO-OP without XFS_DEBUG btw),
> I've already added it to dangerous group though.

I think adding a _require_no_xfs_debug helper in common/xfs and calling
it in the new test should work (please rename if you have a better
name), e.g.

_require_no_xfs_debug()
{
	if grep -q "debug 1" /proc/fs/xfs/stat; then
		_notrun "Require XFS built without CONFIG_XFS_DEBUG"
	fi
}

> > > 
> > > I'd simply disable it in XFS_DEBUG, once the bug is exactly the same as with non
> > > debug code, but with a different behavior. Not sure though if there are people
> > > running xfstests exclusively on XFS_DEBUG.
> > 
> > I believe running xfstests with XFS_DEBUG is pretty common.
> > 
> 
> The assert is triggered even before the fixed code can catch the corruption, so,
> I think if you are not comfortable with it crashing the system, unless we remove
> the assert from the code (which I believe might be done once the fix is merged),
> I'de suggest not running this test with XFS_DEBUG.

Yeah, _notrun if XFS_DEBUG is on :)

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Carlos Maiolino Aug. 18, 2017, 8:35 a.m. UTC | #6
> 
> I think adding a _require_no_xfs_debug helper in common/xfs and calling
> it in the new test should work (please rename if you have a better
> name), e.g.
> 
> _require_no_xfs_debug()
> {
> 	if grep -q "debug 1" /proc/fs/xfs/stat; then
> 		_notrun "Require XFS built without CONFIG_XFS_DEBUG"
> 	fi
> }
> 

sounds a pretty decent name for me.
> > > > 
> > > > I'd simply disable it in XFS_DEBUG, once the bug is exactly the same as with non
> > > > debug code, but with a different behavior. Not sure though if there are people
> > > > running xfstests exclusively on XFS_DEBUG.
> > > 
> > > I believe running xfstests with XFS_DEBUG is pretty common.
> > > 
> > 
> > The assert is triggered even before the fixed code can catch the corruption, so,
> > I think if you are not comfortable with it crashing the system, unless we remove
> > the assert from the code (which I believe might be done once the fix is merged),
> > I'de suggest not running this test with XFS_DEBUG.
> 
> Yeah, _notrun if XFS_DEBUG is on :)

Sure, I'll add the above helper as a patch before my test, thanks
> 
> Thanks,
> Eryu
diff mbox

Patch

diff --git a/tests/xfs/057 b/tests/xfs/057
new file mode 100755
index 00000000..9833ba25
--- /dev/null
+++ b/tests/xfs/057
@@ -0,0 +1,86 @@ 
+#! /bin/bash
+# FS QA Test 057
+#
+# Check if the filesystem will lockup when trying to allocate a new inode in
+# an AG with no free inodes but with a corrupted agi->freecount showing free inodes.
+#
+# At the end of the test, the scratch device will purposely be in a corrupted
+# state, so there is no need for checking that.
+#-----------------------------------------------------------------------
+# Copyright (c) 2017 Red Hat, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch_nocheck
+_disable_dmesg_check
+
+# Make sure we disable finobt if the filesystem supports it, otherwise, just
+# initialize it with default options.
+_scratch_mkfs | grep -q "finobt=1" && _scratch_mkfs -m "finobt=0" >/dev/null 2>&1
+
+# Get the amount of free inodes from the AGI 0, so we can fill up the freecount
+# structure.
+freecount=`_scratch_xfs_db -c "agi 0" -c "p freecount" | cut -d' ' -f 3`
+
+_scratch_mount
+
+# At the end of filesystem's initialization, AG 0 will have $freecount free
+# inodes in the agi->freecount, create $freecount extra dummy files to fill it.
+for i in `seq 1 $freecount`; do
+	touch $SCRATCH_MNT/dummy_file$i
+done
+
+_scratch_unmount >/dev/null 2>&1
+
+# agi->freecount is 0 here, corrupt it to show extra free inodes
+$XFS_DB_PROG -x -c "agi 0" -c "write freecount 10" $SCRATCH_DEV >> $seqres.full 2>&1
+
+_scratch_mount
+
+# Lock up a buggy kernel
+touch $SCRATCH_MNT/lockupfile >> $seqres.full 2>&1
+
+# If we reach this point, the filesystem is fixed
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/xfs/057.out b/tests/xfs/057.out
new file mode 100644
index 00000000..185023c7
--- /dev/null
+++ b/tests/xfs/057.out
@@ -0,0 +1,2 @@ 
+QA output created by 057
+Silence is golden
diff --git a/tests/xfs/group b/tests/xfs/group
index cf876a29..37e4e641 100644
--- a/tests/xfs/group
+++ b/tests/xfs/group
@@ -54,6 +54,7 @@ 
 054 auto quick
 055 dump ioctl remote tape
 056 dump ioctl auto quick
+057 auto quick fuzzers dangerous
 059 dump ioctl auto quick
 060 dump ioctl auto quick
 061 dump ioctl auto quick