Message ID | 1515737783-2061-2-git-send-email-yangx.jy@cn.fujitsu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote: > If log stripe unit isn't a multiple of the fs blocksize and mounting, > the invalid sb_logsunit leads to crash as soon as we try to write to > the log. > > Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com> > --- > tests/xfs/437 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > tests/xfs/437.out | 2 ++ > tests/xfs/group | 1 + > 3 files changed, 76 insertions(+) > create mode 100755 tests/xfs/437 > create mode 100644 tests/xfs/437.out > > diff --git a/tests/xfs/437 b/tests/xfs/437 > new file mode 100755 > index 0000000..f2b84ad > --- /dev/null > +++ b/tests/xfs/437 > @@ -0,0 +1,73 @@ > +#! /bin/bash > +# FS QA Test No. 437 > +# > +# Regression test for commit: > +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize") > +# > +# If log stripe unit isn't a multiple of the fs blocksize and mounting, > +# the invalid sb_logsunit leads to crash as soon as we try to write to > +# the log. > +# > +#----------------------------------------------------------------------- > +# Copyright (c) 2018 FUJITSU. All Rights Reserved. > +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com> > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#----------------------------------------------------------------------- > + > +seq=`basename "$0"` > +seqres="$RESULT_DIR/$seq" > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + rm -rf $tmp.* > +} > + > +# get standard environment and checks > +. ./common/rc > + > +# real QA test starts here > +_supported_os Linux > +_supported_fs xfs > +_require_scratch This test triggers ASSERT failure and warning on debug build, thus failed _dmesg_check, I think we need _disable_dmesg_check (and some comments) too. [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort! [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679 [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs] > + > +rm -f "$seqres.full" > + > +# Format > +_scratch_mkfs > $seqres.full 2>&1 || _fail "mkfs failed" > + > +# Set logsunit to a value which is not a multiple of the fs blocksize > +blksz=$(_scratch_xfs_get_sb_field blocksize) > +_scratch_xfs_set_sb_field logsunit $((blksz - 1)) >> $seqres.full 2>&1 \ > + || _notrun "failed to set sb_logsunit" It seems xfs_db doesn't return non-zero on failure, so _notrun won'be be triggered, you may want to read the logsuint value again to see if it's set correctly. > + > +# Mount and writing log may trigger a crash > +if _scratch_mount >> $seqres.full 2>&1; then Better to mention that with the fix applied kernel refuses to mount, so we know a mount failure is expected. > + for i in $(seq 1 1000); do > + touch ${SCRATCH_MNT}/$i echo > ${SCRATCH_MNT}/$i might be faster Thanks, Eryu > + done > + _scratch_unmount > +fi > + > +echo "Silence is golden" > + > +# success, all done > +status=0 > +exit > diff --git a/tests/xfs/437.out b/tests/xfs/437.out > new file mode 100644 > index 0000000..4dcb607 > --- /dev/null > +++ b/tests/xfs/437.out > @@ -0,0 +1,2 @@ > +QA output created by 437 > +Silence is golden > diff --git a/tests/xfs/group b/tests/xfs/group > index d230060..35d1b03 100644 > --- a/tests/xfs/group > +++ b/tests/xfs/group > @@ -434,3 +434,4 @@ > 434 auto quick clone fsr > 435 auto quick clone > 436 auto quick clone fsr > +437 auto quick log dangerous > -- > 1.8.3.1 > > > > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote: > On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote: > > If log stripe unit isn't a multiple of the fs blocksize and mounting, > > the invalid sb_logsunit leads to crash as soon as we try to write to > > the log. > > > > Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com> > > --- > > tests/xfs/437 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > tests/xfs/437.out | 2 ++ > > tests/xfs/group | 1 + > > 3 files changed, 76 insertions(+) > > create mode 100755 tests/xfs/437 > > create mode 100644 tests/xfs/437.out > > > > diff --git a/tests/xfs/437 b/tests/xfs/437 > > new file mode 100755 > > index 0000000..f2b84ad > > --- /dev/null > > +++ b/tests/xfs/437 > > @@ -0,0 +1,73 @@ > > +#! /bin/bash > > +# FS QA Test No. 437 > > +# > > +# Regression test for commit: > > +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize") > > +# > > +# If log stripe unit isn't a multiple of the fs blocksize and mounting, > > +# the invalid sb_logsunit leads to crash as soon as we try to write to > > +# the log. > > +# > > +#----------------------------------------------------------------------- > > +# Copyright (c) 2018 FUJITSU. All Rights Reserved. > > +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com> > > +# > > +# This program is free software; you can redistribute it and/or > > +# modify it under the terms of the GNU General Public License as > > +# published by the Free Software Foundation. > > +# > > +# This program is distributed in the hope that it would be useful, > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > +# GNU General Public License for more details. > > +# > > +# You should have received a copy of the GNU General Public License > > +# along with this program; if not, write the Free Software Foundation, > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > > +#----------------------------------------------------------------------- > > + > > +seq=`basename "$0"` > > +seqres="$RESULT_DIR/$seq" > > +echo "QA output created by $seq" > > + > > +here=`pwd` > > +tmp=/tmp/$$ > > +status=1 # failure is the default! > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > + > > +_cleanup() > > +{ > > + rm -rf $tmp.* > > +} > > + > > +# get standard environment and checks > > +. ./common/rc > > + > > +# real QA test starts here > > +_supported_os Linux > > +_supported_fs xfs > > +_require_scratch > > This test triggers ASSERT failure and warning on debug build, thus > failed _dmesg_check, I think we need _disable_dmesg_check (and some > comments) too. > > [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size > [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort! > [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679 > [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs] If it triggers debug asserts which have been put there to catch bugs in other utilities (like mkfs) on a current upstream debug kernel, then the test should not be part of the auto and quick groups. Cheers, Dave.
On Fri, Jan 12, 2018 at 07:36:05PM +1100, Dave Chinner wrote: > On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote: > > On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote: > > > If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > the invalid sb_logsunit leads to crash as soon as we try to write to > > > the log. > > > > > > Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com> > > > --- > > > tests/xfs/437 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > tests/xfs/437.out | 2 ++ > > > tests/xfs/group | 1 + > > > 3 files changed, 76 insertions(+) > > > create mode 100755 tests/xfs/437 > > > create mode 100644 tests/xfs/437.out > > > > > > diff --git a/tests/xfs/437 b/tests/xfs/437 > > > new file mode 100755 > > > index 0000000..f2b84ad > > > --- /dev/null > > > +++ b/tests/xfs/437 > > > @@ -0,0 +1,73 @@ > > > +#! /bin/bash > > > +# FS QA Test No. 437 > > > +# > > > +# Regression test for commit: > > > +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize") > > > +# > > > +# If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > +# the invalid sb_logsunit leads to crash as soon as we try to write to > > > +# the log. > > > +# > > > +#----------------------------------------------------------------------- > > > +# Copyright (c) 2018 FUJITSU. All Rights Reserved. > > > +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com> > > > +# > > > +# This program is free software; you can redistribute it and/or > > > +# modify it under the terms of the GNU General Public License as > > > +# published by the Free Software Foundation. > > > +# > > > +# This program is distributed in the hope that it would be useful, > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > > +# GNU General Public License for more details. > > > +# > > > +# You should have received a copy of the GNU General Public License > > > +# along with this program; if not, write the Free Software Foundation, > > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > > > +#----------------------------------------------------------------------- > > > + > > > +seq=`basename "$0"` > > > +seqres="$RESULT_DIR/$seq" > > > +echo "QA output created by $seq" > > > + > > > +here=`pwd` > > > +tmp=/tmp/$$ > > > +status=1 # failure is the default! > > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > > + > > > +_cleanup() > > > +{ > > > + rm -rf $tmp.* > > > +} > > > + > > > +# get standard environment and checks > > > +. ./common/rc > > > + > > > +# real QA test starts here > > > +_supported_os Linux > > > +_supported_fs xfs > > > +_require_scratch > > > > This test triggers ASSERT failure and warning on debug build, thus > > failed _dmesg_check, I think we need _disable_dmesg_check (and some > > comments) too. > > > > [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size > > [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort! > > [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679 > > [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs] > > If it triggers debug asserts which have been put there to catch bugs > in other utilities (like mkfs) on a current upstream debug kernel, > then the test should not be part of the auto and quick groups. This test corrupts the filesystem on purpose (I didn't make this clear in my last reply, sorry about that), and xfs detects the corruption & refuses the mount instead of crashing. So I think the ASSERT failure is expected on debug build and we just need to ignore it. And the test reproduces the crash quickly and reliably on buggy kernel and passes (if we ignore the ASSERT failure) on patched kernel, it serves as a good 'auto' and 'quick' regression test to me. Thanks, Eryu -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 12, 2018 at 04:50:05PM +0800, Eryu Guan wrote: > On Fri, Jan 12, 2018 at 07:36:05PM +1100, Dave Chinner wrote: > > On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote: > > > On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote: > > > > If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > the invalid sb_logsunit leads to crash as soon as we try to write to > > > > the log. > > > > > > > > Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com> > > > > --- > > > > tests/xfs/437 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > tests/xfs/437.out | 2 ++ > > > > tests/xfs/group | 1 + > > > > 3 files changed, 76 insertions(+) > > > > create mode 100755 tests/xfs/437 > > > > create mode 100644 tests/xfs/437.out > > > > > > > > diff --git a/tests/xfs/437 b/tests/xfs/437 > > > > new file mode 100755 > > > > index 0000000..f2b84ad > > > > --- /dev/null > > > > +++ b/tests/xfs/437 > > > > @@ -0,0 +1,73 @@ > > > > +#! /bin/bash > > > > +# FS QA Test No. 437 > > > > +# > > > > +# Regression test for commit: > > > > +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize") > > > > +# > > > > +# If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > +# the invalid sb_logsunit leads to crash as soon as we try to write to > > > > +# the log. > > > > +# > > > > +#----------------------------------------------------------------------- > > > > +# Copyright (c) 2018 FUJITSU. All Rights Reserved. > > > > +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com> > > > > +# > > > > +# This program is free software; you can redistribute it and/or > > > > +# modify it under the terms of the GNU General Public License as > > > > +# published by the Free Software Foundation. > > > > +# > > > > +# This program is distributed in the hope that it would be useful, > > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > > > +# GNU General Public License for more details. > > > > +# > > > > +# You should have received a copy of the GNU General Public License > > > > +# along with this program; if not, write the Free Software Foundation, > > > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > > > > +#----------------------------------------------------------------------- > > > > + > > > > +seq=`basename "$0"` > > > > +seqres="$RESULT_DIR/$seq" > > > > +echo "QA output created by $seq" > > > > + > > > > +here=`pwd` > > > > +tmp=/tmp/$$ > > > > +status=1 # failure is the default! > > > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > > > + > > > > +_cleanup() > > > > +{ > > > > + rm -rf $tmp.* > > > > +} > > > > + > > > > +# get standard environment and checks > > > > +. ./common/rc > > > > + > > > > +# real QA test starts here > > > > +_supported_os Linux > > > > +_supported_fs xfs > > > > +_require_scratch > > > > > > This test triggers ASSERT failure and warning on debug build, thus > > > failed _dmesg_check, I think we need _disable_dmesg_check (and some > > > comments) too. > > > > > > [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size > > > [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort! > > > [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679 > > > [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs] > > > > If it triggers debug asserts which have been put there to catch bugs > > in other utilities (like mkfs) on a current upstream debug kernel, > > then the test should not be part of the auto and quick groups. This particular assert triggers because we rejected the log geometry options thta we just read off the disk. I don't think this scenario is that much different from a block verifier rejecting a block (two printks and an assert(0) for a corrupt field seem a bit much to me but clearly someone wanted that to fail noisily), so I think it better just to filter this particular assertion. Frankly I think I'd just as soon get rid of the ASSERT since I've been moving the disk verifiers away from ASSERT->BUGing on garbage disk contents. > This test corrupts the filesystem on purpose (I didn't make this clear > in my last reply, sorry about that), and xfs detects the corruption & > refuses the mount instead of crashing. So I think the ASSERT failure is > expected on debug build and we just need to ignore it. > > And the test reproduces the crash quickly and reliably on buggy kernel > and passes (if we ignore the ASSERT failure) on patched kernel, it > serves as a good 'auto' and 'quick' regression test to me. It ought to be in 'fuzzers' too... --D > Thanks, > Eryu > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 12, 2018 at 04:50:05PM +0800, Eryu Guan wrote: > On Fri, Jan 12, 2018 at 07:36:05PM +1100, Dave Chinner wrote: > > On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote: > > > On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote: > > > > If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > the invalid sb_logsunit leads to crash as soon as we try to write to > > > > the log. > > > > > > > > Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com> > > > > --- > > > > tests/xfs/437 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > tests/xfs/437.out | 2 ++ > > > > tests/xfs/group | 1 + > > > > 3 files changed, 76 insertions(+) > > > > create mode 100755 tests/xfs/437 > > > > create mode 100644 tests/xfs/437.out > > > > > > > > diff --git a/tests/xfs/437 b/tests/xfs/437 > > > > new file mode 100755 > > > > index 0000000..f2b84ad > > > > --- /dev/null > > > > +++ b/tests/xfs/437 > > > > @@ -0,0 +1,73 @@ > > > > +#! /bin/bash > > > > +# FS QA Test No. 437 > > > > +# > > > > +# Regression test for commit: > > > > +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize") > > > > +# > > > > +# If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > +# the invalid sb_logsunit leads to crash as soon as we try to write to > > > > +# the log. > > > > +# > > > > +#----------------------------------------------------------------------- > > > > +# Copyright (c) 2018 FUJITSU. All Rights Reserved. > > > > +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com> > > > > +# > > > > +# This program is free software; you can redistribute it and/or > > > > +# modify it under the terms of the GNU General Public License as > > > > +# published by the Free Software Foundation. > > > > +# > > > > +# This program is distributed in the hope that it would be useful, > > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > > > +# GNU General Public License for more details. > > > > +# > > > > +# You should have received a copy of the GNU General Public License > > > > +# along with this program; if not, write the Free Software Foundation, > > > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > > > > +#----------------------------------------------------------------------- > > > > + > > > > +seq=`basename "$0"` > > > > +seqres="$RESULT_DIR/$seq" > > > > +echo "QA output created by $seq" > > > > + > > > > +here=`pwd` > > > > +tmp=/tmp/$$ > > > > +status=1 # failure is the default! > > > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > > > + > > > > +_cleanup() > > > > +{ > > > > + rm -rf $tmp.* > > > > +} > > > > + > > > > +# get standard environment and checks > > > > +. ./common/rc > > > > + > > > > +# real QA test starts here > > > > +_supported_os Linux > > > > +_supported_fs xfs > > > > +_require_scratch > > > > > > This test triggers ASSERT failure and warning on debug build, thus > > > failed _dmesg_check, I think we need _disable_dmesg_check (and some > > > comments) too. > > > > > > [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size > > > [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort! > > > [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679 > > > [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs] > > > > If it triggers debug asserts which have been put there to catch bugs > > in other utilities (like mkfs) on a current upstream debug kernel, > > then the test should not be part of the auto and quick groups. > > This test corrupts the filesystem on purpose (I didn't make this clear > in my last reply, sorry about that), and xfs detects the corruption & > refuses the mount instead of crashing. So I think the ASSERT failure is > expected on debug build and we just need to ignore it. Except that after that assert, the block device is now busy and can't be released, so we can't use the scratch device anymore. All tests after this assert has been triggered are going to fail because the block device is busy.... That's the whole point of adding debug asserts in cases like this - they are supposed to stop test execution in it's tracks and leave a corpse to analyse. The auto group regression tests are not supposed to take the machine down on normal test configs (i.e. CONFIG_XFS_DEBUG=y). > And the test reproduces the crash quickly and reliably on buggy kernel > and passes (if we ignore the ASSERT failure) on patched kernel, it > serves as a good 'auto' and 'quick' regression test to me. Sure, I'm not saying that it doesn't work as a regression test. I'm saying that it's a regression test that *should not be run by everyone*. It's classified as dangerous which means - by definition - it should not be part of the quick and auto groups.... If you want to test dangerous regression tests, then you need to *opt-in* by adding "-g dangerous", not force everyone else to opt-out by having to run "-g auto -x dangerous". Cheers, Dave.
On Sat, Jan 13, 2018 at 01:23:53PM +1100, Dave Chinner wrote: > On Fri, Jan 12, 2018 at 04:50:05PM +0800, Eryu Guan wrote: > > On Fri, Jan 12, 2018 at 07:36:05PM +1100, Dave Chinner wrote: > > > On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote: > > > > On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote: > > > > > If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > > the invalid sb_logsunit leads to crash as soon as we try to write to > > > > > the log. > > > > > > > > > > Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com> > > > > > --- > > > > > tests/xfs/437 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > tests/xfs/437.out | 2 ++ > > > > > tests/xfs/group | 1 + > > > > > 3 files changed, 76 insertions(+) > > > > > create mode 100755 tests/xfs/437 > > > > > create mode 100644 tests/xfs/437.out > > > > > > > > > > diff --git a/tests/xfs/437 b/tests/xfs/437 > > > > > new file mode 100755 > > > > > index 0000000..f2b84ad > > > > > --- /dev/null > > > > > +++ b/tests/xfs/437 > > > > > @@ -0,0 +1,73 @@ > > > > > +#! /bin/bash > > > > > +# FS QA Test No. 437 > > > > > +# > > > > > +# Regression test for commit: > > > > > +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize") > > > > > +# > > > > > +# If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > > +# the invalid sb_logsunit leads to crash as soon as we try to write to > > > > > +# the log. > > > > > +# > > > > > +#----------------------------------------------------------------------- > > > > > +# Copyright (c) 2018 FUJITSU. All Rights Reserved. > > > > > +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com> > > > > > +# > > > > > +# This program is free software; you can redistribute it and/or > > > > > +# modify it under the terms of the GNU General Public License as > > > > > +# published by the Free Software Foundation. > > > > > +# > > > > > +# This program is distributed in the hope that it would be useful, > > > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > > > > +# GNU General Public License for more details. > > > > > +# > > > > > +# You should have received a copy of the GNU General Public License > > > > > +# along with this program; if not, write the Free Software Foundation, > > > > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > > > > > +#----------------------------------------------------------------------- > > > > > + > > > > > +seq=`basename "$0"` > > > > > +seqres="$RESULT_DIR/$seq" > > > > > +echo "QA output created by $seq" > > > > > + > > > > > +here=`pwd` > > > > > +tmp=/tmp/$$ > > > > > +status=1 # failure is the default! > > > > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > > > > + > > > > > +_cleanup() > > > > > +{ > > > > > + rm -rf $tmp.* > > > > > +} > > > > > + > > > > > +# get standard environment and checks > > > > > +. ./common/rc > > > > > + > > > > > +# real QA test starts here > > > > > +_supported_os Linux > > > > > +_supported_fs xfs > > > > > +_require_scratch > > > > > > > > This test triggers ASSERT failure and warning on debug build, thus > > > > failed _dmesg_check, I think we need _disable_dmesg_check (and some > > > > comments) too. > > > > > > > > [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size > > > > [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort! > > > > [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679 > > > > [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs] > > > > > > If it triggers debug asserts which have been put there to catch bugs > > > in other utilities (like mkfs) on a current upstream debug kernel, > > > then the test should not be part of the auto and quick groups. > > > > This test corrupts the filesystem on purpose (I didn't make this clear > > in my last reply, sorry about that), and xfs detects the corruption & > > refuses the mount instead of crashing. So I think the ASSERT failure is > > expected on debug build and we just need to ignore it. > > Except that after that assert, the block device is now busy and can't > be released, so we can't use the scratch device anymore. All tests > after this assert has been triggered are going to fail because > the block device is busy.... > > That's the whole point of adding debug asserts in cases like this - > they are supposed to stop test execution in it's tracks and leave a > corpse to analyse. The auto group regression tests are not supposed > to take the machine down on normal test configs (i.e. > CONFIG_XFS_DEBUG=y). You're right, my test kernel was built with XFS_WARN not XFS_DEBUG, I missed that the test would crash kernel even on patched kernel on XFS_DEBUG builds. Thanks for pointing this out! Eryu -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Jan 13, 2018 at 01:23:53PM +1100, Dave Chinner wrote: > On Fri, Jan 12, 2018 at 04:50:05PM +0800, Eryu Guan wrote: > > On Fri, Jan 12, 2018 at 07:36:05PM +1100, Dave Chinner wrote: > > > On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote: > > > > On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote: > > > > > If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > > the invalid sb_logsunit leads to crash as soon as we try to write to > > > > > the log. > > > > > > > > > > Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com> > > > > > --- > > > > > tests/xfs/437 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > tests/xfs/437.out | 2 ++ > > > > > tests/xfs/group | 1 + > > > > > 3 files changed, 76 insertions(+) > > > > > create mode 100755 tests/xfs/437 > > > > > create mode 100644 tests/xfs/437.out > > > > > > > > > > diff --git a/tests/xfs/437 b/tests/xfs/437 > > > > > new file mode 100755 > > > > > index 0000000..f2b84ad > > > > > --- /dev/null > > > > > +++ b/tests/xfs/437 > > > > > @@ -0,0 +1,73 @@ > > > > > +#! /bin/bash > > > > > +# FS QA Test No. 437 > > > > > +# > > > > > +# Regression test for commit: > > > > > +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize") > > > > > +# > > > > > +# If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > > +# the invalid sb_logsunit leads to crash as soon as we try to write to > > > > > +# the log. > > > > > +# > > > > > +#----------------------------------------------------------------------- > > > > > +# Copyright (c) 2018 FUJITSU. All Rights Reserved. > > > > > +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com> > > > > > +# > > > > > +# This program is free software; you can redistribute it and/or > > > > > +# modify it under the terms of the GNU General Public License as > > > > > +# published by the Free Software Foundation. > > > > > +# > > > > > +# This program is distributed in the hope that it would be useful, > > > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > > > > +# GNU General Public License for more details. > > > > > +# > > > > > +# You should have received a copy of the GNU General Public License > > > > > +# along with this program; if not, write the Free Software Foundation, > > > > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > > > > > +#----------------------------------------------------------------------- > > > > > + > > > > > +seq=`basename "$0"` > > > > > +seqres="$RESULT_DIR/$seq" > > > > > +echo "QA output created by $seq" > > > > > + > > > > > +here=`pwd` > > > > > +tmp=/tmp/$$ > > > > > +status=1 # failure is the default! > > > > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > > > > + > > > > > +_cleanup() > > > > > +{ > > > > > + rm -rf $tmp.* > > > > > +} > > > > > + > > > > > +# get standard environment and checks > > > > > +. ./common/rc > > > > > + > > > > > +# real QA test starts here > > > > > +_supported_os Linux > > > > > +_supported_fs xfs > > > > > +_require_scratch > > > > > > > > This test triggers ASSERT failure and warning on debug build, thus > > > > failed _dmesg_check, I think we need _disable_dmesg_check (and some > > > > comments) too. > > > > > > > > [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size > > > > [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort! > > > > [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679 > > > > [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs] > > > > > > If it triggers debug asserts which have been put there to catch bugs > > > in other utilities (like mkfs) on a current upstream debug kernel, > > > then the test should not be part of the auto and quick groups. > > > > This test corrupts the filesystem on purpose (I didn't make this clear > > in my last reply, sorry about that), and xfs detects the corruption & > > refuses the mount instead of crashing. So I think the ASSERT failure is > > expected on debug build and we just need to ignore it. > > Except that after that assert, the block device is now busy and can't > be released, so we can't use the scratch device anymore. All tests > after this assert has been triggered are going to fail because > the block device is busy.... > > That's the whole point of adding debug asserts in cases like this - > they are supposed to stop test execution in it's tracks and leave a > corpse to analyse. The auto group regression tests are not supposed > to take the machine down on normal test configs (i.e. > CONFIG_XFS_DEBUG=y). > FWIW, my understanding of the dangerous group has always been that it's for tests that when they trigger a regression, forcibly affect the entire system as such (lockup, hang, crash, etc.). IMO, a test that intentionally generates an XFS assert doesn't really follow the spirit of that categorization, even though technically an assert can cause a BUG. This is why we have CONFIG_XFS_ASSERT_FATAL now, because unless that is disabled any auto test that reproduces a regression (assuming we have broad/robust assert coverage) can very easily BUG() and have the associated effect on the system. The flipside of course is that one should generally be able to run xfstests against a FATAL_ASSERTS=y kernel that otherwise has no regressions and not bork the system because a test generates a BUG() as part of a successful run. Perhaps the root problem here is that the kernel generates an assert from a codepath that "successfully" handles the erroneous situation. We already spit out several error messages and fail the mount, so my .02 (fwiw) would be to leave the test as auto+dangerous, delete the useless assert and let people affected by older kernels (w/ FATAL_ASSERT behavior) filter out the test as appropriate in those environments. (Looking back, I see Darrick already suggested to delete the assert as well...). > > And the test reproduces the crash quickly and reliably on buggy kernel > > and passes (if we ignore the ASSERT failure) on patched kernel, it > > serves as a good 'auto' and 'quick' regression test to me. > > Sure, I'm not saying that it doesn't work as a regression test. I'm > saying that it's a regression test that *should not be run by > everyone*. It's classified as dangerous which means - by definition > - it should not be part of the quick and auto groups.... > > If you want to test dangerous regression tests, then you need to > *opt-in* by adding "-g dangerous", not force everyone else to > opt-out by having to run "-g auto -x dangerous". > I often explicitly filter out the dangerous group as above because we already have more than a handful of tests that are in both groups. I don't think they've historically been mutually exclusive. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 15, 2018 at 07:45:23AM -0500, Brian Foster wrote: > On Sat, Jan 13, 2018 at 01:23:53PM +1100, Dave Chinner wrote: > > On Fri, Jan 12, 2018 at 04:50:05PM +0800, Eryu Guan wrote: > > > On Fri, Jan 12, 2018 at 07:36:05PM +1100, Dave Chinner wrote: > > > > On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote: > > > > > On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote: > > > > > > If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > > > the invalid sb_logsunit leads to crash as soon as we try to write to > > > > > > the log. > > > > > > > > > > > > Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com> > > > > > > --- > > > > > > tests/xfs/437 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > tests/xfs/437.out | 2 ++ > > > > > > tests/xfs/group | 1 + > > > > > > 3 files changed, 76 insertions(+) > > > > > > create mode 100755 tests/xfs/437 > > > > > > create mode 100644 tests/xfs/437.out > > > > > > > > > > > > diff --git a/tests/xfs/437 b/tests/xfs/437 > > > > > > new file mode 100755 > > > > > > index 0000000..f2b84ad > > > > > > --- /dev/null > > > > > > +++ b/tests/xfs/437 > > > > > > @@ -0,0 +1,73 @@ > > > > > > +#! /bin/bash > > > > > > +# FS QA Test No. 437 > > > > > > +# > > > > > > +# Regression test for commit: > > > > > > +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize") > > > > > > +# > > > > > > +# If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > > > +# the invalid sb_logsunit leads to crash as soon as we try to write to > > > > > > +# the log. > > > > > > +# > > > > > > +#----------------------------------------------------------------------- > > > > > > +# Copyright (c) 2018 FUJITSU. All Rights Reserved. > > > > > > +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com> > > > > > > +# > > > > > > +# This program is free software; you can redistribute it and/or > > > > > > +# modify it under the terms of the GNU General Public License as > > > > > > +# published by the Free Software Foundation. > > > > > > +# > > > > > > +# This program is distributed in the hope that it would be useful, > > > > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > > > > > +# GNU General Public License for more details. > > > > > > +# > > > > > > +# You should have received a copy of the GNU General Public License > > > > > > +# along with this program; if not, write the Free Software Foundation, > > > > > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > > > > > > +#----------------------------------------------------------------------- > > > > > > + > > > > > > +seq=`basename "$0"` > > > > > > +seqres="$RESULT_DIR/$seq" > > > > > > +echo "QA output created by $seq" > > > > > > + > > > > > > +here=`pwd` > > > > > > +tmp=/tmp/$$ > > > > > > +status=1 # failure is the default! > > > > > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > > > > > + > > > > > > +_cleanup() > > > > > > +{ > > > > > > + rm -rf $tmp.* > > > > > > +} > > > > > > + > > > > > > +# get standard environment and checks > > > > > > +. ./common/rc > > > > > > + > > > > > > +# real QA test starts here > > > > > > +_supported_os Linux > > > > > > +_supported_fs xfs > > > > > > +_require_scratch > > > > > > > > > > This test triggers ASSERT failure and warning on debug build, thus > > > > > failed _dmesg_check, I think we need _disable_dmesg_check (and some > > > > > comments) too. > > > > > > > > > > [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size > > > > > [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort! > > > > > [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679 > > > > > [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs] > > > > > > > > If it triggers debug asserts which have been put there to catch bugs > > > > in other utilities (like mkfs) on a current upstream debug kernel, > > > > then the test should not be part of the auto and quick groups. > > > > > > This test corrupts the filesystem on purpose (I didn't make this clear > > > in my last reply, sorry about that), and xfs detects the corruption & > > > refuses the mount instead of crashing. So I think the ASSERT failure is > > > expected on debug build and we just need to ignore it. > > > > Except that after that assert, the block device is now busy and can't > > be released, so we can't use the scratch device anymore. All tests > > after this assert has been triggered are going to fail because > > the block device is busy.... > > > > That's the whole point of adding debug asserts in cases like this - > > they are supposed to stop test execution in it's tracks and leave a > > corpse to analyse. The auto group regression tests are not supposed > > to take the machine down on normal test configs (i.e. > > CONFIG_XFS_DEBUG=y). > > > > FWIW, my understanding of the dangerous group has always been that it's > for tests that when they trigger a regression, forcibly affect the > entire system as such (lockup, hang, crash, etc.). IMO, a test that > intentionally generates an XFS assert doesn't really follow the spirit > of that categorization, even though technically an assert can cause a > BUG. This is why we have CONFIG_XFS_ASSERT_FATAL now, because unless > that is disabled any auto test that reproduces a regression (assuming we > have broad/robust assert coverage) can very easily BUG() and have the > associated effect on the system. And this is exactly why I opposed the CONFIG_XFS_ASSERT_FATAL inclusion. It started us down the slippery slope of allowing people to ignore asserts that indicate an impossible condition had occurred. This was a bug found via fuzzing, not user problems in the wild. The assert had functioned perfectly well up to this point as a mechanism for preventing the XFS tools from creating a corrupt log stripe unit, and it still does.... That's the point here - a corrupt log stripe unit *should never happen* because it should trigger a CRC validation failure long before we get to parsing it. If we've got a corrupt value with a correct CRC, then we damn well need to understand the cause *immediately* because something else has gone very, very wrong. > The flipside of course is that one should generally be able to run > xfstests against a FATAL_ASSERTS=y kernel that otherwise has no > regressions and not bork the system because a test generates a BUG() as > part of a successful run. That is the historical behaviour of CONFIG_XFS_DEBUG=y and the historical policy for xfstests. Adding CONFIG_XFS_ASSERT_FATAL does not change that because (at least) some of use still treat asserts as fatal. > Perhaps the root problem here is that the > kernel generates an assert from a codepath that "successfully" handles > the erroneous situation. We do that all over the place - the assert is there because it indicates a fatal problem has occurred that should never happen. For production systems, we still have to handle that gracefully because people do fuzz testing of "should never happen" conditions and then get upset we fail to detect this. That does not mean the ASSERT is wrong - it makes it even more important that developers know when their code hits these cases.... If we really want to test these "should not ever happen" conditions that trigger asserts on debug kernels as "everyone always runs" regression testing, then these need to _notrun on CONFIG_XFS_DEBUG=y kernels. fstests already knows that it's running on a debug kernel, so adding a _requires_production_kernel check may be the way around this being considered a dangerous test. > We already spit out several error messages and > fail the mount, so my .02 (fwiw) would be to leave the test as > auto+dangerous, delete the useless assert and let people affected by > older kernels (w/ FATAL_ASSERT behavior) filter out the test as > appropriate in those environments. (Looking back, I see Darrick already > suggested to delete the assert as well...). ... and this specific assert caught multiple bugs I introduced while refactoring mkfs recently. That's it's purpose, and it works for that purpose very well. > > If you want to test dangerous regression tests, then you need to > > *opt-in* by adding "-g dangerous", not force everyone else to > > opt-out by having to run "-g auto -x dangerous". > > > > I often explicitly filter out the dangerous group as above because we > already have more than a handful of tests that are in both groups. I > don't think they've historically been mutually exclusive. Which means we've already deviated from the historic policy. That doesn't mean it the right direction to take. Cheers, Dave.
On Tue, Jan 16, 2018 at 08:03:52AM +1100, Dave Chinner wrote: > On Mon, Jan 15, 2018 at 07:45:23AM -0500, Brian Foster wrote: > > On Sat, Jan 13, 2018 at 01:23:53PM +1100, Dave Chinner wrote: > > > On Fri, Jan 12, 2018 at 04:50:05PM +0800, Eryu Guan wrote: > > > > On Fri, Jan 12, 2018 at 07:36:05PM +1100, Dave Chinner wrote: > > > > > On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote: > > > > > > On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote: > > > > > > > If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > > > > the invalid sb_logsunit leads to crash as soon as we try to write to > > > > > > > the log. > > > > > > > > > > > > > > Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com> > > > > > > > --- > > > > > > > tests/xfs/437 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > > tests/xfs/437.out | 2 ++ > > > > > > > tests/xfs/group | 1 + > > > > > > > 3 files changed, 76 insertions(+) > > > > > > > create mode 100755 tests/xfs/437 > > > > > > > create mode 100644 tests/xfs/437.out > > > > > > > > > > > > > > diff --git a/tests/xfs/437 b/tests/xfs/437 > > > > > > > new file mode 100755 > > > > > > > index 0000000..f2b84ad > > > > > > > --- /dev/null > > > > > > > +++ b/tests/xfs/437 > > > > > > > @@ -0,0 +1,73 @@ > > > > > > > +#! /bin/bash > > > > > > > +# FS QA Test No. 437 > > > > > > > +# > > > > > > > +# Regression test for commit: > > > > > > > +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize") > > > > > > > +# > > > > > > > +# If log stripe unit isn't a multiple of the fs blocksize and mounting, > > > > > > > +# the invalid sb_logsunit leads to crash as soon as we try to write to > > > > > > > +# the log. > > > > > > > +# > > > > > > > +#----------------------------------------------------------------------- > > > > > > > +# Copyright (c) 2018 FUJITSU. All Rights Reserved. > > > > > > > +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com> > > > > > > > +# > > > > > > > +# This program is free software; you can redistribute it and/or > > > > > > > +# modify it under the terms of the GNU General Public License as > > > > > > > +# published by the Free Software Foundation. > > > > > > > +# > > > > > > > +# This program is distributed in the hope that it would be useful, > > > > > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > > > > > > +# GNU General Public License for more details. > > > > > > > +# > > > > > > > +# You should have received a copy of the GNU General Public License > > > > > > > +# along with this program; if not, write the Free Software Foundation, > > > > > > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > > > > > > > +#----------------------------------------------------------------------- > > > > > > > + > > > > > > > +seq=`basename "$0"` > > > > > > > +seqres="$RESULT_DIR/$seq" > > > > > > > +echo "QA output created by $seq" > > > > > > > + > > > > > > > +here=`pwd` > > > > > > > +tmp=/tmp/$$ > > > > > > > +status=1 # failure is the default! > > > > > > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > > > > > > + > > > > > > > +_cleanup() > > > > > > > +{ > > > > > > > + rm -rf $tmp.* > > > > > > > +} > > > > > > > + > > > > > > > +# get standard environment and checks > > > > > > > +. ./common/rc > > > > > > > + > > > > > > > +# real QA test starts here > > > > > > > +_supported_os Linux > > > > > > > +_supported_fs xfs > > > > > > > +_require_scratch > > > > > > > > > > > > This test triggers ASSERT failure and warning on debug build, thus > > > > > > failed _dmesg_check, I think we need _disable_dmesg_check (and some > > > > > > comments) too. > > > > > > > > > > > > [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size > > > > > > [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort! > > > > > > [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679 > > > > > > [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs] > > > > > > > > > > If it triggers debug asserts which have been put there to catch bugs > > > > > in other utilities (like mkfs) on a current upstream debug kernel, > > > > > then the test should not be part of the auto and quick groups. > > > > > > > > This test corrupts the filesystem on purpose (I didn't make this clear > > > > in my last reply, sorry about that), and xfs detects the corruption & > > > > refuses the mount instead of crashing. So I think the ASSERT failure is > > > > expected on debug build and we just need to ignore it. > > > > > > Except that after that assert, the block device is now busy and can't > > > be released, so we can't use the scratch device anymore. All tests > > > after this assert has been triggered are going to fail because > > > the block device is busy.... > > > > > > That's the whole point of adding debug asserts in cases like this - > > > they are supposed to stop test execution in it's tracks and leave a > > > corpse to analyse. The auto group regression tests are not supposed > > > to take the machine down on normal test configs (i.e. > > > CONFIG_XFS_DEBUG=y). > > > > > > > FWIW, my understanding of the dangerous group has always been that it's > > for tests that when they trigger a regression, forcibly affect the > > entire system as such (lockup, hang, crash, etc.). IMO, a test that I had the same understanding of dangerous group. And I recommended the usage of "-g auto -x dangerous" before[1], and Dave acknowledged this dangerous group usage[2] :) [1] https://www.spinics.net/lists/linux-btrfs/msg57312.html [2] https://www.spinics.net/lists/linux-btrfs/msg57330.html > > intentionally generates an XFS assert doesn't really follow the spirit > > of that categorization, even though technically an assert can cause a > > BUG. This is why we have CONFIG_XFS_ASSERT_FATAL now, because unless > > that is disabled any auto test that reproduces a regression (assuming we > > have broad/robust assert coverage) can very easily BUG() and have the > > associated effect on the system. > > And this is exactly why I opposed the CONFIG_XFS_ASSERT_FATAL > inclusion. It started us down the slippery slope of allowing people > to ignore asserts that indicate an impossible condition had > occurred. > > This was a bug found via fuzzing, not user problems in the wild. The > assert had functioned perfectly well up to this point as a mechanism > for preventing the XFS tools from creating a corrupt log stripe > unit, and it still does.... > > That's the point here - a corrupt log stripe unit *should never > happen* because it should trigger a CRC validation failure long > before we get to parsing it. If we've got a corrupt value with a > correct CRC, then we damn well need to understand the cause > *immediately* because something else has gone very, very wrong. > > > The flipside of course is that one should generally be able to run > > xfstests against a FATAL_ASSERTS=y kernel that otherwise has no > > regressions and not bork the system because a test generates a BUG() as > > part of a successful run. > > That is the historical behaviour of CONFIG_XFS_DEBUG=y and the > historical policy for xfstests. Adding CONFIG_XFS_ASSERT_FATAL does > not change that because (at least) some of use still treat asserts > as fatal. > > > Perhaps the root problem here is that the > > kernel generates an assert from a codepath that "successfully" handles > > the erroneous situation. > > We do that all over the place - the assert is there because it > indicates a fatal problem has occurred that should never happen. > For production systems, we still have to handle that gracefully > because people do fuzz testing of "should never happen" conditions > and then get upset we fail to detect this. That does not mean > the ASSERT is wrong - it makes it even more important that > developers know when their code hits these cases.... > > If we really want to test these "should not ever happen" conditions > that trigger asserts on debug kernels as "everyone always runs" > regression testing, then these need to _notrun on CONFIG_XFS_DEBUG=y > kernels. fstests already knows that it's running on a debug kernel, > so adding a _requires_production_kernel check may be the way around > this being considered a dangerous test. This reminds that we already have a _require_no_xfs_debug rule, as used in xfs/115, which is known to trigger ASSERT failure on debug build. So we can do the same in this new test. Thanks, Eryu > > > We already spit out several error messages and > > fail the mount, so my .02 (fwiw) would be to leave the test as > > auto+dangerous, delete the useless assert and let people affected by > > older kernels (w/ FATAL_ASSERT behavior) filter out the test as > > appropriate in those environments. (Looking back, I see Darrick already > > suggested to delete the assert as well...). > > ... and this specific assert caught multiple bugs I introduced while > refactoring mkfs recently. That's it's purpose, and it works for > that purpose very well. > > > > If you want to test dangerous regression tests, then you need to > > > *opt-in* by adding "-g dangerous", not force everyone else to > > > opt-out by having to run "-g auto -x dangerous". > > > > > > > I often explicitly filter out the dangerous group as above because we > > already have more than a handful of tests that are in both groups. I > > don't think they've historically been mutually exclusive. > > Which means we've already deviated from the historic policy. That > doesn't mean it the right direction to take. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2018/01/16 12:02, Eryu Guan wrote: > On Tue, Jan 16, 2018 at 08:03:52AM +1100, Dave Chinner wrote: >> On Mon, Jan 15, 2018 at 07:45:23AM -0500, Brian Foster wrote: >>> On Sat, Jan 13, 2018 at 01:23:53PM +1100, Dave Chinner wrote: >>>> On Fri, Jan 12, 2018 at 04:50:05PM +0800, Eryu Guan wrote: >>>>> On Fri, Jan 12, 2018 at 07:36:05PM +1100, Dave Chinner wrote: >>>>>> On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote: >>>>>>> On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote: >>>>>>>> If log stripe unit isn't a multiple of the fs blocksize and mounting, >>>>>>>> the invalid sb_logsunit leads to crash as soon as we try to write to >>>>>>>> the log. >>>>>>>> >>>>>>>> Signed-off-by: xiao yang<yangx.jy@cn.fujitsu.com> >>>>>>>> --- >>>>>>>> tests/xfs/437 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>>>>> tests/xfs/437.out | 2 ++ >>>>>>>> tests/xfs/group | 1 + >>>>>>>> 3 files changed, 76 insertions(+) >>>>>>>> create mode 100755 tests/xfs/437 >>>>>>>> create mode 100644 tests/xfs/437.out >>>>>>>> >>>>>>>> diff --git a/tests/xfs/437 b/tests/xfs/437 >>>>>>>> new file mode 100755 >>>>>>>> index 0000000..f2b84ad >>>>>>>> --- /dev/null >>>>>>>> +++ b/tests/xfs/437 >>>>>>>> @@ -0,0 +1,73 @@ >>>>>>>> +#! /bin/bash >>>>>>>> +# FS QA Test No. 437 >>>>>>>> +# >>>>>>>> +# Regression test for commit: >>>>>>>> +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize") >>>>>>>> +# >>>>>>>> +# If log stripe unit isn't a multiple of the fs blocksize and mounting, >>>>>>>> +# the invalid sb_logsunit leads to crash as soon as we try to write to >>>>>>>> +# the log. >>>>>>>> +# >>>>>>>> +#----------------------------------------------------------------------- >>>>>>>> +# Copyright (c) 2018 FUJITSU. All Rights Reserved. >>>>>>>> +# Author: Xiao Yang<yangx.jy@cn.fujitsu.com> >>>>>>>> +# >>>>>>>> +# This program is free software; you can redistribute it and/or >>>>>>>> +# modify it under the terms of the GNU General Public License as >>>>>>>> +# published by the Free Software Foundation. >>>>>>>> +# >>>>>>>> +# This program is distributed in the hope that it would be useful, >>>>>>>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of >>>>>>>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >>>>>>>> +# GNU General Public License for more details. >>>>>>>> +# >>>>>>>> +# You should have received a copy of the GNU General Public License >>>>>>>> +# along with this program; if not, write the Free Software Foundation, >>>>>>>> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA >>>>>>>> +#----------------------------------------------------------------------- >>>>>>>> + >>>>>>>> +seq=`basename "$0"` >>>>>>>> +seqres="$RESULT_DIR/$seq" >>>>>>>> +echo "QA output created by $seq" >>>>>>>> + >>>>>>>> +here=`pwd` >>>>>>>> +tmp=/tmp/$$ >>>>>>>> +status=1 # failure is the default! >>>>>>>> +trap "_cleanup; exit \$status" 0 1 2 3 15 >>>>>>>> + >>>>>>>> +_cleanup() >>>>>>>> +{ >>>>>>>> + rm -rf $tmp.* >>>>>>>> +} >>>>>>>> + >>>>>>>> +# get standard environment and checks >>>>>>>> +. ./common/rc >>>>>>>> + >>>>>>>> +# real QA test starts here >>>>>>>> +_supported_os Linux >>>>>>>> +_supported_fs xfs >>>>>>>> +_require_scratch >>>>>>> This test triggers ASSERT failure and warning on debug build, thus >>>>>>> failed _dmesg_check, I think we need _disable_dmesg_check (and some >>>>>>> comments) too. >>>>>>> >>>>>>> [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size >>>>>>> [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort! >>>>>>> [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679 >>>>>>> [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs] >>>>>> If it triggers debug asserts which have been put there to catch bugs >>>>>> in other utilities (like mkfs) on a current upstream debug kernel, >>>>>> then the test should not be part of the auto and quick groups. >>>>> This test corrupts the filesystem on purpose (I didn't make this clear >>>>> in my last reply, sorry about that), and xfs detects the corruption& >>>>> refuses the mount instead of crashing. So I think the ASSERT failure is >>>>> expected on debug build and we just need to ignore it. >>>> Except that after that assert, the block device is now busy and can't >>>> be released, so we can't use the scratch device anymore. All tests >>>> after this assert has been triggered are going to fail because >>>> the block device is busy.... >>>> >>>> That's the whole point of adding debug asserts in cases like this - >>>> they are supposed to stop test execution in it's tracks and leave a >>>> corpse to analyse. The auto group regression tests are not supposed >>>> to take the machine down on normal test configs (i.e. >>>> CONFIG_XFS_DEBUG=y). >>>> >>> FWIW, my understanding of the dangerous group has always been that it's >>> for tests that when they trigger a regression, forcibly affect the >>> entire system as such (lockup, hang, crash, etc.). IMO, a test that > I had the same understanding of dangerous group. And I recommended the > usage of "-g auto -x dangerous" before[1], and Dave acknowledged this > dangerous group usage[2] :) > > [1] https://www.spinics.net/lists/linux-btrfs/msg57312.html > [2] https://www.spinics.net/lists/linux-btrfs/msg57330.html > >>> intentionally generates an XFS assert doesn't really follow the spirit >>> of that categorization, even though technically an assert can cause a >>> BUG. This is why we have CONFIG_XFS_ASSERT_FATAL now, because unless >>> that is disabled any auto test that reproduces a regression (assuming we >>> have broad/robust assert coverage) can very easily BUG() and have the >>> associated effect on the system. >> And this is exactly why I opposed the CONFIG_XFS_ASSERT_FATAL >> inclusion. It started us down the slippery slope of allowing people >> to ignore asserts that indicate an impossible condition had >> occurred. >> >> This was a bug found via fuzzing, not user problems in the wild. The >> assert had functioned perfectly well up to this point as a mechanism >> for preventing the XFS tools from creating a corrupt log stripe >> unit, and it still does.... >> >> That's the point here - a corrupt log stripe unit *should never >> happen* because it should trigger a CRC validation failure long >> before we get to parsing it. If we've got a corrupt value with a >> correct CRC, then we damn well need to understand the cause >> *immediately* because something else has gone very, very wrong. >> >>> The flipside of course is that one should generally be able to run >>> xfstests against a FATAL_ASSERTS=y kernel that otherwise has no >>> regressions and not bork the system because a test generates a BUG() as >>> part of a successful run. >> That is the historical behaviour of CONFIG_XFS_DEBUG=y and the >> historical policy for xfstests. Adding CONFIG_XFS_ASSERT_FATAL does >> not change that because (at least) some of use still treat asserts >> as fatal. >> >>> Perhaps the root problem here is that the >>> kernel generates an assert from a codepath that "successfully" handles >>> the erroneous situation. >> We do that all over the place - the assert is there because it >> indicates a fatal problem has occurred that should never happen. >> For production systems, we still have to handle that gracefully >> because people do fuzz testing of "should never happen" conditions >> and then get upset we fail to detect this. That does not mean >> the ASSERT is wrong - it makes it even more important that >> developers know when their code hits these cases.... >> >> If we really want to test these "should not ever happen" conditions >> that trigger asserts on debug kernels as "everyone always runs" >> regression testing, then these need to _notrun on CONFIG_XFS_DEBUG=y >> kernels. fstests already knows that it's running on a debug kernel, >> so adding a _requires_production_kernel check may be the way around >> this being considered a dangerous test. > This reminds that we already have a _require_no_xfs_debug rule, as used > in xfs/115, which is known to trigger ASSERT failure on debug build. So > we can do the same in this new test. Agreed. This new test can be skipped by _require_no_xfs_debug rule on debug build, and is considered as a 'auto' and 'quick' regression test, so i wil send v4 patch in this way. Thanks, Xiao Yang > Thanks, > Eryu > >>> We already spit out several error messages and >>> fail the mount, so my .02 (fwiw) would be to leave the test as >>> auto+dangerous, delete the useless assert and let people affected by >>> older kernels (w/ FATAL_ASSERT behavior) filter out the test as >>> appropriate in those environments. (Looking back, I see Darrick already >>> suggested to delete the assert as well...). >> ... and this specific assert caught multiple bugs I introduced while >> refactoring mkfs recently. That's it's purpose, and it works for >> that purpose very well. >> >>>> If you want to test dangerous regression tests, then you need to >>>> *opt-in* by adding "-g dangerous", not force everyone else to >>>> opt-out by having to run "-g auto -x dangerous". >>>> >>> I often explicitly filter out the dangerous group as above because we >>> already have more than a handful of tests that are in both groups. I >>> don't think they've historically been mutually exclusive. >> Which means we've already deviated from the historic policy. That >> doesn't mean it the right direction to take. >> >> Cheers, >> >> Dave. >> -- >> Dave Chinner >> david@fromorbit.com >> -- >> To unsubscribe from this list: send the line "unsubscribe fstests" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > . > -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 16, 2018 at 12:02:54PM +0800, Eryu Guan wrote: > On Tue, Jan 16, 2018 at 08:03:52AM +1100, Dave Chinner wrote: > > On Mon, Jan 15, 2018 at 07:45:23AM -0500, Brian Foster wrote: > > > On Sat, Jan 13, 2018 at 01:23:53PM +1100, Dave Chinner wrote: > > > > That's the whole point of adding debug asserts in cases like this - > > > > they are supposed to stop test execution in it's tracks and leave a > > > > corpse to analyse. The auto group regression tests are not supposed > > > > to take the machine down on normal test configs (i.e. > > > > CONFIG_XFS_DEBUG=y). > > > > > > > > > > FWIW, my understanding of the dangerous group has always been that it's > > > for tests that when they trigger a regression, forcibly affect the > > > entire system as such (lockup, hang, crash, etc.). IMO, a test that > > I had the same understanding of dangerous group. And I recommended the > usage of "-g auto -x dangerous" before[1], and Dave acknowledged this > dangerous group usage[2] :) > > [1] https://www.spinics.net/lists/linux-btrfs/msg57312.html > [2] https://www.spinics.net/lists/linux-btrfs/msg57330.html Context is important. The context that the above links were talking about filtering the dangerous group was for older kernels that don't have the fixes for the bugs that crash the kernel. This particular test fails this auto group criteria in the case of people using CONFIG_XFS_DEBUG=y: " - it passes on current upstream kernels, if it fails, it's likely to be resolved in forseeable future [2] " It fails, and isn't likely to ever work, because the assert needs to remain there to catch userspace tool screwups.... > > If we really want to test these "should not ever happen" conditions > > that trigger asserts on debug kernels as "everyone always runs" > > regression testing, then these need to _notrun on CONFIG_XFS_DEBUG=y > > kernels. fstests already knows that it's running on a debug kernel, > > so adding a _requires_production_kernel check may be the way around > > this being considered a dangerous test. > > This reminds that we already have a _require_no_xfs_debug rule, as used > in xfs/115, which is known to trigger ASSERT failure on debug build. So > we can do the same in this new test. Ok, good! And it appears to be addressing the same "intentional corruption triggers failures" case as we are discussing here: # we corrupt XFS on purpose, and debug built XFS would crash due to assert # failure, so skip if we're testing on a debug built XFS _require_no_xfs_debug Cheers, Dave.
On Tue, Jan 16, 2018 at 08:03:52AM +1100, Dave Chinner wrote: > On Mon, Jan 15, 2018 at 07:45:23AM -0500, Brian Foster wrote: > > On Sat, Jan 13, 2018 at 01:23:53PM +1100, Dave Chinner wrote: > > > On Fri, Jan 12, 2018 at 04:50:05PM +0800, Eryu Guan wrote: > > > > On Fri, Jan 12, 2018 at 07:36:05PM +1100, Dave Chinner wrote: > > > > > On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote: > > > > > > On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote: ... > > > > > > > > > > > > This test triggers ASSERT failure and warning on debug build, thus > > > > > > failed _dmesg_check, I think we need _disable_dmesg_check (and some > > > > > > comments) too. > > > > > > > > > > > > [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size > > > > > > [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort! > > > > > > [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679 > > > > > > [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs] > > > > > > > > > > If it triggers debug asserts which have been put there to catch bugs > > > > > in other utilities (like mkfs) on a current upstream debug kernel, > > > > > then the test should not be part of the auto and quick groups. > > > > > > > > This test corrupts the filesystem on purpose (I didn't make this clear > > > > in my last reply, sorry about that), and xfs detects the corruption & > > > > refuses the mount instead of crashing. So I think the ASSERT failure is > > > > expected on debug build and we just need to ignore it. > > > > > > Except that after that assert, the block device is now busy and can't > > > be released, so we can't use the scratch device anymore. All tests > > > after this assert has been triggered are going to fail because > > > the block device is busy.... > > > > > > That's the whole point of adding debug asserts in cases like this - > > > they are supposed to stop test execution in it's tracks and leave a > > > corpse to analyse. The auto group regression tests are not supposed > > > to take the machine down on normal test configs (i.e. > > > CONFIG_XFS_DEBUG=y). > > > > > > > FWIW, my understanding of the dangerous group has always been that it's > > for tests that when they trigger a regression, forcibly affect the > > entire system as such (lockup, hang, crash, etc.). IMO, a test that > > intentionally generates an XFS assert doesn't really follow the spirit > > of that categorization, even though technically an assert can cause a > > BUG. This is why we have CONFIG_XFS_ASSERT_FATAL now, because unless > > that is disabled any auto test that reproduces a regression (assuming we > > have broad/robust assert coverage) can very easily BUG() and have the > > associated effect on the system. > > And this is exactly why I opposed the CONFIG_XFS_ASSERT_FATAL > inclusion. It started us down the slippery slope of allowing people > to ignore asserts that indicate an impossible condition had > occurred. > I thought you opposed the unconditional change from BUG() to WARN() but supported the config option, but I could be misremembering. > This was a bug found via fuzzing, not user problems in the wild. The > assert had functioned perfectly well up to this point as a mechanism > for preventing the XFS tools from creating a corrupt log stripe > unit, and it still does.... > > That's the point here - a corrupt log stripe unit *should never > happen* because it should trigger a CRC validation failure long > before we get to parsing it. If we've got a corrupt value with a > correct CRC, then we damn well need to understand the cause > *immediately* because something else has gone very, very wrong. > I'm not following the argument. What do we really lose by removing an assert that is accompanied by mount failure and several other context specific error messages? > > The flipside of course is that one should generally be able to run > > xfstests against a FATAL_ASSERTS=y kernel that otherwise has no > > regressions and not bork the system because a test generates a BUG() as > > part of a successful run. > > That is the historical behaviour of CONFIG_XFS_DEBUG=y and the > historical policy for xfstests. Adding CONFIG_XFS_ASSERT_FATAL does > not change that because (at least) some of use still treat asserts > as fatal. > Agree (I thought that's exactly what I said above..? ;)). > > Perhaps the root problem here is that the > > kernel generates an assert from a codepath that "successfully" handles > > the erroneous situation. > > We do that all over the place - the assert is there because it > indicates a fatal problem has occurred that should never happen. > For production systems, we still have to handle that gracefully > because people do fuzz testing of "should never happen" conditions > and then get upset we fail to detect this. That does not mean > the ASSERT is wrong - it makes it even more important that > developers know when their code hits these cases.... > I'm not suggesting the assert is wrong, that we don't do this all over the place already, or that we should never do the ASSERT(0) thing. I'm suggesting that in this specific case, the most practical thing may be just to delete the assert rather than categorize this new test as dangerous. > If we really want to test these "should not ever happen" conditions > that trigger asserts on debug kernels as "everyone always runs" > regression testing, then these need to _notrun on CONFIG_XFS_DEBUG=y > kernels. fstests already knows that it's running on a debug kernel, > so adding a _requires_production_kernel check may be the way around > this being considered a dangerous test. > I think that's overkill for the time being, but I suppose is a reasonable option in the long term as we end up with more of this particular situation (fuzzer-like tests vs. ASSERT()). I think more broadly useful would be to use the bug_on_assert sysfs value where it's available, then fall back to DEBUG=1 if necessary. > > We already spit out several error messages and > > fail the mount, so my .02 (fwiw) would be to leave the test as > > auto+dangerous, delete the useless assert and let people affected by > > older kernels (w/ FATAL_ASSERT behavior) filter out the test as > > appropriate in those environments. (Looking back, I see Darrick already > > suggested to delete the assert as well...). > > ... and this specific assert caught multiple bugs I introduced while > refactoring mkfs recently. That's it's purpose, and it works for > that purpose very well. > Maybe I'm missing something.. the mount would have still failed and spit out the error message as well, right? > > > If you want to test dangerous regression tests, then you need to > > > *opt-in* by adding "-g dangerous", not force everyone else to > > > opt-out by having to run "-g auto -x dangerous". > > > > > > > I often explicitly filter out the dangerous group as above because we > > already have more than a handful of tests that are in both groups. I > > don't think they've historically been mutually exclusive. > > Which means we've already deviated from the historic policy. That > doesn't mean it the right direction to take. > I've taken this approach with xfstests for as long as I can remember (running the dangerous tests separately at least, so as to not interrupt a longer running auto run). This doesn't have anything to do with the whole FATAL_ASSERT thing because most of those dangerous tests weren't considered dangerous just because they triggered an assert. They were dangerous because they hung or caused bad memory accesses or some such on kernels without the associated fix. That actually makes me wonder whether we should have some kind of dangerous expiration policy where the tag could be removed after some period of time (where fixes have had plenty of time to trickle into stable kernels), but that's a separate topic. The FATAL_ASSERT thing was more about the ability to enable all of the additional DEBUG mode code we have (things like sparse inode injection, etc., that simply cannot be enabled on DEBUG=0) without also having to crash the box if an assertion fails. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 16, 2018 at 07:50:23PM +1100, Dave Chinner wrote: > On Tue, Jan 16, 2018 at 12:02:54PM +0800, Eryu Guan wrote: > > On Tue, Jan 16, 2018 at 08:03:52AM +1100, Dave Chinner wrote: > > > On Mon, Jan 15, 2018 at 07:45:23AM -0500, Brian Foster wrote: > > > > On Sat, Jan 13, 2018 at 01:23:53PM +1100, Dave Chinner wrote: > > > > > That's the whole point of adding debug asserts in cases like this - > > > > > they are supposed to stop test execution in it's tracks and leave a > > > > > corpse to analyse. The auto group regression tests are not supposed > > > > > to take the machine down on normal test configs (i.e. > > > > > CONFIG_XFS_DEBUG=y). > > > > > > > > > > > > > FWIW, my understanding of the dangerous group has always been that it's > > > > for tests that when they trigger a regression, forcibly affect the > > > > entire system as such (lockup, hang, crash, etc.). IMO, a test that > > > > I had the same understanding of dangerous group. And I recommended the > > usage of "-g auto -x dangerous" before[1], and Dave acknowledged this > > dangerous group usage[2] :) > > > > [1] https://www.spinics.net/lists/linux-btrfs/msg57312.html > > [2] https://www.spinics.net/lists/linux-btrfs/msg57330.html > > Context is important. The context that the above links were talking > about filtering the dangerous group was for older kernels that don't > have the fixes for the bugs that crash the kernel. > > This particular test fails this auto group criteria in the case of > people using CONFIG_XFS_DEBUG=y: > > " > - it passes on current upstream kernels, if it fails, it's > likely to be resolved in forseeable future [2] > " > > It fails, and isn't likely to ever work, because the assert needs to > remain there to catch userspace tool screwups.... > > > > If we really want to test these "should not ever happen" conditions > > > that trigger asserts on debug kernels as "everyone always runs" > > > regression testing, then these need to _notrun on CONFIG_XFS_DEBUG=y > > > kernels. fstests already knows that it's running on a debug kernel, > > > so adding a _requires_production_kernel check may be the way around > > > this being considered a dangerous test. > > > > This reminds that we already have a _require_no_xfs_debug rule, as used > > in xfs/115, which is known to trigger ASSERT failure on debug build. So > > we can do the same in this new test. > > Ok, good! And it appears to be addressing the same "intentional > corruption triggers failures" case as we are discussing here: > My only suggestion (re: my previous comment) is to consider using a new check that looks at bug_on_assert when the knob is available for tests that are expected to generate asserts as such. The test stil has to filter the assert itself to cover the WARN=1 case, right? > # we corrupt XFS on purpose, and debug built XFS would crash due to assert > # failure, so skip if we're testing on a debug built XFS > _require_no_xfs_debug > Only with CONFIG_XFS_ASSERT_FATAL=y! :) Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 16, 2018 at 09:09:14AM -0500, Brian Foster wrote: > On Tue, Jan 16, 2018 at 07:50:23PM +1100, Dave Chinner wrote: > > On Tue, Jan 16, 2018 at 12:02:54PM +0800, Eryu Guan wrote: > > > On Tue, Jan 16, 2018 at 08:03:52AM +1100, Dave Chinner wrote: > > > > On Mon, Jan 15, 2018 at 07:45:23AM -0500, Brian Foster wrote: > > > > > On Sat, Jan 13, 2018 at 01:23:53PM +1100, Dave Chinner wrote: > > > > > > That's the whole point of adding debug asserts in cases like this - > > > > > > they are supposed to stop test execution in it's tracks and leave a > > > > > > corpse to analyse. The auto group regression tests are not supposed > > > > > > to take the machine down on normal test configs (i.e. > > > > > > CONFIG_XFS_DEBUG=y). > > > > > > > > > > > > > > > > FWIW, my understanding of the dangerous group has always been that it's > > > > > for tests that when they trigger a regression, forcibly affect the > > > > > entire system as such (lockup, hang, crash, etc.). IMO, a test that > > > > > > I had the same understanding of dangerous group. And I recommended the > > > usage of "-g auto -x dangerous" before[1], and Dave acknowledged this > > > dangerous group usage[2] :) > > > > > > [1] https://www.spinics.net/lists/linux-btrfs/msg57312.html > > > [2] https://www.spinics.net/lists/linux-btrfs/msg57330.html > > > > Context is important. The context that the above links were talking > > about filtering the dangerous group was for older kernels that don't > > have the fixes for the bugs that crash the kernel. > > > > This particular test fails this auto group criteria in the case of > > people using CONFIG_XFS_DEBUG=y: > > > > " > > - it passes on current upstream kernels, if it fails, it's > > likely to be resolved in forseeable future [2] > > " > > > > It fails, and isn't likely to ever work, because the assert needs to > > remain there to catch userspace tool screwups.... > > > > > > If we really want to test these "should not ever happen" conditions > > > > that trigger asserts on debug kernels as "everyone always runs" > > > > regression testing, then these need to _notrun on CONFIG_XFS_DEBUG=y > > > > kernels. fstests already knows that it's running on a debug kernel, > > > > so adding a _requires_production_kernel check may be the way around > > > > this being considered a dangerous test. > > > > > > This reminds that we already have a _require_no_xfs_debug rule, as used > > > in xfs/115, which is known to trigger ASSERT failure on debug build. So > > > we can do the same in this new test. > > > > Ok, good! And it appears to be addressing the same "intentional > > corruption triggers failures" case as we are discussing here: > > > > My only suggestion (re: my previous comment) is to consider using a new > check that looks at bug_on_assert when the knob is available for tests > that are expected to generate asserts as such. The test stil has to Looks like we need a new _require_no_xfs_bug_on_assert, which looks at /sys/fs/xfs/debug/bug_on_assert value if it's available, and just calls _require_no_xfs_debug if it's not available. > filter the assert itself to cover the WARN=1 case, right? I think so (and to cover the bug_on_assert=0 case). > > > # we corrupt XFS on purpose, and debug built XFS would crash due to assert > > # failure, so skip if we're testing on a debug built XFS > > _require_no_xfs_debug > > > > Only with CONFIG_XFS_ASSERT_FATAL=y! :) XFS_ASSERT_FATAL and _require_no_xfs_debug were introduced closely in time. XFS_ASSERT_FATAL was there first, and it was too new to be noticed when we then introduced _require_no_xfs_debug.. Thanks, Eryu -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/tests/xfs/437 b/tests/xfs/437 new file mode 100755 index 0000000..f2b84ad --- /dev/null +++ b/tests/xfs/437 @@ -0,0 +1,73 @@ +#! /bin/bash +# FS QA Test No. 437 +# +# Regression test for commit: +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize") +# +# If log stripe unit isn't a multiple of the fs blocksize and mounting, +# the invalid sb_logsunit leads to crash as soon as we try to write to +# the log. +# +#----------------------------------------------------------------------- +# Copyright (c) 2018 FUJITSU. All Rights Reserved. +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com> +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#----------------------------------------------------------------------- + +seq=`basename "$0"` +seqres="$RESULT_DIR/$seq" +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + rm -rf $tmp.* +} + +# get standard environment and checks +. ./common/rc + +# real QA test starts here +_supported_os Linux +_supported_fs xfs +_require_scratch + +rm -f "$seqres.full" + +# Format +_scratch_mkfs > $seqres.full 2>&1 || _fail "mkfs failed" + +# Set logsunit to a value which is not a multiple of the fs blocksize +blksz=$(_scratch_xfs_get_sb_field blocksize) +_scratch_xfs_set_sb_field logsunit $((blksz - 1)) >> $seqres.full 2>&1 \ + || _notrun "failed to set sb_logsunit" + +# Mount and writing log may trigger a crash +if _scratch_mount >> $seqres.full 2>&1; then + for i in $(seq 1 1000); do + touch ${SCRATCH_MNT}/$i + done + _scratch_unmount +fi + +echo "Silence is golden" + +# success, all done +status=0 +exit diff --git a/tests/xfs/437.out b/tests/xfs/437.out new file mode 100644 index 0000000..4dcb607 --- /dev/null +++ b/tests/xfs/437.out @@ -0,0 +1,2 @@ +QA output created by 437 +Silence is golden diff --git a/tests/xfs/group b/tests/xfs/group index d230060..35d1b03 100644 --- a/tests/xfs/group +++ b/tests/xfs/group @@ -434,3 +434,4 @@ 434 auto quick clone fsr 435 auto quick clone 436 auto quick clone fsr +437 auto quick log dangerous
If log stripe unit isn't a multiple of the fs blocksize and mounting, the invalid sb_logsunit leads to crash as soon as we try to write to the log. Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com> --- tests/xfs/437 | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ tests/xfs/437.out | 2 ++ tests/xfs/group | 1 + 3 files changed, 76 insertions(+) create mode 100755 tests/xfs/437 create mode 100644 tests/xfs/437.out