Message ID | 20170922232127.12032-1-bo.li.liu@oracle.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, Sep 22, 2017 at 05:21:27PM -0600, Liu Bo wrote:
>We had a bug in btrfs compression code which could end up with a
>kernel panic.
>
>This is adding a regression test for the bug and I've also sent a
>kernel patch to fix the bug.
>
>The patch is "Btrfs: fix kernel oops while reading compressed data".
>
>Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
>---
>v2: - Fix ambiguous copyright.
>    - Use /proc/$pid/make-it-fail to specify IO failure

- /sys/kernel/debug/fail*/task-filter:

	Format: { 'Y' | 'N' }
	A value of 'N' disables filtering by process (default).
	Any positive value limits failures to only processes indicated by
	/proc/<pid>/make-it-fail==1.

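The task-filter knob quoted above can be toggled with a plain write. The following is a minimal sketch, not part of the posted patch: the helper names are hypothetical, and the directory argument is assumed to be /sys/kernel/debug/fail_make_request on a system with debugfs mounted (writing there requires root).

```shell
#!/bin/bash
# Hypothetical helpers, not part of fstests or the posted patch.
# $1 is the fault-injection directory, assumed to be
# /sys/kernel/debug/fail_make_request on a real system.

enable_task_filter()
{
	local fail_dir=$1

	# 'Y' limits injected failures to tasks whose
	# /proc/<pid>/make-it-fail is set to 1
	echo Y > "$fail_dir/task-filter"
}

disable_task_filter()
{
	local fail_dir=$1

	# 'N' is the default: no filtering by process
	echo N > "$fail_dir/task-filter"
}
```

Taking the directory as a parameter keeps the helpers easy to try against a scratch directory before pointing them at debugfs.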
On Fri, Sep 22, 2017 at 05:21:27PM -0600, Liu Bo wrote:
> We had a bug in btrfs compression code which could end up with a
> kernel panic.
>
> This is adding a regression test for the bug and I've also sent a
> kernel patch to fix the bug.
>
> The patch is "Btrfs: fix kernel oops while reading compressed data".
>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Hmm, I can't reproduce the panic with 4.13 kernel, which doesn't have
the fix applied. Can you please help confirm if it panics on your test
environment?

> ---
> v2: - Fix ambiguous copyright.
>     - Use /proc/$pid/make-it-fail to specify IO failure
>     - Use bash -c to run test only when pid is odd.
>     - Add test to dangerous group.
>
[...]
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +_require_fail_make_request
> +_require_scratch_dev_pool 2

Trailing whitespace in above line.

[...]
> +while [[ -z $result ]]; do
> +	# invalidate the page cache
> +	$XFS_IO_PROG -f -c "fadvise -d 0 8K" $SCRATCH_MNT/foobar

Does 'echo 3 > /proc/sys/vm/drop_caches' work?

Thanks,
Eryu

[...]
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Tue, Sep 26, 2017 at 05:02:36PM +0800, Eryu Guan wrote:
> On Fri, Sep 22, 2017 at 05:21:27PM -0600, Liu Bo wrote:
> > We had a bug in btrfs compression code which could end up with a
> > kernel panic.
> >
> > This is adding a regression test for the bug and I've also sent a
> > kernel patch to fix the bug.
> >
> > The patch is "Btrfs: fix kernel oops while reading compressed data".
> >
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
>
> Hmm, I can't reproduce the panic with 4.13 kernel, which doesn't have
> the fix applied. Can you please help confirm if it panics on your test
> environment?
>

Yes, it is reproducible on my box, hrm...I'll be running it more times
to double check.

[...]
> > +while [[ -z $result ]]; do
> > +	# invalidate the page cache
> > +	$XFS_IO_PROG -f -c "fadvise -d 0 8K" $SCRATCH_MNT/foobar
>
> Does 'echo 3 > /proc/sys/vm/drop_caches' work?
>

Yes, it works, drop_caches is system-wide, while here I'm just
dropping caches on this single inode.

Or are you implying that it's 'fadvise' that makes the test fail to
show oops?

thanks,
-liubo

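To make the per-inode versus system-wide distinction concrete, here is a small sketch of both approaches outside fstests. It is an illustration under assumptions: the helper names are hypothetical, and the per-file variant swaps in GNU dd's `nocache` flag (documented in coreutils as a way to advise the kernel to drop a file's cached pages) in place of the `xfs_io fadvise -d` call the test itself uses.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; the test itself uses
# "xfs_io -c 'fadvise -d 0 8K'" for the per-file case.

# Ask the kernel to drop cached pages of a single file.
# Assumes GNU dd; no root needed, other files are not affected.
drop_file_cache()
{
	local file=$1

	# count=0 copies nothing; iflag=nocache applies to the whole file
	dd if="$file" iflag=nocache count=0 status=none
}

# System-wide variant from the review comment: needs root and evicts
# clean page cache, dentries and inodes for every mounted filesystem.
drop_all_caches()
{
	sync
	echo 3 > /proc/sys/vm/drop_caches
}
```

Either way only clean pages can be dropped, so freshly written data has to reach disk first.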
On Tue, Sep 26, 2017 at 04:37:52PM -0700, Liu Bo wrote:
> On Tue, Sep 26, 2017 at 05:02:36PM +0800, Eryu Guan wrote:
> > On Fri, Sep 22, 2017 at 05:21:27PM -0600, Liu Bo wrote:
> > > We had a bug in btrfs compression code which could end up with a
> > > kernel panic.
> > >
> > > This is adding a regression test for the bug and I've also sent a
> > > kernel patch to fix the bug.
> > >
> > > The patch is "Btrfs: fix kernel oops while reading compressed data".
> > >
> > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> >
> > Hmm, I can't reproduce the panic with 4.13 kernel, which doesn't have
> > the fix applied. Can you please help confirm if it panics on your test
> > environment?
> >
>
> Yes, it is reproducible on my box, hrm...I'll be running it more times
> to double check.
>

It worked for me...both v4.13 and v4.14.0-rc2 have the following
messages[1].

This requires two config:
CONFIG_FAULT_INJECTION=y
CONFIG_FAULT_INJECTION_DEBUG_FS=y

Could you please check again?

[1]:
[  135.982643] run fstests btrfs/150 at 2017-09-26 16:11:27
[  136.839434] BTRFS: device fsid 9152fe7e-3006-47d5-a9b7-330af2809da7 devid 1 transid 5 /dev/sde
[  136.842082] BTRFS: device fsid 9152fe7e-3006-47d5-a9b7-330af2809da7 devid 2 transid 5 /dev/sdc
[  136.879626] BTRFS info (device sdc): use zlib compression
[  136.880263] BTRFS info (device sdc): disk space caching is enabled
[  136.880845] BTRFS info (device sdc): has skinny extents
[  136.881386] BTRFS info (device sdc): flagging fs with big metadata feature
[  136.890763] BTRFS info (device sdc): creating UUID tree
[  137.023210] BTRFS error (device sdc): bdev /dev/sde errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[  137.023959] BTRFS warning (device sdc): csum failed root 5 ino 257 off 136839168 csum 0x98f94189 expected csum 0xd9cece72 mirror 0
[  137.025349] ------------[ cut here ]------------
[  137.025735] kernel BUG at fs/btrfs/extent_io.c:2104!
[  137.025800] ------------[ cut here ]------------
[  137.025805] kernel BUG at fs/btrfs/extent_io.c:2104!

Thanks,
-liubo

On Tue, Sep 26, 2017 at 05:18:51PM -0700, Liu Bo wrote:
> On Tue, Sep 26, 2017 at 04:37:52PM -0700, Liu Bo wrote:
> > On Tue, Sep 26, 2017 at 05:02:36PM +0800, Eryu Guan wrote:
> > > On Fri, Sep 22, 2017 at 05:21:27PM -0600, Liu Bo wrote:
> > > > We had a bug in btrfs compression code which could end up with a
> > > > kernel panic.
> > > >
> > > > This is adding a regression test for the bug and I've also sent a
> > > > kernel patch to fix the bug.
> > > >
> > > > The patch is "Btrfs: fix kernel oops while reading compressed data".
> > > >
> > > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > >
> > > Hmm, I can't reproduce the panic with 4.13 kernel, which doesn't have
> > > the fix applied. Can you please help confirm if it panics on your test
> > > environment?
> > >
> >
> > Yes, it is reproducible on my box, hrm...I'll be running it more times
> > to double check.
> >
>
> It worked for me...both v4.13 and v4.14.0-rc2 have the following
> messages[1].
>
> This requires two config:
> CONFIG_FAULT_INJECTION=y
> CONFIG_FAULT_INJECTION_DEBUG_FS=y
>
> Could you please check again?

I re-compiled 4.14-rc2 kernel on my test vm with FAIL_MAKE_REQUEST
enabled (which requires FAULT_INJECTION), and I can reproduce the crash
now. It was so weird that previously I did have FAIL_MAKE_REQUEST
enabled and test ran normally without hitting the bug, but now I can hit
the bug quite reliably. Not sure what was happening in my previous test..

Thanks for confirming!

Eryu

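Since the reproduction hinged on kernel configuration, a quick check of the three options named in the thread can save a rebuild. The sketch below is an assumption-laden illustration: the helper name is hypothetical, and where the config file lives (for example /boot/config-$(uname -r)) varies by distribution.

```shell
#!/bin/bash
# Hypothetical helper: report which of the fault-injection options
# mentioned in the thread are not built in (=y) in a kernel config file.
# $1 is the config file path, e.g. /boot/config-$(uname -r) (assumed).
check_fault_injection_config()
{
	local config=$1
	local opt missing=0

	for opt in CONFIG_FAULT_INJECTION CONFIG_FAULT_INJECTION_DEBUG_FS \
		   CONFIG_FAIL_MAKE_REQUEST; do
		# Anchor both ends so FAULT_INJECTION does not match
		# FAULT_INJECTION_DEBUG_FS
		if ! grep -q "^${opt}=y$" "$config"; then
			echo "missing: $opt"
			missing=1
		fi
	done
	return $missing
}
```

A non-zero return means at least one option is absent, and the printed lines say which.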
On Wed, Sep 27, 2017 at 05:46:44PM +0800, Eryu Guan wrote:
> On Tue, Sep 26, 2017 at 05:18:51PM -0700, Liu Bo wrote:
> > On Tue, Sep 26, 2017 at 04:37:52PM -0700, Liu Bo wrote:
> > > On Tue, Sep 26, 2017 at 05:02:36PM +0800, Eryu Guan wrote:
> > > > On Fri, Sep 22, 2017 at 05:21:27PM -0600, Liu Bo wrote:
> > > > > We had a bug in btrfs compression code which could end up with a
> > > > > kernel panic.
> > > > >
> > > > > This is adding a regression test for the bug and I've also sent a
> > > > > kernel patch to fix the bug.
> > > > >
> > > > > The patch is "Btrfs: fix kernel oops while reading compressed data".
> > > > >
> > > > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > >
> > > > Hmm, I can't reproduce the panic with 4.13 kernel, which doesn't have
> > > > the fix applied. Can you please help confirm if it panics on your test
> > > > environment?
> > > >
> > >
> > > Yes, it is reproducible on my box, hrm...I'll be running it more times
> > > to double check.
> > >
> >
> > It worked for me...both v4.13 and v4.14.0-rc2 have the following
> > messages[1].
> >
> > This requires two config:
> > CONFIG_FAULT_INJECTION=y
> > CONFIG_FAULT_INJECTION_DEBUG_FS=y
> >
> > Could you please check again?
>
> I re-compiled 4.14-rc2 kernel on my test vm with FAIL_MAKE_REQUEST
> enabled (which requires FAULT_INJECTION), and I can reproduce the crash
> now. It was so weird that previously I did have FAIL_MAKE_REQUEST
> enabled and test ran normally without hitting the bug, but now I can hit
> the bug quite reliably. Not sure what was happening in my previous test..
>
> Thanks for confirming!

No problem at all. I'll send a v3 patch that enables task-filter, as
pointed out by Lu.

thanks,
-liubo

diff --git a/tests/btrfs/150 b/tests/btrfs/150
new file mode 100755
index 0000000..8891c38
--- /dev/null
+++ b/tests/btrfs/150
@@ -0,0 +1,103 @@
+#! /bin/bash
+# FS QA Test btrfs/150
+#
+# This is a regression test which ends up with a kernel oops in btrfs.
+# It occurs when btrfs's read repair happens while reading a compressed
+# extent.
+# The patch to fix it is
+# Btrfs: fix kernel oops while reading compressed data
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2017 Oracle. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fail_make_request
+_require_scratch_dev_pool 2
+
+SYSFS_BDEV=`_sysfs_dev $SCRATCH_DEV`
+enable_io_failure()
+{
+	echo 100 > $DEBUGFS_MNT/fail_make_request/probability
+	echo 1000 > $DEBUGFS_MNT/fail_make_request/times
+	echo 0 > $DEBUGFS_MNT/fail_make_request/verbose
+	echo 1 > $SYSFS_BDEV/make-it-fail
+}
+
+disable_io_failure()
+{
+	echo 0 > $SYSFS_BDEV/make-it-fail
+	echo 0 > $DEBUGFS_MNT/fail_make_request/probability
+	echo 0 > $DEBUGFS_MNT/fail_make_request/times
+}
+
+_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1
+
+# It doesn't matter which compression algorithm we use.
+_scratch_mount -ocompress
+
+# Create a file with all data being compressed
+$XFS_IO_PROG -f -c "pwrite -W 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
+
+# Raid1 consists of two copies and btrfs decides which copy to read by reader's
+# %pid. Now we inject errors to copy #1 and copy #0 is good. We want to read
+# the bad copy to trigger read-repair.
+while [[ -z $result ]]; do
+	# invalidate the page cache
+	$XFS_IO_PROG -f -c "fadvise -d 0 8K" $SCRATCH_MNT/foobar
+
+	enable_io_failure
+
+	result=$(bash -c "
+	if [ \$((\$\$ % 2)) == 1 ]; then
+		echo 1 > /proc/\$\$/make-it-fail
+		exec $XFS_IO_PROG -c \"pread 0 8K\" \$SCRATCH_MNT/foobar
+	fi")
+
+	disable_io_failure
+done
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/150.out b/tests/btrfs/150.out
new file mode 100644
index 0000000..c492c24
--- /dev/null
+++ b/tests/btrfs/150.out
@@ -0,0 +1,3 @@
+QA output created by 150
+wrote 8192/8192 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 70c3f05..e73bb1b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -152,3 +152,4 @@
 147 auto quick send
 148 auto quick rw
 149 auto quick send compress
+150 auto quick dangerous

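The loop at the end of the test relies on one trick: `bash -c` starts in a fresh process, checks whether its own PID is odd, and only then `exec`s the reader so the reader keeps that PID. A standalone sketch of just that trick follows. It is not the patch's code: `run_with_odd_pid` is a hypothetical helper, the exit code 99 is an arbitrary marker chosen here, and the retry cap and extra fork are additions to keep the sketch from spinning if PIDs happen to advance by an even step.

```shell
#!/bin/bash
# Hypothetical helper: run "$@" in a process whose PID is odd.
# Assumes the command itself never exits with status 99.
run_with_odd_pid()
{
	local tries ret

	for ((tries = 0; tries < 64; tries++)); do
		# bash -c gets a new PID; exec replaces the shell with the
		# real command without changing that PID.
		bash -c 'if [ $(($$ % 2)) -eq 1 ]; then exec "$@"; fi; exit 99' _ "$@"
		ret=$?
		[ $ret -ne 99 ] && return $ret

		# Even PID: fork one extra subshell on every other retry so
		# the parity cannot stay locked to the loop's own fork count.
		[ $((tries % 2)) -eq 0 ] && (:)
	done
	return 1
}
```

For example, `run_with_odd_pid sh -c 'echo $$'` prints the PID the command ended up running under. Unlike the loop in the patch, this version signals a retry through the exit status and does not depend on the command producing output.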
We had a bug in btrfs compression code which could end up with a
kernel panic.

This is adding a regression test for the bug and I've also sent a
kernel patch to fix the bug.

The patch is "Btrfs: fix kernel oops while reading compressed data".

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2: - Fix ambiguous copyright.
    - Use /proc/$pid/make-it-fail to specify IO failure
    - Use bash -c to run test only when pid is odd.
    - Add test to dangerous group.

 tests/btrfs/150     | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/150.out |   3 ++
 tests/btrfs/group   |   1 +
 3 files changed, 107 insertions(+)
 create mode 100755 tests/btrfs/150
 create mode 100644 tests/btrfs/150.out