Message ID | 20230622065438.86402-1-wqu@suse.com
---|---
State | New, archived
Series | btrfs: add a test case to verify the write behavior of large RAID5 data chunks
On Thu, Jun 22, 2023 at 02:54:38PM +0800, Qu Wenruo wrote:
> There is a recent regression in the v6.4 merge window: a u32 left
> shift overflow can cause problems with large data chunks (over 4G in
> size).
>
> This is especially nasty for RAID56, where it can lead to an ASSERT()
> during regular writes, or corrupt memory if CONFIG_BTRFS_ASSERT is not
> enabled.
>
> This is the regression test case for it.

I am not able to reproduce the problem with the previous fix
a7299a18a179a971 applied but not the additional one, so I can't
validate the fix and will have to postpone sending the pull request.
On 2023/6/23 01:23, David Sterba wrote:
> On Thu, Jun 22, 2023 at 02:54:38PM +0800, Qu Wenruo wrote:
>> [...]
>
> I am not able to reproduce the problem with the previous fix
> a7299a18a179a971 applied

Well, it's 100% reproducible here.

And the missing type cast is indeed the cause, inside btrfs_map_block():

	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK && need_raid_map &&
	    (op != BTRFS_MAP_READ || mirror_num > 1)) {
		bioc->full_stripe_logical = em->start +
			((stripe_nr * data_stripes) << BTRFS_STRIPE_LEN_SHIFT);

Obviously @stripe_nr (now the full stripe number) * @data_stripes can
reach a stripe number beyond the 4G range, and the left shift then
overflows, resulting in a value that is too small.

This would trigger the ASSERT() in the raid56 code, in rbio_add_bio().

I'm not sure why it didn't trigger in your case, mind sharing the
seqres.full?

Thanks,
Qu

> but not the additional one so I can't validate
> the fix and will have to postpone sending the pull request.
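To make the failure mode concrete, here is a minimal userspace sketch of the same u32 overflow. It assumes the 64K stripe length implied by BTRFS_STRIPE_LEN_SHIFT; the shift value and the example numbers are illustrative assumptions, not kernel code:

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed: BTRFS_STRIPE_LEN is 64K, so the shift amount is 16. */
#define STRIPE_LEN_SHIFT 16

int main(void)
{
	/* A full stripe number past the 4G boundary: 4G / 64K = 65536. */
	uint32_t stripe_nr = 70000;
	uint32_t data_stripes = 2;	/* e.g. a 3-device RAID5 */

	/* Buggy form: the multiply and shift happen in 32 bits and wrap. */
	uint64_t bad = (stripe_nr * data_stripes) << STRIPE_LEN_SHIFT;

	/* Fixed form: widen to 64 bits before shifting; the follow-up
	 * kernel patch reportedly wraps this in a dedicated helper. */
	uint64_t good = ((uint64_t)stripe_nr * data_stripes) << STRIPE_LEN_SHIFT;

	printf("bad  = %llu\n", (unsigned long long)bad);	/* 585105408 (wrapped) */
	printf("good = %llu\n", (unsigned long long)good);	/* 9175040000 */
	return 0;
}
```

A wrapped value like `bad` points inside the wrong full stripe, matching the too-small full_stripe_logical described above.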
On Fri, Jun 23, 2023 at 06:03:54AM +0800, Qu Wenruo wrote:
> On 2023/6/23 01:23, David Sterba wrote:
>> On Thu, Jun 22, 2023 at 02:54:38PM +0800, Qu Wenruo wrote:
>>> [...]
>>
>> I am not able to reproduce the problem with the previous fix
>> a7299a18a179a971 applied
>
> Well, it's 100% reproducible here.

So the cause was that the fstests binaries built in the VM got out of
sync with the system, and thus many things did not work. After
resolving that I was able to test it properly.
On 22/06/2023 14:54, Qu Wenruo wrote:
> There is a recent regression in the v6.4 merge window: a u32 left
> shift overflow can cause problems with large data chunks (over 4G in
> size).
>
> This is especially nasty for RAID56, where it can lead to an ASSERT()
> during regular writes, or corrupt memory if CONFIG_BTRFS_ASSERT is not
> enabled.
>
> This is the regression test case for it.
>
> Unlike btrfs/292, btrfs doesn't support trim inside RAID56 chunks, thus
> the workflow is simplified:
>
> - Create a RAID5 or RAID6 data chunk during mkfs
>
> - Fill the fs with 5G of data and sync
>   On an unpatched kernel, the sync would crash the kernel.
>
> - Make sure everything is fine
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Looks good.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks.

> [quoted patch snipped; see the full patch below]
On Thu, Jun 22, 2023 at 02:54:38PM +0800, Qu Wenruo wrote:
> [...]
>
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_require_scratch_dev_pool 8

Could you add the following lines?

# RAID5/6 not working on zoned mode.
_require_non_zoned_device "${SCRATCH_DEV}"

With that, looks good to me.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>

> [rest of the quoted patch snipped; see the full patch below]
On Mon, Jul 24, 2023 at 01:23:06AM +0000, Naohiro Aota wrote:
> On Thu, Jun 22, 2023 at 02:54:38PM +0800, Qu Wenruo wrote:
> > [...]
> >
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> > +_supported_fs btrfs
> > +_require_scratch_dev_pool 8
>
> Could you add the following lines?
>
> # RAID5/6 not working on zoned mode.
> _require_non_zoned_device "${SCRATCH_DEV}"
>
> With that, looks good to me.

Hi Naohiro,

This patch was merged last weekend. If you feel this line is needed,
please send another patch to 'fix' it.

Thanks,
Zorro

> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
>
> [rest of the quoted patch snipped]
diff --git a/tests/btrfs/293 b/tests/btrfs/293
new file mode 100755
index 00000000..68312682
--- /dev/null
+++ b/tests/btrfs/293
@@ -0,0 +1,72 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2023 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 293
+#
+# Test btrfs write behavior with large RAID56 chunks (size beyond 4G).
+#
+. ./common/preamble
+_begin_fstest auto raid volume
+
+# Import common functions.
+. ./common/filter
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_require_scratch_dev_pool 8
+_fixed_by_kernel_commit a7299a18a179 \
+	"btrfs: fix u32 overflows when left shifting @stripe_nr"
+_fixed_by_kernel_commit xxxxxxxxxxxx \
+	"btrfs: use a dedicated helper to convert stripe_nr to offset"
+
+_scratch_dev_pool_get 8
+
+datasize=$((5 * 1024 * 1024 * 1024))
+
+
+workload()
+{
+	local data_profile=$1
+
+	_scratch_pool_mkfs -m raid1 -d $data_profile >> $seqres.full 2>&1
+	_scratch_mount
+	$XFS_IO_PROG -f -c "pwrite -b 1m 0 $datasize" $SCRATCH_MNT/foobar > /dev/null
+
+	# Unpatched kernel would trigger an ASSERT() or crash at writeback.
+	sync
+
+	echo "=== With initial 5G data written ($data_profile) ===" >> $seqres.full
+	$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
+	_scratch_unmount
+
+	# Make sure we haven't corrupted anything.
+	$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
+	if [ $? -ne 0 ]; then
+		_scratch_dev_pool_put
+		_fail "data corruption detected after initial data filling"
+	fi
+}
+
+# Make sure each device has at least 2G.
+# Btrfs has a limit on per-device stripe length of 1G.
+# Double that so that we can ensure a RAID6 data chunk with 6G size.
+for i in $SCRATCH_DEV_POOL; do
+	devsize=$(blockdev --getsize64 "$i")
+	if [ $devsize -lt $((2 * 1024 * 1024 * 1024)) ]; then
+		_scratch_dev_pool_put
+		_notrun "device $i is too small, need at least 2G"
+	fi
+done
+
+workload raid5
+workload raid6
+
+_scratch_dev_pool_put
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/293.out b/tests/btrfs/293.out
new file mode 100644
index 00000000..076fc056
--- /dev/null
+++ b/tests/btrfs/293.out
@@ -0,0 +1,2 @@
+QA output created by 293
+Silence is golden
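As an aside on the device sizing in the test above: the comment takes a 1G per-device stripe length cap as given, so with 8 devices a RAID5 data chunk can reach 7G and a RAID6 chunk 6G, both past the 4G boundary where the overflow bites, and the 5G write is enough to cross it. A throwaway sketch of that arithmetic (the stripe cap is taken from the test's own comment, not verified against the kernel):

```c
#include <stdio.h>

int main(void)
{
	const long long gib = 1024LL * 1024 * 1024;
	const long long stripe_cap = 1 * gib;	/* assumed per-device stripe length cap */
	const int ndevs = 8;

	/* RAID5 spends 1 stripe on parity, RAID6 spends 2. */
	printf("max raid5 data chunk: %lld GiB\n", (ndevs - 1) * stripe_cap / gib); /* 7 */
	printf("max raid6 data chunk: %lld GiB\n", (ndevs - 2) * stripe_cap / gib); /* 6 */
	printf("test writes 5 GiB, past the 4 GiB overflow boundary\n");
	return 0;
}
```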
There is a recent regression in the v6.4 merge window: a u32 left
shift overflow can cause problems with large data chunks (over 4G in
size).

This is especially nasty for RAID56, where it can lead to an ASSERT()
during regular writes, or corrupt memory if CONFIG_BTRFS_ASSERT is not
enabled.

This is the regression test case for it.

Unlike btrfs/292, btrfs doesn't support trim inside RAID56 chunks, thus
the workflow is simplified:

- Create a RAID5 or RAID6 data chunk during mkfs

- Fill the fs with 5G of data and sync
  On an unpatched kernel, the sync would crash the kernel.

- Make sure everything is fine

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tests/btrfs/293     | 72 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/293.out |  2 ++
 2 files changed, 74 insertions(+)
 create mode 100755 tests/btrfs/293
 create mode 100644 tests/btrfs/293.out