diff mbox series

btrfs: add a test case to verify the write behavior of large RAID5 data chunks

Message ID 20230622065438.86402-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: add a test case to verify the write behavior of large RAID5 data chunks | expand

Commit Message

Qu Wenruo June 22, 2023, 6:54 a.m. UTC
There is a recent regression during v6.4 merge window, that a u32 left
shift overflow can cause problems with large data chunks (over 4G)
sized.

This is especially nasty for RAID56, which can lead to ASSERT() during
regular writes, or corrupt memory if CONFIG_BTRFS_ASSERT is not enabled.

This is the regression test case for it.

Unlike btrfs/292, btrfs doesn't support trim inside RAID56 chunks, thus
the workflow is simplified:

- Create a RAID5 or RAID6 data chunk during mkfs

- Fill the fs with 5G data and sync
  For unpatched kernel, the sync would crash the kernel.

- Make sure everything is fine

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tests/btrfs/293     | 72 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/293.out |  2 ++
 2 files changed, 74 insertions(+)
 create mode 100755 tests/btrfs/293
 create mode 100644 tests/btrfs/293.out

Comments

David Sterba June 22, 2023, 5:23 p.m. UTC | #1
On Thu, Jun 22, 2023 at 02:54:38PM +0800, Qu Wenruo wrote:
> There is a recent regression during v6.4 merge window, that a u32 left
> shift overflow can cause problems with large data chunks (over 4G)
> sized.
> 
> This is especially nasty for RAID56, which can lead to ASSERT() during
> regular writes, or corrupt memory if CONFIG_BTRFS_ASSERT is not enabled.
> 
> This is the regression test case for it.

I am not able to reproduce the problem with the previous fix
a7299a18a179a971 applied but not the additional one so I can't validate
the fix and will have to postpone sending the pull request.
Qu Wenruo June 22, 2023, 10:03 p.m. UTC | #2
On 2023/6/23 01:23, David Sterba wrote:
> On Thu, Jun 22, 2023 at 02:54:38PM +0800, Qu Wenruo wrote:
>> There is a recent regression during v6.4 merge window, that a u32 left
>> shift overflow can cause problems with large data chunks (over 4G)
>> sized.
>>
>> This is especially nasty for RAID56, which can lead to ASSERT() during
>> regular writes, or corrupt memory if CONFIG_BTRFS_ASSERT is not enabled.
>>
>> This is the regression test case for it.
>
> I am not able to reproduce the problem with the previous fix
> a7299a18a179a971 applied

Well, it's 100% reproducible here.

And the missing type cast is indeed the cause inside btrfs_map_block():

         if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK && need_raid_map &&
             (op != BTRFS_MAP_READ || mirror_num > 1)) {
                 bioc->full_stripe_logical = em->start +
                         ((stripe_nr * data_stripes) <<
BTRFS_STRIPE_LEN_SHIFT);

Obviously @stripe_nr (now the full stripe number) * @data_stripes can
reach stripe number beyond 4G range.

And a left shift would lead to overflow, causing a too small value.

This would trigger the ASSERT() in the raid56 code of rbio_add_bio().


I'm not sure why it didn't trigger in your case, mind to share the
seqres.full?

Thanks,
Qu

> but not the additional one so I can't validate
> the fix and will have to postpone sending the pull request.
David Sterba June 26, 2023, 10:35 a.m. UTC | #3
On Fri, Jun 23, 2023 at 06:03:54AM +0800, Qu Wenruo wrote:
> 
> 
> On 2023/6/23 01:23, David Sterba wrote:
> > On Thu, Jun 22, 2023 at 02:54:38PM +0800, Qu Wenruo wrote:
> >> There is a recent regression during v6.4 merge window, that a u32 left
> >> shift overflow can cause problems with large data chunks (over 4G)
> >> sized.
> >>
> >> This is especially nasty for RAID56, which can lead to ASSERT() during
> >> regular writes, or corrupt memory if CONFIG_BTRFS_ASSERT is not enabled.
> >>
> >> This is the regression test case for it.
> >
> > I am not able to reproduce the problem with the previous fix
> > a7299a18a179a971 applied
> 
> Well, it's 100% reproducible here.

So the cause was that binaries of VM fstests built in fstests got out of
sync with the system and thus many things did not work. After resolving
that I was able to test it properly.
Anand Jain July 22, 2023, 2:52 p.m. UTC | #4
On 22/06/2023 14:54, Qu Wenruo wrote:
> There is a recent regression during v6.4 merge window, that a u32 left
> shift overflow can cause problems with large data chunks (over 4G)
> sized.
> 
> This is especially nasty for RAID56, which can lead to ASSERT() during
> regular writes, or corrupt memory if CONFIG_BTRFS_ASSERT is not enabled.
> 
> This is the regression test case for it.
> 
> Unlike btrfs/292, btrfs doesn't support trim inside RAID56 chunks, thus
> the workflow is simplified:
> 
> - Create a RAID5 or RAID6 data chunk during mkfs
> 
> - Fill the fs with 5G data and sync
>    For unpatched kernel, the sync would crash the kernel.
> 
> - Make sure everything is fine
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Looks good.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks.

> ---
>   tests/btrfs/293     | 72 +++++++++++++++++++++++++++++++++++++++++++++
>   tests/btrfs/293.out |  2 ++
>   2 files changed, 74 insertions(+)
>   create mode 100755 tests/btrfs/293
>   create mode 100644 tests/btrfs/293.out
> 
> diff --git a/tests/btrfs/293 b/tests/btrfs/293
> new file mode 100755
> index 00000000..68312682
> --- /dev/null
> +++ b/tests/btrfs/293
> @@ -0,0 +1,72 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2023 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 293
> +#
> +# Test btrfs write behavior with large RAID56 chunks (size beyond 4G).
> +#
> +. ./common/preamble
> +_begin_fstest auto raid volume
> +
> +# Import common functions.
> +. ./common/filter
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_require_scratch_dev_pool 8
> +_fixed_by_kernel_commit a7299a18a179 \
> +	"btrfs: fix u32 overflows when left shifting @stripe_nr"
> +_fixed_by_kernel_commit xxxxxxxxxxxx \
> +	"btrfs: use a dedicated helper to convert stripe_nr to offset"
> +
> +_scratch_dev_pool_get 8
> +
> +datasize=$((5 * 1024 * 1024 * 1024))
> +
> +
> +workload()
> +{
> +	local data_profile=$1
> +
> +	_scratch_pool_mkfs -m raid1 -d $data_profile >> $seqres.full 2>&1
> +	_scratch_mount
> +	$XFS_IO_PROG -f -c "pwrite -b 1m 0 $datasize" $SCRATCH_MNT/foobar > /dev/null
> +
> +	# Unpatched kernel would trigger an ASSERT() or crash at writeback.
> +	sync
> +
> +	echo "=== With initial 5G data written ($data_profile) ===" >> $seqres.full
> +	$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
> +	_scratch_unmount
> +
> +	# Make sure we haven't corrupted anything.
> +	$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
> +	if [ $? -ne 0 ]; then
> +		_scratch_dev_pool_put
> +		_fail "data corruption detected after initial data filling"
> +	fi
> +}
> +
> +# Make sure each device has at least 2G.
> +# Btrfs has a limits on per-device stripe length of 1G.
> +# Double that so that we can ensure a RAID6 data chunk with 6G size.
> +for i in $SCRATCH_DEV_POOL; do
> +	devsize=$(blockdev --getsize64 "$i")
> +	if [ $devsize -lt $((2 * 1024 * 1024 * 1024)) ]; then
> +		_scratch_dev_pool_put
> +		_notrun "device $i is too small, need at least 2G"
> +	fi
> +done
> +
> +workload raid5
> +workload raid6
> +
> +_scratch_dev_pool_put
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/293.out b/tests/btrfs/293.out
> new file mode 100644
> index 00000000..076fc056
> --- /dev/null
> +++ b/tests/btrfs/293.out
> @@ -0,0 +1,2 @@
> +QA output created by 293
> +Silence is golden
Naohiro Aota July 24, 2023, 1:23 a.m. UTC | #5
On Thu, Jun 22, 2023 at 02:54:38PM +0800, Qu Wenruo wrote:
> There is a recent regression during v6.4 merge window, that a u32 left
> shift overflow can cause problems with large data chunks (over 4G)
> sized.
> 
> This is especially nasty for RAID56, which can lead to ASSERT() during
> regular writes, or corrupt memory if CONFIG_BTRFS_ASSERT is not enabled.
> 
> This is the regression test case for it.
> 
> Unlike btrfs/292, btrfs doesn't support trim inside RAID56 chunks, thus
> the workflow is simplified:
> 
> - Create a RAID5 or RAID6 data chunk during mkfs
> 
> - Fill the fs with 5G data and sync
>   For unpatched kernel, the sync would crash the kernel.
> 
> - Make sure everything is fine
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  tests/btrfs/293     | 72 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/293.out |  2 ++
>  2 files changed, 74 insertions(+)
>  create mode 100755 tests/btrfs/293
>  create mode 100644 tests/btrfs/293.out
> 
> diff --git a/tests/btrfs/293 b/tests/btrfs/293
> new file mode 100755
> index 00000000..68312682
> --- /dev/null
> +++ b/tests/btrfs/293
> @@ -0,0 +1,72 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2023 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 293
> +#
> +# Test btrfs write behavior with large RAID56 chunks (size beyond 4G).
> +#
> +. ./common/preamble
> +_begin_fstest auto raid volume
> +
> +# Import common functions.
> +. ./common/filter
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_require_scratch_dev_pool 8

Could you add following lines?

# RAID5/6 not working on zoned mode.
_require_non_zoned_device "${SCRATCH_DEV}"

With that, looks good to me.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>

> +_fixed_by_kernel_commit a7299a18a179 \
> +	"btrfs: fix u32 overflows when left shifting @stripe_nr"
> +_fixed_by_kernel_commit xxxxxxxxxxxx \
> +	"btrfs: use a dedicated helper to convert stripe_nr to offset"
> +
> +_scratch_dev_pool_get 8
> +
> +datasize=$((5 * 1024 * 1024 * 1024))
> +
> +
> +workload()
> +{
> +	local data_profile=$1
> +
> +	_scratch_pool_mkfs -m raid1 -d $data_profile >> $seqres.full 2>&1
> +	_scratch_mount
> +	$XFS_IO_PROG -f -c "pwrite -b 1m 0 $datasize" $SCRATCH_MNT/foobar > /dev/null
> +
> +	# Unpatched kernel would trigger an ASSERT() or crash at writeback.
> +	sync
> +
> +	echo "=== With initial 5G data written ($data_profile) ===" >> $seqres.full
> +	$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
> +	_scratch_unmount
> +
> +	# Make sure we haven't corrupted anything.
> +	$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
> +	if [ $? -ne 0 ]; then
> +		_scratch_dev_pool_put
> +		_fail "data corruption detected after initial data filling"
> +	fi
> +}
> +
> +# Make sure each device has at least 2G.
> +# Btrfs has a limits on per-device stripe length of 1G.
> +# Double that so that we can ensure a RAID6 data chunk with 6G size.
> +for i in $SCRATCH_DEV_POOL; do
> +	devsize=$(blockdev --getsize64 "$i")
> +	if [ $devsize -lt $((2 * 1024 * 1024 * 1024)) ]; then
> +		_scratch_dev_pool_put
> +		_notrun "device $i is too small, need at least 2G"
> +	fi
> +done
> +
> +workload raid5
> +workload raid6
> +
> +_scratch_dev_pool_put
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/293.out b/tests/btrfs/293.out
> new file mode 100644
> index 00000000..076fc056
> --- /dev/null
> +++ b/tests/btrfs/293.out
> @@ -0,0 +1,2 @@
> +QA output created by 293
> +Silence is golden
> -- 
> 2.39.0
>
Zorro Lang July 24, 2023, 4:53 a.m. UTC | #6
On Mon, Jul 24, 2023 at 01:23:06AM +0000, Naohiro Aota wrote:
> On Thu, Jun 22, 2023 at 02:54:38PM +0800, Qu Wenruo wrote:
> > There is a recent regression during v6.4 merge window, that a u32 left
> > shift overflow can cause problems with large data chunks (over 4G)
> > sized.
> > 
> > This is especially nasty for RAID56, which can lead to ASSERT() during
> > regular writes, or corrupt memory if CONFIG_BTRFS_ASSERT is not enabled.
> > 
> > This is the regression test case for it.
> > 
> > Unlike btrfs/292, btrfs doesn't support trim inside RAID56 chunks, thus
> > the workflow is simplified:
> > 
> > - Create a RAID5 or RAID6 data chunk during mkfs
> > 
> > - Fill the fs with 5G data and sync
> >   For unpatched kernel, the sync would crash the kernel.
> > 
> > - Make sure everything is fine
> > 
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > ---
> >  tests/btrfs/293     | 72 +++++++++++++++++++++++++++++++++++++++++++++
> >  tests/btrfs/293.out |  2 ++
> >  2 files changed, 74 insertions(+)
> >  create mode 100755 tests/btrfs/293
> >  create mode 100644 tests/btrfs/293.out
> > 
> > diff --git a/tests/btrfs/293 b/tests/btrfs/293
> > new file mode 100755
> > index 00000000..68312682
> > --- /dev/null
> > +++ b/tests/btrfs/293
> > @@ -0,0 +1,72 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (C) 2023 SUSE Linux Products GmbH. All Rights Reserved.
> > +#
> > +# FS QA Test 293
> > +#
> > +# Test btrfs write behavior with large RAID56 chunks (size beyond 4G).
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto raid volume
> > +
> > +# Import common functions.
> > +. ./common/filter
> > +
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> > +_supported_fs btrfs
> > +_require_scratch_dev_pool 8
> 
> Could you add following lines?
> 
> # RAID5/6 not working on zoned mode.
> _require_non_zoned_device "${SCRATCH_DEV}"
> 
> With that, looks good to me.

Hi Naohiro,

This patch has been merged last weekend. If you feel this line is needed, please
send another patch to 'fix' it.

Thanks,
Zorro

> 
> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
> 
> > +_fixed_by_kernel_commit a7299a18a179 \
> > +	"btrfs: fix u32 overflows when left shifting @stripe_nr"
> > +_fixed_by_kernel_commit xxxxxxxxxxxx \
> > +	"btrfs: use a dedicated helper to convert stripe_nr to offset"
> > +
> > +_scratch_dev_pool_get 8
> > +
> > +datasize=$((5 * 1024 * 1024 * 1024))
> > +
> > +
> > +workload()
> > +{
> > +	local data_profile=$1
> > +
> > +	_scratch_pool_mkfs -m raid1 -d $data_profile >> $seqres.full 2>&1
> > +	_scratch_mount
> > +	$XFS_IO_PROG -f -c "pwrite -b 1m 0 $datasize" $SCRATCH_MNT/foobar > /dev/null
> > +
> > +	# Unpatched kernel would trigger an ASSERT() or crash at writeback.
> > +	sync
> > +
> > +	echo "=== With initial 5G data written ($data_profile) ===" >> $seqres.full
> > +	$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
> > +	_scratch_unmount
> > +
> > +	# Make sure we haven't corrupted anything.
> > +	$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
> > +	if [ $? -ne 0 ]; then
> > +		_scratch_dev_pool_put
> > +		_fail "data corruption detected after initial data filling"
> > +	fi
> > +}
> > +
> > +# Make sure each device has at least 2G.
> > +# Btrfs has a limits on per-device stripe length of 1G.
> > +# Double that so that we can ensure a RAID6 data chunk with 6G size.
> > +for i in $SCRATCH_DEV_POOL; do
> > +	devsize=$(blockdev --getsize64 "$i")
> > +	if [ $devsize -lt $((2 * 1024 * 1024 * 1024)) ]; then
> > +		_scratch_dev_pool_put
> > +		_notrun "device $i is too small, need at least 2G"
> > +	fi
> > +done
> > +
> > +workload raid5
> > +workload raid6
> > +
> > +_scratch_dev_pool_put
> > +echo "Silence is golden"
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/btrfs/293.out b/tests/btrfs/293.out
> > new file mode 100644
> > index 00000000..076fc056
> > --- /dev/null
> > +++ b/tests/btrfs/293.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 293
> > +Silence is golden
> > -- 
> > 2.39.0
> > 
>
diff mbox series

Patch

diff --git a/tests/btrfs/293 b/tests/btrfs/293
new file mode 100755
index 00000000..68312682
--- /dev/null
+++ b/tests/btrfs/293
@@ -0,0 +1,72 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2023 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 293
+#
+# Test btrfs write behavior with large RAID56 chunks (size beyond 4G).
+#
+. ./common/preamble
+_begin_fstest auto raid volume
+
+# Import common functions.
+. ./common/filter
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_require_scratch_dev_pool 8
+_fixed_by_kernel_commit a7299a18a179 \
+	"btrfs: fix u32 overflows when left shifting @stripe_nr"
+_fixed_by_kernel_commit xxxxxxxxxxxx \
+	"btrfs: use a dedicated helper to convert stripe_nr to offset"
+
+_scratch_dev_pool_get 8
+
+datasize=$((5 * 1024 * 1024 * 1024))
+
+
+workload()
+{
+	local data_profile=$1
+
+	_scratch_pool_mkfs -m raid1 -d $data_profile >> $seqres.full 2>&1
+	_scratch_mount
+	$XFS_IO_PROG -f -c "pwrite -b 1m 0 $datasize" $SCRATCH_MNT/foobar > /dev/null
+
+	# Unpatched kernel would trigger an ASSERT() or crash at writeback.
+	sync
+
+	echo "=== With initial 5G data written ($data_profile) ===" >> $seqres.full
+	$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
+	_scratch_unmount
+
+	# Make sure we haven't corrupted anything.
+	$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
+	if [ $? -ne 0 ]; then
+		_scratch_dev_pool_put
+		_fail "data corruption detected after initial data filling"
+	fi
+}
+
+# Make sure each device has at least 2G.
+# Btrfs has a limits on per-device stripe length of 1G.
+# Double that so that we can ensure a RAID6 data chunk with 6G size.
+for i in $SCRATCH_DEV_POOL; do
+	devsize=$(blockdev --getsize64 "$i")
+	if [ $devsize -lt $((2 * 1024 * 1024 * 1024)) ]; then
+		_scratch_dev_pool_put
+		_notrun "device $i is too small, need at least 2G"
+	fi
+done
+
+workload raid5
+workload raid6
+
+_scratch_dev_pool_put
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/293.out b/tests/btrfs/293.out
new file mode 100644
index 00000000..076fc056
--- /dev/null
+++ b/tests/btrfs/293.out
@@ -0,0 +1,2 @@ 
+QA output created by 293
+Silence is golden