Message ID | 20240824103021.264856-1-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] fstests: btrfs: a new test case to verify a use-after-free bug | expand |
On Sat, Aug 24, 2024 at 11:31 AM Qu Wenruo <wqu@suse.com> wrote: > > [BUG] > There is a use-after-free bug triggered very randomly by btrfs/125. > > If KASAN is enabled it can be triggered on certain setup. > Or it can lead to crash. > > [CAUSE] > The test case btrfs/125 is using RAID5 for metadata, which has a known > RMW problem if the there is some corruption on-disk. > > RMW will use the corrupted contents to generate a new parity, losing the > final chance to rebuild the contents. > > This is specific to metadata, as for data we have extra data checksum, > but the metadata has extra problems like possible deadlock due to the > extra metadata read/recovery needed to search the extent tree. > > And we know this problem for a while but without a better solution other > than avoid using RAID56 for metadata: > > > Metadata > > Do not use raid5 nor raid6 for metadata. Use raid1 or raid1c3 > > respectively. > > Combined with the above csum tree corruption, since RAID5 is stripe > based, btrfs needs to split its read bios according to stripe boundary. > And after a split, do a csum tree lookup for the expected csum. This is way too much unrelated stuff. The problem may have been triggered sporadically by btrfs/125, but there's no need to go into details on raid5 problems in btrfs. Even because the use-after-free bug can be triggered without raid5, just using raid0 like in the test case introduced by this patch. > > But if that csum lookup failed, in the error path btrfs doesn't handle > the split bios properly and lead to double freeing of the original bio > (the one containing the bio vectors). > > [NEW TEST CASE] > Unlike the original btrfs/125, which is very random and picky to > reproduce, introduce a new test case to verify the specific behavior by: > > - Create a btrfs with enough csum leaves > To bump the csum tree level, use the minimal nodesize possible (4K). > Writing 32M data which needs at least 8 leaves for data checksum > > - Find the last csum tree leave and corrupt it > > - Read the data many times until we trigger the bug or exit gracefully > With an x86_64 VM (which is never able to trigger btrfs/125 failure) > with KASAN enabled, it can trigger the KASAN report in just 4 > iterations (the default iteration number is 32). > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > Changelog: > v2: > - Fix the wrong commit hash > The proper fix is not yet merged, the old hash is a place holder > copied from another test case but forgot to remove. > > - Minor wording update > > - Add to "dangerous" group > --- > tests/btrfs/319 | 84 +++++++++++++++++++++++++++++++++++++++++++++ > tests/btrfs/319.out | 2 ++ > 2 files changed, 86 insertions(+) > create mode 100755 tests/btrfs/319 > create mode 100644 tests/btrfs/319.out > > diff --git a/tests/btrfs/319 b/tests/btrfs/319 > new file mode 100755 > index 00000000..4be2b50b > --- /dev/null > +++ b/tests/btrfs/319 There's already a btrfs/319 test case in for-next btw. This needs to be renumbered. > @@ -0,0 +1,84 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved. > +# > +# FS QA Test 319 > +# > +# Make sure data csum lookup failure will not lead to double bio freeing > +# > +. ./common/preamble > +_begin_fstest auto quick dangerous Missing the "raid" group. > + > +_require_scratch > +_fixed_by_kernel_commit xxxxxxxxxxxx \ > + "btrfs: fix a use-after-free bug when hitting errors inside btrfs_submit_chunk()" > + > +# The final fs will have a corrupted csum tree, which will never pass fsck > +_require_scratch_nocheck > +_require_scratch_dev_pool 2 > + > +# Use RAID0 for data to get bio splitted according to stripe boundary. > +# This is required to trigger the bug. > +_check_btrfs_raid_type raid0 > + > +# This test goes 4K sectorsize and 4K nodesize, so that we easily create > +# higher level of csum tree. > +_require_btrfs_support_sectorsize 4096 > + > +# The bug itself has a race window, run this many times to ensure triggering. > +# On an x86_64 VM with KASAN enabled, normally it is triggered before the 10th run. > +runtime=32 This is a confusing name because it actually means iterations in the for loop below, and not a time duration. I would suggest renaming it to "iterations" for example, or just get rid of it since it's only used in the for loop's condition. > + > +_scratch_pool_mkfs "-d raid0 -m single -n 4k -s 4k" >> $seqres.full 2>&1 > +# This test requires data checksum to trigger the bug. > +_scratch_mount -o datasum,datacow > + > +# For the smallest csum size (CRC32C) it's 4 bytes per 4K, writing 32M data > +# will need 32K data checksum at minimal, which is at least 8 leaves. > +_pwrite_byte 0xef 0 32m "$SCRATCH_MNT/foobar" > /dev/null > +sync What's this sync for? We just do a unmount right after it, which makes it pointless and confusing. > +_scratch_unmount > + > +# Search for the last leaf of the csum tree, that will be the target to destroy. > +$BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV >> $seqres.full Please use the full command name: inspect -> inspect-internal > +target_bytenr=$($BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV | grep "leaf.*flags" | sort | tail -n1 | cut -f2 -d\ ) Same here. Also, missing at the top a: _require_btrfs_command inspect-internal dump-tree Also we're passing the symbolic name "csum" to -t, which not all versions of btrfs-progs support. The support was added in btrfs-progs 4.5 (commit 69874af7b81519e40db9d92efa6beebee4220c63). > + > +if [ -z "$target_bytenr" ]; then > + _fail "unable to locate the last csum tree leave" leave -> leaf > +fi > + > +echo "bytenr of csum tree leave to corrupt: $target_bytenr" >> $seqres.full leave -> leaf > + > +# Corrupt that csum tree block. > +physical=$(_btrfs_get_physical "$target_bytenr" 1) > +dev=$(_btrfs_get_device_path "$target_bytenr" 1) > + > +echo "physical bytenr: $physical" >> $seqres.full > +echo "physical device: $dev" >> $seqres.full > + > +_pwrite_byte 0x00 "$physical" 4 "$dev" > /dev/null > + > +for (( i = 0; i < $runtime; i++ )); do > + echo "=== run $i/$runtime ===" >> $seqres.full > + _scratch_mount -o ro > + # Since the data is on RAID0, read bios will be split at the stripe > + # (64K sized) boundary. If csum lookup failed due to corrupted csum > + # tree, there is a race window that can lead to double bio freeing > + # (triggering KASAN at least). > + cat "$SCRATCH_MNT/foobar" &> /dev/null > + _scratch_unmount > + > + # Manually check the dmesg for "BUG", and do not call _check_dmesg() > + # as it will clear 'check_dmesg' file and skips the final check after > + # the test. > + # For now just focus on the "BUG:" line from KASAN. > + if _check_dmesg_for "BUG" ; then > + _fail "Critical error(s) found in dmesg" > + fi Why do the check manually? The check script calls _check_dmesg when a test finishes and if it finds 'BUG:' there, it will make the test fail. So there's no need to do the unmount and call _check_dmesg_for. Thanks. > +done > + > +echo "Silence is golden" > + > +# success, all done > +status=0 > +exit > diff --git a/tests/btrfs/319.out b/tests/btrfs/319.out > new file mode 100644 > index 00000000..d40c929a > --- /dev/null > +++ b/tests/btrfs/319.out > @@ -0,0 +1,2 @@ > +QA output created by 319 > +Silence is golden > -- > 2.46.0 > >
On Mon, Aug 26, 2024 at 1:14 PM Filipe Manana <fdmanana@kernel.org> wrote: > > On Sat, Aug 24, 2024 at 11:31 AM Qu Wenruo <wqu@suse.com> wrote: > > > > [BUG] > > There is a use-after-free bug triggered very randomly by btrfs/125. > > > > If KASAN is enabled it can be triggered on certain setup. > > Or it can lead to crash. > > > > [CAUSE] > > The test case btrfs/125 is using RAID5 for metadata, which has a known > > RMW problem if the there is some corruption on-disk. > > > > RMW will use the corrupted contents to generate a new parity, losing the > > final chance to rebuild the contents. > > > > This is specific to metadata, as for data we have extra data checksum, > > but the metadata has extra problems like possible deadlock due to the > > extra metadata read/recovery needed to search the extent tree. > > > > And we know this problem for a while but without a better solution other > > than avoid using RAID56 for metadata: > > > > > Metadata > > > Do not use raid5 nor raid6 for metadata. Use raid1 or raid1c3 > > > respectively. > > > > Combined with the above csum tree corruption, since RAID5 is stripe > > based, btrfs needs to split its read bios according to stripe boundary. > > And after a split, do a csum tree lookup for the expected csum. > > This is way too much unrelated stuff. > The problem may have been triggered sporadically by btrfs/125, but > there's no need to go into details on raid5 problems in btrfs. > Even because the use-after-free bug can be triggered without raid5, > just using raid0 like in the test case introduced by this patch. > > > > > But if that csum lookup failed, in the error path btrfs doesn't handle > > the split bios properly and lead to double freeing of the original bio > > (the one containing the bio vectors). > > > > [NEW TEST CASE] > > Unlike the original btrfs/125, which is very random and picky to > > reproduce, introduce a new test case to verify the specific behavior by: > > > > - Create a btrfs with enough csum leaves > > To bump the csum tree level, use the minimal nodesize possible (4K). > > Writing 32M data which needs at least 8 leaves for data checksum > > > > - Find the last csum tree leave and corrupt it > > > > - Read the data many times until we trigger the bug or exit gracefully > > With an x86_64 VM (which is never able to trigger btrfs/125 failure) > > with KASAN enabled, it can trigger the KASAN report in just 4 > > iterations (the default iteration number is 32). > > > > Signed-off-by: Qu Wenruo <wqu@suse.com> > > --- > > Changelog: > > v2: > > - Fix the wrong commit hash > > The proper fix is not yet merged, the old hash is a place holder > > copied from another test case but forgot to remove. > > > > - Minor wording update > > > > - Add to "dangerous" group > > --- > > tests/btrfs/319 | 84 +++++++++++++++++++++++++++++++++++++++++++++ > > tests/btrfs/319.out | 2 ++ > > 2 files changed, 86 insertions(+) > > create mode 100755 tests/btrfs/319 > > create mode 100644 tests/btrfs/319.out > > > > diff --git a/tests/btrfs/319 b/tests/btrfs/319 > > new file mode 100755 > > index 00000000..4be2b50b > > --- /dev/null > > +++ b/tests/btrfs/319 > > There's already a btrfs/319 test case in for-next btw. This needs to > be renumbered. > > > @@ -0,0 +1,84 @@ > > +#! /bin/bash > > +# SPDX-License-Identifier: GPL-2.0 > > +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved. > > +# > > +# FS QA Test 319 > > +# > > +# Make sure data csum lookup failure will not lead to double bio freeing > > +# > > +. ./common/preamble > > +_begin_fstest auto quick dangerous > > Missing the "raid" group. > > > + > > +_require_scratch Also this is not needed, because below we do a call to: _require_scratch_nocheck Finally the subject could be more specific like for example: "btrfs: test reading data with a corrupted checksum tree leaf" Since the test serves to check there are no use-after-free, crashes, deadlocks, etc, when reading data which has its checksums in a corrupted csum tree block. Thanks. > > +_fixed_by_kernel_commit xxxxxxxxxxxx \ > > + "btrfs: fix a use-after-free bug when hitting errors inside btrfs_submit_chunk()" > > + > > +# The final fs will have a corrupted csum tree, which will never pass fsck > > +_require_scratch_nocheck > > +_require_scratch_dev_pool 2 > > + > > +# Use RAID0 for data to get bio splitted according to stripe boundary. > > +# This is required to trigger the bug. > > +_check_btrfs_raid_type raid0 > > + > > +# This test goes 4K sectorsize and 4K nodesize, so that we easily create > > +# higher level of csum tree. > > +_require_btrfs_support_sectorsize 4096 > > + > > +# The bug itself has a race window, run this many times to ensure triggering. > > +# On an x86_64 VM with KASAN enabled, normally it is triggered before the 10th run. > > +runtime=32 > > This is a confusing name because it actually means iterations in the > for loop below, and not a time duration. > I would suggest renaming it to "iterations" for example, or just get > rid of it since it's only used in the for loop's condition. > > > + > > +_scratch_pool_mkfs "-d raid0 -m single -n 4k -s 4k" >> $seqres.full 2>&1 > > +# This test requires data checksum to trigger the bug. > > +_scratch_mount -o datasum,datacow > > + > > +# For the smallest csum size (CRC32C) it's 4 bytes per 4K, writing 32M data > > +# will need 32K data checksum at minimal, which is at least 8 leaves. > > +_pwrite_byte 0xef 0 32m "$SCRATCH_MNT/foobar" > /dev/null > > +sync > > What's this sync for? > We just do a unmount right after it, which makes it pointless and confusing. > > > +_scratch_unmount > > + > > +# Search for the last leaf of the csum tree, that will be the target to destroy. > > +$BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV >> $seqres.full > > Please use the full command name: inspect -> inspect-internal > > > +target_bytenr=$($BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV | grep "leaf.*flags" | sort | tail -n1 | cut -f2 -d\ ) > > Same here. > > Also, missing at the top a: > > _require_btrfs_command inspect-internal dump-tree > > Also we're passing the symbolic name "csum" to -t, which not all > versions of btrfs-progs support. > The support was added in btrfs-progs 4.5 (commit > 69874af7b81519e40db9d92efa6beebee4220c63). > > > + > > +if [ -z "$target_bytenr" ]; then > > + _fail "unable to locate the last csum tree leave" > > leave -> leaf > > > +fi > > + > > +echo "bytenr of csum tree leave to corrupt: $target_bytenr" >> $seqres.full > > leave -> leaf > > > + > > +# Corrupt that csum tree block. > > +physical=$(_btrfs_get_physical "$target_bytenr" 1) > > +dev=$(_btrfs_get_device_path "$target_bytenr" 1) > > + > > +echo "physical bytenr: $physical" >> $seqres.full > > +echo "physical device: $dev" >> $seqres.full > > + > > +_pwrite_byte 0x00 "$physical" 4 "$dev" > /dev/null > > + > > +for (( i = 0; i < $runtime; i++ )); do > > + echo "=== run $i/$runtime ===" >> $seqres.full > > + _scratch_mount -o ro > > + # Since the data is on RAID0, read bios will be split at the stripe > > + # (64K sized) boundary. If csum lookup failed due to corrupted csum > > + # tree, there is a race window that can lead to double bio freeing > > + # (triggering KASAN at least). > > + cat "$SCRATCH_MNT/foobar" &> /dev/null > > + _scratch_unmount > > + > > + # Manually check the dmesg for "BUG", and do not call _check_dmesg() > > + # as it will clear 'check_dmesg' file and skips the final check after > > + # the test. > > + # For now just focus on the "BUG:" line from KASAN. > > + if _check_dmesg_for "BUG" ; then > > + _fail "Critical error(s) found in dmesg" > > + fi > > Why do the check manually? The check script calls _check_dmesg when a > test finishes and if it finds 'BUG:' there, it will make the test > fail. > So there's no need to do the unmount and call _check_dmesg_for. > > Thanks. > > > +done > > + > > +echo "Silence is golden" > > + > > +# success, all done > > +status=0 > > +exit > > diff --git a/tests/btrfs/319.out b/tests/btrfs/319.out > > new file mode 100644 > > index 00000000..d40c929a > > --- /dev/null > > +++ b/tests/btrfs/319.out > > @@ -0,0 +1,2 @@ > > +QA output created by 319 > > +Silence is golden > > -- > > 2.46.0 > > > >
On 24/8/24 6:30 pm, Qu Wenruo wrote: > [BUG] > There is a use-after-free bug triggered very randomly by btrfs/125. > > If KASAN is enabled it can be triggered on certain setup. > Or it can lead to crash. > > [CAUSE] > The test case btrfs/125 is using RAID5 for metadata, which has a known > RMW problem if the there is some corruption on-disk. > > RMW will use the corrupted contents to generate a new parity, losing the > final chance to rebuild the contents. > > This is specific to metadata, as for data we have extra data checksum, > but the metadata has extra problems like possible deadlock due to the > extra metadata read/recovery needed to search the extent tree. > > And we know this problem for a while but without a better solution other > than avoid using RAID56 for metadata: > >> Metadata >> Do not use raid5 nor raid6 for metadata. Use raid1 or raid1c3 >> respectively. > > Combined with the above csum tree corruption, since RAID5 is stripe > based, btrfs needs to split its read bios according to stripe boundary. > And after a split, do a csum tree lookup for the expected csum. > > But if that csum lookup failed, in the error path btrfs doesn't handle > the split bios properly and lead to double freeing of the original bio > (the one containing the bio vectors). > > [NEW TEST CASE] > Unlike the original btrfs/125, which is very random and picky to > reproduce, introduce a new test case to verify the specific behavior by: > > - Create a btrfs with enough csum leaves > To bump the csum tree level, use the minimal nodesize possible (4K). > Writing 32M data which needs at least 8 leaves for data checksum > > - Find the last csum tree leave and corrupt it > > - Read the data many times until we trigger the bug or exit gracefully > With an x86_64 VM (which is never able to trigger btrfs/125 failure) > with KASAN enabled, it can trigger the KASAN report in just 4 > iterations (the default iteration number is 32). > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > Changelog: > v2: > - Fix the wrong commit hash > The proper fix is not yet merged, the old hash is a place holder > copied from another test case but forgot to remove. > > - Minor wording update > > - Add to "dangerous" group > --- > tests/btrfs/319 | 84 +++++++++++++++++++++++++++++++++++++++++++++ > tests/btrfs/319.out | 2 ++ > 2 files changed, 86 insertions(+) > create mode 100755 tests/btrfs/319 > create mode 100644 tests/btrfs/319.out > > diff --git a/tests/btrfs/319 b/tests/btrfs/319 > new file mode 100755 > index 00000000..4be2b50b > --- /dev/null > +++ b/tests/btrfs/319 > @@ -0,0 +1,84 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved. > +# > +# FS QA Test 319 > +# > +# Make sure data csum lookup failure will not lead to double bio freeing > +# > +. ./common/preamble > +_begin_fstest auto quick dangerous > + > +_require_scratch > +_fixed_by_kernel_commit xxxxxxxxxxxx \ > + "btrfs: fix a use-after-free bug when hitting errors inside btrfs_submit_chunk()" > + > +# The final fs will have a corrupted csum tree, which will never pass fsck > +_require_scratch_nocheck > +_require_scratch_dev_pool 2 > + > +# Use RAID0 for data to get bio splitted according to stripe boundary. > +# This is required to trigger the bug. > +_check_btrfs_raid_type raid0 Did you mean to use _require_btrfs_raid_type(raid0)? Otherwise, the line has no effect since you're not checking _check_btrfs_raid_type's return value. The rest looks good. Thx. Anand > + > +# This test goes 4K sectorsize and 4K nodesize, so that we easily create > +# higher level of csum tree. > +_require_btrfs_support_sectorsize 4096 > + > +# The bug itself has a race window, run this many times to ensure triggering. > +# On an x86_64 VM with KASAN enabled, normally it is triggered before the 10th run. > +runtime=32 > + > +_scratch_pool_mkfs "-d raid0 -m single -n 4k -s 4k" >> $seqres.full 2>&1 > +# This test requires data checksum to trigger the bug. > +_scratch_mount -o datasum,datacow > + > +# For the smallest csum size (CRC32C) it's 4 bytes per 4K, writing 32M data > +# will need 32K data checksum at minimal, which is at least 8 leaves. > +_pwrite_byte 0xef 0 32m "$SCRATCH_MNT/foobar" > /dev/null > +sync > +_scratch_unmount > + > +# Search for the last leaf of the csum tree, that will be the target to destroy. > +$BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV >> $seqres.full > +target_bytenr=$($BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV | grep "leaf.*flags" | sort | tail -n1 | cut -f2 -d\ ) > + > +if [ -z "$target_bytenr" ]; then > + _fail "unable to locate the last csum tree leave" > +fi > + > +echo "bytenr of csum tree leave to corrupt: $target_bytenr" >> $seqres.full > + > +# Corrupt that csum tree block. > +physical=$(_btrfs_get_physical "$target_bytenr" 1) > +dev=$(_btrfs_get_device_path "$target_bytenr" 1) > + > +echo "physical bytenr: $physical" >> $seqres.full > +echo "physical device: $dev" >> $seqres.full > + > +_pwrite_byte 0x00 "$physical" 4 "$dev" > /dev/null > + > +for (( i = 0; i < $runtime; i++ )); do > + echo "=== run $i/$runtime ===" >> $seqres.full > + _scratch_mount -o ro > + # Since the data is on RAID0, read bios will be split at the stripe > + # (64K sized) boundary. If csum lookup failed due to corrupted csum > + # tree, there is a race window that can lead to double bio freeing > + # (triggering KASAN at least). > + cat "$SCRATCH_MNT/foobar" &> /dev/null > + _scratch_unmount > + > + # Manually check the dmesg for "BUG", and do not call _check_dmesg() > + # as it will clear 'check_dmesg' file and skips the final check after > + # the test. > + # For now just focus on the "BUG:" line from KASAN. > + if _check_dmesg_for "BUG" ; then > + _fail "Critical error(s) found in dmesg" > + fi > +done > + > +echo "Silence is golden" > + > +# success, all done > +status=0 > +exit > diff --git a/tests/btrfs/319.out b/tests/btrfs/319.out > new file mode 100644 > index 00000000..d40c929a > --- /dev/null > +++ b/tests/btrfs/319.out > @@ -0,0 +1,2 @@ > +QA output created by 319 > +Silence is golden
在 2024/8/26 21:44, Filipe Manana 写道: [...] >> + # Manually check the dmesg for "BUG", and do not call _check_dmesg() >> + # as it will clear 'check_dmesg' file and skips the final check after >> + # the test. >> + # For now just focus on the "BUG:" line from KASAN. >> + if _check_dmesg_for "BUG" ; then >> + _fail "Critical error(s) found in dmesg" >> + fi > > Why do the check manually? The check script calls _check_dmesg when a > test finishes and if it finds 'BUG:' there, it will make the test > fail. > So there's no need to do the unmount and call _check_dmesg_for. The main reasons are: 1. Fail the test as soon as possible Mostly to let the dmesg to contain only the failed iteration. I found it especially useful to stop immediately to inspect the dmesg, during the verification of my fix. This doesn't make too much difference for routine QA runs, but in the future if some regression happened and one (maybe myself) is going to investigate the failure, this early exit will make life much easier. 2. To avoid too small dmesg buffer Since each iteration will trigger some error message due to the corrupted tree blocks, combined with the 32 runs, I'm not that confident all dmesg buffer size can save all the dmesg. Hopes this explains the reasons well. Thanks, Qu > > Thanks. > >> +done >> + >> +echo "Silence is golden" >> + >> +# success, all done >> +status=0 >> +exit >> diff --git a/tests/btrfs/319.out b/tests/btrfs/319.out >> new file mode 100644 >> index 00000000..d40c929a >> --- /dev/null >> +++ b/tests/btrfs/319.out >> @@ -0,0 +1,2 @@ >> +QA output created by 319 >> +Silence is golden >> -- >> 2.46.0 >> >> >
diff --git a/tests/btrfs/319 b/tests/btrfs/319 new file mode 100755 index 00000000..4be2b50b --- /dev/null +++ b/tests/btrfs/319 @@ -0,0 +1,84 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved. +# +# FS QA Test 319 +# +# Make sure data csum lookup failure will not lead to double bio freeing +# +. ./common/preamble +_begin_fstest auto quick dangerous + +_require_scratch +_fixed_by_kernel_commit xxxxxxxxxxxx \ + "btrfs: fix a use-after-free bug when hitting errors inside btrfs_submit_chunk()" + +# The final fs will have a corrupted csum tree, which will never pass fsck +_require_scratch_nocheck +_require_scratch_dev_pool 2 + +# Use RAID0 for data to get bio splitted according to stripe boundary. +# This is required to trigger the bug. +_check_btrfs_raid_type raid0 + +# This test goes 4K sectorsize and 4K nodesize, so that we easily create +# higher level of csum tree. +_require_btrfs_support_sectorsize 4096 + +# The bug itself has a race window, run this many times to ensure triggering. +# On an x86_64 VM with KASAN enabled, normally it is triggered before the 10th run. +runtime=32 + +_scratch_pool_mkfs "-d raid0 -m single -n 4k -s 4k" >> $seqres.full 2>&1 +# This test requires data checksum to trigger the bug. +_scratch_mount -o datasum,datacow + +# For the smallest csum size (CRC32C) it's 4 bytes per 4K, writing 32M data +# will need 32K data checksum at minimal, which is at least 8 leaves. +_pwrite_byte 0xef 0 32m "$SCRATCH_MNT/foobar" > /dev/null +sync +_scratch_unmount + +# Search for the last leaf of the csum tree, that will be the target to destroy. +$BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV >> $seqres.full +target_bytenr=$($BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV | grep "leaf.*flags" | sort | tail -n1 | cut -f2 -d\ ) + +if [ -z "$target_bytenr" ]; then + _fail "unable to locate the last csum tree leave" +fi + +echo "bytenr of csum tree leave to corrupt: $target_bytenr" >> $seqres.full + +# Corrupt that csum tree block. +physical=$(_btrfs_get_physical "$target_bytenr" 1) +dev=$(_btrfs_get_device_path "$target_bytenr" 1) + +echo "physical bytenr: $physical" >> $seqres.full +echo "physical device: $dev" >> $seqres.full + +_pwrite_byte 0x00 "$physical" 4 "$dev" > /dev/null + +for (( i = 0; i < $runtime; i++ )); do + echo "=== run $i/$runtime ===" >> $seqres.full + _scratch_mount -o ro + # Since the data is on RAID0, read bios will be split at the stripe + # (64K sized) boundary. If csum lookup failed due to corrupted csum + # tree, there is a race window that can lead to double bio freeing + # (triggering KASAN at least). + cat "$SCRATCH_MNT/foobar" &> /dev/null + _scratch_unmount + + # Manually check the dmesg for "BUG", and do not call _check_dmesg() + # as it will clear 'check_dmesg' file and skips the final check after + # the test. + # For now just focus on the "BUG:" line from KASAN. + if _check_dmesg_for "BUG" ; then + _fail "Critical error(s) found in dmesg" + fi +done + +echo "Silence is golden" + +# success, all done +status=0 +exit diff --git a/tests/btrfs/319.out b/tests/btrfs/319.out new file mode 100644 index 00000000..d40c929a --- /dev/null +++ b/tests/btrfs/319.out @@ -0,0 +1,2 @@ +QA output created by 319 +Silence is golden