Message ID | 20240824071346.225289-1-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fstests: btrfs: a new test case to verify a use-after-free bug | expand |
On 2024/8/24 16:43, Qu Wenruo wrote:
> [BUG]
> There is a use-after-free bug triggered very randomly by btrfs/125.
>
> If KASAN is enabled it can be triggered on certain setups.
> Or it can lead to a crash.
>
> [CAUSE]
> The test case btrfs/125 is using RAID5 for metadata, which has a known
> RMW problem if there is some corruption on-disk.
>
> RMW will use the corrupted contents to generate a new parity, losing the
> final chance to rebuild the contents.
>
> This is specific to metadata; for data we have an extra data checksum,
> but metadata has extra problems like a possible deadlock due to the
> extra metadata read/recovery needed to search the extent tree.
>
> We have known about this problem for a while, without a better solution
> other than avoiding RAID56 for metadata:
>
>> Metadata
>> Do not use raid5 nor raid6 for metadata. Use raid1 or raid1c3
>> respectively.
>
> Combined with the above csum tree corruption: since RAID5 is stripe
> based, btrfs needs to split its read bios according to the stripe
> boundary, and after a split, do a csum tree lookup for the expected
> csum.
>
> But if that csum lookup fails, btrfs doesn't handle the split bios
> properly in the error path, leading to a double free of the original
> bio (the one containing the bio vectors).
>
> [NEW TEST CASE]
> Unlike the original btrfs/125, which is very random and picky to
> reproduce, introduce a new test case to verify the specific behavior by:
>
> - Creating a btrfs with enough csum leaves
>   To bump the csum tree level, use the minimal nodesize possible (4K).
>   Writing 32M of data needs at least 8 leaves for data checksums.
>
> - Finding the last csum tree leaf and corrupting it
>
> - Reading the data many times until we trigger the bug or exit
>   gracefully
>   On an x86_64 VM (which was never able to trigger the btrfs/125
>   failure) with KASAN enabled, it can trigger the KASAN report in just
>   4 iterations (the default iteration count is 32).
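[Editorial aside, not part of the patch: the sizing claim above — 32M of data needing at least 8 csum leaves at 4K nodesize — can be sanity-checked with a few lines of shell arithmetic. The sketch deliberately ignores leaf header and item overhead, so 8 is a lower bound.]

```shell
#!/bin/sh
# Sanity-check the csum sizing from the commit message:
# CRC32C stores 4 bytes of checksum per 4K data sector.
data_bytes=$((32 * 1024 * 1024))   # 32M of data written by the test
sectorsize=4096                    # -s 4k
nodesize=4096                      # -n 4k, the minimal nodesize
csum_size=4                        # CRC32C

csum_bytes=$((data_bytes / sectorsize * csum_size))
min_leaves=$((csum_bytes / nodesize))

echo "total csum bytes: $csum_bytes"     # 32768 (32K)
echo "minimum csum leaves: $min_leaves"  # 8
```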
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> NOTE: the mentioned fix (currently v3) is not good enough, will be
> updated to v4 to fully pass the new test case.

The v4 version of the fix has been submitted and can handle the test case
properly now:
https://lore.kernel.org/linux-btrfs/f4f916352ddf3f80048567ec7d8cc64cb388dc09.1724493430.git.wqu@suse.com/T/#u

Thanks,
Qu

> ---
>  tests/btrfs/319     | 92 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/319.out |  2 +
>  2 files changed, 94 insertions(+)
>  create mode 100755 tests/btrfs/319
>  create mode 100644 tests/btrfs/319.out
>
> diff --git a/tests/btrfs/319 b/tests/btrfs/319
> new file mode 100755
> index 00000000..b6aecb06
> --- /dev/null
> +++ b/tests/btrfs/319
> @@ -0,0 +1,92 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 319
> +#
> +# Make sure data csum lookup failure will not lead to double bio freeing
> +#
> +. ./common/preamble
> +_begin_fstest auto quick
> +
> +# Override the default cleanup function.
> +# _cleanup()
> +# {
> +# 	cd /
> +# 	rm -r -f $tmp.*
> +# }
> +
> +. ./common/rc
> +
> +_require_scratch
> +_fixed_by_kernel_commit d139ded8b9cd \
> +	"btrfs: fix a use-after-free bug when hitting errors inside btrfs_submit_chunk()"
> +
> +# The final fs on the scratch device will have a corrupted csum tree, which
> +# will never pass fsck.
> +_require_scratch_nocheck
> +_require_scratch_dev_pool 2
> +
> +# Use RAID0 for data to get bios split at the stripe boundary.
> +# This is required to trigger the bug.
> +_check_btrfs_raid_type raid0
> +
> +# This test uses 4K sectorsize and 4K nodesize, so that we can easily create
> +# a higher csum tree level.
> +_require_btrfs_support_sectorsize 4096
> +
> +# The bug itself has a race window; run this many times to ensure triggering.
> +# On an x86_64 VM with KASAN enabled, it can be triggered before the 10th run.
> +runtime=32
> +
> +_scratch_pool_mkfs "-d raid0 -m single -n 4k -s 4k" >> $seqres.full 2>&1
> +# This test requires data checksums to trigger a corruption.
> +_scratch_mount -o datasum,datacow
> +
> +# For the smallest csum size (CRC32C) it's 4 bytes per 4K; creating 32M of
> +# data will need 32K of data checksums, which is at least 8 leaves.
> +_pwrite_byte 0xef 0 32m "$SCRATCH_MNT/foobar" > /dev/null
> +sync
> +_scratch_unmount
> +
> +# Search for the last leaf of the csum tree; that will be the target to destroy.
> +$BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV >> $seqres.full
> +target_bytenr=$($BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV | grep "leaf.*flags" | sort | tail -n1 | cut -f2 -d\ )
> +
> +if [ -z "$target_bytenr" ]; then
> +	_fail "unable to locate the last csum tree leaf"
> +fi
> +
> +echo "bytenr of csum tree leaf to corrupt: $target_bytenr" >> $seqres.full
> +
> +# Corrupt both copies of the target.
> +physical=$(_btrfs_get_physical "$target_bytenr" 1)
> +dev=$(_btrfs_get_device_path "$target_bytenr" 1)
> +
> +echo "physical bytenr: $physical" >> $seqres.full
> +echo "physical device: $dev" >> $seqres.full
> +
> +_pwrite_byte 0x00 "$physical" 4 "$dev" > /dev/null
> +
> +for (( i = 0; i < $runtime; i++ )); do
> +	echo "=== run $i/$runtime ===" >> $seqres.full
> +	_scratch_mount -o ro
> +	# Since the data is on RAID0, read bios will be split at the stripe
> +	# (64K sized) boundary. If the csum lookup fails due to the corrupted
> +	# csum tree, there is a race window that can lead to double bio
> +	# freeing (triggering KASAN at least).
> +	cat "$SCRATCH_MNT/foobar" &> /dev/null
> +	_scratch_unmount
> +
> +	# Manually check dmesg for "BUG:", and do not call _check_dmesg()
> +	# since it will clear the 'check_dmesg' file and skip the check.
> +	if _dmesg_since_test_start | grep -q "BUG:"; then
> +		_fail "Critical error(s) found in dmesg"
> +	fi
> +done
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/319.out b/tests/btrfs/319.out
> new file mode 100644
> index 00000000..d40c929a
> --- /dev/null
> +++ b/tests/btrfs/319.out
> @@ -0,0 +1,2 @@
> +QA output created by 319
> +Silence is golden
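[Editorial aside: the `dump-tree | grep | sort | tail | cut` pipeline in the test extracts the bytenr as the second space-separated field of the last matching `leaf ... flags ...` header line. A tiny standalone illustration follows; the dump excerpt is fabricated for demonstration — real `btrfs inspect dump-tree` output has more fields — and the lexical `sort` picks the highest bytenr only while the numbers have equal width.]

```shell
#!/bin/sh
# Fabricated two-leaf excerpt standing in for `btrfs inspect dump-tree -t csum`
# output; only the "leaf <bytenr> flags" header lines matter here.
dump='leaf 30572544 flags 0x1(WRITTEN) backref revision 1
leaf 30720000 flags 0x1(WRITTEN) backref revision 1'

# Same extraction the test performs: keep the leaf header lines, take the
# last one after sorting, and cut out the second field (the bytenr).
target_bytenr=$(printf '%s\n' "$dump" | grep "leaf.*flags" | sort | tail -n1 | cut -f2 -d' ')
echo "$target_bytenr"    # 30720000
```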
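[Editorial aside: for anyone wanting to reproduce this locally, the case runs through the usual fstests `./check` harness. The device paths below are placeholders for your own setup; the test needs a scratch pool of at least two devices (`_require_scratch_dev_pool 2`) and, to match the report, a KASAN-enabled kernel.]

```shell
# Hypothetical fstests local.config; substitute devices from your own setup.
cat > local.config <<'EOF'
export TEST_DEV=/dev/vdb
export TEST_DIR=/mnt/test
export SCRATCH_DEV_POOL="/dev/vdc /dev/vdd"
export SCRATCH_MNT=/mnt/scratch
EOF

# Run just the new case; $seqres.full (results/btrfs/319.full) keeps the
# per-iteration log, including the corrupted leaf's bytenr.
./check btrfs/319
```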