[v3] fstests: btrfs: test reading data with a corrupted checksum tree leaf

Message ID 1b0c008b1b05a9fd24b751e174da37bd769637ff.1724717443.git.wqu@suse.com (mailing list archive)
State New
Series [v3] fstests: btrfs: test reading data with a corrupted checksum tree leaf

Commit Message

Qu Wenruo Aug. 27, 2024, 12:13 a.m. UTC
[BUG]
There is a bug report that KASAN gets triggered when:

- A read bio needs to be split
  This can happen for profiles with stripes, including
  RAID0/RAID10/RAID5/RAID6.

- An error happens before submitting the new split bio
  This includes:
  * chunk map lookup failure
  * data csum lookup failure

Then during the error path of btrfs_submit_chunk(), the original bio is
fully freed before the submitted range has a chance to call its endio
function, resulting in a use-after-free bug.
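
As background, striped profiles lay data out in 64K stripes across the
devices, which is why read bios covering more than one stripe must be split.
A quick way to inspect the layout (a sketch using btrfs-progs and the fstests
variables from the test below; exact dump-tree output varies by version):

  # Dump the chunk tree of the scratch fs; RAID0 data chunks report the
  # stripe length (typically "stripe_len 65536") and one stripe per device.
  $BTRFS_UTIL_PROG inspect-internal dump-tree -t chunk $SCRATCH_DEV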

[NEW TEST CASE]
Introduce a new test case to verify the specific behavior by:

- Create a btrfs filesystem with enough csum leaves, using the data RAID0
  profile
  To bump the csum tree level, use the minimal nodesize possible (4K).
  Writing 32M of data needs at least 8 leaves for data checksums (see the
  back-of-the-envelope math after this list).

  The RAID0 profile ensures the data read bios will get split.

- Find the last csum tree leaf and corrupt it

- Read the data many times until we trigger the bug or exit gracefully
  On an x86_64 VM with KASAN enabled, this can trigger the KASAN report in
  just 4 iterations (the default iteration count is 32).
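
For reference, a back-of-the-envelope check of that leaf count (a sketch
assuming the default CRC32C checksum at 4 bytes per 4K sector; a leaf holds a
bit less than nodesize worth of csum payload, so treat the result as a lower
bound):

  # 32M of data split into 4K sectors:
  sectors=$(( 32 * 1024 * 1024 / 4096 ))   # 8192 sectors
  # CRC32C stores 4 bytes of checksum per sector:
  csum_bytes=$(( sectors * 4 ))            # 32768 bytes (32K) of csums
  # With 4K leaves that comes to at least:
  echo "$(( csum_bytes / 4096 )) leaves"   # >= 8 leaves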

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v3:
- Remove the unrelated btrfs/125 references
  There is nothing specific to RAID56; it's just a coincidence that
  btrfs/125 led us to the bug.
  Since we have a more comprehensive understanding of the bug, there is
  no need to mention it at all.

- More grammar fixes
- Use proper _check_btrfs_raid_type() to verify raid0 support
- Update the title to be more specific about the test case
- Renumber to btrfs/321 to avoid conflicts with a new test case
- Remove unnecessary 'sync' which is followed by unmount
- Use full subcommand name "inspect-internal"
- Explain why we want to fail early if hitting the bug
- Remove the unnecessary `_require_scratch`, which duplicates
  `_require_scratch_nocheck`

v2:
- Fix the wrong commit hash
  The proper fix is not yet merged; the old hash was a placeholder
  copied from another test case that I forgot to remove.

- Minor wording update

- Add to "dangerous" group
---
 tests/btrfs/321     | 83 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/321.out |  2 ++
 2 files changed, 85 insertions(+)
 create mode 100755 tests/btrfs/321
 create mode 100644 tests/btrfs/321.out
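
To reproduce, the usual fstests invocation should work (a sketch; the device
paths in local.config are hypothetical placeholders, and RAID0 needs at least
2 devices in SCRATCH_DEV_POOL):

  # local.config (adjust devices and paths to your own setup):
  export TEST_DEV=/dev/vdb
  export TEST_DIR=/mnt/test
  export SCRATCH_DEV_POOL="/dev/vdc /dev/vdd"
  export SCRATCH_MNT=/mnt/scratch

  # Run the new test on a KASAN-enabled kernel:
  ./check btrfs/321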

Comments

Anand Jain Aug. 27, 2024, 12:44 a.m. UTC | #1
On 27/8/24 8:13 am, Qu Wenruo wrote:
> [...]

looks good

Reviewed-by: Anand Jain <anand.jain@oracle.com>


Thx. Anand
Filipe Manana Aug. 27, 2024, 10:10 a.m. UTC | #2
On Tue, Aug 27, 2024 at 1:14 AM Qu Wenruo <wqu@suse.com> wrote:
> [...]

Fine now, thanks.

Reviewed-by: Filipe Manana <fdmanana@suse.com>



Patch

diff --git a/tests/btrfs/321 b/tests/btrfs/321
new file mode 100755
index 000000000000..e30199daa0d0
--- /dev/null
+++ b/tests/btrfs/321
@@ -0,0 +1,83 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 321
+#
+# Make sure there are no use-after-free bugs, crashes, deadlocks etc. when
+# reading data which has its data checksums in a corrupted csum tree block.
+#
+. ./common/preamble
+_begin_fstest auto quick raid dangerous
+
+_require_scratch_nocheck
+_require_scratch_dev_pool 2
+
+# Use RAID0 for data so that read bios get split at stripe boundaries.
+# This is required to trigger the bug.
+_require_btrfs_raid_type raid0
+
+# This test uses 4K sectorsize and 4K nodesize, so that we can easily create
+# a higher level csum tree.
+_require_btrfs_support_sectorsize 4096
+_require_btrfs_command inspect-internal dump-tree
+
+_fixed_by_kernel_commit xxxxxxxxxxxx \
+	"btrfs: fix a use-after-free bug when hitting errors inside btrfs_submit_chunk()"
+
+# The bug itself has a race window; run this many times to ensure triggering.
+# On an x86_64 VM with KASAN enabled, it is normally triggered before the 10th run.
+iterations=32
+
+_scratch_pool_mkfs "-d raid0 -m single -n 4k -s 4k" >> $seqres.full 2>&1
+# This test requires data checksum to trigger the bug.
+_scratch_mount -o datasum,datacow
+
+# For the smallest csum size (CRC32C) it's 4 bytes per 4K sector, so writing
+# 32M of data needs at least 32K of data checksums, which is at least 8 leaves.
+_pwrite_byte 0xef 0 32m "$SCRATCH_MNT/foobar" > /dev/null
+_scratch_unmount
+
+
+# Search for the last leaf of the csum tree; that will be the target to destroy.
+$BTRFS_UTIL_PROG inspect-internal dump-tree -t 7 $SCRATCH_DEV >> $seqres.full
+target_bytenr=$($BTRFS_UTIL_PROG inspect-internal dump-tree -t 7 $SCRATCH_DEV | grep "leaf.*flags" | sort | tail -n1 | cut -f2 -d\ )
+
+if [ -z "$target_bytenr" ]; then
+	_fail "unable to locate the last csum tree leaf"
+fi
+
+echo "bytenr of csum tree leaf to corrupt: $target_bytenr" >> $seqres.full
+
+# Corrupt that csum tree block.
+physical=$(_btrfs_get_physical "$target_bytenr" 1)
+dev=$(_btrfs_get_device_path "$target_bytenr" 1)
+
+echo "physical bytenr: $physical" >> $seqres.full
+echo "physical device: $dev" >> $seqres.full
+
+_pwrite_byte 0x00 "$physical" 4 "$dev" > /dev/null
+
+for (( i = 0; i < $iterations; i++ )); do
+	echo "=== run $i/$iterations ===" >> $seqres.full
+	_scratch_mount -o ro
+	# Since the data is on RAID0, read bios will be split at the stripe
+	# (64K sized) boundary. If the csum lookup fails due to the corrupted
+	# csum tree, there is a race window that can lead to a double bio free
+	# (triggering KASAN at least).
+	cat "$SCRATCH_MNT/foobar" &> /dev/null
+	_scratch_unmount
+
+	# Instead of relying on the final _check_dmesg() to find errors, error
+	# out immediately if KASAN is triggered, as non-triggering runs will
+	# generate quite a few error messages, making investigation much
+	# harder.
+	if _check_dmesg_for "BUG" ; then
+		_fail "Critical error(s) found in dmesg"
+	fi
+done
+
+echo "Silence is golden"
+
+status=0
+exit
diff --git a/tests/btrfs/321.out b/tests/btrfs/321.out
new file mode 100644
index 000000000000..290a5eb31312
--- /dev/null
+++ b/tests/btrfs/321.out
@@ -0,0 +1,2 @@ 
+QA output created by 321
+Silence is golden