diff mbox series

btrfs: new test for logical inode resolution panic

Message ID 98d2055515cd765b0835a7f751a21cbb72c03621.1671059406.git.boris@bur.io (mailing list archive)
State New, archived
Headers show
Series btrfs: new test for logical inode resolution panic | expand

Commit Message

Boris Burkov Dec. 14, 2022, 11:12 p.m. UTC
If we create a file that has an inline extent followed by a prealloc
extent, then attempt to use the logical to inode ioctl on the prealloc
extent, but in the overwritten range, backref resolution will process
the inline extent. Depending on the leaf eb layout, this can panic.
Add a new test for this condition. In the long run, we can add spew when
we read out-of-bounds fields of inline extent items and simplify this
test to look for dmesg warnings rather than trying to force a fairly
fragile panic (dependent on non-standardized details of leaf layout).

The test causes a kernel panic unless:
btrfs: fix logical_ino ioctl panic
is applied to the kernel.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 tests/btrfs/279     | 95 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/279.out |  2 +
 2 files changed, 97 insertions(+)
 create mode 100755 tests/btrfs/279
 create mode 100644 tests/btrfs/279.out

Comments

David Disseldorp Dec. 15, 2022, 3:33 p.m. UTC | #1
Hi Boris,

Thanks for providing this reproducer...

On Wed, 14 Dec 2022 15:12:01 -0800, Boris Burkov wrote:

> If we create a file that has an inline extent followed by a prealloc
> extent, then attempt to use the logical to inode ioctl on the prealloc
> extent, but in the overwritten range, backref resolution will process
> the inline extent. Depending on the leaf eb layout, this can panic.
> Add a new test for this condition. In the long run, we can add spew when
> we read out-of-bounds fields of inline extent items and simplify this
> test to look for dmesg warnings rather than trying to force a fairly
> fragile panic (dependent on non-standardized details of leaf layout).
> 
> The test causes a kernel panic unless:
> btrfs: fix logical_ino ioctl panic
> is applied to the kernel.

This could be included as a test hint via _fixed_by_kernel_commit(), but
given that failure is a panic, it probably doesn't matter...

> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  tests/btrfs/279     | 95 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/279.out |  2 +

Looks like a rebase is needed - btrfs/279 is already taken.

>  2 files changed, 97 insertions(+)
>  create mode 100755 tests/btrfs/279
>  create mode 100644 tests/btrfs/279.out
> 
> diff --git a/tests/btrfs/279 b/tests/btrfs/279
> new file mode 100755
> index 00000000..ef77f84b
> --- /dev/null
> +++ b/tests/btrfs/279
> @@ -0,0 +1,95 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 Meta Platforms, Inc.  All Rights Reserved.
> +#
> +# FS QA Test 279
> +#
> +# Given a file with extents:
> +# [0 : 4096) (inline)
> +# [4096 : N] (prealloc)
> +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the
> +# non-inline extent, it results in reading the offset field of the inline
> +# extent, which is meaningless (it is full of user data..). If we are
> +# particularly lucky, it can be past the end of the extent buffer, resulting in
> +# a crash.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/dmlogwrites
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_require_scratch
> +_require_xfs_io_command "falloc"
> +_require_xfs_io_command "fsync"
> +_require_xfs_io_command "pwrite"
> +
> +dump_tree() {
> +	$BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV
> +}
> +
> +get_extent_data() {
> +	local ino=$1
> +	dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA "
> +}
> +
> +get_prealloc_offset() {
> +	local ino=$1
> +	get_extent_data $ino | grep "disk byte" | awk '{print $5}'
> +}
> +
> +# This test needs to create conditions s.t. the special inode's inline extent
> +# is the first item in a leaf. Therefore, fix a leaf size and add 
> +# items that are otherwise not necessary to reproduce the inline-prealloc
> +# condition to get to such a state.
> +#
> +# Roughly, the idea for getting the right item fill is to:
> +# 1. create an extra file with a variable sized inline extent item
> +# 2. create our evil file that will cause the panic
> +# 3. create a whole bunch of files to create a bunch of dir/index items
> +# 4. size the variable extent item s.t. the evil extent item is item 0 in the
> +#    next leaf
> +#
> +# We do it in this somewhat convoluted way because the dir and index items all
> +# come before any inode, inode_ref, or extent_data items. So we can predictably
> +# create a bunch of them, then sneak in a funny sized extent to round out the
> +# difference.
> +
> +_scratch_mkfs "--nodesize 16k" >/dev/null
> +_scratch_mount
> +
> +f=$SCRATCH_MNT/f
> +
> +# the variable extra "leaf padding" file
> +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad
> +
> +# the evil file with an inline extent followed by a prealloc extent
> +# created by falloc with keep-size, then two non-truncating writes to the front
> +touch $f.evil
> +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil
> +ino=$(stat -c '%i' $f.evil)
> +logical=$(get_prealloc_offset $ino)
> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil
> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil
> +sync
> +
> +# a bunch of inodes to stuff dir items in front of the extent items
> +for i in $(seq 122); do

nit: a c-style bash loop would avoid the extra seq process

> +	$XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
> +done
> +sync
> +
> +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch
> +_scratch_unmount
> +
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out
> new file mode 100644
> index 00000000..c5906093
> --- /dev/null
> +++ b/tests/btrfs/279.out
> @@ -0,0 +1,2 @@
> +QA output created by 279
> +Silence is golden

With the rebase to avoid tests/btrfs/279 collision, this looks good:

Reviewed-by: David Disseldorp <ddiss@suse.de>
Zorro Lang Dec. 16, 2022, 8:46 a.m. UTC | #2
On Wed, Dec 14, 2022 at 03:12:01PM -0800, Boris Burkov wrote:
> If we create a file that has an inline extent followed by a prealloc
> extent, then attempt to use the logical to inode ioctl on the prealloc
> extent, but in the overwritten range, backref resolution will process
> the inline extent. Depending on the leaf eb layout, this can panic.
> Add a new test for this condition. In the long run, we can add spew when
> we read out-of-bounds fields of inline extent items and simplify this
> test to look for dmesg warnings rather than trying to force a fairly
> fragile panic (dependent on non-standardized details of leaf layout).
> 
> The test causes a kernel panic unless:
> btrfs: fix logical_ino ioctl panic

Hi,

Thanks for this new case.

Is this patch merged, does it has a commit id? BTW, please mark this
known issue by _fixed_by_kernel_commit in case source code.

> is applied to the kernel.
> 
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  tests/btrfs/279     | 95 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/279.out |  2 +
>  2 files changed, 97 insertions(+)
>  create mode 100755 tests/btrfs/279
>  create mode 100644 tests/btrfs/279.out
> 
> diff --git a/tests/btrfs/279 b/tests/btrfs/279
> new file mode 100755
> index 00000000..ef77f84b
> --- /dev/null
> +++ b/tests/btrfs/279

btrfs/279 was just token last weekend, please help to rebase to the latest
fstests for-next branch, or change the 279 to a big enough number which won't
cause conflict (I'll give it a proper number when merge it).

> @@ -0,0 +1,95 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 Meta Platforms, Inc.  All Rights Reserved.
> +#
> +# FS QA Test 279
> +#
> +# Given a file with extents:
> +# [0 : 4096) (inline)
> +# [4096 : N] (prealloc)
> +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the
> +# non-inline extent, it results in reading the offset field of the inline
> +# extent, which is meaningless (it is full of user data..). If we are
> +# particularly lucky, it can be past the end of the extent buffer, resulting in
> +# a crash.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/dmlogwrites

Does this case really use helpers in above two included files?

> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_require_scratch
> +_require_xfs_io_command "falloc"

Add this case to prealloc or preallocrw group

> +_require_xfs_io_command "fsync"
> +_require_xfs_io_command "pwrite"
> +
> +dump_tree() {
> +	$BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV

_require_btrfs_command inspect-internal dump-tree

> +}
> +
> +get_extent_data() {
> +	local ino=$1
> +	dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA "
> +}
> +
> +get_prealloc_offset() {
> +	local ino=$1
> +	get_extent_data $ino | grep "disk byte" | awk '{print $5}'
> +}
> +
> +# This test needs to create conditions s.t. the special inode's inline extent
> +# is the first item in a leaf. Therefore, fix a leaf size and add 
> +# items that are otherwise not necessary to reproduce the inline-prealloc
> +# condition to get to such a state.
> +#
> +# Roughly, the idea for getting the right item fill is to:
> +# 1. create an extra file with a variable sized inline extent item
> +# 2. create our evil file that will cause the panic
> +# 3. create a whole bunch of files to create a bunch of dir/index items
> +# 4. size the variable extent item s.t. the evil extent item is item 0 in the
> +#    next leaf
> +#
> +# We do it in this somewhat convoluted way because the dir and index items all
> +# come before any inode, inode_ref, or extent_data items. So we can predictably
> +# create a bunch of them, then sneak in a funny sized extent to round out the
> +# difference.
> +
> +_scratch_mkfs "--nodesize 16k" >/dev/null

We recommend calling _fail at here if _scratch_mkfs fails with a *specified*
option.

_scratch_mkfs "--nodesize 16k" >>$seqres.full || _fail "mkfs failed"

> +_scratch_mount
> +
> +f=$SCRATCH_MNT/f
> +
> +# the variable extra "leaf padding" file
> +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad
> +
> +# the evil file with an inline extent followed by a prealloc extent
> +# created by falloc with keep-size, then two non-truncating writes to the front
> +touch $f.evil

Not sure if you reall use this touch command as you've used "-f" option for
xfs_io.

> +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil
> +ino=$(stat -c '%i' $f.evil)
> +logical=$(get_prealloc_offset $ino)
> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil
> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil

xfs_io commands can be combined in one line, to make the code looks clear (except
you hope the fd closed after each command done) E.g:

$XFS_IO_PROG -fc "falloc -k 0 1m" -c fsync $f.evil
ino=$(stat -c '%i' $f.evil)
logical=$(get_prealloc_offset $ino)
$XFS_IO_PROG -c "pwrite -q 0 23" -c fsync \
	     -c "pwrite -q 0 23" -c fsync \
	     $f.evil

> +sync
> +
> +# a bunch of inodes to stuff dir items in front of the extent items
> +for i in $(seq 122); do
> +	$XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
> +done
> +sync
> +
> +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch
   ^^
   $BTRFS_UTIL_PROG

_require_btrfs_command inspect-internal logical-resolve

The .out file only has "Silence is golden", what's this _filter_scratch used for?

> +_scratch_unmount

Is the ending _scratch_unmount useful?

> +
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out
> new file mode 100644
> index 00000000..c5906093
> --- /dev/null
> +++ b/tests/btrfs/279.out
> @@ -0,0 +1,2 @@
> +QA output created by 279
> +Silence is golden
> -- 
> 2.38.1
>
Qu Wenruo Dec. 16, 2022, 9:14 a.m. UTC | #3
On 2022/12/16 16:46, Zorro Lang wrote:
> On Wed, Dec 14, 2022 at 03:12:01PM -0800, Boris Burkov wrote:
>> If we create a file that has an inline extent followed by a prealloc
>> extent, then attempt to use the logical to inode ioctl on the prealloc
>> extent, but in the overwritten range, backref resolution will process
>> the inline extent. Depending on the leaf eb layout, this can panic.
>> Add a new test for this condition. In the long run, we can add spew when
>> we read out-of-bounds fields of inline extent items and simplify this
>> test to look for dmesg warnings rather than trying to force a fairly
>> fragile panic (dependent on non-standardized details of leaf layout).
>>
>> The test causes a kernel panic unless:
>> btrfs: fix logical_ino ioctl panic
> 
> Hi,
> 
> Thanks for this new case.
> 
> Is this patch merged, does it has a commit id? BTW, please mark this
> known issue by _fixed_by_kernel_commit in case source code.

This fix is not yet merged, but I believe it would soon get merged:

https://patchwork.kernel.org/project/linux-btrfs/patch/80f01f297721481fd0d4cbf03fd2550e25b578fb.1671058852.git.boris@bur.io/

> 
>> is applied to the kernel.
>>
>> Signed-off-by: Boris Burkov <boris@bur.io>
>> ---
>>   tests/btrfs/279     | 95 +++++++++++++++++++++++++++++++++++++++++++++
>>   tests/btrfs/279.out |  2 +
>>   2 files changed, 97 insertions(+)
>>   create mode 100755 tests/btrfs/279
>>   create mode 100644 tests/btrfs/279.out
>>
>> diff --git a/tests/btrfs/279 b/tests/btrfs/279
>> new file mode 100755
>> index 00000000..ef77f84b
>> --- /dev/null
>> +++ b/tests/btrfs/279
> 
> btrfs/279 was just token last weekend, please help to rebase to the latest
> fstests for-next branch, or change the 279 to a big enough number which won't
> cause conflict (I'll give it a proper number when merge it).
> 
>> @@ -0,0 +1,95 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (c) 2022 Meta Platforms, Inc.  All Rights Reserved.
>> +#
>> +# FS QA Test 279
>> +#
>> +# Given a file with extents:
>> +# [0 : 4096) (inline)
>> +# [4096 : N] (prealloc)
>> +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the
>> +# non-inline extent, it results in reading the offset field of the inline
>> +# extent, which is meaningless (it is full of user data..). If we are
>> +# particularly lucky, it can be past the end of the extent buffer, resulting in
>> +# a crash.
>> +#
>> +. ./common/preamble
>> +_begin_fstest auto quick
>> +
>> +# Import common functions.
>> +. ./common/filter
>> +. ./common/dmlogwrites
> 
> Does this case really use helpers in above two included files?
> 
>> +
>> +# real QA test starts here
>> +
>> +# Modify as appropriate.
>> +_supported_fs btrfs
>> +_require_scratch
>> +_require_xfs_io_command "falloc"
> 
> Add this case to prealloc or preallocrw group
> 
>> +_require_xfs_io_command "fsync"
>> +_require_xfs_io_command "pwrite"
>> +
>> +dump_tree() {
>> +	$BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV
> 
> _require_btrfs_command inspect-internal dump-tree
> 
>> +}
>> +
>> +get_extent_data() {
>> +	local ino=$1
>> +	dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA "
>> +}
>> +
>> +get_prealloc_offset() {
>> +	local ino=$1
>> +	get_extent_data $ino | grep "disk byte" | awk '{print $5}'
>> +}
>> +
>> +# This test needs to create conditions s.t. the special inode's inline extent
>> +# is the first item in a leaf. Therefore, fix a leaf size and add
>> +# items that are otherwise not necessary to reproduce the inline-prealloc
>> +# condition to get to such a state.
>> +#
>> +# Roughly, the idea for getting the right item fill is to:
>> +# 1. create an extra file with a variable sized inline extent item
>> +# 2. create our evil file that will cause the panic
>> +# 3. create a whole bunch of files to create a bunch of dir/index items
>> +# 4. size the variable extent item s.t. the evil extent item is item 0 in the
>> +#    next leaf
>> +#
>> +# We do it in this somewhat convoluted way because the dir and index items all
>> +# come before any inode, inode_ref, or extent_data items. So we can predictably
>> +# create a bunch of them, then sneak in a funny sized extent to round out the
>> +# difference.
>> +
>> +_scratch_mkfs "--nodesize 16k" >/dev/null
> 
> We recommend calling _fail at here if _scratch_mkfs fails with a *specified*
> option.
> 
> _scratch_mkfs "--nodesize 16k" >>$seqres.full || _fail "mkfs failed"
> 
>> +_scratch_mount
>> +
>> +f=$SCRATCH_MNT/f
>> +
>> +# the variable extra "leaf padding" file
>> +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad
>> +
>> +# the evil file with an inline extent followed by a prealloc extent
>> +# created by falloc with keep-size, then two non-truncating writes to the front
>> +touch $f.evil
> 
> Not sure if you reall use this touch command as you've used "-f" option for
> xfs_io.
> 
>> +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil
>> +$XFS_IO_PROG -fc fsync $f.evil
>> +ino=$(stat -c '%i' $f.evil)
>> +logical=$(get_prealloc_offset $ino)
>> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
>> +$XFS_IO_PROG -fc fsync $f.evil
>> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
>> +$XFS_IO_PROG -fc fsync $f.evil
> 
> xfs_io commands can be combined in one line, to make the code looks clear (except
> you hope the fd closed after each command done) E.g:
> 
> $XFS_IO_PROG -fc "falloc -k 0 1m" -c fsync $f.evil
> ino=$(stat -c '%i' $f.evil)
> logical=$(get_prealloc_offset $ino)
> $XFS_IO_PROG -c "pwrite -q 0 23" -c fsync \
> 	     -c "pwrite -q 0 23" -c fsync \
> 	     $f.evil
> 
>> +sync
>> +
>> +# a bunch of inodes to stuff dir items in front of the extent items
>> +for i in $(seq 122); do
>> +	$XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
>> +done
>> +sync
>> +
>> +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch
>     ^^
>     $BTRFS_UTIL_PROG
> 
> _require_btrfs_command inspect-internal logical-resolve
> 
> The .out file only has "Silence is golden", what's this _filter_scratch used for?
> 
>> +_scratch_unmount
> 
> Is the ending _scratch_unmount useful?
> 
>> +
>> +echo "Silence is golden"
>> +status=0
>> +exit
>> diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out
>> new file mode 100644
>> index 00000000..c5906093
>> --- /dev/null
>> +++ b/tests/btrfs/279.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 279
>> +Silence is golden
>> -- 
>> 2.38.1
>>
>
Filipe Manana Dec. 16, 2022, 11:10 a.m. UTC | #4
On Wed, Dec 14, 2022 at 11:15 PM Boris Burkov <boris@bur.io> wrote:
>
> If we create a file that has an inline extent followed by a prealloc
> extent, then attempt to use the logical to inode ioctl on the prealloc
> extent, but in the overwritten range, backref resolution will process
> the inline extent. Depending on the leaf eb layout, this can panic.
> Add a new test for this condition. In the long run, we can add spew when
> we read out-of-bounds fields of inline extent items and simplify this
> test to look for dmesg warnings rather than trying to force a fairly
> fragile panic (dependent on non-standardized details of leaf layout).
>
> The test causes a kernel panic unless:
> btrfs: fix logical_ino ioctl panic
> is applied to the kernel.
>
> Signed-off-by: Boris Burkov <boris@bur.io>

So in addition to feedback already received from David and Zorro, I
have some comments inlined below.

> ---
>  tests/btrfs/279     | 95 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/279.out |  2 +
>  2 files changed, 97 insertions(+)
>  create mode 100755 tests/btrfs/279
>  create mode 100644 tests/btrfs/279.out
>
> diff --git a/tests/btrfs/279 b/tests/btrfs/279
> new file mode 100755
> index 00000000..ef77f84b
> --- /dev/null
> +++ b/tests/btrfs/279
> @@ -0,0 +1,95 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 Meta Platforms, Inc.  All Rights Reserved.
> +#
> +# FS QA Test 279
> +#
> +# Given a file with extents:
> +# [0 : 4096) (inline)
> +# [4096 : N] (prealloc)
> +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the
> +# non-inline extent, it results in reading the offset field of the inline
> +# extent, which is meaningless (it is full of user data..). If we are
> +# particularly lucky, it can be past the end of the extent buffer, resulting in
> +# a crash.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/dmlogwrites
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_require_scratch
> +_require_xfs_io_command "falloc"

Here it should be:

_require_xfs_io_command "falloc" "-k"

> +_require_xfs_io_command "fsync"
> +_require_xfs_io_command "pwrite"

This is the first time I'm seeing a test requiring xfs_io to support
the pwrite and fsync commands.
Presumably there aren't any because either these commands always
existed or they have been around for a very long time.

> +
> +dump_tree() {
> +       $BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV
> +}
> +
> +get_extent_data() {
> +       local ino=$1
> +       dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA "
> +}
> +
> +get_prealloc_offset() {
> +       local ino=$1
> +       get_extent_data $ino | grep "disk byte" | awk '{print $5}'

In fstests we use $AWK_PROG instead of bare 'awk'.

> +}
> +
> +# This test needs to create conditions s.t. the special inode's inline extent
> +# is the first item in a leaf. Therefore, fix a leaf size and add
> +# items that are otherwise not necessary to reproduce the inline-prealloc
> +# condition to get to such a state.
> +#
> +# Roughly, the idea for getting the right item fill is to:
> +# 1. create an extra file with a variable sized inline extent item
> +# 2. create our evil file that will cause the panic
> +# 3. create a whole bunch of files to create a bunch of dir/index items
> +# 4. size the variable extent item s.t. the evil extent item is item 0 in the
> +#    next leaf
> +#
> +# We do it in this somewhat convoluted way because the dir and index items all
> +# come before any inode, inode_ref, or extent_data items. So we can predictably
> +# create a bunch of them, then sneak in a funny sized extent to round out the
> +# difference.
> +
> +_scratch_mkfs "--nodesize 16k" >/dev/null

So this will fail on a machine with a 64K page size (PPC for e.g.)
using a kernel or btrfs-progs without subpage sector size support.
That will make mkfs output an error to stderr and cause the test to fail.

Please use a 64K node size and adapt the number of files below so that
we get the problematic leaf layout to trigger
the bug.

Like this the test will be able to run, and reproduce the bug on an
unpatched kernel, on machines with any page size
and on kernels without subpage sector size support (thinking of SLE
kernels for example).

That's generally what we do when we need to exercise a particular leaf
layout and allow the test to run on machines
with any page size. For example btrfs/239 and btrfs/154 do this.

> +_scratch_mount
> +
> +f=$SCRATCH_MNT/f
> +
> +# the variable extra "leaf padding" file
> +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad
> +
> +# the evil file with an inline extent followed by a prealloc extent
> +# created by falloc with keep-size, then two non-truncating writes to the front
> +touch $f.evil
> +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil
> +ino=$(stat -c '%i' $f.evil)
> +logical=$(get_prealloc_offset $ino)
> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil
> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil
> +sync

This is complex, and it doesn't provide any comments regarding:

1) Why all this frenzy of fsync and a final sync (which makes the last
fsync pointless)?

2) Why do we need to write twice to the range [0, 23)?

Wouldn't something more simple like this work too:

$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil

# sync to commit the transaction and make sure dump-tree sees the fs tree with
# the prealloc extent when we try to get its bytenr.
sync
ino=$(stat -c '%i' $f.evil)
logical=$(get_prealloc_offset $ino)

# Do a write that will result in an inline extent.
$XFS_IO_PROG -c "pwrite -q 0 23" $f.evil

# A bunch of inodes to stuff dir items in front of the file extent items.
for i in $(seq 122); do
     $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
done

# Flush all dealloc so we get all the file extent items inserted in the fs tree.
sync

Please add comments about each necessary step, as I've done there.
Otherwise it's very hard to understand why those steps are needed...

I'm surprised no one commented on this before.
I'm comfortable with btrfs' internals and yet I can't understand why
there are so many steps, and what exactly each one is supposed to
accomplish.

> +
> +# a bunch of inodes to stuff dir items in front of the extent items
> +for i in $(seq 122); do
> +       $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
> +done
> +sync
> +
> +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch
> +_scratch_unmount

The unmount is pointless, fstests framework will do that for us.

Finally I would also like to see the _fixed_by_kernel_commit
annotation in the test.
Since the fix is not yet in Linus' tree, in IMO you can replace the
commit hash with "xxxx..." and later update the test once it's merged
in Linus' tree.

Thanks.

> +
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out
> new file mode 100644
> index 00000000..c5906093
> --- /dev/null
> +++ b/tests/btrfs/279.out
> @@ -0,0 +1,2 @@
> +QA output created by 279
> +Silence is golden
> --
> 2.38.1
>
Boris Burkov Dec. 16, 2022, 9:56 p.m. UTC | #5
On Fri, Dec 16, 2022 at 11:10:26AM +0000, Filipe Manana wrote:
> On Wed, Dec 14, 2022 at 11:15 PM Boris Burkov <boris@bur.io> wrote:
> >
> > If we create a file that has an inline extent followed by a prealloc
> > extent, then attempt to use the logical to inode ioctl on the prealloc
> > extent, but in the overwritten range, backref resolution will process
> > the inline extent. Depending on the leaf eb layout, this can panic.
> > Add a new test for this condition. In the long run, we can add spew when
> > we read out-of-bounds fields of inline extent items and simplify this
> > test to look for dmesg warnings rather than trying to force a fairly
> > fragile panic (dependent on non-standardized details of leaf layout).
> >
> > The test causes a kernel panic unless:
> > btrfs: fix logical_ino ioctl panic
> > is applied to the kernel.
> >
> > Signed-off-by: Boris Burkov <boris@bur.io>
> 
> So in addition to feedback already received from David and Zorro, I
> have some comments inlined below.
> 
> > ---
> >  tests/btrfs/279     | 95 +++++++++++++++++++++++++++++++++++++++++++++
> >  tests/btrfs/279.out |  2 +
> >  2 files changed, 97 insertions(+)
> >  create mode 100755 tests/btrfs/279
> >  create mode 100644 tests/btrfs/279.out
> >
> > diff --git a/tests/btrfs/279 b/tests/btrfs/279
> > new file mode 100755
> > index 00000000..ef77f84b
> > --- /dev/null
> > +++ b/tests/btrfs/279
> > @@ -0,0 +1,95 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2022 Meta Platforms, Inc.  All Rights Reserved.
> > +#
> > +# FS QA Test 279
> > +#
> > +# Given a file with extents:
> > +# [0 : 4096) (inline)
> > +# [4096 : N] (prealloc)
> > +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the
> > +# non-inline extent, it results in reading the offset field of the inline
> > +# extent, which is meaningless (it is full of user data..). If we are
> > +# particularly lucky, it can be past the end of the extent buffer, resulting in
> > +# a crash.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto quick
> > +
> > +# Import common functions.
> > +. ./common/filter
> > +. ./common/dmlogwrites
> > +
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> > +_supported_fs btrfs
> > +_require_scratch
> > +_require_xfs_io_command "falloc"
> 
> Here it should be:
> 
> _require_xfs_io_command "falloc" "-k"
> 
> > +_require_xfs_io_command "fsync"
> > +_require_xfs_io_command "pwrite"
> 
> This is the first time I'm seeing a test requiring xfs_io to support
> the pwrite and fsync commands.
> Presumably there aren't any because either these commands always
> existed or they have been around for a very long time.
> 
> > +
> > +dump_tree() {
> > +       $BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV
> > +}
> > +
> > +get_extent_data() {
> > +       local ino=$1
> > +       dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA "
> > +}
> > +
> > +get_prealloc_offset() {
> > +       local ino=$1
> > +       get_extent_data $ino | grep "disk byte" | awk '{print $5}'
> 
> In fstests we use $AWK_PROG instead of bare 'awk'.
> 
> > +}
> > +
> > +# This test needs to create conditions s.t. the special inode's inline extent
> > +# is the first item in a leaf. Therefore, fix a leaf size and add
> > +# items that are otherwise not necessary to reproduce the inline-prealloc
> > +# condition to get to such a state.
> > +#
> > +# Roughly, the idea for getting the right item fill is to:
> > +# 1. create an extra file with a variable sized inline extent item
> > +# 2. create our evil file that will cause the panic
> > +# 3. create a whole bunch of files to create a bunch of dir/index items
> > +# 4. size the variable extent item s.t. the evil extent item is item 0 in the
> > +#    next leaf
> > +#
> > +# We do it in this somewhat convoluted way because the dir and index items all
> > +# come before any inode, inode_ref, or extent_data items. So we can predictably
> > +# create a bunch of them, then sneak in a funny sized extent to round out the
> > +# difference.
> > +
> > +_scratch_mkfs "--nodesize 16k" >/dev/null
> 
> So this will fail on a machine with a 64K page size (PPC for e.g.)
> using a kernel or btrfs-progs without subpage sector size support.
> That will make mkfs output an error to stderr and cause the test to fail.
> 
> Please use a 64K node size and adapt the number of files below so that
> we get the problematic leaf layout to trigger
> the bug.
> 
> Like this the test will be able to run, and reproduce the bug on an
> unpatched kernel, on machines with any page size
> and on kernels without subpage sector size support (thinking of SLE
> kernels for example).
> 
> That's generally what we do when we need to exercise a particular leaf
> layout and allow the test to run on machines
> with any page size. For example btrfs/239 and btrfs/154 do this.

Good point, I'll re-do it for 64k to make it fully general as that
was my intent in setting a fixed nodesize.

> 
> > +_scratch_mount
> > +
> > +f=$SCRATCH_MNT/f
> > +
> > +# the variable extra "leaf padding" file
> > +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad
> > +
> > +# the evil file with an inline extent followed by a prealloc extent
> > +# created by falloc with keep-size, then two non-truncating writes to the front
> > +touch $f.evil
> > +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil
> > +$XFS_IO_PROG -fc fsync $f.evil
> > +ino=$(stat -c '%i' $f.evil)
> > +logical=$(get_prealloc_offset $ino)
> > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> > +$XFS_IO_PROG -fc fsync $f.evil
> > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> > +$XFS_IO_PROG -fc fsync $f.evil
> > +sync
> 
> This is complex, and it doesn't provide any comments regarding:
> 
> 1) Why all this frenzy of fsync and a final sync (which makes the last
> fsync pointless)?
> 
> 2) Why do we need to write twice to the range [0, 23)?

Honestly, I'm not sure. A single write results in a regular extent while
two (separated by an fsync) results in an inline extent. Since we don't
consider the inline extent a bug (which I inferred from I think your
previous discussion with Qu for a similar case), I didn't prioritize
digging into that behavior while working on this fix/test.

> 
> Wouldn't something more simple like this work too:
> 
> $XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil
> 
> # sync to commit the transaction and make sure dump-tree sees the fs tree with
> # the prealloc extent when we try to get its bytenr.
> sync
> ino=$(stat -c '%i' $f.evil)
> logical=$(get_prealloc_offset $ino)
> 
> # Do a write that will result in an inline extent.
> $XFS_IO_PROG -c "pwrite -q 0 23" $f.evil
> 
> # A bunch of inodes to stuff dir items in front of the file extent items.
> for i in $(seq 122); do
>      $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
> done
> 
> # Flush all dealloc so we get all the file extent items inserted in the fs tree.
> sync
> 
> Please add comments about each necessary step, as I've done there.
> Otherwise it's very hard to understand why those steps are needed...
> 
> I'm surprised no one commented on this before.
> I'm comfortable with btrfs' internals and yet I can't understand why
> there are so many steps, and what exactly each one is supposed to
> accomplish.

I like fstests having a literate style, so I'm happy to do this, but I
must point out that it's far from universal in the repo...

> 
> > +
> > +# a bunch of inodes to stuff dir items in front of the extent items
> > +for i in $(seq 122); do
> > +       $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
> > +done
> > +sync
> > +
> > +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch
> > +_scratch_unmount
> 
> The unmount is pointless, fstests framework will do that for us.
> 
> Finally I would also like to see the _fixed_by_kernel_commit
> annotation in the test.
> Since the fix is not yet in Linus' tree, in IMO you can replace the
> commit hash with "xxxx..." and later update the test once it's merged
> in Linus' tree.
> 
> Thanks.

Thanks for the review.
> 
> > +
> > +echo "Silence is golden"
> > +status=0
> > +exit
> > diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out
> > new file mode 100644
> > index 00000000..c5906093
> > --- /dev/null
> > +++ b/tests/btrfs/279.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 279
> > +Silence is golden
> > --
> > 2.38.1
> >
diff mbox series

Patch

diff --git a/tests/btrfs/279 b/tests/btrfs/279
new file mode 100755
index 00000000..ef77f84b
--- /dev/null
+++ b/tests/btrfs/279
@@ -0,0 +1,95 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Meta Platforms, Inc.  All Rights Reserved.
+#
+# FS QA Test 279
+#
+# Given a file with extents:
+# [0 : 4096) (inline)
+# [4096 : N] (prealloc)
+# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the
+# non-inline extent, it results in reading the offset field of the inline
+# extent, which is meaningless (it is full of user data..). If we are
+# particularly lucky, it can be past the end of the extent buffer, resulting in
+# a crash.
+#
+. ./common/preamble
+_begin_fstest auto quick
+
+# Import common functions.
+. ./common/filter
+. ./common/dmlogwrites
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_require_scratch
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "fsync"
+_require_xfs_io_command "pwrite"
+
+dump_tree() {
+	$BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV
+}
+
+get_extent_data() {
+	local ino=$1
+	dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA "
+}
+
+get_prealloc_offset() {
+	local ino=$1
+	get_extent_data $ino | grep "disk byte" | awk '{print $5}'
+}
+
+# This test needs to create conditions s.t. the special inode's inline extent
+# is the first item in a leaf. Therefore, fix a leaf size and add 
+# items that are otherwise not necessary to reproduce the inline-prealloc
+# condition to get to such a state.
+#
+# Roughly, the idea for getting the right item fill is to:
+# 1. create an extra file with a variable sized inline extent item
+# 2. create our evil file that will cause the panic
+# 3. create a whole bunch of files to create a bunch of dir/index items
+# 4. size the variable extent item s.t. the evil extent item is item 0 in the
+#    next leaf
+#
+# We do it in this somewhat convoluted way because the dir and index items all
+# come before any inode, inode_ref, or extent_data items. So we can predictably
+# create a bunch of them, then sneak in a funny sized extent to round out the
+# difference.
+
+_scratch_mkfs "--nodesize 16k" >/dev/null
+_scratch_mount
+
+f=$SCRATCH_MNT/f
+
+# the variable extra "leaf padding" file
+$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad
+
+# the evil file with an inline extent followed by a prealloc extent
+# created by falloc with keep-size, then two non-truncating writes to the front
+touch $f.evil
+$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil
+$XFS_IO_PROG -fc fsync $f.evil
+ino=$(stat -c '%i' $f.evil)
+logical=$(get_prealloc_offset $ino)
+$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
+$XFS_IO_PROG -fc fsync $f.evil
+$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
+$XFS_IO_PROG -fc fsync $f.evil
+sync
+
+# a bunch of inodes to stuff dir items in front of the extent items
+for i in $(seq 122); do
+	$XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
+done
+sync
+
+btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch
+_scratch_unmount
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out
new file mode 100644
index 00000000..c5906093
--- /dev/null
+++ b/tests/btrfs/279.out
@@ -0,0 +1,2 @@ 
+QA output created by 279
+Silence is golden