diff mbox series

btrfs: fix hole expansion when writing at an offset beyond eof

Message ID 974b00afc2e703d5e0085fefbb46a1e495956ae1.1738778225.git.fdmanana@suse.com (mailing list archive)
State New
Headers show
Series btrfs: fix hole expansion when writing at an offset beyond eof | expand

Commit Message

Filipe Manana Feb. 5, 2025, 6 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

At btrfs_write_check() if our file's i_size is not sector size aligned and
we have a write that starts at an offset larger than the i_size that falls
within the same page of the i_size, then we end up not zeroing the file
range [i_size, write_offset).

The code is this:

    start_pos = round_down(pos, fs_info->sectorsize);
    oldsize = i_size_read(inode);
    if (start_pos > oldsize) {
        /* Expand hole size to cover write data, preventing empty gap */
        loff_t end_pos = round_up(pos + count, fs_info->sectorsize);

        ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, end_pos);
        if (ret)
            return ret;
    }

So if our file's i_size is 90269 bytes and a write at offset 90365 bytes
comes in, we get 'start_pos' set to 90112 bytes, which is less than the
i_size and therefore we don't zero out the range [90269, 90365) by
calling btrfs_cont_expand().

This is an old bug introduced in commit 9036c10208e1 ("Btrfs: update hole
handling v2"), from 2008, and the buggy code got moved around over the
years.

Fix this by discarding 'start_pos' and comparing against the write offset
('pos') without any alignment.

This bug was recently exposed by test case generic/363 which tests this
scenario by polluting ranges beyond eof with a mmap write and than verify
that after a file increases we get zeroes for the range which is supposed
to be a hole and not what we wrote with the previous mmaped write.

We're only seeing this exposed now because generic/363 used to run only
on xfs until last Sunday's fstests update.

The test was failing like this:

   $ ./check generic/363
   FSTYP         -- btrfs
   PLATFORM      -- Linux/x86_64 debian0 6.13.0-rc7-btrfs-next-185+ #17 SMP PREEMPT_DYNAMIC Mon Feb  3 12:28:46 WET 2025
   MKFS_OPTIONS  -- /dev/sdc
   MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1

   generic/363 0s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad)
       --- tests/generic/363.out	2025-02-05 15:31:14.013646509 +0000
       +++ /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad	2025-02-05 17:25:33.112630781 +0000
       @@ -1 +1,46 @@
        QA output created by 363
       +READ BAD DATA: offset = 0xdcad, size = 0xd921, fname = /home/fdmanana/btrfs-tests/dev/junk
       +OFFSET      GOOD    BAD     RANGE
       +0x1609d     0x0000  0x3104  0x0
       +operation# (mod 256) for the bad data may be 4
       +0x1609e     0x0000  0x0472  0x1
       +operation# (mod 256) for the bad data may be 4
       ...
       (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/363.out /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad'  to see the entire diff)
   Ran: generic/363
   Failures: generic/363
   Failed 1 of 1 tests

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Qu Wenruo Feb. 5, 2025, 9:05 p.m. UTC | #1
在 2025/2/6 04:30, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> At btrfs_write_check() if our file's i_size is not sector size aligned and
> we have a write that starts at an offset larger than the i_size that falls
> within the same page of the i_size, then we end up not zeroing the file
> range [i_size, write_offset).
>
> The code is this:
>
>      start_pos = round_down(pos, fs_info->sectorsize);
>      oldsize = i_size_read(inode);
>      if (start_pos > oldsize) {
>          /* Expand hole size to cover write data, preventing empty gap */
>          loff_t end_pos = round_up(pos + count, fs_info->sectorsize);
>
>          ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, end_pos);
>          if (ret)
>              return ret;
>      }
>
> So if our file's i_size is 90269 bytes and a write at offset 90365 bytes
> comes in, we get 'start_pos' set to 90112 bytes, which is less than the
> i_size and therefore we don't zero out the range [90269, 90365) by
> calling btrfs_cont_expand().
>
> This is an old bug introduced in commit 9036c10208e1 ("Btrfs: update hole
> handling v2"), from 2008, and the buggy code got moved around over the
> years.
>
> Fix this by discarding 'start_pos' and comparing against the write offset
> ('pos') without any alignment.
>
> This bug was recently exposed by test case generic/363 which tests this
> scenario by polluting ranges beyond eof with a mmap write and than verify
> that after a file increases we get zeroes for the range which is supposed
> to be a hole and not what we wrote with the previous mmaped write.
>
> We're only seeing this exposed now because generic/363 used to run only
> on xfs until last Sunday's fstests update.
>
> The test was failing like this:
>
>     $ ./check generic/363
>     FSTYP         -- btrfs
>     PLATFORM      -- Linux/x86_64 debian0 6.13.0-rc7-btrfs-next-185+ #17 SMP PREEMPT_DYNAMIC Mon Feb  3 12:28:46 WET 2025
>     MKFS_OPTIONS  -- /dev/sdc
>     MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
>
>     generic/363 0s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad)
>         --- tests/generic/363.out	2025-02-05 15:31:14.013646509 +0000
>         +++ /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad	2025-02-05 17:25:33.112630781 +0000
>         @@ -1 +1,46 @@
>          QA output created by 363
>         +READ BAD DATA: offset = 0xdcad, size = 0xd921, fname = /home/fdmanana/btrfs-tests/dev/junk
>         +OFFSET      GOOD    BAD     RANGE
>         +0x1609d     0x0000  0x3104  0x0
>         +operation# (mod 256) for the bad data may be 4
>         +0x1609e     0x0000  0x0472  0x1
>         +operation# (mod 256) for the bad data may be 4
>         ...
>         (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/363.out /home/fdmanana/git/hub/xfstests/results//generic/363.out.bad'  to see the entire diff)
>     Ran: generic/363
>     Failures: generic/363
>     Failed 1 of 1 tests
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>   fs/btrfs/file.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 36f51c311bb1..ed3c0d6546c5 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1039,7 +1039,6 @@ int btrfs_write_check(struct kiocb *iocb, size_t count)
>   	loff_t pos = iocb->ki_pos;
>   	int ret;
>   	loff_t oldsize;
> -	loff_t start_pos;
>
>   	/*
>   	 * Quickly bail out on NOWAIT writes if we don't have the nodatacow or
> @@ -1066,9 +1065,8 @@ int btrfs_write_check(struct kiocb *iocb, size_t count)
>   		inode_inc_iversion(inode);
>   	}
>
> -	start_pos = round_down(pos, fs_info->sectorsize);
>   	oldsize = i_size_read(inode);
> -	if (start_pos > oldsize) {
> +	if (pos > oldsize) {
>   		/* Expand hole size to cover write data, preventing empty gap */
>   		loff_t end_pos = round_up(pos + count, fs_info->sectorsize);
>
diff mbox series

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 36f51c311bb1..ed3c0d6546c5 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1039,7 +1039,6 @@  int btrfs_write_check(struct kiocb *iocb, size_t count)
 	loff_t pos = iocb->ki_pos;
 	int ret;
 	loff_t oldsize;
-	loff_t start_pos;
 
 	/*
 	 * Quickly bail out on NOWAIT writes if we don't have the nodatacow or
@@ -1066,9 +1065,8 @@  int btrfs_write_check(struct kiocb *iocb, size_t count)
 		inode_inc_iversion(inode);
 	}
 
-	start_pos = round_down(pos, fs_info->sectorsize);
 	oldsize = i_size_read(inode);
-	if (start_pos > oldsize) {
+	if (pos > oldsize) {
 		/* Expand hole size to cover write data, preventing empty gap */
 		loff_t end_pos = round_up(pos + count, fs_info->sectorsize);