diff mbox series

[v2] Btrfs: fix data bytes_may_use underflow with fallocate due to failed quota reserve

Message ID 1553572571-15614-1-git-send-email-robbieko@synology.com (mailing list archive)
State New, archived
Headers show
Series [v2] Btrfs: fix data bytes_may_use underflow with fallocate due to failed quota reserve | expand

Commit Message

robbieko March 26, 2019, 3:56 a.m. UTC
From: Robbie Ko <robbieko@synology.com>

When doing fallocate, we first add the range to the reserve_list
and then reserve the quota.
If quota reservation fails, we'll release all reserved parts of
reserve_list.
However, cur_offset is not updated to indicate that this
range is already been inserted into the list.
Therefore, the same range is freed twice.
Once at list_for_each_entry loop, and once at the end of the
function.
This will result in WARN_ON on bytes_may_use when we free the
remaining space.

At the end, under the 'out' label we have a call to:
   btrfs_free_reserved_data_space(inode, data_reserved, alloc_start,
alloc_end - cur_offset);
The start offset, third argument, should be cur_offset.
Everything from alloc_start to cur_offset was freed by the
list_for_each_entry_safe_loop.

Fixes: 18513091af94 ("btrfs: update btrfs_space_info's bytes_may_use timely")
Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/file.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Filipe Manana March 26, 2019, 10:54 a.m. UTC | #1
On Tue, Mar 26, 2019 at 3:57 AM robbieko <robbieko@synology.com> wrote:
>
> From: Robbie Ko <robbieko@synology.com>
>
> When doing fallocate, we first add the range to the reserve_list
> and then reserve the quota.
> If quota reservation fails, we'll release all reserved parts of
> reserve_list.
> However, cur_offset is not updated to indicate that this
> range is already been inserted into the list.
> Therefore, the same range is freed twice.
> Once at list_for_each_entry loop, and once at the end of the
> function.
> This will result in WARN_ON on bytes_may_use when we free the
> remaining space.
>
> At the end, under the 'out' label we have a call to:
>    btrfs_free_reserved_data_space(inode, data_reserved, alloc_start,
> alloc_end - cur_offset);
> The start offset, third argument, should be cur_offset.
> Everything from alloc_start to cur_offset was freed by the
> list_for_each_entry_safe_loop.
>
> Fixes: 18513091af94 ("btrfs: update btrfs_space_info's bytes_may_use timely")
> Signed-off-by: Robbie Ko <robbieko@synology.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Now it looks good, thanks.

> ---
>  fs/btrfs/file.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 34fe8a5..0832449 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -3132,6 +3132,7 @@ static long btrfs_fallocate(struct file *file, int mode,
>                         ret = btrfs_qgroup_reserve_data(inode, &data_reserved,
>                                         cur_offset, last_byte - cur_offset);
>                         if (ret < 0) {
> +                               cur_offset = last_byte;
>                                 free_extent_map(em);
>                                 break;
>                         }
> @@ -3181,7 +3182,7 @@ static long btrfs_fallocate(struct file *file, int mode,
>         /* Let go of our reservation. */
>         if (ret != 0 && !(mode & FALLOC_FL_ZERO_RANGE))
>                 btrfs_free_reserved_data_space(inode, data_reserved,
> -                               alloc_start, alloc_end - cur_offset);
> +                               cur_offset, alloc_end - cur_offset);
>         extent_changeset_free(data_reserved);
>         return ret;
>  }
> --
> 1.9.1
>
David Sterba April 9, 2019, 12:10 a.m. UTC | #2
On Tue, Mar 26, 2019 at 11:56:11AM +0800, robbieko wrote:
> From: Robbie Ko <robbieko@synology.com>
> 
> When doing fallocate, we first add the range to the reserve_list
> and then reserve the quota.
> If quota reservation fails, we'll release all reserved parts of
> reserve_list.
> However, cur_offset is not updated to indicate that this
> range is already been inserted into the list.
> Therefore, the same range is freed twice.
> Once at list_for_each_entry loop, and once at the end of the
> function.
> This will result in WARN_ON on bytes_may_use when we free the
> remaining space.
> 
> At the end, under the 'out' label we have a call to:
>    btrfs_free_reserved_data_space(inode, data_reserved, alloc_start,
> alloc_end - cur_offset);
> The start offset, third argument, should be cur_offset.
> Everything from alloc_start to cur_offset was freed by the
> list_for_each_entry_safe_loop.
> 
> Fixes: 18513091af94 ("btrfs: update btrfs_space_info's bytes_may_use timely")
> Signed-off-by: Robbie Ko <robbieko@synology.com>

Added to misc-next, thanks.
David Sterba April 11, 2019, 5:52 p.m. UTC | #3
On Tue, Mar 26, 2019 at 11:56:11AM +0800, robbieko wrote:
> From: Robbie Ko <robbieko@synology.com>
> 
> When doing fallocate, we first add the range to the reserve_list
> and then reserve the quota.
> If quota reservation fails, we'll release all reserved parts of
> reserve_list.
> However, cur_offset is not updated to indicate that this
> range is already been inserted into the list.
> Therefore, the same range is freed twice.
> Once at list_for_each_entry loop, and once at the end of the
> function.
> This will result in WARN_ON on bytes_may_use when we free the
> remaining space.
> 
> At the end, under the 'out' label we have a call to:
>    btrfs_free_reserved_data_space(inode, data_reserved, alloc_start,
> alloc_end - cur_offset);
> The start offset, third argument, should be cur_offset.
> Everything from alloc_start to cur_offset was freed by the
> list_for_each_entry_safe_loop.
> 
> Fixes: 18513091af94 ("btrfs: update btrfs_space_info's bytes_may_use timely")
> Signed-off-by: Robbie Ko <robbieko@synology.com>

I see this error:

generic/102             [02:15:05]

[ 7731.223975] run fstests generic/102 at 2019-04-09 02:15:05
[ 7731.355111] BTRFS info (device vda): disk space caching is enabled
[ 7731.358367] BTRFS info (device vda): has skinny extents
[ 7731.407393] BTRFS: device fsid 99c22651-33bd-4543-96df-da02cc320166 devid 1 transid 5 /dev/vdb
[ 7731.419081] BTRFS info (device vdb): disk space caching is enabled
[ 7731.420671] BTRFS info (device vdb): has skinny extents
[ 7731.421955] BTRFS info (device vdb): flagging fs with big metadata feature
[ 7731.426107] BTRFS info (device vdb): checking UUID tree
[ 7865.389231] BTRFS info (device vdb): disk space caching is enabled
[ 7865.393289] BTRFS info (device vdb): has skinny extents
 [02:17:19]- output mismatch (see /tmp/fstests/results//generic/102.out.bad)
    --- tests/generic/102.out   2018-10-08 09:51:53.890556726 +0000
    +++ /tmp/fstests/results//generic/102.out.bad       2019-04-09 02:17:19.460000000 +0000
    @@ -1,7 +1,7 @@
     QA output created by 102
     wrote 838860800/838860800 bytes at offset 0
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    -wrote 838860800/838860800 bytes at offset 0
    +wrote 109707264/838860800 bytes at offset 0
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
     wrote 838860800/838860800 bytes at offset 0
    ...
    (Run 'diff -u /tmp/fstests/tests/generic/102.out /tmp/fstests/results//generic/102.out.bad'  to see the entire diff)

This is with current misc-next, with 2 other patches to tree-checker
that are unrelated. I've run fstests twice to be sure that it's not some
random fluke, it happened in both cases.

Running with with one patch below the test passes so I'm reasonably sure
it's happening due to your patch.

For reference:

- misc-next top commit fbcee956b888abbcbfed295a2191b1d0eccb0f09
  (reproduced twice)

- one below your patch c8d27001b9067085bd62adf5fdf7cc8a53bd2f7f
  (not reproduced)
Filipe Manana April 11, 2019, 6:01 p.m. UTC | #4
On Thu, Apr 11, 2019 at 6:53 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Tue, Mar 26, 2019 at 11:56:11AM +0800, robbieko wrote:
> > From: Robbie Ko <robbieko@synology.com>
> >
> > When doing fallocate, we first add the range to the reserve_list
> > and then reserve the quota.
> > If quota reservation fails, we'll release all reserved parts of
> > reserve_list.
> > However, cur_offset is not updated to indicate that this
> > range is already been inserted into the list.
> > Therefore, the same range is freed twice.
> > Once at list_for_each_entry loop, and once at the end of the
> > function.
> > This will result in WARN_ON on bytes_may_use when we free the
> > remaining space.
> >
> > At the end, under the 'out' label we have a call to:
> >    btrfs_free_reserved_data_space(inode, data_reserved, alloc_start,
> > alloc_end - cur_offset);
> > The start offset, third argument, should be cur_offset.
> > Everything from alloc_start to cur_offset was freed by the
> > list_for_each_entry_safe_loop.
> >
> > Fixes: 18513091af94 ("btrfs: update btrfs_space_info's bytes_may_use timely")
> > Signed-off-by: Robbie Ko <robbieko@synology.com>
>
> I see this error:
>
> generic/102             [02:15:05]
>
> [ 7731.223975] run fstests generic/102 at 2019-04-09 02:15:05
> [ 7731.355111] BTRFS info (device vda): disk space caching is enabled
> [ 7731.358367] BTRFS info (device vda): has skinny extents
> [ 7731.407393] BTRFS: device fsid 99c22651-33bd-4543-96df-da02cc320166 devid 1 transid 5 /dev/vdb
> [ 7731.419081] BTRFS info (device vdb): disk space caching is enabled
> [ 7731.420671] BTRFS info (device vdb): has skinny extents
> [ 7731.421955] BTRFS info (device vdb): flagging fs with big metadata feature
> [ 7731.426107] BTRFS info (device vdb): checking UUID tree
> [ 7865.389231] BTRFS info (device vdb): disk space caching is enabled
> [ 7865.393289] BTRFS info (device vdb): has skinny extents
>  [02:17:19]- output mismatch (see /tmp/fstests/results//generic/102.out.bad)
>     --- tests/generic/102.out   2018-10-08 09:51:53.890556726 +0000
>     +++ /tmp/fstests/results//generic/102.out.bad       2019-04-09 02:17:19.460000000 +0000
>     @@ -1,7 +1,7 @@
>      QA output created by 102
>      wrote 838860800/838860800 bytes at offset 0
>      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>     -wrote 838860800/838860800 bytes at offset 0
>     +wrote 109707264/838860800 bytes at offset 0
>      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>      wrote 838860800/838860800 bytes at offset 0
>     ...
>     (Run 'diff -u /tmp/fstests/tests/generic/102.out /tmp/fstests/results//generic/102.out.bad'  to see the entire diff)

Unrelated to this patch.
The test fails often for at least a few kernel releases now.

The test does not even exercises fallocate() at all...

>
> This is with current misc-next, with 2 other patches to tree-checker
> that are unrelated. I've run fstests twice to be sure that it's not some
> random fluke, it happened in both cases.
>
> Running with with one patch below the test passes so I'm reasonably sure
> it's happening due to your patch.
>
> For reference:
>
> - misc-next top commit fbcee956b888abbcbfed295a2191b1d0eccb0f09
>   (reproduced twice)
>
> - one below your patch c8d27001b9067085bd62adf5fdf7cc8a53bd2f7f
>   (not reproduced)
David Sterba April 11, 2019, 6:28 p.m. UTC | #5
On Thu, Apr 11, 2019 at 06:01:16PM +0000, Filipe Manana wrote:
> > [ 7731.355111] BTRFS info (device vda): disk space caching is enabled
> > [ 7731.358367] BTRFS info (device vda): has skinny extents
> > [ 7731.407393] BTRFS: device fsid 99c22651-33bd-4543-96df-da02cc320166 devid 1 transid 5 /dev/vdb
> > [ 7731.419081] BTRFS info (device vdb): disk space caching is enabled
> > [ 7731.420671] BTRFS info (device vdb): has skinny extents
> > [ 7731.421955] BTRFS info (device vdb): flagging fs with big metadata feature
> > [ 7731.426107] BTRFS info (device vdb): checking UUID tree
> > [ 7865.389231] BTRFS info (device vdb): disk space caching is enabled
> > [ 7865.393289] BTRFS info (device vdb): has skinny extents
> >  [02:17:19]- output mismatch (see /tmp/fstests/results//generic/102.out.bad)
> >     --- tests/generic/102.out   2018-10-08 09:51:53.890556726 +0000
> >     +++ /tmp/fstests/results//generic/102.out.bad       2019-04-09 02:17:19.460000000 +0000
> >     @@ -1,7 +1,7 @@
> >      QA output created by 102
> >      wrote 838860800/838860800 bytes at offset 0
> >      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >     -wrote 838860800/838860800 bytes at offset 0
> >     +wrote 109707264/838860800 bytes at offset 0
> >      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >      wrote 838860800/838860800 bytes at offset 0
> >     ...
> >     (Run 'diff -u /tmp/fstests/tests/generic/102.out /tmp/fstests/results//generic/102.out.bad'  to see the entire diff)
> 
> Unrelated to this patch.
> The test fails often for at least a few kernel releases now.

Ok noted, thanks. After checking more logs, the test indeed fails,
though I see only a few failures in past week so it was a conincidence
that it happened just around this patch.
diff mbox series

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 34fe8a5..0832449 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3132,6 +3132,7 @@  static long btrfs_fallocate(struct file *file, int mode,
 			ret = btrfs_qgroup_reserve_data(inode, &data_reserved,
 					cur_offset, last_byte - cur_offset);
 			if (ret < 0) {
+				cur_offset = last_byte;
 				free_extent_map(em);
 				break;
 			}
@@ -3181,7 +3182,7 @@  static long btrfs_fallocate(struct file *file, int mode,
 	/* Let go of our reservation. */
 	if (ret != 0 && !(mode & FALLOC_FL_ZERO_RANGE))
 		btrfs_free_reserved_data_space(inode, data_reserved,
-				alloc_start, alloc_end - cur_offset);
+				cur_offset, alloc_end - cur_offset);
 	extent_changeset_free(data_reserved);
 	return ret;
 }