diff mbox series

btrfs: fix the length of reserved qgroup to free

Message ID 20241008064849.1814829-1-haisuwang@tencent.com (mailing list archive)
State New
Headers show
Series btrfs: fix the length of reserved qgroup to free | expand

Commit Message

hs wang Oct. 8, 2024, 6:48 a.m. UTC
From: Haisu Wang <haisuwang@tencent.com>

The dealloc flag may be cleared and the extent won't reach the disk
in cow_file_range when errors path. The reserved qgroup space is
freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in
cow_file_range"). However, the length of untouched region to free
need to be adjusted with the region size.

Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range")
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Qu Wenruo Oct. 8, 2024, 7:55 a.m. UTC | #1
在 2024/10/8 17:18, iamhswang@gmail.com 写道:
> From: Haisu Wang <haisuwang@tencent.com>
>
> The dealloc flag may be cleared and the extent won't reach the disk
> in cow_file_range when errors path. The reserved qgroup space is
> freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in
> cow_file_range"). However, the length of untouched region to free
> need to be adjusted with the region size.
>
> Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range")
> Signed-off-by: Haisu Wang <haisuwang@tencent.com>

Right, just several lines before that, we increased @start by
@cur_alloc_size if @extent_reserved is true.

So we can not directly use the old range size.

You can improve that one step further by not modifying @start just for
the error handling path, although that should be another patch.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>   fs/btrfs/inode.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index b0ad46b734c3..5eefa2318fa8 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1592,7 +1592,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>   		clear_bits |= EXTENT_CLEAR_DATA_RESV;
>   		extent_clear_unlock_delalloc(inode, start, end, locked_folio,
>   					     &cached, clear_bits, page_ops);
> -		btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL);
> +		btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL);
>   	}
>   	return ret;
>   }
hs wang Oct. 8, 2024, 11:18 a.m. UTC | #2
Qu Wenruo <quwenruo.btrfs@gmx.com> 于2024年10月8日周二 15:56写道:
>
>
>
> 在 2024/10/8 17:18, iamhswang@gmail.com 写道:
> > From: Haisu Wang <haisuwang@tencent.com>
> >
> > The dealloc flag may be cleared and the extent won't reach the disk
> > in cow_file_range when errors path. The reserved qgroup space is
> > freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in
> > cow_file_range"). However, the length of untouched region to free
> > need to be adjusted with the region size.
> >
> > Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range")
> > Signed-off-by: Haisu Wang <haisuwang@tencent.com>
>
> Right, just several lines before that, we increased @start by
> @cur_alloc_size if @extent_reserved is true.
>
> So we can not directly use the old range size.

Thanks for the review.

>
> You can improve that one step further by not modifying @start just for
> the error handling path, although that should be another patch.

Indeed, modify the start value based on @extent_reserved in
error path only is tricky and ambiguous.

I agree to keep the fix as simple as possible (like the previous patch),
since commit 30479f31d44d ("btrfs: fix qgroupreserve leaks in
cow_file_range") assigned to CVE-2024-46733 already.
A simple fix is easier to port to stable branch of different versions.
Also the possible change to keep @start is more like an
enhancement instead of a fix.

>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
>
> Thanks,
> Qu

To make sure we are on the same page of keeping the @start
unchanged. I write a POC below for your opinion.
(Anyway, i will think/test again before convert POC to a PATCH.)

The @start will advanced in every succeed reservation, the
@cur_alloc_size can represent the @extent_reserved state
instead of using a standalone @extent_reserved flag.
In this case, the @start region no longer need to be modified
based on @extent_reserved state in the error path.

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5eefa2318fa8..0c35292550bd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1341,7 +1341,6 @@ static noinline int cow_file_range(struct
btrfs_inode *inode,
        struct extent_map *em;
        unsigned clear_bits;
        unsigned long page_ops;
-       bool extent_reserved = false;
        int ret = 0;

        if (btrfs_is_free_space_inode(inode)) {
@@ -1395,8 +1394,7 @@ static noinline int cow_file_range(struct
btrfs_inode *inode,
                struct btrfs_ordered_extent *ordered;
                struct btrfs_file_extent file_extent;

-               cur_alloc_size = num_bytes;
-               ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size,
+               ret = btrfs_reserve_extent(root, num_bytes, num_bytes,
                                           min_alloc_size, 0, alloc_hint,
                                           &ins, 1, 1);
                if (ret == -EAGAIN) {
@@ -1427,7 +1425,6 @@ static noinline int cow_file_range(struct
btrfs_inode *inode,
                if (ret < 0)
                        goto out_unlock;
                cur_alloc_size = ins.offset;
-               extent_reserved = true;

                ram_size = ins.offset;
                file_extent.disk_bytenr = ins.objectid;
@@ -1503,7 +1500,7 @@ static noinline int cow_file_range(struct
btrfs_inode *inode,
                        num_bytes -= cur_alloc_size;
                alloc_hint = ins.objectid + ins.offset;
                start += cur_alloc_size;
-               extent_reserved = false;
+               cur_alloc_size = 0;

                /*
                 * btrfs_reloc_clone_csums() error, since start is increased
@@ -1573,13 +1570,12 @@ static noinline int cow_file_range(struct
btrfs_inode *inode,
         * to decrement again the data space_info's bytes_may_use counter,
         * therefore we do not pass it the flag EXTENT_CLEAR_DATA_RESV.
         */
-       if (extent_reserved) {
+       if (cur_alloc_size) {
                extent_clear_unlock_delalloc(inode, start,
                                             start + cur_alloc_size - 1,
                                             locked_folio, &cached, clear_bits,
                                             page_ops);
                btrfs_qgroup_free_data(inode, NULL, start,
cur_alloc_size, NULL);
-               start += cur_alloc_size;
        }

        /*
@@ -1588,11 +1584,13 @@ static noinline int cow_file_range(struct
btrfs_inode *inode,
         * space_info's bytes_may_use counter, reserved in
         * btrfs_check_data_free_space().
         */
-       if (start < end) {
+       if (start + cur_alloc_size < end) {
                clear_bits |= EXTENT_CLEAR_DATA_RESV;
-               extent_clear_unlock_delalloc(inode, start, end, locked_folio,
+               extent_clear_unlock_delalloc(inode, start + cur_alloc_size,
+                                            end, locked_folio,
                                             &cached, clear_bits, page_ops);
-               btrfs_qgroup_free_data(inode, NULL, start, end - start
+ 1, NULL);
+               btrfs_qgroup_free_data(inode, NULL, start + cur_alloc_size,
+                               end - start - cur_alloc_size + 1, NULL);
        }
        return ret;
 }


Thanks,
Haisu Wang

>
> > ---
> >   fs/btrfs/inode.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index b0ad46b734c3..5eefa2318fa8 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -1592,7 +1592,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
> >               clear_bits |= EXTENT_CLEAR_DATA_RESV;
> >               extent_clear_unlock_delalloc(inode, start, end, locked_folio,
> >                                            &cached, clear_bits, page_ops);
> > -             btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL);
> > +             btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL);
> >       }
> >       return ret;
> >   }
>
Boris Burkov Oct. 8, 2024, 4:12 p.m. UTC | #3
On Tue, Oct 08, 2024 at 02:48:46PM +0800, iamhswang@gmail.com wrote:
> From: Haisu Wang <haisuwang@tencent.com>
> 
> The dealloc flag may be cleared and the extent won't reach the disk
> in cow_file_range when errors path. The reserved qgroup space is
> freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in
> cow_file_range"). However, the length of untouched region to free
> need to be adjusted with the region size.
> 
> Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range")
> Signed-off-by: Haisu Wang <haisuwang@tencent.com>

Good catch and fix, thank you!
Reviewed-by: Boris Burkov <boris@bur.io>

Can you please share more information about how you reproduced and
tested this issue for the fix? In one of the other emails in the chain,
you also mentioned a CVE, so explaining the specific impact of the bug
is helpful too.

As far as I can tell, we risk freeing too much space past the real
desired range if start gets bumped before this free, which could lead to
prematurely freeing some other rsv marked data past end. This naturally
leads to incorrect accounting, And I think would allow us to reserve
this same range again. Though perhaps delalloc extent range stuff would
prevent that. Between that, and the changesets gating most of the qgroup
freeing, it's hard to actually see what happens :)

Long ramble short: do you have a reproducer?

> ---
>  fs/btrfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index b0ad46b734c3..5eefa2318fa8 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1592,7 +1592,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>  		clear_bits |= EXTENT_CLEAR_DATA_RESV;
>  		extent_clear_unlock_delalloc(inode, start, end, locked_folio,
>  					     &cached, clear_bits, page_ops);
> -		btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL);
> +		btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL);
>  	}
>  	return ret;
>  }
> -- 
> 2.39.3
>
Qu Wenruo Oct. 8, 2024, 8:50 p.m. UTC | #4
在 2024/10/8 21:48, hs wang 写道:
> Qu Wenruo <quwenruo.btrfs@gmx.com> 于2024年10月8日周二 15:56写道:
>>
>>
>>
>> 在 2024/10/8 17:18, iamhswang@gmail.com 写道:
>>> From: Haisu Wang <haisuwang@tencent.com>
>>>
>>> The dealloc flag may be cleared and the extent won't reach the disk
>>> in cow_file_range when errors path. The reserved qgroup space is
>>> freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in
>>> cow_file_range"). However, the length of untouched region to free
>>> need to be adjusted with the region size.
>>>
>>> Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range")
>>> Signed-off-by: Haisu Wang <haisuwang@tencent.com>
>>
>> Right, just several lines before that, we increased @start by
>> @cur_alloc_size if @extent_reserved is true.
>>
>> So we can not directly use the old range size.
>
> Thanks for the review.
>
>>
>> You can improve that one step further by not modifying @start just for
>> the error handling path, although that should be another patch.
>
> Indeed, modify the start value based on @extent_reserved in
> error path only is tricky and ambiguous.
>
> I agree to keep the fix as simple as possible (like the previous patch),
> since commit 30479f31d44d ("btrfs: fix qgroupreserve leaks in
> cow_file_range") assigned to CVE-2024-46733 already.
> A simple fix is easier to port to stable branch of different versions.
> Also the possible change to keep @start is more like an
> enhancement instead of a fix.
>
>>
>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>>
>> Thanks,
>> Qu
>
> To make sure we are on the same page of keeping the @start
> unchanged. I write a POC below for your opinion.
> (Anyway, i will think/test again before convert POC to a PATCH.)
>
> The @start will advanced in every succeed reservation, the
> @cur_alloc_size can represent the @extent_reserved state
> instead of using a standalone @extent_reserved flag.
> In this case, the @start region no longer need to be modified
> based on @extent_reserved state in the error path.

This snippet looks good to me.

Thanks,
Qu
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5eefa2318fa8..0c35292550bd 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1341,7 +1341,6 @@ static noinline int cow_file_range(struct
> btrfs_inode *inode,
>          struct extent_map *em;
>          unsigned clear_bits;
>          unsigned long page_ops;
> -       bool extent_reserved = false;
>          int ret = 0;
>
>          if (btrfs_is_free_space_inode(inode)) {
> @@ -1395,8 +1394,7 @@ static noinline int cow_file_range(struct
> btrfs_inode *inode,
>                  struct btrfs_ordered_extent *ordered;
>                  struct btrfs_file_extent file_extent;
>
> -               cur_alloc_size = num_bytes;
> -               ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size,
> +               ret = btrfs_reserve_extent(root, num_bytes, num_bytes,
>                                             min_alloc_size, 0, alloc_hint,
>                                             &ins, 1, 1);
>                  if (ret == -EAGAIN) {
> @@ -1427,7 +1425,6 @@ static noinline int cow_file_range(struct
> btrfs_inode *inode,
>                  if (ret < 0)
>                          goto out_unlock;
>                  cur_alloc_size = ins.offset;
> -               extent_reserved = true;
>
>                  ram_size = ins.offset;
>                  file_extent.disk_bytenr = ins.objectid;
> @@ -1503,7 +1500,7 @@ static noinline int cow_file_range(struct
> btrfs_inode *inode,
>                          num_bytes -= cur_alloc_size;
>                  alloc_hint = ins.objectid + ins.offset;
>                  start += cur_alloc_size;
> -               extent_reserved = false;
> +               cur_alloc_size = 0;
>
>                  /*
>                   * btrfs_reloc_clone_csums() error, since start is increased
> @@ -1573,13 +1570,12 @@ static noinline int cow_file_range(struct
> btrfs_inode *inode,
>           * to decrement again the data space_info's bytes_may_use counter,
>           * therefore we do not pass it the flag EXTENT_CLEAR_DATA_RESV.
>           */
> -       if (extent_reserved) {
> +       if (cur_alloc_size) {
>                  extent_clear_unlock_delalloc(inode, start,
>                                               start + cur_alloc_size - 1,
>                                               locked_folio, &cached, clear_bits,
>                                               page_ops);
>                  btrfs_qgroup_free_data(inode, NULL, start,
> cur_alloc_size, NULL);
> -               start += cur_alloc_size;
>          }
>
>          /*
> @@ -1588,11 +1584,13 @@ static noinline int cow_file_range(struct
> btrfs_inode *inode,
>           * space_info's bytes_may_use counter, reserved in
>           * btrfs_check_data_free_space().
>           */
> -       if (start < end) {
> +       if (start + cur_alloc_size < end) {
>                  clear_bits |= EXTENT_CLEAR_DATA_RESV;
> -               extent_clear_unlock_delalloc(inode, start, end, locked_folio,
> +               extent_clear_unlock_delalloc(inode, start + cur_alloc_size,
> +                                            end, locked_folio,
>                                               &cached, clear_bits, page_ops);
> -               btrfs_qgroup_free_data(inode, NULL, start, end - start
> + 1, NULL);
> +               btrfs_qgroup_free_data(inode, NULL, start + cur_alloc_size,
> +                               end - start - cur_alloc_size + 1, NULL);
>          }
>          return ret;
>   }
>
>
> Thanks,
> Haisu Wang
>
>>
>>> ---
>>>    fs/btrfs/inode.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index b0ad46b734c3..5eefa2318fa8 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -1592,7 +1592,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>>>                clear_bits |= EXTENT_CLEAR_DATA_RESV;
>>>                extent_clear_unlock_delalloc(inode, start, end, locked_folio,
>>>                                             &cached, clear_bits, page_ops);
>>> -             btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL);
>>> +             btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL);
>>>        }
>>>        return ret;
>>>    }
>>
hs wang Oct. 9, 2024, 2:40 a.m. UTC | #5
Boris Burkov <boris@bur.io> 于2024年10月9日周三 00:12写道:
>
> On Tue, Oct 08, 2024 at 02:48:46PM +0800, iamhswang@gmail.com wrote:
> > From: Haisu Wang <haisuwang@tencent.com>
> >
> > The dealloc flag may be cleared and the extent won't reach the disk
> > in cow_file_range when errors path. The reserved qgroup space is
> > freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in
> > cow_file_range"). However, the length of untouched region to free
> > need to be adjusted with the region size.
> >
> > Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range")
> > Signed-off-by: Haisu Wang <haisuwang@tencent.com>
>
> Good catch and fix, thank you!
> Reviewed-by: Boris Burkov <boris@bur.io>
>
Thanks for the review.

> Can you please share more information about how you reproduced and
> tested this issue for the fix? In one of the other emails in the chain,
> you also mentioned a CVE, so explaining the specific impact of the bug
> is helpful too.
>

Instead of hitting this in the real world, I get this while backporting the
CVE-2024-46733 fixes. I need to understand the full story and the extent
reservation/clean up context, found the free data region mismatch to the
dealloc region and the potential risky. So i write the fix of the inconsistent
size.

> As far as I can tell, we risk freeing too much space past the real
> desired range if start gets bumped before this free, which could lead to
> prematurely freeing some other rsv marked data past end. This naturally
> leads to incorrect accounting, And I think would allow us to reserve
> this same range again. Though perhaps delalloc extent range stuff would
> prevent that. Between that, and the changesets gating most of the qgroup
> freeing, it's hard to actually see what happens :)
>
> Long ramble short: do you have a reproducer?
>

Sadly, i don't have a reproducer yet.

In another mail of the chain, Wenruo suggested it is possible to
polish the usage
of @startand @extent_reserved to make it more clear/safe. I will check more to
finish this in another patch, together with generic fstest at least.

Thanks,
Haisu Wang

> > ---
> >  fs/btrfs/inode.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index b0ad46b734c3..5eefa2318fa8 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -1592,7 +1592,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
> >               clear_bits |= EXTENT_CLEAR_DATA_RESV;
> >               extent_clear_unlock_delalloc(inode, start, end, locked_folio,
> >                                            &cached, clear_bits, page_ops);
> > -             btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL);
> > +             btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL);
> >       }
> >       return ret;
> >  }
> > --
> > 2.39.3
> >
diff mbox series

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b0ad46b734c3..5eefa2318fa8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1592,7 +1592,7 @@  static noinline int cow_file_range(struct btrfs_inode *inode,
 		clear_bits |= EXTENT_CLEAR_DATA_RESV;
 		extent_clear_unlock_delalloc(inode, start, end, locked_folio,
 					     &cached, clear_bits, page_ops);
-		btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL);
+		btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL);
 	}
 	return ret;
 }