diff mbox series

btrfs: avoid double put of block group when emptying cluster

Message ID 5ca694ff4f8cff4c0ef6896593a1f1d01fbe956d.1611610947.git.josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series btrfs: avoid double put of block group when emptying cluster | expand

Commit Message

Josef Bacik Jan. 25, 2021, 9:42 p.m. UTC
In __btrfs_return_cluster_to_free_space we will bail doing the cleanup
of the cluster if the block group we passed in doesn't match the block
group on the cluster.  However we drop a reference to block_group, as
the cluster holds a reference to the block group while it's attached to
the cluster.  If cluster->block_group != block_group however then this
is an extra put, which means we'll go negative and free this block group
down the line, leading to a UAF.

Fix this by simply bailing if the block group we passed in does not
match the block group on the cluster.

CC: stable@vger.kernel.org
Fixes: fa9c0d795f7b ("Btrfs: rework allocation clustering")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/free-space-cache.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Nikolay Borisov Jan. 26, 2021, 9:02 a.m. UTC | #1
On 25.01.21 г. 23:42 ч., Josef Bacik wrote:
> In __btrfs_return_cluster_to_free_space we will bail doing the cleanup
> of the cluster if the block group we passed in doesn't match the block
> group on the cluster.  However we drop a reference to block_group, as
> the cluster holds a reference to the block group while it's attached to
> the cluster.  If cluster->block_group != block_group however then this
> is an extra put, which means we'll go negative and free this block group
> down the line, leading to a UAF.

Was this found by code inspection or did you hit in production. Also why
in btrfs_remove_free_space_cache just before
__btrfs_return_cluster_to_free_space there is:

WARN_ON(cluster->block_group != block_group);

IMO this patch should also remove the WARN_ON if it's a valid condition
to have the passed bg be different than the one in the cluster. Also
that WARN_ON is likely racy since it's done outside of cluster->lock.

> 
> Fix this by simply bailing if the block group we passed in does not
> match the block group on the cluster.
> 
> CC: stable@vger.kernel.org
> Fixes: fa9c0d795f7b ("Btrfs: rework allocation clustering")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/free-space-cache.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 0d6dcb5ff963..8be36cc6cbd8 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -2711,8 +2711,10 @@ static void __btrfs_return_cluster_to_free_space(
>  	struct rb_node *node;
>  
>  	spin_lock(&cluster->lock);
> -	if (cluster->block_group != block_group)
> -		goto out;
> +	if (cluster->block_group != block_group) {
> +		spin_unlock(&cluster->lock);
> +		return;
> +	}
>  
>  	cluster->block_group = NULL;
>  	cluster->window_start = 0;
> @@ -2750,8 +2752,6 @@ static void __btrfs_return_cluster_to_free_space(
>  				   entry->offset, &entry->offset_index, bitmap);
>  	}
>  	cluster->root = RB_ROOT;
> -
> -out:
>  	spin_unlock(&cluster->lock);
>  	btrfs_put_block_group(block_group);
>  }
>
Josef Bacik Jan. 26, 2021, 2:30 p.m. UTC | #2
On 1/26/21 4:02 AM, Nikolay Borisov wrote:
> 
> 
> On 25.01.21 г. 23:42 ч., Josef Bacik wrote:
>> In __btrfs_return_cluster_to_free_space we will bail doing the cleanup
>> of the cluster if the block group we passed in doesn't match the block
>> group on the cluster.  However we drop a reference to block_group, as
>> the cluster holds a reference to the block group while it's attached to
>> the cluster.  If cluster->block_group != block_group however then this
>> is an extra put, which means we'll go negative and free this block group
>> down the line, leading to a UAF.
> 
> Was this found by code inspection or did you hit in production. Also why
> in btrfs_remove_free_space_cache just before
> __btrfs_return_cluster_to_free_space there is:
> 

It was found in production sort of halfway.  I was doing something for WhatsApp 
and had to convert our block group reference counting to the refcount stuff so I 
could find where I made a mistake.  Turns out this was where the problem was, my 
stuff had just made it way more likely to happen.  I don't have the stack trace 
because this was like 6 months ago, I'm going through all my WhatsApp magic and 
getting them actually usable for upstream.

> WARN_ON(cluster->block_group != block_group);
> 
> IMO this patch should also remove the WARN_ON if it's a valid condition
> to have the passed bg be different than the one in the cluster. Also
> that WARN_ON is likely racy since it's done outside of cluster->lock.
> 

Yup that's in a follow up thing, I wanted to get the actual fix out before I got 
distracted by my mountain of meetings this week.  Thanks,

Josef
David Sterba Feb. 10, 2021, 10:50 p.m. UTC | #3
On Tue, Jan 26, 2021 at 09:30:45AM -0500, Josef Bacik wrote:
> On 1/26/21 4:02 AM, Nikolay Borisov wrote:
> > On 25.01.21 г. 23:42 ч., Josef Bacik wrote:
> >> In __btrfs_return_cluster_to_free_space we will bail doing the cleanup
> >> of the cluster if the block group we passed in doesn't match the block
> >> group on the cluster.  However we drop a reference to block_group, as
> >> the cluster holds a reference to the block group while it's attached to
> >> the cluster.  If cluster->block_group != block_group however then this
> >> is an extra put, which means we'll go negative and free this block group
> >> down the line, leading to a UAF.
> > 
> > Was this found by code inspection or did you hit in production. Also why
> > in btrfs_remove_free_space_cache just before
> > __btrfs_return_cluster_to_free_space there is:
> > 
> 
> It was found in production sort of halfway.  I was doing something for WhatsApp 
> and had to convert our block group reference counting to the refcount stuff so I 
> could find where I made a mistake.  Turns out this was where the problem was, my 
> stuff had just made it way more likely to happen.  I don't have the stack trace 
> because this was like 6 months ago, I'm going through all my WhatsApp magic and 
> getting them actually usable for upstream.
> 
> > WARN_ON(cluster->block_group != block_group);
> > 
> > IMO this patch should also remove the WARN_ON if it's a valid condition
> > to have the passed bg be different than the one in the cluster. Also
> > that WARN_ON is likely racy since it's done outside of cluster->lock.
> > 
> 
> Yup that's in a follow up thing, I wanted to get the actual fix out before I got 
> distracted by my mountain of meetings this week.  Thanks,

Removing the WARN_ON in a separate patch sounds ok to me, this patch
clearly fixes the refcounting bug, the warning condition is the same but
would need a different reasoning.

Nikolay, if you're ok with current patch version let me know if you want
a rev-by added.
Nikolay Borisov Feb. 11, 2021, 11:25 a.m. UTC | #4
On 11.02.21 г. 0:50 ч., David Sterba wrote:
> On Tue, Jan 26, 2021 at 09:30:45AM -0500, Josef Bacik wrote:
>> On 1/26/21 4:02 AM, Nikolay Borisov wrote:
>>> On 25.01.21 г. 23:42 ч., Josef Bacik wrote:
>>>> In __btrfs_return_cluster_to_free_space we will bail doing the cleanup
>>>> of the cluster if the block group we passed in doesn't match the block
>>>> group on the cluster.  However we drop a reference to block_group, as
>>>> the cluster holds a reference to the block group while it's attached to
>>>> the cluster.  If cluster->block_group != block_group however then this
>>>> is an extra put, which means we'll go negative and free this block group
>>>> down the line, leading to a UAF.
>>>
>>> Was this found by code inspection or did you hit in production. Also why
>>> in btrfs_remove_free_space_cache just before
>>> __btrfs_return_cluster_to_free_space there is:
>>>
>>
>> It was found in production sort of halfway.  I was doing something for WhatsApp 
>> and had to convert our block group reference counting to the refcount stuff so I 
>> could find where I made a mistake.  Turns out this was where the problem was, my 
>> stuff had just made it way more likely to happen.  I don't have the stack trace 
>> because this was like 6 months ago, I'm going through all my WhatsApp magic and 
>> getting them actually usable for upstream.
>>
>>> WARN_ON(cluster->block_group != block_group);
>>>
>>> IMO this patch should also remove the WARN_ON if it's a valid condition
>>> to have the passed bg be different than the one in the cluster. Also
>>> that WARN_ON is likely racy since it's done outside of cluster->lock.
>>>
>>
>> Yup that's in a follow up thing, I wanted to get the actual fix out before I got 
>> distracted by my mountain of meetings this week.  Thanks,
> 
> Removing the WARN_ON in a separate patch sounds ok to me, this patch
> clearly fixes the refcounting bug, the warning condition is the same but
> would need a different reasoning.
> 
> Nikolay, if you're ok with current patch version let me know if you want
> a rev-by added.
> 


Codewise I'm fine with it. However just had another read of the commit
message and I think it could be rewritten to be somewhat simpler:

It's wrong calling btrfs_put_block_group in
__btrfs_return_cluster_to_free_space if the block group passed is
different than the block group the cluster represents. As this means the
cluster doesn't have a reference to the passed block group. This results
in double put and an UAF.

What prompted me is that the 2nd and 3rd sentences read somewhat awkward
due to starting with 'However'
David Sterba Feb. 17, 2021, 5:29 p.m. UTC | #5
On Thu, Feb 11, 2021 at 01:25:52PM +0200, Nikolay Borisov wrote:
> 
> 
> On 11.02.21 г. 0:50 ч., David Sterba wrote:
> > On Tue, Jan 26, 2021 at 09:30:45AM -0500, Josef Bacik wrote:
> >> On 1/26/21 4:02 AM, Nikolay Borisov wrote:
> >>> On 25.01.21 г. 23:42 ч., Josef Bacik wrote:
> >>>> In __btrfs_return_cluster_to_free_space we will bail doing the cleanup
> >>>> of the cluster if the block group we passed in doesn't match the block
> >>>> group on the cluster.  However we drop a reference to block_group, as
> >>>> the cluster holds a reference to the block group while it's attached to
> >>>> the cluster.  If cluster->block_group != block_group however then this
> >>>> is an extra put, which means we'll go negative and free this block group
> >>>> down the line, leading to a UAF.
> >>>
> >>> Was this found by code inspection or did you hit in production. Also why
> >>> in btrfs_remove_free_space_cache just before
> >>> __btrfs_return_cluster_to_free_space there is:
> >>>
> >>
> >> It was found in production sort of halfway.  I was doing something for WhatsApp 
> >> and had to convert our block group reference counting to the refcount stuff so I 
> >> could find where I made a mistake.  Turns out this was where the problem was, my 
> >> stuff had just made it way more likely to happen.  I don't have the stack trace 
> >> because this was like 6 months ago, I'm going through all my WhatsApp magic and 
> >> getting them actually usable for upstream.
> >>
> >>> WARN_ON(cluster->block_group != block_group);
> >>>
> >>> IMO this patch should also remove the WARN_ON if it's a valid condition
> >>> to have the passed bg be different than the one in the cluster. Also
> >>> that WARN_ON is likely racy since it's done outside of cluster->lock.
> >>>
> >>
> >> Yup that's in a follow up thing, I wanted to get the actual fix out before I got 
> >> distracted by my mountain of meetings this week.  Thanks,
> > 
> > Removing the WARN_ON in a separate patch sounds ok to me, this patch
> > clearly fixes the refcounting bug, the warning condition is the same but
> > would need a different reasoning.
> > 
> > Nikolay, if you're ok with current patch version let me know if you want
> > a rev-by added.
> > 
> 
> 
> Codewise I'm fine with it. However just had another read of the commit
> message and I think it could be rewritten to be somewhat simpler:
> 
> It's wrong calling btrfs_put_block_group in
> __btrfs_return_cluster_to_free_space if the block group passed is
> different than the block group the cluster represents. As this means the
> cluster doesn't have a reference to the passed block group. This results
> in double put and an UAF.
> 
> What prompted me is that the 2nd and 3rd sentences read somewhat awkward
> due to starting with 'However'

Ok, updated, thanks. I left the last paragraph "Fix that ...".
diff mbox series

Patch

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 0d6dcb5ff963..8be36cc6cbd8 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2711,8 +2711,10 @@  static void __btrfs_return_cluster_to_free_space(
 	struct rb_node *node;
 
 	spin_lock(&cluster->lock);
-	if (cluster->block_group != block_group)
-		goto out;
+	if (cluster->block_group != block_group) {
+		spin_unlock(&cluster->lock);
+		return;
+	}
 
 	cluster->block_group = NULL;
 	cluster->window_start = 0;
@@ -2750,8 +2752,6 @@  static void __btrfs_return_cluster_to_free_space(
 				   entry->offset, &entry->offset_index, bitmap);
 	}
 	cluster->root = RB_ROOT;
-
-out:
 	spin_unlock(&cluster->lock);
 	btrfs_put_block_group(block_group);
 }