diff mbox series

[1/3] Btrfs: fix a block group ref counter leak after failure to remove block group

Message ID 20200601181206.24852-1-fdmanana@kernel.org (mailing list archive)
State New, archived
Headers show
Series [1/3] Btrfs: fix a block group ref counter leak after failure to remove block group | expand

Commit Message

Filipe Manana June 1, 2020, 6:12 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

When removing a block group, if we fail to delete the block group's item
from the extent tree, we jump to the 'out' label and end up decrementing
the block group's reference count once only (by 1), resulting in a counter
leak because the block group at that point was already removed from the
block group cache rbtree - so we have to decrement the reference count
twice, once for the rbtree and once for our lookup at the start of the
function.

To make things less error prone, decrement the reference count for the
rbtree immediately after removing the block group from it. This also
eleminates the need for two different exit labels on error, renaming
'out_put_label' to just 'out' and removing the old 'out'.

Fixes: f6033c5e333238 ("btrfs: fix block group leak when removing fails")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/block-group.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

Comments

Nikolay Borisov June 3, 2020, 7:32 a.m. UTC | #1
On 1.06.20 г. 21:12 ч., fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When removing a block group, if we fail to delete the block group's item
> from the extent tree, we jump to the 'out' label and end up decrementing
> the block group's reference count once only (by 1), resulting in a counter
> leak because the block group at that point was already removed from the
> block group cache rbtree - so we have to decrement the reference count
> twice, once for the rbtree and once for our lookup at the start of the
> function.

However I'm having hard time reconciling this. The block group is
removed from the block_group_cache_tree after we've called
btrfs_del_item. So if btrfs_del_item or btrfs_search_slot fail the code
jumps at out_put_group and puts the reference acquired at the beginning
of the function via btrfs_lookup_block_group.

I think what you meant is if we fail to delete the block group's item
from the freespace tree, that is if we fail
remove_block_group_free_space, then we'd have a ref leak. With this
modification to the changelog:

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> 
> To make things less error prone, decrement the reference count for the
> rbtree immediately after removing the block group from it. This also
> eleminates the need for two different exit labels on error, renaming
> 'out_put_label' to just 'out' and removing the old 'out'.

I agree with this.

> 
> Fixes: f6033c5e333238 ("btrfs: fix block group leak when removing fails")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

<snip>
Nikolay Borisov June 3, 2020, 7:44 a.m. UTC | #2
On 3.06.20 г. 10:32 ч., Nikolay Borisov wrote:
> 
> 
> On 1.06.20 г. 21:12 ч., fdmanana@kernel.org wrote:
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> When removing a block group, if we fail to delete the block group's item
>> from the extent tree, we jump to the 'out' label and end up decrementing
>> the block group's reference count once only (by 1), resulting in a counter
>> leak because the block group at that point was already removed from the
>> block group cache rbtree - so we have to decrement the reference count
>> twice, once for the rbtree and once for our lookup at the start of the
>> function.
> 
> However I'm having hard time reconciling this. The block group is
> removed from the block_group_cache_tree after we've called
> btrfs_del_item. So if btrfs_del_item or btrfs_search_slot fail the code
> jumps at out_put_group and puts the reference acquired at the beginning
> of the function via btrfs_lookup_block_group.
> 
> I think what you meant is if we fail to delete the block group's item
> from the freespace tree, that is if we fail
> remove_block_group_free_space, then we'd have a ref leak. With this
> modification to the changelog:
> 

Looking again in this function without this patch the sequence of
remove_block_group_free_space/btrfs_put_block-group/remove_block_group_item
is really bogus.

1. If remove_block_group_free_space fails the code would jump to
out_put_group which would leak the ref count for the rb tree

2. If remove_block_group_item (removal from extent tree) fails then the
code would jump to out: which won't drop the reference taken in
btrfs_remove_block_group...

> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> 
>>
>> To make things less error prone, decrement the reference count for the
>> rbtree immediately after removing the block group from it. This also
>> eleminates the need for two different exit labels on error, renaming
>> 'out_put_label' to just 'out' and removing the old 'out'.
> 
> I agree with this.
> 
>>
>> Fixes: f6033c5e333238 ("btrfs: fix block group leak when removing fails")
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> 
> <snip>
>
Filipe Manana June 3, 2020, 9:30 a.m. UTC | #3
On Wed, Jun 3, 2020 at 8:32 AM Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>
> On 1.06.20 г. 21:12 ч., fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When removing a block group, if we fail to delete the block group's item
> > from the extent tree, we jump to the 'out' label and end up decrementing
> > the block group's reference count once only (by 1), resulting in a counter
> > leak because the block group at that point was already removed from the
> > block group cache rbtree - so we have to decrement the reference count
> > twice, once for the rbtree and once for our lookup at the start of the
> > function.
>
> However I'm having hard time reconciling this. The block group is
> removed from the block_group_cache_tree after we've called
> btrfs_del_item. So if btrfs_del_item or btrfs_search_slot fail the code
> jumps at out_put_group and puts the reference acquired at the beginning
> of the function via btrfs_lookup_block_group.
>
> I think what you meant is if we fail to delete the block group's item
> from the freespace tree, that is if we fail
> remove_block_group_free_space, then we'd have a ref leak.

What I meant is exactly what I wrote:
if we fail to delete the block group's item from the extent tree (the
call to remove_block_group_item()),
we end up decrementing the reference count only once because we jump
to the out label - but we
should have decremented it twice, once for the rbtree removal, which
happened before, and once for
the lookup at the start of the function.

Thanks.

> With this
> modification to the changelog:
>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>
> >
> > To make things less error prone, decrement the reference count for the
> > rbtree immediately after removing the block group from it. This also
> > eleminates the need for two different exit labels on error, renaming
> > 'out_put_label' to just 'out' and removing the old 'out'.
>
> I agree with this.
>
> >
> > Fixes: f6033c5e333238 ("btrfs: fix block group leak when removing fails")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> <snip>
Nikolay Borisov June 3, 2020, 9:37 a.m. UTC | #4
On 3.06.20 г. 12:30 ч., Filipe Manana wrote:
> On Wed, Jun 3, 2020 at 8:32 AM Nikolay Borisov <nborisov@suse.com> wrote:
>>
>>
>>
>> On 1.06.20 г. 21:12 ч., fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> When removing a block group, if we fail to delete the block group's item
>>> from the extent tree, we jump to the 'out' label and end up decrementing
>>> the block group's reference count once only (by 1), resulting in a counter
>>> leak because the block group at that point was already removed from the
>>> block group cache rbtree - so we have to decrement the reference count
>>> twice, once for the rbtree and once for our lookup at the start of the
>>> function.
>>
>> However I'm having hard time reconciling this. The block group is
>> removed from the block_group_cache_tree after we've called
>> btrfs_del_item. So if btrfs_del_item or btrfs_search_slot fail the code
>> jumps at out_put_group and puts the reference acquired at the beginning
>> of the function via btrfs_lookup_block_group.
>>
>> I think what you meant is if we fail to delete the block group's item
>> from the freespace tree, that is if we fail
>> remove_block_group_free_space, then we'd have a ref leak.
> 
> What I meant is exactly what I wrote:
> if we fail to delete the block group's item from the extent tree (the
> call to remove_block_group_item()),
> we end up decrementing the reference count only once because we jump
> to the out label - but we
> should have decremented it twice, once for the rbtree removal, which
> happened before, and once for
> the lookup at the start of the function.


Right, and this is case 2 I described in my 2nd email. However my
initial email referred to case 1 from my 2nd email. There are
essentially 2 bugs w.r.t missing a put_block_group: one happens when
remove_block-group_free_space fails and the 2nd one (which you have
described) when remove_block_group_item fails. IMO the change log should
describe the 2 issues.

> 
> Thanks.
> 
>> With this
>> modification to the changelog:
>>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>>
>>>
>>> To make things less error prone, decrement the reference count for the
>>> rbtree immediately after removing the block group from it. This also
>>> eleminates the need for two different exit labels on error, renaming
>>> 'out_put_label' to just 'out' and removing the old 'out'.
>>
>> I agree with this.
>>
>>>
>>> Fixes: f6033c5e333238 ("btrfs: fix block group leak when removing fails")
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>
>> <snip>
diff mbox series

Patch

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 176e8a292fd1..5bb76a437f5b 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -940,7 +940,7 @@  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	path = btrfs_alloc_path();
 	if (!path) {
 		ret = -ENOMEM;
-		goto out_put_group;
+		goto out;
 	}
 
 	/*
@@ -978,7 +978,7 @@  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 		ret = btrfs_orphan_add(trans, BTRFS_I(inode));
 		if (ret) {
 			btrfs_add_delayed_iput(inode);
-			goto out_put_group;
+			goto out;
 		}
 		clear_nlink(inode);
 		/* One for the block groups ref */
@@ -1001,13 +1001,13 @@  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 	ret = btrfs_search_slot(trans, tree_root, &key, path, -1, 1);
 	if (ret < 0)
-		goto out_put_group;
+		goto out;
 	if (ret > 0)
 		btrfs_release_path(path);
 	if (ret == 0) {
 		ret = btrfs_del_item(trans, tree_root, path);
 		if (ret)
-			goto out_put_group;
+			goto out;
 		btrfs_release_path(path);
 	}
 
@@ -1016,6 +1016,9 @@  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 		 &fs_info->block_group_cache_tree);
 	RB_CLEAR_NODE(&block_group->cache_node);
 
+	/* Once for the block groups rbtree. */
+	btrfs_put_block_group(block_group);
+
 	if (fs_info->first_logical_byte == block_group->start)
 		fs_info->first_logical_byte = (u64)-1;
 	spin_unlock(&fs_info->block_group_cache_lock);
@@ -1125,10 +1128,7 @@  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 	ret = remove_block_group_free_space(trans, block_group);
 	if (ret)
-		goto out_put_group;
-
-	/* Once for the block groups rbtree */
-	btrfs_put_block_group(block_group);
+		goto out;
 
 	ret = remove_block_group_item(trans, path, block_group);
 	if (ret < 0)
@@ -1145,10 +1145,9 @@  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 		free_extent_map(em);
 	}
 
-out_put_group:
+out:
 	/* Once for the lookup reference */
 	btrfs_put_block_group(block_group);
-out:
 	if (remove_rsv)
 		btrfs_delayed_refs_rsv_release(fs_info, 1);
 	btrfs_free_path(path);