Message ID | 20200601181206.24852-1-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] Btrfs: fix a block group ref counter leak after failure to remove block group | expand |
On 1.06.20 г. 21:12 ч., fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > When removing a block group, if we fail to delete the block group's item > from the extent tree, we jump to the 'out' label and end up decrementing > the block group's reference count once only (by 1), resulting in a counter > leak because the block group at that point was already removed from the > block group cache rbtree - so we have to decrement the reference count > twice, once for the rbtree and once for our lookup at the start of the > function. However I'm having hard time reconciling this. The block group is removed from the block_group_cache_tree after we've called btrfs_del_item. So if btrfs_del_item or btrfs_search_slot fail the code jumps at out_put_group and puts the reference acquired at the beginning of the function via btrfs_lookup_block_group. I think what you meant is if we fail to delete the block group's item from the freespace tree, that is if we fail remove_block_group_free_space, then we'd have a ref leak. With this modification to the changelog: Reviewed-by: Nikolay Borisov <nborisov@suse.com> > > To make things less error prone, decrement the reference count for the > rbtree immediately after removing the block group from it. This also > eleminates the need for two different exit labels on error, renaming > 'out_put_label' to just 'out' and removing the old 'out'. I agree with this. > > Fixes: f6033c5e333238 ("btrfs: fix block group leak when removing fails") > Signed-off-by: Filipe Manana <fdmanana@suse.com> <snip>
On 3.06.20 г. 10:32 ч., Nikolay Borisov wrote: > > > On 1.06.20 г. 21:12 ч., fdmanana@kernel.org wrote: >> From: Filipe Manana <fdmanana@suse.com> >> >> When removing a block group, if we fail to delete the block group's item >> from the extent tree, we jump to the 'out' label and end up decrementing >> the block group's reference count once only (by 1), resulting in a counter >> leak because the block group at that point was already removed from the >> block group cache rbtree - so we have to decrement the reference count >> twice, once for the rbtree and once for our lookup at the start of the >> function. > > However I'm having hard time reconciling this. The block group is > removed from the block_group_cache_tree after we've called > btrfs_del_item. So if btrfs_del_item or btrfs_search_slot fail the code > jumps at out_put_group and puts the reference acquired at the beginning > of the function via btrfs_lookup_block_group. > > I think what you meant is if we fail to delete the block group's item > from the freespace tree, that is if we fail > remove_block_group_free_space, then we'd have a ref leak. With this > modification to the changelog: > Looking again in this function without this patch the sequence of remove_block_group_free_space/btrfs_put_block-group/remove_block_group_item is really bogus. 1. If remove_block_group_free_space fails the code would jump to out_put_group which would leak the ref count for the rb tree 2. If remove_block_group_item (removal from extent tree) fails then the code would jump to out: which won't drop the reference taken in btrfs_remove_block_group... > Reviewed-by: Nikolay Borisov <nborisov@suse.com> > >> >> To make things less error prone, decrement the reference count for the >> rbtree immediately after removing the block group from it. This also >> eleminates the need for two different exit labels on error, renaming >> 'out_put_label' to just 'out' and removing the old 'out'. > > I agree with this. > >> >> Fixes: f6033c5e333238 ("btrfs: fix block group leak when removing fails") >> Signed-off-by: Filipe Manana <fdmanana@suse.com> > > <snip> >
On Wed, Jun 3, 2020 at 8:32 AM Nikolay Borisov <nborisov@suse.com> wrote: > > > > On 1.06.20 г. 21:12 ч., fdmanana@kernel.org wrote: > > From: Filipe Manana <fdmanana@suse.com> > > > > When removing a block group, if we fail to delete the block group's item > > from the extent tree, we jump to the 'out' label and end up decrementing > > the block group's reference count once only (by 1), resulting in a counter > > leak because the block group at that point was already removed from the > > block group cache rbtree - so we have to decrement the reference count > > twice, once for the rbtree and once for our lookup at the start of the > > function. > > However I'm having hard time reconciling this. The block group is > removed from the block_group_cache_tree after we've called > btrfs_del_item. So if btrfs_del_item or btrfs_search_slot fail the code > jumps at out_put_group and puts the reference acquired at the beginning > of the function via btrfs_lookup_block_group. > > I think what you meant is if we fail to delete the block group's item > from the freespace tree, that is if we fail > remove_block_group_free_space, then we'd have a ref leak. What I meant is exactly what I wrote: if we fail to delete the block group's item from the extent tree (the call to remove_block_group_item()), we end up decrementing the reference count only once because we jump to the out label - but we should have decremented it twice, once for the rbtree removal, which happened before, and once for the lookup at the start of the function. Thanks. > With this > modification to the changelog: > > Reviewed-by: Nikolay Borisov <nborisov@suse.com> > > > > > To make things less error prone, decrement the reference count for the > > rbtree immediately after removing the block group from it. This also > > eleminates the need for two different exit labels on error, renaming > > 'out_put_label' to just 'out' and removing the old 'out'. > > I agree with this. > > > > > Fixes: f6033c5e333238 ("btrfs: fix block group leak when removing fails") > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > > <snip>
On 3.06.20 г. 12:30 ч., Filipe Manana wrote: > On Wed, Jun 3, 2020 at 8:32 AM Nikolay Borisov <nborisov@suse.com> wrote: >> >> >> >> On 1.06.20 г. 21:12 ч., fdmanana@kernel.org wrote: >>> From: Filipe Manana <fdmanana@suse.com> >>> >>> When removing a block group, if we fail to delete the block group's item >>> from the extent tree, we jump to the 'out' label and end up decrementing >>> the block group's reference count once only (by 1), resulting in a counter >>> leak because the block group at that point was already removed from the >>> block group cache rbtree - so we have to decrement the reference count >>> twice, once for the rbtree and once for our lookup at the start of the >>> function. >> >> However I'm having hard time reconciling this. The block group is >> removed from the block_group_cache_tree after we've called >> btrfs_del_item. So if btrfs_del_item or btrfs_search_slot fail the code >> jumps at out_put_group and puts the reference acquired at the beginning >> of the function via btrfs_lookup_block_group. >> >> I think what you meant is if we fail to delete the block group's item >> from the freespace tree, that is if we fail >> remove_block_group_free_space, then we'd have a ref leak. > > What I meant is exactly what I wrote: > if we fail to delete the block group's item from the extent tree (the > call to remove_block_group_item()), > we end up decrementing the reference count only once because we jump > to the out label - but we > should have decremented it twice, once for the rbtree removal, which > happened before, and once for > the lookup at the start of the function. Right, and this is case 2 I described in my 2nd email. However my initial email referred to case 1 from my 2nd email. There are essentially 2 bugs w.r.t missing a put_block_group: one happens when remove_block-group_free_space fails and the 2nd one (which you have described) when remove_block_group_item fails. IMO the change log should describe the 2 issues. > > Thanks. > >> With this >> modification to the changelog: >> >> Reviewed-by: Nikolay Borisov <nborisov@suse.com> >> >>> >>> To make things less error prone, decrement the reference count for the >>> rbtree immediately after removing the block group from it. This also >>> eleminates the need for two different exit labels on error, renaming >>> 'out_put_label' to just 'out' and removing the old 'out'. >> >> I agree with this. >> >>> >>> Fixes: f6033c5e333238 ("btrfs: fix block group leak when removing fails") >>> Signed-off-by: Filipe Manana <fdmanana@suse.com> >> >> <snip>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 176e8a292fd1..5bb76a437f5b 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -940,7 +940,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, path = btrfs_alloc_path(); if (!path) { ret = -ENOMEM; - goto out_put_group; + goto out; } /* @@ -978,7 +978,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, ret = btrfs_orphan_add(trans, BTRFS_I(inode)); if (ret) { btrfs_add_delayed_iput(inode); - goto out_put_group; + goto out; } clear_nlink(inode); /* One for the block groups ref */ @@ -1001,13 +1001,13 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, ret = btrfs_search_slot(trans, tree_root, &key, path, -1, 1); if (ret < 0) - goto out_put_group; + goto out; if (ret > 0) btrfs_release_path(path); if (ret == 0) { ret = btrfs_del_item(trans, tree_root, path); if (ret) - goto out_put_group; + goto out; btrfs_release_path(path); } @@ -1016,6 +1016,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, &fs_info->block_group_cache_tree); RB_CLEAR_NODE(&block_group->cache_node); + /* Once for the block groups rbtree. */ + btrfs_put_block_group(block_group); + if (fs_info->first_logical_byte == block_group->start) fs_info->first_logical_byte = (u64)-1; spin_unlock(&fs_info->block_group_cache_lock); @@ -1125,10 +1128,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, ret = remove_block_group_free_space(trans, block_group); if (ret) - goto out_put_group; - - /* Once for the block groups rbtree */ - btrfs_put_block_group(block_group); + goto out; ret = remove_block_group_item(trans, path, block_group); if (ret < 0) @@ -1145,10 +1145,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, free_extent_map(em); } -out_put_group: +out: /* Once for the lookup reference */ btrfs_put_block_group(block_group); -out: if (remove_rsv) btrfs_delayed_refs_rsv_release(fs_info, 1); btrfs_free_path(path);