diff mbox series

[1/2] btrfs: free fs roots on failed mount

Message ID 20200722160722.8641-1-josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series [1/2] btrfs: free fs roots on failed mount | expand

Commit Message

Josef Bacik July 22, 2020, 4:07 p.m. UTC
While testing a weird problem with -o degraded, I noticed I was getting
leaked root errors

BTRFS warning (device loop0): writable mount is not allowed due to too many missing devices
BTRFS error (device loop0): open_ctree failed
BTRFS error (device loop0): leaked root -9-0 refcount 1

This is the DATA_RELOC root, which gets read before the other fs roots,
but is included in the fs roots radix tree, and thus gets freed by
btrfs_free_fs_roots.  Fix this by moving the call into fail_tree_roots:
in open_ctree.  With this fix we no longer leak that root on mount
failure.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Sterba July 27, 2020, 2:19 p.m. UTC | #1
On Wed, Jul 22, 2020 at 12:07:21PM -0400, Josef Bacik wrote:
> While testing a weird problem with -o degraded, I noticed I was getting
> leaked root errors
> 
> BTRFS warning (device loop0): writable mount is not allowed due to too many missing devices
> BTRFS error (device loop0): open_ctree failed
> BTRFS error (device loop0): leaked root -9-0 refcount 1
> 
> This is the DATA_RELOC root, which gets read before the other fs roots,
> but is included in the fs roots radix tree, and thus gets freed by
> btrfs_free_fs_roots.  Fix this by moving the call into fail_tree_roots:
> in open_ctree.  With this fix we no longer leak that root on mount
> failure.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/disk-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c850d7f44fbe..f1fdbdd44c02 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3421,7 +3421,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>  fail_trans_kthread:
>  	kthread_stop(fs_info->transaction_kthread);
>  	btrfs_cleanup_transaction(fs_info);
> -	btrfs_free_fs_roots(fs_info);
>  fail_cleaner:
>  	kthread_stop(fs_info->cleaner_kthread);
>  
> @@ -3441,6 +3440,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>  	btrfs_put_block_group_cache(fs_info);
>  
>  fail_tree_roots:
> +	btrfs_free_fs_roots(fs_info);
>  	free_root_pointers(fs_info, true);

The data reloc tree is freed inside free_root_pointers, that it's also
in the radix tree is for convenience so I'd rather fix it inside
free_root_pointers and not reorder btrfs_free_fs_roots.

>  	invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
>  
> -- 
> 2.24.1
Josef Bacik July 27, 2020, 2:33 p.m. UTC | #2
On 7/27/20 10:19 AM, David Sterba wrote:
> On Wed, Jul 22, 2020 at 12:07:21PM -0400, Josef Bacik wrote:
>> While testing a weird problem with -o degraded, I noticed I was getting
>> leaked root errors
>>
>> BTRFS warning (device loop0): writable mount is not allowed due to too many missing devices
>> BTRFS error (device loop0): open_ctree failed
>> BTRFS error (device loop0): leaked root -9-0 refcount 1
>>
>> This is the DATA_RELOC root, which gets read before the other fs roots,
>> but is included in the fs roots radix tree, and thus gets freed by
>> btrfs_free_fs_roots.  Fix this by moving the call into fail_tree_roots:
>> in open_ctree.  With this fix we no longer leak that root on mount
>> failure.
>>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>>   fs/btrfs/disk-io.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index c850d7f44fbe..f1fdbdd44c02 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3421,7 +3421,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>>   fail_trans_kthread:
>>   	kthread_stop(fs_info->transaction_kthread);
>>   	btrfs_cleanup_transaction(fs_info);
>> -	btrfs_free_fs_roots(fs_info);
>>   fail_cleaner:
>>   	kthread_stop(fs_info->cleaner_kthread);
>>   
>> @@ -3441,6 +3440,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>>   	btrfs_put_block_group_cache(fs_info);
>>   
>>   fail_tree_roots:
>> +	btrfs_free_fs_roots(fs_info);
>>   	free_root_pointers(fs_info, true);
> 
> The data reloc tree is freed inside free_root_pointers, that it's also
> in the radix tree is for convenience so I'd rather fix it inside
> free_root_pointers and not reorder btrfs_free_fs_roots.
> 

We can't do that, because free_root_pointers() is called to drop the extent 
buffers when we're looping through the backup roots.  btrfs_free_fs_roots() is 
the correct thing to do here, it goes through anything that ended up in the 
radix tree.  Thanks,

Josef
David Sterba July 27, 2020, 3:17 p.m. UTC | #3
On Mon, Jul 27, 2020 at 10:33:32AM -0400, Josef Bacik wrote:
> >> @@ -3441,6 +3440,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
> >>   	btrfs_put_block_group_cache(fs_info);
> >>   
> >>   fail_tree_roots:
> >> +	btrfs_free_fs_roots(fs_info);
> >>   	free_root_pointers(fs_info, true);
> > 
> > The data reloc tree is freed inside free_root_pointers, that it's also
> > in the radix tree is for convenience so I'd rather fix it inside
> > free_root_pointers and not reorder btrfs_free_fs_roots.
> 
> We can't do that, because free_root_pointers() is called to drop the extent 
> buffers when we're looping through the backup roots.  btrfs_free_fs_roots() is 
> the correct thing to do here, it goes through anything that ended up in the 
> radix tree.  Thanks,

I see, free_root_pointers is used elsewhere. I'm concerned about the
reordeing because there are several functions that are now called after
the roots are freed.

(before)	btrfs_free_fs_roots(fs_info);

		kthread_stop(fs_info->cleaner_kthread);
		filemap_write_and_wait(fs_info->btree_inode->i_mapping);
		btrfs_sysfs_remove_mounted(fs_info);
		btrfs_sysfs_remove_fsid(fs_info->fs_devices);
		btrfs_put_block_group_cache(fs_info);

(after)		btrfs_free_fs_roots(fs_info);

If something called by btrfs_free_fs_roots depends on structures
removed/cleaned by the other functions, eg. btrfs_put_block_group_cache
ti could be a problem.

I haven't spotted anything so far, the functions are called after
failure still during mount, this is easier than shutdown sequence after
a full mount.
Josef Bacik July 27, 2020, 3:37 p.m. UTC | #4
On 7/27/20 11:17 AM, David Sterba wrote:
> On Mon, Jul 27, 2020 at 10:33:32AM -0400, Josef Bacik wrote:
>>>> @@ -3441,6 +3440,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>>>>    	btrfs_put_block_group_cache(fs_info);
>>>>    
>>>>    fail_tree_roots:
>>>> +	btrfs_free_fs_roots(fs_info);
>>>>    	free_root_pointers(fs_info, true);
>>>
>>> The data reloc tree is freed inside free_root_pointers, that it's also
>>> in the radix tree is for convenience so I'd rather fix it inside
>>> free_root_pointers and not reorder btrfs_free_fs_roots.
>>
>> We can't do that, because free_root_pointers() is called to drop the extent
>> buffers when we're looping through the backup roots.  btrfs_free_fs_roots() is
>> the correct thing to do here, it goes through anything that ended up in the
>> radix tree.  Thanks,
> 
> I see, free_root_pointers is used elsewhere. I'm concerned about the
> reordeing because there are several functions that are now called after
> the roots are freed.
> 
> (before)	btrfs_free_fs_roots(fs_info);
> 
> 		kthread_stop(fs_info->cleaner_kthread);
> 		filemap_write_and_wait(fs_info->btree_inode->i_mapping);
> 		btrfs_sysfs_remove_mounted(fs_info);
> 		btrfs_sysfs_remove_fsid(fs_info->fs_devices);
> 		btrfs_put_block_group_cache(fs_info);
> 
> (after)		btrfs_free_fs_roots(fs_info);
> 
> If something called by btrfs_free_fs_roots depends on structures
> removed/cleaned by the other functions, eg. btrfs_put_block_group_cache
> ti could be a problem.
> 
> I haven't spotted anything so far, the functions are called after
> failure still during mount, this is easier than shutdown sequence after
> a full mount.
> 

Yeah I'm always loathe to move this stuff around.  I actually think we can end 
up with the case described in close_ctree if we get an error during log replay 
on mount.  I'll rework this some more.  Thanks,

Josef
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c850d7f44fbe..f1fdbdd44c02 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3421,7 +3421,6 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 fail_trans_kthread:
 	kthread_stop(fs_info->transaction_kthread);
 	btrfs_cleanup_transaction(fs_info);
-	btrfs_free_fs_roots(fs_info);
 fail_cleaner:
 	kthread_stop(fs_info->cleaner_kthread);
 
@@ -3441,6 +3440,7 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	btrfs_put_block_group_cache(fs_info);
 
 fail_tree_roots:
+	btrfs_free_fs_roots(fs_info);
 	free_root_pointers(fs_info, true);
 	invalidate_inode_pages2(fs_info->btree_inode->i_mapping);