Message ID | 20200722160722.8641-1-josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] btrfs: free fs roots on failed mount | expand |
On Wed, Jul 22, 2020 at 12:07:21PM -0400, Josef Bacik wrote: > While testing a weird problem with -o degraded, I noticed I was getting > leaked root errors > > BTRFS warning (device loop0): writable mount is not allowed due to too many missing devices > BTRFS error (device loop0): open_ctree failed > BTRFS error (device loop0): leaked root -9-0 refcount 1 > > This is the DATA_RELOC root, which gets read before the other fs roots, > but is included in the fs roots radix tree, and thus gets freed by > btrfs_free_fs_roots. Fix this by moving the call into fail_tree_roots: > in open_ctree. With this fix we no longer leak that root on mount > failure. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/btrfs/disk-io.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index c850d7f44fbe..f1fdbdd44c02 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3421,7 +3421,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device > fail_trans_kthread: > kthread_stop(fs_info->transaction_kthread); > btrfs_cleanup_transaction(fs_info); > - btrfs_free_fs_roots(fs_info); > fail_cleaner: > kthread_stop(fs_info->cleaner_kthread); > > @@ -3441,6 +3440,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device > btrfs_put_block_group_cache(fs_info); > > fail_tree_roots: > + btrfs_free_fs_roots(fs_info); > free_root_pointers(fs_info, true); The data reloc tree is freed inside free_root_pointers, that it's also in the radix tree is for convenience so I'd rather fix it inside free_root_pointers and not reorder btrfs_free_fs_roots. > invalidate_inode_pages2(fs_info->btree_inode->i_mapping); > > -- > 2.24.1
On 7/27/20 10:19 AM, David Sterba wrote: > On Wed, Jul 22, 2020 at 12:07:21PM -0400, Josef Bacik wrote: >> While testing a weird problem with -o degraded, I noticed I was getting >> leaked root errors >> >> BTRFS warning (device loop0): writable mount is not allowed due to too many missing devices >> BTRFS error (device loop0): open_ctree failed >> BTRFS error (device loop0): leaked root -9-0 refcount 1 >> >> This is the DATA_RELOC root, which gets read before the other fs roots, >> but is included in the fs roots radix tree, and thus gets freed by >> btrfs_free_fs_roots. Fix this by moving the call into fail_tree_roots: >> in open_ctree. With this fix we no longer leak that root on mount >> failure. >> >> Signed-off-by: Josef Bacik <josef@toxicpanda.com> >> --- >> fs/btrfs/disk-io.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index c850d7f44fbe..f1fdbdd44c02 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -3421,7 +3421,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device >> fail_trans_kthread: >> kthread_stop(fs_info->transaction_kthread); >> btrfs_cleanup_transaction(fs_info); >> - btrfs_free_fs_roots(fs_info); >> fail_cleaner: >> kthread_stop(fs_info->cleaner_kthread); >> >> @@ -3441,6 +3440,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device >> btrfs_put_block_group_cache(fs_info); >> >> fail_tree_roots: >> + btrfs_free_fs_roots(fs_info); >> free_root_pointers(fs_info, true); > > The data reloc tree is freed inside free_root_pointers, that it's also > in the radix tree is for convenience so I'd rather fix it inside > free_root_pointers and not reorder btrfs_free_fs_roots. > We can't do that, because free_root_pointers() is called to drop the extent buffers when we're looping through the backup roots. btrfs_free_fs_roots() is the correct thing to do here, it goes through anything that ended up in the radix tree. Thanks, Josef
On Mon, Jul 27, 2020 at 10:33:32AM -0400, Josef Bacik wrote: > >> @@ -3441,6 +3440,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device > >> btrfs_put_block_group_cache(fs_info); > >> > >> fail_tree_roots: > >> + btrfs_free_fs_roots(fs_info); > >> free_root_pointers(fs_info, true); > > > > The data reloc tree is freed inside free_root_pointers, that it's also > > in the radix tree is for convenience so I'd rather fix it inside > > free_root_pointers and not reorder btrfs_free_fs_roots. > > We can't do that, because free_root_pointers() is called to drop the extent > buffers when we're looping through the backup roots. btrfs_free_fs_roots() is > the correct thing to do here, it goes through anything that ended up in the > radix tree. Thanks, I see, free_root_pointers is used elsewhere. I'm concerned about the reordeing because there are several functions that are now called after the roots are freed. (before) btrfs_free_fs_roots(fs_info); kthread_stop(fs_info->cleaner_kthread); filemap_write_and_wait(fs_info->btree_inode->i_mapping); btrfs_sysfs_remove_mounted(fs_info); btrfs_sysfs_remove_fsid(fs_info->fs_devices); btrfs_put_block_group_cache(fs_info); (after) btrfs_free_fs_roots(fs_info); If something called by btrfs_free_fs_roots depends on structures removed/cleaned by the other functions, eg. btrfs_put_block_group_cache ti could be a problem. I haven't spotted anything so far, the functions are called after failure still during mount, this is easier than shutdown sequence after a full mount.
On 7/27/20 11:17 AM, David Sterba wrote: > On Mon, Jul 27, 2020 at 10:33:32AM -0400, Josef Bacik wrote: >>>> @@ -3441,6 +3440,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device >>>> btrfs_put_block_group_cache(fs_info); >>>> >>>> fail_tree_roots: >>>> + btrfs_free_fs_roots(fs_info); >>>> free_root_pointers(fs_info, true); >>> >>> The data reloc tree is freed inside free_root_pointers, that it's also >>> in the radix tree is for convenience so I'd rather fix it inside >>> free_root_pointers and not reorder btrfs_free_fs_roots. >> >> We can't do that, because free_root_pointers() is called to drop the extent >> buffers when we're looping through the backup roots. btrfs_free_fs_roots() is >> the correct thing to do here, it goes through anything that ended up in the >> radix tree. Thanks, > > I see, free_root_pointers is used elsewhere. I'm concerned about the > reordeing because there are several functions that are now called after > the roots are freed. > > (before) btrfs_free_fs_roots(fs_info); > > kthread_stop(fs_info->cleaner_kthread); > filemap_write_and_wait(fs_info->btree_inode->i_mapping); > btrfs_sysfs_remove_mounted(fs_info); > btrfs_sysfs_remove_fsid(fs_info->fs_devices); > btrfs_put_block_group_cache(fs_info); > > (after) btrfs_free_fs_roots(fs_info); > > If something called by btrfs_free_fs_roots depends on structures > removed/cleaned by the other functions, eg. btrfs_put_block_group_cache > ti could be a problem. > > I haven't spotted anything so far, the functions are called after > failure still during mount, this is easier than shutdown sequence after > a full mount. > Yeah I'm always loathe to move this stuff around. I actually think we can end up with the case described in close_ctree if we get an error during log replay on mount. I'll rework this some more. Thanks, Josef
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c850d7f44fbe..f1fdbdd44c02 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3421,7 +3421,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device fail_trans_kthread: kthread_stop(fs_info->transaction_kthread); btrfs_cleanup_transaction(fs_info); - btrfs_free_fs_roots(fs_info); fail_cleaner: kthread_stop(fs_info->cleaner_kthread); @@ -3441,6 +3440,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device btrfs_put_block_group_cache(fs_info); fail_tree_roots: + btrfs_free_fs_roots(fs_info); free_root_pointers(fs_info, true); invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
While testing a weird problem with -o degraded, I noticed I was getting leaked root errors BTRFS warning (device loop0): writable mount is not allowed due to too many missing devices BTRFS error (device loop0): open_ctree failed BTRFS error (device loop0): leaked root -9-0 refcount 1 This is the DATA_RELOC root, which gets read before the other fs roots, but is included in the fs roots radix tree, and thus gets freed by btrfs_free_fs_roots. Fix this by moving the call into fail_tree_roots: in open_ctree. With this fix we no longer leak that root on mount failure. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/disk-io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)