Message ID | 20200724064610.69442-2-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] btrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode | expand |
On 7/24/20 2:46 AM, Qu Wenruo wrote: > [BUG] > When quota is enabled for TEST_DEV, generic/013 sometimes fails like this: > > generic/013 14s ... _check_dmesg: something found in dmesg (see xfstests-dev/results//generic/013.dmesg) > > And with the following metadata leak: > > BTRFS warning (device dm-3): qgroup 0/1370 has unreleased space, type 2 rsv 49152 > ------------[ cut here ]------------ > WARNING: CPU: 2 PID: 47912 at fs/btrfs/disk-io.c:4078 close_ctree+0x1dc/0x323 [btrfs] > Call Trace: > btrfs_put_super+0x15/0x17 [btrfs] > generic_shutdown_super+0x72/0x110 > kill_anon_super+0x18/0x30 > btrfs_kill_super+0x17/0x30 [btrfs] > deactivate_locked_super+0x3b/0xa0 > deactivate_super+0x40/0x50 > cleanup_mnt+0x135/0x190 > __cleanup_mnt+0x12/0x20 > task_work_run+0x64/0xb0 > __prepare_exit_to_usermode+0x1bc/0x1c0 > __syscall_return_slowpath+0x47/0x230 > do_syscall_64+0x64/0xb0 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > ---[ end trace a6cfd45ba80e4e06 ]--- > BTRFS error (device dm-3): qgroup reserved space leaked > BTRFS info (device dm-3): disk space caching is enabled > BTRFS info (device dm-3): has skinny extents > > [CAUSE] > The qgroup preallocated meta rsv operations of that offending root are: > > btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072 > btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072 > btrfs_subvolume_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=49152 > btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072 > btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072 > > It's pretty obvious that, we reserve qgroup meta rsv in > btrfs_subvolume_reserve_metadata(), but doesn't have corresponding > release/convert calls in btrfs_subvolume_release_metadata(). > > This leads to the leakage. > > [FIX] > To fix this bug, we should follow what we're doing in > btrfs_delalloc_reserve_metadata(), where we reserve qgroup space, and > add it to block_rsv->qgroup_rsv_reserved. > > And free the qgroup reserved metadata space when releasing the > block_rsv. > > To do this, we need to change the btrfs_subvolume_release_metadata() to > accept btrfs_root, and record the qgroup_to_release number, and call > btrfs_qgroup_convert_reserved_meta() for it. > > Fixes: 733e03a0b26a ("btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans") > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Seems like this class of issues could be avoided if the qgroup reservation stuff actually took the block_rsv so they could update the ->qgroup_rsv_reserved counter, and then the reserve/cleanup functions would do the right thing themselves, instead of needing to make sure they adjust things as necessary in all the callers. This would be reasonable follow up work. Thanks, Josef
On 2020/8/25 上午2:08, Josef Bacik wrote: > On 7/24/20 2:46 AM, Qu Wenruo wrote: >> [BUG] >> When quota is enabled for TEST_DEV, generic/013 sometimes fails like >> this: >> >> generic/013 14s ... _check_dmesg: something found in dmesg (see >> xfstests-dev/results//generic/013.dmesg) >> >> And with the following metadata leak: >> >> BTRFS warning (device dm-3): qgroup 0/1370 has unreleased space, >> type 2 rsv 49152 >> ------------[ cut here ]------------ >> WARNING: CPU: 2 PID: 47912 at fs/btrfs/disk-io.c:4078 >> close_ctree+0x1dc/0x323 [btrfs] >> Call Trace: >> btrfs_put_super+0x15/0x17 [btrfs] >> generic_shutdown_super+0x72/0x110 >> kill_anon_super+0x18/0x30 >> btrfs_kill_super+0x17/0x30 [btrfs] >> deactivate_locked_super+0x3b/0xa0 >> deactivate_super+0x40/0x50 >> cleanup_mnt+0x135/0x190 >> __cleanup_mnt+0x12/0x20 >> task_work_run+0x64/0xb0 >> __prepare_exit_to_usermode+0x1bc/0x1c0 >> __syscall_return_slowpath+0x47/0x230 >> do_syscall_64+0x64/0xb0 >> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> ---[ end trace a6cfd45ba80e4e06 ]--- >> BTRFS error (device dm-3): qgroup reserved space leaked >> BTRFS info (device dm-3): disk space caching is enabled >> BTRFS info (device dm-3): has skinny extents >> >> [CAUSE] >> The qgroup preallocated meta rsv operations of that offending root are: >> >> btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 >> num_bytes=131072 >> btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 >> num_bytes=131072 >> btrfs_subvolume_reserve_metadata: rsv_meta_prealloc root=1370 >> num_bytes=49152 >> btrfs_delayed_inode_release_metadata: convert_meta_prealloc >> root=1370 num_bytes=-131072 >> btrfs_delayed_inode_release_metadata: convert_meta_prealloc >> root=1370 num_bytes=-131072 >> >> It's pretty obvious that, we reserve qgroup meta rsv in >> btrfs_subvolume_reserve_metadata(), but doesn't have corresponding >> release/convert calls in btrfs_subvolume_release_metadata(). >> >> This leads to the leakage. >> >> [FIX] >> To fix this bug, we should follow what we're doing in >> btrfs_delalloc_reserve_metadata(), where we reserve qgroup space, and >> add it to block_rsv->qgroup_rsv_reserved. >> >> And free the qgroup reserved metadata space when releasing the >> block_rsv. >> >> To do this, we need to change the btrfs_subvolume_release_metadata() to >> accept btrfs_root, and record the qgroup_to_release number, and call >> btrfs_qgroup_convert_reserved_meta() for it. >> >> Fixes: 733e03a0b26a ("btrfs: qgroup: Split meta rsv type into >> meta_prealloc and meta_pertrans") >> Signed-off-by: Qu Wenruo <wqu@suse.com> > > Reviewed-by: Josef Bacik <josef@toxicpanda.com> > > Seems like this class of issues could be avoided if the qgroup > reservation stuff actually took the block_rsv so they could update the > ->qgroup_rsv_reserved counter, and then the reserve/cleanup functions > would do the right thing themselves, instead of needing to make sure > they adjust things as necessary in all the callers. This would be > reasonable follow up work. Thanks, Exactly. Especially the old qgroup specific reserve calculation is completely to reduce early EDQUOT, but now we have retry mechanism, we can completely rely on the blk rsv numbers. That would be my next qgroup work objective. Thanks, Qu > > Josef
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 9c7e466f27a9..e1db56ff8f6f 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2619,7 +2619,7 @@ enum btrfs_flush_state { int btrfs_subvolume_reserve_metadata(struct btrfs_root *root, struct btrfs_block_rsv *rsv, int nitems, bool use_global_rsv); -void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info, +void btrfs_subvolume_release_metadata(struct btrfs_root *root, struct btrfs_block_rsv *rsv); void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 611b3412fbfd..e9def7cdf662 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4047,7 +4047,7 @@ int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry) err = ret; inode->i_flags |= S_DEAD; out_release: - btrfs_subvolume_release_metadata(fs_info, &block_rsv); + btrfs_subvolume_release_metadata(root, &block_rsv); out_up_write: up_write(&fs_info->subvol_sem); if (err) { diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index bd3511c5ca81..976aeddca86c 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -618,7 +618,7 @@ static noinline int create_subvol(struct inode *dir, trans = btrfs_start_transaction(root, 0); if (IS_ERR(trans)) { ret = PTR_ERR(trans); - btrfs_subvolume_release_metadata(fs_info, &block_rsv); + btrfs_subvolume_release_metadata(root, &block_rsv); goto fail_free; } trans->block_rsv = &block_rsv; @@ -742,7 +742,7 @@ static noinline int create_subvol(struct inode *dir, kfree(root_item); trans->block_rsv = NULL; trans->bytes_reserved = 0; - btrfs_subvolume_release_metadata(fs_info, &block_rsv); + btrfs_subvolume_release_metadata(root, &block_rsv); err = btrfs_commit_transaction(trans); if (err && !ret) @@ -856,7 +856,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, if (ret && pending_snapshot->snap) pending_snapshot->snap->anon_dev = 0; btrfs_put_root(pending_snapshot->snap); - btrfs_subvolume_release_metadata(fs_info, &pending_snapshot->block_rsv); + btrfs_subvolume_release_metadata(root, &pending_snapshot->block_rsv); free_pending: if (pending_snapshot->anon_dev) free_anon_bdev(pending_snapshot->anon_dev); diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index c89697486366..702dc5441f03 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -512,11 +512,20 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root, if (ret && qgroup_num_bytes) btrfs_qgroup_free_meta_prealloc(root, qgroup_num_bytes); + if (!ret) { + spin_lock(&rsv->lock); + rsv->qgroup_rsv_reserved += qgroup_num_bytes; + spin_unlock(&rsv->lock); + } return ret; } -void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info, +void btrfs_subvolume_release_metadata(struct btrfs_root *root, struct btrfs_block_rsv *rsv) { - btrfs_block_rsv_release(fs_info, rsv, (u64)-1, NULL); + struct btrfs_fs_info *fs_info = root->fs_info; + u64 qgroup_to_release; + + btrfs_block_rsv_release(fs_info, rsv, (u64)-1, &qgroup_to_release); + btrfs_qgroup_convert_reserved_meta(root, qgroup_to_release); }
[BUG] When quota is enabled for TEST_DEV, generic/013 sometimes fails like this: generic/013 14s ... _check_dmesg: something found in dmesg (see xfstests-dev/results//generic/013.dmesg) And with the following metadata leak: BTRFS warning (device dm-3): qgroup 0/1370 has unreleased space, type 2 rsv 49152 ------------[ cut here ]------------ WARNING: CPU: 2 PID: 47912 at fs/btrfs/disk-io.c:4078 close_ctree+0x1dc/0x323 [btrfs] Call Trace: btrfs_put_super+0x15/0x17 [btrfs] generic_shutdown_super+0x72/0x110 kill_anon_super+0x18/0x30 btrfs_kill_super+0x17/0x30 [btrfs] deactivate_locked_super+0x3b/0xa0 deactivate_super+0x40/0x50 cleanup_mnt+0x135/0x190 __cleanup_mnt+0x12/0x20 task_work_run+0x64/0xb0 __prepare_exit_to_usermode+0x1bc/0x1c0 __syscall_return_slowpath+0x47/0x230 do_syscall_64+0x64/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace a6cfd45ba80e4e06 ]--- BTRFS error (device dm-3): qgroup reserved space leaked BTRFS info (device dm-3): disk space caching is enabled BTRFS info (device dm-3): has skinny extents [CAUSE] The qgroup preallocated meta rsv operations of that offending root are: btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072 btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072 btrfs_subvolume_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=49152 btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072 btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072 It's pretty obvious that, we reserve qgroup meta rsv in btrfs_subvolume_reserve_metadata(), but doesn't have corresponding release/convert calls in btrfs_subvolume_release_metadata(). This leads to the leakage. [FIX] To fix this bug, we should follow what we're doing in btrfs_delalloc_reserve_metadata(), where we reserve qgroup space, and add it to block_rsv->qgroup_rsv_reserved. And free the qgroup reserved metadata space when releasing the block_rsv. To do this, we need to change the btrfs_subvolume_release_metadata() to accept btrfs_root, and record the qgroup_to_release number, and call btrfs_qgroup_convert_reserved_meta() for it. Fixes: 733e03a0b26a ("btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans") Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/ctree.h | 2 +- fs/btrfs/inode.c | 2 +- fs/btrfs/ioctl.c | 6 +++--- fs/btrfs/root-tree.c | 13 +++++++++++-- 4 files changed, 16 insertions(+), 7 deletions(-)