From patchwork Wed Jan 23 07:15:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10776625 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 40E121575 for ; Wed, 23 Jan 2019 07:15:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 29BFA2874A for ; Wed, 23 Jan 2019 07:15:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1DAD12B03E; Wed, 23 Jan 2019 07:15:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A0FA92AA92 for ; Wed, 23 Jan 2019 07:15:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726158AbfAWHP2 (ORCPT ); Wed, 23 Jan 2019 02:15:28 -0500 Received: from mx2.suse.de ([195.135.220.15]:47334 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725950AbfAWHP2 (ORCPT ); Wed, 23 Jan 2019 02:15:28 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 84E21B088 for ; Wed, 23 Jan 2019 07:15:26 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: dsterba@suse.cz Subject: [Patch v5 1/7] btrfs: qgroup: Move reserved data account from btrfs_delayed_ref_head to btrfs_qgroup_extent_record Date: Wed, 23 Jan 2019 15:15:12 +0800 Message-Id: <20190123071518.2528-2-wqu@suse.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: 
<20190123071518.2528-1-wqu@suse.com>
References: <20190123071518.2528-1-wqu@suse.com>
MIME-Version: 1.0
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

[BUG]
Btrfs/139 fails with fairly high probability if the testing machine (VM)
has only 2G of RAM.

The final write succeeds when it should fail with EDQUOT, and the
resulting fs has qgroup usage exceeding the limit by 16K.

The simplified reproducer is (needs a VM with 2G of RAM):

  mkfs.btrfs -f $dev
  mount $dev $mnt

  btrfs subv create $mnt/subv
  btrfs quota enable $mnt
  btrfs quota rescan -w $mnt
  btrfs qgroup limit -e 1G $mnt/subv

  for i in $(seq -w 1 8); do
  	xfs_io -f -c "pwrite 0 128M" $mnt/subv/file_$i > /dev/null
  	echo "file $i written" > /dev/kmsg
  done
  sync
  btrfs qgroup show -pcre --raw $mnt

The last pwrite will not trigger EDQUOT, and the final qgroup show will
print something like:

  qgroupid         rfer         excl     max_rfer     max_excl parent  child
  --------         ----         ----     --------     -------- ------  -----
  0/5             16384        16384         none         none ---     ---
  0/256      1073758208   1073758208         none   1073741824 ---     ---

And 1073758208 is larger than 1073741824.

[CAUSE]
It's a bug in btrfs qgroup data reserved space management.

For the quota limit, we must ensure that:

  reserved (data + metadata) + rfer/excl <= limit

Since rfer/excl is only updated at transaction commit time, reserved
space needs special care.

One important part of reserved space is data. For a new data extent
written to disk, we still need to hold the reserved space until the
rfer/excl numbers get updated.

Originally, when an ordered extent finishes, we migrate the reserved
qgroup data space from the extent_io tree to the delayed ref head of the
data extent, expecting that the delayed ref will only be cleaned up at
transaction commit time.

However, on a machine with little RAM, memory pressure can cause dirty
pages to be flushed back to disk without committing a transaction.
The related events will be something like:

  BTRFS info (device dm-3): has skinny extents
  file 1 written
  btrfs_finish_ordered_io: ino=258 ordered offset=0 len=54947840
  btrfs_finish_ordered_io: ino=258 ordered offset=54947840 len=5636096
  btrfs_finish_ordered_io: ino=258 ordered offset=61153280 len=57344
  btrfs_finish_ordered_io: ino=258 ordered offset=61210624 len=8192
  btrfs_finish_ordered_io: ino=258 ordered offset=60583936 len=569344
  cleanup_ref_head: num_bytes=54947840
  cleanup_ref_head: num_bytes=5636096
  cleanup_ref_head: num_bytes=569344
  cleanup_ref_head: num_bytes=57344
  cleanup_ref_head: num_bytes=8192
  ^^^^^^^^^^^^^^^^ This will free qgroup data reserved space
  file 2 written
  ...
  file 8 written
  cleanup_ref_head: num_bytes=8192
  ...
  btrfs_commit_transaction  <<< the only transaction committed during
				the test

When file 2 is written, we have already freed 128M of reserved qgroup
data space for ino 258. Thus later writes won't trigger EDQUOT.

This allows us to write more data beyond the qgroup limit.
In my VM with 2G of RAM, it could reach about 1.2G before hitting EDQUOT.

[FIX]
By moving the reserved qgroup data space from btrfs_delayed_ref_head to
btrfs_qgroup_extent_record, we can ensure that the reserved qgroup data
space won't be freed before the transaction commits, which fixes the
problem.

Fixes: f64d5ca86821 ("btrfs: delayed_ref: Add new function to record reserved space into delayed ref") Signed-off-by: Qu Wenruo --- fs/btrfs/delayed-ref.c | 15 ++++----------- fs/btrfs/delayed-ref.h | 11 ----------- fs/btrfs/extent-tree.c | 3 --- fs/btrfs/qgroup.c | 19 +++++++++++++++---- fs/btrfs/qgroup.h | 20 +++++++++++--------- include/trace/events/btrfs.h | 29 ----------------------------- 6 files changed, 30 insertions(+), 67 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index cad36c99a483..7d2a413df90d 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -602,17 +602,14 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref, RB_CLEAR_NODE(&head_ref->href_node); head_ref->processing = 0; head_ref->total_ref_mod = count_mod; - head_ref->qgroup_reserved = 0; - head_ref->qgroup_ref_root = 0; spin_lock_init(&head_ref->lock); mutex_init(&head_ref->mutex); if (qrecord) { if (ref_root && reserved) { - head_ref->qgroup_ref_root = ref_root; - head_ref->qgroup_reserved = reserved; + qrecord->data_rsv = reserved; + qrecord->data_rsv_refroot = ref_root; } - qrecord->bytenr = bytenr; qrecord->num_bytes = num_bytes; qrecord->old_roots = NULL; @@ -651,10 +648,6 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans, existing = htree_insert(&delayed_refs->href_root, &head_ref->href_node); if (existing) { - WARN_ON(qrecord && head_ref->qgroup_ref_root - && head_ref->qgroup_reserved - && existing->qgroup_ref_root - && existing->qgroup_reserved); update_existing_head_ref(trans, existing, head_ref, old_ref_mod); /* @@ -770,7 +763,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans, if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) && is_fstree(ref_root)) { - record = kmalloc(sizeof(*record), GFP_NOFS); + record = kzalloc(sizeof(*record), GFP_NOFS); if (!record) { kmem_cache_free(btrfs_delayed_tree_ref_cachep, ref); kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref); @@ -867,7 +860,7 @@ 
int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans, if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) && is_fstree(ref_root)) { - record = kmalloc(sizeof(*record), GFP_NOFS); + record = kzalloc(sizeof(*record), GFP_NOFS); if (!record) { kmem_cache_free(btrfs_delayed_data_ref_cachep, ref); kmem_cache_free(btrfs_delayed_ref_head_cachep, diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index d2af974f68a1..70606da440aa 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -102,17 +102,6 @@ struct btrfs_delayed_ref_head { */ int ref_mod; - /* - * For qgroup reserved space freeing. - * - * ref_root and reserved will be recorded after - * BTRFS_ADD_DELAYED_EXTENT is called. - * And will be used to free reserved qgroup space at - * run_delayed_refs() time. - */ - u64 qgroup_ref_root; - u64 qgroup_reserved; - /* * when a new extent is allocated, it is just reserved in memory * The actual extent isn't inserted into the extent allocation tree diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index b15afeae16df..208335a2aa29 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2494,9 +2494,6 @@ static void cleanup_ref_head_accounting(struct btrfs_trans_handle *trans, } } - /* Also free its reserved qgroup space */ - btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root, - head->qgroup_reserved); btrfs_delayed_refs_rsv_release(fs_info, nr_items); } diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 4e473a998219..f214b490d80c 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1546,12 +1546,18 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info, parent_node = *p; entry = rb_entry(parent_node, struct btrfs_qgroup_extent_record, node); - if (bytenr < entry->bytenr) + if (bytenr < entry->bytenr) { p = &(*p)->rb_left; - else if (bytenr > entry->bytenr) + } else if (bytenr > entry->bytenr) { p = &(*p)->rb_right; - else + } else { + if (record->data_rsv && !entry->data_rsv) { + 
entry->data_rsv = record->data_rsv; + entry->data_rsv_refroot = + record->data_rsv_refroot; + } return 1; + } } rb_link_node(&record->node, parent_node, p); @@ -1597,7 +1603,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr, if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) || bytenr == 0 || num_bytes == 0) return 0; - record = kmalloc(sizeof(*record), gfp_flag); + record = kzalloc(sizeof(*record), gfp_flag); if (!record) return -ENOMEM; @@ -2576,6 +2582,11 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans) goto cleanup; } + /* Free the reserved data space */ + btrfs_qgroup_free_refroot(fs_info, + record->data_rsv_refroot, + record->data_rsv, + BTRFS_QGROUP_RSV_DATA); /* * Use SEQ_LAST as time_seq to do special search, which * doesn't lock tree or delayed_refs and search current diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 20c6bd5fa701..d4fae53969d4 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -45,6 +45,17 @@ struct btrfs_qgroup_extent_record { struct rb_node node; u64 bytenr; u64 num_bytes; + + /* + * For qgroup reserved data space freeing. + * + * @data_rsv_refroot and @data_rsv will be recorded after + * BTRFS_ADD_DELAYED_EXTENT is called. + * And will be used to free reserved qgroup space at + * transaction commit time. 
+ */ + u32 data_rsv; /* reserved data space needs to be freed */ + u64 data_rsv_refroot; /* which root the reserved data belongs to */ struct ulist *old_roots; }; @@ -252,15 +263,6 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info, u64 ref_root, u64 num_bytes, enum btrfs_qgroup_rsv_type type); -static inline void btrfs_qgroup_free_delayed_ref(struct btrfs_fs_info *fs_info, - u64 ref_root, u64 num_bytes) -{ - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) - return; - trace_btrfs_qgroup_free_delayed_ref(fs_info, ref_root, num_bytes); - btrfs_qgroup_free_refroot(fs_info, ref_root, num_bytes, - BTRFS_QGROUP_RSV_DATA); -} #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid, diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index 2887503e4d12..43d3ee1a6544 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -1512,35 +1512,6 @@ DEFINE_EVENT(btrfs__qgroup_rsv_data, btrfs_qgroup_release_data, TP_ARGS(inode, start, len, reserved, op) ); -DECLARE_EVENT_CLASS(btrfs__qgroup_delayed_ref, - - TP_PROTO(const struct btrfs_fs_info *fs_info, - u64 ref_root, u64 reserved), - - TP_ARGS(fs_info, ref_root, reserved), - - TP_STRUCT__entry_btrfs( - __field( u64, ref_root ) - __field( u64, reserved ) - ), - - TP_fast_assign_btrfs(fs_info, - __entry->ref_root = ref_root; - __entry->reserved = reserved; - ), - - TP_printk_btrfs("root=%llu reserved=%llu op=free", - __entry->ref_root, __entry->reserved) -); - -DEFINE_EVENT(btrfs__qgroup_delayed_ref, btrfs_qgroup_free_delayed_ref, - - TP_PROTO(const struct btrfs_fs_info *fs_info, - u64 ref_root, u64 reserved), - - TP_ARGS(fs_info, ref_root, reserved) -); - DECLARE_EVENT_CLASS(btrfs_qgroup_extent, TP_PROTO(const struct btrfs_fs_info *fs_info, const struct btrfs_qgroup_extent_record *rec), From patchwork Wed Jan 23 07:15:13 2019 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10776629 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A51D114E5 for ; Wed, 23 Jan 2019 07:15:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 90B762874A for ; Wed, 23 Jan 2019 07:15:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 851EA2B1A6; Wed, 23 Jan 2019 07:15:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F39282874A for ; Wed, 23 Jan 2019 07:15:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726349AbfAWHPd (ORCPT ); Wed, 23 Jan 2019 02:15:33 -0500 Received: from mx2.suse.de ([195.135.220.15]:47352 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725950AbfAWHPd (ORCPT ); Wed, 23 Jan 2019 02:15:33 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id E1757B08A; Wed, 23 Jan 2019 07:15:30 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: dsterba@suse.cz, Josef Bacik , stable@vger.kernel.org, David Sterba , Filipe Manana Subject: [Patch v5 2/7] btrfs: honor path->skip_locking in backref code Date: Wed, 23 Jan 2019 15:15:13 +0800 Message-Id: <20190123071518.2528-3-wqu@suse.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190123071518.2528-1-wqu@suse.com> References: 
<20190123071518.2528-1-wqu@suse.com>
MIME-Version: 1.0
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

From: Josef Bacik

Qgroups will do the old roots lookup at delayed ref time, which could be
while walking down the extent root while running a delayed ref. This
should be fine, except we specifically lock eb's in the backref walking
code irrespective of path->skip_locking, which deadlocks the system.
Fix up the backref code to honor path->skip_locking. Nobody will be
modifying the commit_root when we're searching, so it's completely safe
to do.

Since fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting
time out of commit trans"), the kernel may lock up with quota enabled.

There is one backref trace triggered by snapshot dropping along with a
write operation in the source subvolume. The example can be reliably
reproduced:

  btrfs-cleaner   D    0  4062      2 0x80000000
  Call Trace:
   schedule+0x32/0x90
   btrfs_tree_read_lock+0x93/0x130 [btrfs]
   find_parent_nodes+0x29b/0x1170 [btrfs]
   btrfs_find_all_roots_safe+0xa8/0x120 [btrfs]
   btrfs_find_all_roots+0x57/0x70 [btrfs]
   btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
   btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs]
   btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs]
   do_walk_down+0x541/0x5e3 [btrfs]
   walk_down_tree+0xab/0xe7 [btrfs]
   btrfs_drop_snapshot+0x356/0x71a [btrfs]
   btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs]
   cleaner_kthread+0x12b/0x160 [btrfs]
   kthread+0x112/0x130
   ret_from_fork+0x27/0x50

When dropping snapshots with qgroup enabled, we will trigger a backref
walk.

However, a backref walk at that point is dangerous: if one of the parent
nodes gets WRITE locked by another thread, we can deadlock.

For example:

           FS 260     FS 261 (Dropped)
            node A        node B
           /      \      /      \
       node C      node D      node E
      /   \         /  \        /     \
  leaf F|leaf G|leaf H|leaf I|leaf J|leaf K

The lock sequence would be:

      Thread A (cleaner)             |       Thread B (other writer)
  -----------------------------------------------------------------------
  write_lock(B)                      |
  write_lock(D)                      |
  ^^^ called by walk_down_tree()     |
                                     |       write_lock(A)
                                     |       write_lock(D) << Stall
  read_lock(H) << for backref walk   |
  read_lock(D) << lock owner is      |
                  the same thread A  |
                  so read lock is OK |
  read_lock(A) << Stall              |

So thread A holds write lock D and needs read lock A to make progress,
while thread B holds write lock A and needs lock D to make progress.

This causes a deadlock.

This is not limited to the snapshot dropping case. The backref walk,
even though it only happens on commit trees, breaks the normal top-down
locking order, which makes it deadlock prone.

Fixes: fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans")
CC: stable@vger.kernel.org # 4.19+
Reported-and-tested-by: David Sterba
Reported-by: Filipe Manana
Reviewed-by: Qu Wenruo
Signed-off-by: Josef Bacik
[ copy logs and deadlock analysis from Qu's patch ]
Signed-off-by: David Sterba
---
 fs/btrfs/backref.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 78556447e1d5..973e8251b1bf 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -712,7 +712,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info,
  * read tree blocks and add keys where required.
*/ static int add_missing_keys(struct btrfs_fs_info *fs_info, - struct preftrees *preftrees) + struct preftrees *preftrees, bool lock) { struct prelim_ref *ref; struct extent_buffer *eb; @@ -737,12 +737,14 @@ static int add_missing_keys(struct btrfs_fs_info *fs_info, free_extent_buffer(eb); return -EIO; } - btrfs_tree_read_lock(eb); + if (lock) + btrfs_tree_read_lock(eb); if (btrfs_header_level(eb) == 0) btrfs_item_key_to_cpu(eb, &ref->key_for_search, 0); else btrfs_node_key_to_cpu(eb, &ref->key_for_search, 0); - btrfs_tree_read_unlock(eb); + if (lock) + btrfs_tree_read_unlock(eb); free_extent_buffer(eb); prelim_ref_insert(fs_info, &preftrees->indirect, ref, NULL); cond_resched(); @@ -1227,7 +1229,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, btrfs_release_path(path); - ret = add_missing_keys(fs_info, &preftrees); + ret = add_missing_keys(fs_info, &preftrees, path->skip_locking == 0); if (ret) goto out; @@ -1288,11 +1290,13 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans, ret = -EIO; goto out; } - btrfs_tree_read_lock(eb); + if (!path->skip_locking) + btrfs_tree_read_lock(eb); btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK); ret = find_extent_in_eb(eb, bytenr, *extent_item_pos, &eie, ignore_offset); - btrfs_tree_read_unlock_blocking(eb); + if (!path->skip_locking) + btrfs_tree_read_unlock_blocking(eb); free_extent_buffer(eb); if (ret < 0) goto out; From patchwork Wed Jan 23 07:15:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10776627 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9928A14E5 for ; Wed, 23 Jan 2019 07:15:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 83D8C2AA92 for ; Wed, 23 Jan 2019 07:15:39 +0000 
(UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 784502B1A6; Wed, 23 Jan 2019 07:15:39 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D1BD42AA92 for ; Wed, 23 Jan 2019 07:15:38 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726647AbfAWHPg (ORCPT ); Wed, 23 Jan 2019 02:15:36 -0500
Received: from mx2.suse.de ([195.135.220.15]:47380 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726435AbfAWHPg (ORCPT ); Wed, 23 Jan 2019 02:15:36 -0500
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 68007B06D for ; Wed, 23 Jan 2019 07:15:33 +0000 (UTC)
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: dsterba@suse.cz
Subject: [Patch v5 3/7] btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots()
Date: Wed, 23 Jan 2019 15:15:14 +0800
Message-Id: <20190123071518.2528-4-wqu@suse.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190123071518.2528-1-wqu@suse.com>
References: <20190123071518.2528-1-wqu@suse.com>
MIME-Version: 1.0
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Relocation code will drop btrfs_root::reloc_root as soon as
merge_reloc_root() finishes.

However, later qgroup code will need to access btrfs_root::reloc_root
after merge_reloc_root() for delayed subtree rescan.

So alter the timing of resetting btrfs_root::reloc_root, making it
happen after transaction commit.

With this patch, we introduce a new btrfs_root::state bit,
BTRFS_ROOT_DEAD_RELOC_TREE, to inform some users of
btrfs_root::reloc_root that although btrfs_root::reloc_root is still
non-NULL, it is no longer in use.

The lifespan of btrfs_root::reloc_root will become:

      Old behavior                 |              New
------------------------------------------------------------------------
btrfs_init_reloc_root()      ---   | btrfs_init_reloc_root()      ---
  set reloc_root              |    |   set reloc_root              |
                              |    |                               |
                              |    |                               |
merge_reloc_root()            |    | merge_reloc_root()            |
|- btrfs_update_reloc_root() ---   | |- btrfs_update_reloc_root() -+-
     clear btrfs_root::reloc_root  |      set ROOT_DEAD_RELOC_TREE |
                                   |      record root into dirty   |
                                   |       roots rbtree            |
                                   |                               |
                                   | reloc_block_group() Or        |
                                   | btrfs_recover_relocation()    |
                                   | | After transaction commit    |
                                   | |- clean_dirty_subvols()     ---
                                   |     clear btrfs_root::reloc_root

While ROOT_DEAD_RELOC_TREE is set, the only user of
btrfs_root::reloc_root should be qgroup.

Since the reloc root needs a longer lifespan, this patch also delays the
btrfs_drop_snapshot() call.
Now btrfs_drop_snapshot() is called in clean_dirty_subvols().

This patch will increase the size of btrfs_root by 16 bytes.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/ctree.h      | 15 ++++++++
 fs/btrfs/disk-io.c    |  1 +
 fs/btrfs/relocation.c | 86 ++++++++++++++++++++++++++++++++++---------
 3 files changed, 85 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0a68cf7032f5..865ce9531d96 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1195,6 +1195,13 @@ enum {
 	BTRFS_ROOT_MULTI_LOG_TASKS,
 	BTRFS_ROOT_DIRTY,
 	BTRFS_ROOT_DELETING,
+
+	/*
+	 * Reloc tree is orphan, only kept here for qgroup delayed subtree scan
+	 *
+	 * Set for the subvolume tree owning the reloc tree.
+ */ + BTRFS_ROOT_DEAD_RELOC_TREE, }; /* @@ -1307,6 +1314,14 @@ struct btrfs_root { struct list_head ordered_root; u64 nr_ordered_extents; + /* + * Not empty if this subvolume root has gone through tree block swap + * (relocation) + * + * Will be used by reloc_control::dirty_subvol_roots. + */ + struct list_head reloc_dirty_list; + /* * Number of currently running SEND ioctls to prevent * manipulation with the read-only status via SUBVOL_SETFLAGS diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8da2f380d3c0..bfefa1de0455 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1175,6 +1175,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info, INIT_LIST_HEAD(&root->delalloc_root); INIT_LIST_HEAD(&root->ordered_extents); INIT_LIST_HEAD(&root->ordered_root); + INIT_LIST_HEAD(&root->reloc_dirty_list); INIT_LIST_HEAD(&root->logged_list[0]); INIT_LIST_HEAD(&root->logged_list[1]); spin_lock_init(&root->inode_lock); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 272b287f8cf0..70a3e1170e8a 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -162,6 +162,8 @@ struct reloc_control { struct mapping_tree reloc_root_tree; /* list of reloc trees */ struct list_head reloc_roots; + /* list of subvolume trees who get relocated */ + struct list_head dirty_subvol_roots; /* size of metadata reservation for merging reloc trees */ u64 merging_rsv_size; /* size of relocated tree nodes */ @@ -1467,15 +1469,17 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans, struct btrfs_root_item *root_item; int ret; - if (!root->reloc_root) + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) || + !root->reloc_root) goto out; reloc_root = root->reloc_root; root_item = &reloc_root->root_item; + /* root->reloc_root will stay until current relocation finished */ if (fs_info->reloc_ctl->merge_reloc_tree && btrfs_root_refs(root_item) == 0) { - root->reloc_root = NULL; + set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, 
&root->state); __del_reloc_root(reloc_root); } @@ -2120,6 +2124,59 @@ static int find_next_key(struct btrfs_path *path, int level, return 1; } +/* + * Helper to insert current subvolume into reloc_control::dirty_subvol_roots + */ +static void insert_dirty_subvol(struct btrfs_trans_handle *trans, + struct reloc_control *rc, + struct btrfs_root *root) +{ + struct btrfs_root *reloc_root = root->reloc_root; + struct btrfs_root_item *reloc_root_item; + u64 root_objectid = root->root_key.objectid; + + /* @root must be a subvolume tree root with a valid reloc tree */ + ASSERT(root_objectid != BTRFS_TREE_RELOC_OBJECTID); + ASSERT(reloc_root); + + reloc_root_item = &reloc_root->root_item; + memset(&reloc_root_item->drop_progress, 0, + sizeof(reloc_root_item->drop_progress)); + reloc_root_item->drop_level = 0; + btrfs_set_root_refs(reloc_root_item, 0); + btrfs_update_reloc_root(trans, root); + + if (list_empty(&root->reloc_dirty_list)) { + btrfs_grab_fs_root(root); + list_add_tail(&root->reloc_dirty_list, &rc->dirty_subvol_roots); + } + return; +} + +static int clean_dirty_subvols(struct reloc_control *rc) +{ + struct btrfs_root *root; + struct btrfs_root *next; + int ret = 0; + int tmp_ret; + + list_for_each_entry_safe(root, next, &rc->dirty_subvol_roots, + reloc_dirty_list) { + struct btrfs_root *reloc_root = root->reloc_root; + + clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state); + list_del_init(&root->reloc_dirty_list); + root->reloc_root = NULL; + if (reloc_root) { + tmp_ret = btrfs_drop_snapshot(reloc_root, NULL, 0, 1); + if (tmp_ret < 0 && !ret) + ret = tmp_ret; + } + btrfs_put_fs_root(root); + } + return ret; +} + /* * merge the relocated tree blocks in reloc tree with corresponding * fs tree. 
@@ -2259,13 +2316,8 @@ static noinline_for_stack int merge_reloc_root(struct reloc_control *rc, out: btrfs_free_path(path); - if (err == 0) { - memset(&root_item->drop_progress, 0, - sizeof(root_item->drop_progress)); - root_item->drop_level = 0; - btrfs_set_root_refs(root_item, 0); - btrfs_update_reloc_root(trans, root); - } + if (err == 0) + insert_dirty_subvol(trans, rc, root); if (trans) btrfs_end_transaction_throttle(trans); @@ -2410,14 +2462,6 @@ void merge_reloc_roots(struct reloc_control *rc) } else { list_del_init(&reloc_root->root_list); } - - ret = btrfs_drop_snapshot(reloc_root, rc->block_rsv, 0, 1); - if (ret < 0) { - if (list_empty(&reloc_root->root_list)) - list_add_tail(&reloc_root->root_list, - &reloc_roots); - goto out; - } } if (found) { @@ -4079,6 +4123,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) goto out_free; } btrfs_commit_transaction(trans); + ret = clean_dirty_subvols(rc); + if (ret < 0 && !err) + err = ret; out_free: btrfs_free_block_rsv(fs_info, rc->block_rsv); btrfs_free_path(path); @@ -4173,6 +4220,7 @@ static struct reloc_control *alloc_reloc_control(void) return NULL; INIT_LIST_HEAD(&rc->reloc_roots); + INIT_LIST_HEAD(&rc->dirty_subvol_roots); backref_cache_init(&rc->backref_cache); mapping_tree_init(&rc->reloc_root_tree); extent_io_tree_init(&rc->processed_blocks, NULL); @@ -4468,6 +4516,10 @@ int btrfs_recover_relocation(struct btrfs_root *root) goto out_free; } err = btrfs_commit_transaction(trans); + + ret = clean_dirty_subvols(rc); + if (ret < 0 && !err) + err = ret; out_free: kfree(rc); out: From patchwork Wed Jan 23 07:15:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10776633 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 229A31515 for ; Wed, 23 Jan 2019 07:15:41 
+0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0DB382874A for ; Wed, 23 Jan 2019 07:15:41 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 028782AA92; Wed, 23 Jan 2019 07:15:40 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 974522B852 for ; Wed, 23 Jan 2019 07:15:40 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726682AbfAWHPj (ORCPT ); Wed, 23 Jan 2019 02:15:39 -0500
Received: from mx2.suse.de ([195.135.220.15]:47404 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726611AbfAWHPi (ORCPT ); Wed, 23 Jan 2019 02:15:38 -0500
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 449C5B088 for ; Wed, 23 Jan 2019 07:15:36 +0000 (UTC)
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: dsterba@suse.cz
Subject: [Patch v5 4/7] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap()
Date: Wed, 23 Jan 2019 15:15:15 +0800
Message-Id: <20190123071518.2528-5-wqu@suse.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190123071518.2528-1-wqu@suse.com>
References: <20190123071518.2528-1-wqu@suse.com>
MIME-Version: 1.0
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Refactor btrfs_qgroup_trace_subtree_swap() into
qgroup_trace_subtree_swap(), which only needs two extent buffers plus
two parameters (last_snapshot and trace_leaf) to control the behavior.

This provides the basis for later delayed subtree scan work.

Signed-off-by: Qu Wenruo --- fs/btrfs/qgroup.c | 78 ++++++++++++++++++++++++++++++++++------------- 1 file changed, 57 insertions(+), 21 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index f214b490d80c..565bb661210f 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2023,6 +2023,60 @@ static int qgroup_trace_new_subtree_blocks(struct btrfs_trans_handle* trans, return ret; } +static int qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans, + struct extent_buffer *src_eb, + struct extent_buffer *dst_eb, + u64 last_snapshot, bool trace_leaf) +{ + struct btrfs_fs_info *fs_info = trans->fs_info; + struct btrfs_path *dst_path = NULL; + int level; + int ret; + + if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + return 0; + + /* Wrong parameter order */ + if (btrfs_header_generation(src_eb) > btrfs_header_generation(dst_eb)) { + btrfs_err_rl(fs_info, + "%s: bad parameter order, src_gen=%llu dst_gen=%llu", __func__, + btrfs_header_generation(src_eb), + btrfs_header_generation(dst_eb)); + return -EUCLEAN; + } + + if (!extent_buffer_uptodate(src_eb) || + !extent_buffer_uptodate(dst_eb)) { + ret = -EIO; + goto out; + } + + level = btrfs_header_level(dst_eb); + dst_path = btrfs_alloc_path(); + if (!dst_path) { + ret = -ENOMEM; + goto out; + } + /* For dst_path */ + extent_buffer_get(dst_eb); + dst_path->nodes[level] = dst_eb; + dst_path->slots[level] = 0; + dst_path->locks[level] = 0; + + /* Do the generation aware breadth-first search */ + ret = qgroup_trace_new_subtree_blocks(trans, src_eb, dst_path, level, + level, last_snapshot, trace_leaf); + if (ret < 0) + goto out; + ret = 0; + +out: + btrfs_free_path(dst_path); + if (ret < 0) + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; + return ret; +} + /* * Inform qgroup to trace subtree swap used in balance. 
* @@ -2048,14 +2102,12 @@ int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans, u64 last_snapshot) { struct btrfs_fs_info *fs_info = trans->fs_info; - struct btrfs_path *dst_path = NULL; struct btrfs_key first_key; struct extent_buffer *src_eb = NULL; struct extent_buffer *dst_eb = NULL; bool trace_leaf = false; u64 child_gen; u64 child_bytenr; - int level; int ret; if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) @@ -2106,22 +2158,9 @@ int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans, goto out; } - level = btrfs_header_level(dst_eb); - dst_path = btrfs_alloc_path(); - if (!dst_path) { - ret = -ENOMEM; - goto out; - } - - /* For dst_path */ - extent_buffer_get(dst_eb); - dst_path->nodes[level] = dst_eb; - dst_path->slots[level] = 0; - dst_path->locks[level] = 0; - - /* Do the generation-aware breadth-first search */ - ret = qgroup_trace_new_subtree_blocks(trans, src_eb, dst_path, level, - level, last_snapshot, trace_leaf); + /* Do the generation aware breadth-first search */ + ret = qgroup_trace_subtree_swap(trans, src_eb, dst_eb, last_snapshot, + trace_leaf); if (ret < 0) goto out; ret = 0; @@ -2129,9 +2168,6 @@ int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans, out: free_extent_buffer(src_eb); free_extent_buffer(dst_eb); - btrfs_free_path(dst_path); - if (ret < 0) - fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; return ret; } From patchwork Wed Jan 23 07:15:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10776635 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A3CC914E5 for ; Wed, 23 Jan 2019 07:15:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8B3002874A for ; Wed, 23 Jan 
2019 07:15:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7FC712B03E; Wed, 23 Jan 2019 07:15:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 935762874A for ; Wed, 23 Jan 2019 07:15:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726761AbfAWHPo (ORCPT ); Wed, 23 Jan 2019 02:15:44 -0500 Received: from mx2.suse.de ([195.135.220.15]:47410 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726228AbfAWHPm (ORCPT ); Wed, 23 Jan 2019 02:15:42 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 8E582B06D for ; Wed, 23 Jan 2019 07:15:39 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: dsterba@suse.cz Subject: [Patch v5 5/7] btrfs: qgroup: Introduce per-root swapped blocks infrastructure Date: Wed, 23 Jan 2019 15:15:16 +0800 Message-Id: <20190123071518.2528-6-wqu@suse.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190123071518.2528-1-wqu@suse.com> References: <20190123071518.2528-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP To allow delayed subtree swap rescan, btrfs needs to record per-root info about which tree blocks get swapped. So this patch introduces per-root btrfs_qgroup_swapped_blocks structure, which records which tree blocks get swapped. The designed workflow will be: 1) Record the subtree root block get swapped. 
   During subtree swap:
   O = Old tree blocks
   N = New tree blocks
         reloc tree                     subvolume tree X
            Root                               Root
           /    \                             /    \
         NA     OB                          OA      OB
       /  |     |  \                      /  |      |  \
     NC  ND     OE  OF                   OC  OD     OE  OF

   In this case, NA and OA are going to be swapped, so record (NA, OA)
   into subvolume tree X.

2) After subtree swap.
         reloc tree                     subvolume tree X
            Root                               Root
           /    \                             /    \
         OA     OB                          NA      OB
       /  |     |  \                      /  |      |  \
     OC  OD     OE  OF                   NC  ND     OE  OF

3a) CoW happens for OB
    If we are going to CoW tree block OB, we check OB's bytenr against
    tree X's swapped_blocks structure.
    It doesn't match any record, so nothing happens.

3b) CoW happens for NA
    Check NA's bytenr against tree X's swapped_blocks, and get a hit.
    Then we do a subtree scan on both subtrees OA and NA, resulting in
    6 tree blocks to be scanned (OA, OC, OD, NA, NC, ND).

    After that, no matter what we do to subvolume tree X, qgroup numbers
    will still be correct.
    NA's record then gets removed from X's swapped_blocks.

4) Transaction commit
   Any record left in X's swapped_blocks gets removed. Since there was no
   modification to the swapped subtrees, there is no need to trigger a
   heavy qgroup subtree rescan for them.

This will introduce 128 bytes of overhead for each btrfs_root, even if
qgroup is not enabled.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/ctree.h       |  14 ++++
 fs/btrfs/disk-io.c     |   1 +
 fs/btrfs/qgroup.c      | 150 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h      |  92 +++++++++++++++++++++++++
 fs/btrfs/relocation.c  |   7 ++
 fs/btrfs/transaction.c |   1 +
 6 files changed, 265 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 865ce9531d96..603c3a3eed83 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1204,6 +1204,17 @@ enum { BTRFS_ROOT_DEAD_RELOC_TREE, }; +/* + * Record swapped tree blocks of a subvolume tree for delayed subtree trace + * code. For detail check comment in fs/btrfs/qgroup.h. + */ +struct btrfs_qgroup_swapped_blocks { + spinlock_t lock; + /* True if any rb_root in blocks[] below is not RB_EMPTY_ROOT() */ + bool swapped; + struct rb_root blocks[BTRFS_MAX_LEVEL]; +}; + /* * in ram representation of the tree.
extent_root is used for all allocations * and for the extent tree extent_root root. @@ -1339,6 +1350,9 @@ struct btrfs_root { /* Number of active swapfiles */ atomic_t nr_swapfiles; + /* Record pairs of swapped blocks for qgroup */ + struct btrfs_qgroup_swapped_blocks swapped_blocks; + #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS u64 alloc_bytenr; #endif diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index bfefa1de0455..31b2facdfc1e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1219,6 +1219,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info, root->anon_dev = 0; spin_lock_init(&root->root_item_lock); + btrfs_qgroup_init_swapped_blocks(&root->swapped_blocks); } static struct btrfs_root *btrfs_alloc_root(struct btrfs_fs_info *fs_info, diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 565bb661210f..2ac32cad5e0c 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -3830,3 +3830,153 @@ void btrfs_qgroup_check_reserved_leak(struct inode *inode) } extent_changeset_release(&changeset); } + +void btrfs_qgroup_init_swapped_blocks( + struct btrfs_qgroup_swapped_blocks *swapped_blocks) +{ + int i; + + spin_lock_init(&swapped_blocks->lock); + for (i = 0; i < BTRFS_MAX_LEVEL; i++) + swapped_blocks->blocks[i] = RB_ROOT; + swapped_blocks->swapped = false; +} + +/* + * Delete all swapped blocks record of @root. + * Every record here means we skipped a full subtree scan for qgroup. + * + * Get called when commit one transaction. 
+ */ +void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root) +{ + struct btrfs_qgroup_swapped_blocks *swapped_blocks; + int i; + + swapped_blocks = &root->swapped_blocks; + + spin_lock(&swapped_blocks->lock); + if (!swapped_blocks->swapped) + goto out; + for (i = 0; i < BTRFS_MAX_LEVEL; i++) { + struct rb_root *cur_root = &swapped_blocks->blocks[i]; + struct btrfs_qgroup_swapped_block *entry; + struct btrfs_qgroup_swapped_block *next; + + rbtree_postorder_for_each_entry_safe(entry, next, cur_root, + node) + kfree(entry); + swapped_blocks->blocks[i] = RB_ROOT; + } + swapped_blocks->swapped = false; +out: + spin_unlock(&swapped_blocks->lock); +} + +/* + * Adding subtree roots record into @subvol_root. + * + * @subvol_root: tree root of the subvolume tree get swapped + * @bg: block group under balance + * @subvol_parent/slot: pointer to the subtree root in subvolume tree + * @reloc_parent/slot: pointer to the subtree root in reloc tree + * BOTH POINTERS ARE BEFORE TREE SWAP + * @last_snapshot: last snapshot generation of the subvolume tree + */ +int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans, + struct btrfs_root *subvol_root, + struct btrfs_block_group_cache *bg, + struct extent_buffer *subvol_parent, int subvol_slot, + struct extent_buffer *reloc_parent, int reloc_slot, + u64 last_snapshot) +{ + int level = btrfs_header_level(subvol_parent) - 1; + struct btrfs_qgroup_swapped_blocks *blocks = &subvol_root->swapped_blocks; + struct btrfs_fs_info *fs_info = subvol_root->fs_info; + struct btrfs_qgroup_swapped_block *block; + struct rb_node **cur; + struct rb_node *parent = NULL; + int ret = 0; + + if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + return 0; + + if (btrfs_node_ptr_generation(subvol_parent, subvol_slot) > + btrfs_node_ptr_generation(reloc_parent, reloc_slot)) { + btrfs_err_rl(fs_info, + "%s: bad parameter order, subvol_gen=%llu reloc_gen=%llu", + __func__, + btrfs_node_ptr_generation(subvol_parent, subvol_slot), + 
btrfs_node_ptr_generation(reloc_parent, reloc_slot)); + return -EUCLEAN; + } + + block = kmalloc(sizeof(*block), GFP_NOFS); + if (!block) { + ret = -ENOMEM; + goto out; + } + + /* + * @reloc_parent/slot is still before swap, while @block is going to + * record the bytenr after swap, so we do the swap here. + */ + block->subvol_bytenr = btrfs_node_blockptr(reloc_parent, reloc_slot); + block->subvol_generation = btrfs_node_ptr_generation(reloc_parent, + reloc_slot); + block->reloc_bytenr = btrfs_node_blockptr(subvol_parent, subvol_slot); + block->reloc_generation = btrfs_node_ptr_generation(subvol_parent, + subvol_slot); + block->last_snapshot = last_snapshot; + block->level = level; + if (bg->flags & BTRFS_BLOCK_GROUP_DATA) + block->trace_leaf = true; + else + block->trace_leaf = false; + btrfs_node_key_to_cpu(reloc_parent, &block->first_key, reloc_slot); + + /* Insert @block into @blocks */ + spin_lock(&blocks->lock); + cur = &blocks->blocks[level].rb_node; + while (*cur) { + struct btrfs_qgroup_swapped_block *entry; + + parent = *cur; + entry = rb_entry(parent, struct btrfs_qgroup_swapped_block, + node); + + if (entry->subvol_bytenr < block->subvol_bytenr) { + cur = &(*cur)->rb_left; + } else if (entry->subvol_bytenr > block->subvol_bytenr) { + cur = &(*cur)->rb_right; + } else { + if (entry->subvol_generation != + block->subvol_generation || + entry->reloc_bytenr != block->reloc_bytenr || + entry->reloc_generation != + block->reloc_generation) { + /* + * Duplicated but mismatch entry found. + * Shouldn't happen. + * + * Marking qgroup inconsistent should be enough + * for end users. 
+ */ + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); + ret = -EEXIST; + } + kfree(block); + goto out_unlock; + } + } + rb_link_node(&block->node, parent, cur); + rb_insert_color(&block->node, &blocks->blocks[level]); + blocks->swapped = true; +out_unlock: + spin_unlock(&blocks->lock); +out: + if (ret < 0) + fs_info->qgroup_flags |= + BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; + return ret; +} diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index d4fae53969d4..539528d6c1c1 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -6,6 +6,8 @@ #ifndef BTRFS_QGROUP_H #define BTRFS_QGROUP_H +#include +#include #include "ulist.h" #include "delayed-ref.h" @@ -37,6 +39,66 @@ * Normally at qgroup rescan and transaction commit time. */ +/* + * Special performance optimization for balance. + * + * For balance, we need to swap subtree of subvolume and reloc tree. + * In theory, we need to trace all subtree blocks of both subvolume and reloc + * tree, since their owner has changed during such swap. + * + * However since balance has ensured that both subtrees are containing the + * same contents and have the same tree structures, such swap won't cause + * qgroup number change. + * + * But there is a race window between subtree swap and transaction commit, + * during that window, if we increase/decrease tree level or merge/split tree + * blocks, we still needs to trace original subtrees. + * + * So for balance, we use a delayed subtree trace, whose workflow is: + * + * 1) Record the subtree root block get swapped. + * + * During subtree swap: + * O = Old tree blocks + * N = New tree blocks + * reloc tree subvolume tree X + * Root Root + * / \ / \ + * NA OB OA OB + * / | | \ / | | \ + * NC ND OE OF OC OD OE OF + * + * In these case, NA and OA is going to be swapped, record (NA, OA) into + * subvolume tree X. + * + * 2) After subtree swap. 
+ * reloc tree subvolume tree X + * Root Root + * / \ / \ + * OA OB NA OB + * / | | \ / | | \ + * OC OD OE OF NC ND OE OF + * + * 3a) CoW happens for OB + * If we are going to CoW tree block OB, we check OB's bytenr against + * tree X's swapped_blocks structure. + * It doesn't fit any one, nothing will happen. + * + * 3b) CoW happens for NA + * Check NA's bytenr against tree X's swapped_blocks, and get a hit. + * Then we do subtree scan on both subtree OA and NA. + * Resulting 6 tree blocks to be scanned (OA, OC, OD, NA, NC, ND). + * + * Then no matter what we do to subvolume tree X, qgroup numbers will + * still be correct. + * Then NA's record get removed from X's swapped_blocks. + * + * 4) Transaction commit + * Any record in X's swapped_blocks get removed, since there is no + * modification to swapped subtrees, no need to trigger heavy qgroup + * subtree rescan for them. + */ + /* * Record a dirty extent, and info qgroup to update quota on it * TODO: Use kmem cache to alloc it. @@ -59,6 +121,24 @@ struct btrfs_qgroup_extent_record { struct ulist *old_roots; }; +struct btrfs_qgroup_swapped_block { + struct rb_node node; + + int level; + bool trace_leaf; + + /* bytenr/generation of the tree block in subvolume tree after swap */ + u64 subvol_bytenr; + u64 subvol_generation; + + /* bytenr/generation of the tree block in reloc tree after swap */ + u64 reloc_bytenr; + u64 reloc_generation; + + u64 last_snapshot; + struct btrfs_key first_key; +}; + /* * Qgroup reservation types: * @@ -327,4 +407,16 @@ void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes); void btrfs_qgroup_check_reserved_leak(struct inode *inode); +/* btrfs_qgroup_swapped_blocks related functions */ +void btrfs_qgroup_init_swapped_blocks( + struct btrfs_qgroup_swapped_blocks *swapped_blocks); + +void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root); +int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans, + struct btrfs_root *subvol_root, + struct 
btrfs_block_group_cache *bg, + struct extent_buffer *subvol_parent, int subvol_slot, + struct extent_buffer *reloc_parent, int reloc_slot, + u64 last_snapshot); + #endif diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 70a3e1170e8a..cc55249eadb1 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1898,6 +1898,13 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, if (ret < 0) break; + btrfs_node_key_to_cpu(parent, &first_key, slot); + ret = btrfs_qgroup_add_swapped_blocks(trans, dest, + rc->block_group, parent, slot, + path->nodes[level], path->slots[level], + last_snapshot); + if (ret < 0) + break; /* * swap blocks in fs tree and reloc tree. */ diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 127fa1535f58..22b0dacae003 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -122,6 +122,7 @@ static noinline void switch_commit_roots(struct btrfs_transaction *trans) if (is_fstree(root->root_key.objectid)) btrfs_unpin_free_ino(root); clear_btree_io_tree(&root->dirty_log_pages); + btrfs_qgroup_clean_swapped_blocks(root); } /* We can free old roots now. 
*/ From patchwork Wed Jan 23 07:15:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10776637 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A06DC14E5 for ; Wed, 23 Jan 2019 07:15:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8B2F62874A for ; Wed, 23 Jan 2019 07:15:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7FBE72B03E; Wed, 23 Jan 2019 07:15:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E471D2874A for ; Wed, 23 Jan 2019 07:15:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726772AbfAWHPp (ORCPT ); Wed, 23 Jan 2019 02:15:45 -0500 Received: from mx2.suse.de ([195.135.220.15]:47444 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726611AbfAWHPo (ORCPT ); Wed, 23 Jan 2019 02:15:44 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 9CC83B055 for ; Wed, 23 Jan 2019 07:15:42 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: dsterba@suse.cz Subject: [Patch v5 6/7] btrfs: qgroup: Use delayed subtree rescan for balance Date: Wed, 23 Jan 2019 15:15:17 +0800 Message-Id: <20190123071518.2528-7-wqu@suse.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190123071518.2528-1-wqu@suse.com> References: 
<20190123071518.2528-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP

Before this patch, qgroup code traces the whole subtree of both the
subvolume and reloc trees unconditionally.

This keeps qgroup numbers consistent, but it can cause tons of
unnecessary extent tracing, which adds a lot of overhead.

However, for the subtree swap done by balance, both subtrees contain the
same content and have the same tree structure, so just swapping them
won't change qgroup numbers.

Only modifications made in the race window between subtree swap and
transaction commit could cause qgroup numbers to change.

This patch delays the qgroup subtree scan until CoW happens for the
subtree root.

So if there are no other operations on the fs, balance won't cause extra
qgroup overhead. (best case scenario)
Depending on the workload, most of the subtree scans can still be
avoided.

Only in the worst case scenario does it fall back to the old subtree
swap overhead. (scan all swapped subtrees)

[[Benchmark]]
Hardware:
	VM 4G vRAM, 8 vCPUs,
	disk is using 'unsafe' cache mode,
	backing device is SAMSUNG 850 evo SSD.
	Host has 16G ram.

Mkfs parameter:
	--nodesize 4K (To bump up tree size)

Initial subvolume contents:
	4G data copied from /usr and /lib.
	(With enough regular small files)

Snapshots:
	16 snapshots of the original subvolume.
	Each snapshot has 3 random files modified.

balance parameter:
	-m

So the content should be pretty similar to a real world root fs layout.

And after file system population, there is no other activity, so it
should be the best case scenario.

                      | v4.20-rc1   | w/ patchset  | diff
-----------------------------------------------------------------------
relocated extents     | 22615       | 22457        | -0.1%
qgroup dirty extents  | 163457      | 121606       | -25.6%
time (sys)            | 22.884s     | 18.842s      | -17.6%
time (real)           | 27.724s     | 22.884s      | -17.5%

Signed-off-by: Qu Wenruo
---
 fs/btrfs/ctree.c      |  8 ++++
 fs/btrfs/qgroup.c     | 88 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h     |  2 +
 fs/btrfs/relocation.c | 14 +++----
 4 files changed, 103 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index d92462fe66c8..ed28aa7c5f5c 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -13,6 +13,7 @@ #include "print-tree.h" #include "locking.h" #include "volumes.h" +#include "qgroup.h" static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, int level); @@ -1465,6 +1466,13 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans, btrfs_set_lock_blocking(parent); btrfs_set_lock_blocking(buf); + /* + * Before CoWing this block for later modification, check if it's + * the subtree root and do the delayed subtree trace if needed. + * + * Also we don't care about the error, as it's handled internally. + */ + btrfs_qgroup_trace_subtree_after_cow(trans, root, buf); ret = __btrfs_cow_block(trans, root, buf, parent, parent_slot, cow_ret, search_start, 0); diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 2ac32cad5e0c..a518dcae83b4 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -3980,3 +3980,91 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans, BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; return ret; } + +/* + * Check if the tree block is a subtree root, and if so do the needed + * delayed subtree trace for qgroup. + * + * This is called during btrfs_cow_block().
+ */ +int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans, + struct btrfs_root *root, + struct extent_buffer *subvol_eb) +{ + struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_qgroup_swapped_blocks *blocks = &root->swapped_blocks; + struct btrfs_qgroup_swapped_block *block; + struct extent_buffer *reloc_eb = NULL; + struct rb_node *node; + bool found = false; + bool swapped = false; + int level = btrfs_header_level(subvol_eb); + int ret = 0; + int i; + + if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + return 0; + if (!is_fstree(root->root_key.objectid) || !root->reloc_root) + return 0; + + spin_lock(&blocks->lock); + if (!blocks->swapped) { + spin_unlock(&blocks->lock); + return 0; + } + node = blocks->blocks[level].rb_node; + + while (node) { + block = rb_entry(node, struct btrfs_qgroup_swapped_block, node); + if (block->subvol_bytenr < subvol_eb->start) { + node = node->rb_left; + } else if (block->subvol_bytenr > subvol_eb->start) { + node = node->rb_right; + } else { + found = true; + break; + } + } + if (!found) { + spin_unlock(&blocks->lock); + goto out; + } + /* Found one, remove it from @blocks first and update blocks->swapped */ + rb_erase(&block->node, &blocks->blocks[level]); + for (i = 0; i < BTRFS_MAX_LEVEL; i++) { + if (!RB_EMPTY_ROOT(&blocks->blocks[i])) { + swapped = true; + break; + } + } + blocks->swapped = swapped; + spin_unlock(&blocks->lock); + + /* Read out reloc subtree root */ + reloc_eb = read_tree_block(fs_info, block->reloc_bytenr, + block->reloc_generation, block->level, + &block->first_key); + if (IS_ERR(reloc_eb)) { + ret = PTR_ERR(reloc_eb); + reloc_eb = NULL; + goto free_out; + } + if (!extent_buffer_uptodate(reloc_eb)) { + ret = -EIO; + goto free_out; + } + + ret = qgroup_trace_subtree_swap(trans, reloc_eb, subvol_eb, + block->last_snapshot, block->trace_leaf); +free_out: + kfree(block); + free_extent_buffer(reloc_eb); +out: + if (ret < 0) { + btrfs_err_rl(fs_info, + "failed to account
subtree at bytenr %llu: %d", + subvol_eb->start, ret); + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; + } + return ret; +} diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 539528d6c1c1..ad3ab67c7aad 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -418,5 +418,7 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans, struct extent_buffer *subvol_parent, int subvol_slot, struct extent_buffer *reloc_parent, int reloc_slot, u64 last_snapshot); +int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans, + struct btrfs_root *root, struct extent_buffer *eb); #endif diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index cc55249eadb1..c91caeb58035 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1889,16 +1889,12 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, * If not traced, we will leak data numbers * 2) Fs subtree * If not traced, we will double count old data - * and tree block numbers, if current trans doesn't free - * data reloc tree inode. + * + * We don't scan the subtree right now, but only record + * the swapped tree blocks. + * The real subtree rescan is delayed until we have new + * CoW on the subtree root node before transaction commit. 
*/ - ret = btrfs_qgroup_trace_subtree_swap(trans, rc->block_group, - parent, slot, path->nodes[level], - path->slots[level], last_snapshot); - if (ret < 0) - break; - - btrfs_node_key_to_cpu(parent, &first_key, slot); ret = btrfs_qgroup_add_swapped_blocks(trans, dest, rc->block_group, parent, slot, path->nodes[level], path->slots[level], From patchwork Wed Jan 23 07:15:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10776641 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F26C14E5 for ; Wed, 23 Jan 2019 07:15:49 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8AAEF2874A for ; Wed, 23 Jan 2019 07:15:49 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7EC8E2B03E; Wed, 23 Jan 2019 07:15:49 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0AD282874A for ; Wed, 23 Jan 2019 07:15:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726804AbfAWHPs (ORCPT ); Wed, 23 Jan 2019 02:15:48 -0500 Received: from mx2.suse.de ([195.135.220.15]:47468 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726783AbfAWHPr (ORCPT ); Wed, 23 Jan 2019 02:15:47 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id A8F8BB06D for ; Wed, 23 Jan 2019 07:15:45 +0000 (UTC) From: Qu Wenruo 
To: linux-btrfs@vger.kernel.org Cc: dsterba@suse.cz Subject: [Patch v5 7/7] btrfs: qgroup: Cleanup old subtree swap code Date: Wed, 23 Jan 2019 15:15:18 +0800 Message-Id: <20190123071518.2528-8-wqu@suse.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190123071518.2528-1-wqu@suse.com> References: <20190123071518.2528-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Since it's replaced by new delayed subtree swap code, remove the original code. The cleanup is small since most of its core function is still used by delayed subtree swap trace. Signed-off-by: Qu Wenruo --- fs/btrfs/qgroup.c | 94 ----------------------------------------------- fs/btrfs/qgroup.h | 6 --- 2 files changed, 100 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index a518dcae83b4..36359c0bb361 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2077,100 +2077,6 @@ static int qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans, return ret; } -/* - * Inform qgroup to trace subtree swap used in balance. - * - * Unlike btrfs_qgroup_trace_subtree(), this function will only trace - * new tree blocks whose generation is equal to (or larger than) @last_snapshot. - * - * Will go down the tree block pointed by @dst_eb (pointed by @dst_parent and - * @dst_slot), and find any tree blocks whose generation is at @last_snapshot, - * and then go down @src_eb (pointed by @src_parent and @src_slot) to find - * the counterpart of the tree block, then mark both tree blocks as qgroup dirty, - * and skip all tree blocks whose generation is smaller than last_snapshot. - * - * This would skip tons of tree blocks of original btrfs_qgroup_trace_subtree(), - * which could be the cause of very slow balance if the file tree is large. - * - * @src_parent, @src_slot: pointer to src (file tree) eb. - * @dst_parent, @dst_slot: pointer to dst (reloc tree) eb. 
- */ -int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans, - struct btrfs_block_group_cache *bg_cache, - struct extent_buffer *src_parent, int src_slot, - struct extent_buffer *dst_parent, int dst_slot, - u64 last_snapshot) -{ - struct btrfs_fs_info *fs_info = trans->fs_info; - struct btrfs_key first_key; - struct extent_buffer *src_eb = NULL; - struct extent_buffer *dst_eb = NULL; - bool trace_leaf = false; - u64 child_gen; - u64 child_bytenr; - int ret; - - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) - return 0; - - /* Check parameter order */ - if (btrfs_node_ptr_generation(src_parent, src_slot) > - btrfs_node_ptr_generation(dst_parent, dst_slot)) { - btrfs_err_rl(fs_info, - "%s: bad parameter order, src_gen=%llu dst_gen=%llu", __func__, - btrfs_node_ptr_generation(src_parent, src_slot), - btrfs_node_ptr_generation(dst_parent, dst_slot)); - return -EUCLEAN; - } - - /* - * Only trace leaf if we're relocating data block groups, this could - * reduce tons of data extents tracing for meta/sys bg relocation. 
- */ - if (bg_cache->flags & BTRFS_BLOCK_GROUP_DATA) - trace_leaf = true; - /* Read out real @src_eb, pointed by @src_parent and @src_slot */ - child_bytenr = btrfs_node_blockptr(src_parent, src_slot); - child_gen = btrfs_node_ptr_generation(src_parent, src_slot); - btrfs_node_key_to_cpu(src_parent, &first_key, src_slot); - - src_eb = read_tree_block(fs_info, child_bytenr, child_gen, - btrfs_header_level(src_parent) - 1, &first_key); - if (IS_ERR(src_eb)) { - ret = PTR_ERR(src_eb); - goto out; - } - - /* Read out real @dst_eb, pointed by @src_parent and @src_slot */ - child_bytenr = btrfs_node_blockptr(dst_parent, dst_slot); - child_gen = btrfs_node_ptr_generation(dst_parent, dst_slot); - btrfs_node_key_to_cpu(dst_parent, &first_key, dst_slot); - - dst_eb = read_tree_block(fs_info, child_bytenr, child_gen, - btrfs_header_level(dst_parent) - 1, &first_key); - if (IS_ERR(dst_eb)) { - ret = PTR_ERR(dst_eb); - goto out; - } - - if (!extent_buffer_uptodate(src_eb) || !extent_buffer_uptodate(dst_eb)) { - ret = -EINVAL; - goto out; - } - - /* Do the generation aware breadth-first search */ - ret = qgroup_trace_subtree_swap(trans, src_eb, dst_eb, last_snapshot, - trace_leaf); - if (ret < 0) - goto out; - ret = 0; - -out: - free_extent_buffer(src_eb); - free_extent_buffer(dst_eb); - return ret; -} - int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans, struct extent_buffer *root_eb, u64 root_gen, int root_level) diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index ad3ab67c7aad..7f7d421e8dc3 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -327,12 +327,6 @@ int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans, int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans, struct extent_buffer *root_eb, u64 root_gen, int root_level); - -int btrfs_qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans, - struct btrfs_block_group_cache *bg_cache, - struct extent_buffer *src_parent, int src_slot, - struct extent_buffer *dst_parent, int 
dst_slot, - u64 last_snapshot); int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr, u64 num_bytes, struct ulist *old_roots, struct ulist *new_roots);
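
The delayed subtree trace workflow described in patches 5/7 and 6/7 (record on swap, check on CoW, drop on commit) can be modelled in user space. What follows is only a sketch under stated assumptions, not the kernel implementation: it replaces the per-level rb-trees with singly linked lists, leaves out the spinlock and the actual subtree scan, and every name in it (swapped_blocks, record_swap(), trace_before_cow(), clean_at_commit()) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_LEVEL 8	/* assumed to mirror BTRFS_MAX_LEVEL */

/* Hypothetical stand-in for struct btrfs_qgroup_swapped_block. */
struct swapped_block {
	struct swapped_block *next;
	uint64_t subvol_bytenr;	/* block in the subvolume tree after swap */
	uint64_t reloc_bytenr;	/* its counterpart in the reloc tree */
};

/* Hypothetical stand-in for struct btrfs_qgroup_swapped_blocks. */
struct swapped_blocks {
	bool swapped;		/* true while any per-level list is non-empty */
	struct swapped_block *blocks[MAX_LEVEL];
};

/* Step 1: remember the swapped pair instead of scanning both subtrees. */
static int record_swap(struct swapped_blocks *sb, int level,
		       uint64_t subvol_bytenr, uint64_t reloc_bytenr)
{
	struct swapped_block *b;

	assert(level >= 0 && level < MAX_LEVEL);
	b = malloc(sizeof(*b));
	if (!b)
		return -1;
	b->subvol_bytenr = subvol_bytenr;
	b->reloc_bytenr = reloc_bytenr;
	b->next = sb->blocks[level];
	sb->blocks[level] = b;
	sb->swapped = true;
	return 0;
}

/*
 * Steps 3a/3b: called before CoWing a tree block.  On a hit the record is
 * removed, the reloc counterpart is stored in *reloc_bytenr and true is
 * returned; a real caller would now trace both subtrees.
 */
static bool trace_before_cow(struct swapped_blocks *sb, int level,
			     uint64_t bytenr, uint64_t *reloc_bytenr)
{
	struct swapped_block **p;
	int i;

	assert(level >= 0 && level < MAX_LEVEL);
	if (!sb->swapped)
		return false;	/* fast path: nothing was recorded */

	for (p = &sb->blocks[level]; *p; p = &(*p)->next) {
		struct swapped_block *hit = *p;

		if (hit->subvol_bytenr != bytenr)
			continue;
		*p = hit->next;
		*reloc_bytenr = hit->reloc_bytenr;
		free(hit);

		/* The flag stays set only if some level still has records. */
		sb->swapped = false;
		for (i = 0; i < MAX_LEVEL; i++) {
			if (sb->blocks[i]) {
				sb->swapped = true;
				break;
			}
		}
		return true;
	}
	return false;
}

/* Step 4: at transaction commit every remaining record is dropped. */
static void clean_at_commit(struct swapped_blocks *sb)
{
	int i;

	for (i = 0; i < MAX_LEVEL; i++) {
		while (sb->blocks[i]) {
			struct swapped_block *b = sb->blocks[i];

			sb->blocks[i] = b->next;
			free(b);
		}
	}
	sb->swapped = false;
}
```

In this model a zero-initialized struct swapped_blocks plays the role of root->swapped_blocks after btrfs_qgroup_init_swapped_blocks(). The point of the design is that a record which survives until clean_at_commit() corresponds to a subtree scan that never had to run.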