From patchwork Wed Oct 26 10:07:33 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Xiaoguang Wang X-Patchwork-Id: 9396517 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 94EC360231 for ; Wed, 26 Oct 2016 10:14:10 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4D0D1299E7 for ; Wed, 26 Oct 2016 10:14:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4161029A14; Wed, 26 Oct 2016 10:14:10 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4C52B299E7 for ; Wed, 26 Oct 2016 10:14:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754202AbcJZKOG (ORCPT ); Wed, 26 Oct 2016 06:14:06 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:32666 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1753567AbcJZKOE (ORCPT ); Wed, 26 Oct 2016 06:14:04 -0400 X-IronPort-AV: E=Sophos;i="5.20,367,1444665600"; d="scan'208";a="927869" Received: from unknown (HELO cn.fujitsu.com) ([10.167.250.3]) by song.cn.fujitsu.com with ESMTP; 26 Oct 2016 18:14:00 +0800 Received: from localhost.localdomain (unknown [10.167.226.107]) by cn.fujitsu.com (Postfix) with ESMTP id A7C3940142D3; Wed, 26 Oct 2016 18:13:56 +0800 (CST) From: Wang Xiaoguang To: linux-btrfs@vger.kernel.org Cc: dsterba@suse.com Subject: [PATCH v3] btrfs: imporve delayed refs iterations Date: Wed, 26 Oct 2016 18:07:33 +0800 Message-Id: <20161026100733.7262-1-wangxg.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.9.0 MIME-Version: 1.0 X-yoursite-MailScanner-ID: A7C3940142D3.A35C0 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: wangxg.fnst@cn.fujitsu.com Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This issue was found when I tried to delete a heavily reflinked file, when deleting such files, other transaction operation will not have a chance to make progress, for example, start_transaction() will blocked in wait_current_trans(root) for long time, sometimes it even triggers soft lockups, and the time taken to delete such heavily reflinked file is also very large, often hundreds of seconds. Using perf top, it reports that: PerfTop: 7416 irqs/sec kernel:99.8% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs) --------------------------------------------------------------------------------------- 84.37% [btrfs] [k] __btrfs_run_delayed_refs.constprop.80 11.02% [kernel] [k] delay_tsc 0.79% [kernel] [k] _raw_spin_unlock_irq 0.78% [kernel] [k] _raw_spin_unlock_irqrestore 0.45% [kernel] [k] do_raw_spin_lock 0.18% [kernel] [k] __slab_alloc It seems __btrfs_run_delayed_refs() took most cpu time, after some debug work, I found it's select_delayed_ref() causing this issue, for a delayed head, in our case, it'll be full of BTRFS_DROP_DELAYED_REF nodes, but select_delayed_ref() will firstly try to iterate node list to find BTRFS_ADD_DELAYED_REF nodes, obviously it's a disaster in this case, and waste much time. To fix this issue, we introduce a new ref_add_list in struct btrfs_delayed_ref_head, then in select_delayed_ref(), if this list is not empty, we can directly use nodes in this list. With this patch, it just took about 10~15 seconds to delte the same file. Now using perf top, it reports that: PerfTop: 2734 irqs/sec kernel:99.5% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs) ---------------------------------------------------------------------------------------- 20.74% [kernel] [k] _raw_spin_unlock_irqrestore 16.33% [kernel] [k] __slab_alloc 5.41% [kernel] [k] lock_acquired 4.42% [kernel] [k] lock_acquire 4.05% [kernel] [k] lock_release 3.37% [kernel] [k] _raw_spin_unlock_irq For normal files, this patch also gives help, at least we do not need to iterate whole list to found BTRFS_ADD_DELAYED_REF nodes. Signed-off-by: Wang Xiaoguang Reviewed-by: Liu Bo Tested-by: Holger Hoffstätte --- V2: remove useless "if" statment and use ASSERT to ensure that if a delayed ref's action is BTRFS_DROP_DELAYED_REF, it should not be in ref_add_list. V3: Use list_first_entry() and add some ASSERTs for btrfs_delayed_ref_node's action member. --- fs/btrfs/delayed-ref.c | 17 +++++++++++++++++ fs/btrfs/delayed-ref.h | 8 ++++++++ fs/btrfs/disk-io.c | 2 ++ fs/btrfs/extent-tree.c | 15 +++++++++------ 4 files changed, 36 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 8d93854..eaec19b 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -189,6 +189,8 @@ static inline void drop_delayed_ref(struct btrfs_trans_handle *trans, } else { assert_spin_locked(&head->lock); list_del(&ref->list); + if (!list_empty(&ref->add_list)) + list_del(&ref->add_list); } ref->in_tree = 0; btrfs_put_delayed_ref(ref); @@ -431,6 +433,14 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans, exist->action = ref->action; mod = -exist->ref_mod; exist->ref_mod = ref->ref_mod; + if (ref->action == BTRFS_ADD_DELAYED_REF) + list_add_tail(&exist->add_list, + &href->ref_add_list); + else if (ref->action == BTRFS_DROP_DELAYED_REF) { + ASSERT(!list_empty(&exist->add_list)); + list_del(&exist->add_list); + } else + ASSERT(0); } else mod = -ref->ref_mod; } @@ -444,6 +454,8 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans, add_tail: list_add_tail(&ref->list, &href->ref_list); + if (ref->action == BTRFS_ADD_DELAYED_REF) + list_add_tail(&ref->add_list, &href->ref_add_list); atomic_inc(&root->num_entries); trans->delayed_ref_updates++; spin_unlock(&href->lock); @@ -590,6 +602,7 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, head_ref->must_insert_reserved = must_insert_reserved; head_ref->is_data = is_data; INIT_LIST_HEAD(&head_ref->ref_list); + INIT_LIST_HEAD(&head_ref->ref_add_list); head_ref->processing = 0; head_ref->total_ref_mod = count_mod; head_ref->qgroup_reserved = 0; @@ -671,6 +684,8 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info, ref->is_head = 0; ref->in_tree = 1; ref->seq = seq; + INIT_LIST_HEAD(&ref->list); + INIT_LIST_HEAD(&ref->add_list); full_ref = btrfs_delayed_node_to_tree_ref(ref); full_ref->parent = parent; @@ -726,6 +741,8 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info, ref->is_head = 0; ref->in_tree = 1; ref->seq = seq; + INIT_LIST_HEAD(&ref->list); + INIT_LIST_HEAD(&ref->add_list); full_ref = btrfs_delayed_node_to_data_ref(ref); full_ref->parent = parent; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index 43f3629..dba9784 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -42,6 +42,12 @@ struct btrfs_delayed_ref_node { /*data/tree ref use list, stored in ref_head->ref_list. */ struct list_head list; + /* + * If action is BTRFS_ADD_DELAYED_REF, also link this node to + * ref_head->ref_add_list, then we do not need to iterate the + * whole ref_head->ref_list to find BTRFS_ADD_DELAYED_REF nodes. + */ + struct list_head add_list; /* the starting bytenr of the extent */ u64 bytenr; @@ -99,6 +105,8 @@ struct btrfs_delayed_ref_head { spinlock_t lock; struct list_head ref_list; + /* accumulate add BTRFS_ADD_DELAYED_REF nodes to this ref_add_list. */ + struct list_head ref_add_list; struct rb_node href_node; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 3a57f99..bc2edaf 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4354,6 +4354,8 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, list) { ref->in_tree = 0; list_del(&ref->list); + if (!list_empty(&ref->add_list)) + list_del(&ref->add_list); atomic_dec(&delayed_refs->num_entries); btrfs_put_delayed_ref(ref); } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 3c8f0ec..7293f70 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2454,13 +2454,14 @@ select_delayed_ref(struct btrfs_delayed_ref_head *head) * the extent item from the extent tree, when there still are references * to add, which would fail because they would not find the extent item. */ - list_for_each_entry(ref, &head->ref_list, list) { - if (ref->action == BTRFS_ADD_DELAYED_REF) - return ref; - } + if (!list_empty(&head->ref_add_list)) + return list_first_entry(&head->ref_add_list, + struct btrfs_delayed_ref_node, add_list); - return list_entry(head->ref_list.next, struct btrfs_delayed_ref_node, - list); + ref = list_first_entry(&head->ref_list, struct btrfs_delayed_ref_node, + list); + ASSERT(list_empty(&ref->add_list)); + return ref; } /* @@ -2620,6 +2621,8 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, actual_count++; ref->in_tree = 0; list_del(&ref->list); + if (!list_empty(&ref->add_list)) + list_del(&ref->add_list); } atomic_dec(&delayed_refs->num_entries);