[v4,2/7] btrfs: qgroup: Don't trigger backref walk at delayed ref insert time

Message ID 20190115081604.785-3-wqu@suse.com
Series btrfs: qgroup: Delay subtree scan to reduce overhead

Commit Message

Qu Wenruo Jan. 15, 2019, 8:15 a.m. UTC
[BUG]
Since commit fb235dc06fac ("btrfs: qgroup: Move half of the qgroup
accounting time out of commit trans"), the kernel may lock up when
quota is enabled.

The lockup is caused by a backref walk triggered by snapshot dropping
combined with write operations in the source subvolume, and it can be
reproduced reliably:

  btrfs-cleaner   D    0  4062      2 0x80000000
  Call Trace:
   schedule+0x32/0x90
   btrfs_tree_read_lock+0x93/0x130 [btrfs]
   find_parent_nodes+0x29b/0x1170 [btrfs]
   btrfs_find_all_roots_safe+0xa8/0x120 [btrfs]
   btrfs_find_all_roots+0x57/0x70 [btrfs]
   btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
   btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs]
   btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs]
   do_walk_down+0x541/0x5e3 [btrfs]
   walk_down_tree+0xab/0xe7 [btrfs]
   btrfs_drop_snapshot+0x356/0x71a [btrfs]
   btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs]
   cleaner_kthread+0x12b/0x160 [btrfs]
   kthread+0x112/0x130
   ret_from_fork+0x27/0x50

[CAUSE]
When dropping a snapshot with qgroup enabled, we trigger a backref
walk.

However a backref walk at that point is dangerous: if one of the parent
nodes gets write locked by another thread, we can deadlock.

For example:

           FS 260     FS 261 (Dropped)
            node A        node B
           /      \      /      \
       node C      node D      node E
      /   \         /  \        /     \
  leaf F|leaf G|leaf H|leaf I|leaf J|leaf K

The lock sequence would be:

      Thread A (cleaner)             |       Thread B (other writer)
-----------------------------------------------------------------------
write_lock(B)                        |
write_lock(D)                        |
^^^ called by walk_down_tree()       |
                                     |       write_lock(A)
                                     |       write_lock(D) << Stall
read_lock(H) << for backref walk     |
read_lock(D) << lock owner is        |
                the same thread A    |
                so read lock is OK   |
read_lock(A) << Stall                |

So thread A holds the write lock on D and needs the read lock on A to
continue, while thread B holds the write lock on A and needs the write
lock on D to continue.

This is a deadlock.
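
For illustration only, here is a minimal userspace sketch of the same
ABBA inversion, using pthread rwlocks as stand-ins for the btrfs tree
locks (this is not btrfs code; lock_a and lock_d merely represent nodes
A and D):

  /* Compile with: gcc abba.c -o abba -lpthread */
  #include <pthread.h>
  #include <stdio.h>
  #include <unistd.h>

  static pthread_rwlock_t lock_a = PTHREAD_RWLOCK_INITIALIZER; /* node A */
  static pthread_rwlock_t lock_d = PTHREAD_RWLOCK_INITIALIZER; /* node D */

  /* Thread A: cleaner, already holds D, backref walk then wants A */
  static void *cleaner(void *arg)
  {
          (void)arg;
          pthread_rwlock_wrlock(&lock_d);  /* write_lock(D) */
          sleep(1);                        /* give the writer time to take A */
          pthread_rwlock_rdlock(&lock_a);  /* read_lock(A) << stalls forever */
          printf("cleaner done\n");
          return NULL;
  }

  /* Thread B: writer, normal top-down order, A then D */
  static void *writer(void *arg)
  {
          (void)arg;
          pthread_rwlock_wrlock(&lock_a);  /* write_lock(A) */
          pthread_rwlock_wrlock(&lock_d);  /* write_lock(D) << stalls forever */
          printf("writer done\n");
          return NULL;
  }

  int main(void)
  {
          pthread_t a, b;

          pthread_create(&a, NULL, cleaner, NULL);
          pthread_create(&b, NULL, writer, NULL);
          pthread_join(a, NULL);           /* never returns: ABBA deadlock */
          pthread_join(b, NULL);
          return 0;
  }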

This is not limited to the snapshot dropping case.
The backref walk, even though it only happens on commit trees, breaks
the normal top-down locking order and is therefore deadlock prone.

[FIX]
The solution is to never trigger a backref walk while holding any
write lock, which means we can't do the backref walk at delayed ref
insert time.

So this patch mostly reverts commit fb235dc06fac ("btrfs: qgroup: Move
half of the qgroup accounting time out of commit trans").
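
As an illustration of where the old_roots lookup lives after this
change, here is a simplified sketch, not verbatim kernel code (the
helper name account_one_record is made up), of the commit-time
accounting path, which now fills old_roots from the commit root while
no tree write locks are held:

  /*
   * Simplified sketch: after this patch old_roots is filled here, at
   * transaction commit time, instead of at delayed ref insert time.
   */
  static int account_one_record(struct btrfs_trans_handle *trans,
                                struct btrfs_qgroup_extent_record *record)
  {
          struct btrfs_fs_info *fs_info = trans->fs_info;
          int ret = 0;

          if (!record->old_roots) {
                  /* Search commit root to find old_roots */
                  ret = btrfs_find_all_roots(NULL, fs_info, record->bytenr,
                                             0, &record->old_roots, false);
                  if (ret < 0)
                          return ret;
          }

          /* new_roots is searched and accounted as before */
          return 0;
  }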

Reported-by: David Sterba <dsterba@suse.cz>
Reported-by: Filipe Manana <fdmanana@suse.com>
Fixes: fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/delayed-ref.c | 24 ++++--------------------
 fs/btrfs/qgroup.c      | 39 +++------------------------------------
 fs/btrfs/qgroup.h      | 30 ++----------------------------
 3 files changed, 9 insertions(+), 84 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 7d2a413df90d..ed42a8305f50 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -625,12 +625,10 @@  static noinline struct btrfs_delayed_ref_head *
 add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		     struct btrfs_delayed_ref_head *head_ref,
 		     struct btrfs_qgroup_extent_record *qrecord,
-		     int action, int *qrecord_inserted_ret,
-		     int *old_ref_mod, int *new_ref_mod)
+		     int action, int *old_ref_mod, int *new_ref_mod)
 {
 	struct btrfs_delayed_ref_head *existing;
 	struct btrfs_delayed_ref_root *delayed_refs;
-	int qrecord_inserted = 0;
 
 	delayed_refs = &trans->transaction->delayed_refs;
 
@@ -639,8 +637,6 @@  add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		if (btrfs_qgroup_trace_extent_nolock(trans->fs_info,
 					delayed_refs, qrecord))
 			kfree(qrecord);
-		else
-			qrecord_inserted = 1;
 	}
 
 	trace_add_delayed_ref_head(trans->fs_info, head_ref, action);
@@ -670,8 +666,6 @@  add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		atomic_inc(&delayed_refs->num_entries);
 		trans->delayed_ref_updates++;
 	}
-	if (qrecord_inserted_ret)
-		*qrecord_inserted_ret = qrecord_inserted;
 	if (new_ref_mod)
 		*new_ref_mod = head_ref->total_ref_mod;
 
@@ -745,7 +739,6 @@  int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
 	struct btrfs_delayed_ref_head *head_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_qgroup_extent_record *record = NULL;
-	int qrecord_inserted;
 	bool is_system = (ref_root == BTRFS_CHUNK_TREE_OBJECTID);
 	int ret;
 	u8 ref_type;
@@ -794,8 +787,7 @@  int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
 	 * the spin lock
 	 */
 	head_ref = add_delayed_ref_head(trans, head_ref, record,
-					action, &qrecord_inserted,
-					old_ref_mod, new_ref_mod);
+					action, old_ref_mod, new_ref_mod);
 
 	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
 	spin_unlock(&delayed_refs->lock);
@@ -812,9 +804,6 @@  int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
 	if (ret > 0)
 		kmem_cache_free(btrfs_delayed_tree_ref_cachep, ref);
 
-	if (qrecord_inserted)
-		btrfs_qgroup_trace_extent_post(fs_info, record);
-
 	return 0;
 }
 
@@ -832,7 +821,6 @@  int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
 	struct btrfs_delayed_ref_head *head_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_qgroup_extent_record *record = NULL;
-	int qrecord_inserted;
 	int ret;
 	u8 ref_type;
 
@@ -881,8 +869,7 @@  int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
 	 * the spin lock
 	 */
 	head_ref = add_delayed_ref_head(trans, head_ref, record,
-					action, &qrecord_inserted,
-					old_ref_mod, new_ref_mod);
+					action, old_ref_mod, new_ref_mod);
 
 	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
 	spin_unlock(&delayed_refs->lock);
@@ -899,9 +886,6 @@  int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
 	if (ret > 0)
 		kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
 
-
-	if (qrecord_inserted)
-		return btrfs_qgroup_trace_extent_post(fs_info, record);
 	return 0;
 }
 
@@ -926,7 +910,7 @@  int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
 	spin_lock(&delayed_refs->lock);
 
 	add_delayed_ref_head(trans, head_ref, NULL, BTRFS_UPDATE_DELAYED_HEAD,
-			     NULL, NULL, NULL);
+			     NULL, NULL);
 
 	spin_unlock(&delayed_refs->lock);
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index f214b490d80c..ba30adac88a7 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1565,33 +1565,6 @@  int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
-int btrfs_qgroup_trace_extent_post(struct btrfs_fs_info *fs_info,
-				   struct btrfs_qgroup_extent_record *qrecord)
-{
-	struct ulist *old_root;
-	u64 bytenr = qrecord->bytenr;
-	int ret;
-
-	ret = btrfs_find_all_roots(NULL, fs_info, bytenr, 0, &old_root, false);
-	if (ret < 0) {
-		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
-		btrfs_warn(fs_info,
-"error accounting new delayed refs extent (err code: %d), quota inconsistent",
-			ret);
-		return 0;
-	}
-
-	/*
-	 * Here we don't need to get the lock of
-	 * trans->transaction->delayed_refs, since inserted qrecord won't
-	 * be deleted, only qrecord->node may be modified (new qrecord insert)
-	 *
-	 * So modifying qrecord->old_roots is safe here
-	 */
-	qrecord->old_roots = old_root;
-	return 0;
-}
-
 int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 			      u64 num_bytes, gfp_t gfp_flag)
 {
@@ -1615,11 +1588,9 @@  int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	spin_lock(&delayed_refs->lock);
 	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, record);
 	spin_unlock(&delayed_refs->lock);
-	if (ret > 0) {
+	if (ret > 0)
 		kfree(record);
-		return 0;
-	}
-	return btrfs_qgroup_trace_extent_post(fs_info, record);
+	return 0;
 }
 
 int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans,
@@ -2569,11 +2540,7 @@  int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
 		trace_btrfs_qgroup_account_extents(fs_info, record);
 
 		if (!ret) {
-			/*
-			 * Old roots should be searched when inserting qgroup
-			 * extent record
-			 */
-			if (WARN_ON(!record->old_roots)) {
+			if (!record->old_roots) {
 				/* Search commit root to find old_roots */
 				ret = btrfs_find_all_roots(NULL, fs_info,
 						record->bytenr, 0,
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index d4fae53969d4..226956f0bd73 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -174,8 +174,7 @@  struct btrfs_delayed_extent_op;
  * Inform qgroup to trace one dirty extent, its info is recorded in @record.
  * So qgroup can account it at transaction committing time.
  *
- * No lock version, caller must acquire delayed ref lock and allocated memory,
- * then call btrfs_qgroup_trace_extent_post() after exiting lock context.
+ * No lock version, caller must acquire delayed ref lock and allocate memory.
  *
  * Return 0 for success insert
  * Return >0 for existing record, caller can free @record safely.
@@ -186,37 +185,12 @@  int btrfs_qgroup_trace_extent_nolock(
 		struct btrfs_delayed_ref_root *delayed_refs,
 		struct btrfs_qgroup_extent_record *record);
 
-/*
- * Post handler after qgroup_trace_extent_nolock().
- *
- * NOTE: Current qgroup does the expensive backref walk at transaction
- * committing time with TRANS_STATE_COMMIT_DOING, this blocks incoming
- * new transaction.
- * This is designed to allow btrfs_find_all_roots() to get correct new_roots
- * result.
- *
- * However for old_roots there is no need to do backref walk at that time,
- * since we search commit roots to walk backref and result will always be
- * correct.
- *
- * Due to the nature of no lock version, we can't do backref there.
- * So we must call btrfs_qgroup_trace_extent_post() after exiting
- * spinlock context.
- *
- * TODO: If we can fix and prove btrfs_find_all_roots() can get correct result
- * using current root, then we can move all expensive backref walk out of
- * transaction committing, but not now as qgroup accounting will be wrong again.
- */
-int btrfs_qgroup_trace_extent_post(struct btrfs_fs_info *fs_info,
-				   struct btrfs_qgroup_extent_record *qrecord);
-
 /*
  * Inform qgroup to trace one dirty extent, specified by @bytenr and
  * @num_bytes.
  * So qgroup can account it at commit trans time.
  *
- * Better encapsulated version, with memory allocation and backref walk for
- * commit roots.
+ * Better encapsulated version, with memory allocation.
  * So this can sleep.
  *
  * Return 0 if the operation is done.