From patchwork Wed May 20 06:58:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11559427 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 873D360D for ; Wed, 20 May 2020 06:59:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7779620756 for ; Wed, 20 May 2020 06:59:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726510AbgETG7B (ORCPT ); Wed, 20 May 2020 02:59:01 -0400 Received: from mx2.suse.de ([195.135.220.15]:42408 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726369AbgETG7B (ORCPT ); Wed, 20 May 2020 02:59:01 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 88497ACBD for ; Wed, 20 May 2020 06:59:02 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 1/2] btrfs: relocation: Fix reloc root leakage and the NULL pointer reference caused by the leakage Date: Wed, 20 May 2020 14:58:50 +0800 Message-Id: <20200520065851.12689-2-wqu@suse.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200520065851.12689-1-wqu@suse.com> References: <20200520065851.12689-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] When balance is canceled, there is a pretty high chance that unmounting the fs can lead to lead the NULL pointer dereference: BTRFS warning (device dm-3): page private not zero on page 223158272 ... BTRFS warning (device dm-3): page private not zero on page 223162368 BTRFS error (device dm-3): leaked root 18446744073709551608-304 refcount 1 BUG: kernel NULL pointer dereference, address: 0000000000000168 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 5793 Comm: umount Tainted: G O 5.7.0-rc5-custom+ #53 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:__lock_acquire+0x5dc/0x24c0 Call Trace: lock_acquire+0xab/0x390 _raw_spin_lock+0x39/0x80 btrfs_release_extent_buffer_pages+0xd7/0x200 [btrfs] release_extent_buffer+0xb2/0x170 [btrfs] free_extent_buffer+0x66/0xb0 [btrfs] btrfs_put_root+0x8e/0x130 [btrfs] btrfs_check_leaked_roots.cold+0x5/0x5d [btrfs] btrfs_free_fs_info+0xe5/0x120 [btrfs] btrfs_kill_super+0x1f/0x30 [btrfs] deactivate_locked_super+0x3b/0x80 deactivate_super+0x3e/0x50 cleanup_mnt+0x109/0x160 __cleanup_mnt+0x12/0x20 task_work_run+0x67/0xa0 exit_to_usermode_loop+0xc5/0xd0 syscall_return_slowpath+0x205/0x360 do_syscall_64+0x6e/0xb0 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x7fd028ef740b [CAUSE] When balance is canceled, all reloc roots are marked orphan, and orphan reloc roots are going to be cleaned up. However for orphan reloc roots and merged reloc roots, their lifespan are quite different: Merged reloc roots | Orphan reloc roots by cancel -------------------------------------------------------------------- create_reloc_root() | create_reloc_root() |- refs == 1 | |- refs == 1 | btrfs_grab_root(reloc_root); | btrfs_grab_root(reloc_root); |- refs == 2 | |- refs == 2 | root->reloc_root = reloc_root; | root->reloc_root = reloc_root; >>> No difference so far <<< | prepare_to_merge() | prepare_to_merge() |- btrfs_set_root_refs(item, 1);| |- if (!err) (err == -EINTR) | merge_reloc_roots() | merge_reloc_roots() |- merge_reloc_root() | |- Doing nothing to put reloc root |- insert_dirty_subvol() | |- refs == 2 |- __del_reloc_root() | |- btrfs_put_root() | |- refs == 1 | >>> Now orphan reloc roots still have refs 2 <<< | clean_dirty_subvols() | clean_dirty_subvols() |- btrfs_drop_snapshot() | |- btrfS_drop_snapshot() |- reloc_root get freed | |- reloc_root still has refs 2 | related ebs get freed, but | reloc_root still recorded in | allocated_roots btrfs_check_leaked_roots() | btrfs_check_leaked_roots() |- No leaked roots | |- Leaked reloc_roots detected | |- btrfs_put_root() | |- free_extent_buffer(root->node); | |- eb already freed, caused NULL | pointer dereference [FIX] The fix is to clear fs_root->reloc_root and put it at merge_reloc_roots() time, so that we won't leak reloc roots. Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") Signed-off-by: Qu Wenruo --- fs/btrfs/relocation.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 9afc1a6928cf..420348606123 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1917,12 +1917,11 @@ void merge_reloc_roots(struct reloc_control *rc) reloc_root = list_entry(reloc_roots.next, struct btrfs_root, root_list); + root = read_fs_root(fs_info, + reloc_root->root_key.offset); if (btrfs_root_refs(&reloc_root->root_item) > 0) { - root = read_fs_root(fs_info, - reloc_root->root_key.offset); BUG_ON(IS_ERR(root)); BUG_ON(root->reloc_root != reloc_root); - ret = merge_reloc_root(rc, root); btrfs_put_root(root); if (ret) { @@ -1932,6 +1931,14 @@ void merge_reloc_roots(struct reloc_control *rc) goto out; } } else { + if (!IS_ERR(root)) { + if (root->reloc_root == reloc_root) { + root->reloc_root = NULL; + btrfs_put_root(reloc_root); + } + btrfs_put_root(root); + } + list_del_init(&reloc_root->root_list); /* Don't forget to queue this reloc root for cleanup */ list_add_tail(&reloc_root->reloc_dirty_list, From patchwork Wed May 20 06:58:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 11559429 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 37C6560D for ; Wed, 20 May 2020 06:59:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 204D7207D3 for ; Wed, 20 May 2020 06:59:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726522AbgETG7D (ORCPT ); Wed, 20 May 2020 02:59:03 -0400 Received: from mx2.suse.de ([195.135.220.15]:42420 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726369AbgETG7D (ORCPT ); Wed, 20 May 2020 02:59:03 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id CDA97AC77 for ; Wed, 20 May 2020 06:59:04 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/2] btrfs: relocation: Clear the DEAD_RELOC_TREE bit for orphan roots to prevent runaway balance Date: Wed, 20 May 2020 14:58:51 +0800 Message-Id: <20200520065851.12689-3-wqu@suse.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200520065851.12689-1-wqu@suse.com> References: <20200520065851.12689-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org [BUG] There are several reported runaway balance, that balance is flooding the kernel with "Found X extents" where the X never changes. [CAUSE] Commit d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") introduced BTRFS_ROOT_DEAD_RELOC_TREE bit to indicate that one subvolume has finished its tree blocks swap with its reloc tree. However if balance is canceled or hits ENOSPC halfway, we didn't clear the BTRFS_ROOT_DEAD_RELOC_TREE bit, leaving that bit hanging forever until unmount. Any subvolume root with that bit, would cause backref cache to skip this tree block, as it has finished its tree block swap. This would cause all tree blocks of that root be ignored by balance, leading to runaway balance. [FIX] Fix the problem by also clearing the BTRFS_ROOT_DEAD_RELOC_TREE bit for the original subvolume of orphan reloc root. Furthermore to avoid such damn bug to bother anybody anymore, add unmount time check to detect and warn about this bit. Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 1 + fs/btrfs/relocation.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index fced949b150c..e6def8fc87dd 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1996,6 +1996,7 @@ void btrfs_put_root(struct btrfs_root *root) if (refcount_dec_and_test(&root->refs)) { WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree)); + WARN_ON(test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state)); if (root->anon_dev) free_anon_bdev(root->anon_dev); btrfs_drew_lock_destroy(&root->snapshot_lock); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 420348606123..595097c4a060 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1936,6 +1936,8 @@ void merge_reloc_roots(struct reloc_control *rc) root->reloc_root = NULL; btrfs_put_root(reloc_root); } + clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, + &root->state); btrfs_put_root(root); }