From patchwork Mon Oct 11 12:06:48 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12549819
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 1/3] btrfs-progs: rename @data parameter to @profile in extent allocation path
Date: Mon, 11 Oct 2021 20:06:48 +0800
Message-Id: <20211011120650.179017-2-wqu@suse.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211011120650.179017-1-wqu@suse.com>
References: <20211011120650.179017-1-wqu@suse.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

In function btrfs_reserve_extent(), we call find_free_extent() passing
"u64 profile" into "int data".

This is definitely a width reduction, but looking further into the code,
it's more serious than that: the "int data" parameter does not really
indicate whether this is a data extent, it is in fact a block group
profile (with block group type).

So this is not only a width reduction, but also confusing.

Thankfully, so far we don't have any BLOCK_GROUP bits beyond 32 bits, so
the width reduction is not causing a big problem.

This patch renames the "int data" parameter to a more proper one,
"u64 profile", in all involved call paths.

Signed-off-by: Qu Wenruo
---
 kernel-shared/extent-tree.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c
index 9c6d17a52a24..8e0614e033fa 100644
--- a/kernel-shared/extent-tree.c
+++ b/kernel-shared/extent-tree.c
@@ -54,7 +54,7 @@ static int __free_extent(struct btrfs_trans_handle *trans,
 			 u64 owner_offset, int refs_to_drop);
 static struct btrfs_block_group *
 btrfs_find_block_group(struct btrfs_root *root, struct btrfs_block_group
-		       *hint, u64 search_start, int data, int owner);
+		       *hint, u64 search_start, u64 profile, int owner);
 
 static int remove_sb_from_cache(struct btrfs_root *root,
 				struct btrfs_block_group *cache)
@@ -264,7 +264,7 @@ static int block_group_bits(struct btrfs_block_group *cache, u64 bits)
 
 static int noinline find_search_start(struct btrfs_root *root,
 			      struct btrfs_block_group **cache_ret,
-			      u64 *start_ret, int num, int data)
+			      u64 *start_ret, int num, u64 profile)
 {
 	int ret;
 	struct btrfs_block_group *cache = *cache_ret;
@@ -282,7 +282,7 @@ again:
 		goto out;
 
 	last = max(search_start, cache->start);
-	if (cache->ro || !block_group_bits(cache, data))
+	if (cache->ro || !block_group_bits(cache, profile))
 		goto new_group;
 
 	if (btrfs_is_zoned(root->fs_info)) {
@@ -339,7 +339,7 @@ wrapped:
 
 static struct btrfs_block_group *
 btrfs_find_block_group(struct btrfs_root *root, struct btrfs_block_group
-		       *hint, u64 search_start, int data, int owner)
+		       *hint, u64 search_start, u64 profile, int owner)
 {
 	struct btrfs_block_group *cache;
 	struct btrfs_block_group *found_group = NULL;
@@ -357,7 +357,7 @@ btrfs_find_block_group(struct btrfs_root *root, struct btrfs_block_group
 	if (search_start) {
 		struct btrfs_block_group *shint;
 		shint = btrfs_lookup_block_group(info, search_start);
-		if (shint && !shint->ro && block_group_bits(shint, data)) {
+		if (shint && !shint->ro && block_group_bits(shint, profile)) {
 			used = shint->used;
 			if (used + shint->pinned <
 			    div_factor(shint->length, factor)) {
@@ -365,7 +365,7 @@ btrfs_find_block_group(struct btrfs_root *root, struct btrfs_block_group
 			}
 		}
 	}
-	if (hint && !hint->ro && block_group_bits(hint, data)) {
+	if (hint && !hint->ro && block_group_bits(hint, profile)) {
 		used = hint->used;
 		if (used + hint->pinned <
 		    div_factor(hint->length, factor)) {
@@ -390,7 +390,7 @@ again:
 		last = cache->start + cache->length;
 		used = cache->used;
 
-		if (!cache->ro && block_group_bits(cache, data)) {
+		if (!cache->ro && block_group_bits(cache, profile)) {
 			if (full_search)
 				free_check = cache->length;
 			else
@@ -2177,7 +2177,7 @@ static int noinline find_free_extent(struct btrfs_trans_handle *trans,
 				 u64 search_start, u64 search_end,
 				 u64 hint_byte, struct btrfs_key *ins,
 				 u64 exclude_start, u64 exclude_nr,
-				 int data)
+				 u64 profile)
 {
 	int ret;
 	u64 orig_search_start = search_start;
@@ -2198,11 +2198,11 @@ static int noinline find_free_extent(struct btrfs_trans_handle *trans,
 		if (!block_group)
 			hint_byte = search_start;
 		block_group = btrfs_find_block_group(root, block_group,
-						     hint_byte, data, 1);
+						     hint_byte, profile, 1);
 	} else {
 		block_group = btrfs_find_block_group(root,
 						     trans->block_group,
-						     search_start, data, 1);
+						     search_start, profile, 1);
 	}
 
 	total_needed += empty_size;
@@ -2217,7 +2217,7 @@ check_failed:
 					      orig_search_start);
 	}
 	ret = find_search_start(root, &block_group, &search_start,
-				total_needed, data);
+				total_needed, profile);
 	if (ret)
 		goto new_group;
 
@@ -2255,7 +2255,7 @@ check_failed:
 		goto new_group;
 	}
 
-	if (!(data & BTRFS_BLOCK_GROUP_DATA)) {
+	if (!(profile & BTRFS_BLOCK_GROUP_DATA)) {
 		if (check_crossing_stripes(info, ins->objectid, num_bytes)) {
 			struct btrfs_block_group *bg_cache;
 			u64 bg_offset;
@@ -2295,7 +2295,7 @@ new_group:
 	}
 	cond_resched();
 	block_group = btrfs_find_block_group(root, block_group,
-					     search_start, data, 0);
+					     search_start, profile, 0);
 	goto check_failed;
 
 error:

From patchwork Mon Oct 11 12:06:49 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12549821
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: FireFish5000
Subject: [PATCH v2 2/3] btrfs-progs: mkfs: recow all tree blocks properly
Date: Mon, 11 Oct 2021 20:06:49 +0800
Message-Id: <20211011120650.179017-3-wqu@suse.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211011120650.179017-1-wqu@suse.com>
References: <20211011120650.179017-1-wqu@suse.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

[BUG]
Since btrfs-progs v5.14, mkfs.btrfs no longer cleans up the temporary
SINGLE metadata chunks if "-R free-space-tree" is specified:

  $ mkfs.btrfs -f -R free-space-tree -m dup -d dup /dev/test/test
  $ btrfs ins dump-tree -t chunk /dev/test/test | grep "type METADATA"
		length 8388608 owner 2 stripe_len 65536 type METADATA
		length 268435456 owner 2 stripe_len 65536 type METADATA|DUP

[CAUSE]
Since commit 4b6cf2a3eb78 ("btrfs-progs: mkfs: generate free space tree at
make_btrfs() time"), the free space tree is created when the temporary
btrfs image is created.

This behavior itself is not a problem.

The problem happens when "-m DUP -d DUP" (or other profiles) is
specified.

This makes mkfs.btrfs create extra chunks, enlarging the free space tree
so that it can be as high as level 1.

During mkfs, we rely on recow_roots() to re-CoW all tree blocks to the
newly allocated chunks.
But __recow_root() can only handle a tree root at level 0, as it only
forces the root node to be CoWed and does not touch the child
nodes/leaves.

This leaves part of the free space tree on the old temporary chunks, so
the later cleanup_temp_chunks() is unable to delete the temporary SINGLE
chunks.

[FIX]
Rework __recow_root() to do a proper CoW of the whole tree.

But the above rework alone is not enough: if a free space tree block is
allocated during the current transaction, but before the new chunks are
added, the reworked __recow_root() can't CoW it, as btrfs_search_slot()
won't CoW a tree block allocated in the current transaction.

So this patch also commits the current transaction before calling
recow_roots(), to force all tree blocks to be re-CoWed.

This shouldn't be a problem: at the time of calling we should have fewer
than a dozen tree blocks, so there won't be a performance impact.

Reported-by: FireFish5000
Fixes: 4b6cf2a3eb78 ("btrfs-progs: mkfs: generate free space tree at make_btrfs() time")
Signed-off-by: Qu Wenruo
---
 mkfs/main.c | 90 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 77 insertions(+), 13 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index 11a0989be281..2e3d1bf69629 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -210,21 +210,59 @@ err:
 }
 
 static int __recow_root(struct btrfs_trans_handle *trans,
-			 struct btrfs_root *root)
+			struct btrfs_root *root)
 {
-	struct extent_buffer *tmp;
+	struct btrfs_path path;
+	struct btrfs_key key;
 	int ret;
 
-	if (trans->transid != btrfs_root_generation(&root->root_item)) {
-		extent_buffer_get(root->node);
-		ret = __btrfs_cow_block(trans, root, root->node,
-				NULL, 0, &tmp, 0, 0);
-		if (ret)
-			return ret;
-		free_extent_buffer(tmp);
-	}
+	btrfs_init_path(&path);
+	key.objectid = 0;
+	key.type = 0;
+	key.offset = 0;
 
-	return 0;
+	/* Get a path to the most-left leaves */
+	ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
+	if (ret < 0)
+		return ret;
+
+	while (true) {
+		struct btrfs_key found_key;
+
+		/*
+		 * Our parent nodes must be no newer than the leaf, thus
+		 * if the leaf is as new as the trans, no need to re-cow.
+		 */
+		if (btrfs_header_generation(path.nodes[0]) == trans->transid)
+			goto next;
+
+		/*
+		 * Grab the key of current tree block and do a CoW search to
+		 * the current tree block.
+		 */
+		btrfs_item_key_to_cpu(path.nodes[0], &key, 0);
+		btrfs_release_path(&path);
+
+		/* This will ensure this leaf and all its parent get CoWed */
+		ret = btrfs_search_slot(trans, root, &key, &path, 0, 1);
+		if (ret < 0)
+			goto out;
+		ret = 0;
+		btrfs_item_key_to_cpu(path.nodes[0], &found_key, 0);
+		ASSERT(btrfs_comp_cpu_keys(&key, &found_key) == 0);
+
+next:
+		ret = btrfs_next_leaf(root, &path);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+	}
+out:
+	btrfs_release_path(&path);
+	return ret;
 }
 
 static int recow_roots(struct btrfs_trans_handle *trans,
@@ -305,7 +343,7 @@ static int create_raid_groups(struct btrfs_trans_handle *trans,
 			      u64 metadata_profile, bool mixed,
 			      struct mkfs_allocation *allocation)
 {
-	int ret;
+	int ret = 0;
 
 	if (metadata_profile) {
 		u64 meta_flags = BTRFS_BLOCK_GROUP_METADATA;
@@ -332,7 +370,6 @@ static int create_raid_groups(struct btrfs_trans_handle *trans,
 		if (ret)
 			return ret;
 	}
-	ret = recow_roots(trans, root);
 
 	return ret;
 }
@@ -1479,6 +1516,33 @@ raid_groups:
 		goto out;
 	}
 
+	/*
+	 * Commit current trans so we can cow all existing tree blocks
+	 * to newly created raid groups.
+	 * As currently we use btrfs_search_slot() to CoW tree blocks in
+	 * recow_roots(), if a tree block is already modified in current trans,
+	 * it won't be re-CoWed, thus it will stay in temporary chunks.
+	 */
+	ret = btrfs_commit_transaction(trans, root);
+	if (ret) {
+		errno = -ret;
+		error("unable to commit transaction before recowing trees: %m");
+		goto out;
+	}
+	trans = btrfs_start_transaction(root, 1);
+	if (IS_ERR(trans)) {
+		errno = -PTR_ERR(trans);
+		error("failed to start transaction: %m");
+		goto error;
+	}
+	/* CoW all tree blocks to newly created chunks */
+	ret = recow_roots(trans, root);
+	if (ret) {
+		errno = -ret;
+		error("unable to CoW tree blocks to new profiles: %m");
+		goto out;
+	}
+
 	ret = create_data_reloc_tree(trans);
 	if (ret) {
 		error("unable to create data reloc tree: %d", ret);

From patchwork Mon Oct 11 12:06:50 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12549823
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 3/3] btrfs-progs: mkfs-tests: make sure mkfs.btrfs cleans up temporary chunks
Date: Mon, 11 Oct 2021 20:06:50 +0800
Message-Id: <20211011120650.179017-4-wqu@suse.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211011120650.179017-1-wqu@suse.com>
References: <20211011120650.179017-1-wqu@suse.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

Since the current "btrfs filesystem df" command warns if there are
multiple profiles of the same type, it's a good way to detect left-over
temporary chunks.

This patch enhances the existing mkfs-tests/001-basic-profiles test
case to also check for the warning message, to make sure mkfs.btrfs has
properly cleaned up all temporary chunks.

There is a special workaround newly implemented in test_get_info(), as
recent kernels introduced single device RAID0 support, which is no
different from SINGLE.

But for single device RAID0, the kernel may choose to preallocate new
chunks with the SINGLE profile, causing false alerts.

Work around this kernel bug by mounting the filesystem read-only to
prevent preallocation of new chunks.

Signed-off-by: Qu Wenruo
---
 tests/mkfs-tests/001-basic-profiles/test.sh | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/tests/mkfs-tests/001-basic-profiles/test.sh b/tests/mkfs-tests/001-basic-profiles/test.sh
index b3ba50d71ddc..0be199749864 100755
--- a/tests/mkfs-tests/001-basic-profiles/test.sh
+++ b/tests/mkfs-tests/001-basic-profiles/test.sh
@@ -11,10 +11,23 @@ setup_root_helper
 
 test_get_info()
 {
+	tmp_out=$(mktemp --tmpdir btrfs-progs-mkfs-tests-get-info.XXXXXX)
 	run_check $SUDO_HELPER "$TOP/btrfs" inspect-internal dump-super "$dev1"
 	run_check $SUDO_HELPER "$TOP/btrfs" check "$dev1"
-	run_check $SUDO_HELPER mount "$dev1" "$TEST_MNT"
-	run_check "$TOP/btrfs" filesystem df "$TEST_MNT"
+
+	run_check $SUDO_HELPER "$TOP/btrfs" inspect-internal dump-tree -t chunk "$dev1"
+
+	# Work around a kernel bug where the kernel treats SINGLE and single
+	# device RAID0 as the same.
+	# Thus the kernel may create new SINGLE chunks, causing an extra
+	# warning when testing single device RAID0.
+	run_check $SUDO_HELPER mount -o ro "$dev1" "$TEST_MNT"
+	run_check_stdout "$TOP/btrfs" filesystem df "$TEST_MNT" > "$tmp_out"
+	if grep -q "Multiple block group profiles detected" "$tmp_out"; then
+		rm -- "$tmp_out"
+		_fail "temporary chunks are not properly cleaned up"
+	fi
+	rm -- "$tmp_out"
 	run_check $SUDO_HELPER "$TOP/btrfs" filesystem usage "$TEST_MNT"
 	run_check $SUDO_HELPER "$TOP/btrfs" device usage "$TEST_MNT"
 	run_check $SUDO_HELPER umount "$TEST_MNT"
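
For reviewers who want to try the detection logic outside the test suite,
the check in test_get_info() above can be sketched in isolation roughly as
follows. This is only an illustration, not part of the patch:
has_leftover_temp_chunks is a hypothetical helper name, and the demo input
is a made-up file standing in for captured "btrfs filesystem df" output.
Only the warning string is taken from the patch.

```shell
#!/bin/sh
# Hypothetical helper (not part of tests/common): returns 0 when the file
# given as $1 contains the mixed-profile warning that the test greps for.
has_leftover_temp_chunks()
{
	grep -q "Multiple block group profiles detected" "$1"
}

# Demo on fabricated input; in the real test the file would hold the
# captured output of "btrfs filesystem df" on the mounted filesystem.
demo=$(mktemp)
printf '%s\n' "Multiple block group profiles detected" > "$demo"
if has_leftover_temp_chunks "$demo"; then
	echo "leftover temporary chunks"
fi
rm -- "$demo"
```

Keeping the check as a plain grep on a captured file means it only
triggers if the df output was actually written to that file beforehand.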