From patchwork Wed May 17 07:35:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13244314 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9D5EFC77B75 for ; Wed, 17 May 2023 07:36:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230163AbjEQHgq (ORCPT ); Wed, 17 May 2023 03:36:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59276 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230000AbjEQHgL (ORCPT ); Wed, 17 May 2023 03:36:11 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F33C42D4D for ; Wed, 17 May 2023 00:36:03 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 9BF32228CD for ; Wed, 17 May 2023 07:36:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1684308962; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Kj10L6DXTY8zk0e7VQnJgHuxkoMEZL2sXdchSamom+E=; b=JOoew057Wt6ylp54/zWLW3lcGtpStEGohXxEFokm/EGet2Dfa28NTCw4u5TN3v41VpZnAV CAIhU6EIv4EsEoxyUqROU92di8Rz0nzLbjZGaasu44wW69zxJ4EVaWYKgH9WFMDad/tzp1 Z1u5yrTs1/ydX9ZY5yX8X+QbaRHm7OQ= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 
bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 9F5FC13358 for ; Wed, 17 May 2023 07:36:01 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id qNUzF+GDZGQkEQAAMHmgww (envelope-from ) for ; Wed, 17 May 2023 07:36:01 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 1/7] btrfs-progs: tune: rework the main idea of csum change
Date: Wed, 17 May 2023 15:35:36 +0800
Message-Id: <800ec0b5107b26309517bd9890275b61a9d15bb1.1684308139.git.wqu@suse.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

The existing attempt at changing csum types works as follows:

- Create a new temporary csum root
- Generate new data csums into the temporary csum root
- Drop the old csum tree and make the temporary one the csum root
- Change the checksums for metadata in-place

Unfortunately, after some experiments, the csum root switch method turned out to have a big pitfall: the backref items in the extent tree. Those backref items still point back to the old tree, meaning that without a lot of extra tricks, the extent tree would be corrupted.

Thus we have to go with a new single-tree variant:

- Generate new data csums into the csum root
  The new data csums would have a different objectid to distinguish them.
- Drop the old data csum items
- Change the key objectids of the new csums
- Change the checksums for metadata in-place

Unfortunately, this means we have to revert most of the old code and update the temporary item format.

The new temporary item only records the target csum type. At every stage we have a method to determine the progress, thus there is no need for an item to track it, but the format is still open for change in the future.
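As a rough illustration of the single-tree variant described above, here is a toy model that keeps the csum keys in a plain array instead of a real btree. This is only a sketch under stated assumptions: toy_key, toy_tree and all toy_* helpers are hypothetical and are not btrfs-progs APIs. Only the two objectid values mirror BTRFS_EXTENT_CSUM_OBJECTID (-10ULL) and the BTRFS_CSUM_CHANGE_OBJECTID (-13ULL) added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the on-disk constants; everything else is a toy model. */
#define TOY_EXTENT_CSUM_OBJECTID ((uint64_t)-10) /* regular data csum items */
#define TOY_CSUM_CHANGE_OBJECTID ((uint64_t)-13) /* new csums during the change */
#define TOY_MAX_ITEMS 16

struct toy_key {
	uint64_t objectid;
	uint64_t offset; /* logical bytenr the csum item starts at */
};

struct toy_tree {
	struct toy_key keys[TOY_MAX_ITEMS];
	size_t nr;
};

/* Count the items that use the given objectid. */
static size_t toy_count(const struct toy_tree *t, uint64_t objectid)
{
	size_t i, n = 0;

	assert(t != NULL);
	for (i = 0; i < t->nr; i++)
		if (t->keys[i].objectid == objectid)
			n++;
	return n;
}

/* Step 1: for every old csum item, add a new one under a different objectid. */
static int toy_generate_new(struct toy_tree *t)
{
	size_t i, old_nr = t->nr;

	for (i = 0; i < old_nr; i++) {
		if (t->keys[i].objectid != TOY_EXTENT_CSUM_OBJECTID)
			continue;
		if (t->nr >= TOY_MAX_ITEMS)
			return -1;
		t->keys[t->nr].objectid = TOY_CSUM_CHANGE_OBJECTID;
		t->keys[t->nr].offset = t->keys[i].offset;
		t->nr++;
	}
	return 0;
}

/* Step 2: drop the old data csum items, keeping everything else in order. */
static void toy_delete_old(struct toy_tree *t)
{
	size_t i, out = 0;

	for (i = 0; i < t->nr; i++)
		if (t->keys[i].objectid != TOY_EXTENT_CSUM_OBJECTID)
			t->keys[out++] = t->keys[i];
	t->nr = out;
}

/* Step 3: give the new items the regular csum objectid. */
static void toy_rekey_new(struct toy_tree *t)
{
	size_t i;

	for (i = 0; i < t->nr; i++)
		if (t->keys[i].objectid == TOY_CSUM_CHANGE_OBJECTID)
			t->keys[i].objectid = TOY_EXTENT_CSUM_OBJECTID;
}
```

Calling toy_generate_new(), toy_delete_old() and toy_rekey_new() in that order leaves the same offsets in the same tree, only backed by the new csums. The in-place metadata csum rewrite is a separate step and is not modelled here.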
Signed-off-by: Qu Wenruo --- kernel-shared/ctree.c | 3 - kernel-shared/ctree.h | 19 +- kernel-shared/disk-io.c | 8 - kernel-shared/file-item.c | 12 - kernel-shared/print-tree.c | 11 +- kernel-shared/uapi/btrfs_tree.h | 1 + tune/change-csum.c | 518 ++------------------------------ tune/main.c | 2 +- tune/tune.h | 3 +- 9 files changed, 34 insertions(+), 543 deletions(-) diff --git a/kernel-shared/ctree.c b/kernel-shared/ctree.c index 782bc6cc80c1..bcf16271d864 100644 --- a/kernel-shared/ctree.c +++ b/kernel-shared/ctree.c @@ -403,9 +403,6 @@ int btrfs_create_root(struct btrfs_trans_handle *trans, fs_info->block_group_root = new_root; break; - case BTRFS_CSUM_TREE_TMP_OBJECTID: - fs_info->csum_tree_tmp = new_root; - break; /* * Essential trees can't be created by this function, yet. * As we expect such skeleton exists, or a lot of functions like diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h index 3b7d98bff469..5d3392ae82a6 100644 --- a/kernel-shared/ctree.h +++ b/kernel-shared/ctree.h @@ -56,9 +56,8 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes) sizeof(struct btrfs_stripe) * (num_stripes - 1); } -/* Temporary flag not on-disk for blocks that have changed csum already */ -#define BTRFS_HEADER_FLAG_CSUM_NEW (1ULL << 16) -#define BTRFS_SUPER_FLAG_CHANGING_CSUM (1ULL << 37) +#define BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM (1ULL << 36) +#define BTRFS_SUPER_FLAG_CHANGING_META_CSUM (1ULL << 37) /* * The fs is undergoing block group tree feature change. 
@@ -306,9 +305,6 @@ struct btrfs_fs_info { /* the log root tree is a directory of all the other log roots */ struct btrfs_root *log_root_tree; - /* When switching csums */ - struct btrfs_root *csum_tree_tmp; - struct cache_tree extent_cache; u64 max_cache_size; u64 cache_size; @@ -365,7 +361,6 @@ struct btrfs_fs_info { unsigned int skip_leaf_item_checks:1; int transaction_aborted; - int force_csum_type; int (*free_extent_hook)(u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid, u64 owner, u64 offset, @@ -670,17 +665,11 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info) * - balance status item (objectid -4) * (BTRFS_BALANCE_OBJECTID, BTRFS_TEMPORARY_ITEM_KEY, 0) * - * - second csum tree for conversion (objecitd + * - second csum tree for conversion (objecitd -13) + * (BTRFS_CSUM_CHANGE_OBJECTID, BTRFS_TEMPORARY_ITEM_KEY, ) */ #define BTRFS_TEMPORARY_ITEM_KEY 248 -/* - * Temporary value - * - * root tree pointer of checksum tree with new checksum type - */ -#define BTRFS_CSUM_TREE_TMP_OBJECTID 13ULL - /* * Obsolete name, see BTRFS_PERSISTENT_ITEM_KEY */ diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c index 5cbfcdd8452c..442d3af8bc01 100644 --- a/kernel-shared/disk-io.c +++ b/kernel-shared/disk-io.c @@ -215,12 +215,6 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info, u16 csum_size = fs_info->csum_size; u16 csum_type = fs_info->csum_type; - if (fs_info->force_csum_type != -1) { - /* printf("CSUM TREE: offset %llu\n", buf->start); */ - csum_type = fs_info->force_csum_type; - csum_size = btrfs_csum_type_size(csum_type); - } - if (verify && fs_info->suppress_check_block_errors) return verify_tree_block_csum_silent(buf, csum_size, csum_type); return csum_tree_block_size(buf, csum_size, verify, csum_type); @@ -475,7 +469,6 @@ int write_tree_block(struct btrfs_trans_handle *trans, if (trans && !btrfs_buffer_uptodate(eb, trans->transid, 0)) BUG(); - btrfs_clear_header_flag(eb, BTRFS_HEADER_FLAG_CSUM_NEW); 
btrfs_set_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN); csum_tree_block(fs_info, eb, 0); @@ -885,7 +878,6 @@ struct btrfs_fs_info *btrfs_new_fs_info(int writable, u64 sb_bytenr) fs_info->metadata_alloc_profile = (u64)-1; fs_info->system_alloc_profile = fs_info->metadata_alloc_profile; fs_info->nr_global_roots = 1; - fs_info->force_csum_type = -1; return fs_info; diff --git a/kernel-shared/file-item.c b/kernel-shared/file-item.c index b372cc5eab54..9b59a4b7a9ae 100644 --- a/kernel-shared/file-item.c +++ b/kernel-shared/file-item.c @@ -142,7 +142,6 @@ btrfs_lookup_csum(struct btrfs_trans_handle *trans, struct btrfs_csum_item *item; struct extent_buffer *leaf; u64 csum_offset = 0; - u16 csum_type = root->fs_info->csum_type; u16 csum_size = root->fs_info->csum_size; int csums_in_item; @@ -154,11 +153,6 @@ btrfs_lookup_csum(struct btrfs_trans_handle *trans, goto fail; leaf = path->nodes[0]; - if (leaf->fs_info->force_csum_type != -1) { - csum_type = root->fs_info->force_csum_type; - csum_size = btrfs_csum_type_size(csum_type); - } - if (ret > 0) { ret = 1; if (path->slots[0] == 0) @@ -208,12 +202,6 @@ int btrfs_csum_file_block(struct btrfs_trans_handle *trans, u16 csum_size = root->fs_info->csum_size; u16 csum_type = root->fs_info->csum_type; - if (root->fs_info->force_csum_type != -1) { - /* printf("CSUM DATA: offset %llu (%d -> %d)\n", bytenr, csum_type, root->fs_info->force_csum_type); */ - csum_type = root->fs_info->force_csum_type; - csum_size = btrfs_csum_type_size(csum_type); - } - path = btrfs_alloc_path(); if (!path) return -ENOMEM; diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c index 2cfd6b950ec5..aaaf58ae2e0f 100644 --- a/kernel-shared/print-tree.c +++ b/kernel-shared/print-tree.c @@ -790,6 +790,9 @@ void print_objectid(FILE *stream, u64 objectid, u8 type) case BTRFS_BLOCK_GROUP_TREE_OBJECTID: fprintf(stream, "BLOCK_GROUP_TREE"); break; + case BTRFS_CSUM_CHANGE_OBJECTID: + fprintf(stream, "CSUM_CHANGE"); + break; case (u64)-1: 
fprintf(stream, "-1"); break; @@ -1142,8 +1145,12 @@ static void print_temporary_item(struct extent_buffer *eb, void *ptr, case BTRFS_BALANCE_OBJECTID: print_balance_item(eb, ptr); break; - case BTRFS_CSUM_TREE_TMP_OBJECTID: - printf("\t\tcsum tree tmp root %llu\n", offset); + case BTRFS_CSUM_CHANGE_OBJECTID: + if (offset < btrfs_get_num_csums()) + printf("\t\ttarget csum type %s (%llu)\n", + btrfs_super_csum_name(offset) ,offset); + else + printf("\t\tunknown csum type %llu\n", offset); break; default: printf("\t\tunknown temporary item objectid %llu\n", objectid); diff --git a/kernel-shared/uapi/btrfs_tree.h b/kernel-shared/uapi/btrfs_tree.h index 5b9f71ab15de..ad555e7055ab 100644 --- a/kernel-shared/uapi/btrfs_tree.h +++ b/kernel-shared/uapi/btrfs_tree.h @@ -106,6 +106,7 @@ */ #define BTRFS_FREE_INO_OBJECTID -12ULL +#define BTRFS_CSUM_CHANGE_OBJECTID -13ULL /* dummy objectid represents multiple objectids */ #define BTRFS_MULTIPLE_OBJECTIDS -255ULL diff --git a/tune/change-csum.c b/tune/change-csum.c index 4531f2190f06..7a9f6351e7fe 100644 --- a/tune/change-csum.c +++ b/tune/change-csum.c @@ -26,510 +26,28 @@ #include "common/internal.h" #include "tune/tune.h" -static int change_tree_csum(struct btrfs_trans_handle *trans, struct btrfs_root *root, - int csum_type) +int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type) { - struct btrfs_fs_info *fs_info = root->fs_info; - struct btrfs_path path; - struct btrfs_key key = {0, 0, 0}; - int ret = 0; - int level; - - btrfs_init_path(&path); - /* No transaction, all in-place */ - ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0); - if (ret < 0) - goto out; - - while (1) { - level = 1; - while (path.nodes[level]) { - /* Caching can make double writes */ - if (!btrfs_header_flag(path.nodes[level], BTRFS_HEADER_FLAG_CSUM_NEW)) { - ret = write_tree_block(NULL, fs_info, path.nodes[level]); - if (ret < 0) - goto out; - btrfs_set_header_flag(path.nodes[level], - BTRFS_HEADER_FLAG_CSUM_NEW); - } - 
level++; - } - ret = write_tree_block(NULL, fs_info, path.nodes[0]); - if (ret < 0) - goto out; - ret = btrfs_next_leaf(root, &path); - if (ret < 0) - goto out; - if (ret > 0) { - ret = 0; - goto out; - } - } -out: - btrfs_release_path(&path); - return ret; -} - -static struct btrfs_csum_item *lookup_tmp_csum(struct btrfs_trans_handle *trans, - struct btrfs_path *path, u64 bytenr, int cow) -{ - int ret; - struct btrfs_fs_info *fs_info = trans->fs_info; - struct btrfs_root *csum_root = fs_info->csum_tree_tmp; - struct btrfs_key file_key; - struct btrfs_key found_key; - struct btrfs_csum_item *item; - struct extent_buffer *leaf; - u64 csum_offset = 0; - u16 csum_type = fs_info->csum_type; - u16 csum_size = fs_info->csum_size; - int csums_in_item; - - file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID; - file_key.offset = bytenr; - file_key.type = BTRFS_EXTENT_CSUM_KEY; - ret = btrfs_search_slot(trans, csum_root, &file_key, path, 0, cow); - if (ret < 0) - goto fail; - leaf = path->nodes[0]; - - if (leaf->fs_info->force_csum_type != -1) { - csum_type = fs_info->force_csum_type; - csum_size = btrfs_csum_type_size(csum_type); - } - - if (ret > 0) { - ret = 1; - if (path->slots[0] == 0) - goto fail; - path->slots[0]--; - btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]); - if (found_key.type != BTRFS_EXTENT_CSUM_KEY) - goto fail; - - csum_offset = (bytenr - found_key.offset) / fs_info->sectorsize; - csums_in_item = btrfs_item_size(leaf, path->slots[0]); - csums_in_item /= csum_size; - - if (csum_offset >= csums_in_item) { - ret = -EFBIG; - goto fail; - } - } - item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_csum_item); - item = (struct btrfs_csum_item *)((unsigned char *)item + - csum_offset * csum_size); - return item; -fail: - if (ret > 0) - ret = -ENOENT; - return ERR_PTR(ret); -} - -#define MAX_CSUM_ITEMS(r, size) ((((BTRFS_LEAF_DATA_SIZE(r->fs_info) - \ - sizeof(struct btrfs_item) * 2) / \ - size) - 1)) - -static int csum_file_block(struct 
btrfs_trans_handle *trans, - struct btrfs_fs_info *fs_info, - u64 alloc_end, u64 bytenr, char *data, size_t len) -{ - struct btrfs_root *csum_root = fs_info->csum_tree_tmp; - int ret = 0; - struct btrfs_key file_key; - struct btrfs_key found_key; - u64 next_offset = (u64)-1; - int found_next = 0; - struct btrfs_path *path; - struct btrfs_csum_item *item; - struct extent_buffer *leaf = NULL; - u64 csum_offset; - u8 csum_result[BTRFS_CSUM_SIZE]; - u32 sectorsize = fs_info->sectorsize; - u32 nritems; - u32 ins_size; - u16 csum_size; - u16 csum_type; - - if (fs_info->force_csum_type != -1) - return -EINVAL; - - csum_type = fs_info->force_csum_type; - csum_size = btrfs_csum_type_size(csum_type); - - path = btrfs_alloc_path(); - if (!path) - return -ENOMEM; - - file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID; - file_key.type = BTRFS_EXTENT_CSUM_KEY; - file_key.offset = bytenr; - - item = lookup_tmp_csum(trans, path, bytenr, 1); - if (!IS_ERR(item)) { - leaf = path->nodes[0]; - ret = 0; - goto found; - } - ret = PTR_ERR(item); - if (ret == -EFBIG) { - u32 item_size; - - /* We found one, but it isn't big enough yet */ - leaf = path->nodes[0]; - item_size = btrfs_item_size(leaf, path->slots[0]); - if ((item_size / csum_size) >= MAX_CSUM_ITEMS(csum_root, csum_size)) { - /* Already at max size, make a new one */ - goto insert; - } - } else { - int slot = path->slots[0] + 1; - - /* We didn't find a csum item, insert one */ - nritems = btrfs_header_nritems(path->nodes[0]); - if (path->slots[0] >= nritems - 1) { - ret = btrfs_next_leaf(csum_root, path); - if (ret == 1) - found_next = 1; - if (ret != 0) - goto insert; - slot = 0; - } - btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot); - if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID || - found_key.type != BTRFS_EXTENT_CSUM_KEY) { - found_next = 1; - goto insert; - } - next_offset = found_key.offset; - found_next = 1; - goto insert; - } + /* Phase 0, check conflicting features. 
*/ /* - * At this point, we know the tree has an item, but it isn't big - * enough yet to put our csum in. Grow it. + * Phase 1, generate new data csums. + * + * The new data csums would have a different key objectid, and there + * will be a temporary item in root tree to indicate the new checksum + * algo. */ - btrfs_release_path(path); - ret = btrfs_search_slot(trans, csum_root, &file_key, path, csum_size, 1); - if (ret < 0) - goto fail; - if (ret == 0) - BUG(); - if (path->slots[0] == 0) - goto insert; - path->slots[0]--; - leaf = path->nodes[0]; - btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]); - csum_offset = (file_key.offset - found_key.offset) / sectorsize; - if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID || - found_key.type != BTRFS_EXTENT_CSUM_KEY || - csum_offset >= MAX_CSUM_ITEMS(csum_root, csum_size)) { - goto insert; - } - if (csum_offset >= btrfs_item_size(leaf, path->slots[0]) / csum_size) { - u32 diff = (csum_offset + 1) * csum_size; - diff = diff - btrfs_item_size(leaf, path->slots[0]); - if (diff != csum_size) - goto insert; - ret = btrfs_extend_item(csum_root, path, diff); - BUG_ON(ret); - goto csum; - } + /* Phase 2, delete the old data csums. 
 */ -insert: - btrfs_release_path(path); - csum_offset = 0; - if (found_next) { - u64 tmp = min(alloc_end, next_offset); - tmp -= file_key.offset; - tmp /= sectorsize; - tmp = max((u64)1, tmp); - tmp = min(tmp, (u64)MAX_CSUM_ITEMS(csum_root, csum_size)); - ins_size = csum_size * tmp; - } else { - ins_size = csum_size; - } - ret = btrfs_insert_empty_item(trans, csum_root, path, &file_key, ins_size); - if (ret < 0) - goto fail; - if (ret != 0) { - WARN_ON(1); - goto fail; - } -csum: - leaf = path->nodes[0]; - item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_csum_item); - ret = 0; - item = (struct btrfs_csum_item *)((unsigned char *)item + - csum_offset * csum_size); -found: - btrfs_csum_data(fs_info, csum_type, (u8 *)data, csum_result, len); - write_extent_buffer(leaf, csum_result, (unsigned long)item, csum_size); - btrfs_mark_buffer_dirty(path->nodes[0]); -fail: - btrfs_free_path(path); - return ret; -} - -static int populate_csum(struct btrfs_trans_handle *trans, - struct btrfs_fs_info *fs_info, char *buf, u64 start, - u64 len) -{ - u64 offset = 0; - u64 sectorsize; - int ret = 0; - - while (offset < len) { - sectorsize = fs_info->sectorsize; - ret = read_data_from_disk(fs_info, buf, start + offset, - &sectorsize, 0); - if (ret) - break; - ret = csum_file_block(trans, fs_info, start + len, start + offset, - buf, sectorsize); - if (ret) - break; - offset += sectorsize; - } - return ret; -} - -static int fill_csum_tree_from_extent(struct btrfs_fs_info *fs_info) -{ - struct btrfs_root *extent_root = btrfs_extent_root(fs_info, 0); - struct btrfs_trans_handle *trans; - struct btrfs_path path; - struct btrfs_extent_item *ei; - struct extent_buffer *leaf; - char *buf; - struct btrfs_key key; - int ret; - - trans = btrfs_start_transaction(extent_root, 1); - if (trans == NULL) { - ret = PTR_ERR(trans); - errno = -ret; - error_msg(ERROR_MSG_START_TRANS, "%m"); - return -EINVAL; - } - - btrfs_init_path(&path); - key.objectid = 0; - key.type = BTRFS_EXTENT_ITEM_KEY; - 
key.offset = 0; - ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0); - if (ret < 0) { - btrfs_release_path(&path); - return ret; - } - - buf = malloc(fs_info->sectorsize); - if (!buf) { - btrfs_release_path(&path); - return -ENOMEM; - } - - while (1) { - if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) { - ret = btrfs_next_leaf(extent_root, &path); - if (ret < 0) - break; - if (ret) { - ret = 0; - break; - } - } - leaf = path.nodes[0]; - - btrfs_item_key_to_cpu(leaf, &key, path.slots[0]); - if (key.type != BTRFS_EXTENT_ITEM_KEY) { - path.slots[0]++; - continue; - } - - ei = btrfs_item_ptr(leaf, path.slots[0], struct btrfs_extent_item); - if (!(btrfs_extent_flags(leaf, ei) & BTRFS_EXTENT_FLAG_DATA)) { - path.slots[0]++; - continue; - } - - ret = populate_csum(trans, fs_info, buf, key.objectid, key.offset); - if (ret) - break; - path.slots[0]++; - } - - btrfs_release_path(&path); - free(buf); - - /* dont' commit if thre's error */ - ret = btrfs_commit_transaction(trans, extent_root); - - return ret; -} - -int rewrite_checksums(struct btrfs_fs_info *fs_info, int csum_type) -{ - struct btrfs_root *root; - struct btrfs_super_block *disk_super; - struct btrfs_trans_handle *trans; - struct btrfs_path path; - struct btrfs_key key; - u64 super_flags; - int ret; - - disk_super = fs_info->super_copy; - super_flags = btrfs_super_flags(disk_super); - - /* FIXME: Sanity checks */ - if (0) { - error("UUID rewrite in progress, cannot change csum"); - return 1; - } - - pr_verbose(LOG_DEFAULT, "Change csum from %s to %s\n", - btrfs_super_csum_name(fs_info->csum_type), - btrfs_super_csum_name(csum_type)); - - fs_info->force_csum_type = csum_type; - root = fs_info->tree_root; - - /* Step 1 sets the in progress flag, no other change to the sb */ - pr_verbose(LOG_DEFAULT, "Set superblock flag CHANGING_CSUM\n"); - trans = btrfs_start_transaction(root, 1); - if (IS_ERR(trans)) { - ret = PTR_ERR(trans); - errno = -ret; - error_msg(ERROR_MSG_START_TRANS, "%m"); - return 
ret; - } - - btrfs_init_path(&path); - key.objectid = BTRFS_CSUM_TREE_TMP_OBJECTID; - key.type = BTRFS_TEMPORARY_ITEM_KEY; - key.offset = 0; - ret = btrfs_search_slot(trans, root, &key, &path, 0, 0); - if (ret < 0) - return ret; - - if (ret == 1) { - struct item { - u64 offset; - u64 generation; - u16 csum_type; - /* - * - generation when last synced - * - must recheck the whole tree anyway in case the fs - * was mounted between and there are some extents missing - */ - } item[1]; - - ret = btrfs_create_root(trans, fs_info, BTRFS_CSUM_TREE_TMP_OBJECTID); - if (ret < 0) { - return ret; - } else { - item->offset = btrfs_header_bytenr(fs_info->csum_tree_tmp->node); - item->generation = btrfs_super_generation(fs_info->super_copy); - item->csum_type = csum_type; - ret = btrfs_insert_item(trans, fs_info->tree_root, &key, item, - sizeof(*item)); - if (ret < 0) - return ret; - } - } else { - error("updating existing tmp csum root not implemented"); - exit(1); - } - - super_flags |= BTRFS_SUPER_FLAG_CHANGING_CSUM; - btrfs_set_super_flags(disk_super, super_flags); - /* Change csum type here */ - btrfs_set_super_csum_type(disk_super, csum_type); - ret = btrfs_commit_transaction(trans, root); - if (ret < 0) - return ret; - btrfs_release_path(&path); - - struct { - struct btrfs_root *root; - const char *name; - u64 objectid; - bool p; - bool g; - } trees[] = { - { .p = true, .root = fs_info->tree_root, .name = "root tree" }, - { .p = true, .root = fs_info->chunk_root, .name = "chunk tree" }, - { .p = true, .root = fs_info->dev_root, .name = "dev tree" }, - { .p = true, .root = fs_info->uuid_root, .name = "uuid tree" }, - { .p = true, .root = fs_info->quota_root, .name = "quota tree" }, - { .p = true, .root = fs_info->block_group_root, .name = "block group tree" }, - { .g = true, .objectid = BTRFS_EXTENT_TREE_OBJECTID, .name = "extent tree" }, - { .g = true, .objectid = BTRFS_CSUM_TREE_OBJECTID, .name = "csum tree" }, - { .g = true, .objectid = BTRFS_FREE_SPACE_TREE_OBJECTID, 
.name = "free space tree" }, - { .p = true, .root = fs_info->csum_tree_tmp, .name = "csum tmp tree" }, - { .objectid = BTRFS_DATA_RELOC_TREE_OBJECTID, .name = "data reloc tree" }, - { .objectid = BTRFS_FS_TREE_OBJECTID, .name = "fs tree" }, - /* TODO: iterate all fs trees */ - /* TODO: crashes if trees not present */ - /* { .objectid = BTRFS_TREE_LOG_OBJECTID, .name = "tree log tree" }, */ - /* { .objectid = BTRFS_TREE_RELOC_OBJECTID, .name = "tree reloc tree" }, */ - /* { .objectid = BTRFS_BLOCK_GROUP_TREE_OBJECTID, .name = "block group tree" }, */ - }; - - for (int i = 0; i < ARRAY_SIZE(trees); i++) { - pr_verbose(LOG_DEFAULT, "Change csum in %s\n", trees[i].name); - if (trees[i].p) { - root = trees[i].root; - if (!root) - continue; - } else if (trees[i].g) { - key.objectid = trees[i].objectid; - key.type = BTRFS_ROOT_ITEM_KEY; - key.offset = 0; - root = btrfs_global_root(fs_info, &key); - if (!root) - continue; - } else { - key.objectid = trees[i].objectid; - key.type = BTRFS_ROOT_ITEM_KEY; - key.offset = (u64)-1; - root = btrfs_read_fs_root_no_cache(fs_info, &key); - if (!root) - continue; - } - ret = change_tree_csum(trans, root, csum_type); - if (ret < 0) { - error("failed to change csum of %s: %d", trees[i].name, ret); - goto out; - } - } - - /* DATA */ - pr_verbose(LOG_DEFAULT, "Change csum of data blocks\n"); - ret = fill_csum_tree_from_extent(fs_info); - if (ret < 0) - goto out; - - /* TODO: sync last status of old csum tree */ - /* TODO: delete old csum tree */ - - /* Last, change csum in super */ - ret = write_all_supers(fs_info); - if (ret < 0) - goto out; - - /* All checksums done, drop the flag, super block csum will get updated */ - pr_verbose(LOG_DEFAULT, "Clear superblock flag CHANGING_CSUM\n"); - super_flags = btrfs_super_flags(fs_info->super_copy); - super_flags &= ~BTRFS_SUPER_FLAG_CHANGING_CSUM; - btrfs_set_super_flags(fs_info->super_copy, super_flags); - btrfs_set_super_csum_type(disk_super, csum_type); - ret = write_all_supers(fs_info); - 
pr_verbose(LOG_DEFAULT, "Checksum change finished\n"); -out: - /* check errors */ - - return ret; + /* Phase 3, change the new csum key objectid */ + + /* + * Phase 4, change the csums for metadata. + * + * This has to be done in-place, as we don't have a good method + * like relocation in progs. + * Thus we have to support reading a tree block with either csum. + */ + return -EOPNOTSUPP; } diff --git a/tune/main.c b/tune/main.c index c3e18df5ed5c..e38c1f6d3729 100644 --- a/tune/main.c +++ b/tune/main.c @@ -373,7 +373,7 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[]) if (csum_type != -1) { /* TODO: check conflicting flags */ pr_verbose(LOG_DEFAULT, "Proceed to switch checksums\n"); - ret = rewrite_checksums(root->fs_info, csum_type); + ret = btrfs_change_csum_type(root->fs_info, csum_type); } if (change_metadata_uuid) { diff --git a/tune/tune.h b/tune/tune.h index 753dc95eb138..0ef249d89eee 100644 --- a/tune/tune.h +++ b/tune/tune.h @@ -32,6 +32,5 @@ int set_metadata_uuid(struct btrfs_root *root, const char *uuid_string); int convert_to_bg_tree(struct btrfs_fs_info *fs_info); int convert_to_extent_tree(struct btrfs_fs_info *fs_info); -int rewrite_checksums(struct btrfs_fs_info *fs_info, int csum_type); - +int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type); #endif From patchwork Wed May 17 07:35:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13244313 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16081C77B7A for ; Wed, 17 May 2023 07:36:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229907AbjEQHgp (ORCPT ); Wed, 17 May 2023 03:36:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59850 "EHLO 
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229925AbjEQHgL (ORCPT ); Wed, 17 May 2023 03:36:11 -0400 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 33111527C for ; Wed, 17 May 2023 00:36:05 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id D2D7B203B0 for ; Wed, 17 May 2023 07:36:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1684308963; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=shT6JAVzDIl9W31VXrwplD+sYZEFYwPLksGTz/UxirY=; b=XB5+UwDxscelAN+of/+wfKo2A7gD8PUaQWkCxU9qe+l53QPrHXXZV9vMh2sGgC+1ldLxHi EQfbw9M0M16W1fDbV9klxUmSJiocq1T73jIxqvOf8A5EkiWXXWNe7SVVWzJHwnFpJvfvRr NX6+wWjS6jOhiPLzfj1aTYk81yzat0M= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 1B59113358 for ; Wed, 17 May 2023 07:36:02 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id OFGsM+KDZGQkEQAAMHmgww (envelope-from ) for ; Wed, 17 May 2023 07:36:02 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/7] btrfs-progs: tune: implement the prerequisite checks for csum change Date: Wed, 17 May 2023 15:35:37 +0800 Message-Id: <7e638e79f7e1d2bce75961626f3aa18ff8a5abc2.1684308139.git.wqu@suse.com> X-Mailer: 
git-send-email 2.40.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The overall idea is to make sure there are no running operations (balance, dev-replace, dirty log) on the fs before the csum change, and to reject half-converted csums for now. Signed-off-by: Qu Wenruo --- tune/change-csum.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) diff --git a/tune/change-csum.c b/tune/change-csum.c index 7a9f6351e7fe..daab70b6eb4a 100644 --- a/tune/change-csum.c +++ b/tune/change-csum.c @@ -26,9 +26,68 @@ #include "common/internal.h" #include "tune/tune.h" +static int check_csum_change_requreiment(struct btrfs_fs_info *fs_info) +{ + struct btrfs_root *tree_root = fs_info->tree_root; + struct btrfs_root *dev_root = fs_info->dev_root; + struct btrfs_path path = { 0 }; + struct btrfs_key key; + int ret; + + if (btrfs_super_log_root(fs_info->super_copy)) { + error("dirty log tree detected, please replay the log or zero it."); + return -EINVAL; + } + if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) { + error("no csum change support for extent-tree-v2 feature yet."); + return -EOPNOTSUPP; + } + if (btrfs_super_flags(fs_info->super_copy) & + (BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM | + BTRFS_SUPER_FLAG_CHANGING_META_CSUM)) { + error("resume from half converted status is not yet supported"); + return -EOPNOTSUPP; + } + key.objectid = BTRFS_BALANCE_OBJECTID; + key.type = BTRFS_TEMPORARY_ITEM_KEY; + key.offset = 0; + ret = btrfs_search_slot(NULL, tree_root, &key, &path, 0, 0); + btrfs_release_path(&path); + if (ret < 0) { + errno = -ret; + error("failed to check the balance status: %m"); + return ret; + } + if (ret == 0) { + error("running balance detected, please finish or cancel it."); + return -EINVAL; + } + + key.objectid = 0; + key.type = BTRFS_DEV_REPLACE_KEY; + key.offset = 0; + ret = btrfs_search_slot(NULL, dev_root, &key, &path, 0, 0); + btrfs_release_path(&path); + if (ret < 0) { + 
errno = -ret; + error("failed to check the dev-replace status: %m"); + return ret; + } + if (ret == 0) { + error("running dev-replace detected, please finish or cancel it."); + return -EINVAL; + } + return 0; +} + int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type) { + int ret; + /* Phase 0, check conflicting features. */ + ret = check_csum_change_requreiment(fs_info); + if (ret < 0) + return ret; /* * Phase 1, generate new data csums. From patchwork Wed May 17 07:35:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13244315 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE204C77B7A for ; Wed, 17 May 2023 07:36:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230041AbjEQHgs (ORCPT ); Wed, 17 May 2023 03:36:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59990 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230052AbjEQHgO (ORCPT ); Wed, 17 May 2023 03:36:14 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ACAC55252 for ; Wed, 17 May 2023 00:36:06 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 15C2D228CC for ; Wed, 17 May 2023 07:36:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1684308965; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: 
mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jRGmQvmBY86v55suzVIa9Wtb2ilZKBf2nyp7093ql4E=; b=ZPAoBNfHY6CgLVE3oEfPp1Q+UlxiZql6bCRJmslC8EQ3Kw+3n5S0SnhLyxOFBBnWsb8Yn+ BlwN1ULTW1sDY/kcnmzTXJzJO7Xgw3BjPDscedMfGB77XDM8orBTm5OE3oYisMvIIK7vuk Rm/TW/imHUwQYwIn0aaAxT4BMBY5UfM= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 50F0913358 for ; Wed, 17 May 2023 07:36:04 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id MIMzBeSDZGQkEQAAMHmgww (envelope-from ) for ; Wed, 17 May 2023 07:36:04 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 3/7] btrfs-progs: tune: add the ability to read and verify the data before generating new checksum Date: Wed, 17 May 2023 15:35:38 +0800 Message-Id: <002f1672a85549a445d36d3fde0d643981efb663.1684308139.git.wqu@suse.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This patch introduces a new helper function, read_verify_one_data_sector(), to read the data and verify its checksum (against the old csum). The data is later re-used to generate the new csum. And since we're introducing the helper function, we also build the skeleton to iterate the data extents using the old csum tree. This method is much better than iterating the extent tree, which has no direct indicator of whether a data extent has csums or not (nodatasum or preallocated). 
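For illustration only (not part of this patch), the mirror fallback described above can be sketched as a standalone fragment. The toy checksum, the 16-byte sector and the back-to-back in-memory mirror layout are all assumptions of this note, standing in for btrfs_csum_data(), the real sectorsize and read_data_from_disk():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy sector size for the sketch; real btrfs sectors are typically 4096 bytes. */
#define SECTOR_SIZE 16

/* Hypothetical stand-in for btrfs_csum_data(), NOT a real btrfs checksum. */
static uint32_t toy_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + data[i];
	return sum;
}

/*
 * @mirrors holds @num_copies sectors back to back (an assumed stand-in for
 * reading each mirror from disk).
 *
 * Try the mirrors in order and copy the first one whose checksum matches
 * @old_csum into @out. Return the 1-based mirror number that was used, or
 * -1 if no mirror matches.
 */
static int read_verify_sector(const uint8_t *mirrors, int num_copies,
			      uint32_t old_csum, uint8_t *out)
{
	assert(mirrors && out);
	for (int mirror = 1; mirror <= num_copies; mirror++) {
		const uint8_t *copy = mirrors + (size_t)(mirror - 1) * SECTOR_SIZE;

		if (toy_csum(copy, SECTOR_SIZE) == old_csum) {
			memcpy(out, copy, SECTOR_SIZE);
			return mirror;
		}
	}
	return -1;
}
```

The buffer returned in @out is the verified copy, which is why the same data can then be fed to the new csum algorithm without a second read.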
Signed-off-by: Qu Wenruo --- tune/change-csum.c | 244 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 244 insertions(+) diff --git a/tune/change-csum.c b/tune/change-csum.c index daab70b6eb4a..9d1b529e9c34 100644 --- a/tune/change-csum.c +++ b/tune/change-csum.c @@ -20,10 +20,12 @@ #include #include "kernel-shared/ctree.h" #include "kernel-shared/disk-io.h" +#include "kernel-shared/volumes.h" #include "kernel-shared/extent_io.h" #include "kernel-shared/transaction.h" #include "common/messages.h" #include "common/internal.h" +#include "common/utils.h" #include "tune/tune.h" static int check_csum_change_requreiment(struct btrfs_fs_info *fs_info) @@ -80,6 +82,242 @@ static int check_csum_change_requreiment(struct btrfs_fs_info *fs_info) return 0; } +static int get_last_csum_bytenr(struct btrfs_fs_info *fs_info, u64 *result) +{ + struct btrfs_root *csum_root = btrfs_csum_root(fs_info, 0); + struct btrfs_path path = { 0 }; + struct btrfs_key key; + int ret; + + key.objectid = BTRFS_EXTENT_CSUM_OBJECTID; + key.type = BTRFS_EXTENT_CSUM_KEY; + key.offset = (u64)-1; + + ret = btrfs_search_slot(NULL, csum_root, &key, &path, 0, 0); + if (ret < 0) + return ret; + assert(ret > 0); + ret = btrfs_previous_item(csum_root, &path, BTRFS_EXTENT_CSUM_OBJECTID, + BTRFS_EXTENT_CSUM_KEY); + if (ret < 0) + return ret; + /* + * Empty csum tree, set last csum byte to 0 so we can skip new data + * csum generation. 
+ */ + if (ret > 0) { + *result = 0; + btrfs_release_path(&path); + return 0; + } + btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]); + *result = key.offset + btrfs_item_size(path.nodes[0], path.slots[0]) / + fs_info->csum_size * fs_info->sectorsize; + btrfs_release_path(&path); + return 0; +} + +static int read_verify_one_data_sector(struct btrfs_fs_info *fs_info, + u64 logical, void *data_buf, + const void *old_csums) +{ + const u32 sectorsize = fs_info->sectorsize; + int num_copies = btrfs_num_copies(fs_info, logical, sectorsize); + bool found_good = false; + + for (int mirror = 1; mirror <= num_copies; mirror++) { + u8 csum_has[BTRFS_CSUM_SIZE]; + u64 readlen = sectorsize; + int ret; + + ret = read_data_from_disk(fs_info, data_buf, logical, &readlen, + mirror); + if (ret < 0) { + errno = -ret; + error("failed to read logical %llu: %m", logical); + continue; + } + btrfs_csum_data(fs_info, fs_info->csum_type, data_buf, csum_has, + sectorsize); + if (memcmp(csum_has, old_csums, fs_info->csum_size) == 0) { + found_good = true; + break; + } else { + char found[BTRFS_CSUM_STRING_LEN]; + char want[BTRFS_CSUM_STRING_LEN]; + + btrfs_format_csum(fs_info->csum_type, old_csums, want); + btrfs_format_csum(fs_info->csum_type, csum_has, found); + error("csum mismatch for logical %llu mirror %u, has %s expected %s", + logical, mirror, found, want); + } + } + if (!found_good) + return -EIO; + return 0; +} + +static int generate_new_csum_range(struct btrfs_trans_handle *trans, + u64 logical, u64 length, u16 new_csum_type, + const void *old_csums) +{ + struct btrfs_fs_info *fs_info = trans->fs_info; + const u32 sectorsize = fs_info->sectorsize; + int ret = 0; + void *buf; + + buf = malloc(fs_info->sectorsize); + if (!buf) + return -ENOMEM; + + for (u64 cur = logical; cur < logical + length; cur += sectorsize) { + ret = read_verify_one_data_sector(fs_info, cur, buf, old_csums + + (cur - logical) / sectorsize * fs_info->csum_size); + + if (ret < 0) { + error("failed to 
recover a good copy for data at logical %llu", + logical); + goto out; + } + /* Calculate new csum and insert it into the csum tree. */ + ret = -EOPNOTSUPP; + } +out: + free(buf); + return ret; +} + +/* + * After reading this many bytes of data, commit the current transaction. + * + * Only a soft cap, we can exceed the threshold if hitting a large enough csum + * item. + */ +#define CSUM_CHANGE_BYTES_THRESHOLD (SZ_2M) +static int generate_new_data_csums(struct btrfs_fs_info *fs_info, u16 new_csum_type) +{ + struct btrfs_root *tree_root = fs_info->tree_root; + struct btrfs_root *csum_root = btrfs_csum_root(fs_info, 0); + struct btrfs_trans_handle *trans; + struct btrfs_path path = { 0 }; + struct btrfs_key key; + const u32 new_csum_size = btrfs_csum_type_size(new_csum_type); + void *csum_buffer; + u64 converted_bytes = 0; + u64 last_csum; + u64 cur = 0; + int ret; + + ret = get_last_csum_bytenr(fs_info, &last_csum); + if (ret < 0) { + errno = -ret; + error("failed to get the last csum item: %m"); + return ret; + } + csum_buffer = malloc(fs_info->nodesize); + if (!csum_buffer) + return -ENOMEM; + + trans = btrfs_start_transaction(tree_root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + errno = -ret; + error("failed to start transaction: %m"); + goto out; + } + key.objectid = BTRFS_CSUM_CHANGE_OBJECTID; + key.type = BTRFS_TEMPORARY_ITEM_KEY; + key.offset = new_csum_type; + ret = btrfs_insert_empty_item(trans, tree_root, &path, &key, 0); + btrfs_release_path(&path); + if (ret < 0) { + errno = -ret; + error("failed to insert csum change item: %m"); + btrfs_abort_transaction(trans, ret); + goto out; + } + btrfs_set_super_flags(fs_info->super_copy, + btrfs_super_flags(fs_info->super_copy) | + BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM); + ret = btrfs_commit_transaction(trans, tree_root); + if (ret < 0) { + errno = -ret; + error("failed to commit the initial transaction: %m"); + goto out; + } + + trans = btrfs_start_transaction(csum_root, + CSUM_CHANGE_BYTES_THRESHOLD / 
fs_info->sectorsize * + new_csum_size); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + errno = -ret; + error("failed to start transaction: %m"); + goto out; + } + + while (cur < last_csum) { + u64 start; + u64 len; + u32 item_size; + + key.objectid = BTRFS_EXTENT_CSUM_OBJECTID; + key.type = BTRFS_EXTENT_CSUM_KEY; + key.offset = cur; + + ret = btrfs_search_slot(NULL, csum_root, &key, &path, 0, 0); + if (ret < 0) + goto out; + if (ret > 0 && path.slots[0] >= + btrfs_header_nritems(path.nodes[0])) { + ret = btrfs_next_leaf(csum_root, &path); + if (ret > 0) { + ret = 0; + btrfs_release_path(&path); + break; + } + if (ret < 0) { + btrfs_release_path(&path); + goto out; + } + } + btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]); + assert(key.offset >= cur); + item_size = btrfs_item_size(path.nodes[0], path.slots[0]); + + start = key.offset; + len = item_size / fs_info->csum_size * fs_info->sectorsize; + read_extent_buffer(path.nodes[0], csum_buffer, + btrfs_item_ptr_offset(path.nodes[0], path.slots[0]), + item_size); + btrfs_release_path(&path); + + ret = generate_new_csum_range(trans, start, len, new_csum_type, + csum_buffer); + if (ret < 0) + goto out; + converted_bytes += len; + if (converted_bytes >= CSUM_CHANGE_BYTES_THRESHOLD) { + converted_bytes = 0; + ret = btrfs_commit_transaction(trans, csum_root); + if (ret < 0) + goto out; + trans = btrfs_start_transaction(csum_root, + CSUM_CHANGE_BYTES_THRESHOLD / + fs_info->sectorsize * new_csum_size); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + goto out; + } + } + cur = start + len; + } + ret = btrfs_commit_transaction(trans, csum_root); +out: + free(csum_buffer); + return ret; +} + int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type) { int ret; @@ -96,6 +334,12 @@ int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type) * will be a temporary item in root tree to indicate the new checksum * algo. 
*/ + ret = generate_new_data_csums(fs_info, new_csum_type); + if (ret < 0) { + errno = -ret; + error("failed to generate new data csums: %m"); + return ret; + } /* Phase 2, delete the old data csums. */ From patchwork Wed May 17 07:35:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13244316 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B1979C77B7A for ; Wed, 17 May 2023 07:36:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229922AbjEQHgx (ORCPT ); Wed, 17 May 2023 03:36:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33428 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230048AbjEQHgT (ORCPT ); Wed, 17 May 2023 03:36:19 -0400 Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A8007526E for ; Wed, 17 May 2023 00:36:07 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 4C9641FF66 for ; Wed, 17 May 2023 07:36:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1684308966; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pmFoY+Py4viPmGF+xDWbAea3mgtjMM1fZ7ep1ABYKxU=; b=nk3hoFydtP1WtUAPMrRQbukg82dFzY4F/tKm366dE47oB/I57YE/9lnSmXnn9MDPHL+v/C 
u3cXnBW/L4AtdPiIeR78TLGsH09MQhSjj9YtfSPPQ8Ij5ByypfxLvcbUiNRXW1He68Z1s+ Izt7Jq/fNVcHwjU7vnBwwBKm+pgWKr0= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 88F8413358 for ; Wed, 17 May 2023 07:36:05 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id UJe/EuWDZGQkEQAAMHmgww (envelope-from ) for ; Wed, 17 May 2023 07:36:05 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 4/7] btrfs-progs: tune: add the ability to generate new data checksums Date: Wed, 17 May 2023 15:35:39 +0800 Message-Id: <03dc355cd17236651c57b4d18210a2a74ee3129e.1684308139.git.wqu@suse.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This patch modifies btrfs_csum_file_block() to handle a csum type other than the one used by the current fs. The new data checksums use a different objectid (-13) to distinguish them from the existing ones (-10). This requires changing tree-checker to accept the new key objectid and to skip the item size checks, since the new csum can be larger than the original csum. After this stage, the resulting csum tree would look like this: item 0 key (CSUM_CHANGE EXTENT_CSUM 13631488) itemoff 8091 itemsize 8192 range start 13631488 end 22020096 length 8388608 item 1 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 7067 itemsize 1024 range start 13631488 end 14680064 length 1048576 Note the itemsize is 8 times the original one, as the original csum is CRC32 while the target csum is SHA256, which is 8 times the size. 
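As a quick sanity check of the sizes quoted above, here is a small sketch of the arithmetic. The helper names are made up for this note and are not btrfs-progs functions; the 4096-byte sectorsize is inferred from item 1, where 1024 bytes of 4-byte csums cover 1048576 bytes of data:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of csum payload needed to cover @data_len bytes of data. */
static uint32_t csum_item_size(uint64_t data_len, uint32_t sectorsize,
			       uint32_t csum_size)
{
	assert(sectorsize && csum_size);
	return (uint32_t)(data_len / sectorsize) * csum_size;
}

/* Bytes of data covered by a csum item holding @item_size bytes of csums. */
static uint64_t csum_item_range(uint32_t item_size, uint32_t sectorsize,
				uint32_t csum_size)
{
	assert(sectorsize && csum_size);
	return (uint64_t)(item_size / csum_size) * sectorsize;
}
```

With these helpers, 1MiB of data needs 1024 bytes of CRC32 csums (4 bytes each) or 8192 bytes of SHA256 csums (32 bytes each). Note that 8192 bytes interpreted with the old 4-byte csum size gives 8388608, which matches the length printed for item 0; presumably the dump still computes the range with the old csum size, in which case both items cover the same 1MiB. That last point is an inference from the numbers, not something the patch states.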
Signed-off-by: Qu Wenruo --- check/mode-common.c | 11 ++++++----- convert/main.c | 12 ++++++------ kernel-shared/file-item.c | 34 ++++++++++++++++++---------------- kernel-shared/file-item.h | 4 ++-- kernel-shared/tree-checker.c | 12 ++++++++---- mkfs/rootdir.c | 11 ++++++----- tune/change-csum.c | 10 +++++++++- 7 files changed, 55 insertions(+), 39 deletions(-) diff --git a/check/mode-common.c b/check/mode-common.c index a38d2afc6b6f..175e90f78bdc 100644 --- a/check/mode-common.c +++ b/check/mode-common.c @@ -1209,18 +1209,19 @@ static int populate_csum(struct btrfs_trans_handle *trans, struct btrfs_root *csum_root, char *buf, u64 start, u64 len) { + struct btrfs_fs_info *fs_info = trans->fs_info; u64 offset = 0; - u64 sectorsize; + u64 sectorsize = fs_info->sectorsize; int ret = 0; while (offset < len) { - sectorsize = gfs_info->sectorsize; - ret = read_data_from_disk(gfs_info, buf, start + offset, + ret = read_data_from_disk(fs_info, buf, start + offset, &sectorsize, 0); if (ret) break; - ret = btrfs_csum_file_block(trans, start + len, start + offset, - buf, sectorsize); + ret = btrfs_csum_file_block(trans, start + offset, + BTRFS_EXTENT_CSUM_OBJECTID, + fs_info->csum_type, buf); if (ret) break; offset += sectorsize; diff --git a/convert/main.c b/convert/main.c index 9781200d7e42..0a62101d7e48 100644 --- a/convert/main.c +++ b/convert/main.c @@ -182,7 +182,8 @@ static int csum_disk_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 disk_bytenr, u64 num_bytes) { - u32 blocksize = root->fs_info->sectorsize; + struct btrfs_fs_info *fs_info = trans->fs_info; + u32 blocksize = fs_info->sectorsize; u64 offset; char *buffer; int ret = 0; @@ -193,7 +194,7 @@ static int csum_disk_extent(struct btrfs_trans_handle *trans, for (offset = 0; offset < num_bytes; offset += blocksize) { u64 read_len = blocksize; - ret = read_data_from_disk(root->fs_info, buffer, + ret = read_data_from_disk(fs_info, buffer, disk_bytenr + offset, &read_len, 0); if (ret) break; @@ 
-203,10 +204,9 @@ static int csum_disk_extent(struct btrfs_trans_handle *trans, ret = -EIO; break; } - ret = btrfs_csum_file_block(trans, - disk_bytenr + num_bytes, - disk_bytenr + offset, - buffer, blocksize); + ret = btrfs_csum_file_block(trans, disk_bytenr + offset, + BTRFS_EXTENT_CSUM_OBJECTID, + fs_info->csum_type, buffer); if (ret) break; } diff --git a/kernel-shared/file-item.c b/kernel-shared/file-item.c index 9b59a4b7a9ae..1a2f5f147328 100644 --- a/kernel-shared/file-item.c +++ b/kernel-shared/file-item.c @@ -134,7 +134,7 @@ static struct btrfs_csum_item * btrfs_lookup_csum(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, - u64 bytenr, int cow) + u64 bytenr, u64 csum_objectid, u16 csum_type, int cow) { int ret; struct btrfs_key file_key; @@ -142,10 +142,10 @@ btrfs_lookup_csum(struct btrfs_trans_handle *trans, struct btrfs_csum_item *item; struct extent_buffer *leaf; u64 csum_offset = 0; - u16 csum_size = root->fs_info->csum_size; + u16 csum_size = btrfs_csum_type_size(csum_type); int csums_in_item; - file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID; + file_key.objectid = csum_objectid; file_key.offset = bytenr; file_key.type = BTRFS_EXTENT_CSUM_KEY; ret = btrfs_search_slot(trans, root, &file_key, path, 0, cow); @@ -159,7 +159,8 @@ btrfs_lookup_csum(struct btrfs_trans_handle *trans, goto fail; path->slots[0]--; btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]); - if (found_key.type != BTRFS_EXTENT_CSUM_KEY) + if (found_key.type != BTRFS_EXTENT_CSUM_KEY || + found_key.objectid != csum_objectid) goto fail; csum_offset = (bytenr - found_key.offset) / @@ -182,10 +183,10 @@ fail: return ERR_PTR(ret); } -int btrfs_csum_file_block(struct btrfs_trans_handle *trans, - u64 alloc_end, u64 bytenr, char *data, size_t len) +int btrfs_csum_file_block(struct btrfs_trans_handle *trans, u64 logical, + u64 csum_objectid, u32 csum_type, const char *data) { - struct btrfs_root *root = btrfs_csum_root(trans->fs_info, bytenr); + struct 
btrfs_root *root = btrfs_csum_root(trans->fs_info, logical); int ret = 0; struct btrfs_key file_key; struct btrfs_key found_key; @@ -199,18 +200,18 @@ int btrfs_csum_file_block(struct btrfs_trans_handle *trans, u32 sectorsize = root->fs_info->sectorsize; u32 nritems; u32 ins_size; - u16 csum_size = root->fs_info->csum_size; - u16 csum_type = root->fs_info->csum_type; + u16 csum_size = btrfs_csum_type_size(csum_type); path = btrfs_alloc_path(); if (!path) return -ENOMEM; - file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID; - file_key.offset = bytenr; + file_key.objectid = csum_objectid; + file_key.offset = logical; file_key.type = BTRFS_EXTENT_CSUM_KEY; - item = btrfs_lookup_csum(trans, root, path, bytenr, 1); + item = btrfs_lookup_csum(trans, root, path, logical, csum_objectid, + csum_type, 1); if (!IS_ERR(item)) { leaf = path->nodes[0]; ret = 0; @@ -241,7 +242,7 @@ int btrfs_csum_file_block(struct btrfs_trans_handle *trans, slot = 0; } btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot); - if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID || + if (found_key.objectid != csum_objectid || found_key.type != BTRFS_EXTENT_CSUM_KEY) { found_next = 1; goto insert; @@ -270,7 +271,7 @@ int btrfs_csum_file_block(struct btrfs_trans_handle *trans, leaf = path->nodes[0]; btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]); csum_offset = (file_key.offset - found_key.offset) / sectorsize; - if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID || + if (found_key.objectid != csum_objectid || found_key.type != BTRFS_EXTENT_CSUM_KEY || csum_offset >= MAX_CSUM_ITEMS(root, csum_size)) { goto insert; @@ -290,7 +291,7 @@ insert: btrfs_release_path(path); csum_offset = 0; if (found_next) { - u64 tmp = min(alloc_end, next_offset); + u64 tmp = min(logical + sectorsize, next_offset); tmp -= file_key.offset; tmp /= sectorsize; tmp = max((u64)1, tmp); @@ -314,7 +315,8 @@ csum: item = (struct btrfs_csum_item *)((unsigned char *)item + csum_offset * csum_size); found: - 
btrfs_csum_data(root->fs_info, csum_type, (u8 *)data, csum_result, len); + btrfs_csum_data(root->fs_info, csum_type, (u8 *)data, csum_result, + sectorsize); write_extent_buffer(leaf, csum_result, (unsigned long)item, csum_size); btrfs_mark_buffer_dirty(path->nodes[0]); diff --git a/kernel-shared/file-item.h b/kernel-shared/file-item.h index 25dfecca3429..efbe5f2093aa 100644 --- a/kernel-shared/file-item.h +++ b/kernel-shared/file-item.h @@ -80,8 +80,8 @@ int btrfs_insert_file_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 objectid, u64 pos, u64 offset, u64 disk_num_bytes, u64 num_bytes); -int btrfs_csum_file_block(struct btrfs_trans_handle *trans, - u64 alloc_end, u64 bytenr, char *data, size_t len); +int btrfs_csum_file_block(struct btrfs_trans_handle *trans, u64 logical, + u64 csum_objectid, u32 csum_type, const char *data); int btrfs_insert_inline_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 objectid, u64 offset, const char *buffer, size_t size); diff --git a/kernel-shared/tree-checker.c b/kernel-shared/tree-checker.c index b28e42821533..8a1675749769 100644 --- a/kernel-shared/tree-checker.c +++ b/kernel-shared/tree-checker.c @@ -367,10 +367,12 @@ static int check_csum_item(struct extent_buffer *leaf, struct btrfs_key *key, u32 sectorsize = fs_info->sectorsize; const u32 csumsize = fs_info->csum_size; - if (unlikely(key->objectid != BTRFS_EXTENT_CSUM_OBJECTID)) { + if (unlikely(key->objectid != BTRFS_EXTENT_CSUM_OBJECTID && + key->objectid != BTRFS_CSUM_CHANGE_OBJECTID)) { generic_err(leaf, slot, - "invalid key objectid for csum item, have %llu expect %llu", - key->objectid, BTRFS_EXTENT_CSUM_OBJECTID); + "invalid key objectid for csum item, have %llu expect %llu or %llu", + key->objectid, BTRFS_EXTENT_CSUM_OBJECTID, + BTRFS_CSUM_CHANGE_OBJECTID); return -EUCLEAN; } if (unlikely(!IS_ALIGNED(key->offset, sectorsize))) { @@ -385,7 +387,9 @@ static int check_csum_item(struct extent_buffer *leaf, struct btrfs_key *key, 
btrfs_item_size(leaf, slot), csumsize); return -EUCLEAN; } - if (slot > 0 && prev_key->type == BTRFS_EXTENT_CSUM_KEY) { + if (slot > 0 && prev_key->type == BTRFS_EXTENT_CSUM_KEY && + !(key->objectid == BTRFS_CSUM_CHANGE_OBJECTID || + prev_key->objectid == BTRFS_CSUM_CHANGE_OBJECTID)) { u64 prev_csum_end; u32 prev_item_size; diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c index 5fd3c6feea5c..4f7feb529998 100644 --- a/mkfs/rootdir.c +++ b/mkfs/rootdir.c @@ -306,12 +306,13 @@ static int add_file_items(struct btrfs_trans_handle *trans, struct btrfs_inode_item *btrfs_inode, u64 objectid, struct stat *st, const char *path_name) { + struct btrfs_fs_info *fs_info = trans->fs_info; int ret = -1; ssize_t ret_read; u64 bytes_read = 0; struct btrfs_key key; int blocks; - u32 sectorsize = root->fs_info->sectorsize; + u32 sectorsize = fs_info->sectorsize; u64 first_block = 0; u64 file_pos = 0; u64 cur_bytes; @@ -332,7 +333,7 @@ static int add_file_items(struct btrfs_trans_handle *trans, if (st->st_size % sectorsize) blocks += 1; - if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) && + if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(fs_info) && st->st_size < sectorsize) { char *buffer = malloc(st->st_size); @@ -397,9 +398,9 @@ again: goto end; } - ret = btrfs_csum_file_block(trans, - first_block + bytes_read + sectorsize, - first_block + bytes_read, buf, sectorsize); + ret = btrfs_csum_file_block(trans, first_block + bytes_read, + BTRFS_EXTENT_CSUM_OBJECTID, + fs_info->csum_type, buf); if (ret) goto end; diff --git a/tune/change-csum.c b/tune/change-csum.c index 9d1b529e9c34..a30d142c1600 100644 --- a/tune/change-csum.c +++ b/tune/change-csum.c @@ -21,6 +21,7 @@ #include "kernel-shared/ctree.h" #include "kernel-shared/disk-io.h" #include "kernel-shared/volumes.h" +#include "kernel-shared/file-item.h" #include "kernel-shared/extent_io.h" #include "kernel-shared/transaction.h" #include "common/messages.h" @@ -180,7 +181,14 @@ static int generate_new_csum_range(struct 
btrfs_trans_handle *trans, goto out; } /* Calculate new csum and insert it into the csum tree. */ - ret = -EOPNOTSUPP; + ret = btrfs_csum_file_block(trans, cur, + BTRFS_CSUM_CHANGE_OBJECTID, new_csum_type, buf); + if (ret < 0) { + errno = -ret; + error("failed to insert new csum for data at logical %llu: %m", + cur); + goto out; + } } out: free(buf); From patchwork Wed May 17 07:35:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13244317 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB832C77B75 for ; Wed, 17 May 2023 07:36:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229978AbjEQHgz (ORCPT ); Wed, 17 May 2023 03:36:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33454 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230016AbjEQHgU (ORCPT ); Wed, 17 May 2023 03:36:20 -0400 Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C16AAE52 for ; Wed, 17 May 2023 00:36:08 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 82370203FF for ; Wed, 17 May 2023 07:36:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1684308967; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=khM5rjNMvZXQdcEyD6GtQ5FUuN2k5B41GbOI6BglZCs=; b=GgUx+XOya2+eQoCrIQU8e1mq3S6pPaKaiSqN8SFVkYFFb30w2FNmWoowrWc6b+TToqdESC EHqL+y8lIF8Y0QPTEwlTgBLO5Ayr0bbF12acObzfLANXL65SVnJzEmsigbiYHxF231OsID os/4IBDVoipMhQ2or+xlvSYhOtUKYkw= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id BDA1B13358 for ; Wed, 17 May 2023 07:36:06 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id IHDhH+aDZGQkEQAAMHmgww (envelope-from ) for ; Wed, 17 May 2023 07:36:06 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 5/7] btrfs-progs: tune: add the ability to delete old data csums Date: Wed, 17 May 2023 15:35:40 +0800 Message-Id: <7a6073cbd38c1cc09adb904c7ba73a7d89407fc9.1684308139.git.wqu@suse.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The new helper function, delete_old_data_csums(), deletes the old data csums while keeping the new ones untouched. Since the new data csums have a key objectid (-13) smaller than that of the old data csums (-10), we can safely delete from the tail of the btree. 
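To make the ordering argument concrete, a rough standalone sketch follows (not part of this patch). The struct and helpers are simplified stand-ins invented for this note; only the objectid values (-10 and -13 as u64) and the EXTENT_CSUM key type (128) are taken from btrfs:

```c
#include <assert.h>
#include <stdint.h>

/* Objectids are u64 on disk, so -10 and -13 wrap to very large values. */
#define EXTENT_CSUM_OBJECTID	((uint64_t)-10)	/* old (regular) data csums */
#define CSUM_CHANGE_OBJECTID	((uint64_t)-13)	/* temporary new data csums */
#define EXTENT_CSUM_KEY		128

/* Simplified stand-in for struct btrfs_key. */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Compare by (objectid, type, offset), all unsigned, like btrfs key order. */
static int key_cmp(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Return the first slot holding an old csum item, or @nr if there is none. */
static int first_old_csum_slot(const struct key *keys, int nr)
{
	assert(nr >= 0);
	for (int slot = 0; slot < nr; slot++)
		if (keys[slot].objectid == EXTENT_CSUM_OBJECTID)
			return slot;
	return nr;
}
```

Because every CSUM_CHANGE key compares lower than every EXTENT_CSUM key, the old items always form the tail of the sorted items, so everything from the first old slot to the end can be dropped in one go.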
Signed-off-by: Qu Wenruo --- tune/change-csum.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 65 insertions(+) diff --git a/tune/change-csum.c b/tune/change-csum.c index a30d142c1600..61368ddf34b9 100644 --- a/tune/change-csum.c +++ b/tune/change-csum.c @@ -326,6 +326,68 @@ out: return ret; } +static int delete_old_data_csums(struct btrfs_fs_info *fs_info) +{ + struct btrfs_root *csum_root = btrfs_csum_root(fs_info, 0); + struct btrfs_trans_handle *trans; + struct btrfs_path path = { 0 }; + struct btrfs_key last_key; + int ret; + + last_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID; + last_key.type = BTRFS_EXTENT_CSUM_KEY; + last_key.offset = (u64)-1; + + trans = btrfs_start_transaction(csum_root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + errno = -ret; + error("failed to start transaction to delete old data csums: %m"); + return ret; + } + while (true) { + int start_slot; + int nr; + + ret = btrfs_search_slot(trans, csum_root, &last_key, &path, -1, 1); + + nr = btrfs_header_nritems(path.nodes[0]); + /* No item left (empty csum tree), exit. */ + if (!nr) + break; + for (start_slot = 0; start_slot < nr; start_slot++) { + struct btrfs_key found_key; + + btrfs_item_key_to_cpu(path.nodes[0], &found_key, start_slot); + /* Break from the for loop, we found the first old csum. */ + if (found_key.objectid == BTRFS_EXTENT_CSUM_OBJECTID) + break; + } + /* No more old csum item detected, exit. */ + if (start_slot == nr) + break; + + /* Delete items starting from @start_slot to the end. 
*/ + ret = btrfs_del_items(trans, csum_root, &path, start_slot, + nr - start_slot); + if (ret < 0) { + errno = -ret; + error("failed to delete items: %m"); + break; + } + btrfs_release_path(&path); + } + btrfs_release_path(&path); + if (ret < 0) + btrfs_abort_transaction(trans, ret); + ret = btrfs_commit_transaction(trans, csum_root); + if (ret < 0) { + errno = -ret; + error("failed to commit transaction after deleting the old data csums: %m"); + } + return ret; +} + int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type) { int ret; @@ -350,6 +412,9 @@ int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type) } /* Phase 2, delete the old data csums. */ + ret = delete_old_data_csums(fs_info); + if (ret < 0) + return ret; /* Phase 3, change the new csum key objectid */ From patchwork Wed May 17 07:35:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13244318 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8B79C77B7A for ; Wed, 17 May 2023 07:37:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230005AbjEQHg7 (ORCPT ); Wed, 17 May 2023 03:36:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33962 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230075AbjEQHgY (ORCPT ); Wed, 17 May 2023 03:36:24 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3684855B5 for ; Wed, 17 May 2023 00:36:10 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 
server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id B6978228CD for ; Wed, 17 May 2023 07:36:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1684308968; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pr7fTcIopwFahE9smHdJfRG2vpk6pQLIeViYTMv3ihI=; b=euTuhw0nvsoI0nXcTaOUPMB4FhXzEAIptteD6YFdteuSe6qHHdnwfcW2aoF/FzVW9O18y8 o521h2uGUlGc2T8l9pBdlNmQAPclWV9Iuf2YGO0TJzSLGMIA0laV+dTgGLEtVqrCcQwuIr 7cylMh2qglkFfDLvOsytU/npXua+wlo= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id F3A8613358 for ; Wed, 17 May 2023 07:36:07 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id UP/iLOeDZGQkEQAAMHmgww (envelope-from ) for ; Wed, 17 May 2023 07:36:07 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 6/7] btrfs-progs: tune: add the ability to migrate the temporary csum items to regular csum items Date: Wed, 17 May 2023 15:35:41 +0800 Message-Id: <4ae50da7c1c0a990e83f07de68417726da8e5312.1684308139.git.wqu@suse.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org At this stage, the csum tree should only contain the temporary csum items (CSUM_CHANGE, EXTENT_CSUM, logical), and no more old csum items. Now we can convert those temporary csum items back to regular csum items by changing their key objectids back to EXTENT_CSUM. 
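One detail worth spelling out: since EXTENT_CSUM (-10) sorts after CSUM_CHANGE (-13), rewriting the keys from the last item towards the first should keep the items ordered at every intermediate step, while going from the first item would not. The following is only a sketch of that reasoning on a plain array, with a simplified key struct and helper names invented for this note; it does not model btrfs_set_item_key_safe() itself:

```c
#include <assert.h>
#include <stdint.h>

#define EXTENT_CSUM_OBJECTID	((uint64_t)-10)
#define CSUM_CHANGE_OBJECTID	((uint64_t)-13)

/* Simplified key; the type is omitted since it is EXTENT_CSUM_KEY for all. */
struct key {
	uint64_t objectid;
	uint64_t offset;
};

/* Return 1 if @keys is strictly increasing by (objectid, offset). */
static int keys_sorted(const struct key *keys, int nr)
{
	for (int i = 1; i < nr; i++) {
		const struct key *prev = &keys[i - 1];
		const struct key *cur = &keys[i];

		if (prev->objectid > cur->objectid ||
		    (prev->objectid == cur->objectid &&
		     prev->offset >= cur->offset))
			return 0;
	}
	return 1;
}

/*
 * Change every objectid to EXTENT_CSUM, one key at a time, either from the
 * tail or from the head. Return 1 if the array stayed sorted after every
 * single step, 0 otherwise.
 */
static int rekey_all(struct key *keys, int nr, int from_tail)
{
	int stayed_sorted = 1;

	assert(nr >= 0);
	for (int n = 0; n < nr; n++) {
		int i = from_tail ? nr - 1 - n : n;

		keys[i].objectid = EXTENT_CSUM_OBJECTID;
		if (!keys_sorted(keys, nr))
			stayed_sorted = 0;
	}
	return stayed_sorted;
}
```

Rewriting from the head would place an EXTENT_CSUM key in front of the remaining CSUM_CHANGE keys after the very first step, which is the situation the tail-first order avoids.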
Signed-off-by: Qu Wenruo --- tune/change-csum.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 86 insertions(+) diff --git a/tune/change-csum.c b/tune/change-csum.c index 61368ddf34b9..167760536336 100644 --- a/tune/change-csum.c +++ b/tune/change-csum.c @@ -388,6 +388,89 @@ static int delete_old_data_csums(struct btrfs_fs_info *fs_info) return ret; } +static int change_csum_objectids(struct btrfs_fs_info *fs_info) +{ + struct btrfs_root *csum_root = btrfs_csum_root(fs_info, 0); + struct btrfs_trans_handle *trans; + struct btrfs_path path = { 0 }; + struct btrfs_key last_key; + u64 super_flags; + int ret = 0; + + last_key.objectid = BTRFS_CSUM_CHANGE_OBJECTID; + last_key.type = BTRFS_EXTENT_CSUM_KEY; + last_key.offset = (u64)-1; + + trans = btrfs_start_transaction(csum_root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + errno = -ret; + error("failed to start transaction to change csum objectids: %m"); + return ret; + } + while (true) { + struct btrfs_key found_key; + int nr; + + ret = btrfs_search_slot(trans, csum_root, &last_key, &path, 0, 1); + if (ret < 0) + goto out; + assert(ret > 0); + + nr = btrfs_header_nritems(path.nodes[0]); + /* No item left (empty csum tree), exit. */ + if (!nr) + goto out; + /* No more temporary csum items, all converted, exit. */ + if (path.slots[0] == 0) + goto out; + + /* All csum items should be new csums. */ + btrfs_item_key_to_cpu(path.nodes[0], &found_key, 0); + assert(found_key.objectid == BTRFS_CSUM_CHANGE_OBJECTID); + + /* + * Start changing the objectids, since EXTENT_CSUM (-10) is + * larger than CSUM_CHANGE (-13), we always change from the tail. 
+		 */
+		for (int i = nr - 1; i >= 0; i--) {
+			btrfs_item_key_to_cpu(path.nodes[0], &found_key, i);
+			found_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
+			path.slots[0] = i;
+			ret = btrfs_set_item_key_safe(csum_root, &path, &found_key);
+			if (ret < 0) {
+				errno = -ret;
+				error("failed to set item key for data csum at logical %llu: %m",
+				      found_key.offset);
+				goto out;
+			}
+		}
+		btrfs_release_path(&path);
+	}
+out:
+	btrfs_release_path(&path);
+	if (ret < 0) {
+		btrfs_abort_transaction(trans, ret);
+		return ret;
+	}
+
+	/*
+	 * All data csum items have been changed to the new type, so we can
+	 * clear the superblock flag for data csum change and move on to the
+	 * metadata csum change phase.
+	 */
+	super_flags = btrfs_super_flags(fs_info->super_copy);
+	super_flags &= ~BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM;
+	super_flags |= BTRFS_SUPER_FLAG_CHANGING_META_CSUM;
+	btrfs_set_super_flags(fs_info->super_copy, super_flags);
+	ret = btrfs_commit_transaction(trans, csum_root);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to commit transaction after changing data csum objectids: %m");
+	}
+	return ret;
+}
+
 int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
 {
 	int ret;
@@ -417,6 +500,9 @@ int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
 		return ret;
 
 	/* Phase 3, change the new csum key objectid */
+	ret = change_csum_objectids(fs_info);
+	if (ret < 0)
+		return ret;
 
 	/*
 	 * Phase 4, change the csums for metadata.
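
A note on the "change from the tail" comment in change_csum_objectids(): the following is a minimal sketch, under simplifying assumptions, of why the rewrite order matters. The sketch_* names and the flat key array are hypothetical and not part of btrfs-progs; only the two objectid values (-10 and -13) are taken from the patch comment. The real code goes through btrfs_set_item_key_safe(), which this model does not reproduce; the sketch only checks that keys stay strictly ascending after every single rewrite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a btrfs key (assumption: only ordering matters). */
struct sketch_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Objectid values as quoted in the patch comment. */
#define SKETCH_EXTENT_CSUM_OBJECTID ((uint64_t)-10)
#define SKETCH_CSUM_CHANGE_OBJECTID ((uint64_t)-13)

/* Compare two keys in (objectid, type, offset) order. */
static int sketch_comp_keys(const struct sketch_key *a,
			    const struct sketch_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* True if keys[0..nr) are strictly ascending. */
static bool sketch_leaf_sorted(const struct sketch_key *keys, size_t nr)
{
	for (size_t i = 1; i < nr; i++)
		if (sketch_comp_keys(&keys[i - 1], &keys[i]) >= 0)
			return false;
	return true;
}

/*
 * Rewrite every objectid to EXTENT_CSUM, one key at a time, either tail
 * first or head first. Returns false as soon as an intermediate state is
 * no longer sorted.
 */
static bool sketch_convert(struct sketch_key *keys, size_t nr, bool from_tail)
{
	for (size_t n = 0; n < nr; n++) {
		size_t i = from_tail ? nr - 1 - n : n;

		keys[i].objectid = SKETCH_EXTENT_CSUM_OBJECTID;
		if (!sketch_leaf_sorted(keys, nr))
			return false;
	}
	return true;
}
```

Called with from_tail set, a leaf of temporary keys converts with every intermediate state still sorted. Head first, the very first rewrite already places a -10 key in front of a -13 key, and the sketch returns false.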
From patchwork Wed May 17 07:35:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13244319 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0216EC7EE2A for ; Wed, 17 May 2023 07:37:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230017AbjEQHhC (ORCPT ); Wed, 17 May 2023 03:37:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59188 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229887AbjEQHg1 (ORCPT ); Wed, 17 May 2023 03:36:27 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 39DAF4C2D for ; Wed, 17 May 2023 00:36:11 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id ECE9E228CC for ; Wed, 17 May 2023 07:36:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1684308969; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5NITmkOFquxmjy/UFc1LE/m3Hlpv+Mnkk6ileNvcmuU=; b=YIB5XbBKSaTEHwAich1IM+58TcwBixgGt9S9dI3s3O+MPUJROPB4K0drKJ9pEaO6Ko6awQ ziVmnf/pCrD/2o/qY9imshQ9+dzbgOTswhQlpgMRd2NWaEMD3g8SoyC5ZUKofu7mnOpWXm gygbYOaIPJCjMNiiaRRNsuRIARu5SD8= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 
(256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 35CF213358 for ; Wed, 17 May 2023 07:36:08 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id QAA1OuiDZGQkEQAAMHmgww (envelope-from ) for ; Wed, 17 May 2023 07:36:08 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 7/7] btrfs-progs: tune: add the ability to change metadata csums
Date: Wed, 17 May 2023 15:35:42 +0800
Message-Id:
X-Mailer: git-send-email 2.40.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

The csum change for metadata works like uuid-change: the csums are
updated in place, without any COW.

During the rewrite, we manually check each tree block against both its
old and its new csum, and only rewrite the csum if the tree block matches
its old csum. A tree block that already matches its new csum needs no
change, and a tree block that matches neither is reported as an error.

When everything is done, update the superblock to reflect the csum type
change.

Signed-off-by: Qu Wenruo
---
 tune/change-csum.c | 143 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 142 insertions(+), 1 deletion(-)

diff --git a/tune/change-csum.c b/tune/change-csum.c
index 167760536336..c8809300a143 100644
--- a/tune/change-csum.c
+++ b/tune/change-csum.c
@@ -471,8 +471,144 @@ out:
 	return ret;
 }
 
+static int rewrite_tree_block_csum(struct btrfs_fs_info *fs_info, u64 logical,
+				   u16 new_csum_type)
+{
+	struct extent_buffer *eb;
+	u8 result_old[BTRFS_CSUM_SIZE];
+	u8 result_new[BTRFS_CSUM_SIZE];
+	int ret;
+
+	eb = alloc_dummy_extent_buffer(fs_info, logical, fs_info->nodesize);
+	if (!eb)
+		return -ENOMEM;
+
+	ret = btrfs_read_extent_buffer(eb, 0, 0, NULL);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to read tree block at logical %llu: %m", logical);
+		goto out;
+	}
+
+	/* Verify the csum first. */
+	btrfs_csum_data(fs_info, fs_info->csum_type, (u8 *)eb->data + BTRFS_CSUM_SIZE,
+			result_old, fs_info->nodesize - BTRFS_CSUM_SIZE);
+	btrfs_csum_data(fs_info, new_csum_type, (u8 *)eb->data + BTRFS_CSUM_SIZE,
+			result_new, fs_info->nodesize - BTRFS_CSUM_SIZE);
+
+	/* Matches old csum, rewrite. */
+	if (memcmp_extent_buffer(eb, result_old, 0, fs_info->csum_size) == 0) {
+		write_extent_buffer(eb, result_new, 0,
+				    btrfs_csum_type_size(new_csum_type));
+		ret = write_data_to_disk(fs_info, eb->data, eb->start,
+					 fs_info->nodesize);
+		if (ret < 0) {
+			errno = -ret;
+			error("failed to write tree block at logical %llu: %m",
+			      logical);
+		}
+		goto out;
+	}
+
+	/* Already new csum, compare using the size of the new csum type. */
+	if (memcmp_extent_buffer(eb, result_new, 0, btrfs_csum_type_size(new_csum_type)) == 0)
+		goto out;
+
+	/* Csum doesn't match either old or new csum type, bad tree block. */
+	ret = -EIO;
+	error("tree block csum mismatch at logical %llu", logical);
+out:
+	free_extent_buffer(eb);
+	return ret;
+}
+
+static int change_meta_csums(struct btrfs_fs_info *fs_info, u16 new_csum_type)
+{
+	struct btrfs_root *extent_root = btrfs_extent_root(fs_info, 0);
+	struct btrfs_path path = { 0 };
+	struct btrfs_key key;
+	int ret;
+
+	/*
+	 * Disable metadata csum checks first, as we may hit tree blocks with
+	 * either old or new csums.
+	 * We will manually check the meta csums here.
+	 */
+	fs_info->skip_csum_check = true;
+
+	key.objectid = 0;
+	key.type = 0;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to get the first tree block of extent tree: %m");
+		return ret;
+	}
+	assert(ret > 0);
+	while (true) {
+		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+		if (key.type != BTRFS_EXTENT_ITEM_KEY &&
+		    key.type != BTRFS_METADATA_ITEM_KEY)
+			goto next;
+
+		if (key.type == BTRFS_EXTENT_ITEM_KEY) {
+			struct btrfs_extent_item *ei;
+			ei = btrfs_item_ptr(path.nodes[0], path.slots[0],
+					    struct btrfs_extent_item);
+			if (btrfs_extent_flags(path.nodes[0], ei) &
+			    BTRFS_EXTENT_FLAG_DATA)
+				goto next;
+		}
+		ret = rewrite_tree_block_csum(fs_info, key.objectid, new_csum_type);
+		if (ret < 0) {
+			errno = -ret;
+			error("failed to rewrite csum for tree block %llu: %m",
+			      key.objectid);
+			goto out;
+		}
+next:
+		ret = btrfs_next_extent_item(extent_root, &path, U64_MAX);
+		if (ret < 0) {
+			errno = -ret;
+			error("failed to get next extent item: %m");
+		}
+		if (ret) {
+			ret = ret < 0 ? ret : 0;
+			goto out;
+		}
+	}
+out:
+	btrfs_release_path(&path);
+
+	/*
+	 * Finish the change by clearing the csum change flags and updating the
+	 * superblock csum type.
+	 */
+	if (ret == 0) {
+		u64 super_flags = btrfs_super_flags(fs_info->super_copy);
+
+		btrfs_set_super_csum_type(fs_info->super_copy, new_csum_type);
+		super_flags &= ~(BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM |
+				 BTRFS_SUPER_FLAG_CHANGING_META_CSUM);
+		btrfs_set_super_flags(fs_info->super_copy, super_flags);
+
+		fs_info->csum_type = new_csum_type;
+		fs_info->csum_size = btrfs_csum_type_size(new_csum_type);
+
+		ret = write_all_supers(fs_info);
+		if (ret < 0) {
+			errno = -ret;
+			error("failed to write super blocks: %m");
+		}
+	}
+	return ret;
+}
+
 int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
 {
+	u16 old_csum_type = fs_info->csum_type;
 	int ret;
 
 	/* Phase 0, check conflicting features. */
@@ -511,5 +647,10 @@ int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
 	 * like relocation in progs.
 	 * Thus we have to support reading a tree block with either csum.
 	 */
-	return -EOPNOTSUPP;
+	ret = change_meta_csums(fs_info, new_csum_type);
+	if (ret == 0)
+		printf("converted csum type from %s (%u) to %s (%u)\n",
+		       btrfs_super_csum_name(old_csum_type), old_csum_type,
+		       btrfs_super_csum_name(new_csum_type), new_csum_type);
+	return ret;
 }
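
The per-block decision described in the commit message (old csum: rewrite, new csum: leave alone, neither: error) can be looked at in isolation. What follows is only a sketch under loud assumptions: every sketch_* name is hypothetical, the checksum is a toy function and not any algorithm btrfs supports, the two "types" with 4-byte and 8-byte results are made up, and the tree block is modelled as a flat byte array with a fixed-size csum field at the front. Each comparison uses the size of the csum type being compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CSUM_SIZE 32	/* Reserved csum field at the start of a block. */
#define SKETCH_BLOCK_SIZE 64	/* Toy block size, far smaller than a real node. */

enum sketch_result {
	SKETCH_REWRITTEN,
	SKETCH_ALREADY_NEW,
	SKETCH_BAD_BLOCK,
};

/* Hypothetical csum types: type 0 yields 4 bytes, any other type 8 bytes. */
static size_t sketch_csum_size(int type)
{
	return type == 0 ? 4 : 8;
}

/* Toy checksum (assumption, not a btrfs algorithm); output is zero padded. */
static void sketch_csum(int type, const uint8_t *data, size_t len, uint8_t *out)
{
	uint64_t h = type == 0 ? 17 : 0x9e3779b97f4a7c15ULL;

	memset(out, 0, SKETCH_CSUM_SIZE);
	for (size_t i = 0; i < len; i++)
		h = h * 31 + data[i];
	for (size_t i = 0; i < sketch_csum_size(type); i++)
		out[i] = (uint8_t)(h >> (8 * i));
}

/* Decide what to do with one block; rewrites the csum field in place. */
static enum sketch_result sketch_rewrite_csum(uint8_t *block, int old_type,
					      int new_type)
{
	uint8_t result_old[SKETCH_CSUM_SIZE];
	uint8_t result_new[SKETCH_CSUM_SIZE];
	const uint8_t *payload = block + SKETCH_CSUM_SIZE;
	size_t payload_len = SKETCH_BLOCK_SIZE - SKETCH_CSUM_SIZE;

	sketch_csum(old_type, payload, payload_len, result_old);
	sketch_csum(new_type, payload, payload_len, result_new);

	/* Matches the old csum: overwrite with the new one. */
	if (memcmp(block, result_old, sketch_csum_size(old_type)) == 0) {
		memcpy(block, result_new, sketch_csum_size(new_type));
		return SKETCH_REWRITTEN;
	}
	/* Already carries the new csum: nothing to do. */
	if (memcmp(block, result_new, sketch_csum_size(new_type)) == 0)
		return SKETCH_ALREADY_NEW;
	/* Matches neither csum: treat the block as bad. */
	return SKETCH_BAD_BLOCK;
}
```

In this model, calling sketch_rewrite_csum() twice on the same block is harmless: the second call sees the new csum and leaves the block alone, which is the property that lets an interrupted pass be run again from the start.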