From patchwork Wed May 17 07:35:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13244315 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE204C77B7A for ; Wed, 17 May 2023 07:36:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230041AbjEQHgs (ORCPT ); Wed, 17 May 2023 03:36:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59990 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230052AbjEQHgO (ORCPT ); Wed, 17 May 2023 03:36:14 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ACAC55252 for ; Wed, 17 May 2023 00:36:06 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 15C2D228CC for ; Wed, 17 May 2023 07:36:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1684308965; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jRGmQvmBY86v55suzVIa9Wtb2ilZKBf2nyp7093ql4E=; b=ZPAoBNfHY6CgLVE3oEfPp1Q+UlxiZql6bCRJmslC8EQ3Kw+3n5S0SnhLyxOFBBnWsb8Yn+ BlwN1ULTW1sDY/kcnmzTXJzJO7Xgw3BjPDscedMfGB77XDM8orBTm5OE3oYisMvIIK7vuk Rm/TW/imHUwQYwIn0aaAxT4BMBY5UfM= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 50F0913358 for ; Wed, 17 May 2023 07:36:04 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id MIMzBeSDZGQkEQAAMHmgww (envelope-from ) for ; Wed, 17 May 2023 07:36:04 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 3/7] btrfs-progs: tune: add the ability to read and verify the data before generating new checksum Date: Wed, 17 May 2023 15:35:38 +0800 Message-Id: <002f1672a85549a445d36d3fde0d643981efb663.1684308139.git.wqu@suse.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This patch introduces a new helper function, read_verify_one_data_sector(), to do the data read and checksum verification (against the old csum). This data would be later re-used to generate a new csum. And since we're introduce the helper function, we also build the skeleton to iterate the data extents using the old csum tree. This method is much better compared to iterating using extent tree, which has no directly indicator on whether the data extent has csum or not (nodatasum or preallocated). Signed-off-by: Qu Wenruo --- tune/change-csum.c | 244 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 244 insertions(+) diff --git a/tune/change-csum.c b/tune/change-csum.c index daab70b6eb4a..9d1b529e9c34 100644 --- a/tune/change-csum.c +++ b/tune/change-csum.c @@ -20,10 +20,12 @@ #include #include "kernel-shared/ctree.h" #include "kernel-shared/disk-io.h" +#include "kernel-shared/volumes.h" #include "kernel-shared/extent_io.h" #include "kernel-shared/transaction.h" #include "common/messages.h" #include "common/internal.h" +#include "common/utils.h" #include "tune/tune.h" static int check_csum_change_requreiment(struct btrfs_fs_info *fs_info) @@ -80,6 +82,242 @@ static int check_csum_change_requreiment(struct btrfs_fs_info *fs_info) return 0; } +static int get_last_csum_bytenr(struct btrfs_fs_info *fs_info, u64 *result) +{ + struct btrfs_root *csum_root = btrfs_csum_root(fs_info, 0); + struct btrfs_path path = { 0 }; + struct btrfs_key key; + int ret; + + key.objectid = BTRFS_EXTENT_CSUM_OBJECTID; + key.type = BTRFS_EXTENT_CSUM_KEY; + key.offset = (u64)-1; + + ret = btrfs_search_slot(NULL, csum_root, &key, &path, 0, 0); + if (ret < 0) + return ret; + assert(ret > 0); + ret = btrfs_previous_item(csum_root, &path, BTRFS_EXTENT_CSUM_OBJECTID, + BTRFS_EXTENT_CSUM_KEY); + if (ret < 0) + return ret; + /* + * Emptry csum tree, set last csum byte to 0 so we can skip new data + * csum generation. + */ + if (ret > 0) { + *result = 0; + btrfs_release_path(&path); + return 0; + } + btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]); + *result = key.offset + btrfs_item_size(path.nodes[0], path.slots[0]) / + fs_info->csum_size * fs_info->sectorsize; + btrfs_release_path(&path); + return 0; +} + +static int read_verify_one_data_sector(struct btrfs_fs_info *fs_info, + u64 logical, void *data_buf, + const void *old_csums) +{ + const u32 sectorsize = fs_info->sectorsize; + int num_copies = btrfs_num_copies(fs_info, logical, sectorsize); + bool found_good = false; + + for (int mirror = 1; mirror <= num_copies; mirror++) { + u8 csum_has[BTRFS_CSUM_SIZE]; + u64 readlen = sectorsize; + int ret; + + ret = read_data_from_disk(fs_info, data_buf, logical, &readlen, + mirror); + if (ret < 0) { + errno = -ret; + error("failed to read logical %llu: %m", logical); + continue; + } + btrfs_csum_data(fs_info, fs_info->csum_type, data_buf, csum_has, + sectorsize); + if (memcmp(csum_has, old_csums, fs_info->csum_size) == 0) { + found_good = true; + break; + } else { + char found[BTRFS_CSUM_STRING_LEN]; + char want[BTRFS_CSUM_STRING_LEN]; + + btrfs_format_csum(fs_info->csum_type, old_csums, want); + btrfs_format_csum(fs_info->csum_type, csum_has, found); + error("csum mismatch for logical %llu mirror %u, has %s expected %s", + logical, mirror, found, want); + } + } + if (!found_good) + return -EIO; + return 0; +} + +static int generate_new_csum_range(struct btrfs_trans_handle *trans, + u64 logical, u64 length, u16 new_csum_type, + const void *old_csums) +{ + struct btrfs_fs_info *fs_info = trans->fs_info; + const u32 sectorsize = fs_info->sectorsize; + int ret = 0; + void *buf; + + buf = malloc(fs_info->sectorsize); + if (!buf) + return -ENOMEM; + + for (u64 cur = logical; cur < logical + length; cur += sectorsize) { + ret = read_verify_one_data_sector(fs_info, cur, buf, old_csums + + (cur - logical) / sectorsize * fs_info->csum_size); + + if (ret < 0) { + error("failed to recover a good copy for data at logical %llu", + logical); + goto out; + } + /* Calculate new csum and insert it into the csum tree. */ + ret = -EOPNOTSUPP; + } +out: + free(buf); + return ret; +} + +/* + * After reading this many bytes of data, commit the current transaction. + * + * Only a soft cap, we can exceed the threshold if hitting a large enough csum + * item. + */ +#define CSUM_CHANGE_BYTES_THRESHOLD (SZ_2M) +static int generate_new_data_csums(struct btrfs_fs_info *fs_info, u16 new_csum_type) +{ + struct btrfs_root *tree_root = fs_info->tree_root; + struct btrfs_root *csum_root = btrfs_csum_root(fs_info, 0); + struct btrfs_trans_handle *trans; + struct btrfs_path path = { 0 }; + struct btrfs_key key; + const u32 new_csum_size = btrfs_csum_type_size(new_csum_type); + void *csum_buffer; + u64 converted_bytes = 0; + u64 last_csum; + u64 cur = 0; + int ret; + + ret = get_last_csum_bytenr(fs_info, &last_csum); + if (ret < 0) { + errno = -ret; + error("failed to get the last csum item: %m"); + return ret; + } + csum_buffer = malloc(fs_info->nodesize); + if (!csum_buffer) + return -ENOMEM; + + trans = btrfs_start_transaction(tree_root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + errno = -ret; + error("failed to start transaction: %m"); + goto out; + } + key.objectid = BTRFS_CSUM_CHANGE_OBJECTID; + key.type = BTRFS_TEMPORARY_ITEM_KEY; + key.offset = new_csum_type; + ret = btrfs_insert_empty_item(trans, tree_root, &path, &key, 0); + btrfs_release_path(&path); + if (ret < 0) { + errno = -ret; + error("failed to insert csum change item: %m"); + btrfs_abort_transaction(trans, ret); + goto out; + } + btrfs_set_super_flags(fs_info->super_copy, + btrfs_super_flags(fs_info->super_copy) | + BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM); + ret = btrfs_commit_transaction(trans, tree_root); + if (ret < 0) { + errno = -ret; + error("failed to commit the initial transaction: %m"); + goto out; + } + + trans = btrfs_start_transaction(csum_root, + CSUM_CHANGE_BYTES_THRESHOLD / fs_info->sectorsize * + new_csum_size); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + errno = -ret; + error("failed to start transaction: %m"); + return ret; + } + + while (cur < last_csum) { + u64 start; + u64 len; + u32 item_size; + + key.objectid = BTRFS_EXTENT_CSUM_OBJECTID; + key.type = BTRFS_EXTENT_CSUM_KEY; + key.offset = cur; + + ret = btrfs_search_slot(NULL, csum_root, &key, &path, 0, 0); + if (ret < 0) + goto out; + if (ret > 0 && path.slots[0] >= + btrfs_header_nritems(path.nodes[0])) { + ret = btrfs_next_leaf(csum_root, &path); + if (ret > 0) { + ret = 0; + btrfs_release_path(&path); + break; + } + if (ret < 0) { + btrfs_release_path(&path); + goto out; + } + } + btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]); + assert(key.offset >= cur); + item_size = btrfs_item_size(path.nodes[0], path.slots[0]); + + start = key.offset; + len = item_size / fs_info->csum_size * fs_info->sectorsize; + read_extent_buffer(path.nodes[0], csum_buffer, + btrfs_item_ptr_offset(path.nodes[0], path.slots[0]), + item_size); + btrfs_release_path(&path); + + ret = generate_new_csum_range(trans, start, len, new_csum_type, + csum_buffer); + if (ret < 0) + goto out; + converted_bytes += len; + if (converted_bytes >= CSUM_CHANGE_BYTES_THRESHOLD) { + converted_bytes = 0; + ret = btrfs_commit_transaction(trans, csum_root); + if (ret < 0) + goto out; + trans = btrfs_start_transaction(csum_root, + CSUM_CHANGE_BYTES_THRESHOLD / + fs_info->sectorsize * new_csum_size); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + goto out; + } + } + cur = start + len; + } + ret = btrfs_commit_transaction(trans, csum_root); +out: + free(csum_buffer); + return ret; +} + int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type) { int ret; @@ -96,6 +334,12 @@ int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type) * will be a temporary item in root tree to indicate the new checksum * algo. */ + ret = generate_new_data_csums(fs_info, new_csum_type); + if (ret < 0) { + errno = -ret; + error("failed to generate new data csums: %m"); + return ret; + } /* Phase 2, delete the old data csums. */