From patchwork Tue Aug 24 07:41:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12454113 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4D18C4338F for ; Tue, 24 Aug 2021 07:42:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C8AAD60F91 for ; Tue, 24 Aug 2021 07:42:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234956AbhHXHnO (ORCPT ); Tue, 24 Aug 2021 03:43:14 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:55624 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235459AbhHXHmD (ORCPT ); Tue, 24 Aug 2021 03:42:03 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id C695A20043 for ; Tue, 24 Aug 2021 07:41:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1629790874; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=H/wTJErevrgfWxMf/w02OpN73ncnwnyzl2SITHsxYtE=; 
	b=G9h4XhDMoOvhGn3rvZPY5Me/uItGvpCQOOsUJOspYKXne/LgsGKStwOe5+GtKcCxOcLOcl
	rjF3ee+RrxiK/IGEYG+z7cZuUNFB7L+HFkXSDCPSgo8OnIZ2vEdnZzAaqzYOf+BgMFSXtJ
	aN1/4NC4biIV+RfoD/4DJH81LfO+SMk=
Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
	(No client certificate requested)
	by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id C353013942
	for ; Tue, 24 Aug 2021 07:41:13 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
	by imap1.suse-dmz.suse.de with ESMTPSA
	id YJ8gHJmiJGF8bwAAGKfGzw
	(envelope-from )
	for ; Tue, 24 Aug 2021 07:41:13 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v7 1/4] btrfs-progs: image: introduce framework for more dump versions
Date: Tue, 24 Aug 2021 15:41:05 +0800
Message-Id: <20210824074108.44759-2-wqu@suse.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210824074108.44759-1-wqu@suse.com>
References: <20210824074108.44759-1-wqu@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

The original dump format only contains a @magic member to verify the
format. This means that if we want to introduce a new on-disk format or
change a certain size limit, we can only do so by introducing a new
magic number as the version.

This patch introduces the framework to allow multiple magic numbers to
co-exist for further features.

This patch introduces the following members for each dump version:

- max_pending_size
  The threshold size for a cluster. It's a soft limit, not a hard one.
  One cluster can go larger than max_pending_size for one item, but the
  next item will go to the next cluster.

- magic_cpu
  The magic number in CPU endian.

- extra_sb_flags
  Whether the super block of this restore needs extra super block flags
  like BTRFS_SUPER_FLAG_METADUMP_V2.
  For the incoming data dump feature, we don't need any extra super
  block flags.

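The per-version members above can be illustrated with a minimal,
self-contained sketch. This is not the code from this patch: the struct
only mirrors the members listed above, find_version() and
le64_from_bytes() are hypothetical helper names, and the two table
entries reuse the values that appear in patches 1/4 and 2/4 of this
series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the members described above; for illustration only. */
struct dump_version {
	uint64_t magic_cpu;		/* magic number in CPU endian */
	int version;
	int max_pending_size;		/* soft threshold for one cluster */
	unsigned int extra_sb_flags:1;	/* restore sets a METADUMP flag */
};

static const struct dump_version versions[] = {
	{ .magic_cpu = 0xbd5c25e27295668bULL, .version = 0,
	  .max_pending_size = 256 * 1024, .extra_sb_flags = 1 },
	{ .magic_cpu = 0x31765f506d55445fULL, .version = 1,
	  .max_pending_size = 256 * 1024 * 1024, .extra_sb_flags = 0 },
};

/* Hypothetical helper: decode a little-endian u64 on any host. */
static uint64_t le64_from_bytes(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical helper: match the first 8 bytes of the first cluster
 * header against every known magic. Returns NULL for unknown formats.
 */
static const struct dump_version *find_version(const unsigned char *header)
{
	uint64_t magic;
	size_t i;

	assert(header != NULL);
	magic = le64_from_bytes(header);
	for (i = 0; i < sizeof(versions) / sizeof(versions[0]); i++) {
		if (versions[i].magic_cpu == magic)
			return &versions[i];
	}
	return NULL;
}
```

Because only the first cluster is inspected, a lookup of this shape
assumes every cluster in the image carries the same magic.
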
This change also implies that all image dumps will use the same magic for all clusters. No mixing is allowed, as we will use the first cluster to determine the dump version. Signed-off-by: Qu Wenruo --- image/main.c | 72 ++++++++++++++++++++++++++++++++++++++++++------ image/metadump.h | 12 ++++++-- 2 files changed, 73 insertions(+), 11 deletions(-) diff --git a/image/main.c b/image/main.c index b29e68f80863..65a42ad7d85d 100644 --- a/image/main.c +++ b/image/main.c @@ -45,6 +45,19 @@ #define MAX_WORKER_THREADS (32) +const struct dump_version dump_versions[] = { + /* + * The original format, which only supports tree blocks and + * free space cache dump. + */ + { .version = 0, + .max_pending_size = SZ_256K, + .magic_cpu = 0xbd5c25e27295668bULL, + .extra_sb_flags = 1 } +}; + +const struct dump_version *current_version = &dump_versions[0]; + struct async_work { struct list_head list; struct list_head ordered; @@ -406,7 +419,7 @@ static void meta_cluster_init(struct metadump_struct *md, u64 start) md->num_items = 0; md->num_ready = 0; header = &md->cluster.header; - header->magic = cpu_to_le64(HEADER_MAGIC); + header->magic = cpu_to_le64(current_version->magic_cpu); header->bytenr = cpu_to_le64(start); header->nritems = cpu_to_le32(0); header->compress = md->compress_level > 0 ? 
@@ -718,7 +731,7 @@ static int add_extent(u64 start, u64 size, struct metadump_struct *md, { int ret; if (md->data != data || - md->pending_size + size > MAX_PENDING_SIZE || + md->pending_size + size > current_version->max_pending_size || md->pending_start + md->pending_size != start) { ret = flush_pending(md, 0); if (ret) @@ -1047,7 +1060,8 @@ static void update_super_old(u8 *buffer) u32 sectorsize = btrfs_super_sectorsize(super); u64 flags = btrfs_super_flags(super); - flags |= BTRFS_SUPER_FLAG_METADUMP; + if (current_version->extra_sb_flags) + flags |= BTRFS_SUPER_FLAG_METADUMP; btrfs_set_super_flags(super, flags); key = (struct btrfs_disk_key *)(super->sys_chunk_array); @@ -1147,7 +1161,8 @@ finish: if (mdres->clear_space_cache) btrfs_set_super_cache_generation(super, 0); - flags |= BTRFS_SUPER_FLAG_METADUMP_V2; + if (current_version->extra_sb_flags) + flags |= BTRFS_SUPER_FLAG_METADUMP_V2; btrfs_set_super_flags(super, flags); btrfs_set_super_sys_array_size(super, new_array_size); btrfs_set_super_num_devices(super, 1); @@ -1337,7 +1352,7 @@ static void *restore_worker(void *data) u8 *outbuf; int outfd; int ret; - int compress_size = MAX_PENDING_SIZE * 4; + int compress_size = current_version->max_pending_size * 4; outfd = fileno(mdres->out); buffer = malloc(compress_size); @@ -1490,6 +1505,42 @@ static void mdrestore_destroy(struct mdrestore_struct *mdres, int num_threads) free(mdres->original_super); } +static int detect_version(FILE *in) +{ + struct meta_cluster *cluster; + u8 buf[BLOCK_SIZE]; + bool found = false; + int i; + int ret; + + if (fseek(in, 0, SEEK_SET) < 0) { + error("seek failed: %m"); + return -errno; + } + ret = fread(buf, BLOCK_SIZE, 1, in); + if (!ret) { + error("failed to read header"); + return -EIO; + } + + fseek(in, 0, SEEK_SET); + cluster = (struct meta_cluster *)buf; + for (i = 0; i < ARRAY_SIZE(dump_versions); i++) { + if (le64_to_cpu(cluster->header.magic) == + dump_versions[i].magic_cpu) { + found = true; + current_version = 
&dump_versions[i]; + break; + } + } + + if (!found) { + error("unrecognized header format"); + return -EINVAL; + } + return 0; +} + static int mdrestore_init(struct mdrestore_struct *mdres, FILE *in, FILE *out, int old_restore, int num_threads, int fixup_offset, @@ -1497,6 +1548,9 @@ static int mdrestore_init(struct mdrestore_struct *mdres, { int i, ret = 0; + ret = detect_version(in); + if (ret < 0) + return ret; memset(mdres, 0, sizeof(*mdres)); pthread_cond_init(&mdres->cond, NULL); pthread_mutex_init(&mdres->mutex, NULL); @@ -1850,7 +1904,7 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres) u64 current_cluster = 0, bytenr; u64 item_bytenr; u32 bufsize, nritems, i; - u32 max_size = MAX_PENDING_SIZE * 2; + u32 max_size = current_version->max_pending_size * 2; u8 *buffer, *tmp = NULL; int ret = 0; @@ -1903,7 +1957,7 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres) ret = 0; header = &cluster->header; - if (le64_to_cpu(header->magic) != HEADER_MAGIC || + if (le64_to_cpu(header->magic) != current_version->magic_cpu || le64_to_cpu(header->bytenr) != current_cluster) { error("bad header in metadump image"); ret = -EIO; @@ -2102,7 +2156,7 @@ static int build_chunk_tree(struct mdrestore_struct *mdres, ret = 0; header = &cluster->header; - if (le64_to_cpu(header->magic) != HEADER_MAGIC || + if (le64_to_cpu(header->magic) != current_version->magic_cpu || le64_to_cpu(header->bytenr) != 0) { error("bad header in metadump image"); return -EIO; @@ -2675,7 +2729,7 @@ static int restore_metadump(const char *input, FILE *out, int old_restore, break; header = &cluster->header; - if (le64_to_cpu(header->magic) != HEADER_MAGIC || + if (le64_to_cpu(header->magic) != current_version->magic_cpu || le64_to_cpu(header->bytenr) != bytenr) { error("bad header in metadump image"); ret = -EIO; diff --git a/image/metadump.h b/image/metadump.h index 57bc3bf285b0..bcffbd4722b0 100644 --- a/image/metadump.h +++ b/image/metadump.h @@ -22,8 +22,6 @@ #include 
"kernel-lib/list.h" #include "kernel-shared/ctree.h" -#define HEADER_MAGIC 0xbd5c25e27295668bULL -#define MAX_PENDING_SIZE SZ_256K #define BLOCK_SIZE SZ_1K #define BLOCK_MASK (BLOCK_SIZE - 1) @@ -33,6 +31,16 @@ #define COMPRESS_NONE 0 #define COMPRESS_ZLIB 1 +struct dump_version { + u64 magic_cpu; + int version; + int max_pending_size; + unsigned int extra_sb_flags:1; +}; + +extern const struct dump_version dump_versions[]; +const extern struct dump_version *current_version; + struct meta_cluster_item { __le64 bytenr; __le32 size; From patchwork Tue Aug 24 07:41:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12454119 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CBB0CC432BE for ; Tue, 24 Aug 2021 07:42:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AB28D60F91 for ; Tue, 24 Aug 2021 07:42:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235291AbhHXHnV (ORCPT ); Tue, 24 Aug 2021 03:43:21 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:55630 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235479AbhHXHmK (ORCPT ); Tue, 24 Aug 2021 03:42:10 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) 
server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 7610020044 for ; Tue, 24 Aug 2021 07:41:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1629790876; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=x4tZDm7DJwLoIorGaGOMRB4LBxQYySaLV4hNDnJHsJ8=; b=Mi9I9BxEmxtfmY0+R1eopoUl6PuWtzCUDb4LNLVShnyEeQrxpJOQA4y2C8SSu454fpZL6O +gXxysOMrq/AMVuPx1hdLfqq+ICgWRrqGRciZxlibEGaKgf1E9WAlVcWF30H7pLsmE5tsJ q0a/0I8LhC7Cbo3E6zJn7aJCiTwaMv8= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 7053C13942 for ; Tue, 24 Aug 2021 07:41:15 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id 0PiTB5uiJGF8bwAAGKfGzw (envelope-from ) for ; Tue, 24 Aug 2021 07:41:15 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v7 2/4] btrfs-progs: image: introduce -d option to dump data Date: Tue, 24 Aug 2021 15:41:06 +0800 Message-Id: <20210824074108.44759-3-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210824074108.44759-1-wqu@suse.com> References: <20210824074108.44759-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This new experimental data dump feature will dump the whole image, not only the existing tree blocks but also all its data extents(*). This feature will rely on the new dump format (_DUmP_v1), as it needs extra large extent size limit, and older btrfs-image dump can't handle such large item/cluster size. 
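
As a small illustration of the _DUmP_v1 magic, the sketch below packs
the eight ASCII bytes into a little-endian 64-bit value, which gives
the magic_cpu constant used for the new format in the diff of this
patch. magic_from_ascii() is a hypothetical helper written for this
note; it is not part of btrfs-progs.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: pack exactly 8 ASCII bytes into a u64 the way
 * they would be read back from disk as a little-endian value.
 * Returns 0 if the string is not exactly 8 bytes long.
 */
static uint64_t magic_from_ascii(const char *s)
{
	uint64_t v = 0;
	size_t i;

	if (strlen(s) != 8)
		return 0;
	for (i = 0; i < 8; i++)
		v |= (uint64_t)(unsigned char)s[i] << (8 * i);
	return v;
}
```

With this helper, magic_from_ascii("_DUmP_v1") evaluates to
0x31765f506d55445f, so the string is readable at the start of each
cluster in a hex dump of the image.
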

Since we're dumping all extents including data extents, the restored
image needs no extra super block flags to inform the kernel. The kernel
should just treat the restored image as any ordinary btrfs.

This new feature is hidden behind the experimental features; that is to
say, if --enable-experimental is not enabled, the option still exists
but does nothing except output an error message.

*: The data extents will be dumped as is; that is to say, even for a
   preallocated extent, its (meaningless) data will be read out and
   dumped.
   This behavior causes extra space usage for the image, but lets us
   skip all the complex checks for partially shared preallocated
   extents.

Signed-off-by: Qu Wenruo
---
 image/main.c | 62 ++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 50 insertions(+), 12 deletions(-)

diff --git a/image/main.c b/image/main.c
index 65a42ad7d85d..b57120875f72 100644
--- a/image/main.c
+++ b/image/main.c
@@ -53,7 +53,17 @@ const struct dump_version dump_versions[] = {
 	{ .version = 0,
 	  .max_pending_size = SZ_256K,
 	  .magic_cpu = 0xbd5c25e27295668bULL,
-	  .extra_sb_flags = 1 }
+	  .extra_sb_flags = 1 },
+#if EXPERIMENTAL
+	/*
+	 * The newer format, with much larger item size to contain
+	 * any data extent.
+ */ + { .version = 1, + .max_pending_size = SZ_256M, + .magic_cpu = 0x31765f506d55445fULL, /* ascii _DUmP_v1, no null */ + .extra_sb_flags = 0 }, +#endif }; const struct dump_version *current_version = &dump_versions[0]; @@ -455,10 +465,14 @@ static void metadump_destroy(struct metadump_struct *md, int num_threads) static int metadump_init(struct metadump_struct *md, struct btrfs_root *root, FILE *out, int num_threads, int compress_level, - enum sanitize_mode sanitize_names) + bool dump_data, enum sanitize_mode sanitize_names) { int i, ret = 0; + /* We need larger item/cluster limit for data extents */ + if (dump_data) + current_version = &dump_versions[1]; + memset(md, 0, sizeof(*md)); INIT_LIST_HEAD(&md->list); INIT_LIST_HEAD(&md->ordered); @@ -886,7 +900,7 @@ static int copy_space_cache(struct btrfs_root *root, } static int copy_from_extent_tree(struct metadump_struct *metadump, - struct btrfs_path *path) + struct btrfs_path *path, bool dump_data) { struct btrfs_root *extent_root; struct extent_buffer *leaf; @@ -951,9 +965,15 @@ static int copy_from_extent_tree(struct metadump_struct *metadump, ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_extent_item); if (btrfs_extent_flags(leaf, ei) & - BTRFS_EXTENT_FLAG_TREE_BLOCK) { + BTRFS_EXTENT_FLAG_TREE_BLOCK || + (dump_data && (btrfs_extent_flags(leaf, ei) & + BTRFS_EXTENT_FLAG_DATA))) { + bool is_data; + + is_data = btrfs_extent_flags(leaf, ei) & + BTRFS_EXTENT_FLAG_DATA; ret = add_extent(bytenr, num_bytes, metadump, - 0); + is_data); if (ret) { error("unable to add block %llu: %d", (unsigned long long)bytenr, ret); @@ -976,7 +996,7 @@ static int copy_from_extent_tree(struct metadump_struct *metadump, static int create_metadump(const char *input, FILE *out, int num_threads, int compress_level, enum sanitize_mode sanitize, - int walk_trees) + int walk_trees, bool dump_data) { struct btrfs_root *root; struct btrfs_path path; @@ -991,7 +1011,7 @@ static int create_metadump(const char *input, FILE *out, int 
num_threads, } ret = metadump_init(&metadump, root, out, num_threads, - compress_level, sanitize); + compress_level, dump_data, sanitize); if (ret) { error("failed to initialize metadump: %d", ret); close_ctree(root); @@ -1023,7 +1043,7 @@ static int create_metadump(const char *input, FILE *out, int num_threads, goto out; } } else { - ret = copy_from_extent_tree(&metadump, &path); + ret = copy_from_extent_tree(&metadump, &path, dump_data); if (ret) { err = ret; goto out; @@ -2929,6 +2949,7 @@ static void print_usage(int ret) printf("\t-s \tsanitize file names, use once to just use garbage, use twice if you want crc collisions\n"); printf("\t-w \twalk all trees instead of using extent tree, do this if your extent tree is broken\n"); printf("\t-m \trestore for multiple devices\n"); + printf("\t-d \talso dump data, conflicts with -w\n"); printf("\n"); printf("\tIn the dump mode, source is the btrfs device and target is the output file (use '-' for stdout).\n"); printf("\tIn the restore mode, source is the dumped image and target is the btrfs device/file.\n"); @@ -2948,6 +2969,7 @@ int BOX_MAIN(image)(int argc, char *argv[]) int ret; enum sanitize_mode sanitize = SANITIZE_NONE; int dev_cnt = 0; + bool dump_data = false; int usage_error = 0; FILE *out; @@ -2956,7 +2978,7 @@ int BOX_MAIN(image)(int argc, char *argv[]) { "help", no_argument, NULL, GETOPT_VAL_HELP}, { NULL, 0, NULL, 0 } }; - int c = getopt_long(argc, argv, "rc:t:oswm", long_options, NULL); + int c = getopt_long(argc, argv, "rc:t:oswmd", long_options, NULL); if (c < 0) break; switch (c) { @@ -2996,6 +3018,9 @@ int BOX_MAIN(image)(int argc, char *argv[]) create = 0; multi_devices = 1; break; + case 'd': + dump_data = true; + break; case GETOPT_VAL_HELP: default: print_usage(c != GETOPT_VAL_HELP); @@ -3008,16 +3033,28 @@ int BOX_MAIN(image)(int argc, char *argv[]) dev_cnt = argc - optind - 1; +#if !EXPERIMENTAL + if (dump_data) { + error( +"data dump feature is experimental and is not configured in this 
build"); + print_usage(1); + } +#endif if (create) { if (old_restore) { error( "create and restore cannot be used at the same time"); usage_error++; } + if (dump_data && walk_trees) { + error("-d conflicts with -w option"); + usage_error++; + } } else { - if (walk_trees || sanitize != SANITIZE_NONE || compress_level) { + if (walk_trees || sanitize != SANITIZE_NONE || compress_level || + dump_data) { error( - "using -w, -s, -c options for restore makes no sense"); + "using -w, -s, -c, -d options for restore makes no sense"); usage_error++; } if (multi_devices && dev_cnt < 2) { @@ -3070,7 +3107,8 @@ int BOX_MAIN(image)(int argc, char *argv[]) } ret = create_metadump(source, out, num_threads, - compress_level, sanitize, walk_trees); + compress_level, sanitize, walk_trees, + dump_data); } else { ret = restore_metadump(source, out, old_restore, num_threads, 0, target, multi_devices); From patchwork Tue Aug 24 07:41:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12454117 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2E01C4338F for ; Tue, 24 Aug 2021 07:42:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DA19361026 for ; Tue, 24 Aug 2021 07:42:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234784AbhHXHnT (ORCPT ); Tue, 24 Aug 2021 03:43:19 -0400 Received: from smtp-out1.suse.de 
([195.135.220.28]:48260 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235489AbhHXHmK (ORCPT ); Tue, 24 Aug 2021 03:42:10 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 247332208F for ; Tue, 24 Aug 2021 07:41:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1629790878; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bhslULLOY/iae9Ou0arR6/PMSqwCEYpZtWXRM+tycAk=; b=eXdRSqO3Ths8shXxFgH6mo9y204r/EHUlAKiAn8KpY36BL51xQHW9rQqesOUS2fgB8wJ66 xiJIPirxCg/ty3quQXDI9jZIjCkQYxUy1D7EZHuw5FxCRp/nq4GQ1YUxw5wL/ntxriB3dM h09tvuTU7ckzerjnyPeKZ9MEKNmhG+0= Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id 2126113942 for ; Tue, 24 Aug 2021 07:41:16 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap1.suse-dmz.suse.de with ESMTPSA id MN0cMJyiJGF8bwAAGKfGzw (envelope-from ) for ; Tue, 24 Aug 2021 07:41:16 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v7 3/4] btrfs-progs: image: reduce memory requirement for decompression Date: Tue, 24 Aug 2021 15:41:07 +0800 Message-Id: <20210824074108.44759-4-wqu@suse.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210824074108.44759-1-wqu@suse.com> References: <20210824074108.44759-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: 
 linux-btrfs@vger.kernel.org

With the recent change to enlarge max_pending_size to 256M for data
dump, the decompression code requires quite a lot of memory
(256M * 4 = 1G per thread).

The main reason is that we're using the uncompress() wrapper function,
which needs the output buffer to be large enough to contain all the
decompressed data.

This patch reworks the decompression to use inflate(), which can resume
its decompression, so that we can use a much smaller buffer.

This patch chooses a 512K buffer size.

Now the memory consumption for restore is reduced to

	Cluster data size + 512K * nr_running_threads

instead of the original

	Cluster data size + 1G * nr_running_threads

Signed-off-by: Qu Wenruo
---
 image/main.c | 222 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 146 insertions(+), 76 deletions(-)

diff --git a/image/main.c b/image/main.c
index b57120875f72..c622c544b5d3 100644
--- a/image/main.c
+++ b/image/main.c
@@ -1363,130 +1363,200 @@ static void write_backup_supers(int fd, u8 *buf)
 	}
 }
 
-static void *restore_worker(void *data)
+/*
+ * Restore one item.
+ *
+ * For uncompressed data, it's just reading from work->buf then write to output.
+ * For compressed data, since we can have very large decompressed data
+ * (up to 256M), we need to consider memory usage. So here we will fill buffer
+ * then write the decompressed buffer to output.
+ */ +static int restore_one_work(struct mdrestore_struct *mdres, + struct async_work *async, u8 *buffer, int bufsize) { - struct mdrestore_struct *mdres = (struct mdrestore_struct *)data; - struct async_work *async; - size_t size; - u8 *buffer; - u8 *outbuf; - int outfd; + z_stream strm; + int buf_offset = 0; /* offset inside work->buffer */ + int out_offset = 0; /* offset for output */ + int out_len; + int outfd = fileno(mdres->out); + int compress_method = mdres->compress_method; int ret; - int compress_size = current_version->max_pending_size * 4; - outfd = fileno(mdres->out); - buffer = malloc(compress_size); - if (!buffer) { - error("not enough memory for restore worker buffer"); - pthread_mutex_lock(&mdres->mutex); - if (!mdres->error) - mdres->error = -ENOMEM; - pthread_mutex_unlock(&mdres->mutex); - pthread_exit(NULL); + ASSERT(is_power_of_2(bufsize)); + + if (compress_method == COMPRESS_ZLIB) { + strm.zalloc = Z_NULL; + strm.zfree = Z_NULL; + strm.opaque = Z_NULL; + strm.avail_in = async->bufsize; + strm.next_in = async->buffer; + strm.avail_out = 0; + strm.next_out = Z_NULL; + ret = inflateInit(&strm); + if (ret != Z_OK) { + error("failed to initialize decompress parameters: %d", + ret); + return ret; + } } + while (buf_offset < async->bufsize) { + bool compress_end = false; + int read_size = min_t(u64, async->bufsize - buf_offset, + bufsize); - while (1) { - u64 bytenr, physical_dup; - off_t offset = 0; - int err = 0; - - pthread_mutex_lock(&mdres->mutex); - while (!mdres->nodesize || list_empty(&mdres->list)) { - if (mdres->done) { - pthread_mutex_unlock(&mdres->mutex); - goto out; + /* Read part */ + if (compress_method == COMPRESS_ZLIB) { + if (strm.avail_out == 0) { + strm.avail_out = bufsize; + strm.next_out = buffer; } - pthread_cond_wait(&mdres->cond, &mdres->mutex); - } - async = list_entry(mdres->list.next, struct async_work, list); - list_del_init(&async->list); - - if (mdres->compress_method == COMPRESS_ZLIB) { - size = compress_size; 
pthread_mutex_unlock(&mdres->mutex); - ret = uncompress(buffer, (unsigned long *)&size, - async->buffer, async->bufsize); + ret = inflate(&strm, Z_NO_FLUSH); pthread_mutex_lock(&mdres->mutex); - if (ret != Z_OK) { - error("decompression failed with %d", ret); - err = -EIO; + switch (ret) { + case Z_NEED_DICT: + ret = Z_DATA_ERROR; + __attribute__ ((fallthrough)); + case Z_DATA_ERROR: + case Z_MEM_ERROR: + goto out; + } + if (ret == Z_STREAM_END) { + ret = 0; + compress_end = true; } - outbuf = buffer; + out_len = bufsize - strm.avail_out; } else { - outbuf = async->buffer; - size = async->bufsize; + /* No compress, read as many data as possible */ + memcpy(buffer, async->buffer + buf_offset, read_size); + + buf_offset += read_size; + out_len = read_size; } + /* Fixup part */ if (!mdres->multi_devices) { if (async->start == BTRFS_SUPER_INFO_OFFSET) { - memcpy(mdres->original_super, outbuf, + memcpy(mdres->original_super, buffer, BTRFS_SUPER_INFO_SIZE); if (mdres->old_restore) { - update_super_old(outbuf); + update_super_old(buffer); } else { - ret = update_super(mdres, outbuf); - if (ret) - err = ret; + ret = update_super(mdres, buffer); + if (ret < 0) + goto out; } } else if (!mdres->old_restore) { - ret = fixup_chunk_tree_block(mdres, async, outbuf, size); + ret = fixup_chunk_tree_block(mdres, async, + buffer, out_len); if (ret) - err = ret; + goto out; } } + /* Write part */ if (!mdres->fixup_offset) { + int size = out_len; + off_t offset = 0; + while (size) { + u64 logical = async->start + out_offset + offset; u64 chunk_size = size; - physical_dup = 0; + u64 physical_dup = 0; + u64 bytenr; + if (!mdres->multi_devices && !mdres->old_restore) bytenr = logical_to_physical(mdres, - async->start + offset, - &chunk_size, - &physical_dup); + logical, &chunk_size, + &physical_dup); else - bytenr = async->start + offset; + bytenr = logical; - ret = pwrite64(outfd, outbuf+offset, chunk_size, - bytenr); + ret = pwrite64(outfd, buffer + offset, chunk_size, bytenr); if (ret 
!= chunk_size) - goto error; + goto write_error; if (physical_dup) - ret = pwrite64(outfd, outbuf+offset, - chunk_size, - physical_dup); + ret = pwrite64(outfd, buffer + offset, + chunk_size, physical_dup); if (ret != chunk_size) - goto error; + goto write_error; size -= chunk_size; offset += chunk_size; continue; - -error: - if (ret < 0) { - error("unable to write to device: %m"); - err = errno; - } else { - error("short write"); - err = -EIO; - } } } else if (async->start != BTRFS_SUPER_INFO_OFFSET) { - ret = write_data_to_disk(mdres->info, outbuf, async->start, size, 0); + ret = write_data_to_disk(mdres->info, buffer, + async->start, out_len, 0); if (ret) { error("failed to write data"); exit(1); } } - /* backup super blocks are already there at fixup_offset stage */ - if (!mdres->multi_devices && async->start == BTRFS_SUPER_INFO_OFFSET) - write_backup_supers(outfd, outbuf); + if (async->start == BTRFS_SUPER_INFO_OFFSET && + !mdres->multi_devices) + write_backup_supers(outfd, buffer); + out_offset += out_len; + if (compress_end) { + inflateEnd(&strm); + break; + } + } + return ret; + +write_error: + if (ret < 0) { + error("unable to write to device: %m"); + ret = -errno; + } else { + error("short write"); + ret = -EIO; + } +out: + if (compress_method == COMPRESS_ZLIB) + inflateEnd(&strm); + return ret; +} + +static void *restore_worker(void *data) +{ + struct mdrestore_struct *mdres = (struct mdrestore_struct *)data; + struct async_work *async; + u8 *buffer; + int ret; + int buffer_size = SZ_512K; + + buffer = malloc(buffer_size); + if (!buffer) { + error("not enough memory for restore worker buffer"); + pthread_mutex_lock(&mdres->mutex); + if (!mdres->error) + mdres->error = -ENOMEM; + pthread_mutex_unlock(&mdres->mutex); + pthread_exit(NULL); + } + + while (1) { + pthread_mutex_lock(&mdres->mutex); + while (!mdres->nodesize || list_empty(&mdres->list)) { + if (mdres->done) { + pthread_mutex_unlock(&mdres->mutex); + goto out; + } + 
pthread_cond_wait(&mdres->cond, &mdres->mutex); + } + async = list_entry(mdres->list.next, struct async_work, list); + list_del_init(&async->list); - if (err && !mdres->error) - mdres->error = err; + ret = restore_one_work(mdres, async, buffer, buffer_size); + if (ret < 0) { + mdres->error = ret; + pthread_mutex_unlock(&mdres->mutex); + goto out; + } mdres->num_items--; pthread_mutex_unlock(&mdres->mutex); From patchwork Tue Aug 24 07:41:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12454115 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3FDA8C432BE for ; Tue, 24 Aug 2021 07:42:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 26C6161246 for ; Tue, 24 Aug 2021 07:42:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235329AbhHXHnR (ORCPT ); Tue, 24 Aug 2021 03:43:17 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:55636 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235503AbhHXHmK (ORCPT ); Tue, 24 Aug 2021 03:42:10 -0400 Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id CBD8120045 for ; Tue, 24 
	Aug 2021 07:41:19 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1;
	t=1629790879;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:
	 mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=+bvDyA9GJ+JddJun3Uq/N+G3PTIr28MA+wnLFZUE/nw=;
	b=t/mvndioRHjrYxGEGN6QbR4B/7yGfPXnnudwisCA8XvEFJhiwWRXTVJf2RXsxTZKQuFUoI
	jhhrg5lL3pyN0BjtIFHqHvPTwXkF5QuudVW/R0avMh/R/EAGK4W8jYF6tYoAXmMMEMlblw
	StnBAkWfV0Hkbd5fVm/SRefpFL7hS1Q=
Received: from imap1.suse-dmz.suse.de (imap1.suse-dmz.suse.de [192.168.254.73])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
	(No client certificate requested)
	by imap1.suse-dmz.suse.de (Postfix) with ESMTPS id C350113942
	for ; Tue, 24 Aug 2021 07:41:18 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
	by imap1.suse-dmz.suse.de with ESMTPSA
	id OJdgG56iJGF8bwAAGKfGzw
	(envelope-from )
	for ; Tue, 24 Aug 2021 07:41:18 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v7 4/4] btrfs-progs: image: fix restored image size misalignment
Date: Tue, 24 Aug 2021 15:41:08 +0800
Message-Id: <20210824074108.44759-5-wqu@suse.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210824074108.44759-1-wqu@suse.com>
References: <20210824074108.44759-1-wqu@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

[BUG]
There is a small device size misalignment between the super block
device size and the device extent size:

    total_bytes             10737418240 <<<
    bytes_used              15097856
    dev_item.total_bytes    10737418240
    dev_item.bytes_used     1094713344

    item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
        devid 1 total_bytes 1095761920 bytes_used 1094713344
                            ^^^^^^^^^^

[CAUSE]
In fixup_device_size(), we only reset the superblock device item size,
which will be overwritten in write_dev_supers() using
btrfs_device::total_bytes.

And it doesn't touch btrfs_superblock::total_bytes either.

[FIX]
So fix the small mismatch by also resetting btrfs_device::total_bytes,
btrfs_device::bytes_used and btrfs_superblock::total_bytes.

Thankfully since commit 73dd4e3c87c9 ("btrfs-progs: image: Don't modify
the chunk and device tree if the source dump is single device") single
device dumps won't have this problem, but it's still worth fixing for
multi-device dumps.

Signed-off-by: Qu Wenruo
---
 image/main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/image/main.c b/image/main.c
index c622c544b5d3..bf4c62dafbaa 100644
--- a/image/main.c
+++ b/image/main.c
@@ -2377,6 +2377,7 @@ static int fixup_device_size(struct btrfs_trans_handle *trans,
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_dev_item *dev_item;
 	struct btrfs_dev_extent *dev_ext;
+	struct btrfs_device *dev;
 	struct btrfs_path path;
 	struct extent_buffer *leaf;
 	struct btrfs_root *root = fs_info->chunk_root;
@@ -2395,6 +2396,8 @@ static int fixup_device_size(struct btrfs_trans_handle *trans,
 	key.type = BTRFS_DEV_EXTENT_KEY;
 	key.offset = (u64)-1;
 
+	dev = list_first_entry(&fs_info->fs_devices->devices,
+			       struct btrfs_device, dev_list);
 	ret = btrfs_search_slot(NULL, fs_info->dev_root, &key, &path, 0, 0);
 	if (ret < 0) {
 		errno = -ret;
@@ -2428,6 +2431,9 @@ static int fixup_device_size(struct btrfs_trans_handle *trans,
 
 	btrfs_set_stack_device_total_bytes(dev_item, dev_size);
 	btrfs_set_stack_device_bytes_used(dev_item, mdres->alloced_chunks);
+	dev->total_bytes = dev_size;
+	dev->bytes_used = mdres->alloced_chunks;
+	btrfs_set_super_total_bytes(fs_info->super_copy, dev_size);
 	ret = fstat(out_fd, &buf);
 	if (ret < 0) {
 		error("failed to stat result image: %m");