From patchwork Thu Jun 6 11:06:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10979275 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1297514C0 for ; Thu, 6 Jun 2019 11:06:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 017B4280B0 for ; Thu, 6 Jun 2019 11:06:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E717A2875A; Thu, 6 Jun 2019 11:06:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 90CA7280B0 for ; Thu, 6 Jun 2019 11:06:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727888AbfFFLG1 (ORCPT ); Thu, 6 Jun 2019 07:06:27 -0400 Received: from mx2.suse.de ([195.135.220.15]:34810 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727669AbfFFLG0 (ORCPT ); Thu, 6 Jun 2019 07:06:26 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id C2E2CAE54 for ; Thu, 6 Jun 2019 11:06:25 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 1/9] btrfs-progs: image: Use SZ_* to replace intermediate size Date: Thu, 6 Jun 2019 19:06:03 +0800 Message-Id: <20190606110611.27176-2-wqu@suse.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190606110611.27176-1-wqu@suse.com> References: <20190606110611.27176-1-wqu@suse.com> MIME-Version: 1.0 
Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Qu Wenruo --- image/metadump.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/image/metadump.h b/image/metadump.h index 8ace60f5..f85c9bcf 100644 --- a/image/metadump.h +++ b/image/metadump.h @@ -23,8 +23,8 @@ #include "ctree.h" #define HEADER_MAGIC 0xbd5c25e27295668bULL -#define MAX_PENDING_SIZE (256 * 1024) -#define BLOCK_SIZE 1024 +#define MAX_PENDING_SIZE SZ_256K +#define BLOCK_SIZE SZ_1K #define BLOCK_MASK (BLOCK_SIZE - 1) #define ITEMS_PER_CLUSTER ((BLOCK_SIZE - sizeof(struct meta_cluster)) / \ From patchwork Thu Jun 6 11:06:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10979277 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E6F6314C0 for ; Thu, 6 Jun 2019 11:06:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D5D80280B0 for ; Thu, 6 Jun 2019 11:06:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CA3922875A; Thu, 6 Jun 2019 11:06:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 892F2280B0 for ; Thu, 6 Jun 2019 11:06:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727903AbfFFLG3 (ORCPT ); Thu, 6 Jun 2019 07:06:29 -0400 Received: from mx2.suse.de ([195.135.220.15]:34822 "EHLO 
mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727669AbfFFLG3 (ORCPT ); Thu, 6 Jun 2019 07:06:29 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 7D4CAAE5A for ; Thu, 6 Jun 2019 11:06:28 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/9] btrfs-progs: image: Fix a indent misalign Date: Thu, 6 Jun 2019 19:06:04 +0800 Message-Id: <20190606110611.27176-3-wqu@suse.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190606110611.27176-1-wqu@suse.com> References: <20190606110611.27176-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Qu Wenruo --- image/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/image/main.c b/image/main.c index 4fba8283..fb9fc48c 100644 --- a/image/main.c +++ b/image/main.c @@ -2702,7 +2702,7 @@ int main(int argc, char *argv[]) create = 0; multi_devices = 1; break; - case GETOPT_VAL_HELP: + case GETOPT_VAL_HELP: default: print_usage(c != GETOPT_VAL_HELP); } From patchwork Thu Jun 6 11:06:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10979279 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A437576 for ; Thu, 6 Jun 2019 11:06:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 92C1C280B0 for ; Thu, 6 Jun 2019 11:06:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 869932875A; Thu, 6 Jun 2019 11:06:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on 
pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3BE0E280B0 for ; Thu, 6 Jun 2019 11:06:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727908AbfFFLGf (ORCPT ); Thu, 6 Jun 2019 07:06:35 -0400 Received: from mx2.suse.de ([195.135.220.15]:34862 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727669AbfFFLGf (ORCPT ); Thu, 6 Jun 2019 07:06:35 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 3BC3CAE54 for ; Thu, 6 Jun 2019 11:06:34 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 3/9] btrfs-progs: image: Fix a access-beyond-boundary bug when there are 32 online CPUs Date: Thu, 6 Jun 2019 19:06:05 +0800 Message-Id: <20190606110611.27176-4-wqu@suse.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190606110611.27176-1-wqu@suse.com> References: <20190606110611.27176-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP [BUG] When there are more than 32 (in my example, 35) online CPUs, btrfs-image -c9 will just hang. [CAUSE] btrfs-image has a hard-coded limit (32) on how many threads it can use. For the "-t" option we check against the upper limit. But when the "-t" option is not specified and the "-c" option is, btrfs-image will try to auto-detect the number of online CPUs and use it without checking whether it exceeds the upper limit.
When num_threads exceeds the upper limit, we overwrite the adjacent members of metadump_struct/mdrestore_struct, corrupting pthread_mutex_t and pthread_cond_t and causing synchronization problems. Nowadays, with SMT/HT and higher CPU core counts, it's not hard to go beyond 32 threads and hit the bug. [FIX] Just do an extra num_threads check before using the number from sysconf(). Signed-off-by: Qu Wenruo Reviewed-by: Su Yue --- image/main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/image/main.c b/image/main.c index fb9fc48c..80f09c21 100644 --- a/image/main.c +++ b/image/main.c @@ -2758,6 +2758,7 @@ int main(int argc, char *argv[]) if (tmp <= 0) tmp = 1; + tmp = min_t(long, tmp, MAX_WORKER_THREADS); num_threads = tmp; } } else { From patchwork Thu Jun 6 11:06:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10979281 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4824914C0 for ; Thu, 6 Jun 2019 11:06:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 37026280B0 for ; Thu, 6 Jun 2019 11:06:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2B0832875A; Thu, 6 Jun 2019 11:06:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BC093280B0 for ; Thu, 6 Jun 2019 11:06:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727924AbfFFLGj (ORCPT ); Thu, 6 Jun 2019 07:06:39 -0400
Received: from mx2.suse.de ([195.135.220.15]:34874 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725784AbfFFLGj (ORCPT ); Thu, 6 Jun 2019 07:06:39 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 8CDAEAE54 for ; Thu, 6 Jun 2019 11:06:38 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 4/9] btrfs-progs: image: Verify the superblock before restore Date: Thu, 6 Jun 2019 19:06:06 +0800 Message-Id: <20190606110611.27176-5-wqu@suse.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190606110611.27176-1-wqu@suse.com> References: <20190606110611.27176-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch will export disk-io.c::check_super() as btrfs_check_super() and use it in btrfs-image for extra verification. 
Signed-off-by: Qu Wenruo --- disk-io.c | 6 +++--- disk-io.h | 1 + image/main.c | 5 +++++ 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/disk-io.c b/disk-io.c index 151eb3b5..ffe4a8c5 100644 --- a/disk-io.c +++ b/disk-io.c @@ -1347,7 +1347,7 @@ struct btrfs_root *open_ctree_fd(int fp, const char *path, u64 sb_bytenr, * - number of devices - something sane * - sys array size - maximum */ -static int check_super(struct btrfs_super_block *sb, unsigned sbflags) +int btrfs_check_super(struct btrfs_super_block *sb, unsigned sbflags) { u8 result[BTRFS_CSUM_SIZE]; u32 crc; @@ -1547,7 +1547,7 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr, if (btrfs_super_bytenr(buf) != sb_bytenr) return -EIO; - ret = check_super(buf, sbflags); + ret = btrfs_check_super(buf, sbflags); if (ret < 0) return ret; memcpy(sb, buf, BTRFS_SUPER_INFO_SIZE); @@ -1572,7 +1572,7 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr, /* if magic is NULL, the device was removed */ if (btrfs_super_magic(buf) == 0 && i == 0) break; - if (check_super(buf, sbflags)) + if (btrfs_check_super(buf, sbflags)) continue; if (!fsid_is_initialized) { diff --git a/disk-io.h b/disk-io.h index ddf3a380..c97aa234 100644 --- a/disk-io.h +++ b/disk-io.h @@ -171,6 +171,7 @@ static inline int close_ctree(struct btrfs_root *root) int write_all_supers(struct btrfs_fs_info *fs_info); int write_ctree_super(struct btrfs_trans_handle *trans); +int btrfs_check_super(struct btrfs_super_block *sb, unsigned sbflags); int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr, unsigned sbflags); int btrfs_map_bh_to_logical(struct btrfs_root *root, struct extent_buffer *bh, diff --git a/image/main.c b/image/main.c index 80f09c21..0b7c8736 100644 --- a/image/main.c +++ b/image/main.c @@ -2040,6 +2040,11 @@ static int build_chunk_tree(struct mdrestore_struct *mdres, pthread_mutex_lock(&mdres->mutex); super = (struct btrfs_super_block *)buffer; + ret 
= btrfs_check_super(super, 0); + if (ret < 0) { + error("invalid superblock"); + return ret; + } chunk_root_bytenr = btrfs_super_chunk_root(super); mdres->nodesize = btrfs_super_nodesize(super); if (btrfs_super_incompat_flags(super) & From patchwork Thu Jun 6 11:06:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10979283 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD3E176 for ; Thu, 6 Jun 2019 11:06:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AAAC4280B0 for ; Thu, 6 Jun 2019 11:06:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9EE402875A; Thu, 6 Jun 2019 11:06:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 06FBD28758 for ; Thu, 6 Jun 2019 11:06:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727935AbfFFLGq (ORCPT ); Thu, 6 Jun 2019 07:06:46 -0400 Received: from mx2.suse.de ([195.135.220.15]:34890 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725784AbfFFLGp (ORCPT ); Thu, 6 Jun 2019 07:06:45 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id B5AC3AE54 for ; Thu, 6 Jun 2019 11:06:42 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 5/9] btrfs-progs: image: Introduce framework for more dump versions 
Date: Thu, 6 Jun 2019 19:06:07 +0800 Message-Id: <20190606110611.27176-6-wqu@suse.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190606110611.27176-1-wqu@suse.com> References: <20190606110611.27176-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The original dump format only contains a @magic member to verify the format. This means that if we want to introduce a new on-disk format or change certain size limits, we can only introduce a new magic as a kind of version. This patch introduces the framework to allow multiple magics to co-exist for further functions. It introduces the following members for each dump version. - max_pending_size The threshold size for a cluster. It's not a hard limit but a soft one. One cluster can grow larger than max_pending_size for one item, but the next item will go to the next cluster. - magic_cpu The magic number in CPU endianness. - extra_sb_flags Whether the super block of this restore needs extra super block flags like BTRFS_SUPER_FLAG_METADUMP_V2. For the upcoming data dump feature, we don't need any extra super block flags. This change also implies that each image dump uses the same magic for all of its clusters. No mixing is allowed, as we use the first cluster to determine the dump version. Signed-off-by: Qu Wenruo --- image/main.c | 80 ++++++++++++++++++++++++++++++++++++++++-------- image/metadump.h | 13 ++++++-- 2 files changed, 78 insertions(+), 15 deletions(-) diff --git a/image/main.c b/image/main.c index 0b7c8736..e8b45a1a 100644 --- a/image/main.c +++ b/image/main.c @@ -41,6 +41,19 @@ #define MAX_WORKER_THREADS (32) +const struct dump_version dump_versions[NR_DUMP_VERSIONS] = { + /* + * The original format, which only supports tree blocks and + * free space cache dump.
+ */ + { .version = 0, + .max_pending_size = SZ_256K, + .magic_cpu = 0xbd5c25e27295668bULL, + .extra_sb_flags = 1 } +}; + +const struct dump_version *current_version = &dump_versions[0]; + struct async_work { struct list_head list; struct list_head ordered; @@ -395,7 +408,7 @@ static void meta_cluster_init(struct metadump_struct *md, u64 start) md->num_items = 0; md->num_ready = 0; header = &md->cluster.header; - header->magic = cpu_to_le64(HEADER_MAGIC); + header->magic = cpu_to_le64(current_version->magic_cpu); header->bytenr = cpu_to_le64(start); header->nritems = cpu_to_le32(0); header->compress = md->compress_level > 0 ? @@ -707,7 +720,7 @@ static int add_extent(u64 start, u64 size, struct metadump_struct *md, { int ret; if (md->data != data || - md->pending_size + size > MAX_PENDING_SIZE || + md->pending_size + size > current_version->max_pending_size || md->pending_start + md->pending_size != start) { ret = flush_pending(md, 0); if (ret) @@ -1093,7 +1106,8 @@ static void update_super_old(u8 *buffer) u32 sectorsize = btrfs_super_sectorsize(super); u64 flags = btrfs_super_flags(super); - flags |= BTRFS_SUPER_FLAG_METADUMP; + if (current_version->extra_sb_flags) + flags |= BTRFS_SUPER_FLAG_METADUMP; btrfs_set_super_flags(super, flags); key = (struct btrfs_disk_key *)(super->sys_chunk_array); @@ -1186,7 +1200,8 @@ static int update_super(struct mdrestore_struct *mdres, u8 *buffer) if (mdres->clear_space_cache) btrfs_set_super_cache_generation(super, 0); - flags |= BTRFS_SUPER_FLAG_METADUMP_V2; + if (current_version->extra_sb_flags) + flags |= BTRFS_SUPER_FLAG_METADUMP_V2; btrfs_set_super_flags(super, flags); btrfs_set_super_sys_array_size(super, new_array_size); btrfs_set_super_num_devices(super, 1); @@ -1374,7 +1389,7 @@ static void *restore_worker(void *data) u8 *outbuf; int outfd; int ret; - int compress_size = MAX_PENDING_SIZE * 4; + int compress_size = current_version->max_pending_size * 4; outfd = fileno(mdres->out); buffer = malloc(compress_size); @@ 
-1523,6 +1538,42 @@ static void mdrestore_destroy(struct mdrestore_struct *mdres, int num_threads) pthread_mutex_destroy(&mdres->mutex); } +static int detect_version(FILE *in) +{ + struct meta_cluster *cluster; + u8 buf[BLOCK_SIZE]; + bool found = false; + int i; + int ret; + + if (fseek(in, 0, SEEK_SET) < 0) { + error("seek failed: %m"); + return -errno; + } + ret = fread(buf, BLOCK_SIZE, 1, in); + if (!ret) { + error("failed to read header"); + return -EIO; + } + + fseek(in, 0, SEEK_SET); + cluster = (struct meta_cluster *)buf; + for (i = 0; i < NR_DUMP_VERSIONS; i++) { + if (le64_to_cpu(cluster->header.magic) == + dump_versions[i].magic_cpu) { + found = true; + current_version = &dump_versions[i]; + break; + } + } + + if (!found) { + error("unrecognized header format"); + return -EINVAL; + } + return 0; +} + static int mdrestore_init(struct mdrestore_struct *mdres, FILE *in, FILE *out, int old_restore, int num_threads, int fixup_offset, @@ -1530,6 +1581,9 @@ static int mdrestore_init(struct mdrestore_struct *mdres, { int i, ret = 0; + ret = detect_version(in); + if (ret < 0) + return ret; memset(mdres, 0, sizeof(*mdres)); pthread_cond_init(&mdres->cond, NULL); pthread_mutex_init(&mdres->mutex, NULL); @@ -1577,9 +1631,9 @@ static int fill_mdres_info(struct mdrestore_struct *mdres, return 0; if (mdres->compress_method == COMPRESS_ZLIB) { - size_t size = MAX_PENDING_SIZE * 2; + size_t size = current_version->max_pending_size * 2; - buffer = malloc(MAX_PENDING_SIZE * 2); + buffer = malloc(current_version->max_pending_size * 2); if (!buffer) return -ENOMEM; ret = uncompress(buffer, (unsigned long *)&size, @@ -1818,7 +1872,7 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres, u64 current_cluster = cluster_bytenr, bytenr; u64 item_bytenr; u32 bufsize, nritems, i; - u32 max_size = MAX_PENDING_SIZE * 2; + u32 max_size = current_version->max_pending_size * 2; u8 *buffer, *tmp = NULL; int ret = 0; @@ -1874,7 +1928,7 @@ static int 
search_for_chunk_blocks(struct mdrestore_struct *mdres, ret = 0; header = &cluster->header; - if (le64_to_cpu(header->magic) != HEADER_MAGIC || + if (le64_to_cpu(header->magic) != current_version->magic_cpu || le64_to_cpu(header->bytenr) != current_cluster) { error("bad header in metadump image"); ret = -EIO; @@ -1977,7 +2031,7 @@ static int build_chunk_tree(struct mdrestore_struct *mdres, ret = 0; header = &cluster->header; - if (le64_to_cpu(header->magic) != HEADER_MAGIC || + if (le64_to_cpu(header->magic) != current_version->magic_cpu || le64_to_cpu(header->bytenr) != 0) { error("bad header in metadump image"); return -EIO; @@ -2018,10 +2072,10 @@ static int build_chunk_tree(struct mdrestore_struct *mdres, } if (mdres->compress_method == COMPRESS_ZLIB) { - size_t size = MAX_PENDING_SIZE * 2; + size_t size = current_version->max_pending_size * 2; u8 *tmp; - tmp = malloc(MAX_PENDING_SIZE * 2); + tmp = malloc(current_version->max_pending_size * 2); if (!tmp) { free(buffer); return -ENOMEM; @@ -2478,7 +2532,7 @@ static int restore_metadump(const char *input, FILE *out, int old_restore, break; header = &cluster->header; - if (le64_to_cpu(header->magic) != HEADER_MAGIC || + if (le64_to_cpu(header->magic) != current_version->magic_cpu || le64_to_cpu(header->bytenr) != bytenr) { error("bad header in metadump image"); ret = -EIO; diff --git a/image/metadump.h b/image/metadump.h index f85c9bcf..941d4b82 100644 --- a/image/metadump.h +++ b/image/metadump.h @@ -22,8 +22,6 @@ #include "kernel-lib/list.h" #include "ctree.h" -#define HEADER_MAGIC 0xbd5c25e27295668bULL -#define MAX_PENDING_SIZE SZ_256K #define BLOCK_SIZE SZ_1K #define BLOCK_MASK (BLOCK_SIZE - 1) @@ -33,6 +31,17 @@ #define COMPRESS_NONE 0 #define COMPRESS_ZLIB 1 +struct dump_version { + u64 magic_cpu; + int version; + int max_pending_size; + unsigned int extra_sb_flags:1; +}; + +#define NR_DUMP_VERSIONS 1 +extern const struct dump_version dump_versions[NR_DUMP_VERSIONS]; +const extern struct dump_version 
*current_version; + struct meta_cluster_item { __le64 bytenr; __le32 size; From patchwork Thu Jun 6 11:06:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10979285 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CE29114C0 for ; Thu, 6 Jun 2019 11:06:52 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BB0CB280B0 for ; Thu, 6 Jun 2019 11:06:52 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AE1652875A; Thu, 6 Jun 2019 11:06:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 19114280B0 for ; Thu, 6 Jun 2019 11:06:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727941AbfFFLGv (ORCPT ); Thu, 6 Jun 2019 07:06:51 -0400 Received: from mx2.suse.de ([195.135.220.15]:34896 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725784AbfFFLGv (ORCPT ); Thu, 6 Jun 2019 07:06:51 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 68640AE54 for ; Thu, 6 Jun 2019 11:06:49 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 6/9] btrfs-progs: image: Introduce -d option to dump data Date: Thu, 6 Jun 2019 19:06:08 +0800 Message-Id: <20190606110611.27176-7-wqu@suse.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: 
<20190606110611.27176-1-wqu@suse.com> References: <20190606110611.27176-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This new data dump feature will dump the whole image, not only the existing tree blocks but also all its data extents(*). This feature relies on the new dump format (_DUmP_v1), as it needs a much larger extent size limit, and the older btrfs-image dump format can't handle such a large item/cluster size. Since we're dumping all extents including data extents, there is no need for the restored image to use any extra super block flags to inform the kernel. The kernel should just treat the restored image as any ordinary btrfs. *: The data extents will be dumped as is. That is to say, even for a preallocated extent, its (meaningless) data will be read out and dumped. This behavior causes extra space usage for the image, but lets us skip all the complex checks for partially shared preallocated extents. Signed-off-by: Qu Wenruo --- image/main.c | 53 +++++++++++++++++++++++++++++++++++++----------- image/metadump.h | 2 +- 2 files changed, 42 insertions(+), 13 deletions(-) diff --git a/image/main.c b/image/main.c index e8b45a1a..f394bfc8 100644 --- a/image/main.c +++ b/image/main.c @@ -49,7 +49,15 @@ const struct dump_version dump_versions[NR_DUMP_VERSIONS] = { { .version = 0, .max_pending_size = SZ_256K, .magic_cpu = 0xbd5c25e27295668bULL, - .extra_sb_flags = 1 } + .extra_sb_flags = 1 }, + /* + * The newer format, with much larger item size to contain + * any data extent.
+ */ + { .version = 1, + .max_pending_size = SZ_256M, + .magic_cpu = 0x31765f506d55445fULL, /* ascii _DUmP_v1, no null */ + .extra_sb_flags = 0 }, }; const struct dump_version *current_version = &dump_versions[0]; @@ -444,10 +452,14 @@ static void metadump_destroy(struct metadump_struct *md, int num_threads) static int metadump_init(struct metadump_struct *md, struct btrfs_root *root, FILE *out, int num_threads, int compress_level, - enum sanitize_mode sanitize_names) + bool dump_data, enum sanitize_mode sanitize_names) { int i, ret = 0; + /* We need larger item/cluster limit for data extents */ + if (dump_data) + current_version = &dump_versions[1]; + memset(md, 0, sizeof(*md)); INIT_LIST_HEAD(&md->list); INIT_LIST_HEAD(&md->ordered); @@ -912,7 +924,7 @@ static int copy_space_cache(struct btrfs_root *root, } static int copy_from_extent_tree(struct metadump_struct *metadump, - struct btrfs_path *path) + struct btrfs_path *path, bool dump_data) { struct btrfs_root *extent_root; struct extent_buffer *leaf; @@ -977,9 +989,15 @@ static int copy_from_extent_tree(struct metadump_struct *metadump, ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_extent_item); if (btrfs_extent_flags(leaf, ei) & - BTRFS_EXTENT_FLAG_TREE_BLOCK) { + BTRFS_EXTENT_FLAG_TREE_BLOCK || + btrfs_extent_flags(leaf, ei) & + BTRFS_EXTENT_FLAG_DATA) { + bool is_data; + + is_data = btrfs_extent_flags(leaf, ei) & + BTRFS_EXTENT_FLAG_DATA; ret = add_extent(bytenr, num_bytes, metadump, - 0); + is_data); if (ret) { error("unable to add block %llu: %d", (unsigned long long)bytenr, ret); @@ -1022,7 +1040,7 @@ static int copy_from_extent_tree(struct metadump_struct *metadump, static int create_metadump(const char *input, FILE *out, int num_threads, int compress_level, enum sanitize_mode sanitize, - int walk_trees) + int walk_trees, bool dump_data) { struct btrfs_root *root; struct btrfs_path path; @@ -1037,7 +1055,7 @@ static int create_metadump(const char *input, FILE *out, int num_threads, } ret = 
metadump_init(&metadump, root, out, num_threads, - compress_level, sanitize); + compress_level, dump_data, sanitize); if (ret) { error("failed to initialize metadump: %d", ret); close_ctree(root); @@ -1069,7 +1087,7 @@ static int create_metadump(const char *input, FILE *out, int num_threads, goto out; } } else { - ret = copy_from_extent_tree(&metadump, &path); + ret = copy_from_extent_tree(&metadump, &path, dump_data); if (ret) { err = ret; goto out; @@ -2694,6 +2712,7 @@ static void print_usage(int ret) printf("\t-s \tsanitize file names, use once to just use garbage, use twice if you want crc collisions\n"); printf("\t-w \twalk all trees instead of using extent tree, do this if your extent tree is broken\n"); printf("\t-m \trestore for multiple devices\n"); + printf("\t-d \talso dump data, conflicts with -w\n"); printf("\n"); printf("\tIn the dump mode, source is the btrfs device and target is the output file (use '-' for stdout).\n"); printf("\tIn the restore mode, source is the dumped image and target is the btrfs device/file.\n"); @@ -2713,6 +2732,7 @@ int main(int argc, char *argv[]) int ret; enum sanitize_mode sanitize = SANITIZE_NONE; int dev_cnt = 0; + bool dump_data = false; int usage_error = 0; FILE *out; @@ -2721,7 +2741,7 @@ int main(int argc, char *argv[]) { "help", no_argument, NULL, GETOPT_VAL_HELP}, { NULL, 0, NULL, 0 } }; - int c = getopt_long(argc, argv, "rc:t:oswm", long_options, NULL); + int c = getopt_long(argc, argv, "rc:t:oswmd", long_options, NULL); if (c < 0) break; switch (c) { @@ -2761,6 +2781,9 @@ int main(int argc, char *argv[]) create = 0; multi_devices = 1; break; + case 'd': + dump_data = true; + break; case GETOPT_VAL_HELP: default: print_usage(c != GETOPT_VAL_HELP); @@ -2779,10 +2802,15 @@ int main(int argc, char *argv[]) "create and restore cannot be used at the same time"); usage_error++; } + if (dump_data && walk_trees) { + error("-d conflicts with -w option"); + usage_error++; + } } else { - if (walk_trees || sanitize !=
SANITIZE_NONE || compress_level) { + if (walk_trees || sanitize != SANITIZE_NONE || compress_level || + dump_data) { error( - "using -w, -s, -c options for restore makes no sense"); + "using -w, -s, -c, -d options for restore makes no sense"); usage_error++; } if (multi_devices && dev_cnt < 2) { @@ -2835,7 +2863,8 @@ int main(int argc, char *argv[]) } ret = create_metadump(source, out, num_threads, - compress_level, sanitize, walk_trees); + compress_level, sanitize, walk_trees, + dump_data); } else { ret = restore_metadump(source, out, old_restore, num_threads, 0, target, multi_devices); diff --git a/image/metadump.h b/image/metadump.h index 941d4b82..a04f63a9 100644 --- a/image/metadump.h +++ b/image/metadump.h @@ -38,7 +38,7 @@ struct dump_version { unsigned int extra_sb_flags:1; }; -#define NR_DUMP_VERSIONS 1 +#define NR_DUMP_VERSIONS 2 extern const struct dump_version dump_versions[NR_DUMP_VERSIONS]; const extern struct dump_version *current_version; From patchwork Thu Jun 6 11:06:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10979287 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CA6E014C0 for ; Thu, 6 Jun 2019 11:07:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BA4BE280B0 for ; Thu, 6 Jun 2019 11:07:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id ADF2A28774; Thu, 6 Jun 2019 11:07:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by 
mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2C227280B0 for ; Thu, 6 Jun 2019 11:07:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726963AbfFFLHA (ORCPT ); Thu, 6 Jun 2019 07:07:00 -0400 Received: from mx2.suse.de ([195.135.220.15]:34928 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725784AbfFFLHA (ORCPT ); Thu, 6 Jun 2019 07:07:00 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 3FE3BAE54 for ; Thu, 6 Jun 2019 11:06:58 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 7/9] btrfs-progs: image: Allow restore to record system chunk ranges for later usage Date: Thu, 6 Jun 2019 19:06:09 +0800 Message-Id: <20190606110611.27176-8-wqu@suse.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190606110611.27176-1-wqu@suse.com> References: <20190606110611.27176-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently we do a slow search for system chunks before restoring real data. The current behavior is to search all clusters for the chunk tree root first, then search all clusters again for every chunk tree block. This causes recursive calls and a slow start-up. The only good news is that, since chunk trees are normally small, we don't need to iterate too many times, so overall it's acceptable. To address this, we can make use of the system chunk array in the super block. By recording all system chunk ranges, we can easily determine whether an extent belongs to the chunk tree, and thus do a simple one-pass linear search for chunk tree leaves. This patch only introduces the code base for later patches. 
Signed-off-by: Qu Wenruo --- image/main.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 103 insertions(+) diff --git a/image/main.c b/image/main.c index f394bfc8..0460a5f5 100644 --- a/image/main.c +++ b/image/main.c @@ -35,6 +35,7 @@ #include "utils.h" #include "volumes.h" #include "extent_io.h" +#include "extent-cache.h" #include "help.h" #include "image/metadump.h" #include "image/sanitize.h" @@ -112,6 +113,11 @@ struct mdrestore_struct { pthread_mutex_t mutex; pthread_cond_t cond; + /* + * Records system chunk ranges, so restore can use this to determine + * if an item is in chunk tree range. + */ + struct cache_tree sys_chunks; struct rb_root chunk_tree; struct rb_root physical_tree; struct list_head list; @@ -121,6 +127,8 @@ struct mdrestore_struct { u64 devid; u64 alloced_chunks; u64 last_physical_offset; + /* A quicker check for whether an item is in sys chunk range */ + u64 sys_chunk_end; u8 uuid[BTRFS_UUID_SIZE]; u8 fsid[BTRFS_FSID_SIZE]; @@ -1544,6 +1552,7 @@ static void mdrestore_destroy(struct mdrestore_struct *mdres, int num_threads) rb_erase(&entry->p, &mdres->physical_tree); free(entry); } + free_extent_cache_tree(&mdres->sys_chunks); pthread_mutex_lock(&mdres->mutex); mdres->done = 1; pthread_cond_broadcast(&mdres->cond); @@ -1607,6 +1616,7 @@ static int mdrestore_init(struct mdrestore_struct *mdres, pthread_mutex_init(&mdres->mutex, NULL); INIT_LIST_HEAD(&mdres->list); INIT_LIST_HEAD(&mdres->overlapping_chunks); + cache_tree_init(&mdres->sys_chunks); mdres->in = in; mdres->out = out; mdres->old_restore = old_restore; @@ -2025,6 +2035,92 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres, return ret; } +/* + * Add system chunks in super blocks into mdres->sys_chunks, so later + * we can determine if an item is a chunk tree block. 
+ */ +static int add_sys_array(struct mdrestore_struct *mdres, + struct btrfs_super_block *sb) +{ + struct btrfs_disk_key *disk_key; + struct btrfs_key key; + struct btrfs_chunk *chunk; + struct cache_extent *cache; + u32 cur_offset; + u32 len = 0; + u32 array_size; + u8 *array_ptr; + int ret; + + array_size = btrfs_super_sys_array_size(sb); + array_ptr = sb->sys_chunk_array; + cur_offset = 0; + + while (cur_offset < array_size) { + u32 num_stripes; + + disk_key = (struct btrfs_disk_key *)array_ptr; + len = sizeof(*disk_key); + if (cur_offset + len > array_size) + goto out_short_read; + btrfs_disk_key_to_cpu(&key, disk_key); + + array_ptr += len; + cur_offset += len; + + if (key.type == BTRFS_CHUNK_ITEM_KEY) { + chunk = (struct btrfs_chunk *)array_ptr; + + /* + * At least one btrfs_chunk with one stripe must be + * present, exact stripe count check comes afterwards + */ + len = btrfs_chunk_item_size(1); + if (cur_offset + len > array_size) + goto out_short_read; + num_stripes = btrfs_stack_chunk_num_stripes(chunk); + if (!num_stripes) { + printk( + "ERROR: invalid number of stripes %u in sys_array at offset %u\n", + num_stripes, cur_offset); + ret = -EIO; + break; + } + len = btrfs_chunk_item_size(num_stripes); + if (cur_offset + len > array_size) + goto out_short_read; + if (btrfs_stack_chunk_type(chunk) & + BTRFS_BLOCK_GROUP_SYSTEM) { + ret = add_merge_cache_extent(&mdres->sys_chunks, + key.offset, + btrfs_stack_chunk_length(chunk)); + if (ret < 0) + break; + } + } else { + error("unexpected item type %u in sys_array offset %u", + key.type, cur_offset); + ret = -EUCLEAN; + break; + } + array_ptr += len; + cur_offset += len; + } + + /* Get the last system chunk end as a quicker check */ + cache = last_cache_extent(&mdres->sys_chunks); + if (!cache) { + error("no system chunk found in super block"); + return -EUCLEAN; + } + mdres->sys_chunk_end = cache->start + cache->size - 1; + return ret; +out_short_read: + error("sys_array too short to read %u bytes at offset 
%u\n", + len, cur_offset); + return -EUCLEAN; +} + static int build_chunk_tree(struct mdrestore_struct *mdres, struct meta_cluster *cluster) { @@ -2117,6 +2213,13 @@ static int build_chunk_tree(struct mdrestore_struct *mdres, error("invalid superblock"); return ret; } + ret = add_sys_array(mdres, super); + if (ret < 0) { + error("failed to read system chunk array"); + free(buffer); + pthread_mutex_unlock(&mdres->mutex); + return ret; + } chunk_root_bytenr = btrfs_super_chunk_root(super); mdres->nodesize = btrfs_super_nodesize(super); if (btrfs_super_incompat_flags(super) & From patchwork Thu Jun 6 11:06:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10979289 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D550676 for ; Thu, 6 Jun 2019 11:07:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C35CA280B0 for ; Thu, 6 Jun 2019 11:07:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B75DB2875A; Thu, 6 Jun 2019 11:07:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 58E65280B0 for ; Thu, 6 Jun 2019 11:07:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727943AbfFFLHD (ORCPT ); Thu, 6 Jun 2019 07:07:03 -0400 Received: from mx2.suse.de ([195.135.220.15]:34940 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725784AbfFFLHD (ORCPT ); Thu, 6 Jun 2019 
07:07:03 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id C0E72AE54 for ; Thu, 6 Jun 2019 11:07:01 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 8/9] btrfs-progs: image: Introduce helper to determine if a tree block is in the range of system chunks Date: Thu, 6 Jun 2019 19:06:10 +0800 Message-Id: <20190606110611.27176-9-wqu@suse.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190606110611.27176-1-wqu@suse.com> References: <20190606110611.27176-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Introduce a new helper function, is_in_sys_chunks(), to determine if an item is in the range of system chunks. Since btrfs-image merges adjacent extents of the same type into one item, this function is designed to return true if any byte of the item falls inside a system chunk range. Signed-off-by: Qu Wenruo --- image/main.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/image/main.c b/image/main.c index 0460a5f5..dc677409 100644 --- a/image/main.c +++ b/image/main.c @@ -1780,6 +1780,54 @@ static int wait_for_worker(struct mdrestore_struct *mdres) return ret; } +/* + * Check if a range [start, start + len) has ANY bytes covered by + * system chunk ranges. 
+ */ +static bool is_in_sys_chunks(struct mdrestore_struct *mdres, u64 start, + u64 len) +{ + struct rb_node *node = mdres->sys_chunks.root.rb_node; + struct cache_extent *entry; + struct cache_extent *next; + struct cache_extent *prev; + + if (start > mdres->sys_chunk_end) + return false; + + while (node) { + entry = rb_entry(node, struct cache_extent, rb_node); + if (start > entry->start) { + if (!node->rb_right) + break; + node = node->rb_right; + } else if (start < entry->start) { + if (!node->rb_left) + break; + node = node->rb_left; + } else { + /* already in a system chunk */ + return true; + } + } + if (!node) + return false; + entry = rb_entry(node, struct cache_extent, rb_node); + /* Now we have entry which is the nearest chunk around @start */ + if (start > entry->start) { + prev = entry; + next = next_cache_extent(entry); + } else { + prev = prev_cache_extent(entry); + next = entry; + } + if (prev && prev->start + prev->size > start) + return true; + if (next && start + len > next->start) + return true; + return false; +} + static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer, u64 bytenr, u64 item_bytenr, u32 bufsize, u64 cluster_bytenr) From patchwork Thu Jun 6 11:06:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 10979291 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 57AF376 for ; Thu, 6 Jun 2019 11:07:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 44223280B0 for ; Thu, 6 Jun 2019 11:07:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 387E92875A; Thu, 6 Jun 2019 11:07:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: 
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5C10E280B0 for ; Thu, 6 Jun 2019 11:07:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727949AbfFFLHH (ORCPT ); Thu, 6 Jun 2019 07:07:07 -0400 Received: from mx2.suse.de ([195.135.220.15]:34946 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725784AbfFFLHH (ORCPT ); Thu, 6 Jun 2019 07:07:07 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 6E3B2AE54 for ; Thu, 6 Jun 2019 11:07:05 +0000 (UTC) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 9/9] btrfs-progs: image: Rework how we search chunk tree blocks Date: Thu, 6 Jun 2019 19:06:11 +0800 Message-Id: <20190606110611.27176-10-wqu@suse.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190606110611.27176-1-wqu@suse.com> References: <20190606110611.27176-1-wqu@suse.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Before this patch, we used a very inefficient way to search chunks: We iterate through all clusters to find the chunk root tree block first, then re-iterate through all clusters to find every child tree block. Every time, we need to iterate through all clusters just to find a single chunk tree block. This is obviously inefficient, especially when the chunk tree gets larger. So the original author left a comment on it: /* If you have to ask you aren't worthy */ static int search_for_chunk_blocks() This patch changes the behavior so that we only iterate through all clusters once. 
The idea behind the optimization is that, since the superblock is restored first, we can use the CHUNK_ITEMs in super_block::sys_chunk_array to build a SYSTEM chunk mapping. Then, when we iterate through all items, we can easily skip unrelated items at different levels: - At cluster level If a cluster starts beyond the last system chunk map, it cannot contain any chunk tree blocks (as chunk tree blocks only live inside system chunks) - At item level If an item has no intersection with any system chunk map, it cannot contain any chunk tree blocks. This way, we can iterate through all clusters just once and find all CHUNK_ITEMs. Signed-off-by: Qu Wenruo --- image/main.c | 213 +++++++++++++++++++++++++++------------------------ 1 file changed, 113 insertions(+), 100 deletions(-) diff --git a/image/main.c b/image/main.c index dc677409..8cecb228 100644 --- a/image/main.c +++ b/image/main.c @@ -142,8 +142,6 @@ struct mdrestore_struct { struct btrfs_fs_info *info; }; -static int search_for_chunk_blocks(struct mdrestore_struct *mdres, - u64 search, u64 cluster_bytenr); static struct extent_buffer *alloc_dummy_eb(u64 bytenr, u32 size); static void csum_block(u8 *buf, size_t len) @@ -1828,67 +1826,17 @@ static bool is_in_sys_chunks(struct mdrestore_struct *mdres, u64 start, return false; } -static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer, - u64 bytenr, u64 item_bytenr, u32 bufsize, - u64 cluster_bytenr) +static int read_chunk_tree_block(struct mdrestore_struct *mdres, + struct extent_buffer *eb) { - struct extent_buffer *eb; - int ret = 0; int i; - eb = alloc_dummy_eb(bytenr, mdres->nodesize); - if (!eb) { - ret = -ENOMEM; - goto out; - } - - while (item_bytenr != bytenr) { - buffer += mdres->nodesize; - item_bytenr += mdres->nodesize; - } - - memcpy(eb->data, buffer, mdres->nodesize); - if (btrfs_header_bytenr(eb) != bytenr) { - error("eb bytenr does not match found bytenr: %llu != %llu", - (unsigned long long)btrfs_header_bytenr(eb), - 
(unsigned long long)bytenr); - ret = -EIO; - goto out; - } - - if (memcmp(mdres->fsid, eb->data + offsetof(struct btrfs_header, fsid), - BTRFS_FSID_SIZE)) { - error("filesystem metadata UUID of eb %llu does not match", - (unsigned long long)bytenr); - ret = -EIO; - goto out; - } - - if (btrfs_header_owner(eb) != BTRFS_CHUNK_TREE_OBJECTID) { - error("wrong eb %llu owner %llu", - (unsigned long long)bytenr, - (unsigned long long)btrfs_header_owner(eb)); - ret = -EIO; - goto out; - } - for (i = 0; i < btrfs_header_nritems(eb); i++) { struct btrfs_chunk *chunk; struct fs_chunk *fs_chunk; struct btrfs_key key; u64 type; - if (btrfs_header_level(eb)) { - u64 blockptr = btrfs_node_blockptr(eb, i); - - ret = search_for_chunk_blocks(mdres, blockptr, - cluster_bytenr); - if (ret) - break; - continue; - } - - /* Yay a leaf! We loves leafs! */ btrfs_item_key_to_cpu(eb, &key, i); if (key.type != BTRFS_CHUNK_ITEM_KEY) continue; @@ -1896,8 +1844,7 @@ static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer, fs_chunk = malloc(sizeof(struct fs_chunk)); if (!fs_chunk) { error("not enough memory to allocate chunk"); - ret = -ENOMEM; - break; + return -ENOMEM; } memset(fs_chunk, 0, sizeof(*fs_chunk)); chunk = btrfs_item_ptr(eb, i, struct btrfs_chunk); @@ -1906,19 +1853,18 @@ static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer, fs_chunk->physical = btrfs_stripe_offset_nr(eb, chunk, 0); fs_chunk->bytes = btrfs_chunk_length(eb, chunk); INIT_LIST_HEAD(&fs_chunk->list); + if (tree_search(&mdres->physical_tree, &fs_chunk->p, physical_cmp, 1) != NULL) list_add(&fs_chunk->list, &mdres->overlapping_chunks); else tree_insert(&mdres->physical_tree, &fs_chunk->p, physical_cmp); - type = btrfs_chunk_type(eb, chunk); if (type & BTRFS_BLOCK_GROUP_DUP) { fs_chunk->physical_dup = btrfs_stripe_offset_nr(eb, chunk, 1); } - if (fs_chunk->physical_dup + fs_chunk->bytes > mdres->last_physical_offset) mdres->last_physical_offset = fs_chunk->physical_dup + @@ -1933,19 
+1879,80 @@ static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer, mdres->alloced_chunks += fs_chunk->bytes; tree_insert(&mdres->chunk_tree, &fs_chunk->l, chunk_cmp); } -out: + return 0; +} + +static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer, + u64 item_bytenr, u32 bufsize, + u64 cluster_bytenr) +{ + struct extent_buffer *eb; + u32 nodesize = mdres->nodesize; + u64 bytenr; + size_t cur_offset; + int ret = 0; + + eb = alloc_dummy_eb(0, mdres->nodesize); + if (!eb) + return -ENOMEM; + + for (cur_offset = 0; cur_offset < bufsize; cur_offset += nodesize) { + bytenr = item_bytenr + cur_offset; + if (!is_in_sys_chunks(mdres, bytenr, nodesize)) + continue; + memcpy(eb->data, buffer + cur_offset, nodesize); + if (btrfs_header_bytenr(eb) != bytenr) { + error( + "eb bytenr does not match found bytenr: %llu != %llu", + (unsigned long long)btrfs_header_bytenr(eb), + (unsigned long long)bytenr); + ret = -EUCLEAN; + break; + } + if (memcmp(mdres->fsid, eb->data + + offsetof(struct btrfs_header, fsid), + BTRFS_FSID_SIZE)) { + error( + "filesystem metadata UUID of eb %llu does not match", + bytenr); + ret = -EUCLEAN; + break; + } + if (btrfs_header_owner(eb) != BTRFS_CHUNK_TREE_OBJECTID) { + error("wrong eb %llu owner %llu", + (unsigned long long)bytenr, + (unsigned long long)btrfs_header_owner(eb)); + ret = -EUCLEAN; + break; + } + /* + * No need to search node, as we will iterate all tree blocks + * in chunk tree, only need to bother leaves. + */ + if (btrfs_header_level(eb)) + continue; + ret = read_chunk_tree_block(mdres, eb); + if (ret < 0) + break; + } free(eb); return ret; } -/* If you have to ask you aren't worthy */ -static int search_for_chunk_blocks(struct mdrestore_struct *mdres, - u64 search, u64 cluster_bytenr) +/* + * This function will try to find all chunk items in the dump image. + * + * This function will iterate all clusters, and find any item inside + * system chunk ranges. 
+ * For such item, it will try to read them as tree blocks, and find + * CHUNK_ITEMs, add them to @mdres. + */ +static int search_for_chunk_blocks(struct mdrestore_struct *mdres) { struct meta_cluster *cluster; struct meta_cluster_header *header; struct meta_cluster_item *item; - u64 current_cluster = cluster_bytenr, bytenr; + u64 current_cluster = 0, bytenr; u64 item_bytenr; u32 bufsize, nritems, i; u32 max_size = current_version->max_pending_size * 2; @@ -1976,43 +1983,45 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres, } bytenr = current_cluster; + /* Main loop, iterating all clusters */ while (1) { if (fseek(mdres->in, current_cluster, SEEK_SET)) { error("seek failed: %m"); ret = -EIO; - break; + goto out; } ret = fread(cluster, BLOCK_SIZE, 1, mdres->in); if (ret == 0) { - if (cluster_bytenr != 0) { - cluster_bytenr = 0; - current_cluster = 0; - bytenr = 0; - continue; - } + if (feof(mdres->in)) + goto out; error( "unknown state after reading cluster at %llu, probably corrupted data", - cluster_bytenr); + current_cluster); ret = -EIO; - break; + goto out; } else if (ret < 0) { error("unable to read image at %llu: %m", - (unsigned long long)cluster_bytenr); - break; + current_cluster); + goto out; } - ret = 0; header = &cluster->header; if (le64_to_cpu(header->magic) != current_version->magic_cpu || le64_to_cpu(header->bytenr) != current_cluster) { error("bad header in metadump image"); ret = -EIO; - break; + goto out; } + /* We're already over the system chunk end, no need to search*/ + if (current_cluster > mdres->sys_chunk_end) + goto out; + bytenr += BLOCK_SIZE; nritems = le32_to_cpu(header->nritems); + + /* Search items for tree blocks in sys chunks */ for (i = 0; i < nritems; i++) { size_t size; @@ -2020,11 +2029,21 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres, bufsize = le32_to_cpu(item->size); item_bytenr = le64_to_cpu(item->bytenr); - if (bufsize > max_size) { - error("item %u too big: %u > %u", i, bufsize, - 
max_size); - ret = -EIO; - break; + /* + * Only data extent/free space cache can be that big, + * adjacent tree blocks won't be able to be merged + * beyond max_size. + */ + if (bufsize > max_size || + !is_in_sys_chunks(mdres, item_bytenr, bufsize)) { + ret = fseek(mdres->in, bufsize, SEEK_CUR); + if (ret < 0) { + error("failed to seek: %m"); + ret = -errno; + goto out; + } + bytenr += bufsize; + continue; } if (mdres->compress_method == COMPRESS_ZLIB) { @@ -2032,7 +2051,7 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres, if (ret != 1) { error("read error: %m"); ret = -EIO; - break; + goto out; } size = max_size; @@ -2043,40 +2062,36 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres, error("decompression failed with %d", ret); ret = -EIO; - break; + goto out; } } else { ret = fread(buffer, bufsize, 1, mdres->in); if (ret != 1) { error("read error: %m"); ret = -EIO; - break; + goto out; } size = bufsize; } ret = 0; - if (item_bytenr <= search && - item_bytenr + size > search) { - ret = read_chunk_block(mdres, buffer, search, - item_bytenr, size, - current_cluster); - if (!ret) - ret = 1; - break; + ret = read_chunk_block(mdres, buffer, + item_bytenr, size, + current_cluster); + if (ret < 0) { + error( + "failed to search tree blocks in item bytenr %llu size %zu", + item_bytenr, size); + goto out; } bytenr += bufsize; } - if (ret) { - if (ret > 0) - ret = 0; - break; - } if (bytenr & BLOCK_MASK) bytenr += BLOCK_SIZE - (bytenr & BLOCK_MASK); current_cluster = bytenr; } +out: free(tmp); free(buffer); free(cluster); @@ -2175,7 +2190,6 @@ static int build_chunk_tree(struct mdrestore_struct *mdres, struct btrfs_super_block *super; struct meta_cluster_header *header; struct meta_cluster_item *item = NULL; - u64 chunk_root_bytenr = 0; u32 i, nritems; u64 bytenr = 0; u8 *buffer; @@ -2268,7 +2282,6 @@ static int build_chunk_tree(struct mdrestore_struct *mdres, pthread_mutex_unlock(&mdres->mutex); return ret; } - chunk_root_bytenr = 
btrfs_super_chunk_root(super); mdres->nodesize = btrfs_super_nodesize(super); if (btrfs_super_incompat_flags(super) & BTRFS_FEATURE_INCOMPAT_METADATA_UUID) @@ -2281,7 +2294,7 @@ static int build_chunk_tree(struct mdrestore_struct *mdres, free(buffer); pthread_mutex_unlock(&mdres->mutex); - return search_for_chunk_blocks(mdres, chunk_root_bytenr, 0); + return search_for_chunk_blocks(mdres); } static int range_contains_super(u64 physical, u64 bytes)