From patchwork Tue Dec 6 08:23:28 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13065514
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PoC PATCH 01/11] btrfs: scrub: introduce the structure for new
 BTRFS_STRIPE_LEN based interface
Date: Tue, 6 Dec 2022 16:23:28 +0800
Message-Id: <5fc73bee19c14a6e881e0b785837cc5e41876267.1670314744.git.wqu@suse.com>
X-Mailer: git-send-email 2.38.1
X-Mailing-List: linux-btrfs@vger.kernel.org

These new structures will have "scrub2_" as their prefix, along with
their alloc and free functions.

The basic idea is to keep the existing per-device scrub behavior, but
get rid of the bio form shaping by always reading the full
BTRFS_STRIPE_LEN range. This means we will read some sectors which are
not the scrub target, but that's fine. At write back time we still only
submit repaired sectors.

With every read submitted in BTRFS_STRIPE_LEN units, there should not be
much need for a complex bio form shaping mechanism.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 156 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h |   6 ++
 2 files changed, 162 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 52b346795f66..286bdcb8b7ad 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -70,6 +70,101 @@ struct scrub_ctx;
  */
 #define BTRFS_MAX_MIRRORS (4 + 1)
 
+/*
+ * Represent one sector inside a scrub2_stripe.
+ * Contains all the info to verify the sector.
+ */
+struct scrub2_sector {
+	bool is_metadata;
+
+	union {
+		/*
+		 * Csum pointer for data csum verification.
+		 * Should point to a sector csum inside scrub2_stripe::csums.
+		 *
+		 * NULL if this data sector has no csum.
+		 */
+		u8 *csum;
+
+		/*
+		 * Extra info for metadata verification.
+		 * All sectors inside a tree block share the same
+		 * generation.
+		 */
+		u64 generation;
+	};
+};
+
+/*
+ * Represent one continuous range with a length of BTRFS_STRIPE_LEN.
+ */
+struct scrub2_stripe {
+	struct btrfs_block_group *bg;
+
+	struct page *pages[BTRFS_STRIPE_LEN / PAGE_SIZE];
+	struct scrub2_sector *sectors;
+
+	u64 logical;
+	/*
+	 * We use btrfs_submit_bio() infrastructure, thus logical + mirror_num
+	 * is enough to locate one stripe.
+	 */
+	u16 mirror_num;
+
+	/* Should be BTRFS_STRIPE_LEN / sectorsize. */
+	u16 nr_sectors;
+
+	atomic_t pending_io;
+	wait_queue_head_t io_wait;
+
+	/* Indicates which sectors are covered by extent items. */
+	unsigned long used_sector_bitmap;
+
+	/*
+	 * Records the errors found after the initial read.
+	 * This will be used for repair, as any sector with an error needs
+	 * repair (if a good copy can be found).
+	 */
+	unsigned long init_error_bitmap;
+
+	/*
+	 * After reading another copy and verification, sectors that can be
+	 * repaired will be cleared.
+	 */
+	unsigned long current_error_bitmap;
+
+	/*
+	 * The following error bitmaps are all for the initial read operation.
+	 * After the initial read, we should not touch those error bitmaps, as
+	 * they will later be used to do error reporting.
+	 *
+	 * Indicates IO errors during read.
+	 */
+	unsigned long io_error_bitmap;
+
+	/* For both metadata and data. */
+	unsigned long csum_error_bitmap;
+
+	/*
+	 * Indicates metadata specific errors.
+	 * (from basic sanity checks to transid errors)
+	 */
+	unsigned long meta_error_bitmap;
+
+	/*
+	 * Checksum for the whole stripe if this stripe is inside a data block
+	 * group.
+	 */
+	u8 *csums;
+
+	/*
+	 * Used to verify any tree block if this stripe is inside a meta block
+	 * group.
+	 * We reuse the same eb for all metadata of the same stripe.
+	 */
+	struct extent_buffer *dummy_eb;
+};
+
 struct scrub_recover {
 	refcount_t refs;
 	struct btrfs_io_context *bioc;
@@ -266,6 +361,67 @@ static void detach_scrub_page_private(struct page *page)
 #endif
 }
 
+static void free_scrub2_stripe(struct scrub2_stripe *stripe)
+{
+	int i;
+
+	if (!stripe)
+		return;
+
+	for (i = 0; i < BTRFS_STRIPE_LEN >> PAGE_SHIFT; i++) {
+		if (stripe->pages[i])
+			__free_page(stripe->pages[i]);
+	}
+	kfree(stripe->sectors);
+	kfree(stripe->csums);
+	if (stripe->dummy_eb)
+		free_extent_buffer(stripe->dummy_eb);
+	kfree(stripe);
+}
+
+struct scrub2_stripe *alloc_scrub2_stripe(struct btrfs_fs_info *fs_info,
+					  struct btrfs_block_group *bg)
+{
+	struct scrub2_stripe *stripe;
+	int ret;
+
+	stripe = kzalloc(sizeof(*stripe), GFP_KERNEL);
+	if (!stripe)
+		return NULL;
+
+	init_waitqueue_head(&stripe->io_wait);
+	atomic_set(&stripe->pending_io, 0);
+
+	stripe->nr_sectors = BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits;
+
+	ret = btrfs_alloc_page_array(BTRFS_STRIPE_LEN >> PAGE_SHIFT,
+				     stripe->pages);
+	if (ret < 0)
+		goto cleanup;
+
+	stripe->sectors = kcalloc(stripe->nr_sectors,
+				  sizeof(struct scrub2_sector), GFP_KERNEL);
+	if (!stripe->sectors)
+		goto cleanup;
+
+	if (bg->flags & BTRFS_BLOCK_GROUP_METADATA) {
+		stripe->dummy_eb = alloc_dummy_extent_buffer(fs_info, 0);
+		if (!stripe->dummy_eb)
+			goto cleanup;
+	}
+	if (bg->flags & BTRFS_BLOCK_GROUP_DATA) {
+		stripe->csums = kzalloc(
+			(BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits) *
+			fs_info->csum_size, GFP_KERNEL);
+		if (!stripe->csums)
+			goto cleanup;
+	}
+	return stripe;
+cleanup:
+	free_scrub2_stripe(stripe);
+	return NULL;
+}
+
 static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx,
 					     struct btrfs_device *dev,
 					     u64 logical, u64 physical,
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 7639103ebf9d..d278c0f43007 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -13,4 +13,10 @@ int btrfs_scrub_cancel_dev(struct btrfs_device *dev);
 int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
 			 struct btrfs_scrub_progress *progress);
 
+/*
+ * The following functions are temporary exports to avoid warning on unused
+ * static functions.
+ */
+struct scrub2_stripe *alloc_scrub2_stripe(struct btrfs_fs_info *fs_info,
+					  struct btrfs_block_group *bg);
 #endif

From patchwork Tue Dec 6 08:23:29 2022
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13065515
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PoC PATCH 02/11] btrfs: scrub: introduce a helper to find and fill
 the sector info for a scrub2_stripe
Date: Tue, 6 Dec 2022 16:23:29 +0800
Message-Id: <9060f52ffb145f99f0bcb22cff62be7fb8aca580.1670314744.git.wqu@suse.com>
X-Mailer: git-send-email 2.38.1
X-Mailing-List: linux-btrfs@vger.kernel.org

The new helper will search the extent tree to find the first extent of a
logical range, then fill the sectors array with two loops:

- Loop 1 to fill common bits and metadata generation

- Loop 2 to fill csum data (only for data bgs)
  This loop will use the new btrfs_lookup_csums_bitmap() to fill the
  full csum buffer, and set the scrub2_sector::csum pointer.

With all the needed info filled in by this function, later we only need
to submit and verify the stripe.

Here we temporarily export the helper to avoid a warning on an unused
static function.
Signed-off-by: Qu Wenruo
---
 fs/btrfs/file-item.c |   8 ++-
 fs/btrfs/file-item.h |   3 +-
 fs/btrfs/raid56.c    |   2 +-
 fs/btrfs/scrub.c     | 131 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h     |   7 +++
 5 files changed, 148 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 5de73466b2ca..67a0fc54c95e 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -671,7 +671,8 @@ int btrfs_lookup_csums_list(struct btrfs_root *root, u64 start, u64 end,
  * in is large enough to contain all csums.
  */
 int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end,
-			      u8 *csum_buf, unsigned long *csum_bitmap)
+			      u8 *csum_buf, unsigned long *csum_bitmap,
+			      bool search_commit)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_key key;
@@ -687,6 +688,11 @@ int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end,
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
+	if (search_commit) {
+		path->skip_locking = 1;
+		path->reada = READA_FORWARD;
+		path->search_commit_root = 1;
+	}
 
 	key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
 	key.type = BTRFS_EXTENT_CSUM_KEY;
diff --git a/fs/btrfs/file-item.h b/fs/btrfs/file-item.h
index 031225668434..64f4e5ca394a 100644
--- a/fs/btrfs/file-item.h
+++ b/fs/btrfs/file-item.h
@@ -55,7 +55,8 @@ int btrfs_lookup_csums_list(struct btrfs_root *root, u64 start, u64 end,
 			    struct list_head *list, int search_commit,
 			    bool nowait);
 int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end,
-			      u8 *csum_buf, unsigned long *csum_bitmap);
+			      u8 *csum_buf, unsigned long *csum_bitmap,
+			      bool search_commit);
 void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 				     const struct btrfs_path *path,
 				     struct btrfs_file_extent_item *fi,
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 2d90a6b5eb00..cea448f4eda3 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2171,7 +2171,7 @@ static void fill_data_csums(struct btrfs_raid_bio *rbio)
 	}
 
 	ret = btrfs_lookup_csums_bitmap(csum_root, start, start + len - 1,
-					rbio->csum_buf, rbio->csum_bitmap);
+					rbio->csum_buf, rbio->csum_bitmap, false);
 	if (ret < 0)
 		goto error;
 
 	if (bitmap_empty(rbio->csum_bitmap, len >> fs_info->sectorsize_bits))
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 286bdcb8b7ad..f4632fca5e67 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3526,6 +3526,137 @@ static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical,
 	return ret;
 }
 
+static void fill_one_extent_info(struct btrfs_fs_info *fs_info,
+				 struct scrub2_stripe *stripe,
+				 u64 extent_start, u64 extent_len,
+				 u64 extent_flags, u64 extent_gen)
+{
+	u64 cur_logical;
+
+	for (cur_logical = max(stripe->logical, extent_start);
+	     cur_logical < min(stripe->logical + BTRFS_STRIPE_LEN,
+			       extent_start + extent_len);
+	     cur_logical += fs_info->sectorsize) {
+		const int nr_sector = (cur_logical - stripe->logical) >>
+				      fs_info->sectorsize_bits;
+		struct scrub2_sector *sector = &stripe->sectors[nr_sector];
+
+		set_bit(nr_sector, &stripe->used_sector_bitmap);
+		if (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
+			sector->is_metadata = true;
+			sector->generation = extent_gen;
+		}
+	}
+}
+
+/*
+ * Locate one stripe which has at least one extent in its range.
+ *
+ * Return 0 if we found such a stripe, and store its info in @stripe.
+ * Return >0 if there is no such stripe in the specified range.
+ * Return <0 for error.
+ */
+int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root,
+				  struct btrfs_root *csum_root,
+				  struct btrfs_block_group *bg,
+				  u64 logical_start, u64 logical_len,
+				  struct scrub2_stripe *stripe)
+{
+	struct btrfs_fs_info *fs_info = extent_root->fs_info;
+	const u64 logical_end = logical_start + logical_len;
+	struct btrfs_path path = { 0 };
+	u64 cur_logical = logical_start;
+	u64 stripe_end;
+	u64 extent_start;
+	u64 extent_len;
+	u64 extent_flags;
+	u64 extent_gen;
+	int ret;
+
+	memset(stripe->sectors, 0, sizeof(struct scrub2_sector) * stripe->nr_sectors);
+	bitmap_zero(&stripe->init_error_bitmap, stripe->nr_sectors);
+	bitmap_zero(&stripe->current_error_bitmap, stripe->nr_sectors);
+
+	/* The range must be inside the bg. */
+	ASSERT(logical_start >= bg->start && logical_end <= bg->start + bg->length);
+
+	path.search_commit_root = 1;
+	path.skip_locking = 1;
+
+	ret = find_first_extent_item(extent_root, &path, logical_start,
+				     logical_len);
+	/* Either error or not found. */
+	if (ret)
+		goto out;
+	get_extent_info(&path, &extent_start, &extent_len,
+			&extent_flags, &extent_gen);
+	cur_logical = max(extent_start, cur_logical);
+
+	/*
+	 * Round down to stripe boundary.
+	 *
+	 * The extra calculation against bg->start is to handle block groups
+	 * whose logical bytenr is not BTRFS_STRIPE_LEN aligned.
+	 */
+	stripe->logical = round_down(cur_logical - bg->start, BTRFS_STRIPE_LEN) +
+			  bg->start;
+	stripe_end = stripe->logical + BTRFS_STRIPE_LEN - 1;
+
+	/* Fill the first extent info into stripe->sectors[] array. */
+	fill_one_extent_info(fs_info, stripe, extent_start, extent_len,
+			     extent_flags, extent_gen);
+	cur_logical = extent_start + extent_len;
+
+	/* Fill the extent info for the remaining sectors. */
+	while (cur_logical <= stripe_end) {
+		ret = find_first_extent_item(extent_root, &path, cur_logical,
+					     stripe_end - cur_logical + 1);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+		get_extent_info(&path, &extent_start, &extent_len,
+				&extent_flags, &extent_gen);
+		fill_one_extent_info(fs_info, stripe, extent_start, extent_len,
+				     extent_flags, extent_gen);
+		cur_logical = extent_start + extent_len;
+	}
+
+	/* Now fill the data csum. */
+	if (bg->flags & BTRFS_BLOCK_GROUP_DATA) {
+		int sector_nr;
+		unsigned long csum_bitmap = 0;
+
+		/* Csum space should have already been allocated. */
+		ASSERT(stripe->csums);
+
+		/*
+		 * Our csum bitmap should be large enough, as BTRFS_STRIPE_LEN
+		 * should contain at most 16 sectors.
+		 */
+		ASSERT(BITS_PER_LONG >=
+		       BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits);
+
+		ret = btrfs_lookup_csums_bitmap(csum_root, stripe->logical,
+						stripe_end, stripe->csums,
+						&csum_bitmap, true);
+		if (ret < 0)
+			goto out;
+		if (ret > 0)
+			ret = 0;
+
+		for_each_set_bit(sector_nr, &csum_bitmap, stripe->nr_sectors) {
+			stripe->sectors[sector_nr].csum = stripe->csums +
+				sector_nr * fs_info->csum_size;
+		}
+	}
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
 /*
  * Scrub one range which can only has simple mirror based profile.
  * (Including all range in SINGLE/DUP/RAID1/RAID1C*, and each stripe in
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index d278c0f43007..0b2a89f7a2e0 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -17,6 +17,13 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
  * The following functions are temporary exports to avoid warning on unused
  * static functions.
  */
+struct scrub2_stripe;
 struct scrub2_stripe *alloc_scrub2_stripe(struct btrfs_fs_info *fs_info,
 					  struct btrfs_block_group *bg);
+int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root,
+				  struct btrfs_root *csum_root,
+				  struct btrfs_block_group *bg,
+				  u64 logical_start, u64 logical_len,
+				  struct scrub2_stripe *stripe);
+
 #endif

From patchwork Tue Dec 6 08:23:30 2022
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13065516
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PoC PATCH 03/11] btrfs: scrub: introduce a helper to verify one
 scrub2_stripe
Date: Tue, 6 Dec 2022 16:23:30 +0800
X-Mailer: git-send-email 2.38.1
X-Mailing-List: linux-btrfs@vger.kernel.org

The new helper, scrub2_verify_stripe(), is not much different from the
old scrub way. The difference is mostly in how we handle metadata
verification. This version will use a dummy extent buffer so we can
share all the existing metadata verification code.

Currently the helper will only verify and update the error bitmaps; we
don't yet output any error message, as that can only be done after we
have either repaired the stripe or exhausted all the mirrors.
Signed-off-by: Qu Wenruo
---
 fs/btrfs/disk-io.c |  10 +--
 fs/btrfs/disk-io.h |   2 +
 fs/btrfs/scrub.c   | 153 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h   |   1 +
 4 files changed, 161 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0888d484df80..e2b91f14d14a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -98,7 +98,7 @@ struct async_submit_bio {
 /*
  * Compute the csum of a btree block and store the result to provided buffer.
  */
-static void csum_tree_block(struct extent_buffer *buf, u8 *result)
+void btrfs_csum_tree_block(struct extent_buffer *buf, u8 *result)
 {
 	struct btrfs_fs_info *fs_info = buf->fs_info;
 	const int num_pages = num_extent_pages(buf);
@@ -337,7 +337,7 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
 	ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
 				    offsetof(struct btrfs_header, fsid),
 				    BTRFS_FSID_SIZE) == 0);
-	csum_tree_block(eb, result);
+	btrfs_csum_tree_block(eb, result);
 
 	if (btrfs_header_level(eb))
 		ret = btrfs_check_node(eb);
@@ -448,7 +448,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 	return csum_one_extent_buffer(eb);
 }
 
-static int check_tree_block_fsid(struct extent_buffer *eb)
+int btrfs_check_tree_block_fsid(struct extent_buffer *eb)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices, *seed_devs;
@@ -499,7 +499,7 @@ static int validate_extent_buffer(struct extent_buffer *eb,
 		ret = -EIO;
 		goto out;
 	}
-	if (check_tree_block_fsid(eb)) {
+	if (btrfs_check_tree_block_fsid(eb)) {
 		btrfs_err_rl(fs_info, "bad fsid on logical %llu mirror %u",
 			     eb->start, eb->read_mirror);
 		ret = -EIO;
@@ -514,7 +514,7 @@ static int validate_extent_buffer(struct extent_buffer *eb,
 		goto out;
 	}
 
-	csum_tree_block(eb, result);
+	btrfs_csum_tree_block(eb, result);
 	header_csum = page_address(eb->pages[0]) +
 		      get_eb_offset_in_page(eb, offsetof(struct btrfs_header, csum));
 
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 363935cfc084..08c498b8c40d 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -44,6 +44,8 @@ void btrfs_clear_oneshot_options(struct btrfs_fs_info *fs_info);
 int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info);
 int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
 			   const struct btrfs_super_block *disk_sb);
+int btrfs_check_tree_block_fsid(struct extent_buffer *eb);
+void btrfs_csum_tree_block(struct extent_buffer *buf, u8 *result);
 int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices,
 		      char *options);
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f4632fca5e67..de194c31428e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3657,6 +3657,159 @@ int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root,
 	return ret;
 }
 
+static void scrub2_copy_sector_into_eb(struct scrub2_stripe *stripe,
+				       int sector_nr)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct extent_buffer *eb = stripe->dummy_eb;
+	int i;
+	const unsigned int sectors_per_tree = fs_info->nodesize >>
+					      fs_info->sectorsize_bits;
+
+	/* Our tree block should not cross stripe boundary. */
+	ASSERT(sector_nr >= 0 &&
+	       sector_nr + sectors_per_tree - 1 < stripe->nr_sectors);
+
+	eb->start = stripe->logical + (sector_nr << fs_info->sectorsize_bits);
+
+	for (i = sector_nr; i < sector_nr + sectors_per_tree; i++) {
+		int page_index = i << fs_info->sectorsize_bits >> PAGE_SHIFT;
+		void *src = page_address(stripe->pages[page_index]) +
+			    offset_in_page(i << fs_info->sectorsize_bits);
+
+		write_extent_buffer(eb, src,
+				    (i - sector_nr) << fs_info->sectorsize_bits,
+				    fs_info->sectorsize);
+	}
+}
+
+/*
+ * At this stage, we should only update the error bitmaps, not yet output
+ * any warning message.
+ * The warning messages would be output after exhausting all copies (without
+ * finding a good copy), or after repairing the stripe.
+ */
+static void scrub2_verify_one_metadata(struct scrub2_stripe *stripe,
+				       int sector_nr)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct extent_buffer *eb = stripe->dummy_eb;
+	const unsigned int sectors_per_tree = fs_info->nodesize >>
+					      fs_info->sectorsize_bits;
+	u8 result[BTRFS_CSUM_SIZE];
+	const u8 *header_csum;
+
+	scrub2_copy_sector_into_eb(stripe, sector_nr);
+
+	/* Basic sanity checks (bytenr and fsid) */
+	if (btrfs_header_bytenr(eb) !=
+	    stripe->logical + (sector_nr << fs_info->sectorsize_bits)) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		return;
+	}
+	if (btrfs_check_tree_block_fsid(eb)) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		return;
+	}
+	if (btrfs_header_level(eb) >= BTRFS_MAX_LEVEL) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		return;
+	}
+	btrfs_csum_tree_block(eb, result);
+	header_csum = page_address(eb->pages[0]) +
+		      get_eb_offset_in_page(eb, offsetof(struct btrfs_header, csum));
+	if (memcmp(result, header_csum, fs_info->csum_size) != 0) {
+		bitmap_set(&stripe->csum_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		return;
+	}
+	if (btrfs_header_generation(eb) !=
+	    stripe->sectors[sector_nr].generation) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		return;
+	}
+}
+
+static void scrub2_verify_one_sector(struct scrub2_stripe *stripe,
+				     int sector_nr)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct scrub2_sector *sector = &stripe->sectors[sector_nr];
+	const unsigned int sectors_per_tree = fs_info->nodesize >>
+					      fs_info->sectorsize_bits;
+	int page_index = sector_nr << fs_info->sectorsize_bits >> PAGE_SHIFT;
+	int pgoff = offset_in_page(sector_nr << fs_info->sectorsize_bits);
+
+	ASSERT(sector_nr >= 0 && sector_nr < stripe->nr_sectors);
+
+	/* Sector not utilized, skip it. */
+	if (!test_bit(sector_nr, &stripe->used_sector_bitmap))
+		return;
+
+	/* Metadata, verify the full tree block. */
+	if (sector->is_metadata) {
+		/*
+		 * Check if the tree block crosses the stripe boundary.
+		 * If it crosses the boundary, we cannot verify it and can
+		 * only give a warning.
+		 *
+		 * This can only happen on a very old fs where chunks are not
+		 * ensured to be stripe aligned.
+		 */
+		if (unlikely(sector_nr + sectors_per_tree >=
+			     stripe->nr_sectors)) {
+			btrfs_warn_rl(fs_info,
+			"tree block at %llu crosses stripe boundary %llu",
+				      stripe->logical +
+				      (sector_nr << fs_info->sectorsize_bits),
+				      stripe->logical);
+			return;
+		}
+		scrub2_verify_one_metadata(stripe, sector_nr);
+		return;
+	}
+
+	/* Data is much easier, we just verify the data csum (if we have one). */
+	if (sector->csum) {
+		int ret;
+		u8 csum_buf[BTRFS_CSUM_SIZE];
+
+		ret = btrfs_check_sector_csum(fs_info, stripe->pages[page_index],
+					      pgoff, csum_buf, sector->csum);
+		if (ret < 0)
+			set_bit(sector_nr, &stripe->csum_error_bitmap);
+	}
+}
+
+void scrub2_verify_one_stripe(struct scrub2_stripe *stripe)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	const unsigned int sectors_per_tree = fs_info->nodesize >>
+					      fs_info->sectorsize_bits;
+	int sector_nr;
+
+	for_each_set_bit(sector_nr, &stripe->used_sector_bitmap,
+			 stripe->nr_sectors) {
+		scrub2_verify_one_sector(stripe, sector_nr);
+		if (stripe->sectors[sector_nr].is_metadata)
+			sector_nr += sectors_per_tree - 1;
+	}
+	/*
+	 * All error bitmaps are updated. OR all errors into the
+	 * init_error_bitmap for later repair runs.
+	 */
+	bitmap_or(&stripe->init_error_bitmap, &stripe->io_error_bitmap,
+		  &stripe->csum_error_bitmap, stripe->nr_sectors);
+	bitmap_or(&stripe->init_error_bitmap, &stripe->init_error_bitmap,
+		  &stripe->meta_error_bitmap, stripe->nr_sectors);
+	bitmap_copy(&stripe->current_error_bitmap, &stripe->init_error_bitmap,
+		    stripe->nr_sectors);
+}
+
 /*
  * Scrub one range which can only has simple mirror based profile.
  * (Including all range in SINGLE/DUP/RAID1/RAID1C*, and each stripe in
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 0b2a89f7a2e0..0aaed61e4d4d 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -25,5 +25,6 @@ int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root,
 				  struct btrfs_block_group *bg,
 				  u64 logical_start, u64 logical_len,
 				  struct scrub2_stripe *stripe);
+void scrub2_verify_one_stripe(struct scrub2_stripe *stripe);
 
 #endif

From patchwork Tue Dec 6 08:23:31 2022
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13065517
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PoC PATCH 04/11] btrfs: scrub: add the repair function for
 scrub2_stripe
Date: Tue, 6 Dec 2022 16:23:31 +0800
X-Mailer: git-send-email 2.38.1
X-Mailing-List: linux-btrfs@vger.kernel.org

The new helper, scrub2_repair_one_stripe(), does the following work to
repair one scrub2_stripe:

- Go through each remaining mirror
- Submit a BTRFS_STRIPE_LEN read for the target mirror
- Run verification on the above read
- Copy repaired sectors back to the original stripe
  And clear the current_error_bitmap bit for the original stripe.
- Check if we repaired all the sectors

This is a little different from the original repair behavior:

- We only try the next mirror if the current mirror cannot repair all
  sectors, while the old behavior is to submit reads concurrently for
  all the remaining mirrors.
  I'd prefer to put the parallel read into the new scrub_fs interface
  instead. For the current per-device scrub, the sequential repair only
  makes a difference for RAID1C* and RAID6, thus I'd prefer cleaner
  code instead.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 158 ++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/scrub.h |   2 +-
 2 files changed, 158 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index de194c31428e..15c95cf88a2e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -24,6 +24,7 @@
 #include "accessors.h"
 #include "file-item.h"
 #include "scrub.h"
+#include "bio.h"
 
 /*
  * This is only the first step towards a full-features scrub. It reads all
@@ -416,12 +417,44 @@ struct scrub2_stripe *alloc_scrub2_stripe(struct btrfs_fs_info *fs_info,
 		if (!stripe->csums)
 			goto cleanup;
 	}
+	stripe->bg = bg;
 	return stripe;
 cleanup:
 	free_scrub2_stripe(stripe);
 	return NULL;
 }
 
+static struct scrub2_stripe *clone_scrub2_stripe(struct btrfs_fs_info *fs_info,
+						 const struct scrub2_stripe *src)
+{
+	struct scrub2_stripe *dst;
+	int sector_nr;
+
+	dst = alloc_scrub2_stripe(fs_info, src->bg);
+	if (!dst)
+		return NULL;
+
+	if (src->csums)
+		memcpy(dst->csums, src->csums,
+		       src->nr_sectors * fs_info->csum_size);
+	bitmap_copy(&dst->used_sector_bitmap, &src->used_sector_bitmap,
+		    src->nr_sectors);
+	for_each_set_bit(sector_nr, &src->used_sector_bitmap, src->nr_sectors) {
+		dst->sectors[sector_nr].is_metadata =
+			src->sectors[sector_nr].is_metadata;
+		/* For meta, copy the generation number. */
+		if (src->sectors[sector_nr].is_metadata) {
+			dst->sectors[sector_nr].generation =
+				src->sectors[sector_nr].generation;
+			continue;
+		}
+		/* For data, only update the csum pointer if there is a data csum. */
+		if (src->sectors[sector_nr].csum)
+			dst->sectors[sector_nr].csum = dst->csums +
+				sector_nr * fs_info->csum_size;
+	}
+	return dst;
+}
+
 static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx,
 					     struct btrfs_device *dev,
 					     u64 logical, u64 physical,
@@ -3750,6 +3783,10 @@ static void scrub2_verify_one_sector(struct scrub2_stripe *stripe,
 	if (!test_bit(sector_nr, &stripe->used_sector_bitmap))
 		return;
 
+	/* IO error, no need to check. */
+	if (test_bit(sector_nr, &stripe->io_error_bitmap))
+		return;
+
 	/* Metadata, verify the full tree block. */
 	if (sector->is_metadata) {
 		/*
@@ -3785,7 +3822,7 @@ static void scrub2_verify_one_sector(struct scrub2_stripe *stripe,
 	}
 }
 
-void scrub2_verify_one_stripe(struct scrub2_stripe *stripe)
+static void scrub2_verify_one_stripe(struct scrub2_stripe *stripe)
 {
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
 	const unsigned int sectors_per_tree = fs_info->nodesize >>
@@ -3810,6 +3847,125 @@ void scrub2_verify_one_stripe(struct scrub2_stripe *stripe)
 		    stripe->nr_sectors);
 }
 
+static void scrub2_read_endio(struct btrfs_bio *bbio)
+{
+	struct scrub2_stripe *stripe = bbio->private;
+
+	if (bbio->bio.bi_status) {
+		bitmap_set(&stripe->io_error_bitmap, 0, stripe->nr_sectors);
+		bitmap_set(&stripe->init_error_bitmap, 0, stripe->nr_sectors);
+	}
+	bio_put(&bbio->bio);
+	if (atomic_dec_and_test(&stripe->pending_io))
+		wake_up(&stripe->io_wait);
+}
+
+static void scrub2_read_and_wait_stripe(struct scrub2_stripe *stripe)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct bio *bio;
+	int ret;
+	int i;
+
+	ASSERT(stripe->mirror_num >= 1);
+	ASSERT(atomic_read(&stripe->pending_io) == 0);
+
+	bio = btrfs_bio_alloc(BTRFS_STRIPE_LEN >> PAGE_SHIFT, REQ_OP_READ,
+			      scrub2_read_endio, stripe);
+	/* Backed by mempool, should not fail.
*/ + ASSERT(bio); + + bio->bi_iter.bi_sector = stripe->logical >> SECTOR_SHIFT; + + for (i = 0; i < BTRFS_STRIPE_LEN >> PAGE_SHIFT; i++) { + ret = bio_add_page(bio, stripe->pages[i], PAGE_SIZE, 0); + ASSERT(ret == PAGE_SIZE); + } + atomic_inc(&stripe->pending_io); + btrfs_submit_bio(fs_info, bio, stripe->mirror_num); + wait_event(stripe->io_wait, atomic_read(&stripe->pending_io) == 0); + scrub2_verify_one_stripe(stripe); +} + +static void scrub2_repair_from_mirror(struct scrub2_stripe *orig, + struct scrub2_stripe *repair, + int mirror_num) +{ + struct btrfs_fs_info *fs_info = orig->bg->fs_info; + const int nr_sectors = orig->nr_sectors; + int sector_nr; + + /* Reset the error bitmaps for @repair stripe. */ + bitmap_zero(&repair->current_error_bitmap, nr_sectors); + bitmap_zero(&repair->io_error_bitmap, nr_sectors); + bitmap_zero(&repair->csum_error_bitmap, nr_sectors); + bitmap_zero(&repair->meta_error_bitmap, nr_sectors); + + repair->mirror_num = mirror_num; + scrub2_read_and_wait_stripe(repair); + + for_each_set_bit(sector_nr, &orig->used_sector_bitmap, nr_sectors) { + int page_index = (sector_nr << fs_info->sectorsize_bits) >> + PAGE_SHIFT; + int pgoff = offset_in_page(sector_nr << fs_info->sectorsize_bits); + + if (test_bit(sector_nr, &orig->current_error_bitmap) && + !test_bit(sector_nr, &repair->current_error_bitmap)) { + + /* Copy the repaired content. */ + memcpy_page(orig->pages[page_index], pgoff, + repair->pages[page_index], pgoff, + fs_info->sectorsize); + /* + * And clear the bit in @current_error_bitmap, so + * later we know we need to write this sector back. + */ + clear_bit(sector_nr, &orig->current_error_bitmap); + } + } +} + +void scrub2_repair_one_stripe(struct scrub2_stripe *stripe) +{ + struct btrfs_fs_info *fs_info = stripe->bg->fs_info; + struct scrub2_stripe *repair; + int nr_copies; + int i; + + /* + * The stripe should only have been verified once, thus its init and + * current error bitmap should match. 
+	 */
+	ASSERT(bitmap_equal(&stripe->current_error_bitmap,
+			    &stripe->init_error_bitmap, stripe->nr_sectors));
+
+	/* The stripe has no error from the beginning. */
+	if (bitmap_empty(&stripe->init_error_bitmap, stripe->nr_sectors))
+		return;
+	nr_copies = btrfs_num_copies(fs_info, stripe->logical,
+				     fs_info->sectorsize);
+	/* No extra mirrors to go. */
+	if (nr_copies == 1)
+		return;
+
+	repair = clone_scrub2_stripe(fs_info, stripe);
+	/* Iterate all other copies. */
+	for (i = 0; i < nr_copies - 1; i++) {
+		int next_mirror;
+
+		next_mirror = (i + stripe->mirror_num) >= nr_copies ?
+			      (i + stripe->mirror_num - nr_copies) :
+			      i + stripe->mirror_num;
+		scrub2_repair_from_mirror(stripe, repair, next_mirror);
+
+		/* Already repaired all bad sectors. */
+		if (bitmap_empty(&stripe->current_error_bitmap,
+				 stripe->nr_sectors))
+			break;
+	}
+	free_scrub2_stripe(repair);
+}
+
 /*
  * Scrub one range which can only have a simple mirror based profile.
  * (Including all range in SINGLE/DUP/RAID1/RAID1C*, and each stripe in

diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 0aaed61e4d4d..2f1fceace633 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -25,6 +25,6 @@ int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root,
 			    struct btrfs_block_group *bg,
 			    u64 logical_start, u64 logical_len,
 			    struct scrub2_stripe *stripe);
-void scrub2_verify_one_stripe(struct scrub2_stripe *stripe);
+void scrub2_repair_one_stripe(struct scrub2_stripe *stripe);
 #endif

From patchwork Tue Dec 6 08:23:32 2022
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13065518
Subject: [PoC PATCH 05/11] btrfs: scrub: add a writeback helper for scrub2_stripe
Date: Tue, 6 Dec 2022 16:23:32 +0800
Message-Id: <1e92cdad3b60464ed6f1cf4c0a24cac7d270e3ef.1670314744.git.wqu@suse.com>

Add a new helper, scrub2_writeback_sectors(), to write back the specified
sectors of a scrub2_stripe.

Unlike the read path, writeback only submits writes for the repaired
sectors, no longer in full BTRFS_STRIPE_LEN size.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h |   2 +
 2 files changed, 111 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 15c95cf88a2e..a581d1e4ae44 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -152,6 +152,15 @@ struct scrub2_stripe {
 	 */
 	unsigned long meta_error_bitmap;

+	/* This is only for write back cases (repair or replace). */
+	unsigned long write_error_bitmap;
+
+	/*
+	 * Spinlock for write_error_bitmap, as that's the only case we can have
+	 * multiple bios for one stripe.
+	 */
+	spinlock_t write_error_bitmap_lock;
+
 	/*
 	 * Checksum for the whole stripe if this stripe is inside a data block
 	 * group.
@@ -392,6 +401,7 @@ struct scrub2_stripe *alloc_scrub2_stripe(struct btrfs_fs_info *fs_info, init_waitqueue_head(&stripe->io_wait); atomic_set(&stripe->pending_io, 0); + spin_lock_init(&stripe->write_error_bitmap_lock); stripe->nr_sectors = BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits; @@ -3925,10 +3935,102 @@ static void scrub2_repair_from_mirror(struct scrub2_stripe *orig, } } +static void scrub2_write_endio(struct btrfs_bio *bbio) +{ + struct scrub2_stripe *stripe = bbio->private; + struct btrfs_fs_info *fs_info = stripe->bg->fs_info; + struct bio_vec *first_bvec = bio_first_bvec_all(&bbio->bio); + struct bio_vec *bvec; + struct bvec_iter_all iter_all; + unsigned long flags; + int bio_size = 0; + int first_sector_nr; + int i; + + bio_for_each_segment_all(bvec, &bbio->bio, iter_all) + bio_size += bvec->bv_len; + + for (i = 0; i < BTRFS_STRIPE_LEN >> PAGE_SHIFT; i++) { + if (stripe->pages[i] == first_bvec->bv_page) + break; + } + /* + * Since our pages should all be from stripe->pages[], we should find + * the page. + */ + ASSERT(i < BTRFS_STRIPE_LEN >> PAGE_SHIFT); + first_sector_nr = ((i << PAGE_SHIFT) + first_bvec->bv_offset) >> + fs_info->sectorsize_bits; + bio_put(&bbio->bio); + + spin_lock_irqsave(&stripe->write_error_bitmap_lock, flags); + bitmap_set(&stripe->write_error_bitmap, first_sector_nr, + bio_size >> fs_info->sectorsize_bits); + spin_unlock_irqrestore(&stripe->write_error_bitmap_lock, flags); + if (atomic_dec_and_test(&stripe->pending_io)) + wake_up(&stripe->io_wait); +} + +/* + * Writeback sectors specified by @write_bitmap. + * + * Called by scrub repair (writeback repaired sectors) or dev-replace + * (writeback the sectors to the replace dst device). + */ +void scrub2_writeback_sectors(struct scrub2_stripe *stripe, + unsigned long *write_bitmap) +{ + struct btrfs_fs_info *fs_info = stripe->bg->fs_info; + struct bio *bio = NULL; + int sector_nr; + + ASSERT(atomic_read(&stripe->pending_io) == 0); + + /* Go through each initially corrupted sector. 
*/ + for_each_set_bit(sector_nr, write_bitmap, stripe->nr_sectors) { + const int page_index = (sector_nr << fs_info->sectorsize_bits) >> + PAGE_SHIFT; + const int pgoff = offset_in_page(sector_nr << + fs_info->sectorsize_bits); + int ret; + + /* + * No bio allocated or we can not merge with previous sector + * (previous sector is not a repaired one). + */ + if (!bio || sector_nr == 0 || + !(test_bit(sector_nr, &stripe->init_error_bitmap) && + !test_bit(sector_nr - 1, &stripe->current_error_bitmap))) { + if (bio) { + atomic_inc(&stripe->pending_io); + btrfs_submit_bio(fs_info, bio, + stripe->mirror_num); + } + bio = btrfs_bio_alloc(BTRFS_STRIPE_LEN >> PAGE_SHIFT, + REQ_OP_WRITE, scrub2_write_endio, stripe); + ASSERT(bio); + + bio->bi_iter.bi_sector = (stripe->logical + + (sector_nr << fs_info->sectorsize_bits)) >> + SECTOR_SHIFT; + } + ret = bio_add_page(bio, stripe->pages[page_index], + fs_info->sectorsize, pgoff); + ASSERT(ret == fs_info->sectorsize); + } + if (bio) { + atomic_inc(&stripe->pending_io); + btrfs_submit_bio(fs_info, bio, stripe->mirror_num); + } + + wait_event(stripe->io_wait, atomic_read(&stripe->pending_io) == 0); +} + void scrub2_repair_one_stripe(struct scrub2_stripe *stripe) { struct btrfs_fs_info *fs_info = stripe->bg->fs_info; struct scrub2_stripe *repair; + unsigned long writeback_bitmap = 0; int nr_copies; int i; @@ -3964,6 +4066,13 @@ void scrub2_repair_one_stripe(struct scrub2_stripe *stripe) break; } free_scrub2_stripe(repair); + /* + * Writeback the sectors which are in the init_error_bitmap, but not + * int the current_error_bitmap. + * Thus writeback = init_error & !current_error. 
+	 */
+	bitmap_andnot(&writeback_bitmap, &stripe->init_error_bitmap,
+		      &stripe->current_error_bitmap, stripe->nr_sectors);
 }

 /*

diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 2f1fceace633..a9519214bd41 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -26,5 +26,7 @@ int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root,
 			    u64 logical_start, u64 logical_len,
 			    struct scrub2_stripe *stripe);
 void scrub2_repair_one_stripe(struct scrub2_stripe *stripe);
+void scrub2_writeback_sectors(struct scrub2_stripe *stripe,
+			      unsigned long *write_bitmap);
 #endif

From patchwork Tue Dec 6 08:23:33 2022
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13065519
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PoC PATCH 06/11] btrfs: scrub: add the error reporting for scrub2_stripe
Date: Tue, 6 Dec 2022 16:23:33 +0800
Message-Id: <633a0a4c44723cf193962e012530da4591d9b7a9.1670314744.git.wqu@suse.com>

The new helper, scrub2_report_errors(), will report the result of the
scrub to dmesg.

The main reporting is done through a new helper, scrub2_print_warning(),
which has mostly the same content as scrub_print_warning(), but without
the need for a scrub_block.

Since we're reporting the errors anyway, it's also the perfect time to
update the scrub stats.
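The stat update boils down to counting bits in the per-stripe error
bitmaps; a minimal userspace sketch (with bitmap_weight_ul() as a
stand-in for the kernel's bitmap_weight()) could look like this, where a
sector counts as repaired when it is set in init_error_bitmap but clear
in current_error_bitmap:

```c
#include <assert.h>

/* Count the set bits among the first @nr_bits bits of @bitmap. */
int bitmap_weight_ul(unsigned long bitmap, int nr_bits)
{
	int count = 0;

	for (int i = 0; i < nr_bits; i++)
		if (bitmap & (1UL << i))
			count++;
	return count;
}

/* Repaired = bad at the first read, good now: init_error & ~current_error. */
int count_repaired(unsigned long init_error, unsigned long current_error,
		   int nr_sectors)
{
	return bitmap_weight_ul(init_error & ~current_error, nr_sectors);
}
```

The remaining counters follow the same pattern: read_errors from the I/O
error bitmap, csum_errors from the checksum error bitmap, and
uncorrectable_errors from whatever is still set in current_error_bitmap.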
Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 165 ++++++++++++++++++++++++++++++++++++++++++++--- fs/btrfs/scrub.h | 2 + 2 files changed, 158 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index a581d1e4ae44..cf0b879709b1 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -118,6 +118,13 @@ struct scrub2_stripe { atomic_t pending_io; wait_queue_head_t io_wait; + /* + * How many data/meta extents are in this stripe. + * Only for scrub stat report purpose. + */ + u16 nr_data_extents; + u16 nr_meta_extents; + /* Indicates which sectors are covered by extent items. */ unsigned long used_sector_bitmap; @@ -1097,9 +1104,9 @@ static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes, return 0; } -static void scrub_print_warning(const char *errstr, struct scrub_block *sblock) +static void scrub2_print_warning(const char *errstr, struct btrfs_device *dev, + bool is_super, u64 logical, u64 physical) { - struct btrfs_device *dev; struct btrfs_fs_info *fs_info; struct btrfs_path *path; struct btrfs_key found_key; @@ -1113,22 +1120,20 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock) u8 ref_level = 0; int ret; - WARN_ON(sblock->sector_count < 1); - dev = sblock->dev; - fs_info = sblock->sctx->fs_info; + fs_info = dev->fs_info; /* Super block error, no need to search extent tree. 
*/ - if (sblock->sectors[0]->flags & BTRFS_EXTENT_FLAG_SUPER) { + if (is_super) { btrfs_warn_in_rcu(fs_info, "%s on device %s, physical %llu", - errstr, btrfs_dev_name(dev), sblock->physical); + errstr, btrfs_dev_name(dev), physical); return; } path = btrfs_alloc_path(); if (!path) return; - swarn.physical = sblock->physical; - swarn.logical = sblock->logical; + swarn.physical = physical; + swarn.logical = logical; swarn.errstr = errstr; swarn.dev = NULL; @@ -1177,6 +1182,13 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock) btrfs_free_path(path); } +static void scrub_print_warning(const char *errstr, struct scrub_block *sblock) +{ + scrub2_print_warning(errstr, sblock->dev, + sblock->sectors[0]->flags & BTRFS_EXTENT_FLAG_SUPER, + sblock->logical, sblock->physical); +} + static inline void scrub_get_recover(struct scrub_recover *recover) { refcount_inc(&recover->refs); @@ -3633,6 +3645,11 @@ int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root, goto out; get_extent_info(&path, &extent_start, &extent_len, &extent_flags, &extent_gen); + if (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) + stripe->nr_meta_extents++; + if (extent_flags & BTRFS_EXTENT_FLAG_DATA) + stripe->nr_data_extents++; + cur_logical = max(extent_start, cur_logical); /* @@ -3662,6 +3679,10 @@ int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root, } get_extent_info(&path, &extent_start, &extent_len, &extent_flags, &extent_gen); + if (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) + stripe->nr_meta_extents++; + if (extent_flags & BTRFS_EXTENT_FLAG_DATA) + stripe->nr_data_extents++; fill_one_extent_info(fs_info, stripe, extent_start, extent_len, extent_flags, extent_gen); cur_logical = extent_start + extent_len; @@ -4075,6 +4096,132 @@ void scrub2_repair_one_stripe(struct scrub2_stripe *stripe) &stripe->current_error_bitmap, stripe->nr_sectors); } +void scrub2_report_errors(struct scrub_ctx *sctx, + struct scrub2_stripe *stripe) +{ + static 
DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST); + struct btrfs_fs_info *fs_info = sctx->fs_info; + struct btrfs_device *dev = NULL; + u64 physical = 0; + int nr_data_sectors = 0; + int nr_meta_sectors = 0; + int nr_nodatacsum_sectors = 0; + int nr_repaired_sectors = 0; + int sector_nr; + + /* + * Init needed infos for error reporting, as although our scrub2 + * infrastucture is all based on btrfs_submit_bio() thus no need for + * dev/physical. + * + * But our error reporting system still needs dev and physical. + */ + if (!bitmap_empty(&stripe->init_error_bitmap, stripe->nr_sectors)) { + u64 mapped_len = fs_info->sectorsize; + struct btrfs_io_context *bioc = NULL; + int stripe_index = stripe->mirror_num - 1; + int ret; + + /* For scrub, our mirror_num should always start at 1. */ + ASSERT(stripe->mirror_num >= 1); + ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, + stripe->logical, &mapped_len, &bioc); + /* + * If we failed, sblock->sctx will be NULL, and later detailed + * reports will just be skipped. + */ + if (ret < 0) + goto skip; + physical = bioc->stripes[stripe_index].physical; + dev = bioc->stripes[stripe_index].dev; + btrfs_put_bioc(bioc); + } + +skip: + for_each_set_bit(sector_nr, &stripe->used_sector_bitmap, + stripe->nr_sectors) { + bool repaired = false; + + if (stripe->sectors[sector_nr].is_metadata) { + nr_meta_sectors++; + } else { + nr_data_sectors++; + if (!stripe->sectors[sector_nr].csum) + nr_nodatacsum_sectors++; + } + + if (test_bit(sector_nr, &stripe->init_error_bitmap) && + !test_bit(sector_nr, &stripe->current_error_bitmap)) { + nr_repaired_sectors++; + repaired = true; + } + + /* Good sector from the beginning, nothing need to be done. */ + if (!test_bit(sector_nr, &stripe->init_error_bitmap)) + continue; + + /* + * Report error for the corrupted sectors. + * If repaired, just output the message of repaired message. 
+ */ + if (repaired) { + if (dev) + btrfs_err_rl_in_rcu(fs_info, + "fixed up error at logical %llu on dev %s physical %llu", + stripe->logical, btrfs_dev_name(dev), + physical); + else + btrfs_err_rl_in_rcu(fs_info, + "fixed up error at logical %llu on mirror %u", + stripe->logical, stripe->mirror_num); + continue; + } + + /* The remaining are all for unrepaired. */ + if (dev) + btrfs_err_rl_in_rcu(fs_info, + "unable to fixup (regular) error at logical %llu on dev %s physical %llu", + stripe->logical, btrfs_dev_name(dev), + physical); + else + btrfs_err_rl_in_rcu(fs_info, + "unable to fixup (regular) error at logical %llu on mirror %u", + stripe->logical, stripe->mirror_num); + + if (test_bit(sector_nr, &stripe->io_error_bitmap)) + if (__ratelimit(&rs) && dev) + scrub2_print_warning("i/o error", dev, false, + stripe->logical, physical); + if (test_bit(sector_nr, &stripe->csum_error_bitmap)) + if (__ratelimit(&rs) && dev) + scrub2_print_warning("checksum error", dev, false, + stripe->logical, physical); + if (test_bit(sector_nr, &stripe->meta_error_bitmap)) + if (__ratelimit(&rs) && dev) + scrub2_print_warning("header error", dev, false, + stripe->logical, physical); + } + + spin_lock(&sctx->stat_lock); + sctx->stat.data_extents_scrubbed += stripe->nr_data_extents; + sctx->stat.tree_extents_scrubbed += stripe->nr_meta_extents; + sctx->stat.data_bytes_scrubbed += nr_data_sectors << + fs_info->sectorsize_bits; + sctx->stat.tree_bytes_scrubbed += nr_meta_sectors << + fs_info->sectorsize_bits; + sctx->stat.read_errors += + bitmap_weight(&stripe->io_error_bitmap, stripe->nr_sectors); + sctx->stat.csum_errors += + bitmap_weight(&stripe->csum_error_bitmap, stripe->nr_sectors); + sctx->stat.verify_errors += + bitmap_weight(&stripe->meta_error_bitmap, stripe->nr_sectors); + sctx->stat.uncorrectable_errors += + bitmap_weight(&stripe->current_error_bitmap, stripe->nr_sectors); + sctx->stat.corrected_errors += nr_repaired_sectors; + spin_unlock(&sctx->stat_lock); +} + /* * 
Scrub one range which can only have a simple mirror based profile.
 * (Including all range in SINGLE/DUP/RAID1/RAID1C*, and each stripe in

diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index a9519214bd41..362742692a29 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -28,5 +28,7 @@ int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root,
 void scrub2_repair_one_stripe(struct scrub2_stripe *stripe);
 void scrub2_writeback_sectors(struct scrub2_stripe *stripe,
 			      unsigned long *write_bitmap);
+void scrub2_report_errors(struct scrub_ctx *sctx,
+			  struct scrub2_stripe *stripe);
 #endif

From patchwork Tue Dec 6 08:23:34 2022
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13065520
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PoC PATCH 07/11] btrfs: scrub: add raid56 P/Q scrubbing support
Date: Tue, 6 Dec 2022 16:23:34 +0800
Message-Id: <9a9101481ba88a952e50963030599d8e987be006.1670314744.git.wqu@suse.com>

The overall idea is pretty much the same as the old RAID56 P/Q scrub
code, but there are some differences:

- If the data stripes contain any sector which can not be repaired, we
  abort.
  This is to prevent corrupted sectors from spreading to P/Q. Although if
  we failed to repair the data sectors there is already not much left to
  save in P/Q, we may still want to keep P/Q untouched: for cases like
  degraded RAID56, if the missing device comes back, we may still recover
  from the corruption.

- Use the scrub2 interface to scrub the data stripes.
  Obviously, this is to remove the dependency on the old infrastructure.
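The abort condition in the first point can be sketched as a simple check
over the per-data-stripe remaining-error bitmaps (a userspace
illustration, not the patch's code; can_scrub_pq() is a hypothetical
name):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * P/Q may only be verified or rewritten when every data stripe has an
 * empty remaining-error bitmap: one unrepairable data sector would
 * otherwise propagate garbage into the recomputed parity.
 */
bool can_scrub_pq(const unsigned long *current_error_bitmaps,
		  int nr_data_stripes)
{
	for (int i = 0; i < nr_data_stripes; i++)
		if (current_error_bitmaps[i] != 0)
			return false;
	return true;
}

/* Example full stripes: all data repaired vs. one stripe with a bad sector. */
const unsigned long all_repaired[4] = { 0, 0, 0, 0 };
const unsigned long one_bad_left[4] = { 0, 0, 0x2, 0 };
```

Keeping P/Q untouched in the failing case preserves the chance of a later
rebuild, e.g. when a missing device of a degraded RAID56 comes back.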
Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 256 ++++++++++++++++++++++++++++++++++++++++++++++- fs/btrfs/scrub.h | 7 +- 2 files changed, 257 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index cf0b879709b1..162f7e1dd378 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -180,6 +180,26 @@ struct scrub2_stripe { * We reuse the same eb for all metadata of the same stripe. */ struct extent_buffer *dummy_eb; + + /* The following members are only for stripe group usage.*/ + struct work_struct work; + struct scrub2_stripe_group *group; +}; + +/* + * Indicate multiple (related) stripes in a group. + * + * This is mostly for RAID56 usage only. + */ +struct scrub2_stripe_group { + struct scrub_ctx *sctx; + atomic_t pending_stripes; + wait_queue_head_t stripe_wait; + int nr_stripes; + + struct btrfs_io_context *bioc; + + struct scrub2_stripe **stripes; }; struct scrub_recover { @@ -3998,8 +4018,8 @@ static void scrub2_write_endio(struct btrfs_bio *bbio) * Called by scrub repair (writeback repaired sectors) or dev-replace * (writeback the sectors to the replace dst device). 
*/ -void scrub2_writeback_sectors(struct scrub2_stripe *stripe, - unsigned long *write_bitmap) +static void scrub2_writeback_sectors(struct scrub2_stripe *stripe, + unsigned long *write_bitmap) { struct btrfs_fs_info *fs_info = stripe->bg->fs_info; struct bio *bio = NULL; @@ -4047,7 +4067,7 @@ void scrub2_writeback_sectors(struct scrub2_stripe *stripe, wait_event(stripe->io_wait, atomic_read(&stripe->pending_io) == 0); } -void scrub2_repair_one_stripe(struct scrub2_stripe *stripe) +static void scrub2_repair_one_stripe(struct scrub2_stripe *stripe) { struct btrfs_fs_info *fs_info = stripe->bg->fs_info; struct scrub2_stripe *repair; @@ -4222,6 +4242,236 @@ void scrub2_report_errors(struct scrub_ctx *sctx, spin_unlock(&sctx->stat_lock); } +static void scrub2_data_stripe_for_raid56(struct work_struct *work) +{ + struct scrub2_stripe *stripe = container_of(work, struct scrub2_stripe, + work); + struct scrub2_stripe_group *full_stripe = stripe->group; + unsigned long writeback_bitmap = 0; + + ASSERT(full_stripe); + + scrub2_read_and_wait_stripe(stripe); + scrub2_repair_one_stripe(stripe); + + bitmap_andnot(&writeback_bitmap, &stripe->init_error_bitmap, + &stripe->current_error_bitmap, stripe->nr_sectors); + if (!full_stripe->sctx->readonly) + scrub2_writeback_sectors(stripe, &writeback_bitmap); + + if (atomic_dec_and_test(&full_stripe->pending_stripes)) + wake_up(&full_stripe->stripe_wait); +} + +static void scrub2_release_full_stripe(struct btrfs_fs_info *fs_info, + struct scrub2_stripe_group *full_stripe) +{ + int i; + + btrfs_put_bioc(full_stripe->bioc); + btrfs_bio_counter_dec(fs_info); + ASSERT(atomic_read(&full_stripe->pending_stripes) == 0); + if (full_stripe->nr_stripes) { + for (i = 0; i < full_stripe->nr_stripes; i++) { + if (!full_stripe->stripes[i]) + continue; + free_scrub2_stripe(full_stripe->stripes[i]); + full_stripe->stripes[i] = 0; + } + kfree(full_stripe->stripes); + full_stripe->stripes = NULL; + } +} + +static int 
scrub2_init_raid56_full_stripe(struct scrub_ctx *sctx, + struct btrfs_block_group *bg, + struct scrub2_stripe_group *full_stripe, + u64 full_stripe_logical) +{ + struct btrfs_fs_info *fs_info = bg->fs_info; + u64 mapped_length = fs_info->sectorsize; + struct btrfs_io_context *bioc = NULL; + int nr_stripes; + int nr_pq; + int ret; + int i; + + memset(full_stripe, 0, sizeof(*full_stripe)); + atomic_set(&full_stripe->pending_stripes, 0); + init_waitqueue_head(&full_stripe->stripe_wait); + full_stripe->sctx = sctx; + + btrfs_bio_counter_inc_blocked(fs_info); + ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, + full_stripe_logical, &mapped_length, &bioc); + if (ret < 0) + goto out; + /* Keep the bioc; it is put by scrub2_release_full_stripe(). */ + full_stripe->bioc = bioc; + + ASSERT(bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK); + if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID6) + nr_pq = 2; + else + nr_pq = 1; + + nr_stripes = bioc->num_stripes - bioc->num_tgtdevs; + full_stripe->nr_stripes = nr_stripes - nr_pq; + + full_stripe->stripes = kcalloc(full_stripe->nr_stripes, + sizeof(struct scrub2_stripe *), GFP_KERNEL); + if (!full_stripe->stripes) { + ret = -ENOMEM; + goto out; + } + + /* Allocate all stripes. */ + for (i = 0; i < full_stripe->nr_stripes; i++) { + full_stripe->stripes[i] = alloc_scrub2_stripe(fs_info, bg); + if (!full_stripe->stripes[i]) { + ret = -ENOMEM; + goto out; + } + full_stripe->stripes[i]->group = full_stripe; + full_stripe->stripes[i]->logical = bioc->raid_map[i]; + full_stripe->stripes[i]->mirror_num = 1; + } +out: + /* On success the caller releases the group after using it. */ + if (ret < 0) + scrub2_release_full_stripe(fs_info, full_stripe); + return ret; +} + +static int scrub2_wait_data_stripes(struct btrfs_block_group *bg, + struct scrub2_stripe_group *full_stripe, + u64 full_stripe_logical) +{ + struct btrfs_fs_info *fs_info = bg->fs_info; + struct btrfs_root *extent_root; + struct btrfs_root *csum_root; + int ret; + int i; + + extent_root = btrfs_extent_root(fs_info, full_stripe_logical); + csum_root = btrfs_csum_root(fs_info, full_stripe_logical); + + /* Fill the extent info for each data
extent. */ + for (i = 0; i < full_stripe->nr_stripes; i++) { + u64 cur_logical = full_stripe_logical + i * BTRFS_STRIPE_LEN; + + ret = scrub2_find_fill_first_stripe(extent_root, csum_root, bg, + cur_logical, BTRFS_STRIPE_LEN, + full_stripe->stripes[i]); + if (ret < 0) + goto out; + } + + /* + * Now submit data stripes. They go through the regular + * read-verify-repair routine. + */ + for (i = 0; i < full_stripe->nr_stripes; i++) { + INIT_WORK(&full_stripe->stripes[i]->work, + scrub2_data_stripe_for_raid56); + atomic_inc(&full_stripe->pending_stripes); + queue_work(fs_info->scrub_workers, + &full_stripe->stripes[i]->work); + } + /* + * Wait for the above data stripe scrubs to finish before scrubbing + * the P/Q stripes, as the P/Q scrub relies on the data stripes + * being good. + */ + wait_event(full_stripe->stripe_wait, + atomic_read(&full_stripe->pending_stripes) == 0); +out: + return ret; +} + +static void scrub2_wait_raid56_scrub_endio(struct bio *bio) +{ + complete(bio->bi_private); +} + +/* + * For current per-device scrub, RAID56 data stripes are handled just like + * RAID0: + * - Read data stripes + * - Verify + * - If corrupted, try extra mirrors (rebuild) and verify again + * + * But if we hit a parity stripe, we have to run the above loop for every + * data stripe, and only when all of the sectors in them are fine can we + * check the parity.
+ */ +int scrub2_raid56_parity(struct scrub_ctx *sctx, + struct btrfs_block_group *bg, + struct btrfs_device *target, + u64 full_stripe_logical) +{ + struct btrfs_fs_info *fs_info = sctx->fs_info; + struct scrub2_stripe_group full_stripe; + struct btrfs_raid_bio *rbio; + struct bio *bio; + DECLARE_COMPLETION_ONSTACK(done); + unsigned long used_sector_bitmap = 0; + int ret; + int i; + + ret = scrub2_init_raid56_full_stripe(sctx, bg, &full_stripe, + full_stripe_logical); + if (ret < 0) + return ret; + + ret = scrub2_wait_data_stripes(bg, &full_stripe, full_stripe_logical); + if (ret < 0) + goto out; + + /* + * If we have any unrepaired sectors, we can not scrub P/Q, as + * it may use corrupted data to calculate new P/Q and spread + * corruption. + */ + for (i = 0; i < full_stripe.nr_stripes; i++) { + struct scrub2_stripe *stripe = full_stripe.stripes[i]; + + if (!bitmap_empty(&stripe->current_error_bitmap, + stripe->nr_sectors)) { + ret = -EIO; + goto out; + } + /* Also calculate the used_sector_bitmap for P/Q scrub. */ + bitmap_or(&used_sector_bitmap, &used_sector_bitmap, + &stripe->used_sector_bitmap, stripe->nr_sectors); + } + + bio = bio_alloc(NULL, 0, REQ_OP_READ, GFP_KERNEL); + ASSERT(bio); + bio->bi_iter.bi_sector = full_stripe_logical >> SECTOR_SHIFT; + bio->bi_private = &done; + bio->bi_end_io = scrub2_wait_raid56_scrub_endio; + + rbio = raid56_parity_alloc_scrub_rbio(bio, full_stripe.bioc, target, + &used_sector_bitmap, + full_stripe.stripes[0]->nr_sectors); + if (!rbio) { + bio_put(bio); + ret = -ENOMEM; + goto out; + } + raid56_parity_submit_scrub_rbio(rbio); + + /* + * RAID56 scrub code has already handled dev replace case. + * So we just wait for it to finish, and no need to handle + * dev-replace anymore. + */ + wait_for_completion_io(&done); + ret = blk_status_to_errno(bio->bi_status); + bio_put(bio); + +out: + scrub2_release_full_stripe(fs_info, &full_stripe); + return ret; +} + /* * Scrub one range which can only has simple mirror based profile. 
* (Including all range in SINGLE/DUP/RAID1/RAID1C*, and each stripe in diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h index 362742692a29..c22349380c50 100644 --- a/fs/btrfs/scrub.h +++ b/fs/btrfs/scrub.h @@ -25,10 +25,11 @@ int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root, struct btrfs_block_group *bg, u64 logical_start, u64 logical_len, struct scrub2_stripe *stripe); -void scrub2_repair_one_stripe(struct scrub2_stripe *stripe); -void scrub2_writeback_sectors(struct scrub2_stripe *stripe, - unsigned long *write_bitmap); void scrub2_report_errors(struct scrub_ctx *sctx, struct scrub2_stripe *stripe); +int scrub2_raid56_parity(struct scrub_ctx *sctx, + struct btrfs_block_group *bg, + struct btrfs_device *target, + u64 full_stripe_logical); #endif

From patchwork Tue Dec 6 08:23:35 2022
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PoC PATCH 08/11] btrfs: scrub: use dedicated super block verification function to scrub one super block
Date: Tue, 6 Dec 2022 16:23:35 +0800
Message-Id: <8f9c062829fb59ea0ae793801e528afabff7e979.1670314744.git.wqu@suse.com>

There is really no need to go through the overly complex scrub_sectors() just to handle super blocks. This patch introduces a dedicated function (fewer than 50 lines) to handle super block scrubbing.

The new function brings one behavior change: instead of using the complex but concurrent scrub_bio system, we simply submit and wait.
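The submit-and-wait check order described above can be illustrated with a small userspace sketch (Python, purely illustrative; the helper names and the dict-based stand-in for a super block are assumptions, not the kernel API): read the block synchronously, then check the checksum, then the generation, and only report success when everything passes.

```python
# Userspace sketch of the dedicated super block scrub flow (illustrative
# names; not kernel code). read_super stands in for a synchronous
# submit_bio_wait()-style read that returns None on I/O error.
def scrub_one_super(read_super, expected_gen):
    sb = read_super()              # blocks until the read completes
    if sb is None:
        return "EIO"               # the read itself failed
    if not sb["csum_ok"]:
        return "EUCLEAN"           # checksum mismatch
    if sb["generation"] != expected_gen:
        return "EUCLEAN"           # stale or torn super block copy
    return 0                       # kernel would still run full validation

# Each copy is checked one after another: no bio list, no wait queue.
def scrub_supers(copies, expected_gen):
    for read_super in copies:
        ret = scrub_one_super(read_super, expected_gen)
        if ret:
            return ret
    return 0
```

Since there are only a handful of copies per device, the loss of concurrency from this sequential loop is negligible.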
There is not much point in optimizing the performance of super block scrubbing: each device has at most 3 super blocks, and they are already scattered around the device.

Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 43 +++++++++++++++++++++++++++++++++++++++---- 1 file changed, 39 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 162f7e1dd378..9a9e706cba3e 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -5239,6 +5239,38 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, return ret; } +static int scrub2_one_super(struct scrub_ctx *sctx, + struct btrfs_device *scrub_dev, + struct page *page, u64 physical, u64 generation) +{ + struct btrfs_fs_info *fs_info = sctx->fs_info; + struct bio_vec bvec; + struct bio bio; + struct btrfs_super_block *sb = page_address(page); + int ret; + + bio_init(&bio, scrub_dev->bdev, &bvec, 1, REQ_OP_READ); + bio.bi_iter.bi_sector = physical >> SECTOR_SHIFT; + bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE, 0); + ret = submit_bio_wait(&bio); + bio_uninit(&bio); + + if (ret < 0) + return ret; + ret = btrfs_check_super_csum(fs_info, sb); + if (ret < 0) + return ret; + if (btrfs_super_generation(sb) != generation) { + btrfs_err_rl(fs_info, +"super block at physical %llu devid %llu has bad generation, has %llu expect %llu", + physical, scrub_dev->devid, + btrfs_super_generation(sb), generation); + return -EUCLEAN; + } + + ret = btrfs_validate_super(fs_info, sb, -1); + return ret; +} + static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx, struct btrfs_device *scrub_dev) { @@ -5246,11 +5278,16 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx, u64 bytenr; u64 gen; int ret; + struct page *page; struct btrfs_fs_info *fs_info = sctx->fs_info; if (BTRFS_FS_ERROR(fs_info)) return -EROFS; + page = alloc_page(GFP_KERNEL); + if (!page) + return -ENOMEM; + /* Seed devices of a new filesystem has their own generation.
*/ if (scrub_dev->fs_devices != fs_info->fs_devices) gen = scrub_dev->generation; @@ -5265,13 +5302,11 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx, if (!btrfs_check_super_location(scrub_dev, bytenr)) continue; - ret = scrub_sectors(sctx, bytenr, BTRFS_SUPER_INFO_SIZE, bytenr, - scrub_dev, BTRFS_EXTENT_FLAG_SUPER, gen, i, - NULL, bytenr); + ret = scrub2_one_super(sctx, scrub_dev, page, bytenr, gen); if (ret) return ret; } - wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); + __free_page(page); return 0; }

From patchwork Tue Dec 6 08:23:36 2022
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PoC PATCH 09/11] btrfs: scrub: switch to the new scrub2_stripe based infrastructure
Date: Tue, 6 Dec 2022 16:23:36 +0800

Since scrub2_stripe can now handle both regular and RAID56 scrubbing, it's time to switch to the new infrastructure.

Please note that the following old functions are temporarily exported:
- scrub_extent()
- scrub_raid56_parity()

The reason is that the full cleanup is too large (at least 2K lines will be removed), so this patch does only the minimum needed to switch infrastructures.
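The switch centers on per-stripe bitmaps: after the read-verify-repair pass, the sectors to write back are chosen by combining the stripe's bitmaps, which the kernel code does with bitmap_andnot(). A minimal sketch of that selection (Python, with bitmaps modeled as plain integers; the function name is an illustrative assumption):

```python
# Sketch of per-stripe writeback selection (illustrative, not kernel API).
# Each bitmap is an integer with one bit per sector in the stripe.
def writeback_bitmap(used, init_error, current_error, is_dev_replace):
    if is_dev_replace:
        # Dev-replace must copy every used sector that is currently good.
        return used & ~current_error
    # Regular scrub writes back only sectors that were bad initially but
    # are no longer in error, i.e. the repaired ones.
    return init_error & ~current_error
```

For example, with used = 0b1111, initial errors 0b0110, and remaining errors 0b0100, a regular scrub writes back only sector 1 (0b0010), while dev-replace copies sectors 0, 1, and 3 (0b1011).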
Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 192 ++++++++++++++++++++++------------------------- fs/btrfs/scrub.h | 21 ++---- 2 files changed, 98 insertions(+), 115 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 9a9e706cba3e..77209792fa90 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -416,8 +416,8 @@ static void free_scrub2_stripe(struct scrub2_stripe *stripe) kfree(stripe); } -struct scrub2_stripe *alloc_scrub2_stripe(struct btrfs_fs_info *fs_info, - struct btrfs_block_group *bg) +static struct scrub2_stripe *alloc_scrub2_stripe(struct btrfs_fs_info *fs_info, + struct btrfs_block_group *bg) { struct scrub2_stripe *stripe; int ret; @@ -2908,10 +2908,10 @@ static int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum) } /* scrub extent tries to collect up to 64 kB for each bio */ -static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, - u64 logical, u32 len, - u64 physical, struct btrfs_device *dev, u64 flags, - u64 gen, int mirror_num) +int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, + u64 logical, u32 len, u64 physical, + struct btrfs_device *dev, u64 flags, + u64 gen, int mirror_num) { struct btrfs_device *src_dev = dev; u64 src_physical = physical; @@ -3497,11 +3497,9 @@ static int scrub_raid56_data_stripe_for_parity(struct scrub_ctx *sctx, return ret; } -static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, - struct map_lookup *map, - struct btrfs_device *sdev, - u64 logic_start, - u64 logic_end) +int scrub_raid56_parity(struct scrub_ctx *sctx, struct map_lookup *map, + struct btrfs_device *sdev, u64 logic_start, + u64 logic_end) { struct btrfs_fs_info *fs_info = sctx->fs_info; struct btrfs_path *path; @@ -3631,11 +3629,11 @@ static void fill_one_extent_info(struct btrfs_fs_info *fs_info, * Return >0 if there is no such stripe in the specified range. * Return <0 for error. 
*/ -int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root, - struct btrfs_root *csum_root, - struct btrfs_block_group *bg, - u64 logical_start, u64 logical_len, - struct scrub2_stripe *stripe) +static int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root, + struct btrfs_root *csum_root, + struct btrfs_block_group *bg, + u64 logical_start, u64 logical_len, + struct scrub2_stripe *stripe) { struct btrfs_fs_info *fs_info = extent_root->fs_info; const u64 logical_end = logical_start + logical_len; @@ -4116,8 +4114,8 @@ static void scrub2_repair_one_stripe(struct scrub2_stripe *stripe) &stripe->current_error_bitmap, stripe->nr_sectors); } -void scrub2_report_errors(struct scrub_ctx *sctx, - struct scrub2_stripe *stripe) +static void scrub2_report_errors(struct scrub_ctx *sctx, + struct scrub2_stripe *stripe) { static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST); @@ -4401,10 +4399,10 @@ static void scrub2_wait_raid56_scrub_endio(struct bio *bio) * stripes, and only when all of the sectors in them are fine, we can * check the parity. */ -int scrub2_raid56_parity(struct scrub_ctx *sctx, - struct btrfs_block_group *bg, - struct btrfs_device *target, - u64 full_stripe_logical) +static int scrub2_raid56_parity(struct scrub_ctx *sctx, + struct btrfs_block_group *bg, + struct btrfs_device *target, + u64 full_stripe_logical) { struct btrfs_fs_info *fs_info = sctx->fs_info; struct scrub2_stripe_group full_stripe; @@ -4472,6 +4470,30 @@ int scrub2_raid56_parity(struct scrub_ctx *sctx, return ret; } +/* Only reset the bitmaps and scrub2_sector info, but keeps the pages. 
*/ +static void scrub2_reset_stripe(struct scrub2_stripe *stripe) +{ + int i; + + stripe->used_sector_bitmap = 0; + stripe->init_error_bitmap = 0; + stripe->current_error_bitmap = 0; + + stripe->io_error_bitmap = 0; + stripe->csum_error_bitmap = 0; + stripe->meta_error_bitmap = 0; + stripe->write_error_bitmap = 0; + + stripe->nr_meta_extents = 0; + stripe->nr_data_extents = 0; + + for (i = 0; i < stripe->nr_sectors; i++) { + stripe->sectors[i].is_metadata = false; + stripe->sectors[i].csum = NULL; + stripe->sectors[i].generation = 0; + } +} + /* * Scrub one range which can only has simple mirror based profile. * (Including all range in SINGLE/DUP/RAID1/RAID1C*, and each stripe in @@ -4481,6 +4503,7 @@ int scrub2_raid56_parity(struct scrub_ctx *sctx, * and @logical_length parameter. */ static int scrub_simple_mirror(struct scrub_ctx *sctx, + struct scrub2_stripe *stripe, struct btrfs_root *extent_root, struct btrfs_root *csum_root, struct btrfs_block_group *bg, @@ -4491,8 +4514,6 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx, { struct btrfs_fs_info *fs_info = sctx->fs_info; const u64 logical_end = logical_start + logical_length; - /* An artificial limit, inherit from old scrub behavior */ - const u32 max_length = SZ_64K; struct btrfs_path path = { 0 }; u64 cur_logical = logical_start; int ret; @@ -4502,13 +4523,12 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx, path.search_commit_root = 1; path.skip_locking = 1; + + stripe->mirror_num = mirror_num; + /* Go through each extent items inside the logical range */ while (cur_logical < logical_end) { - u64 extent_start; - u64 extent_len; - u64 extent_flags; - u64 extent_gen; - u64 scrub_len; + unsigned long writeback_bitmap = 0; /* Canceled? 
*/ if (atomic_read(&fs_info->scrub_cancel_req) || @@ -4538,8 +4558,10 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx, } spin_unlock(&bg->lock); - ret = find_first_extent_item(extent_root, &path, cur_logical, - logical_end - cur_logical); + scrub2_reset_stripe(stripe); + ret = scrub2_find_fill_first_stripe(extent_root, csum_root, bg, + cur_logical, logical_end - cur_logical, stripe); + if (ret > 0) { /* No more extent, just update the accounting */ sctx->stat.last_physical = physical + logical_length; @@ -4548,52 +4570,30 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx, } if (ret < 0) break; - get_extent_info(&path, &extent_start, &extent_len, - &extent_flags, &extent_gen); - /* Skip hole range which doesn't have any extent */ - cur_logical = max(extent_start, cur_logical); + scrub2_read_and_wait_stripe(stripe); + scrub2_repair_one_stripe(stripe); + scrub2_report_errors(sctx, stripe); - /* - * Scrub len has three limits: - * - Extent size limit - * - Scrub range limit - * This is especially imporatant for RAID0/RAID10 to reuse - * this function - * - Max scrub size limit - */ - scrub_len = min(min(extent_start + extent_len, - logical_end), cur_logical + max_length) - - cur_logical; + if (sctx->is_dev_replace) { + /* We have to write all good sectors back. */ + bitmap_andnot(&writeback_bitmap, + &stripe->used_sector_bitmap, + &stripe->current_error_bitmap, + stripe->nr_sectors); + scrub2_writeback_sectors(stripe, &writeback_bitmap); - if (extent_flags & BTRFS_EXTENT_FLAG_DATA) { - ret = btrfs_lookup_csums_list(csum_root, cur_logical, - cur_logical + scrub_len - 1, - &sctx->csum_list, 1, false); - if (ret) - break; - } - if ((extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) && - does_range_cross_boundary(extent_start, extent_len, - logical_start, logical_length)) { - btrfs_err(fs_info, -"scrub: tree block %llu spanning boundaries, ignored. 
boundary=[%llu, %llu)", - extent_start, logical_start, logical_end); - spin_lock(&sctx->stat_lock); - sctx->stat.uncorrectable_errors++; - spin_unlock(&sctx->stat_lock); - cur_logical += scrub_len; - continue; - } - ret = scrub_extent(sctx, map, cur_logical, scrub_len, - cur_logical - logical_start + physical, - device, extent_flags, extent_gen, - mirror_num); - scrub_free_csums(sctx); - if (ret) - break; - if (sctx->is_dev_replace) + /* TODO: add support for zoned devices. */ sync_replace_for_zoned(sctx); - cur_logical += scrub_len; + } else if (!sctx->readonly) { + /* Only writeback the repaired sectors. */ + bitmap_andnot(&writeback_bitmap, + &stripe->init_error_bitmap, + &stripe->current_error_bitmap, + stripe->nr_sectors); + scrub2_writeback_sectors(stripe, &writeback_bitmap); + } + cur_logical = stripe->logical + BTRFS_STRIPE_LEN; + /* Don't hold CPU for too long time */ cond_resched(); } @@ -4638,6 +4638,7 @@ static int simple_stripe_mirror_num(struct map_lookup *map, int stripe_index) } static int scrub_simple_stripe(struct scrub_ctx *sctx, + struct scrub2_stripe *stripe, struct btrfs_root *extent_root, struct btrfs_root *csum_root, struct btrfs_block_group *bg, @@ -4659,9 +4660,9 @@ static int scrub_simple_stripe(struct scrub_ctx *sctx, * just RAID1, so we can reuse scrub_simple_mirror() to scrub * this stripe. 
*/ - ret = scrub_simple_mirror(sctx, extent_root, csum_root, bg, map, - cur_logical, map->stripe_len, device, - cur_physical, mirror_num); + ret = scrub_simple_mirror(sctx, stripe, extent_root, csum_root, + bg, map, cur_logical, map->stripe_len, + device, cur_physical, mirror_num); if (ret) return ret; /* Skip to next stripe which belongs to the target device */ @@ -4678,10 +4679,10 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, struct btrfs_device *scrub_dev, int stripe_index) { - struct btrfs_path *path; struct btrfs_fs_info *fs_info = sctx->fs_info; struct btrfs_root *root; struct btrfs_root *csum_root; + struct scrub2_stripe *stripe; struct blk_plug plug; struct map_lookup *map = em->map_lookup; const u64 profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK; @@ -4697,22 +4698,12 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, /* Offset inside the chunk */ u64 offset; u64 stripe_logical; - u64 stripe_end; int stop_loop = 0; - path = btrfs_alloc_path(); - if (!path) + stripe = alloc_scrub2_stripe(fs_info, bg); + if (!stripe) return -ENOMEM; - /* - * work on commit root. The related disk blocks are static as - * long as COW is applied. This means, it is save to rewrite - * them to repair disk errors without any race conditions - */ - path->search_commit_root = 1; - path->skip_locking = 1; - path->reada = READA_FORWARD; - wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); scrub_blocked_if_needed(fs_info); @@ -4751,16 +4742,16 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, * Only @physical and @mirror_num needs to calculated using * @stripe_index. 
*/ - ret = scrub_simple_mirror(sctx, root, csum_root, bg, map, - bg->start, bg->length, scrub_dev, + ret = scrub_simple_mirror(sctx, stripe, root, csum_root, bg, + map, bg->start, bg->length, scrub_dev, map->stripes[stripe_index].physical, stripe_index + 1); offset = 0; goto out; } if (profile & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) { - ret = scrub_simple_stripe(sctx, root, csum_root, bg, map, - scrub_dev, stripe_index); + ret = scrub_simple_stripe(sctx, stripe, root, csum_root, bg, + map, scrub_dev, stripe_index); offset = map->stripe_len * (stripe_index / map->sub_stripes); goto out; } @@ -4789,10 +4780,8 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, if (ret) { /* it is parity strip */ stripe_logical += chunk_logical; - stripe_end = stripe_logical + increment; - ret = scrub_raid56_parity(sctx, map, scrub_dev, - stripe_logical, - stripe_end); + ret = scrub2_raid56_parity(sctx, bg, scrub_dev, + stripe_logical); if (ret) goto out; goto next; @@ -4806,8 +4795,8 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, * We can reuse scrub_simple_mirror() here, as the repair part * is still based on @mirror_num. 
*/ - ret = scrub_simple_mirror(sctx, root, csum_root, bg, map, - logical, map->stripe_len, + ret = scrub_simple_mirror(sctx, stripe, root, csum_root, bg, + map, logical, map->stripe_len, scrub_dev, physical, 1); if (ret < 0) goto out; @@ -4825,6 +4814,8 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, break; } out: + free_scrub2_stripe(stripe); + /* push queued extents */ scrub_submit(sctx); mutex_lock(&sctx->wr_lock); @@ -4832,7 +4823,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, mutex_unlock(&sctx->wr_lock); blk_finish_plug(&plug); - btrfs_free_path(path); if (sctx->is_dev_replace && ret >= 0) { int ret2; diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h index c22349380c50..1498c770fb77 100644 --- a/fs/btrfs/scrub.h +++ b/fs/btrfs/scrub.h @@ -17,19 +17,12 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid, * The following functions are temporary exports to avoid warning on unused * static functions. */ -struct scrub2_stripe; -struct scrub2_stripe *alloc_scrub2_stripe(struct btrfs_fs_info *fs_info, - struct btrfs_block_group *bg); -int scrub2_find_fill_first_stripe(struct btrfs_root *extent_root, - struct btrfs_root *csum_root, - struct btrfs_block_group *bg, - u64 logical_start, u64 logical_len, - struct scrub2_stripe *stripe); -void scrub2_report_errors(struct scrub_ctx *sctx, - struct scrub2_stripe *stripe); -int scrub2_raid56_parity(struct scrub_ctx *sctx, - struct btrfs_block_group *bg, - struct btrfs_device *target, - u64 full_stripe_logical); +int scrub_raid56_parity(struct scrub_ctx *sctx, struct map_lookup *map, + struct btrfs_device *sdev, u64 logic_start, + u64 logic_end); +int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, + u64 logical, u32 len, u64 physical, + struct btrfs_device *dev, u64 flags, + u64 gen, int mirror_num); #endif From patchwork Tue Dec 6 08:23:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PoC PATCH 10/11] btrfs: scrub: cleanup the old scrub_parity infrastructure
Date: Tue, 6 Dec 2022 16:23:37 +0800
Message-Id: <3f3a3123889794545b17a4b215ba60ba4f248bc2.1670314744.git.wqu@suse.com>

Since RAID56 scrub has switched to scrub2_stripe with scrub2_stripe_group, there is no longer any need for the scrub_parity infrastructure.

Clean it up, while still keeping the existing scrub_block infrastructure, as this cleanup is already too big for a single patch.

Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 505 ----------------------------------------------- fs/btrfs/scrub.h | 3 - 2 files changed, 508 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 77209792fa90..f3981f11dd2c 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -257,7 +257,6 @@ struct scrub_block { atomic_t outstanding_sectors; refcount_t refs; /* free mem on transition to zero */ struct scrub_ctx *sctx; - struct scrub_parity *sparity; struct { unsigned int header_error:1; unsigned int checksum_error:1; @@ -271,37 +270,6 @@ struct scrub_block { struct work_struct work; }; -/* Used for the chunks with parity stripe such RAID5/6 */ -struct scrub_parity { - struct scrub_ctx *sctx; - - struct btrfs_device *scrub_dev; - - u64 logic_start; - - u64 logic_end; - - int nsectors; - - u32 stripe_len; - - refcount_t refs; - - struct list_head sectors_list; - - /* Work of parity check and repair */ - struct work_struct work; - - /* Mark the parity blocks which have data */ - unsigned long dbitmap; - - /* - * Mark the parity blocks which have
data, but errors happen when - * read data or check data - */ - unsigned long ebitmap; -}; - struct scrub_ctx { struct scrub_bio *bios[SCRUB_BIOS_PER_SCTX]; struct btrfs_fs_info *fs_info; @@ -640,8 +608,6 @@ static int scrub_checksum_super(struct scrub_block *sblock); static void scrub_block_put(struct scrub_block *sblock); static void scrub_sector_get(struct scrub_sector *sector); static void scrub_sector_put(struct scrub_sector *sector); -static void scrub_parity_get(struct scrub_parity *sparity); -static void scrub_parity_put(struct scrub_parity *sparity); static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len, u64 physical, struct btrfs_device *dev, u64 flags, u64 gen, int mirror_num, u8 *csum, @@ -1992,13 +1958,6 @@ static void scrub_write_block_to_dev_replace(struct scrub_block *sblock) struct btrfs_fs_info *fs_info = sblock->sctx->fs_info; int i; - /* - * This block is used for the check of the parity on the source device, - * so the data needn't be written into the destination device. 
- */ - if (sblock->sparity) - return; - for (i = 0; i < sblock->sector_count; i++) { int ret; @@ -2358,9 +2317,6 @@ static void scrub_block_put(struct scrub_block *sblock) if (refcount_dec_and_test(&sblock->refs)) { int i; - if (sblock->sparity) - scrub_parity_put(sblock->sparity); - for (i = 0; i < sblock->sector_count; i++) scrub_sector_put(sblock->sectors[i]); for (i = 0; i < DIV_ROUND_UP(sblock->len, PAGE_SIZE); i++) { @@ -2776,45 +2732,6 @@ static void scrub_bio_end_io_worker(struct work_struct *work) scrub_pending_bio_dec(sctx); } -static inline void __scrub_mark_bitmap(struct scrub_parity *sparity, - unsigned long *bitmap, - u64 start, u32 len) -{ - u64 offset; - u32 nsectors; - u32 sectorsize_bits = sparity->sctx->fs_info->sectorsize_bits; - - if (len >= sparity->stripe_len) { - bitmap_set(bitmap, 0, sparity->nsectors); - return; - } - - start -= sparity->logic_start; - start = div64_u64_rem(start, sparity->stripe_len, &offset); - offset = offset >> sectorsize_bits; - nsectors = len >> sectorsize_bits; - - if (offset + nsectors <= sparity->nsectors) { - bitmap_set(bitmap, offset, nsectors); - return; - } - - bitmap_set(bitmap, offset, sparity->nsectors - offset); - bitmap_set(bitmap, 0, nsectors - (sparity->nsectors - offset)); -} - -static inline void scrub_parity_mark_sectors_error(struct scrub_parity *sparity, - u64 start, u32 len) -{ - __scrub_mark_bitmap(sparity, &sparity->ebitmap, start, len); -} - -static inline void scrub_parity_mark_sectors_data(struct scrub_parity *sparity, - u64 start, u32 len) -{ - __scrub_mark_bitmap(sparity, &sparity->dbitmap, start, len); -} - static void scrub_block_complete(struct scrub_block *sblock) { int corrupted = 0; @@ -2832,17 +2749,6 @@ static void scrub_block_complete(struct scrub_block *sblock) if (!corrupted && sblock->sctx->is_dev_replace) scrub_write_block_to_dev_replace(sblock); } - - if (sblock->sparity && corrupted && !sblock->data_corrected) { - u64 start = sblock->logical; - u64 end = sblock->logical + - 
sblock->sectors[sblock->sector_count - 1]->offset + - sblock->sctx->fs_info->sectorsize; - - ASSERT(end - start <= U32_MAX); - scrub_parity_mark_sectors_error(sblock->sparity, - start, end - start); - } } static void drop_csum_range(struct scrub_ctx *sctx, struct btrfs_ordered_sum *sum) @@ -2978,123 +2884,6 @@ int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, return 0; } -static int scrub_sectors_for_parity(struct scrub_parity *sparity, - u64 logical, u32 len, - u64 physical, struct btrfs_device *dev, - u64 flags, u64 gen, int mirror_num, u8 *csum) -{ - struct scrub_ctx *sctx = sparity->sctx; - struct scrub_block *sblock; - const u32 sectorsize = sctx->fs_info->sectorsize; - int index; - - ASSERT(IS_ALIGNED(len, sectorsize)); - - sblock = alloc_scrub_block(sctx, dev, logical, physical, physical, mirror_num); - if (!sblock) { - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); - return -ENOMEM; - } - - sblock->sparity = sparity; - scrub_parity_get(sparity); - - for (index = 0; len > 0; index++) { - struct scrub_sector *sector; - - sector = alloc_scrub_sector(sblock, logical); - if (!sector) { - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); - scrub_block_put(sblock); - return -ENOMEM; - } - sblock->sectors[index] = sector; - /* For scrub parity */ - scrub_sector_get(sector); - list_add_tail(§or->list, &sparity->sectors_list); - sector->flags = flags; - sector->generation = gen; - if (csum) { - sector->have_csum = 1; - memcpy(sector->csum, csum, sctx->fs_info->csum_size); - } else { - sector->have_csum = 0; - } - - /* Iterate over the stripe range in sectorsize steps */ - len -= sectorsize; - logical += sectorsize; - physical += sectorsize; - } - - WARN_ON(sblock->sector_count == 0); - for (index = 0; index < sblock->sector_count; index++) { - struct scrub_sector *sector = sblock->sectors[index]; - int ret; - - ret = scrub_add_sector_to_rd_bio(sctx, sector); - if 
(ret) { - scrub_block_put(sblock); - return ret; - } - } - - /* Last one frees, either here or in bio completion for last sector */ - scrub_block_put(sblock); - return 0; -} - -static int scrub_extent_for_parity(struct scrub_parity *sparity, - u64 logical, u32 len, - u64 physical, struct btrfs_device *dev, - u64 flags, u64 gen, int mirror_num) -{ - struct scrub_ctx *sctx = sparity->sctx; - int ret; - u8 csum[BTRFS_CSUM_SIZE]; - u32 blocksize; - - if (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state)) { - scrub_parity_mark_sectors_error(sparity, logical, len); - return 0; - } - - if (flags & BTRFS_EXTENT_FLAG_DATA) { - blocksize = sparity->stripe_len; - } else if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { - blocksize = sparity->stripe_len; - } else { - blocksize = sctx->fs_info->sectorsize; - WARN_ON(1); - } - - while (len) { - u32 l = min(len, blocksize); - int have_csum = 0; - - if (flags & BTRFS_EXTENT_FLAG_DATA) { - /* push csums to sbio */ - have_csum = scrub_find_csum(sctx, logical, csum); - if (have_csum == 0) - goto skip; - } - ret = scrub_sectors_for_parity(sparity, logical, l, physical, dev, - flags, gen, mirror_num, - have_csum ? csum : NULL); - if (ret) - return ret; -skip: - len -= l; - logical += l; - physical += l; - } - return 0; -} - /* * Given a physical address, this will calculate it's * logical offset. 
if this is a parity stripe, it will return @@ -3139,119 +2928,6 @@ static int get_raid56_logic_offset(u64 physical, int num, return 1; } -static void scrub_free_parity(struct scrub_parity *sparity) -{ - struct scrub_ctx *sctx = sparity->sctx; - struct scrub_sector *curr, *next; - int nbits; - - nbits = bitmap_weight(&sparity->ebitmap, sparity->nsectors); - if (nbits) { - spin_lock(&sctx->stat_lock); - sctx->stat.read_errors += nbits; - sctx->stat.uncorrectable_errors += nbits; - spin_unlock(&sctx->stat_lock); - } - - list_for_each_entry_safe(curr, next, &sparity->sectors_list, list) { - list_del_init(&curr->list); - scrub_sector_put(curr); - } - - kfree(sparity); -} - -static void scrub_parity_bio_endio_worker(struct work_struct *work) -{ - struct scrub_parity *sparity = container_of(work, struct scrub_parity, - work); - struct scrub_ctx *sctx = sparity->sctx; - - btrfs_bio_counter_dec(sctx->fs_info); - scrub_free_parity(sparity); - scrub_pending_bio_dec(sctx); -} - -static void scrub_parity_bio_endio(struct bio *bio) -{ - struct scrub_parity *sparity = bio->bi_private; - struct btrfs_fs_info *fs_info = sparity->sctx->fs_info; - - if (bio->bi_status) - bitmap_or(&sparity->ebitmap, &sparity->ebitmap, - &sparity->dbitmap, sparity->nsectors); - - bio_put(bio); - - INIT_WORK(&sparity->work, scrub_parity_bio_endio_worker); - queue_work(fs_info->scrub_parity_workers, &sparity->work); -} - -static void scrub_parity_check_and_repair(struct scrub_parity *sparity) -{ - struct scrub_ctx *sctx = sparity->sctx; - struct btrfs_fs_info *fs_info = sctx->fs_info; - struct bio *bio; - struct btrfs_raid_bio *rbio; - struct btrfs_io_context *bioc = NULL; - u64 length; - int ret; - - if (!bitmap_andnot(&sparity->dbitmap, &sparity->dbitmap, - &sparity->ebitmap, sparity->nsectors)) - goto out; - - length = sparity->logic_end - sparity->logic_start; - - btrfs_bio_counter_inc_blocked(fs_info); - ret = btrfs_map_sblock(fs_info, BTRFS_MAP_WRITE, sparity->logic_start, - &length, &bioc); - if 
(ret || !bioc || !bioc->raid_map) - goto bioc_out; - - bio = bio_alloc(NULL, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS); - bio->bi_iter.bi_sector = sparity->logic_start >> 9; - bio->bi_private = sparity; - bio->bi_end_io = scrub_parity_bio_endio; - - rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, - sparity->scrub_dev, - &sparity->dbitmap, - sparity->nsectors); - btrfs_put_bioc(bioc); - if (!rbio) - goto rbio_out; - - scrub_pending_bio_inc(sctx); - raid56_parity_submit_scrub_rbio(rbio); - return; - -rbio_out: - bio_put(bio); -bioc_out: - btrfs_bio_counter_dec(fs_info); - bitmap_or(&sparity->ebitmap, &sparity->ebitmap, &sparity->dbitmap, - sparity->nsectors); - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); -out: - scrub_free_parity(sparity); -} - -static void scrub_parity_get(struct scrub_parity *sparity) -{ - refcount_inc(&sparity->refs); -} - -static void scrub_parity_put(struct scrub_parity *sparity) -{ - if (!refcount_dec_and_test(&sparity->refs)) - return; - - scrub_parity_check_and_repair(sparity); -} - /* * Return 0 if the extent item range covers any byte of the range. * Return <0 if the extent item is before @search_start. 
@@ -3378,187 +3054,6 @@ static void get_extent_info(struct btrfs_path *path, u64 *extent_start_ret, *generation_ret = btrfs_extent_generation(path->nodes[0], ei); } -static bool does_range_cross_boundary(u64 extent_start, u64 extent_len, - u64 boundary_start, u64 boudary_len) -{ - return (extent_start < boundary_start && - extent_start + extent_len > boundary_start) || - (extent_start < boundary_start + boudary_len && - extent_start + extent_len > boundary_start + boudary_len); -} - -static int scrub_raid56_data_stripe_for_parity(struct scrub_ctx *sctx, - struct scrub_parity *sparity, - struct map_lookup *map, - struct btrfs_device *sdev, - struct btrfs_path *path, - u64 logical) -{ - struct btrfs_fs_info *fs_info = sctx->fs_info; - struct btrfs_root *extent_root = btrfs_extent_root(fs_info, logical); - struct btrfs_root *csum_root = btrfs_csum_root(fs_info, logical); - u64 cur_logical = logical; - int ret; - - ASSERT(map->type & BTRFS_BLOCK_GROUP_RAID56_MASK); - - /* Path must not be populated */ - ASSERT(!path->nodes[0]); - - while (cur_logical < logical + map->stripe_len) { - struct btrfs_io_context *bioc = NULL; - struct btrfs_device *extent_dev; - u64 extent_start; - u64 extent_size; - u64 mapped_length; - u64 extent_flags; - u64 extent_gen; - u64 extent_physical; - u64 extent_mirror_num; - - ret = find_first_extent_item(extent_root, path, cur_logical, - logical + map->stripe_len - cur_logical); - /* No more extent item in this data stripe */ - if (ret > 0) { - ret = 0; - break; - } - if (ret < 0) - break; - get_extent_info(path, &extent_start, &extent_size, &extent_flags, - &extent_gen); - - /* Metadata should not cross stripe boundaries */ - if ((extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) && - does_range_cross_boundary(extent_start, extent_size, - logical, map->stripe_len)) { - btrfs_err(fs_info, - "scrub: tree block %llu spanning stripes, ignored. 
logical=%llu", - extent_start, logical); - spin_lock(&sctx->stat_lock); - sctx->stat.uncorrectable_errors++; - spin_unlock(&sctx->stat_lock); - cur_logical += extent_size; - continue; - } - - /* Skip hole range which doesn't have any extent */ - cur_logical = max(extent_start, cur_logical); - - /* Truncate the range inside this data stripe */ - extent_size = min(extent_start + extent_size, - logical + map->stripe_len) - cur_logical; - extent_start = cur_logical; - ASSERT(extent_size <= U32_MAX); - - scrub_parity_mark_sectors_data(sparity, extent_start, extent_size); - - mapped_length = extent_size; - ret = btrfs_map_block(fs_info, BTRFS_MAP_READ, extent_start, - &mapped_length, &bioc, 0); - if (!ret && (!bioc || mapped_length < extent_size)) - ret = -EIO; - if (ret) { - btrfs_put_bioc(bioc); - scrub_parity_mark_sectors_error(sparity, extent_start, - extent_size); - break; - } - extent_physical = bioc->stripes[0].physical; - extent_mirror_num = bioc->mirror_num; - extent_dev = bioc->stripes[0].dev; - btrfs_put_bioc(bioc); - - ret = btrfs_lookup_csums_list(csum_root, extent_start, - extent_start + extent_size - 1, - &sctx->csum_list, 1, false); - if (ret) { - scrub_parity_mark_sectors_error(sparity, extent_start, - extent_size); - break; - } - - ret = scrub_extent_for_parity(sparity, extent_start, - extent_size, extent_physical, - extent_dev, extent_flags, - extent_gen, extent_mirror_num); - scrub_free_csums(sctx); - - if (ret) { - scrub_parity_mark_sectors_error(sparity, extent_start, - extent_size); - break; - } - - cond_resched(); - cur_logical += extent_size; - } - btrfs_release_path(path); - return ret; -} - -int scrub_raid56_parity(struct scrub_ctx *sctx, struct map_lookup *map, - struct btrfs_device *sdev, u64 logic_start, - u64 logic_end) -{ - struct btrfs_fs_info *fs_info = sctx->fs_info; - struct btrfs_path *path; - u64 cur_logical; - int ret; - struct scrub_parity *sparity; - int nsectors; - - path = btrfs_alloc_path(); - if (!path) { - 
spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); - return -ENOMEM; - } - path->search_commit_root = 1; - path->skip_locking = 1; - - ASSERT(map->stripe_len <= U32_MAX); - nsectors = map->stripe_len >> fs_info->sectorsize_bits; - ASSERT(nsectors <= BITS_PER_LONG); - sparity = kzalloc(sizeof(struct scrub_parity), GFP_NOFS); - if (!sparity) { - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); - btrfs_free_path(path); - return -ENOMEM; - } - - ASSERT(map->stripe_len <= U32_MAX); - sparity->stripe_len = map->stripe_len; - sparity->nsectors = nsectors; - sparity->sctx = sctx; - sparity->scrub_dev = sdev; - sparity->logic_start = logic_start; - sparity->logic_end = logic_end; - refcount_set(&sparity->refs, 1); - INIT_LIST_HEAD(&sparity->sectors_list); - - ret = 0; - for (cur_logical = logic_start; cur_logical < logic_end; - cur_logical += map->stripe_len) { - ret = scrub_raid56_data_stripe_for_parity(sctx, sparity, map, - sdev, path, cur_logical); - if (ret < 0) - break; - } - - scrub_parity_put(sparity); - scrub_submit(sctx); - mutex_lock(&sctx->wr_lock); - scrub_wr_submit(sctx); - mutex_unlock(&sctx->wr_lock); - - btrfs_free_path(path); - return ret < 0 ? ret : 0; -} - static void sync_replace_for_zoned(struct scrub_ctx *sctx) { if (!btrfs_is_zoned(sctx->fs_info)) diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h index 1498c770fb77..d387c7eef061 100644 --- a/fs/btrfs/scrub.h +++ b/fs/btrfs/scrub.h @@ -17,9 +17,6 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid, * The following functions are temporary exports to avoid warning on unused * static functions. 
*/ -int scrub_raid56_parity(struct scrub_ctx *sctx, struct map_lookup *map, - struct btrfs_device *sdev, u64 logic_start, - u64 logic_end); int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, u64 logical, u32 len, u64 physical, struct btrfs_device *dev, u64 flags,

From patchwork Tue Dec 6 08:23:38 2022
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13065524
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PoC PATCH 11/11] btrfs: scrub: cleanup scrub_extent() and its related functions
Date: Tue, 6 Dec 2022 16:23:38 +0800
Message-Id: <87b515af90d9533754af5d549d4f6f6d3b909036.1670314744.git.wqu@suse.com>

Since scrub_simple_mirror() has migrated to the scrub2_stripe based solution, scrub_extent() can be safely removed.

With that function removed, all the static functions called by it can also be removed.

Please note that the following structures are no longer in use:

- scrub_block
- scrub_sector
- scrub_bio

But this cleanup is already too large for a single patch, thus they are left for further cleanup.
Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 443 ----------------------------------------------- fs/btrfs/scrub.h | 9 - 2 files changed, 452 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index f3981f11dd2c..41e676e2a1b9 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -608,18 +608,8 @@ static int scrub_checksum_super(struct scrub_block *sblock); static void scrub_block_put(struct scrub_block *sblock); static void scrub_sector_get(struct scrub_sector *sector); static void scrub_sector_put(struct scrub_sector *sector); -static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len, - u64 physical, struct btrfs_device *dev, u64 flags, - u64 gen, int mirror_num, u8 *csum, - u64 physical_for_dev_replace); -static void scrub_bio_end_io(struct bio *bio); static void scrub_bio_end_io_worker(struct work_struct *work); static void scrub_block_complete(struct scrub_block *sblock); -static void scrub_find_good_copy(struct btrfs_fs_info *fs_info, - u64 extent_logical, u32 extent_len, - u64 *extent_physical, - struct btrfs_device **extent_dev, - int *extent_mirror_num); static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx, struct scrub_sector *sector); static void scrub_wr_submit(struct scrub_ctx *sctx); @@ -2415,281 +2405,6 @@ static void scrub_submit(struct scrub_ctx *sctx) submit_bio(sbio->bio); } -static int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx, - struct scrub_sector *sector) -{ - struct scrub_block *sblock = sector->sblock; - struct scrub_bio *sbio; - const u32 sectorsize = sctx->fs_info->sectorsize; - int ret; - -again: - /* - * grab a fresh bio or wait for one to become available - */ - while (sctx->curr == -1) { - spin_lock(&sctx->list_lock); - sctx->curr = sctx->first_free; - if (sctx->curr != -1) { - sctx->first_free = sctx->bios[sctx->curr]->next_free; - sctx->bios[sctx->curr]->next_free = -1; - sctx->bios[sctx->curr]->sector_count = 0; - spin_unlock(&sctx->list_lock); - } else { - spin_unlock(&sctx->list_lock); - 
wait_event(sctx->list_wait, sctx->first_free != -1); - } - } - sbio = sctx->bios[sctx->curr]; - if (sbio->sector_count == 0) { - sbio->physical = sblock->physical + sector->offset; - sbio->logical = sblock->logical + sector->offset; - sbio->dev = sblock->dev; - if (!sbio->bio) { - sbio->bio = bio_alloc(sbio->dev->bdev, sctx->sectors_per_bio, - REQ_OP_READ, GFP_NOFS); - } - sbio->bio->bi_private = sbio; - sbio->bio->bi_end_io = scrub_bio_end_io; - sbio->bio->bi_iter.bi_sector = sbio->physical >> 9; - sbio->status = 0; - } else if (sbio->physical + sbio->sector_count * sectorsize != - sblock->physical + sector->offset || - sbio->logical + sbio->sector_count * sectorsize != - sblock->logical + sector->offset || - sbio->dev != sblock->dev) { - scrub_submit(sctx); - goto again; - } - - sbio->sectors[sbio->sector_count] = sector; - ret = bio_add_scrub_sector(sbio->bio, sector, sectorsize); - if (ret != sectorsize) { - if (sbio->sector_count < 1) { - bio_put(sbio->bio); - sbio->bio = NULL; - return -EIO; - } - scrub_submit(sctx); - goto again; - } - - scrub_block_get(sblock); /* one for the page added to the bio */ - atomic_inc(&sblock->outstanding_sectors); - sbio->sector_count++; - if (sbio->sector_count == sctx->sectors_per_bio) - scrub_submit(sctx); - - return 0; -} - -static void scrub_missing_raid56_end_io(struct bio *bio) -{ - struct scrub_block *sblock = bio->bi_private; - struct btrfs_fs_info *fs_info = sblock->sctx->fs_info; - - btrfs_bio_counter_dec(fs_info); - if (bio->bi_status) - sblock->no_io_error_seen = 0; - - bio_put(bio); - - queue_work(fs_info->scrub_workers, &sblock->work); -} - -static void scrub_missing_raid56_worker(struct work_struct *work) -{ - struct scrub_block *sblock = container_of(work, struct scrub_block, work); - struct scrub_ctx *sctx = sblock->sctx; - struct btrfs_fs_info *fs_info = sctx->fs_info; - u64 logical; - struct btrfs_device *dev; - - logical = sblock->logical; - dev = sblock->dev; - - if (sblock->no_io_error_seen) - 
scrub_recheck_block_checksum(sblock); - - if (!sblock->no_io_error_seen) { - spin_lock(&sctx->stat_lock); - sctx->stat.read_errors++; - spin_unlock(&sctx->stat_lock); - btrfs_err_rl_in_rcu(fs_info, - "IO error rebuilding logical %llu for dev %s", - logical, btrfs_dev_name(dev)); - } else if (sblock->header_error || sblock->checksum_error) { - spin_lock(&sctx->stat_lock); - sctx->stat.uncorrectable_errors++; - spin_unlock(&sctx->stat_lock); - btrfs_err_rl_in_rcu(fs_info, - "failed to rebuild valid logical %llu for dev %s", - logical, btrfs_dev_name(dev)); - } else { - scrub_write_block_to_dev_replace(sblock); - } - - if (sctx->is_dev_replace && sctx->flush_all_writes) { - mutex_lock(&sctx->wr_lock); - scrub_wr_submit(sctx); - mutex_unlock(&sctx->wr_lock); - } - - scrub_block_put(sblock); - scrub_pending_bio_dec(sctx); -} - -static void scrub_missing_raid56_pages(struct scrub_block *sblock) -{ - struct scrub_ctx *sctx = sblock->sctx; - struct btrfs_fs_info *fs_info = sctx->fs_info; - u64 length = sblock->sector_count << fs_info->sectorsize_bits; - u64 logical = sblock->logical; - struct btrfs_io_context *bioc = NULL; - struct bio *bio; - struct btrfs_raid_bio *rbio; - int ret; - int i; - - btrfs_bio_counter_inc_blocked(fs_info); - ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical, - &length, &bioc); - if (ret || !bioc || !bioc->raid_map) - goto bioc_out; - - if (WARN_ON(!sctx->is_dev_replace || - !(bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK))) { - /* - * We shouldn't be scrubbing a missing device. Even for dev - * replace, we should only get here for RAID 5/6. We either - * managed to mount something with no mirrors remaining or - * there's a bug in scrub_find_good_copy()/btrfs_map_block(). 
- */ - goto bioc_out; - } - - bio = bio_alloc(NULL, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS); - bio->bi_iter.bi_sector = logical >> 9; - bio->bi_private = sblock; - bio->bi_end_io = scrub_missing_raid56_end_io; - - rbio = raid56_alloc_missing_rbio(bio, bioc); - if (!rbio) - goto rbio_out; - - for (i = 0; i < sblock->sector_count; i++) { - struct scrub_sector *sector = sblock->sectors[i]; - - raid56_add_scrub_pages(rbio, scrub_sector_get_page(sector), - scrub_sector_get_page_offset(sector), - sector->offset + sector->sblock->logical); - } - - INIT_WORK(&sblock->work, scrub_missing_raid56_worker); - scrub_block_get(sblock); - scrub_pending_bio_inc(sctx); - raid56_submit_missing_rbio(rbio); - btrfs_put_bioc(bioc); - return; - -rbio_out: - bio_put(bio); -bioc_out: - btrfs_bio_counter_dec(fs_info); - btrfs_put_bioc(bioc); - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); -} - -static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len, - u64 physical, struct btrfs_device *dev, u64 flags, - u64 gen, int mirror_num, u8 *csum, - u64 physical_for_dev_replace) -{ - struct scrub_block *sblock; - const u32 sectorsize = sctx->fs_info->sectorsize; - int index; - - sblock = alloc_scrub_block(sctx, dev, logical, physical, - physical_for_dev_replace, mirror_num); - if (!sblock) { - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); - return -ENOMEM; - } - - for (index = 0; len > 0; index++) { - struct scrub_sector *sector; - /* - * Here we will allocate one page for one sector to scrub. - * This is fine if PAGE_SIZE == sectorsize, but will cost - * more memory for PAGE_SIZE > sectorsize case. 
- */ - u32 l = min(sectorsize, len); - - sector = alloc_scrub_sector(sblock, logical); - if (!sector) { - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); - scrub_block_put(sblock); - return -ENOMEM; - } - sector->flags = flags; - sector->generation = gen; - if (csum) { - sector->have_csum = 1; - memcpy(sector->csum, csum, sctx->fs_info->csum_size); - } else { - sector->have_csum = 0; - } - len -= l; - logical += l; - physical += l; - physical_for_dev_replace += l; - } - - WARN_ON(sblock->sector_count == 0); - if (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state)) { - /* - * This case should only be hit for RAID 5/6 device replace. See - * the comment in scrub_missing_raid56_pages() for details. - */ - scrub_missing_raid56_pages(sblock); - } else { - for (index = 0; index < sblock->sector_count; index++) { - struct scrub_sector *sector = sblock->sectors[index]; - int ret; - - ret = scrub_add_sector_to_rd_bio(sctx, sector); - if (ret) { - scrub_block_put(sblock); - return ret; - } - } - - if (flags & BTRFS_EXTENT_FLAG_SUPER) - scrub_submit(sctx); - } - - /* last one frees, either here or in bio completion for last page */ - scrub_block_put(sblock); - return 0; -} - -static void scrub_bio_end_io(struct bio *bio) -{ - struct scrub_bio *sbio = bio->bi_private; - struct btrfs_fs_info *fs_info = sbio->dev->fs_info; - - sbio->status = bio->bi_status; - sbio->bio = bio; - - queue_work(fs_info->scrub_workers, &sbio->work); -} - static void scrub_bio_end_io_worker(struct work_struct *work) { struct scrub_bio *sbio = container_of(work, struct scrub_bio, work); @@ -2751,139 +2466,6 @@ static void scrub_block_complete(struct scrub_block *sblock) } } -static void drop_csum_range(struct scrub_ctx *sctx, struct btrfs_ordered_sum *sum) -{ - sctx->stat.csum_discards += sum->len >> sctx->fs_info->sectorsize_bits; - list_del(&sum->list); - kfree(sum); -} - -/* - * Find the desired csum for range [logical, logical + sectorsize), and store - * 
the csum into @csum. - * - * The search source is sctx->csum_list, which is a pre-populated list - * storing bytenr ordered csum ranges. We're responsible to cleanup any range - * that is before @logical. - * - * Return 0 if there is no csum for the range. - * Return 1 if there is csum for the range and copied to @csum. - */ -static int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum) -{ - bool found = false; - - while (!list_empty(&sctx->csum_list)) { - struct btrfs_ordered_sum *sum = NULL; - unsigned long index; - unsigned long num_sectors; - - sum = list_first_entry(&sctx->csum_list, - struct btrfs_ordered_sum, list); - /* The current csum range is beyond our range, no csum found */ - if (sum->bytenr > logical) - break; - - /* - * The current sum is before our bytenr, since scrub is always - * done in bytenr order, the csum will never be used anymore, - * clean it up so that later calls won't bother with the range, - * and continue search the next range. - */ - if (sum->bytenr + sum->len <= logical) { - drop_csum_range(sctx, sum); - continue; - } - - /* Now the csum range covers our bytenr, copy the csum */ - found = true; - index = (logical - sum->bytenr) >> sctx->fs_info->sectorsize_bits; - num_sectors = sum->len >> sctx->fs_info->sectorsize_bits; - - memcpy(csum, sum->sums + index * sctx->fs_info->csum_size, - sctx->fs_info->csum_size); - - /* Cleanup the range if we're at the end of the csum range */ - if (index == num_sectors - 1) - drop_csum_range(sctx, sum); - break; - } - if (!found) - return 0; - return 1; -} - -/* scrub extent tries to collect up to 64 kB for each bio */ -int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, - u64 logical, u32 len, u64 physical, - struct btrfs_device *dev, u64 flags, - u64 gen, int mirror_num) -{ - struct btrfs_device *src_dev = dev; - u64 src_physical = physical; - int src_mirror = mirror_num; - int ret; - u8 csum[BTRFS_CSUM_SIZE]; - u32 blocksize; - - if (flags & BTRFS_EXTENT_FLAG_DATA) { - 
if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) - blocksize = map->stripe_len; - else - blocksize = sctx->fs_info->sectorsize; - spin_lock(&sctx->stat_lock); - sctx->stat.data_extents_scrubbed++; - sctx->stat.data_bytes_scrubbed += len; - spin_unlock(&sctx->stat_lock); - } else if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { - if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) - blocksize = map->stripe_len; - else - blocksize = sctx->fs_info->nodesize; - spin_lock(&sctx->stat_lock); - sctx->stat.tree_extents_scrubbed++; - sctx->stat.tree_bytes_scrubbed += len; - spin_unlock(&sctx->stat_lock); - } else { - blocksize = sctx->fs_info->sectorsize; - WARN_ON(1); - } - - /* - * For dev-replace case, we can have @dev being a missing device. - * Regular scrub will avoid its execution on missing device at all, - * as that would trigger tons of read error. - * - * Reading from missing device will cause read error counts to - * increase unnecessarily. - * So here we change the read source to a good mirror. - */ - if (sctx->is_dev_replace && !dev->bdev) - scrub_find_good_copy(sctx->fs_info, logical, len, &src_physical, - &src_dev, &src_mirror); - while (len) { - u32 l = min(len, blocksize); - int have_csum = 0; - - if (flags & BTRFS_EXTENT_FLAG_DATA) { - /* push csums to sbio */ - have_csum = scrub_find_csum(sctx, logical, csum); - if (have_csum == 0) - ++sctx->stat.no_csum; - } - ret = scrub_sectors(sctx, logical, l, src_physical, src_dev, - flags, gen, src_mirror, - have_csum ? csum : NULL, physical); - if (ret) - return ret; - len -= l; - logical += l; - physical += l; - src_physical += l; - } - return 0; -} - /* * Given a physical address, this will calculate it's * logical offset. if this is a parity stripe, it will return @@ -5132,28 +4714,3 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid, return dev ? (sctx ? 
0 : -ENOTCONN) : -ENODEV; } - -static void scrub_find_good_copy(struct btrfs_fs_info *fs_info, - u64 extent_logical, u32 extent_len, - u64 *extent_physical, - struct btrfs_device **extent_dev, - int *extent_mirror_num) -{ - u64 mapped_length; - struct btrfs_io_context *bioc = NULL; - int ret; - - mapped_length = extent_len; - ret = btrfs_map_block(fs_info, BTRFS_MAP_READ, extent_logical, - &mapped_length, &bioc, 0); - if (ret || !bioc || mapped_length < extent_len || - !bioc->stripes[0].dev->bdev) { - btrfs_put_bioc(bioc); - return; - } - - *extent_physical = bioc->stripes[0].physical; - *extent_mirror_num = bioc->mirror_num; - *extent_dev = bioc->stripes[0].dev; - btrfs_put_bioc(bioc); -} diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h index d387c7eef061..7639103ebf9d 100644 --- a/fs/btrfs/scrub.h +++ b/fs/btrfs/scrub.h @@ -13,13 +13,4 @@ int btrfs_scrub_cancel_dev(struct btrfs_device *dev); int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid, struct btrfs_scrub_progress *progress); -/* - * The following functions are temporary exports to avoid warning on unused - * static functions. - */ -int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, - u64 logical, u32 len, u64 physical, - struct btrfs_device *dev, u64 flags, - u64 gen, int mirror_num); - #endif