From patchwork Wed Mar 29 07:09:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13191989 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 916F3C6FD18 for ; Wed, 29 Mar 2023 07:10:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229847AbjC2HKQ (ORCPT ); Wed, 29 Mar 2023 03:10:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229850AbjC2HKN (ORCPT ); Wed, 29 Mar 2023 03:10:13 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 054202D5E for ; Wed, 29 Mar 2023 00:10:10 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 99D1A21A20 for ; Wed, 29 Mar 2023 07:10:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1680073809; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SBpApp6DH2psjFgqsJ861qO0zObmglyOuuVSpfIWMWY=; b=ZPQRxBivpP0hyQUmP80pcEBFotB26CBgADIpfUxENDeGUnzYsUevG231KxEXJguqxXl+zf DJRvpWzMqVvXLpZn1h8U8z0xzZowb333mB164fAs6C34yJEc2Vhy8eWW17IJJLTxg1sQNC IxCeSr1WYueg+q27TWgBkFVG+oV7o+8= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 05D8E138FF for ; Wed, 29 Mar 2023 07:10:08 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id SM6AMVDkI2T4NQAAMHmgww (envelope-from ) for ; Wed, 29 Mar 2023 07:10:08 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 1/6] btrfs: scrub: use scrub_stripe to implement RAID56 P/Q scrub Date: Wed, 29 Mar 2023 15:09:45 +0800 Message-Id: X-Mailer: git-send-email 2.39.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This patch would implement the only missing part for scrub: RAID56 P/Q stripe scrub. The workflow is pretty straightforward for the new function, scrub_raid56_parity_stripe(): - Go through the regular scrub path for each data stripe - Wait for the verification and repair to finish - Writeback the repaired sectors to data stripes - Make sure all stripes are properly repaired If we have sectors unrepaired, we can not continue, or we may further corrupt the P/Q stripe. - Submit the rbio for P/Q stripe The dev-replace would be handled inside raid56_parity_submit_scrub_rbio() path. - Wait for the above bio to finish Although the old code is no longer used, we still keep the declaration, as the cleanup can be several times larger than this patch itself. Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 217 ++++++++++++++++++++++++++++++++++++++++++++--- fs/btrfs/scrub.h | 5 ++ 2 files changed, 212 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 42e18cc905b8..c340fd51aa65 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -99,6 +99,13 @@ enum scrub_stripe_flags { /* Set when the read-repair is finished. */ SCRUB_STRIPE_FLAG_REPAIR_DONE, + + /* + * Set for data stripes if it's triggered from P/Q stripe. + * During such scrub, we should not report error in data stripes, nor + * update the accounting. + */ + SCRUB_STRIPE_FLAG_NO_REPORT, }; #define SCRUB_STRIPE_PAGES (BTRFS_STRIPE_LEN / PAGE_SIZE) @@ -279,6 +286,7 @@ struct scrub_parity { struct scrub_ctx { struct scrub_bio *bios[SCRUB_BIOS_PER_SCTX]; struct scrub_stripe stripes[SCRUB_STRIPES_PER_SCTX]; + struct scrub_stripe *raid56_data_stripes; struct btrfs_fs_info *fs_info; int first_free; int curr; @@ -2509,6 +2517,9 @@ static void scrub_stripe_report_errors(struct scrub_ctx *sctx, int nr_repaired_sectors = 0; int sector_nr; + if (test_bit(SCRUB_STRIPE_FLAG_NO_REPORT, &stripe->state)) + return; + /* * Init needed infos for error reporting. * @@ -3823,11 +3834,9 @@ static int scrub_raid56_data_stripe_for_parity(struct scrub_ctx *sctx, return ret; } -static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, - struct map_lookup *map, - struct btrfs_device *sdev, - u64 logic_start, - u64 logic_end) +int scrub_raid56_parity(struct scrub_ctx *sctx, struct map_lookup *map, + struct btrfs_device *sdev, u64 logic_start, + u64 logic_end) { struct btrfs_fs_info *fs_info = sctx->fs_info; struct btrfs_path *path; @@ -4200,6 +4209,11 @@ static void flush_scrub_stripes(struct scrub_ctx *sctx) sctx->cur_stripe = 0; } +static void raid56_scrub_wait_endio(struct bio *bio) +{ + complete(bio->bi_private); +} + static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *bg, struct btrfs_device *dev, int mirror_num, @@ -4225,6 +4239,167 @@ static int queue_scrub_stripe(struct scrub_ctx *sctx, return 0; } +static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx, + struct btrfs_device *scrub_dev, + struct btrfs_block_group *bg, + struct map_lookup *map, + u64 full_stripe_start) +{ + DECLARE_COMPLETION_ONSTACK(io_done); + struct btrfs_fs_info *fs_info = sctx->fs_info; + struct btrfs_raid_bio *rbio; + struct btrfs_io_context *bioc = NULL; + struct bio *bio; + struct scrub_stripe *stripe; + bool all_empty = true; + const int data_stripes = nr_data_stripes(map); + unsigned long extent_bitmap = 0; + u64 length = data_stripes << BTRFS_STRIPE_LEN_SHIFT; + int ret; + + ASSERT(sctx->raid56_data_stripes); + + for (int i = 0; i < data_stripes; i++) { + int stripe_index; + int rot; + u64 physical; + + stripe = &sctx->raid56_data_stripes[i]; + rot = div_u64(full_stripe_start - bg->start, data_stripes) >> + BTRFS_STRIPE_LEN_SHIFT; + stripe_index = (i + rot) % map->num_stripes; + physical = map->stripes[stripe_index].physical + + (rot << BTRFS_STRIPE_LEN_SHIFT); + + scrub_reset_stripe(stripe); + set_bit(SCRUB_STRIPE_FLAG_NO_REPORT, &stripe->state); + ret = scrub_find_fill_first_stripe(bg, + map->stripes[stripe_index].dev, physical, 1, + full_stripe_start + (i << BTRFS_STRIPE_LEN_SHIFT), + BTRFS_STRIPE_LEN, stripe); + if (ret < 0) + goto out; + /* + * No extent in this data stripe, need to manually mark them + * initialized to make later read submission happy. + */ + if (ret > 0) { + stripe->logical = full_stripe_start + + (i << BTRFS_STRIPE_LEN_SHIFT); + stripe->dev = map->stripes[stripe_index].dev; + stripe->mirror_num = 1; + set_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &stripe->state); + } + } + + /* Check if all data stripes are empty. */ + for (int i = 0; i < data_stripes; i++) { + stripe = &sctx->raid56_data_stripes[i]; + if (!bitmap_empty(&stripe->extent_sector_bitmap, + stripe->nr_sectors)) { + all_empty = false; + break; + } + } + if (all_empty) { + ret = 0; + goto out; + } + + for (int i = 0; i < data_stripes; i++) { + stripe = &sctx->raid56_data_stripes[i]; + scrub_submit_initial_read(sctx, stripe); + } + for (int i = 0; i < data_stripes; i++) { + stripe = &sctx->raid56_data_stripes[i]; + + wait_event(stripe->repair_wait, + test_bit(SCRUB_STRIPE_FLAG_REPAIR_DONE, + &stripe->state)); + } + /* For now, no zoned support for RAID56. */ + ASSERT(!btrfs_is_zoned(sctx->fs_info)); + + /* Writeback for the repaired sectors. */ + for (int i = 0; i < data_stripes; i++) { + unsigned long repaired; + + stripe = &sctx->raid56_data_stripes[i]; + + bitmap_andnot(&repaired, &stripe->init_error_bitmap, + &stripe->error_bitmap, stripe->nr_sectors); + scrub_write_sectors(sctx, stripe, repaired, false); + } + + /* Wait for above writebacks to finish. */ + for (int i = 0; i < data_stripes; i++) { + stripe = &sctx->raid56_data_stripes[i]; + + wait_scrub_stripe_io(stripe); + } + + /* + * Now all data stripes are properly verify. Check if we have any + * unrepaired, if so abort immediate or we can further corrupt the + * P/Q stripes. + * + * During the loop, also populate extent_bitmap. + */ + for (int i = 0; i < data_stripes; i++) { + unsigned long error; + + stripe = &sctx->raid56_data_stripes[i]; + + /* + * We should only check the errors where there is an extent. + * As we may hit an empty data stripe while it's missing. + */ + bitmap_and(&error, &stripe->error_bitmap, + &stripe->extent_sector_bitmap, stripe->nr_sectors); + if (!bitmap_empty(&error, stripe->nr_sectors)) { + btrfs_err(fs_info, +"unrepaired sectors detected, full stripe %llu data stripe %u errors %*pbl", + full_stripe_start, i, stripe->nr_sectors, + &error); + ret = -EIO; + goto out; + } + bitmap_or(&extent_bitmap, &extent_bitmap, + &stripe->extent_sector_bitmap, stripe->nr_sectors); + } + + /* Now we can check and regenerate the P/Q stripe. */ + bio = bio_alloc(NULL, 1, REQ_OP_READ, GFP_NOFS); + bio->bi_iter.bi_sector = full_stripe_start >> SECTOR_SHIFT; + bio->bi_private = &io_done; + bio->bi_end_io = raid56_scrub_wait_endio; + + btrfs_bio_counter_inc_blocked(fs_info); + ret = btrfs_map_sblock(fs_info, BTRFS_MAP_WRITE, full_stripe_start, + &length, &bioc); + if (ret < 0) { + btrfs_put_bioc(bioc); + btrfs_bio_counter_dec(fs_info); + goto out; + } + rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, scrub_dev, &extent_bitmap, + BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits); + btrfs_put_bioc(bioc); + if (!rbio) { + ret = -ENOMEM; + btrfs_bio_counter_dec(fs_info); + goto out; + } + raid56_parity_submit_scrub_rbio(rbio); + wait_for_completion_io(&io_done); + ret = blk_status_to_errno(bio->bi_status); + bio_put(bio); + btrfs_bio_counter_dec(fs_info); + +out: + return ret; +} + /* * Scrub one range which can only has simple mirror based profile. * (Including all range in SINGLE/DUP/RAID1/RAID1C*, and each stripe in @@ -4398,7 +4573,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, /* Offset inside the chunk */ u64 offset; u64 stripe_logical; - u64 stripe_end; int stop_loop = 0; wait_event(sctx->list_wait, @@ -4413,6 +4587,25 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, sctx->flush_all_writes = true; } + /* Prepare the extra data stripes used by RAID56. */ + if (profile & BTRFS_BLOCK_GROUP_RAID56_MASK) { + ASSERT(sctx->raid56_data_stripes == NULL); + + sctx->raid56_data_stripes = kcalloc(nr_data_stripes(map), + sizeof(struct scrub_stripe), GFP_KERNEL); + if (!sctx->raid56_data_stripes) { + ret = -ENOMEM; + goto out; + } + for (int i = 0; i < nr_data_stripes(map); i++) { + ret = init_scrub_stripe(fs_info, + &sctx->raid56_data_stripes[i]); + if (ret < 0) + goto out; + sctx->raid56_data_stripes[i].bg = bg; + sctx->raid56_data_stripes[i].sctx = sctx; + } + } /* * There used to be a big double loop to handle all profiles using the * same routine, which grows larger and more gross over time. @@ -4466,10 +4659,8 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, if (ret) { /* it is parity strip */ stripe_logical += chunk_logical; - stripe_end = stripe_logical + increment; - ret = scrub_raid56_parity(sctx, map, scrub_dev, - stripe_logical, - stripe_end); + ret = scrub_raid56_parity_stripe(sctx, scrub_dev, bg, + map, stripe_logical); if (ret) goto out; goto next; @@ -4507,6 +4698,12 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, scrub_wr_submit(sctx); mutex_unlock(&sctx->wr_lock); flush_scrub_stripes(sctx); + if (sctx->raid56_data_stripes) { + for (int i = 0; i < nr_data_stripes(map); i++) + release_scrub_stripe(&sctx->raid56_data_stripes[i]); + kfree(sctx->raid56_data_stripes); + sctx->raid56_data_stripes = NULL; + } if (sctx->is_dev_replace && ret >= 0) { int ret2; diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h index 7639103ebf9d..6c17668dfad9 100644 --- a/fs/btrfs/scrub.h +++ b/fs/btrfs/scrub.h @@ -13,4 +13,9 @@ int btrfs_scrub_cancel_dev(struct btrfs_device *dev); int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid, struct btrfs_scrub_progress *progress); +/* Temporary declaration, would be deleted later. */ +int scrub_raid56_parity(struct scrub_ctx *sctx, struct map_lookup *map, + struct btrfs_device *sdev, u64 logic_start, + u64 logic_end); + #endif From patchwork Wed Mar 29 07:09:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13191991 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 349EAC77B61 for ; Wed, 29 Mar 2023 07:10:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229854AbjC2HKS (ORCPT ); Wed, 29 Mar 2023 03:10:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229764AbjC2HKP (ORCPT ); Wed, 29 Mar 2023 03:10:15 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 115E010E9 for ; Wed, 29 Mar 2023 00:10:12 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id BF47121A27 for ; Wed, 29 Mar 2023 07:10:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1680073810; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oFX6UdDBr467d3PhwT075wmbRskqeqos3NeGXp73P0U=; b=blGo03ZeKNjcFWjd913lHrpGyVavyEiPGQOzvmBes9e7TX+W/YfpB6htsVEcMuQ3NMNeg2 KNsPOA9BZNg1ZVW0bBCqkauSQQiVS5gWNOFtKQKaBrARp49Q/lMyKARd63zY1TnaIELHiX ym2sS1iLg7zsyNZ+Pw0bsNIh0dn15KM= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id F1BA4138FF for ; Wed, 29 Mar 2023 07:10:09 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id KBuIL1HkI2T4NQAAMHmgww (envelope-from ) for ; Wed, 29 Mar 2023 07:10:09 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/6] btrfs: scrub: remove scrub_parity structure Date: Wed, 29 Mar 2023 15:09:46 +0800 Message-Id: X-Mailer: git-send-email 2.39.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The structure scrub_parity is used to indicate that some extents are scrubbed for the purpose of RAID56 P/Q scrubbing. Since the whole RAID56 P/Q scrubbing path is replaced with new scrub_stripe infrastructure, and we no longer needs to use scrub_parity to modify the behavior of data stripes, we can remove the structure completely. This removal involves: - scrub_parity_workers Now only one worker would be utilized, scrub_workers, to do the read and repair. All writeback would happen at the main scrub thread. - scrub_block::sparity member - scrub_parity structure - function scrub_parity_get() - function scrub_parity_put() - function scrub_free_parity() - function __scrub_mark_bitmap() - function scrub_parity_mark_sectors_error() - function scrub_parity_mark_sectors_data() Those helpers are no longer needed, scrub_stripe has its bitmaps and we can use bitmap helpers to get the error/data status. - scrub_parity_bio_endio() - scrub_parity_check_and_repair() - function scrub_sectors_for_parity() - function scrub_extent_for_parity() - function scrub_raid56_data_stripe_for_parity() - function scrub_raid56_parity() The new code would reuse the scrub read-repair and writeback path. Just skip the dev-replace phase. And scrub_stripe infrastructure allows us to submit and wait for those data stripes before scrubbing P/Q, without extra infrastructures. The following two functions are temporary exported for later cleanup: - scrub_find_csum() - scrub_add_sector_to_rd_bio() Signed-off-by: Qu Wenruo --- fs/btrfs/fs.h | 1 - fs/btrfs/scrub.c | 526 +---------------------------------------------- fs/btrfs/scrub.h | 8 +- 3 files changed, 10 insertions(+), 525 deletions(-) diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h index abe5b3475a9d..b00f0e9c99dc 100644 --- a/fs/btrfs/fs.h +++ b/fs/btrfs/fs.h @@ -633,7 +633,6 @@ struct btrfs_fs_info { refcount_t scrub_workers_refcnt; struct workqueue_struct *scrub_workers; struct workqueue_struct *scrub_wr_completion_workers; - struct workqueue_struct *scrub_parity_workers; struct btrfs_subpage_info *subpage_info; struct btrfs_discard_ctl discard_ctl; diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index c340fd51aa65..8e9f47859d3d 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -238,7 +238,6 @@ struct scrub_block { atomic_t outstanding_sectors; refcount_t refs; /* free mem on transition to zero */ struct scrub_ctx *sctx; - struct scrub_parity *sparity; struct { unsigned int header_error:1; unsigned int checksum_error:1; @@ -252,37 +251,6 @@ struct scrub_block { struct work_struct work; }; -/* Used for the chunks with parity stripe such RAID5/6 */ -struct scrub_parity { - struct scrub_ctx *sctx; - - struct btrfs_device *scrub_dev; - - u64 logic_start; - - u64 logic_end; - - int nsectors; - - u32 stripe_len; - - refcount_t refs; - - struct list_head sectors_list; - - /* Work of parity check and repair */ - struct work_struct work; - - /* Mark the parity blocks which have data */ - unsigned long dbitmap; - - /* - * Mark the parity blocks which have data, but errors happen when - * read data or check data - */ - unsigned long ebitmap; -}; - struct scrub_ctx { struct scrub_bio *bios[SCRUB_BIOS_PER_SCTX]; struct scrub_stripe stripes[SCRUB_STRIPES_PER_SCTX]; @@ -588,8 +556,6 @@ static int scrub_checksum_super(struct scrub_block *sblock); static void scrub_block_put(struct scrub_block *sblock); static void scrub_sector_get(struct scrub_sector *sector); static void scrub_sector_put(struct scrub_sector *sector); -static void scrub_parity_get(struct scrub_parity *sparity); -static void scrub_parity_put(struct scrub_parity *sparity); static void scrub_bio_end_io(struct bio *bio); static void scrub_bio_end_io_worker(struct work_struct *work); static void scrub_block_complete(struct scrub_block *sblock); @@ -1945,13 +1911,6 @@ static void scrub_write_block_to_dev_replace(struct scrub_block *sblock) struct btrfs_fs_info *fs_info = sblock->sctx->fs_info; int i; - /* - * This block is used for the check of the parity on the source device, - * so the data needn't be written into the destination device. - */ - if (sblock->sparity) - return; - for (i = 0; i < sblock->sector_count; i++) { int ret; @@ -2956,9 +2915,6 @@ static void scrub_block_put(struct scrub_block *sblock) if (refcount_dec_and_test(&sblock->refs)) { int i; - if (sblock->sparity) - scrub_parity_put(sblock->sparity); - for (i = 0; i < sblock->sector_count; i++) scrub_sector_put(sblock->sectors[i]); for (i = 0; i < DIV_ROUND_UP(sblock->len, PAGE_SIZE); i++) { @@ -3062,8 +3018,8 @@ static void scrub_submit(struct scrub_ctx *sctx) submit_bio(sbio->bio); } -static int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx, - struct scrub_sector *sector) +int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx, + struct scrub_sector *sector) { struct scrub_block *sblock = sector->sblock; struct scrub_bio *sbio; @@ -3183,45 +3139,6 @@ static void scrub_bio_end_io_worker(struct work_struct *work) scrub_pending_bio_dec(sctx); } -static inline void __scrub_mark_bitmap(struct scrub_parity *sparity, - unsigned long *bitmap, - u64 start, u32 len) -{ - u64 offset; - u32 nsectors; - u32 sectorsize_bits = sparity->sctx->fs_info->sectorsize_bits; - - if (len >= sparity->stripe_len) { - bitmap_set(bitmap, 0, sparity->nsectors); - return; - } - - start -= sparity->logic_start; - start = div64_u64_rem(start, sparity->stripe_len, &offset); - offset = offset >> sectorsize_bits; - nsectors = len >> sectorsize_bits; - - if (offset + nsectors <= sparity->nsectors) { - bitmap_set(bitmap, offset, nsectors); - return; - } - - bitmap_set(bitmap, offset, sparity->nsectors - offset); - bitmap_set(bitmap, 0, nsectors - (sparity->nsectors - offset)); -} - -static inline void scrub_parity_mark_sectors_error(struct scrub_parity *sparity, - u64 start, u32 len) -{ - __scrub_mark_bitmap(sparity, &sparity->ebitmap, start, len); -} - -static inline void scrub_parity_mark_sectors_data(struct scrub_parity *sparity, - u64 start, u32 len) -{ - __scrub_mark_bitmap(sparity, &sparity->dbitmap, start, len); -} - static void scrub_block_complete(struct scrub_block *sblock) { int corrupted = 0; @@ -3239,17 +3156,6 @@ static void scrub_block_complete(struct scrub_block *sblock) if (!corrupted && sblock->sctx->is_dev_replace) scrub_write_block_to_dev_replace(sblock); } - - if (sblock->sparity && corrupted && !sblock->data_corrected) { - u64 start = sblock->logical; - u64 end = sblock->logical + - sblock->sectors[sblock->sector_count - 1]->offset + - sblock->sctx->fs_info->sectorsize; - - ASSERT(end - start <= U32_MAX); - scrub_parity_mark_sectors_error(sblock->sparity, - start, end - start); - } } static void drop_csum_range(struct scrub_ctx *sctx, struct btrfs_ordered_sum *sum) @@ -3270,7 +3176,7 @@ static void drop_csum_range(struct scrub_ctx *sctx, struct btrfs_ordered_sum *su * Return 0 if there is no csum for the range. * Return 1 if there is csum for the range and copied to @csum. */ -static int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum) +int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum) { bool found = false; @@ -3314,123 +3220,6 @@ static int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum) return 1; } -static int scrub_sectors_for_parity(struct scrub_parity *sparity, - u64 logical, u32 len, - u64 physical, struct btrfs_device *dev, - u64 flags, u64 gen, int mirror_num, u8 *csum) -{ - struct scrub_ctx *sctx = sparity->sctx; - struct scrub_block *sblock; - const u32 sectorsize = sctx->fs_info->sectorsize; - int index; - - ASSERT(IS_ALIGNED(len, sectorsize)); - - sblock = alloc_scrub_block(sctx, dev, logical, physical, physical, mirror_num); - if (!sblock) { - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); - return -ENOMEM; - } - - sblock->sparity = sparity; - scrub_parity_get(sparity); - - for (index = 0; len > 0; index++) { - struct scrub_sector *sector; - - sector = alloc_scrub_sector(sblock, logical); - if (!sector) { - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); - scrub_block_put(sblock); - return -ENOMEM; - } - sblock->sectors[index] = sector; - /* For scrub parity */ - scrub_sector_get(sector); - list_add_tail(§or->list, &sparity->sectors_list); - sector->flags = flags; - sector->generation = gen; - if (csum) { - sector->have_csum = 1; - memcpy(sector->csum, csum, sctx->fs_info->csum_size); - } else { - sector->have_csum = 0; - } - - /* Iterate over the stripe range in sectorsize steps */ - len -= sectorsize; - logical += sectorsize; - physical += sectorsize; - } - - WARN_ON(sblock->sector_count == 0); - for (index = 0; index < sblock->sector_count; index++) { - struct scrub_sector *sector = sblock->sectors[index]; - int ret; - - ret = scrub_add_sector_to_rd_bio(sctx, sector); - if (ret) { - scrub_block_put(sblock); - return ret; - } - } - - /* Last one frees, either here or in bio completion for last sector */ - scrub_block_put(sblock); - return 0; -} - -static int scrub_extent_for_parity(struct scrub_parity *sparity, - u64 logical, u32 len, - u64 physical, struct btrfs_device *dev, - u64 flags, u64 gen, int mirror_num) -{ - struct scrub_ctx *sctx = sparity->sctx; - int ret; - u8 csum[BTRFS_CSUM_SIZE]; - u32 blocksize; - - if (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state)) { - scrub_parity_mark_sectors_error(sparity, logical, len); - return 0; - } - - if (flags & BTRFS_EXTENT_FLAG_DATA) { - blocksize = sparity->stripe_len; - } else if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { - blocksize = sparity->stripe_len; - } else { - blocksize = sctx->fs_info->sectorsize; - WARN_ON(1); - } - - while (len) { - u32 l = min(len, blocksize); - int have_csum = 0; - - if (flags & BTRFS_EXTENT_FLAG_DATA) { - /* push csums to sbio */ - have_csum = scrub_find_csum(sctx, logical, csum); - if (have_csum == 0) - goto skip; - } - ret = scrub_sectors_for_parity(sparity, logical, l, physical, dev, - flags, gen, mirror_num, - have_csum ? csum : NULL); - if (ret) - return ret; -skip: - len -= l; - logical += l; - physical += l; - } - return 0; -} - /* * Given a physical address, this will calculate it's * logical offset. if this is a parity stripe, it will return @@ -3476,119 +3265,6 @@ static int get_raid56_logic_offset(u64 physical, int num, return 1; } -static void scrub_free_parity(struct scrub_parity *sparity) -{ - struct scrub_ctx *sctx = sparity->sctx; - struct scrub_sector *curr, *next; - int nbits; - - nbits = bitmap_weight(&sparity->ebitmap, sparity->nsectors); - if (nbits) { - spin_lock(&sctx->stat_lock); - sctx->stat.read_errors += nbits; - sctx->stat.uncorrectable_errors += nbits; - spin_unlock(&sctx->stat_lock); - } - - list_for_each_entry_safe(curr, next, &sparity->sectors_list, list) { - list_del_init(&curr->list); - scrub_sector_put(curr); - } - - kfree(sparity); -} - -static void scrub_parity_bio_endio_worker(struct work_struct *work) -{ - struct scrub_parity *sparity = container_of(work, struct scrub_parity, - work); - struct scrub_ctx *sctx = sparity->sctx; - - btrfs_bio_counter_dec(sctx->fs_info); - scrub_free_parity(sparity); - scrub_pending_bio_dec(sctx); -} - -static void scrub_parity_bio_endio(struct bio *bio) -{ - struct scrub_parity *sparity = bio->bi_private; - struct btrfs_fs_info *fs_info = sparity->sctx->fs_info; - - if (bio->bi_status) - bitmap_or(&sparity->ebitmap, &sparity->ebitmap, - &sparity->dbitmap, sparity->nsectors); - - bio_put(bio); - - INIT_WORK(&sparity->work, scrub_parity_bio_endio_worker); - queue_work(fs_info->scrub_parity_workers, &sparity->work); -} - -static void scrub_parity_check_and_repair(struct scrub_parity *sparity) -{ - struct scrub_ctx *sctx = sparity->sctx; - struct btrfs_fs_info *fs_info = sctx->fs_info; - struct bio *bio; - struct btrfs_raid_bio *rbio; - struct btrfs_io_context *bioc = NULL; - u64 length; - int ret; - - if (!bitmap_andnot(&sparity->dbitmap, &sparity->dbitmap, - &sparity->ebitmap, sparity->nsectors)) - goto out; - - length = sparity->logic_end - sparity->logic_start; - - btrfs_bio_counter_inc_blocked(fs_info); - ret = btrfs_map_sblock(fs_info, BTRFS_MAP_WRITE, sparity->logic_start, - &length, &bioc); - if (ret || !bioc) - goto bioc_out; - - bio = bio_alloc(NULL, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS); - bio->bi_iter.bi_sector = sparity->logic_start >> 9; - bio->bi_private = sparity; - bio->bi_end_io = scrub_parity_bio_endio; - - rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, - sparity->scrub_dev, - &sparity->dbitmap, - sparity->nsectors); - btrfs_put_bioc(bioc); - if (!rbio) - goto rbio_out; - - scrub_pending_bio_inc(sctx); - raid56_parity_submit_scrub_rbio(rbio); - return; - -rbio_out: - bio_put(bio); -bioc_out: - btrfs_bio_counter_dec(fs_info); - bitmap_or(&sparity->ebitmap, &sparity->ebitmap, &sparity->dbitmap, - sparity->nsectors); - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); -out: - scrub_free_parity(sparity); -} - -static void scrub_parity_get(struct scrub_parity *sparity) -{ - refcount_inc(&sparity->refs); -} - -static void scrub_parity_put(struct scrub_parity *sparity) -{ - if (!refcount_dec_and_test(&sparity->refs)) - return; - - scrub_parity_check_and_repair(sparity); -} - /* * Return 0 if the extent item range covers any byte of the range. * Return <0 if the extent item is before @search_start. @@ -3715,185 +3391,6 @@ static void get_extent_info(struct btrfs_path *path, u64 *extent_start_ret, *generation_ret = btrfs_extent_generation(path->nodes[0], ei); } -static bool does_range_cross_boundary(u64 extent_start, u64 extent_len, - u64 boundary_start, u64 boudary_len) -{ - return (extent_start < boundary_start && - extent_start + extent_len > boundary_start) || - (extent_start < boundary_start + boudary_len && - extent_start + extent_len > boundary_start + boudary_len); -} - -static int scrub_raid56_data_stripe_for_parity(struct scrub_ctx *sctx, - struct scrub_parity *sparity, - struct map_lookup *map, - struct btrfs_device *sdev, - struct btrfs_path *path, - u64 logical) -{ - struct btrfs_fs_info *fs_info = sctx->fs_info; - struct btrfs_root *extent_root = btrfs_extent_root(fs_info, logical); - struct btrfs_root *csum_root = btrfs_csum_root(fs_info, logical); - u64 cur_logical = logical; - int ret; - - ASSERT(map->type & BTRFS_BLOCK_GROUP_RAID56_MASK); - - /* Path must not be populated */ - ASSERT(!path->nodes[0]); - - while (cur_logical < logical + BTRFS_STRIPE_LEN) { - struct btrfs_io_context *bioc = NULL; - struct btrfs_device *extent_dev; - u64 extent_start; - u64 extent_size; - u64 mapped_length; - u64 extent_flags; - u64 extent_gen; - u64 extent_physical; - u64 extent_mirror_num; - - ret = find_first_extent_item(extent_root, path, cur_logical, - logical + BTRFS_STRIPE_LEN - cur_logical); - /* No more extent item in this data stripe */ - if (ret > 0) { - ret = 0; - break; - } - if (ret < 0) - break; - get_extent_info(path, &extent_start, &extent_size, &extent_flags, - &extent_gen); - - /* Metadata should not cross stripe boundaries */ - if ((extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) && - does_range_cross_boundary(extent_start, extent_size, - logical, BTRFS_STRIPE_LEN)) { - btrfs_err(fs_info, - "scrub: tree block %llu spanning stripes, ignored. logical=%llu", - extent_start, logical); - spin_lock(&sctx->stat_lock); - sctx->stat.uncorrectable_errors++; - spin_unlock(&sctx->stat_lock); - cur_logical += extent_size; - continue; - } - - /* Skip hole range which doesn't have any extent */ - cur_logical = max(extent_start, cur_logical); - - /* Truncate the range inside this data stripe */ - extent_size = min(extent_start + extent_size, - logical + BTRFS_STRIPE_LEN) - cur_logical; - extent_start = cur_logical; - ASSERT(extent_size <= U32_MAX); - - scrub_parity_mark_sectors_data(sparity, extent_start, extent_size); - - mapped_length = extent_size; - ret = btrfs_map_block(fs_info, BTRFS_MAP_READ, extent_start, - &mapped_length, &bioc, 0); - if (!ret && (!bioc || mapped_length < extent_size)) - ret = -EIO; - if (ret) { - btrfs_put_bioc(bioc); - scrub_parity_mark_sectors_error(sparity, extent_start, - extent_size); - break; - } - extent_physical = bioc->stripes[0].physical; - extent_mirror_num = bioc->mirror_num; - extent_dev = bioc->stripes[0].dev; - btrfs_put_bioc(bioc); - - ret = btrfs_lookup_csums_list(csum_root, extent_start, - extent_start + extent_size - 1, - &sctx->csum_list, 1, false); - if (ret) { - scrub_parity_mark_sectors_error(sparity, extent_start, - extent_size); - break; - } - - ret = scrub_extent_for_parity(sparity, extent_start, - extent_size, extent_physical, - extent_dev, extent_flags, - extent_gen, extent_mirror_num); - scrub_free_csums(sctx); - - if (ret) { - scrub_parity_mark_sectors_error(sparity, extent_start, - extent_size); - break; - } - - cond_resched(); - cur_logical += extent_size; - } - btrfs_release_path(path); - return ret; -} - -int scrub_raid56_parity(struct scrub_ctx *sctx, struct map_lookup *map, - struct btrfs_device *sdev, u64 logic_start, - u64 logic_end) -{ - struct btrfs_fs_info *fs_info = sctx->fs_info; - struct btrfs_path *path; - u64 cur_logical; - int ret; - struct scrub_parity *sparity; - int nsectors; - - path = btrfs_alloc_path(); - if (!path) { - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); - return -ENOMEM; - } - path->search_commit_root = 1; - path->skip_locking = 1; - - nsectors = BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits; - ASSERT(nsectors <= BITS_PER_LONG); - sparity = kzalloc(sizeof(struct scrub_parity), GFP_NOFS); - if (!sparity) { - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); - btrfs_free_path(path); - return -ENOMEM; - } - - sparity->stripe_len = BTRFS_STRIPE_LEN; - sparity->nsectors = nsectors; - sparity->sctx = sctx; - sparity->scrub_dev = sdev; - sparity->logic_start = logic_start; - sparity->logic_end = logic_end; - refcount_set(&sparity->refs, 1); - INIT_LIST_HEAD(&sparity->sectors_list); - - ret = 0; - for (cur_logical = logic_start; cur_logical < logic_end; - cur_logical += BTRFS_STRIPE_LEN) { - ret = scrub_raid56_data_stripe_for_parity(sctx, sparity, map, - sdev, path, cur_logical); - if (ret < 0) - break; - } - - scrub_parity_put(sparity); - scrub_submit(sctx); - mutex_lock(&sctx->wr_lock); - scrub_wr_submit(sctx); - mutex_unlock(&sctx->wr_lock); - - btrfs_free_path(path); - return ret < 0 ? ret : 0; -} - static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical, u64 physical, u64 physical_end) { @@ -5192,20 +4689,15 @@ static void scrub_workers_put(struct btrfs_fs_info *fs_info) struct workqueue_struct *scrub_workers = fs_info->scrub_workers; struct workqueue_struct *scrub_wr_comp = fs_info->scrub_wr_completion_workers; - struct workqueue_struct *scrub_parity = - fs_info->scrub_parity_workers; fs_info->scrub_workers = NULL; fs_info->scrub_wr_completion_workers = NULL; - fs_info->scrub_parity_workers = NULL; mutex_unlock(&fs_info->scrub_lock); if (scrub_workers) destroy_workqueue(scrub_workers); if (scrub_wr_comp) destroy_workqueue(scrub_wr_comp); - if (scrub_parity) - destroy_workqueue(scrub_parity); } } @@ -5217,7 +4709,6 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info, { struct workqueue_struct *scrub_workers = NULL; struct workqueue_struct *scrub_wr_comp = NULL; - struct workqueue_struct *scrub_parity = NULL; unsigned int flags = WQ_FREEZABLE | WQ_UNBOUND; int max_active = fs_info->thread_pool_size; int ret = -ENOMEM; @@ -5234,18 +4725,12 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info, if (!scrub_wr_comp) goto fail_scrub_wr_completion_workers; - scrub_parity = alloc_workqueue("btrfs-scrubparity", flags, max_active); - if (!scrub_parity) - goto fail_scrub_parity_workers; - mutex_lock(&fs_info->scrub_lock); if (refcount_read(&fs_info->scrub_workers_refcnt) == 0) { ASSERT(fs_info->scrub_workers == NULL && - fs_info->scrub_wr_completion_workers == NULL && - fs_info->scrub_parity_workers == NULL); + fs_info->scrub_wr_completion_workers == NULL); fs_info->scrub_workers = scrub_workers; fs_info->scrub_wr_completion_workers = scrub_wr_comp; - fs_info->scrub_parity_workers = scrub_parity; refcount_set(&fs_info->scrub_workers_refcnt, 1); mutex_unlock(&fs_info->scrub_lock); return 0; @@ -5255,8 +4740,7 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info, mutex_unlock(&fs_info->scrub_lock); ret = 0; - destroy_workqueue(scrub_parity); -fail_scrub_parity_workers: + destroy_workqueue(scrub_wr_comp); fail_scrub_wr_completion_workers: destroy_workqueue(scrub_workers); diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h index 6c17668dfad9..d23068e741fb 100644 --- a/fs/btrfs/scrub.h +++ b/fs/btrfs/scrub.h @@ -14,8 +14,10 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid, struct btrfs_scrub_progress *progress); /* Temporary declaration, would be deleted later. */ -int scrub_raid56_parity(struct scrub_ctx *sctx, struct map_lookup *map, - struct btrfs_device *sdev, u64 logic_start, - u64 logic_end); +struct scrub_ctx; +struct scrub_sector; +int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum); +int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx, + struct scrub_sector *sector); #endif From patchwork Wed Mar 29 07:09:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13191990 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80DC6C761AF for ; Wed, 29 Mar 2023 07:10:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229850AbjC2HKR (ORCPT ); Wed, 29 Mar 2023 03:10:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43704 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229782AbjC2HKP (ORCPT ); Wed, 29 Mar 2023 03:10:15 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 00A9B212F for ; Wed, 29 Mar 2023 00:10:12 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id B72E721A28 for ; Wed, 29 Mar 2023 07:10:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1680073811; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6oWIsek5+ngKgbXbfYwWYOMQBk1Jc2/kfLjuI95BPZM=; b=nodLSkK52edXW1olUdHycV6I79cyBBhRC++ue7LPgRrwH/ARjUgQzzre2rCTmHdGjtqjjf beh8Y45YXla3iyxZ0AvFO5ll8h75u+odlCRWcuR5ZmzyBB+iREQ4QyoctzJkkTrGNUtc9C 2swy9SEXPd6M8rLOFe1RXroJHciOgr4= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 25083138FF for ; Wed, 29 Mar 2023 07:10:10 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id iG0SOVLkI2T4NQAAMHmgww (envelope-from ) for ; Wed, 29 Mar 2023 07:10:10 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 3/6] btrfs: scrub: remove the old writeback infrastructure Date: Wed, 29 Mar 2023 15:09:47 +0800 Message-Id: X-Mailer: git-send-email 2.39.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Since the whole scrub path is switched to scrub_stripe based solution, the old writeback path can be removed completely, which involves: - scrub_ctx::wr_curr_bio member - scrub_ctx::flush_all_writes member - function scrub_write_block_to_dev_replace() - function scrub_write_sector_to_dev_replace() - function scrub_add_sector_to_wr_bio() - function scrub_wr_submit() - function scrub_wr_bio_end_io() - function scrub_wr_bio_end_io_worker() And one more function needs to be exported temporarily: - scrub_sector_get() Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 221 +---------------------------------------------- fs/btrfs/scrub.h | 1 + 2 files changed, 3 insertions(+), 219 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 8e9f47859d3d..6e981b77c73f 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -275,10 +275,8 @@ struct scrub_ctx { int is_dev_replace; u64 write_pointer; - struct scrub_bio *wr_curr_bio; struct mutex wr_lock; struct btrfs_device *wr_tgtdev; - bool flush_all_writes; /* * statistics @@ -547,23 +545,14 @@ static int scrub_repair_block_from_good_copy(struct scrub_block *sblock_bad, static int scrub_repair_sector_from_good_copy(struct scrub_block *sblock_bad, struct scrub_block *sblock_good, int sector_num, int force_write); -static void scrub_write_block_to_dev_replace(struct scrub_block *sblock); -static int scrub_write_sector_to_dev_replace(struct scrub_block *sblock, - int sector_num); static int scrub_checksum_data(struct scrub_block *sblock); static int scrub_checksum_tree_block(struct scrub_block *sblock); static int scrub_checksum_super(struct scrub_block *sblock); static void scrub_block_put(struct scrub_block *sblock); -static void scrub_sector_get(struct scrub_sector *sector); static void scrub_sector_put(struct scrub_sector *sector); static void scrub_bio_end_io(struct bio *bio); static void scrub_bio_end_io_worker(struct work_struct *work); static void scrub_block_complete(struct scrub_block *sblock); -static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx, - struct scrub_sector *sector); -static void scrub_wr_submit(struct scrub_ctx *sctx); -static void scrub_wr_bio_end_io(struct bio *bio); -static void scrub_wr_bio_end_io_worker(struct work_struct *work); static void scrub_put_ctx(struct scrub_ctx *sctx); static inline int scrub_is_page_on_raid56(struct scrub_sector *sector) @@ -872,7 +861,6 @@ static noinline_for_stack void scrub_free_ctx(struct scrub_ctx *sctx) for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++) release_scrub_stripe(&sctx->stripes[i]); - kfree(sctx->wr_curr_bio); scrub_free_csums(sctx); kfree(sctx); } @@ -934,13 +922,10 @@ static noinline_for_stack struct scrub_ctx *scrub_setup_ctx( init_waitqueue_head(&sctx->list_wait); sctx->throttle_deadline = 0; - WARN_ON(sctx->wr_curr_bio != NULL); mutex_init(&sctx->wr_lock); - sctx->wr_curr_bio = NULL; if (is_dev_replace) { WARN_ON(!fs_info->dev_replace.tgtdev); sctx->wr_tgtdev = fs_info->dev_replace.tgtdev; - sctx->flush_all_writes = false; } return sctx; @@ -1306,8 +1291,6 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) sblock_to_check->data_corrected = 1; spin_unlock(&sctx->stat_lock); - if (sctx->is_dev_replace) - scrub_write_block_to_dev_replace(sblock_bad); goto out; } @@ -1396,7 +1379,6 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) !sblock_other->checksum_error && sblock_other->no_io_error_seen) { if (sctx->is_dev_replace) { - scrub_write_block_to_dev_replace(sblock_other); goto corrected_error; } else { ret = scrub_repair_block_from_good_copy( @@ -1478,13 +1460,6 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) */ if (!sblock_other) sblock_other = sblock_bad; - - if (scrub_write_sector_to_dev_replace(sblock_other, - sector_num) != 0) { - atomic64_inc( - &fs_info->dev_replace.num_write_errors); - success = 0; - } } else if (sblock_other) { ret = scrub_repair_sector_from_good_copy(sblock_bad, sblock_other, @@ -1906,31 +1881,6 @@ static int scrub_repair_sector_from_good_copy(struct scrub_block *sblock_bad, return 0; } -static void scrub_write_block_to_dev_replace(struct scrub_block *sblock) -{ - struct btrfs_fs_info *fs_info = sblock->sctx->fs_info; - int i; - - for (i = 0; i < sblock->sector_count; i++) { - int ret; - - ret = scrub_write_sector_to_dev_replace(sblock, i); - if (ret) - atomic64_inc(&fs_info->dev_replace.num_write_errors); - } -} - -static int scrub_write_sector_to_dev_replace(struct scrub_block *sblock, int sector_num) -{ - const u32 sectorsize = sblock->sctx->fs_info->sectorsize; - struct scrub_sector *sector = sblock->sectors[sector_num]; - - if (sector->io_error) - memset(scrub_sector_get_kaddr(sector), 0, sectorsize); - - return scrub_add_sector_to_wr_bio(sblock->sctx, sector); -} - static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) { int ret = 0; @@ -1958,150 +1908,6 @@ static void scrub_block_get(struct scrub_block *sblock) refcount_inc(&sblock->refs); } -static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx, - struct scrub_sector *sector) -{ - struct scrub_block *sblock = sector->sblock; - struct scrub_bio *sbio; - int ret; - const u32 sectorsize = sctx->fs_info->sectorsize; - - mutex_lock(&sctx->wr_lock); -again: - if (!sctx->wr_curr_bio) { - sctx->wr_curr_bio = kzalloc(sizeof(*sctx->wr_curr_bio), - GFP_KERNEL); - if (!sctx->wr_curr_bio) { - mutex_unlock(&sctx->wr_lock); - return -ENOMEM; - } - sctx->wr_curr_bio->sctx = sctx; - sctx->wr_curr_bio->sector_count = 0; - } - sbio = sctx->wr_curr_bio; - if (sbio->sector_count == 0) { - ret = fill_writer_pointer_gap(sctx, sector->offset + - sblock->physical_for_dev_replace); - if (ret) { - mutex_unlock(&sctx->wr_lock); - return ret; - } - - sbio->physical = sblock->physical_for_dev_replace + sector->offset; - sbio->logical = sblock->logical + sector->offset; - sbio->dev = sctx->wr_tgtdev; - if (!sbio->bio) { - sbio->bio = bio_alloc(sbio->dev->bdev, sctx->sectors_per_bio, - REQ_OP_WRITE, GFP_NOFS); - } - sbio->bio->bi_private = sbio; - sbio->bio->bi_end_io = scrub_wr_bio_end_io; - sbio->bio->bi_iter.bi_sector = sbio->physical >> 9; - sbio->status = 0; - } else if (sbio->physical + sbio->sector_count * sectorsize != - sblock->physical_for_dev_replace + sector->offset || - sbio->logical + sbio->sector_count * sectorsize != - sblock->logical + sector->offset) { - scrub_wr_submit(sctx); - goto again; - } - - ret = bio_add_scrub_sector(sbio->bio, sector, sectorsize); - if (ret != sectorsize) { - if (sbio->sector_count < 1) { - bio_put(sbio->bio); - sbio->bio = NULL; - mutex_unlock(&sctx->wr_lock); - return -EIO; - } - scrub_wr_submit(sctx); - goto again; - } - - sbio->sectors[sbio->sector_count] = sector; - scrub_sector_get(sector); - /* - * Since ssector no longer holds a page, but uses sblock::pages, we - * have to ensure the sblock had not been freed before our write bio - * finished. - */ - scrub_block_get(sector->sblock); - - sbio->sector_count++; - if (sbio->sector_count == sctx->sectors_per_bio) - scrub_wr_submit(sctx); - mutex_unlock(&sctx->wr_lock); - - return 0; -} - -static void scrub_wr_submit(struct scrub_ctx *sctx) -{ - struct scrub_bio *sbio; - - if (!sctx->wr_curr_bio) - return; - - sbio = sctx->wr_curr_bio; - sctx->wr_curr_bio = NULL; - scrub_pending_bio_inc(sctx); - /* process all writes in a single worker thread. Then the block layer - * orders the requests before sending them to the driver which - * doubled the write performance on spinning disks when measured - * with Linux 3.5 */ - btrfsic_check_bio(sbio->bio); - submit_bio(sbio->bio); - - if (btrfs_is_zoned(sctx->fs_info)) - sctx->write_pointer = sbio->physical + sbio->sector_count * - sctx->fs_info->sectorsize; -} - -static void scrub_wr_bio_end_io(struct bio *bio) -{ - struct scrub_bio *sbio = bio->bi_private; - struct btrfs_fs_info *fs_info = sbio->dev->fs_info; - - sbio->status = bio->bi_status; - sbio->bio = bio; - - INIT_WORK(&sbio->work, scrub_wr_bio_end_io_worker); - queue_work(fs_info->scrub_wr_completion_workers, &sbio->work); -} - -static void scrub_wr_bio_end_io_worker(struct work_struct *work) -{ - struct scrub_bio *sbio = container_of(work, struct scrub_bio, work); - struct scrub_ctx *sctx = sbio->sctx; - int i; - - ASSERT(sbio->sector_count <= SCRUB_SECTORS_PER_BIO); - if (sbio->status) { - struct btrfs_dev_replace *dev_replace = - &sbio->sctx->fs_info->dev_replace; - - for (i = 0; i < sbio->sector_count; i++) { - struct scrub_sector *sector = sbio->sectors[i]; - - sector->io_error = 1; - atomic64_inc(&dev_replace->num_write_errors); - } - } - - /* - * In scrub_add_sector_to_wr_bio() we grab extra ref for sblock, now in - * endio we should put the sblock. - */ - for (i = 0; i < sbio->sector_count; i++) { - scrub_block_put(sbio->sectors[i]->sblock); - scrub_sector_put(sbio->sectors[i]); - } - - bio_put(sbio->bio); - kfree(sbio); - scrub_pending_bio_dec(sctx); -} - static int scrub_checksum(struct scrub_block *sblock) { u64 flags; @@ -2927,7 +2733,7 @@ static void scrub_block_put(struct scrub_block *sblock) } } -static void scrub_sector_get(struct scrub_sector *sector) +void scrub_sector_get(struct scrub_sector *sector) { atomic_inc(§or->refs); } @@ -3130,21 +2936,12 @@ static void scrub_bio_end_io_worker(struct work_struct *work) sctx->first_free = sbio->index; spin_unlock(&sctx->list_lock); - if (sctx->is_dev_replace && sctx->flush_all_writes) { - mutex_lock(&sctx->wr_lock); - scrub_wr_submit(sctx); - mutex_unlock(&sctx->wr_lock); - } - scrub_pending_bio_dec(sctx); } static void scrub_block_complete(struct scrub_block *sblock) { - int corrupted = 0; - if (!sblock->no_io_error_seen) { - corrupted = 1; scrub_handle_errored_block(sblock); } else { /* @@ -3152,9 +2949,7 @@ static void scrub_block_complete(struct scrub_block *sblock) * dev replace case, otherwise write here in dev replace * case. */ - corrupted = scrub_checksum(sblock); - if (!corrupted && sblock->sctx->is_dev_replace) - scrub_write_block_to_dev_replace(sblock); + scrub_checksum(sblock); } } @@ -3937,14 +3732,11 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx, /* Paused? */ if (atomic_read(&fs_info->scrub_pause_req)) { /* Push queued extents */ - sctx->flush_all_writes = true; scrub_submit(sctx); mutex_lock(&sctx->wr_lock); - scrub_wr_submit(sctx); mutex_unlock(&sctx->wr_lock); wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); - sctx->flush_all_writes = false; scrub_blocked_if_needed(fs_info); } /* Block group removed? */ @@ -4081,7 +3873,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, mutex_lock(&sctx->wr_lock); sctx->write_pointer = physical; mutex_unlock(&sctx->wr_lock); - sctx->flush_all_writes = true; } /* Prepare the extra data stripes used by RAID56. */ @@ -4191,9 +3982,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, out: /* push queued extents */ scrub_submit(sctx); - mutex_lock(&sctx->wr_lock); - scrub_wr_submit(sctx); - mutex_unlock(&sctx->wr_lock); flush_scrub_stripes(sctx); if (sctx->raid56_data_stripes) { for (int i = 0; i < nr_data_stripes(map); i++) @@ -4529,11 +4317,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, * write requests are really completed when bios_in_flight * changes to 0. */ - sctx->flush_all_writes = true; scrub_submit(sctx); - mutex_lock(&sctx->wr_lock); - scrub_wr_submit(sctx); - mutex_unlock(&sctx->wr_lock); wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); @@ -4547,7 +4331,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, */ wait_event(sctx->list_wait, atomic_read(&sctx->workers_pending) == 0); - sctx->flush_all_writes = false; scrub_pause_off(fs_info); diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h index d23068e741fb..f47492e78e1c 100644 --- a/fs/btrfs/scrub.h +++ b/fs/btrfs/scrub.h @@ -19,5 +19,6 @@ struct scrub_sector; int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum); int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx, struct scrub_sector *sector); +void scrub_sector_get(struct scrub_sector *sector); #endif From patchwork Wed Mar 29 07:09:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13191992 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69BA1C6FD18 for ; Wed, 29 Mar 2023 07:10:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229782AbjC2HKT (ORCPT ); Wed, 29 Mar 2023 03:10:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43858 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229852AbjC2HKS (ORCPT ); Wed, 29 Mar 2023 03:10:18 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EDADB269E for ; Wed, 29 Mar 2023 00:10:13 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id AD67821A3A for ; Wed, 29 Mar 2023 07:10:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1680073812; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jVCyEtiRY4T5m13/FXJ+9NKqhj928g+pB9uu7fJFALo=; b=YfH2rhmEUkrVrxIkSc+d6LejwA8+siHke6BFCVhK3UPIQEGX652vkbUuv8wLFw98qk55x4 xz1Ta7qOq9MRwtZzME0snHwLFG8bxoapzmSw+KWk2GNANko0x5eZRNCaiL12/TkvyJYMI1 FihtAXhRKWEknYnKZmaYKaDLR59K3BU= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 1C286138FF for ; Wed, 29 Mar 2023 07:10:11 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id cOkKN1PkI2T4NQAAMHmgww (envelope-from ) for ; Wed, 29 Mar 2023 07:10:11 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 4/6] btrfs: scrub: remove the old scrub recheck code Date: Wed, 29 Mar 2023 15:09:48 +0800 Message-Id: <503fd9d09b0954bbde645a2bbeadb3349e443021.1680073696.git.wqu@suse.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The old scrub code has different entrance to verify the content, and since we have removed the writeback path, now we can start removing the re-check part, including: - scrub_recover structure - scrub_sector::recover member - function scrub_setup_recheck_block() - function scrub_recheck_block() - function scrub_recheck_block_checksum() - function scrub_repair_block_group_good_copy() - function scrub_repair_sector_from_good_copy() - function scrub_is_page_on_raid56() - function full_stripe_lock() - function search_full_stripe_lock() - function get_full_stripe_logical() - function insert_full_stripe_lock() - function lock_full_stripe() - function unlock_full_stripe() - btrfs_block_group::full_stripe_locks_root member - btrfs_full_stripe_locks_tree structure This infrastructure is to ensure RAID56 scrub is properly handling recovery and P/Q scrub correctly. This is no longer needed, before P/Q scrub we will wait for all the involved data stripes to be scrubbed first, and RAID56 code has internal lock to ensure no race in the same full stripe. - function scrub_print_warning() - function scrub_get_recover() - function scrub_put_recover() - function scrub_handle_errored_block() - function scrub_setup_recheck_block() - function scrub_bio_wait_endio() - function scrub_submit_raid56_bio_wait() - function scrub_recheck_block_on_raid56() - function scrub_recheck_block() - function scrub_recheck_block_checksum() - function scrub_repair_block_from_good_copy() - function scrub_repair_sector_from_good_copy() And two more functions exported temporarily for later cleanup: - alloc_scrub_sector() - alloc_scrub_block() Signed-off-by: Qu Wenruo --- fs/btrfs/block-group.c | 11 - fs/btrfs/block-group.h | 8 - fs/btrfs/scrub.c | 997 +---------------------------------------- fs/btrfs/scrub.h | 7 + 4 files changed, 14 insertions(+), 1009 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 46a8ca24afaa..f1dd7ee7ad96 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -160,15 +160,6 @@ void btrfs_put_block_group(struct btrfs_block_group *cache) btrfs_discard_cancel_work(&cache->fs_info->discard_ctl, cache); - /* - * If not empty, someone is still holding mutex of - * full_stripe_lock, which can only be released by caller. - * And it will definitely cause use-after-free when caller - * tries to release full stripe lock. - * - * No better way to resolve, but only to warn. - */ - WARN_ON(!RB_EMPTY_ROOT(&cache->full_stripe_locks_root.root)); kfree(cache->free_space_ctl); kfree(cache->physical_map); kfree(cache); @@ -2124,8 +2115,6 @@ static struct btrfs_block_group *btrfs_create_block_group_cache( btrfs_init_free_space_ctl(cache, cache->free_space_ctl); atomic_set(&cache->frozen, 0); mutex_init(&cache->free_space_lock); - cache->full_stripe_locks_root.root = RB_ROOT; - mutex_init(&cache->full_stripe_locks_root.lock); return cache; } diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 6e4a0b429ac3..c462aa578d61 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -94,11 +94,6 @@ struct btrfs_caching_control { /* * Tree to record all locked full stripes of a RAID5/6 block group */ -struct btrfs_full_stripe_locks_tree { - struct rb_root root; - struct mutex lock; -}; - struct btrfs_block_group { struct btrfs_fs_info *fs_info; struct inode *inode; @@ -229,9 +224,6 @@ struct btrfs_block_group { */ int swap_extents; - /* Record locked full stripes for RAID5/6 block group */ - struct btrfs_full_stripe_locks_tree full_stripe_locks_root; - /* * Allocation offset for the block group to implement sequential * allocation. This is used only on a zoned filesystem. diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 6e981b77c73f..63182aed390f 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -183,12 +183,6 @@ struct scrub_stripe { struct work_struct work; }; -struct scrub_recover { - refcount_t refs; - struct btrfs_io_context *bioc; - u64 map_length; -}; - struct scrub_sector { struct scrub_block *sblock; struct list_head list; @@ -200,8 +194,6 @@ struct scrub_sector { unsigned int have_csum:1; unsigned int io_error:1; u8 csum[BTRFS_CSUM_SIZE]; - - struct scrub_recover *recover; }; struct scrub_bio { @@ -303,13 +295,6 @@ struct scrub_warning { struct btrfs_device *dev; }; -struct full_stripe_lock { - struct rb_node node; - u64 logical; - u64 refs; - struct mutex mutex; -}; - #ifndef CONFIG_64BIT /* This structure is for architectures whose (void *) is smaller than u64 */ struct scrub_page_private { @@ -406,11 +391,11 @@ static void wait_scrub_stripe_io(struct scrub_stripe *stripe) wait_event(stripe->io_wait, atomic_read(&stripe->pending_io) == 0); } -static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx, - struct btrfs_device *dev, - u64 logical, u64 physical, - u64 physical_for_dev_replace, - int mirror_num) +struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx, + struct btrfs_device *dev, + u64 logical, u64 physical, + u64 physical_for_dev_replace, + int mirror_num) { struct scrub_block *sblock; @@ -437,8 +422,7 @@ static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx, * * Will also allocate new pages for @sblock if needed. */ -static struct scrub_sector *alloc_scrub_sector(struct scrub_block *sblock, - u64 logical) +struct scrub_sector *alloc_scrub_sector(struct scrub_block *sblock, u64 logical) { const pgoff_t page_index = (logical - sblock->logical) >> PAGE_SHIFT; struct scrub_sector *ssector; @@ -534,17 +518,6 @@ static int bio_add_scrub_sector(struct bio *bio, struct scrub_sector *ssector, scrub_sector_get_page_offset(ssector)); } -static int scrub_setup_recheck_block(struct scrub_block *original_sblock, - struct scrub_block *sblocks_for_recheck[]); -static void scrub_recheck_block(struct btrfs_fs_info *fs_info, - struct scrub_block *sblock, - int retry_failed_mirror); -static void scrub_recheck_block_checksum(struct scrub_block *sblock); -static int scrub_repair_block_from_good_copy(struct scrub_block *sblock_bad, - struct scrub_block *sblock_good); -static int scrub_repair_sector_from_good_copy(struct scrub_block *sblock_bad, - struct scrub_block *sblock_good, - int sector_num, int force_write); static int scrub_checksum_data(struct scrub_block *sblock); static int scrub_checksum_tree_block(struct scrub_block *sblock); static int scrub_checksum_super(struct scrub_block *sblock); @@ -555,12 +528,6 @@ static void scrub_bio_end_io_worker(struct work_struct *work); static void scrub_block_complete(struct scrub_block *sblock); static void scrub_put_ctx(struct scrub_ctx *sctx); -static inline int scrub_is_page_on_raid56(struct scrub_sector *sector) -{ - return sector->recover && - (sector->recover->bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK); -} - static void scrub_pending_bio_inc(struct scrub_ctx *sctx) { refcount_inc(&sctx->refs); @@ -606,223 +573,6 @@ static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info) scrub_pause_off(fs_info); } -/* - * Insert new full stripe lock into full stripe locks tree - * - * Return pointer to existing or newly inserted full_stripe_lock structure if - * everything works well. - * Return ERR_PTR(-ENOMEM) if we failed to allocate memory - * - * NOTE: caller must hold full_stripe_locks_root->lock before calling this - * function - */ -static struct full_stripe_lock *insert_full_stripe_lock( - struct btrfs_full_stripe_locks_tree *locks_root, - u64 fstripe_logical) -{ - struct rb_node **p; - struct rb_node *parent = NULL; - struct full_stripe_lock *entry; - struct full_stripe_lock *ret; - - lockdep_assert_held(&locks_root->lock); - - p = &locks_root->root.rb_node; - while (*p) { - parent = *p; - entry = rb_entry(parent, struct full_stripe_lock, node); - if (fstripe_logical < entry->logical) { - p = &(*p)->rb_left; - } else if (fstripe_logical > entry->logical) { - p = &(*p)->rb_right; - } else { - entry->refs++; - return entry; - } - } - - /* - * Insert new lock. - */ - ret = kmalloc(sizeof(*ret), GFP_KERNEL); - if (!ret) - return ERR_PTR(-ENOMEM); - ret->logical = fstripe_logical; - ret->refs = 1; - mutex_init(&ret->mutex); - - rb_link_node(&ret->node, parent, p); - rb_insert_color(&ret->node, &locks_root->root); - return ret; -} - -/* - * Search for a full stripe lock of a block group - * - * Return pointer to existing full stripe lock if found - * Return NULL if not found - */ -static struct full_stripe_lock *search_full_stripe_lock( - struct btrfs_full_stripe_locks_tree *locks_root, - u64 fstripe_logical) -{ - struct rb_node *node; - struct full_stripe_lock *entry; - - lockdep_assert_held(&locks_root->lock); - - node = locks_root->root.rb_node; - while (node) { - entry = rb_entry(node, struct full_stripe_lock, node); - if (fstripe_logical < entry->logical) - node = node->rb_left; - else if (fstripe_logical > entry->logical) - node = node->rb_right; - else - return entry; - } - return NULL; -} - -/* - * Helper to get full stripe logical from a normal bytenr. - * - * Caller must ensure @cache is a RAID56 block group. - */ -static u64 get_full_stripe_logical(struct btrfs_block_group *cache, u64 bytenr) -{ - u64 ret; - - /* - * Due to chunk item size limit, full stripe length should not be - * larger than U32_MAX. Just a sanity check here. - */ - WARN_ON_ONCE(cache->full_stripe_len >= U32_MAX); - - /* - * round_down() can only handle power of 2, while RAID56 full - * stripe length can be 64KiB * n, so we need to manually round down. - */ - ret = div64_u64(bytenr - cache->start, cache->full_stripe_len) * - cache->full_stripe_len + cache->start; - return ret; -} - -/* - * Lock a full stripe to avoid concurrency of recovery and read - * - * It's only used for profiles with parities (RAID5/6), for other profiles it - * does nothing. - * - * Return 0 if we locked full stripe covering @bytenr, with a mutex held. - * So caller must call unlock_full_stripe() at the same context. - * - * Return <0 if encounters error. - */ -static int lock_full_stripe(struct btrfs_fs_info *fs_info, u64 bytenr, - bool *locked_ret) -{ - struct btrfs_block_group *bg_cache; - struct btrfs_full_stripe_locks_tree *locks_root; - struct full_stripe_lock *existing; - u64 fstripe_start; - int ret = 0; - - *locked_ret = false; - bg_cache = btrfs_lookup_block_group(fs_info, bytenr); - if (!bg_cache) { - ASSERT(0); - return -ENOENT; - } - - /* Profiles not based on parity don't need full stripe lock */ - if (!(bg_cache->flags & BTRFS_BLOCK_GROUP_RAID56_MASK)) - goto out; - locks_root = &bg_cache->full_stripe_locks_root; - - fstripe_start = get_full_stripe_logical(bg_cache, bytenr); - - /* Now insert the full stripe lock */ - mutex_lock(&locks_root->lock); - existing = insert_full_stripe_lock(locks_root, fstripe_start); - mutex_unlock(&locks_root->lock); - if (IS_ERR(existing)) { - ret = PTR_ERR(existing); - goto out; - } - mutex_lock(&existing->mutex); - *locked_ret = true; -out: - btrfs_put_block_group(bg_cache); - return ret; -} - -/* - * Unlock a full stripe. - * - * NOTE: Caller must ensure it's the same context calling corresponding - * lock_full_stripe(). - * - * Return 0 if we unlock full stripe without problem. - * Return <0 for error - */ -static int unlock_full_stripe(struct btrfs_fs_info *fs_info, u64 bytenr, - bool locked) -{ - struct btrfs_block_group *bg_cache; - struct btrfs_full_stripe_locks_tree *locks_root; - struct full_stripe_lock *fstripe_lock; - u64 fstripe_start; - bool freeit = false; - int ret = 0; - - /* If we didn't acquire full stripe lock, no need to continue */ - if (!locked) - return 0; - - bg_cache = btrfs_lookup_block_group(fs_info, bytenr); - if (!bg_cache) { - ASSERT(0); - return -ENOENT; - } - if (!(bg_cache->flags & BTRFS_BLOCK_GROUP_RAID56_MASK)) - goto out; - - locks_root = &bg_cache->full_stripe_locks_root; - fstripe_start = get_full_stripe_logical(bg_cache, bytenr); - - mutex_lock(&locks_root->lock); - fstripe_lock = search_full_stripe_lock(locks_root, fstripe_start); - /* Unpaired unlock_full_stripe() detected */ - if (!fstripe_lock) { - WARN_ON(1); - ret = -ENOENT; - mutex_unlock(&locks_root->lock); - goto out; - } - - if (fstripe_lock->refs == 0) { - WARN_ON(1); - btrfs_warn(fs_info, "full stripe lock at %llu refcount underflow", - fstripe_lock->logical); - } else { - fstripe_lock->refs--; - } - - if (fstripe_lock->refs == 0) { - rb_erase(&fstripe_lock->node, &locks_root->root); - freeit = true; - } - mutex_unlock(&locks_root->lock); - - mutex_unlock(&fstripe_lock->mutex); - if (freeit) - kfree(fstripe_lock); -out: - btrfs_put_block_group(bg_cache); - return ret; -} - static void scrub_free_csums(struct scrub_ctx *sctx) { while (!list_empty(&sctx->csum_list)) { @@ -1103,444 +853,6 @@ static void scrub_print_common_warning(const char *errstr, struct btrfs_device * btrfs_free_path(path); } -static void scrub_print_warning(const char *errstr, struct scrub_block *sblock) -{ - scrub_print_common_warning(errstr, sblock->dev, - sblock->sectors[0]->flags & BTRFS_EXTENT_FLAG_SUPER, - sblock->logical, sblock->physical); -} - -static inline void scrub_get_recover(struct scrub_recover *recover) -{ - refcount_inc(&recover->refs); -} - -static inline void scrub_put_recover(struct btrfs_fs_info *fs_info, - struct scrub_recover *recover) -{ - if (refcount_dec_and_test(&recover->refs)) { - btrfs_bio_counter_dec(fs_info); - btrfs_put_bioc(recover->bioc); - kfree(recover); - } -} - -/* - * scrub_handle_errored_block gets called when either verification of the - * sectors failed or the bio failed to read, e.g. with EIO. In the latter - * case, this function handles all sectors in the bio, even though only one - * may be bad. - * The goal of this function is to repair the errored block by using the - * contents of one of the mirrors. - */ -static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) -{ - struct scrub_ctx *sctx = sblock_to_check->sctx; - struct btrfs_device *dev = sblock_to_check->dev; - struct btrfs_fs_info *fs_info; - u64 logical; - unsigned int failed_mirror_index; - unsigned int is_metadata; - unsigned int have_csum; - /* One scrub_block for each mirror */ - struct scrub_block *sblocks_for_recheck[BTRFS_MAX_MIRRORS] = { 0 }; - struct scrub_block *sblock_bad; - int ret; - int mirror_index; - int sector_num; - int success; - bool full_stripe_locked; - unsigned int nofs_flag; - static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL, - DEFAULT_RATELIMIT_BURST); - - BUG_ON(sblock_to_check->sector_count < 1); - fs_info = sctx->fs_info; - if (sblock_to_check->sectors[0]->flags & BTRFS_EXTENT_FLAG_SUPER) { - /* - * If we find an error in a super block, we just report it. - * They will get written with the next transaction commit - * anyway - */ - scrub_print_warning("super block error", sblock_to_check); - spin_lock(&sctx->stat_lock); - ++sctx->stat.super_errors; - spin_unlock(&sctx->stat_lock); - btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_CORRUPTION_ERRS); - return 0; - } - logical = sblock_to_check->logical; - ASSERT(sblock_to_check->mirror_num); - failed_mirror_index = sblock_to_check->mirror_num - 1; - is_metadata = !(sblock_to_check->sectors[0]->flags & - BTRFS_EXTENT_FLAG_DATA); - have_csum = sblock_to_check->sectors[0]->have_csum; - - if (!sctx->is_dev_replace && btrfs_repair_one_zone(fs_info, logical)) - return 0; - - /* - * We must use GFP_NOFS because the scrub task might be waiting for a - * worker task executing this function and in turn a transaction commit - * might be waiting the scrub task to pause (which needs to wait for all - * the worker tasks to complete before pausing). - * We do allocations in the workers through insert_full_stripe_lock() - * and scrub_add_sector_to_wr_bio(), which happens down the call chain of - * this function. - */ - nofs_flag = memalloc_nofs_save(); - /* - * For RAID5/6, race can happen for a different device scrub thread. - * For data corruption, Parity and Data threads will both try - * to recovery the data. - * Race can lead to doubly added csum error, or even unrecoverable - * error. - */ - ret = lock_full_stripe(fs_info, logical, &full_stripe_locked); - if (ret < 0) { - memalloc_nofs_restore(nofs_flag); - spin_lock(&sctx->stat_lock); - if (ret == -ENOMEM) - sctx->stat.malloc_errors++; - sctx->stat.read_errors++; - sctx->stat.uncorrectable_errors++; - spin_unlock(&sctx->stat_lock); - return ret; - } - - /* - * read all mirrors one after the other. This includes to - * re-read the extent or metadata block that failed (that was - * the cause that this fixup code is called) another time, - * sector by sector this time in order to know which sectors - * caused I/O errors and which ones are good (for all mirrors). - * It is the goal to handle the situation when more than one - * mirror contains I/O errors, but the errors do not - * overlap, i.e. the data can be repaired by selecting the - * sectors from those mirrors without I/O error on the - * particular sectors. One example (with blocks >= 2 * sectorsize) - * would be that mirror #1 has an I/O error on the first sector, - * the second sector is good, and mirror #2 has an I/O error on - * the second sector, but the first sector is good. - * Then the first sector of the first mirror can be repaired by - * taking the first sector of the second mirror, and the - * second sector of the second mirror can be repaired by - * copying the contents of the 2nd sector of the 1st mirror. - * One more note: if the sectors of one mirror contain I/O - * errors, the checksum cannot be verified. In order to get - * the best data for repairing, the first attempt is to find - * a mirror without I/O errors and with a validated checksum. - * Only if this is not possible, the sectors are picked from - * mirrors with I/O errors without considering the checksum. - * If the latter is the case, at the end, the checksum of the - * repaired area is verified in order to correctly maintain - * the statistics. - */ - for (mirror_index = 0; mirror_index < BTRFS_MAX_MIRRORS; mirror_index++) { - /* - * Note: the two members refs and outstanding_sectors are not - * used in the blocks that are used for the recheck procedure. - * - * But alloc_scrub_block() will initialize sblock::ref anyway, - * so we can use scrub_block_put() to clean them up. - * - * And here we don't setup the physical/dev for the sblock yet, - * they will be correctly initialized in scrub_setup_recheck_block(). - */ - sblocks_for_recheck[mirror_index] = alloc_scrub_block(sctx, NULL, - logical, 0, 0, mirror_index); - if (!sblocks_for_recheck[mirror_index]) { - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - sctx->stat.read_errors++; - sctx->stat.uncorrectable_errors++; - spin_unlock(&sctx->stat_lock); - btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_READ_ERRS); - goto out; - } - } - - /* Setup the context, map the logical blocks and alloc the sectors */ - ret = scrub_setup_recheck_block(sblock_to_check, sblocks_for_recheck); - if (ret) { - spin_lock(&sctx->stat_lock); - sctx->stat.read_errors++; - sctx->stat.uncorrectable_errors++; - spin_unlock(&sctx->stat_lock); - btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_READ_ERRS); - goto out; - } - BUG_ON(failed_mirror_index >= BTRFS_MAX_MIRRORS); - sblock_bad = sblocks_for_recheck[failed_mirror_index]; - - /* build and submit the bios for the failed mirror, check checksums */ - scrub_recheck_block(fs_info, sblock_bad, 1); - - if (!sblock_bad->header_error && !sblock_bad->checksum_error && - sblock_bad->no_io_error_seen) { - /* - * The error disappeared after reading sector by sector, or - * the area was part of a huge bio and other parts of the - * bio caused I/O errors, or the block layer merged several - * read requests into one and the error is caused by a - * different bio (usually one of the two latter cases is - * the cause) - */ - spin_lock(&sctx->stat_lock); - sctx->stat.unverified_errors++; - sblock_to_check->data_corrected = 1; - spin_unlock(&sctx->stat_lock); - - goto out; - } - - if (!sblock_bad->no_io_error_seen) { - spin_lock(&sctx->stat_lock); - sctx->stat.read_errors++; - spin_unlock(&sctx->stat_lock); - if (__ratelimit(&rs)) - scrub_print_warning("i/o error", sblock_to_check); - btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_READ_ERRS); - } else if (sblock_bad->checksum_error) { - spin_lock(&sctx->stat_lock); - sctx->stat.csum_errors++; - spin_unlock(&sctx->stat_lock); - if (__ratelimit(&rs)) - scrub_print_warning("checksum error", sblock_to_check); - btrfs_dev_stat_inc_and_print(dev, - BTRFS_DEV_STAT_CORRUPTION_ERRS); - } else if (sblock_bad->header_error) { - spin_lock(&sctx->stat_lock); - sctx->stat.verify_errors++; - spin_unlock(&sctx->stat_lock); - if (__ratelimit(&rs)) - scrub_print_warning("checksum/header error", - sblock_to_check); - if (sblock_bad->generation_error) - btrfs_dev_stat_inc_and_print(dev, - BTRFS_DEV_STAT_GENERATION_ERRS); - else - btrfs_dev_stat_inc_and_print(dev, - BTRFS_DEV_STAT_CORRUPTION_ERRS); - } - - if (sctx->readonly) { - ASSERT(!sctx->is_dev_replace); - goto out; - } - - /* - * now build and submit the bios for the other mirrors, check - * checksums. - * First try to pick the mirror which is completely without I/O - * errors and also does not have a checksum error. - * If one is found, and if a checksum is present, the full block - * that is known to contain an error is rewritten. Afterwards - * the block is known to be corrected. - * If a mirror is found which is completely correct, and no - * checksum is present, only those sectors are rewritten that had - * an I/O error in the block to be repaired, since it cannot be - * determined, which copy of the other sectors is better (and it - * could happen otherwise that a correct sector would be - * overwritten by a bad one). - */ - for (mirror_index = 0; ;mirror_index++) { - struct scrub_block *sblock_other; - - if (mirror_index == failed_mirror_index) - continue; - - /* raid56's mirror can be more than BTRFS_MAX_MIRRORS */ - if (!scrub_is_page_on_raid56(sblock_bad->sectors[0])) { - if (mirror_index >= BTRFS_MAX_MIRRORS) - break; - if (!sblocks_for_recheck[mirror_index]->sector_count) - break; - - sblock_other = sblocks_for_recheck[mirror_index]; - } else { - struct scrub_recover *r = sblock_bad->sectors[0]->recover; - int max_allowed = r->bioc->num_stripes - r->bioc->replace_nr_stripes; - - if (mirror_index >= max_allowed) - break; - if (!sblocks_for_recheck[1]->sector_count) - break; - - ASSERT(failed_mirror_index == 0); - sblock_other = sblocks_for_recheck[1]; - sblock_other->mirror_num = 1 + mirror_index; - } - - /* build and submit the bios, check checksums */ - scrub_recheck_block(fs_info, sblock_other, 0); - - if (!sblock_other->header_error && - !sblock_other->checksum_error && - sblock_other->no_io_error_seen) { - if (sctx->is_dev_replace) { - goto corrected_error; - } else { - ret = scrub_repair_block_from_good_copy( - sblock_bad, sblock_other); - if (!ret) - goto corrected_error; - } - } - } - - if (sblock_bad->no_io_error_seen && !sctx->is_dev_replace) - goto did_not_correct_error; - - /* - * In case of I/O errors in the area that is supposed to be - * repaired, continue by picking good copies of those sectors. - * Select the good sectors from mirrors to rewrite bad sectors from - * the area to fix. Afterwards verify the checksum of the block - * that is supposed to be repaired. This verification step is - * only done for the purpose of statistic counting and for the - * final scrub report, whether errors remain. - * A perfect algorithm could make use of the checksum and try - * all possible combinations of sectors from the different mirrors - * until the checksum verification succeeds. For example, when - * the 2nd sector of mirror #1 faces I/O errors, and the 2nd sector - * of mirror #2 is readable but the final checksum test fails, - * then the 2nd sector of mirror #3 could be tried, whether now - * the final checksum succeeds. But this would be a rare - * exception and is therefore not implemented. At least it is - * avoided that the good copy is overwritten. - * A more useful improvement would be to pick the sectors - * without I/O error based on sector sizes (512 bytes on legacy - * disks) instead of on sectorsize. Then maybe 512 byte of one - * mirror could be repaired by taking 512 byte of a different - * mirror, even if other 512 byte sectors in the same sectorsize - * area are unreadable. - */ - success = 1; - for (sector_num = 0; sector_num < sblock_bad->sector_count; - sector_num++) { - struct scrub_sector *sector_bad = sblock_bad->sectors[sector_num]; - struct scrub_block *sblock_other = NULL; - - /* Skip no-io-error sectors in scrub */ - if (!sector_bad->io_error && !sctx->is_dev_replace) - continue; - - if (scrub_is_page_on_raid56(sblock_bad->sectors[0])) { - /* - * In case of dev replace, if raid56 rebuild process - * didn't work out correct data, then copy the content - * in sblock_bad to make sure target device is identical - * to source device, instead of writing garbage data in - * sblock_for_recheck array to target device. - */ - sblock_other = NULL; - } else if (sector_bad->io_error) { - /* Try to find no-io-error sector in mirrors */ - for (mirror_index = 0; - mirror_index < BTRFS_MAX_MIRRORS && - sblocks_for_recheck[mirror_index]->sector_count > 0; - mirror_index++) { - if (!sblocks_for_recheck[mirror_index]-> - sectors[sector_num]->io_error) { - sblock_other = sblocks_for_recheck[mirror_index]; - break; - } - } - if (!sblock_other) - success = 0; - } - - if (sctx->is_dev_replace) { - /* - * Did not find a mirror to fetch the sector from. - * scrub_write_sector_to_dev_replace() handles this - * case (sector->io_error), by filling the block with - * zeros before submitting the write request - */ - if (!sblock_other) - sblock_other = sblock_bad; - } else if (sblock_other) { - ret = scrub_repair_sector_from_good_copy(sblock_bad, - sblock_other, - sector_num, 0); - if (0 == ret) - sector_bad->io_error = 0; - else - success = 0; - } - } - - if (success && !sctx->is_dev_replace) { - if (is_metadata || have_csum) { - /* - * need to verify the checksum now that all - * sectors on disk are repaired (the write - * request for data to be repaired is on its way). - * Just be lazy and use scrub_recheck_block() - * which re-reads the data before the checksum - * is verified, but most likely the data comes out - * of the page cache. - */ - scrub_recheck_block(fs_info, sblock_bad, 1); - if (!sblock_bad->header_error && - !sblock_bad->checksum_error && - sblock_bad->no_io_error_seen) - goto corrected_error; - else - goto did_not_correct_error; - } else { -corrected_error: - spin_lock(&sctx->stat_lock); - sctx->stat.corrected_errors++; - sblock_to_check->data_corrected = 1; - spin_unlock(&sctx->stat_lock); - btrfs_err_rl_in_rcu(fs_info, - "fixed up error at logical %llu on dev %s", - logical, btrfs_dev_name(dev)); - } - } else { -did_not_correct_error: - spin_lock(&sctx->stat_lock); - sctx->stat.uncorrectable_errors++; - spin_unlock(&sctx->stat_lock); - btrfs_err_rl_in_rcu(fs_info, - "unable to fixup (regular) error at logical %llu on dev %s", - logical, btrfs_dev_name(dev)); - } - -out: - for (mirror_index = 0; mirror_index < BTRFS_MAX_MIRRORS; mirror_index++) { - struct scrub_block *sblock = sblocks_for_recheck[mirror_index]; - struct scrub_recover *recover; - int sector_index; - - /* Not allocated, continue checking the next mirror */ - if (!sblock) - continue; - - for (sector_index = 0; sector_index < sblock->sector_count; - sector_index++) { - /* - * Here we just cleanup the recover, each sector will be - * properly cleaned up by later scrub_block_put() - */ - recover = sblock->sectors[sector_index]->recover; - if (recover) { - scrub_put_recover(fs_info, recover); - sblock->sectors[sector_index]->recover = NULL; - } - } - scrub_block_put(sblock); - } - - ret = unlock_full_stripe(fs_info, logical, full_stripe_locked); - memalloc_nofs_restore(nofs_flag); - if (ret < 0) - return ret; - return 0; -} - static inline int scrub_nr_raid_mirrors(struct btrfs_io_context *bioc) { if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID5) @@ -1583,224 +895,6 @@ static inline void scrub_stripe_index_and_offset(u64 logical, u64 map_type, } } -static int scrub_setup_recheck_block(struct scrub_block *original_sblock, - struct scrub_block *sblocks_for_recheck[]) -{ - struct scrub_ctx *sctx = original_sblock->sctx; - struct btrfs_fs_info *fs_info = sctx->fs_info; - u64 logical = original_sblock->logical; - u64 length = original_sblock->sector_count << fs_info->sectorsize_bits; - u64 generation = original_sblock->sectors[0]->generation; - u64 flags = original_sblock->sectors[0]->flags; - u64 have_csum = original_sblock->sectors[0]->have_csum; - struct scrub_recover *recover; - struct btrfs_io_context *bioc; - u64 sublen; - u64 mapped_length; - u64 stripe_offset; - int stripe_index; - int sector_index = 0; - int mirror_index; - int nmirrors; - int ret; - - while (length > 0) { - sublen = min_t(u64, length, fs_info->sectorsize); - mapped_length = sublen; - bioc = NULL; - - /* - * With a length of sectorsize, each returned stripe represents - * one mirror - */ - btrfs_bio_counter_inc_blocked(fs_info); - ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, - logical, &mapped_length, &bioc); - if (ret || !bioc || mapped_length < sublen) { - btrfs_put_bioc(bioc); - btrfs_bio_counter_dec(fs_info); - return -EIO; - } - - recover = kzalloc(sizeof(struct scrub_recover), GFP_KERNEL); - if (!recover) { - btrfs_put_bioc(bioc); - btrfs_bio_counter_dec(fs_info); - return -ENOMEM; - } - - refcount_set(&recover->refs, 1); - recover->bioc = bioc; - recover->map_length = mapped_length; - - ASSERT(sector_index < SCRUB_MAX_SECTORS_PER_BLOCK); - - nmirrors = min(scrub_nr_raid_mirrors(bioc), BTRFS_MAX_MIRRORS); - - for (mirror_index = 0; mirror_index < nmirrors; - mirror_index++) { - struct scrub_block *sblock; - struct scrub_sector *sector; - - sblock = sblocks_for_recheck[mirror_index]; - sblock->sctx = sctx; - - sector = alloc_scrub_sector(sblock, logical); - if (!sector) { - spin_lock(&sctx->stat_lock); - sctx->stat.malloc_errors++; - spin_unlock(&sctx->stat_lock); - scrub_put_recover(fs_info, recover); - return -ENOMEM; - } - sector->flags = flags; - sector->generation = generation; - sector->have_csum = have_csum; - if (have_csum) - memcpy(sector->csum, - original_sblock->sectors[0]->csum, - sctx->fs_info->csum_size); - - scrub_stripe_index_and_offset(logical, - bioc->map_type, - bioc->full_stripe_logical, - bioc->num_stripes - - bioc->replace_nr_stripes, - mirror_index, - &stripe_index, - &stripe_offset); - /* - * We're at the first sector, also populate @sblock - * physical and dev. - */ - if (sector_index == 0) { - sblock->physical = - bioc->stripes[stripe_index].physical + - stripe_offset; - sblock->dev = bioc->stripes[stripe_index].dev; - sblock->physical_for_dev_replace = - original_sblock->physical_for_dev_replace; - } - - BUG_ON(sector_index >= original_sblock->sector_count); - scrub_get_recover(recover); - sector->recover = recover; - } - scrub_put_recover(fs_info, recover); - length -= sublen; - logical += sublen; - sector_index++; - } - - return 0; -} - -static void scrub_bio_wait_endio(struct bio *bio) -{ - complete(bio->bi_private); -} - -static int scrub_submit_raid56_bio_wait(struct btrfs_fs_info *fs_info, - struct bio *bio, - struct scrub_sector *sector) -{ - DECLARE_COMPLETION_ONSTACK(done); - - bio->bi_iter.bi_sector = (sector->offset + sector->sblock->logical) >> - SECTOR_SHIFT; - bio->bi_private = &done; - bio->bi_end_io = scrub_bio_wait_endio; - raid56_parity_recover(bio, sector->recover->bioc, sector->sblock->mirror_num); - - wait_for_completion_io(&done); - return blk_status_to_errno(bio->bi_status); -} - -static void scrub_recheck_block_on_raid56(struct btrfs_fs_info *fs_info, - struct scrub_block *sblock) -{ - struct scrub_sector *first_sector = sblock->sectors[0]; - struct bio *bio; - int i; - - /* All sectors in sblock belong to the same stripe on the same device. */ - ASSERT(sblock->dev); - if (!sblock->dev->bdev) - goto out; - - bio = bio_alloc(sblock->dev->bdev, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS); - - for (i = 0; i < sblock->sector_count; i++) { - struct scrub_sector *sector = sblock->sectors[i]; - - bio_add_scrub_sector(bio, sector, fs_info->sectorsize); - } - - if (scrub_submit_raid56_bio_wait(fs_info, bio, first_sector)) { - bio_put(bio); - goto out; - } - - bio_put(bio); - - scrub_recheck_block_checksum(sblock); - - return; -out: - for (i = 0; i < sblock->sector_count; i++) - sblock->sectors[i]->io_error = 1; - - sblock->no_io_error_seen = 0; -} - -/* - * This function will check the on disk data for checksum errors, header errors - * and read I/O errors. If any I/O errors happen, the exact sectors which are - * errored are marked as being bad. The goal is to enable scrub to take those - * sectors that are not errored from all the mirrors so that the sectors that - * are errored in the just handled mirror can be repaired. - */ -static void scrub_recheck_block(struct btrfs_fs_info *fs_info, - struct scrub_block *sblock, - int retry_failed_mirror) -{ - int i; - - sblock->no_io_error_seen = 1; - - /* short cut for raid56 */ - if (!retry_failed_mirror && scrub_is_page_on_raid56(sblock->sectors[0])) - return scrub_recheck_block_on_raid56(fs_info, sblock); - - for (i = 0; i < sblock->sector_count; i++) { - struct scrub_sector *sector = sblock->sectors[i]; - struct bio bio; - struct bio_vec bvec; - - if (sblock->dev->bdev == NULL) { - sector->io_error = 1; - sblock->no_io_error_seen = 0; - continue; - } - - bio_init(&bio, sblock->dev->bdev, &bvec, 1, REQ_OP_READ); - bio_add_scrub_sector(&bio, sector, fs_info->sectorsize); - bio.bi_iter.bi_sector = (sblock->physical + sector->offset) >> - SECTOR_SHIFT; - - btrfsic_check_bio(&bio); - if (submit_bio_wait(&bio)) { - sector->io_error = 1; - sblock->no_io_error_seen = 0; - } - - bio_uninit(&bio); - } - - if (sblock->no_io_error_seen) - scrub_recheck_block_checksum(sblock); -} - static inline int scrub_check_fsid(u8 fsid[], struct scrub_sector *sector) { struct btrfs_fs_devices *fs_devices = sector->sblock->dev->fs_devices; @@ -1810,77 +904,6 @@ static inline int scrub_check_fsid(u8 fsid[], struct scrub_sector *sector) return !ret; } -static void scrub_recheck_block_checksum(struct scrub_block *sblock) -{ - sblock->header_error = 0; - sblock->checksum_error = 0; - sblock->generation_error = 0; - - if (sblock->sectors[0]->flags & BTRFS_EXTENT_FLAG_DATA) - scrub_checksum_data(sblock); - else - scrub_checksum_tree_block(sblock); -} - -static int scrub_repair_block_from_good_copy(struct scrub_block *sblock_bad, - struct scrub_block *sblock_good) -{ - int i; - int ret = 0; - - for (i = 0; i < sblock_bad->sector_count; i++) { - int ret_sub; - - ret_sub = scrub_repair_sector_from_good_copy(sblock_bad, - sblock_good, i, 1); - if (ret_sub) - ret = ret_sub; - } - - return ret; -} - -static int scrub_repair_sector_from_good_copy(struct scrub_block *sblock_bad, - struct scrub_block *sblock_good, - int sector_num, int force_write) -{ - struct scrub_sector *sector_bad = sblock_bad->sectors[sector_num]; - struct scrub_sector *sector_good = sblock_good->sectors[sector_num]; - struct btrfs_fs_info *fs_info = sblock_bad->sctx->fs_info; - const u32 sectorsize = fs_info->sectorsize; - - if (force_write || sblock_bad->header_error || - sblock_bad->checksum_error || sector_bad->io_error) { - struct bio bio; - struct bio_vec bvec; - int ret; - - if (!sblock_bad->dev->bdev) { - btrfs_warn_rl(fs_info, - "scrub_repair_page_from_good_copy(bdev == NULL) is unexpected"); - return -EIO; - } - - bio_init(&bio, sblock_bad->dev->bdev, &bvec, 1, REQ_OP_WRITE); - bio.bi_iter.bi_sector = (sblock_bad->physical + - sector_bad->offset) >> SECTOR_SHIFT; - ret = bio_add_scrub_sector(&bio, sector_good, sectorsize); - - btrfsic_check_bio(&bio); - ret = submit_bio_wait(&bio); - bio_uninit(&bio); - - if (ret) { - btrfs_dev_stat_inc_and_print(sblock_bad->dev, - BTRFS_DEV_STAT_WRITE_ERRS); - atomic64_inc(&fs_info->dev_replace.num_write_errors); - return -EIO; - } - } - - return 0; -} - static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) { int ret = 0; @@ -1936,9 +959,6 @@ static int scrub_checksum(struct scrub_block *sblock) ret = scrub_checksum_super(sblock); else WARN_ON(1); - if (ret) - scrub_handle_errored_block(sblock); - return ret; } @@ -2941,16 +1961,13 @@ static void scrub_bio_end_io_worker(struct work_struct *work) static void scrub_block_complete(struct scrub_block *sblock) { - if (!sblock->no_io_error_seen) { - scrub_handle_errored_block(sblock); - } else { + if (sblock->no_io_error_seen) /* * if has checksum error, write via repair mechanism in * dev replace case, otherwise write here in dev replace * case. */ scrub_checksum(sblock); - } } static void drop_csum_range(struct scrub_ctx *sctx, struct btrfs_ordered_sum *sum) diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h index f47492e78e1c..7d1982893363 100644 --- a/fs/btrfs/scrub.h +++ b/fs/btrfs/scrub.h @@ -16,9 +16,16 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid, /* Temporary declaration, would be deleted later. */ struct scrub_ctx; struct scrub_sector; +struct scrub_block; int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum); int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx, struct scrub_sector *sector); void scrub_sector_get(struct scrub_sector *sector); +struct scrub_sector *alloc_scrub_sector(struct scrub_block *sblock, u64 logical); +struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx, + struct btrfs_device *dev, + u64 logical, u64 physical, + u64 physical_for_dev_replace, + int mirror_num); #endif From patchwork Wed Mar 29 07:09:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13191993 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D5E1C74A5B for ; Wed, 29 Mar 2023 07:10:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229861AbjC2HKU (ORCPT ); Wed, 29 Mar 2023 03:10:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43846 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229851AbjC2HKS (ORCPT ); Wed, 29 Mar 2023 03:10:18 -0400 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E7B33199 for ; Wed, 29 Mar 2023 00:10:14 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id A6F981FE0F for ; Wed, 29 Mar 2023 07:10:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1680073813; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Tmdsy6jZUtUjcUcDktd4m0tqGAUYmFWzmMLydyMY27k=; b=Koo2HUVUuno4V1lqL/rrkYTU4kmiVBVIPPgxZVta/A6qY/NvurBOgnzhpjf7ioniYStsyG R72Fz5orulrrnuw8irNoinKmHX5fNbXaEwjaYU302vJex7LMOa3rtQuI9T53EKGLvn9PWw +XyESeVfeTjWC6YKZovLr26tCFwzJkg= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 136D2138FF for ; Wed, 29 Mar 2023 07:10:12 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id SH26NFTkI2T4NQAAMHmgww (envelope-from ) for ; Wed, 29 Mar 2023 07:10:12 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 5/6] btrfs: scrub: remove scrub_block and scrub_sector structures Date: Wed, 29 Mar 2023 15:09:49 +0800 Message-Id: <43abb4fe2a827c06b14d94d241cd05c094514c0f.1680073696.git.wqu@suse.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Those two structures are used to represent a bunch of sectors for scrub, but now they are fully replaced by scrub_stripe in one go, so we can remove them now. This involves: - structure scrub_block - structure scrub_sector - structure scrub_page_private - function attach_scrub_page_private() - function detach_scrub_page_private() Now we no longer need to use page::private to handle subpage. - function alloc_scrub_block() - function alloc_scrub_sector() - function scrub_sector_get_page() - function scrub_sector_get_page_offset() - function scrub_sector_get_kaddr() - function bio_add_scrub_sector() - function scrub_checksum_data() - function scrub_checksum_tree_block() - function scrub_checksum_super() - function scrub_check_fsid() - function scrub_block_get() - function scrub_block_put() - function scrub_sector_get() - function scrub_sector_put() - function scrub_bio_end_io() - function scrub_block_complete() - function scrub_add_sector_to_rd_bio() Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 564 ----------------------------------------------- fs/btrfs/scrub.h | 10 - 2 files changed, 574 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 63182aed390f..8c44a2fa0a19 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -38,7 +38,6 @@ * - add a mode to also read unallocated space */ -struct scrub_block; struct scrub_ctx; /* @@ -183,19 +182,6 @@ struct scrub_stripe { struct work_struct work; }; -struct scrub_sector { - struct scrub_block *sblock; - struct list_head list; - u64 flags; /* extent flags */ - u64 generation; - /* Offset in bytes to @sblock. */ - u32 offset; - atomic_t refs; - unsigned int have_csum:1; - unsigned int io_error:1; - u8 csum[BTRFS_CSUM_SIZE]; -}; - struct scrub_bio { int index; struct scrub_ctx *sctx; @@ -204,45 +190,11 @@ struct scrub_bio { blk_status_t status; u64 logical; u64 physical; - struct scrub_sector *sectors[SCRUB_SECTORS_PER_BIO]; int sector_count; int next_free; struct work_struct work; }; -struct scrub_block { - /* - * Each page will have its page::private used to record the logical - * bytenr. - */ - struct page *pages[SCRUB_MAX_PAGES]; - struct scrub_sector *sectors[SCRUB_MAX_SECTORS_PER_BLOCK]; - struct btrfs_device *dev; - /* Logical bytenr of the sblock */ - u64 logical; - u64 physical; - u64 physical_for_dev_replace; - /* Length of sblock in bytes */ - u32 len; - int sector_count; - int mirror_num; - - atomic_t outstanding_sectors; - refcount_t refs; /* free mem on transition to zero */ - struct scrub_ctx *sctx; - struct { - unsigned int header_error:1; - unsigned int checksum_error:1; - unsigned int no_io_error_seen:1; - unsigned int generation_error:1; /* also sets header_error */ - - /* The following is for the data used to check parity */ - /* It is for the data with checksum */ - unsigned int data_corrected:1; - }; - struct work_struct work; -}; - struct scrub_ctx { struct scrub_bio *bios[SCRUB_BIOS_PER_SCTX]; struct scrub_stripe stripes[SCRUB_STRIPES_PER_SCTX]; @@ -295,44 +247,6 @@ struct scrub_warning { struct btrfs_device *dev; }; -#ifndef CONFIG_64BIT -/* This structure is for architectures whose (void *) is smaller than u64 */ -struct scrub_page_private { - u64 logical; -}; -#endif - -static int attach_scrub_page_private(struct page *page, u64 logical) -{ -#ifdef CONFIG_64BIT - attach_page_private(page, (void *)logical); - return 0; -#else - struct scrub_page_private *spp; - - spp = kmalloc(sizeof(*spp), GFP_KERNEL); - if (!spp) - return -ENOMEM; - spp->logical = logical; - attach_page_private(page, (void *)spp); - return 0; -#endif -} - -static void detach_scrub_page_private(struct page *page) -{ -#ifdef CONFIG_64BIT - detach_page_private(page); - return; -#else - struct scrub_page_private *spp; - - spp = detach_page_private(page); - kfree(spp); - return; -#endif -} - static void release_scrub_stripe(struct scrub_stripe *stripe) { if (!stripe) @@ -391,141 +305,7 @@ static void wait_scrub_stripe_io(struct scrub_stripe *stripe) wait_event(stripe->io_wait, atomic_read(&stripe->pending_io) == 0); } -struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx, - struct btrfs_device *dev, - u64 logical, u64 physical, - u64 physical_for_dev_replace, - int mirror_num) -{ - struct scrub_block *sblock; - - sblock = kzalloc(sizeof(*sblock), GFP_KERNEL); - if (!sblock) - return NULL; - refcount_set(&sblock->refs, 1); - sblock->sctx = sctx; - sblock->logical = logical; - sblock->physical = physical; - sblock->physical_for_dev_replace = physical_for_dev_replace; - sblock->dev = dev; - sblock->mirror_num = mirror_num; - sblock->no_io_error_seen = 1; - /* - * Scrub_block::pages will be allocated at alloc_scrub_sector() when - * the corresponding page is not allocated. - */ - return sblock; -} - -/* - * Allocate a new scrub sector and attach it to @sblock. - * - * Will also allocate new pages for @sblock if needed. - */ -struct scrub_sector *alloc_scrub_sector(struct scrub_block *sblock, u64 logical) -{ - const pgoff_t page_index = (logical - sblock->logical) >> PAGE_SHIFT; - struct scrub_sector *ssector; - - /* We must never have scrub_block exceed U32_MAX in size. */ - ASSERT(logical - sblock->logical < U32_MAX); - - ssector = kzalloc(sizeof(*ssector), GFP_KERNEL); - if (!ssector) - return NULL; - - /* Allocate a new page if the slot is not allocated */ - if (!sblock->pages[page_index]) { - int ret; - - sblock->pages[page_index] = alloc_page(GFP_KERNEL); - if (!sblock->pages[page_index]) { - kfree(ssector); - return NULL; - } - ret = attach_scrub_page_private(sblock->pages[page_index], - sblock->logical + (page_index << PAGE_SHIFT)); - if (ret < 0) { - kfree(ssector); - __free_page(sblock->pages[page_index]); - sblock->pages[page_index] = NULL; - return NULL; - } - } - - atomic_set(&ssector->refs, 1); - ssector->sblock = sblock; - /* The sector to be added should not be used */ - ASSERT(sblock->sectors[sblock->sector_count] == NULL); - ssector->offset = logical - sblock->logical; - - /* The sector count must be smaller than the limit */ - ASSERT(sblock->sector_count < SCRUB_MAX_SECTORS_PER_BLOCK); - - sblock->sectors[sblock->sector_count] = ssector; - sblock->sector_count++; - sblock->len += sblock->sctx->fs_info->sectorsize; - - return ssector; -} - -static struct page *scrub_sector_get_page(struct scrub_sector *ssector) -{ - struct scrub_block *sblock = ssector->sblock; - pgoff_t index; - /* - * When calling this function, ssector must be alreaday attached to the - * parent sblock. - */ - ASSERT(sblock); - - /* The range should be inside the sblock range */ - ASSERT(ssector->offset < sblock->len); - - index = ssector->offset >> PAGE_SHIFT; - ASSERT(index < SCRUB_MAX_PAGES); - ASSERT(sblock->pages[index]); - ASSERT(PagePrivate(sblock->pages[index])); - return sblock->pages[index]; -} - -static unsigned int scrub_sector_get_page_offset(struct scrub_sector *ssector) -{ - struct scrub_block *sblock = ssector->sblock; - - /* - * When calling this function, ssector must be already attached to the - * parent sblock. - */ - ASSERT(sblock); - - /* The range should be inside the sblock range */ - ASSERT(ssector->offset < sblock->len); - - return offset_in_page(ssector->offset); -} - -static char *scrub_sector_get_kaddr(struct scrub_sector *ssector) -{ - return page_address(scrub_sector_get_page(ssector)) + - scrub_sector_get_page_offset(ssector); -} - -static int bio_add_scrub_sector(struct bio *bio, struct scrub_sector *ssector, - unsigned int len) -{ - return bio_add_page(bio, scrub_sector_get_page(ssector), len, - scrub_sector_get_page_offset(ssector)); -} - -static int scrub_checksum_data(struct scrub_block *sblock); -static int scrub_checksum_tree_block(struct scrub_block *sblock); -static int scrub_checksum_super(struct scrub_block *sblock); -static void scrub_block_put(struct scrub_block *sblock); -static void scrub_sector_put(struct scrub_sector *sector); -static void scrub_bio_end_io(struct bio *bio); static void scrub_bio_end_io_worker(struct work_struct *work); -static void scrub_block_complete(struct scrub_block *sblock); static void scrub_put_ctx(struct scrub_ctx *sctx); static void scrub_pending_bio_inc(struct scrub_ctx *sctx) @@ -595,8 +375,6 @@ static noinline_for_stack void scrub_free_ctx(struct scrub_ctx *sctx) if (sctx->curr != -1) { struct scrub_bio *sbio = sctx->bios[sctx->curr]; - for (i = 0; i < sbio->sector_count; i++) - scrub_block_put(sbio->sectors[i]->sblock); bio_put(sbio->bio); } @@ -895,15 +673,6 @@ static inline void scrub_stripe_index_and_offset(u64 logical, u64 map_type, } } -static inline int scrub_check_fsid(u8 fsid[], struct scrub_sector *sector) -{ - struct btrfs_fs_devices *fs_devices = sector->sblock->dev->fs_devices; - int ret; - - ret = memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE); - return !ret; -} - static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) { int ret = 0; @@ -926,68 +695,6 @@ static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) return ret; } -static void scrub_block_get(struct scrub_block *sblock) -{ - refcount_inc(&sblock->refs); -} - -static int scrub_checksum(struct scrub_block *sblock) -{ - u64 flags; - int ret; - - /* - * No need to initialize these stats currently, - * because this function only use return value - * instead of these stats value. - * - * Todo: - * always use stats - */ - sblock->header_error = 0; - sblock->generation_error = 0; - sblock->checksum_error = 0; - - WARN_ON(sblock->sector_count < 1); - flags = sblock->sectors[0]->flags; - ret = 0; - if (flags & BTRFS_EXTENT_FLAG_DATA) - ret = scrub_checksum_data(sblock); - else if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) - ret = scrub_checksum_tree_block(sblock); - else if (flags & BTRFS_EXTENT_FLAG_SUPER) - ret = scrub_checksum_super(sblock); - else - WARN_ON(1); - return ret; -} - -static int scrub_checksum_data(struct scrub_block *sblock) -{ - struct scrub_ctx *sctx = sblock->sctx; - struct btrfs_fs_info *fs_info = sctx->fs_info; - SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); - u8 csum[BTRFS_CSUM_SIZE]; - struct scrub_sector *sector; - char *kaddr; - - BUG_ON(sblock->sector_count < 1); - sector = sblock->sectors[0]; - if (!sector->have_csum) - return 0; - - kaddr = scrub_sector_get_kaddr(sector); - - shash->tfm = fs_info->csum_shash; - crypto_shash_init(shash); - - crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum); - - if (memcmp(csum, sector->csum, fs_info->csum_size)) - sblock->checksum_error = 1; - return sblock->checksum_error; -} - static struct page *scrub_stripe_get_page(struct scrub_stripe *stripe, int sector_nr) { @@ -1602,168 +1309,6 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, } } -static int scrub_checksum_tree_block(struct scrub_block *sblock) -{ - struct scrub_ctx *sctx = sblock->sctx; - struct btrfs_header *h; - struct btrfs_fs_info *fs_info = sctx->fs_info; - SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); - u8 calculated_csum[BTRFS_CSUM_SIZE]; - u8 on_disk_csum[BTRFS_CSUM_SIZE]; - /* - * This is done in sectorsize steps even for metadata as there's a - * constraint for nodesize to be aligned to sectorsize. This will need - * to change so we don't misuse data and metadata units like that. - */ - const u32 sectorsize = sctx->fs_info->sectorsize; - const int num_sectors = fs_info->nodesize >> fs_info->sectorsize_bits; - int i; - struct scrub_sector *sector; - char *kaddr; - - BUG_ON(sblock->sector_count < 1); - - /* Each member in sectors is just one sector */ - ASSERT(sblock->sector_count == num_sectors); - - sector = sblock->sectors[0]; - kaddr = scrub_sector_get_kaddr(sector); - h = (struct btrfs_header *)kaddr; - memcpy(on_disk_csum, h->csum, sctx->fs_info->csum_size); - - /* - * we don't use the getter functions here, as we - * a) don't have an extent buffer and - * b) the page is already kmapped - */ - if (sblock->logical != btrfs_stack_header_bytenr(h)) { - sblock->header_error = 1; - btrfs_warn_rl(fs_info, - "tree block %llu mirror %u has bad bytenr, has %llu want %llu", - sblock->logical, sblock->mirror_num, - btrfs_stack_header_bytenr(h), - sblock->logical); - goto out; - } - - if (!scrub_check_fsid(h->fsid, sector)) { - sblock->header_error = 1; - btrfs_warn_rl(fs_info, - "tree block %llu mirror %u has bad fsid, has %pU want %pU", - sblock->logical, sblock->mirror_num, - h->fsid, sblock->dev->fs_devices->fsid); - goto out; - } - - if (memcmp(h->chunk_tree_uuid, fs_info->chunk_tree_uuid, BTRFS_UUID_SIZE)) { - sblock->header_error = 1; - btrfs_warn_rl(fs_info, - "tree block %llu mirror %u has bad chunk tree uuid, has %pU want %pU", - sblock->logical, sblock->mirror_num, - h->chunk_tree_uuid, fs_info->chunk_tree_uuid); - goto out; - } - - shash->tfm = fs_info->csum_shash; - crypto_shash_init(shash); - crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE, - sectorsize - BTRFS_CSUM_SIZE); - - for (i = 1; i < num_sectors; i++) { - kaddr = scrub_sector_get_kaddr(sblock->sectors[i]); - crypto_shash_update(shash, kaddr, sectorsize); - } - - crypto_shash_final(shash, calculated_csum); - if (memcmp(calculated_csum, on_disk_csum, sctx->fs_info->csum_size)) { - sblock->checksum_error = 1; - btrfs_warn_rl(fs_info, - "tree block %llu mirror %u has bad csum, has " CSUM_FMT " want " CSUM_FMT, - sblock->logical, sblock->mirror_num, - CSUM_FMT_VALUE(fs_info->csum_size, on_disk_csum), - CSUM_FMT_VALUE(fs_info->csum_size, calculated_csum)); - goto out; - } - - if (sector->generation != btrfs_stack_header_generation(h)) { - sblock->header_error = 1; - sblock->generation_error = 1; - btrfs_warn_rl(fs_info, - "tree block %llu mirror %u has bad generation, has %llu want %llu", - sblock->logical, sblock->mirror_num, - btrfs_stack_header_generation(h), - sector->generation); - } - -out: - return sblock->header_error || sblock->checksum_error; -} - -static int scrub_checksum_super(struct scrub_block *sblock) -{ - struct btrfs_super_block *s; - struct scrub_ctx *sctx = sblock->sctx; - struct btrfs_fs_info *fs_info = sctx->fs_info; - SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); - u8 calculated_csum[BTRFS_CSUM_SIZE]; - struct scrub_sector *sector; - char *kaddr; - int fail_gen = 0; - int fail_cor = 0; - - BUG_ON(sblock->sector_count < 1); - sector = sblock->sectors[0]; - kaddr = scrub_sector_get_kaddr(sector); - s = (struct btrfs_super_block *)kaddr; - - if (sblock->logical != btrfs_super_bytenr(s)) - ++fail_cor; - - if (sector->generation != btrfs_super_generation(s)) - ++fail_gen; - - if (!scrub_check_fsid(s->fsid, sector)) - ++fail_cor; - - shash->tfm = fs_info->csum_shash; - crypto_shash_init(shash); - crypto_shash_digest(shash, kaddr + BTRFS_CSUM_SIZE, - BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE, calculated_csum); - - if (memcmp(calculated_csum, s->csum, sctx->fs_info->csum_size)) - ++fail_cor; - - return fail_cor + fail_gen; -} - -static void scrub_block_put(struct scrub_block *sblock) -{ - if (refcount_dec_and_test(&sblock->refs)) { - int i; - - for (i = 0; i < sblock->sector_count; i++) - scrub_sector_put(sblock->sectors[i]); - for (i = 0; i < DIV_ROUND_UP(sblock->len, PAGE_SIZE); i++) { - if (sblock->pages[i]) { - detach_scrub_page_private(sblock->pages[i]); - __free_page(sblock->pages[i]); - } - } - kfree(sblock); - } -} - -void scrub_sector_get(struct scrub_sector *sector) -{ - atomic_inc(§or->refs); -} - -static void scrub_sector_put(struct scrub_sector *sector) -{ - if (atomic_dec_and_test(§or->refs)) - kfree(sector); -} - static void scrub_throttle_dev_io(struct scrub_ctx *sctx, struct btrfs_device *device, unsigned int bio_size) @@ -1844,110 +1389,12 @@ static void scrub_submit(struct scrub_ctx *sctx) submit_bio(sbio->bio); } -int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx, - struct scrub_sector *sector) -{ - struct scrub_block *sblock = sector->sblock; - struct scrub_bio *sbio; - const u32 sectorsize = sctx->fs_info->sectorsize; - int ret; - -again: - /* - * grab a fresh bio or wait for one to become available - */ - while (sctx->curr == -1) { - spin_lock(&sctx->list_lock); - sctx->curr = sctx->first_free; - if (sctx->curr != -1) { - sctx->first_free = sctx->bios[sctx->curr]->next_free; - sctx->bios[sctx->curr]->next_free = -1; - sctx->bios[sctx->curr]->sector_count = 0; - spin_unlock(&sctx->list_lock); - } else { - spin_unlock(&sctx->list_lock); - wait_event(sctx->list_wait, sctx->first_free != -1); - } - } - sbio = sctx->bios[sctx->curr]; - if (sbio->sector_count == 0) { - sbio->physical = sblock->physical + sector->offset; - sbio->logical = sblock->logical + sector->offset; - sbio->dev = sblock->dev; - if (!sbio->bio) { - sbio->bio = bio_alloc(sbio->dev->bdev, sctx->sectors_per_bio, - REQ_OP_READ, GFP_NOFS); - } - sbio->bio->bi_private = sbio; - sbio->bio->bi_end_io = scrub_bio_end_io; - sbio->bio->bi_iter.bi_sector = sbio->physical >> 9; - sbio->status = 0; - } else if (sbio->physical + sbio->sector_count * sectorsize != - sblock->physical + sector->offset || - sbio->logical + sbio->sector_count * sectorsize != - sblock->logical + sector->offset || - sbio->dev != sblock->dev) { - scrub_submit(sctx); - goto again; - } - - sbio->sectors[sbio->sector_count] = sector; - ret = bio_add_scrub_sector(sbio->bio, sector, sectorsize); - if (ret != sectorsize) { - if (sbio->sector_count < 1) { - bio_put(sbio->bio); - sbio->bio = NULL; - return -EIO; - } - scrub_submit(sctx); - goto again; - } - - scrub_block_get(sblock); /* one for the page added to the bio */ - atomic_inc(&sblock->outstanding_sectors); - sbio->sector_count++; - if (sbio->sector_count == sctx->sectors_per_bio) - scrub_submit(sctx); - - return 0; -} - -static void scrub_bio_end_io(struct bio *bio) -{ - struct scrub_bio *sbio = bio->bi_private; - struct btrfs_fs_info *fs_info = sbio->dev->fs_info; - - sbio->status = bio->bi_status; - sbio->bio = bio; - - queue_work(fs_info->scrub_workers, &sbio->work); -} - static void scrub_bio_end_io_worker(struct work_struct *work) { struct scrub_bio *sbio = container_of(work, struct scrub_bio, work); struct scrub_ctx *sctx = sbio->sctx; - int i; ASSERT(sbio->sector_count <= SCRUB_SECTORS_PER_BIO); - if (sbio->status) { - for (i = 0; i < sbio->sector_count; i++) { - struct scrub_sector *sector = sbio->sectors[i]; - - sector->io_error = 1; - sector->sblock->no_io_error_seen = 0; - } - } - - /* Now complete the scrub_block items that have all pages completed */ - for (i = 0; i < sbio->sector_count; i++) { - struct scrub_sector *sector = sbio->sectors[i]; - struct scrub_block *sblock = sector->sblock; - - if (atomic_dec_and_test(&sblock->outstanding_sectors)) - scrub_block_complete(sblock); - scrub_block_put(sblock); - } bio_put(sbio->bio); sbio->bio = NULL; @@ -1959,17 +1406,6 @@ static void scrub_bio_end_io_worker(struct work_struct *work) scrub_pending_bio_dec(sctx); } -static void scrub_block_complete(struct scrub_block *sblock) -{ - if (sblock->no_io_error_seen) - /* - * if has checksum error, write via repair mechanism in - * dev replace case, otherwise write here in dev replace - * case. - */ - scrub_checksum(sblock); -} - static void drop_csum_range(struct scrub_ctx *sctx, struct btrfs_ordered_sum *sum) { sctx->stat.csum_discards += sum->len >> sctx->fs_info->sectorsize_bits; diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h index 7d1982893363..1fa4d26e8122 100644 --- a/fs/btrfs/scrub.h +++ b/fs/btrfs/scrub.h @@ -15,17 +15,7 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid, /* Temporary declaration, would be deleted later. */ struct scrub_ctx; -struct scrub_sector; struct scrub_block; int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum); -int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx, - struct scrub_sector *sector); -void scrub_sector_get(struct scrub_sector *sector); -struct scrub_sector *alloc_scrub_sector(struct scrub_block *sblock, u64 logical); -struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx, - struct btrfs_device *dev, - u64 logical, u64 physical, - u64 physical_for_dev_replace, - int mirror_num); #endif From patchwork Wed Mar 29 07:09:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13191994 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6CE64C77B61 for ; Wed, 29 Mar 2023 07:10:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229862AbjC2HKV (ORCPT ); Wed, 29 Mar 2023 03:10:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43854 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229853AbjC2HKS (ORCPT ); Wed, 29 Mar 2023 03:10:18 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD5E7268B for ; Wed, 29 Mar 2023 00:10:15 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 9C76E219B3 for ; Wed, 29 Mar 2023 07:10:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1680073814; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EQeat43fykj0L88YzNFyJmsSk1cYRdfMjY7IK1qUh30=; b=EqRlrrpaTvfGCNlMuhxW+b417vr02Pmj7xgjp022xggnghq6TV5XJRyE/VXWlhXQwgxQAg Tygx6QybG4SKIx3lnVy/L/a4OBlqpVVJs1aBpxX35gVdJ74aRvbMHSet1QIXmgps/2mlLB ODM7y96W5+LfIEe+6sndeNAv+HM0Nxs= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 0B2D7138FF for ; Wed, 29 Mar 2023 07:10:13 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id EGfnMlXkI2T4NQAAMHmgww (envelope-from ) for ; Wed, 29 Mar 2023 07:10:13 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 6/6] btrfs: scrub: remove scrub_bio structure Date: Wed, 29 Mar 2023 15:09:50 +0800 Message-Id: <9fc275192811e1998f8b06764f89ab03d1ab5645.1680073696.git.wqu@suse.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Since scrub path has been fully moved to scrub_stripe based facilities, no more scrub_bio would be submitted. Thus we can remove it completely, this involves: - SCRUB_SECTORS_PER_BIO macro - SCRUB_BIOS_PER_SCTX macro - SCRUB_MAX_PAGES macro - BTRFS_MAX_MIRRORS macro - scrub_bio structure - scrub_ctx::bios member - scrub_ctx::curr member - scrub_ctx::bios_in_flight member - scrub_ctx::workers_pending member - scrub_ctx::list_lock member - scrub_ctx::list_wait member - function scrub_bio_end_io_worker() - function scrub_pending_bio_inc() - function scrub_pending_bio_dec() - function scrub_throttle() - function scrub_submit() - function scrub_find_csum() - function drop_csum_range() - Some unnecessary flush and scrub pauses Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 245 ++--------------------------------------------- fs/btrfs/scrub.h | 5 - 2 files changed, 6 insertions(+), 244 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 8c44a2fa0a19..ddd6c432f607 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -41,14 +41,10 @@ struct scrub_ctx; /* - * The following three values only influence the performance. + * The following value only influence the performance. * - * The last one configures the number of parallel and outstanding I/O - * operations. The first one configures an upper limit for the number - * of (dynamically allocated) pages that are added to a bio. + * This determines the batch size for stripe submitted in one go. */ -#define SCRUB_SECTORS_PER_BIO 32 /* 128KiB per bio for 4KiB pages */ -#define SCRUB_BIOS_PER_SCTX 64 /* 8MiB per device in flight for 4KiB pages */ #define SCRUB_STRIPES_PER_SCTX 8 /* That would be 8 64K stripe per-device. */ /* @@ -57,19 +53,6 @@ struct scrub_ctx; */ #define SCRUB_MAX_SECTORS_PER_BLOCK (BTRFS_MAX_METADATA_BLOCKSIZE / SZ_4K) -#define SCRUB_MAX_PAGES (DIV_ROUND_UP(BTRFS_MAX_METADATA_BLOCKSIZE, PAGE_SIZE)) - -/* - * Maximum number of mirrors that can be available for all profiles counting - * the target device of dev-replace as one. During an active device replace - * procedure, the target device of the copy operation is a mirror for the - * filesystem data as well that can be used to read data in order to repair - * read errors on other disks. - * - * Current value is derived from RAID1C4 with 4 copies. - */ -#define BTRFS_MAX_MIRRORS (4 + 1) - /* Represent one sector and its needed info to verify the content. */ struct scrub_sector_verification { bool is_metadata; @@ -182,31 +165,12 @@ struct scrub_stripe { struct work_struct work; }; -struct scrub_bio { - int index; - struct scrub_ctx *sctx; - struct btrfs_device *dev; - struct bio *bio; - blk_status_t status; - u64 logical; - u64 physical; - int sector_count; - int next_free; - struct work_struct work; -}; - struct scrub_ctx { - struct scrub_bio *bios[SCRUB_BIOS_PER_SCTX]; struct scrub_stripe stripes[SCRUB_STRIPES_PER_SCTX]; struct scrub_stripe *raid56_data_stripes; struct btrfs_fs_info *fs_info; int first_free; - int curr; int cur_stripe; - atomic_t bios_in_flight; - atomic_t workers_pending; - spinlock_t list_lock; - wait_queue_head_t list_wait; struct list_head csum_list; atomic_t cancel_req; int readonly; @@ -305,22 +269,8 @@ static void wait_scrub_stripe_io(struct scrub_stripe *stripe) wait_event(stripe->io_wait, atomic_read(&stripe->pending_io) == 0); } -static void scrub_bio_end_io_worker(struct work_struct *work); static void scrub_put_ctx(struct scrub_ctx *sctx); -static void scrub_pending_bio_inc(struct scrub_ctx *sctx) -{ - refcount_inc(&sctx->refs); - atomic_inc(&sctx->bios_in_flight); -} - -static void scrub_pending_bio_dec(struct scrub_ctx *sctx) -{ - atomic_dec(&sctx->bios_in_flight); - wake_up(&sctx->list_wait); - scrub_put_ctx(sctx); -} - static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info) { while (atomic_read(&fs_info->scrub_pause_req)) { @@ -371,21 +321,6 @@ static noinline_for_stack void scrub_free_ctx(struct scrub_ctx *sctx) if (!sctx) return; - /* this can happen when scrub is cancelled */ - if (sctx->curr != -1) { - struct scrub_bio *sbio = sctx->bios[sctx->curr]; - - bio_put(sbio->bio); - } - - for (i = 0; i < SCRUB_BIOS_PER_SCTX; ++i) { - struct scrub_bio *sbio = sctx->bios[i]; - - if (!sbio) - break; - kfree(sbio); - } - for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++) release_scrub_stripe(&sctx->stripes[i]); @@ -410,28 +345,8 @@ static noinline_for_stack struct scrub_ctx *scrub_setup_ctx( goto nomem; refcount_set(&sctx->refs, 1); sctx->is_dev_replace = is_dev_replace; - sctx->sectors_per_bio = SCRUB_SECTORS_PER_BIO; - sctx->curr = -1; sctx->fs_info = fs_info; INIT_LIST_HEAD(&sctx->csum_list); - for (i = 0; i < SCRUB_BIOS_PER_SCTX; ++i) { - struct scrub_bio *sbio; - - sbio = kzalloc(sizeof(*sbio), GFP_KERNEL); - if (!sbio) - goto nomem; - sctx->bios[i] = sbio; - - sbio->index = i; - sbio->sctx = sctx; - sbio->sector_count = 0; - INIT_WORK(&sbio->work, scrub_bio_end_io_worker); - - if (i != SCRUB_BIOS_PER_SCTX - 1) - sctx->bios[i]->next_free = i + 1; - else - sctx->bios[i]->next_free = -1; - } for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++) { int ret; @@ -441,13 +356,9 @@ static noinline_for_stack struct scrub_ctx *scrub_setup_ctx( sctx->stripes[i].sctx = sctx; } sctx->first_free = 0; - atomic_set(&sctx->bios_in_flight, 0); - atomic_set(&sctx->workers_pending, 0); atomic_set(&sctx->cancel_req, 0); - spin_lock_init(&sctx->list_lock); spin_lock_init(&sctx->stat_lock); - init_waitqueue_head(&sctx->list_wait); sctx->throttle_deadline = 0; mutex_init(&sctx->wr_lock); @@ -1309,6 +1220,10 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, } } +/* + * Throttling of IO submission, bandwidth-limit based, the timeslice is 1 + * second. Limit can be set via /sys/fs/UUID/devinfo/devid/scrub_speed_max. + */ static void scrub_throttle_dev_io(struct scrub_ctx *sctx, struct btrfs_device *device, unsigned int bio_size) @@ -1362,112 +1277,6 @@ static void scrub_throttle_dev_io(struct scrub_ctx *sctx, sctx->throttle_deadline = 0; } -/* - * Throttling of IO submission, bandwidth-limit based, the timeslice is 1 - * second. Limit can be set via /sys/fs/UUID/devinfo/devid/scrub_speed_max. - */ -static void scrub_throttle(struct scrub_ctx *sctx) -{ - struct scrub_bio *sbio = sctx->bios[sctx->curr]; - - scrub_throttle_dev_io(sctx, sbio->dev, sbio->bio->bi_iter.bi_size); -} - -static void scrub_submit(struct scrub_ctx *sctx) -{ - struct scrub_bio *sbio; - - if (sctx->curr == -1) - return; - - scrub_throttle(sctx); - - sbio = sctx->bios[sctx->curr]; - sctx->curr = -1; - scrub_pending_bio_inc(sctx); - btrfsic_check_bio(sbio->bio); - submit_bio(sbio->bio); -} - -static void scrub_bio_end_io_worker(struct work_struct *work) -{ - struct scrub_bio *sbio = container_of(work, struct scrub_bio, work); - struct scrub_ctx *sctx = sbio->sctx; - - ASSERT(sbio->sector_count <= SCRUB_SECTORS_PER_BIO); - - bio_put(sbio->bio); - sbio->bio = NULL; - spin_lock(&sctx->list_lock); - sbio->next_free = sctx->first_free; - sctx->first_free = sbio->index; - spin_unlock(&sctx->list_lock); - - scrub_pending_bio_dec(sctx); -} - -static void drop_csum_range(struct scrub_ctx *sctx, struct btrfs_ordered_sum *sum) -{ - sctx->stat.csum_discards += sum->len >> sctx->fs_info->sectorsize_bits; - list_del(&sum->list); - kfree(sum); -} - -/* - * Find the desired csum for range [logical, logical + sectorsize), and store - * the csum into @csum. - * - * The search source is sctx->csum_list, which is a pre-populated list - * storing bytenr ordered csum ranges. We're responsible to cleanup any range - * that is before @logical. - * - * Return 0 if there is no csum for the range. - * Return 1 if there is csum for the range and copied to @csum. - */ -int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum) -{ - bool found = false; - - while (!list_empty(&sctx->csum_list)) { - struct btrfs_ordered_sum *sum = NULL; - unsigned long index; - unsigned long num_sectors; - - sum = list_first_entry(&sctx->csum_list, - struct btrfs_ordered_sum, list); - /* The current csum range is beyond our range, no csum found */ - if (sum->bytenr > logical) - break; - - /* - * The current sum is before our bytenr, since scrub is always - * done in bytenr order, the csum will never be used anymore, - * clean it up so that later calls won't bother with the range, - * and continue search the next range. - */ - if (sum->bytenr + sum->len <= logical) { - drop_csum_range(sctx, sum); - continue; - } - - /* Now the csum range covers our bytenr, copy the csum */ - found = true; - index = (logical - sum->bytenr) >> sctx->fs_info->sectorsize_bits; - num_sectors = sum->len >> sctx->fs_info->sectorsize_bits; - - memcpy(csum, sum->sums + index * sctx->fs_info->csum_size, - sctx->fs_info->csum_size); - - /* Cleanup the range if we're at the end of the csum range */ - if (index == num_sectors - 1) - drop_csum_range(sctx, sum); - break; - } - if (!found) - return 0; - return 1; -} - /* * Given a physical address, this will calculate it's * logical offset. if this is a parity stripe, it will return @@ -1648,8 +1457,6 @@ static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical, if (!btrfs_is_zoned(fs_info)) return 0; - wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); - mutex_lock(&sctx->wr_lock); if (sctx->write_pointer < physical_end) { ret = btrfs_sync_zone_write_pointer(sctx->wr_tgtdev, logical, @@ -2185,11 +1992,6 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx, /* Paused? */ if (atomic_read(&fs_info->scrub_pause_req)) { /* Push queued extents */ - scrub_submit(sctx); - mutex_lock(&sctx->wr_lock); - mutex_unlock(&sctx->wr_lock); - wait_event(sctx->list_wait, - atomic_read(&sctx->bios_in_flight) == 0); scrub_blocked_if_needed(fs_info); } /* Block group removed? */ @@ -2317,8 +2119,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, u64 stripe_logical; int stop_loop = 0; - wait_event(sctx->list_wait, - atomic_read(&sctx->bios_in_flight) == 0); scrub_blocked_if_needed(fs_info); if (sctx->is_dev_replace && @@ -2433,8 +2233,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, break; } out: - /* push queued extents */ - scrub_submit(sctx); flush_scrub_stripes(sctx); if (sctx->raid56_data_stripes) { for (int i = 0; i < nr_data_stripes(map); i++) @@ -2759,34 +2557,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, ret = scrub_chunk(sctx, cache, scrub_dev, found_key.offset, dev_extent_len); - - /* - * flush, submit all pending read and write bios, afterwards - * wait for them. - * Note that in the dev replace case, a read request causes - * write requests that are submitted in the read completion - * worker. Therefore in the current situation, it is required - * that all write requests are flushed, so that all read and - * write requests are really completed when bios_in_flight - * changes to 0. - */ - scrub_submit(sctx); - - wait_event(sctx->list_wait, - atomic_read(&sctx->bios_in_flight) == 0); - - scrub_pause_on(fs_info); - - /* - * must be called before we decrease @scrub_paused. - * make sure we don't block transaction commit while - * we are waiting pending workers finished. - */ - wait_event(sctx->list_wait, - atomic_read(&sctx->workers_pending) == 0); - - scrub_pause_off(fs_info); - if (sctx->is_dev_replace && !btrfs_finish_block_group_to_copy(dev_replace->srcdev, cache, found_key.offset)) @@ -3111,12 +2881,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, ret = scrub_enumerate_chunks(sctx, dev, start, end); memalloc_nofs_restore(nofs_flag); - wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); atomic_dec(&fs_info->scrubs_running); wake_up(&fs_info->scrub_pause_wait); - wait_event(sctx->list_wait, atomic_read(&sctx->workers_pending) == 0); - if (progress) memcpy(progress, &sctx->stat, sizeof(*progress)); diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h index 1fa4d26e8122..7639103ebf9d 100644 --- a/fs/btrfs/scrub.h +++ b/fs/btrfs/scrub.h @@ -13,9 +13,4 @@ int btrfs_scrub_cancel_dev(struct btrfs_device *dev); int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid, struct btrfs_scrub_progress *progress); -/* Temporary declaration, would be deleted later. */ -struct scrub_ctx; -struct scrub_block; -int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum); - #endif