From patchwork Wed Mar 2 08:44:07 2022
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12765651
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 4/5] btrfs: scrub: introduce scrub_block::pages for more efficient memory usage for subpage
Date: Wed, 2 Mar 2022 16:44:07 +0800
Message-Id: <126d66711c933ca79b19352b79c5bdbea44dd08f.1646210538.git.wqu@suse.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To:
References:
X-Mailing-List: linux-btrfs@vger.kernel.org

[BACKGROUND]
Currently scrub allocates one page for each sector. This is fine when
PAGE_SIZE == sectorsize, but wastes extra memory for subpage support,
where a full page backs every single sector.

[CODE CHANGE]
This patch makes scrub_block contain all the pages, so if we're
scrubbing an extent sized 64K and our page size is also 64K, we only
need to allocate one page.

[LIFESPAN CHANGE]
Since scrub_sector no longer holds a page but uses scrub_block::pages[]
instead, we have to ensure scrub_block has a long enough lifespan for
the write bio (the lifespan for the read bio is already long enough).
Now scrub_block will only be released after the write bio has finished.

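To make the mapping concrete, here is a minimal illustrative sketch (not
part of the patch itself; the sketch_* helper names are made up). It
mirrors the page_index calculation done in alloc_scrub_sector() below:
each sector finds its backing page by its offset from the block's
logical bytenr, so with a 64K page size all the sectors of a 64K block
share pages[0] instead of each holding a private page.

	/*
	 * Illustration only: locate the scrub_block::pages[] slot and the
	 * in-page offset for a sector at @logical, assuming the scrub_block
	 * layout introduced by this patch.
	 */
	static int sketch_page_index(const struct scrub_block *sblock, u64 logical)
	{
		return (logical - sblock->logical) >> PAGE_SHIFT;
	}

	static unsigned long sketch_page_offset(u64 logical)
	{
		return offset_in_page(logical);
	}
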
[COMING NEXT]
For now we have only added scrub_block::pages[] for this purpose;
scrub_sector still uses the old scrub_sector::page. The switch will
happen in the next patch.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 135 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 112 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index bab717e80918..7edf063baee6 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -54,6 +54,8 @@ struct scrub_ctx;
  */
 #define SCRUB_MAX_SECTORS_PER_BLOCK (BTRFS_MAX_METADATA_BLOCKSIZE / SZ_4K)
 
+#define SCRUB_MAX_PAGES (DIV_ROUND_UP(BTRFS_MAX_METADATA_BLOCKSIZE, PAGE_SIZE))
+
 struct scrub_recover {
 	refcount_t refs;
 	struct btrfs_io_context *bioc;
@@ -94,8 +96,16 @@ struct scrub_bio {
 };
 
 struct scrub_block {
+	/*
+	 * Each page will have its page::private used to record the logical
+	 * bytenr.
+	 */
+	struct page *pages[SCRUB_MAX_PAGES];
 	struct scrub_sector *sectorv[SCRUB_MAX_SECTORS_PER_BLOCK];
+	u64 logical; /* Logical bytenr of the sblock */
+	u32 len; /* The length of sblock in bytes */
 	int sector_count;
+	atomic_t outstanding_sectors;
 	refcount_t refs; /* free mem on transition to zero */
 	struct scrub_ctx *sctx;
@@ -204,7 +214,46 @@ struct full_stripe_lock {
 	struct mutex mutex;
 };
 
-static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx)
+#ifndef CONFIG_64BIT
+/* This structure is for architectures whose (void *) is smaller than u64 */
+struct scrub_page_private {
+	u64 logical;
+};
+#endif
+
+static int attach_scrub_page_private(struct page *page, u64 logical)
+{
+#ifdef CONFIG_64BIT
+	attach_page_private(page, (void *)logical);
+	return 0;
+#else
+	struct scrub_page_private *spp;
+
+	spp = kmalloc(sizeof(*spp), GFP_KERNEL);
+	if (!spp)
+		return -ENOMEM;
+	spp->logical = logical;
+	attach_page_private(page, (void *)spp);
+	return 0;
+#endif
+}
+
+static void detach_scrub_page_private(struct page *page)
+{
+#ifdef CONFIG_64BIT
+	detach_page_private(page);
+	return;
+#else
+	struct scrub_page_private *spp;
+
+	spp = detach_page_private(page);
+	kfree(spp);
+	return;
+#endif
+}
+
+static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx,
+					     u64 logical)
 {
 	struct scrub_block *sblock;
 
@@ -213,27 +262,54 @@ static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx)
 		return NULL;
 	refcount_set(&sblock->refs, 1);
 	sblock->sctx = sctx;
+	sblock->logical = logical;
 	sblock->no_io_error_seen = 1;
+	/*
+	 * Scrub_block::pages will be allocated at alloc_scrub_sector() when
+	 * the corresponding page is not allocated.
+	 */
 	return sblock;
 }
 
-/* Allocate a new scrub sector and attach it to @sblock */
+/*
+ * Allocate a new scrub sector and attach it to @sblock.
+ *
+ * Will also allocate new pages for @sblock if needed.
+ */
 static struct scrub_sector *alloc_scrub_sector(struct scrub_block *sblock,
-					       gfp_t gfp)
+					       u64 logical, gfp_t gfp)
 {
+	const int page_index = (logical - sblock->logical) >> PAGE_SHIFT;
 	struct scrub_sector *ssector;
 
 	ssector = kzalloc(sizeof(*ssector), gfp);
 	if (!ssector)
 		return NULL;
-	ssector->page = alloc_page(gfp);
-	if (!ssector->page) {
-		kfree(ssector);
-		return NULL;
+
+	/* Allocate a new page if the slot is not allocated */
+	if (!sblock->pages[page_index]) {
+		int ret;
+
+		sblock->pages[page_index] = alloc_page(gfp);
+		if (!sblock->pages[page_index]) {
+			kfree(ssector);
+			return NULL;
+		}
+		ret = attach_scrub_page_private(sblock->pages[page_index],
+				sblock->logical + (page_index << PAGE_SHIFT));
+		if (ret < 0) {
+			kfree(ssector);
+			__free_page(sblock->pages[page_index]);
+			sblock->pages[page_index] = NULL;
+			return NULL;
+		}
 	}
+
 	atomic_set(&ssector->refs, 1);
 	ssector->sblock = sblock;
-	/* This sector to be added should not be used */
+	ssector->logical = logical;
+
+	/* The slot to be added should not be used */
 	ASSERT(sblock->sectorv[sblock->sector_count] == NULL);
 	/* And the sector count should be smaller than the limit */
 	ASSERT(sblock->sector_count < SCRUB_MAX_SECTORS_PER_BLOCK);
@@ -955,7 +1031,8 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
		 * But alloc_scrub_block() will initialize sblock::ref anyway,
		 * so we can use scrub_block_put() to clean them up.
		 */
-		sblocks_for_recheck[mirror_index] = alloc_scrub_block(sctx);
+		sblocks_for_recheck[mirror_index] = alloc_scrub_block(sctx,
+								      logical);
 		if (!sblocks_for_recheck[mirror_index]) {
 			spin_lock(&sctx->stat_lock);
 			sctx->stat.malloc_errors++;
@@ -1363,7 +1440,7 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 			sblock = sblocks_for_recheck[mirror_index];
 			sblock->sctx = sctx;
 
-			ssector = alloc_scrub_sector(sblock, GFP_NOFS);
+			ssector = alloc_scrub_sector(sblock, logical, GFP_NOFS);
 			if (!ssector) {
 				spin_lock(&sctx->stat_lock);
 				sctx->stat.malloc_errors++;
@@ -1373,7 +1450,6 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 			}
 			ssector->flags = flags;
 			ssector->generation = generation;
-			ssector->logical = logical;
 			ssector->have_csum = have_csum;
 			if (have_csum)
 				memcpy(ssector->csum,
@@ -1669,6 +1745,11 @@ static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical)
 	return ret;
 }
 
+static void scrub_block_get(struct scrub_block *sblock)
+{
+	refcount_inc(&sblock->refs);
+}
+
 static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx,
 				      struct scrub_sector *ssector)
 {
@@ -1736,6 +1817,13 @@ static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx,
 	sbio->sectorv[sbio->sector_count] = ssector;
 	scrub_sector_get(ssector);
 
+	/*
+	 * Since ssector no longer holds a page, but uses sblock::pages, we
+	 * have to ensure the sblock didn't get freed before our write bio
+	 * finished.
+	 */
+	scrub_block_get(ssector->sblock);
+
 	sbio->sector_count++;
 	if (sbio->sector_count == sctx->sectors_per_bio)
 		scrub_wr_submit(sctx);
@@ -1797,8 +1885,10 @@ static void scrub_wr_bio_end_io_worker(struct btrfs_work *work)
 		}
 	}
 
-	for (i = 0; i < sbio->sector_count; i++)
+	for (i = 0; i < sbio->sector_count; i++) {
+		scrub_block_put(sbio->sectorv[i]->sblock);
 		scrub_sector_put(sbio->sectorv[i]);
+	}
 
 	bio_put(sbio->bio);
 	kfree(sbio);
@@ -1989,11 +2079,6 @@ static int scrub_checksum_super(struct scrub_block *sblock)
 	return fail_cor + fail_gen;
 }
 
-static void scrub_block_get(struct scrub_block *sblock)
-{
-	refcount_inc(&sblock->refs);
-}
-
 static void scrub_block_put(struct scrub_block *sblock)
 {
 	if (refcount_dec_and_test(&sblock->refs)) {
@@ -2004,6 +2089,12 @@ static void scrub_block_put(struct scrub_block *sblock)
 		for (i = 0; i < sblock->sector_count; i++)
 			scrub_sector_put(sblock->sectorv[i]);
 
+		for (i = 0; i < DIV_ROUND_UP(sblock->len, PAGE_SIZE); i++) {
+			if (sblock->pages[i]) {
+				detach_scrub_page_private(sblock->pages[i]);
+				__free_page(sblock->pages[i]);
+			}
+		}
 		kfree(sblock);
 	}
 }
@@ -2294,7 +2385,7 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 	const u32 sectorsize = sctx->fs_info->sectorsize;
 	int index;
 
-	sblock = alloc_scrub_block(sctx);
+	sblock = alloc_scrub_block(sctx, logical);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
@@ -2311,7 +2402,7 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 		 */
 		u32 l = min(sectorsize, len);
 
-		ssector = alloc_scrub_sector(sblock, GFP_KERNEL);
+		ssector = alloc_scrub_sector(sblock, logical, GFP_KERNEL);
 		if (!ssector) {
 			spin_lock(&sctx->stat_lock);
 			sctx->stat.malloc_errors++;
@@ -2322,7 +2413,6 @@ static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
 		ssector->dev = dev;
 		ssector->flags = flags;
 		ssector->generation = gen;
-		ssector->logical = logical;
 		ssector->physical = physical;
 		ssector->physical_for_dev_replace = physical_for_dev_replace;
 		ssector->mirror_num = mirror_num;
@@ -2632,7 +2722,7 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 
 	ASSERT(IS_ALIGNED(len, sectorsize));
 
-	sblock = alloc_scrub_block(sctx);
+	sblock = alloc_scrub_block(sctx, logical);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
@@ -2646,7 +2736,7 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 	for (index = 0; len > 0; index++) {
 		struct scrub_sector *ssector;
 
-		ssector = alloc_scrub_sector(sblock, GFP_KERNEL);
+		ssector = alloc_scrub_sector(sblock, logical, GFP_KERNEL);
 		if (!ssector) {
 			spin_lock(&sctx->stat_lock);
 			sctx->stat.malloc_errors++;
@@ -2661,7 +2751,6 @@ static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 		ssector->dev = dev;
 		ssector->flags = flags;
 		ssector->generation = gen;
-		ssector->logical = logical;
 		ssector->physical = physical;
 		ssector->mirror_num = mirror_num;
 		if (csum) {
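
As a closing illustration (not part of the diff above; a standalone
userspace sketch with the 64K page size and 4K sectorsize chosen as an
example), this quantifies the memory needed to back a single 64K block
under the old one-page-per-sector scheme versus the scrub_block::pages[]
scheme introduced here:

	#include <stdio.h>

	/*
	 * Illustration only: pages allocated to back one 64K scrub_block,
	 * old scheme (one page per sector) vs. new scheme (pages shared
	 * through scrub_block::pages[]).
	 */
	int main(void)
	{
		const unsigned long page_size = 64 * 1024;	/* example 64K page */
		const unsigned long sectorsize = 4 * 1024;	/* example 4K sectorsize */
		const unsigned long block_len = 64 * 1024;	/* one 64K block */

		unsigned long sectors = block_len / sectorsize;
		unsigned long new_pages = (block_len + page_size - 1) / page_size;

		printf("old scheme: %lu pages, %lu KiB\n",
		       sectors, sectors * page_size / 1024);
		printf("new scheme: %lu page(s), %lu KiB\n",
		       new_pages, new_pages * page_size / 1024);
		return 0;
	}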