From patchwork Fri Dec 30 22:17:19 2022
From: "Darrick J. Wong"
Subject: [PATCH 1/5] xfs: force all buffers to be written during btree bulk load
To: cem@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Fri, 30 Dec 2022 14:17:19 -0800
Message-ID: <167243863918.707598.10220298459269094461.stgit@magnolia>
In-Reply-To: <167243863904.707598.12385476439101029022.stgit@magnolia>
References: <167243863904.707598.12385476439101029022.stgit@magnolia>

From: Darrick J. Wong

While stress-testing online repair of btrees, I noticed periodic
assertion failures from the buffer cache about buffer readers
encountering buffers with DELWRI_Q set, even though the btree bulk load
had already committed and the buffer itself wasn't on any delwri list.

I traced this to a misunderstanding of how the delwri lists work,
particularly with regard to the AIL's buffer list.  If a buffer is
logged and committed, the buffer can end up on that AIL buffer list.
If btree repairs are run twice in rapid succession, it's possible that
the first repair will invalidate the buffer and free it before the next
time the AIL wakes up.  This clears DELWRI_Q from the buffer state.

If the second repair allocates the same block, it will then recycle the
buffer to start writing the new btree block.  Meanwhile, if the AIL
wakes up and walks the buffer list, it will ignore the buffer because
it can't lock it, and go back to sleep.

When the second repair calls delwri_queue to put the buffer on the list
of buffers to write before committing the new btree, it will set
DELWRI_Q again, but since the buffer hasn't been removed from the AIL's
buffer list, the buffer won't be added to the bulkload buffer list.

This is incorrect, because the bulkload caller relies on delwri_submit
to ensure that all the buffers have been sent to disk /before/
committing the new btree root pointer.  This ordering is required for
data consistency.

Worse, the AIL won't clear DELWRI_Q from the buffer when it does
finally drop it, so the next thread to walk through the btree will trip
over a debug assertion on that flag.

To fix this, create a new function that waits for the buffer to be
removed from any other delwri lists before adding the buffer to the
caller's delwri list.

Signed-off-by: Darrick J. Wong
---
 libxfs/libxfs_io.h         |   11 +++++++++++
 libxfs/xfs_btree_staging.c |    4 +---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h
index fae86427201..4ffe788d446 100644
--- a/libxfs/libxfs_io.h
+++ b/libxfs/libxfs_io.h
@@ -243,6 +243,17 @@ xfs_buf_delwri_queue(struct xfs_buf *bp, struct list_head *buffer_list)
 	return true;
 }
 
+static inline void
+xfs_buf_delwri_queue_here(struct xfs_buf *bp, struct list_head *buffer_list)
+{
+	ASSERT(list_empty(&bp->b_list));
+
+	/* This buffer is uptodate; don't let it get reread. */
+	libxfs_buf_mark_dirty(bp);
+
+	xfs_buf_delwri_queue(bp, buffer_list);
+}
+
 int xfs_buf_delwri_submit(struct list_head *buffer_list);
 void xfs_buf_delwri_cancel(struct list_head *list);
 
diff --git a/libxfs/xfs_btree_staging.c b/libxfs/xfs_btree_staging.c
index a6a90791668..baf7f422603 100644
--- a/libxfs/xfs_btree_staging.c
+++ b/libxfs/xfs_btree_staging.c
@@ -342,9 +342,7 @@ xfs_btree_bload_drop_buf(
 	if (*bpp == NULL)
 		return;
 
-	if (!xfs_buf_delwri_queue(*bpp, buffers_list))
-		ASSERT(0);
-
+	xfs_buf_delwri_queue_here(*bpp, buffers_list);
 	xfs_buf_relse(*bpp);
 	*bpp = NULL;
 }
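The userspace helper above only asserts that b_list is empty, since
xfsprogs has no AIL that can race with the bulk loader.  The waiting
behavior described in the commit message applies to the kernel side; a
minimal sketch of that variant, assuming a delay()-style backoff helper
and not quoting the actual kernel code, might look like this:

/*
 * Sketch of the kernel-side variant: wait for the buffer to drop off
 * any other delwri list (e.g. the AIL's) before queuing it on the
 * caller's list.  Illustrative only; the real function may differ.
 */
void
xfs_buf_delwri_queue_here(
	struct xfs_buf		*bp,
	struct list_head	*buffer_list)
{
	/*
	 * The buffer must end up on the /caller's/ delwri list, or the
	 * subsequent delwri_submit cannot guarantee that every new btree
	 * block reaches disk before the new root is committed.
	 */
	while (!list_empty(&bp->b_list)) {
		xfs_buf_unlock(bp);
		delay(1);		/* assumed backoff helper */
		xfs_buf_lock(bp);
	}

	ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));

	xfs_buf_delwri_queue(bp, buffer_list);
}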
Wong" X-Patchwork-Id: 13085045 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 623DEC4332F for ; Sat, 31 Dec 2022 00:05:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235714AbiLaAFc (ORCPT ); Fri, 30 Dec 2022 19:05:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53388 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235435AbiLaAFb (ORCPT ); Fri, 30 Dec 2022 19:05:31 -0500 Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 32EEB14D14 for ; Fri, 30 Dec 2022 16:05:30 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id 5A87BCE19FC for ; Sat, 31 Dec 2022 00:05:29 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 96034C433D2; Sat, 31 Dec 2022 00:05:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445127; bh=XH86zN/DxT8HwXBgzspYADprt3aoDvAE4xBe2RvWEHA=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=HMoy6wexTZY2qlh0NhmcJLKp3qfOq+t/CkmXwKoGgFref4o/vBiLKo6jN/C3ri/sf BP/aMu7rkJm/x/lQgQ3QtrBUkqGMc9OqaWk2XQabAvj4RWolvi/thejE1ZBs4ry8a4 rocuDJL8FzaGtIpUW5JwMa9siOIacJS8hDP6AL90HInbtf8ISZ7CwEBPNobMuTnOeT 4gaCijYfxxPGR1APliGNoYWiLs5Es51disk1SpPdsARpGjCf3N3PGPviNCOwHCDgpG bwGos7zteBaPbo01bqqv1x1sMFIpKHzD44IcvpHZZVjjSGZzd2DDHWhGYyut7QvoZ6 rPIQHD4/hTaEg== Subject: [PATCH 2/5] xfs: implement block reservation accounting for btrees we're staging From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:19 -0800 Message-ID: <167243863930.707598.16411891612752740522.stgit@magnolia> In-Reply-To: <167243863904.707598.12385476439101029022.stgit@magnolia> References: <167243863904.707598.12385476439101029022.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Create a new xrep_newbt structure to encapsulate a fake root for creating a staged btree cursor as well as to track all the blocks that we need to reserve in order to build that btree. Signed-off-by: Darrick J. Wong --- libxfs/xfs_btree_staging.h | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/libxfs/xfs_btree_staging.h b/libxfs/xfs_btree_staging.h index f0d2976050a..d6dea3f0088 100644 --- a/libxfs/xfs_btree_staging.h +++ b/libxfs/xfs_btree_staging.h @@ -38,11 +38,8 @@ struct xbtree_ifakeroot { /* Number of bytes available for this fork in the inode. */ unsigned int if_fork_size; - /* Fork format. */ - unsigned int if_format; - - /* Number of records. */ - unsigned int if_extents; + /* Which fork is this btree being built for? */ + int if_whichfork; }; /* Cursor interactions with fake roots for inode-rooted btrees. */ From patchwork Fri Dec 30 22:17:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13085046 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B28D1C4332F for ; Sat, 31 Dec 2022 00:05:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235774AbiLaAFs (ORCPT ); Fri, 30 Dec 2022 19:05:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53574 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235435AbiLaAFr (ORCPT ); Fri, 30 Dec 2022 19:05:47 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D33F76157 for ; Fri, 30 Dec 2022 16:05:45 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 7CFB7B81DEC for ; Sat, 31 Dec 2022 00:05:44 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2CEB4C433EF; Sat, 31 Dec 2022 00:05:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445143; bh=lHuuOcQAfX93rEEltlP81qFlydyfE7z0taV6SqYbwAY=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=fj3geoTZFnG1uXMheTFCFjpEp7FNCSfJtZunqxtibkXYmZcdiFE95CSuNnm4O8NOO GWwl/al/nC+Pe05hqICdeUPzd6A/ATqmueNtX1GIhp5Y2W8vvOZlCEKBVUUJtAk6fD UTvqKhN029MigH7mNspVaawSMpDbcVXAKnDKPL22oZSw0rRwJmLYX5RJiY71t7LbJ1 ddjKENvAVBUfjv+mVCSyEYWZ/D+tEKRIXb9ogJiUHyD3ifdd62GQ+xHpvkeZAIkqbO TnqwEA2lwIuLOnq8PgGP+wqdUlRlkVNLLmfq09kEGOQoLdQQHfI5Pg3nv2fvZjtFzm yniJDG1wkxZ7w== Subject: [PATCH 3/5] xfs: move btree bulkload record initialization to ->get_record implementations From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:19 -0800 Message-ID: <167243863943.707598.17620950198542269061.stgit@magnolia> In-Reply-To: <167243863904.707598.12385476439101029022.stgit@magnolia> References: <167243863904.707598.12385476439101029022.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong When we're performing a bulk load of a btree, move the code that actually stores the btree record in the new btree block out of the generic code and into the individual ->get_record implementations. This is preparation for being able to store multiple records with a single indirect call. Signed-off-by: Darrick J. 
Wong --- libxfs/libxfs_api_defs.h | 1 + libxfs/xfs_btree_staging.c | 17 ++++++------- libxfs/xfs_btree_staging.h | 15 ++++++++---- repair/agbtree.c | 56 +++++++++++++++++++++++++++++++++----------- 4 files changed, 60 insertions(+), 29 deletions(-) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index f8efcce777b..5aa9c019d40 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -50,6 +50,7 @@ #define xfs_btree_bload_compute_geometry libxfs_btree_bload_compute_geometry #define xfs_btree_del_cursor libxfs_btree_del_cursor #define xfs_btree_init_block libxfs_btree_init_block +#define xfs_btree_rec_addr libxfs_btree_rec_addr #define xfs_buf_delwri_submit libxfs_buf_delwri_submit #define xfs_buf_get libxfs_buf_get #define xfs_buf_get_uncached libxfs_buf_get_uncached diff --git a/libxfs/xfs_btree_staging.c b/libxfs/xfs_btree_staging.c index baf7f422603..97fade90622 100644 --- a/libxfs/xfs_btree_staging.c +++ b/libxfs/xfs_btree_staging.c @@ -434,22 +434,19 @@ STATIC int xfs_btree_bload_leaf( struct xfs_btree_cur *cur, unsigned int recs_this_block, - xfs_btree_bload_get_record_fn get_record, + xfs_btree_bload_get_records_fn get_records, struct xfs_btree_block *block, void *priv) { - unsigned int j; + unsigned int j = 1; int ret; /* Fill the leaf block with records. */ - for (j = 1; j <= recs_this_block; j++) { - union xfs_btree_rec *block_rec; - - ret = get_record(cur, priv); - if (ret) + while (j <= recs_this_block) { + ret = get_records(cur, j, block, recs_this_block - j + 1, priv); + if (ret < 0) return ret; - block_rec = xfs_btree_rec_addr(cur, j, block); - cur->bc_ops->init_rec_from_cur(cur, block_rec); + j += ret; } return 0; @@ -787,7 +784,7 @@ xfs_btree_bload( trace_xfs_btree_bload_block(cur, level, i, blocks, &ptr, nr_this_block); - ret = xfs_btree_bload_leaf(cur, nr_this_block, bbl->get_record, + ret = xfs_btree_bload_leaf(cur, nr_this_block, bbl->get_records, block, priv); if (ret) goto out; diff --git a/libxfs/xfs_btree_staging.h b/libxfs/xfs_btree_staging.h index d6dea3f0088..82a3a8ef0f1 100644 --- a/libxfs/xfs_btree_staging.h +++ b/libxfs/xfs_btree_staging.h @@ -50,7 +50,9 @@ void xfs_btree_commit_ifakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp, int whichfork, const struct xfs_btree_ops *ops); /* Bulk loading of staged btrees. */ -typedef int (*xfs_btree_bload_get_record_fn)(struct xfs_btree_cur *cur, void *priv); +typedef int (*xfs_btree_bload_get_records_fn)(struct xfs_btree_cur *cur, + unsigned int idx, struct xfs_btree_block *block, + unsigned int nr_wanted, void *priv); typedef int (*xfs_btree_bload_claim_block_fn)(struct xfs_btree_cur *cur, union xfs_btree_ptr *ptr, void *priv); typedef size_t (*xfs_btree_bload_iroot_size_fn)(struct xfs_btree_cur *cur, @@ -58,11 +60,14 @@ typedef size_t (*xfs_btree_bload_iroot_size_fn)(struct xfs_btree_cur *cur, struct xfs_btree_bload { /* - * This function will be called nr_records times to load records into - * the btree. The function does this by setting the cursor's bc_rec - * field in in-core format. Records must be returned in sort order. + * This function will be called to load @nr_wanted records into the + * btree. The implementation does this by setting the cursor's bc_rec + * field in in-core format and using init_rec_from_cur to set the + * records in the btree block. Records must be returned in sort order. + * The function must return the number of records loaded or the usual + * negative errno. 
*/ - xfs_btree_bload_get_record_fn get_record; + xfs_btree_bload_get_records_fn get_records; /* * This function will be called nr_blocks times to obtain a pointer diff --git a/repair/agbtree.c b/repair/agbtree.c index 0fd7ef5d351..d90cbcc2f28 100644 --- a/repair/agbtree.c +++ b/repair/agbtree.c @@ -209,18 +209,25 @@ get_bno_rec( /* Grab one bnobt record and put it in the btree cursor. */ static int -get_bnobt_record( +get_bnobt_records( struct xfs_btree_cur *cur, + unsigned int idx, + struct xfs_btree_block *block, + unsigned int nr_wanted, void *priv) { struct bt_rebuild *btr = priv; struct xfs_alloc_rec_incore *arec = &cur->bc_rec.a; + union xfs_btree_rec *block_rec; btr->bno_rec = get_bno_rec(cur, btr->bno_rec); arec->ar_startblock = btr->bno_rec->ex_startblock; arec->ar_blockcount = btr->bno_rec->ex_blockcount; btr->freeblks += btr->bno_rec->ex_blockcount; - return 0; + + block_rec = libxfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + return 1; } void @@ -247,10 +254,10 @@ init_freespace_cursors( btr_cnt->cur = libxfs_allocbt_stage_cursor(sc->mp, &btr_cnt->newbt.afake, pag, XFS_BTNUM_CNT); - btr_bno->bload.get_record = get_bnobt_record; + btr_bno->bload.get_records = get_bnobt_records; btr_bno->bload.claim_block = rebuild_claim_block; - btr_cnt->bload.get_record = get_bnobt_record; + btr_cnt->bload.get_records = get_bnobt_records; btr_cnt->bload.claim_block = rebuild_claim_block; /* @@ -371,13 +378,17 @@ get_ino_rec( /* Grab one inobt record. */ static int -get_inobt_record( +get_inobt_records( struct xfs_btree_cur *cur, + unsigned int idx, + struct xfs_btree_block *block, + unsigned int nr_wanted, void *priv) { struct bt_rebuild *btr = priv; struct xfs_inobt_rec_incore *irec = &cur->bc_rec.i; struct ino_tree_node *ino_rec; + union xfs_btree_rec *block_rec; int inocnt = 0; int finocnt = 0; int k; @@ -431,7 +442,10 @@ get_inobt_record( btr->first_agino = ino_rec->ino_startnum; btr->freecount += finocnt; btr->count += inocnt; - return 0; + + block_rec = libxfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + return 1; } /* Initialize both inode btree cursors as needed. */ @@ -490,7 +504,7 @@ init_ino_cursors( btr_ino->cur = libxfs_inobt_stage_cursor(sc->mp, &btr_ino->newbt.afake, pag, XFS_BTNUM_INO); - btr_ino->bload.get_record = get_inobt_record; + btr_ino->bload.get_records = get_inobt_records; btr_ino->bload.claim_block = rebuild_claim_block; btr_ino->first_agino = NULLAGINO; @@ -510,7 +524,7 @@ _("Unable to compute inode btree geometry, error %d.\n"), error); btr_fino->cur = libxfs_inobt_stage_cursor(sc->mp, &btr_fino->newbt.afake, pag, XFS_BTNUM_FINO); - btr_fino->bload.get_record = get_inobt_record; + btr_fino->bload.get_records = get_inobt_records; btr_fino->bload.claim_block = rebuild_claim_block; btr_fino->first_agino = NULLAGINO; @@ -560,16 +574,23 @@ _("Error %d while creating finobt btree for AG %u.\n"), error, agno); /* Grab one rmap record. */ static int -get_rmapbt_record( +get_rmapbt_records( struct xfs_btree_cur *cur, + unsigned int idx, + struct xfs_btree_block *block, + unsigned int nr_wanted, void *priv) { struct xfs_rmap_irec *rec; struct bt_rebuild *btr = priv; + union xfs_btree_rec *block_rec; rec = pop_slab_cursor(btr->slab_cursor); memcpy(&cur->bc_rec.r, rec, sizeof(struct xfs_rmap_irec)); - return 0; + + block_rec = libxfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + return 1; } /* Set up the rmap rebuild parameters. 
*/ @@ -589,7 +610,7 @@ init_rmapbt_cursor( init_rebuild(sc, &XFS_RMAP_OINFO_AG, free_space, btr); btr->cur = libxfs_rmapbt_stage_cursor(sc->mp, &btr->newbt.afake, pag); - btr->bload.get_record = get_rmapbt_record; + btr->bload.get_records = get_rmapbt_records; btr->bload.claim_block = rebuild_claim_block; /* Compute how many blocks we'll need. */ @@ -631,16 +652,23 @@ _("Error %d while creating rmap btree for AG %u.\n"), error, agno); /* Grab one refcount record. */ static int -get_refcountbt_record( +get_refcountbt_records( struct xfs_btree_cur *cur, + unsigned int idx, + struct xfs_btree_block *block, + unsigned int nr_wanted, void *priv) { struct xfs_refcount_irec *rec; struct bt_rebuild *btr = priv; + union xfs_btree_rec *block_rec; rec = pop_slab_cursor(btr->slab_cursor); memcpy(&cur->bc_rec.rc, rec, sizeof(struct xfs_refcount_irec)); - return 0; + + block_rec = libxfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + return 1; } /* Set up the refcount rebuild parameters. */ @@ -661,7 +689,7 @@ init_refc_cursor( btr->cur = libxfs_refcountbt_stage_cursor(sc->mp, &btr->newbt.afake, pag); - btr->bload.get_record = get_refcountbt_record; + btr->bload.get_records = get_refcountbt_records; btr->bload.claim_block = rebuild_claim_block; /* Compute how many blocks we'll need. */ From patchwork Fri Dec 30 22:17:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13085047 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A10AC4332F for ; Sat, 31 Dec 2022 00:06:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235845AbiLaAGE (ORCPT ); Fri, 30 Dec 2022 19:06:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53592 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235435AbiLaAGC (ORCPT ); Fri, 30 Dec 2022 19:06:02 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 479061B1D4 for ; Fri, 30 Dec 2022 16:06:01 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 003E1B81DE9 for ; Sat, 31 Dec 2022 00:06:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B9EFEC433EF; Sat, 31 Dec 2022 00:05:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445158; bh=mCHnM0BzjhS5tq403JOmyF/qfOdCIdJ4YJbg3mC/DKE=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=uhrY3rTH60+nuVyUVkqBIBSK1ynhdRL/pUjHN0HEtgucgiCFOFvxVvVutZ92W0Pgf iLCI7hgGTh6mH3obFiFh4+FMxkesf+f0QZrUSx174AlwZ1ON/eW4LX1O5S1rbFc4Nh L77SChzxEItuwcTJ9AW3jgD179ntzSbBmtS7POn6MdzfM+cHhoK4crODss4nA7P31m c5eplDgxJ8tyYb4lT8pbtdeJyYBI2GRdTA8W5krKTtdRo/nFi10MQRAdKDfG7vCyQp TjPBQhLR2N1py+q30wN+45NjownwpG39keTuA1EbFAS+oMgoFchJPWqxLvv9E32Oj/ qBUbhqmXIwCDg== Subject: [PATCH 4/5] xfs: constrain dirty buffers while formatting a staged btree From: "Darrick J. 
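The new callback contract is easiest to see in isolation: each call
fills the record slot at @idx and reports how many records it loaded.
A minimal sketch under that contract (the callback name and the
record-fill step are illustrative, not code from the patch):

/* Hypothetical single-record ->get_records implementation. */
static int
example_get_records(
	struct xfs_btree_cur	*cur,
	unsigned int		idx,
	struct xfs_btree_block	*block,
	unsigned int		nr_wanted,
	void			*priv)
{
	union xfs_btree_rec	*block_rec;

	/* Fill cur->bc_rec from the caller's record source here. */

	block_rec = libxfs_btree_rec_addr(cur, idx, block);
	cur->bc_ops->init_rec_from_cur(cur, block_rec);
	return 1;	/* loaded one record; negative errno on failure */
}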
Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:19 -0800 Message-ID: <167243863957.707598.4173177371145564589.stgit@magnolia> In-Reply-To: <167243863904.707598.12385476439101029022.stgit@magnolia> References: <167243863904.707598.12385476439101029022.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Constrain the number of dirty buffers that are locked by the btree staging code at any given time by establishing a threshold at which we put them all on the delwri queue and push them to disk. This limits memory consumption while writing out new btrees. Signed-off-by: Darrick J. Wong --- libxfs/xfs_btree.c | 2 +- libxfs/xfs_btree.h | 3 +++ libxfs/xfs_btree_staging.c | 50 ++++++++++++++++++++++++++++++++++---------- libxfs/xfs_btree_staging.h | 10 +++++++++ repair/agbtree.c | 1 + 5 files changed, 54 insertions(+), 12 deletions(-) diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c index 3402c25c344..7b2df32960c 100644 --- a/libxfs/xfs_btree.c +++ b/libxfs/xfs_btree.c @@ -1327,7 +1327,7 @@ xfs_btree_get_buf_block( * Read in the buffer at the given ptr and return the buffer and * the block pointer within the buffer. */ -STATIC int +int xfs_btree_read_buf_block( struct xfs_btree_cur *cur, const union xfs_btree_ptr *ptr, diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h index d8b390e895b..6a565ad5e83 100644 --- a/libxfs/xfs_btree.h +++ b/libxfs/xfs_btree.h @@ -701,6 +701,9 @@ void xfs_btree_set_ptr_null(struct xfs_btree_cur *cur, int xfs_btree_get_buf_block(struct xfs_btree_cur *cur, const union xfs_btree_ptr *ptr, struct xfs_btree_block **block, struct xfs_buf **bpp); +int xfs_btree_read_buf_block(struct xfs_btree_cur *cur, + const union xfs_btree_ptr *ptr, int flags, + struct xfs_btree_block **block, struct xfs_buf **bpp); void xfs_btree_set_sibling(struct xfs_btree_cur *cur, struct xfs_btree_block *block, const union xfs_btree_ptr *ptr, int lr); diff --git a/libxfs/xfs_btree_staging.c b/libxfs/xfs_btree_staging.c index 97fade90622..5391d3fead2 100644 --- a/libxfs/xfs_btree_staging.c +++ b/libxfs/xfs_btree_staging.c @@ -333,18 +333,35 @@ xfs_btree_commit_ifakeroot( /* * Put a btree block that we're loading onto the ordered list and release it. * The btree blocks will be written to disk when bulk loading is finished. + * If we reach the dirty buffer threshold, flush them to disk before + * continuing. */ -static void +static int xfs_btree_bload_drop_buf( - struct list_head *buffers_list, - struct xfs_buf **bpp) + struct xfs_btree_bload *bbl, + struct list_head *buffers_list, + struct xfs_buf **bpp) { - if (*bpp == NULL) - return; + struct xfs_buf *bp = *bpp; + int error; - xfs_buf_delwri_queue_here(*bpp, buffers_list); - xfs_buf_relse(*bpp); + if (!bp) + return 0; + + xfs_buf_delwri_queue_here(bp, buffers_list); + xfs_buf_relse(bp); *bpp = NULL; + bbl->nr_dirty++; + + if (!bbl->max_dirty || bbl->nr_dirty < bbl->max_dirty) + return 0; + + error = xfs_buf_delwri_submit(buffers_list); + if (error) + return error; + + bbl->nr_dirty = 0; + return 0; } /* @@ -416,7 +433,10 @@ xfs_btree_bload_prep_block( */ if (*blockp) xfs_btree_set_sibling(cur, *blockp, &new_ptr, XFS_BB_RIGHTSIB); - xfs_btree_bload_drop_buf(buffers_list, bpp); + + ret = xfs_btree_bload_drop_buf(bbl, buffers_list, bpp); + if (ret) + return ret; /* Initialize the new btree block. 
*/ xfs_btree_init_block_cur(cur, new_bp, level, nr_this_block); @@ -480,7 +500,7 @@ xfs_btree_bload_node( ASSERT(!xfs_btree_ptr_is_null(cur, child_ptr)); - ret = xfs_btree_get_buf_block(cur, child_ptr, &child_block, + ret = xfs_btree_read_buf_block(cur, child_ptr, 0, &child_block, &child_bp); if (ret) return ret; @@ -759,6 +779,7 @@ xfs_btree_bload( cur->bc_nlevels = bbl->btree_height; xfs_btree_set_ptr_null(cur, &child_ptr); xfs_btree_set_ptr_null(cur, &ptr); + bbl->nr_dirty = 0; xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level, &avg_per_block, &blocks, &blocks_with_extra); @@ -797,7 +818,10 @@ xfs_btree_bload( xfs_btree_copy_ptrs(cur, &child_ptr, &ptr, 1); } total_blocks += blocks; - xfs_btree_bload_drop_buf(&buffers_list, &bp); + + ret = xfs_btree_bload_drop_buf(bbl, &buffers_list, &bp); + if (ret) + goto out; /* Populate the internal btree nodes. */ for (level = 1; level < cur->bc_nlevels; level++) { @@ -839,7 +863,11 @@ xfs_btree_bload( xfs_btree_copy_ptrs(cur, &first_ptr, &ptr, 1); } total_blocks += blocks; - xfs_btree_bload_drop_buf(&buffers_list, &bp); + + ret = xfs_btree_bload_drop_buf(bbl, &buffers_list, &bp); + if (ret) + goto out; + xfs_btree_copy_ptrs(cur, &child_ptr, &first_ptr, 1); } diff --git a/libxfs/xfs_btree_staging.h b/libxfs/xfs_btree_staging.h index 82a3a8ef0f1..d2eaf4fdc60 100644 --- a/libxfs/xfs_btree_staging.h +++ b/libxfs/xfs_btree_staging.h @@ -115,6 +115,16 @@ struct xfs_btree_bload { * height of the new btree. */ unsigned int btree_height; + + /* + * Flush the new btree block buffer list to disk after this many blocks + * have been formatted. Zero prohibits writing any buffers until all + * blocks have been formatted. + */ + uint16_t max_dirty; + + /* Number of dirty buffers. */ + uint16_t nr_dirty; }; int xfs_btree_bload_compute_geometry(struct xfs_btree_cur *cur, diff --git a/repair/agbtree.c b/repair/agbtree.c index d90cbcc2f28..70ad042f832 100644 --- a/repair/agbtree.c +++ b/repair/agbtree.c @@ -23,6 +23,7 @@ init_rebuild( memset(btr, 0, sizeof(struct bt_rebuild)); bulkload_init_ag(&btr->newbt, sc, oinfo); + btr->bload.max_dirty = XFS_B_TO_FSBT(sc->mp, 256U << 10); /* 256K */ bulkload_estimate_ag_slack(sc, &btr->bload, free_space); } From patchwork Fri Dec 30 22:17:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
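To make the threshold concrete: XFS_B_TO_FSBT converts bytes to
filesystem blocks, so the 256K default set in init_rebuild() works out
to 64 dirty buffers on a filesystem with 4K blocks.  A hedged sketch of
wiring the threshold up by hand, using the fields this patch adds (the
example_* callbacks are placeholders, not symbols from the patch):

struct xfs_btree_bload	bload = {
	.get_records	= example_get_records,	/* hypothetical callbacks */
	.claim_block	= example_claim_block,
	/* flush after ~256K of formatted blocks: 64 blocks at 4K blocksize */
	.max_dirty	= XFS_B_TO_FSBT(mp, 256U << 10),
};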
Wong" X-Patchwork-Id: 13085048 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4AB5C4332F for ; Sat, 31 Dec 2022 00:06:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235435AbiLaAGT (ORCPT ); Fri, 30 Dec 2022 19:06:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53632 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235850AbiLaAGS (ORCPT ); Fri, 30 Dec 2022 19:06:18 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C98441E3C3 for ; Fri, 30 Dec 2022 16:06:16 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 86CFDB81DEC for ; Sat, 31 Dec 2022 00:06:15 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 458B9C433EF; Sat, 31 Dec 2022 00:06:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672445174; bh=fAX0EHj69GwzC89bFxzPUE6s00AP9GzBs2oKwH2gbik=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=qsVGXDacY4arvQYqh9Y6Nighky1xpm/4c9vFo5Ypp4jBHd1+wZxtrLH0fHlZRJWUt 91UHB2IlOyRRWCMPutAfJFWj9qDAiExv7HFRUHQluN9LuCAq7OGLO+OZtcYJpepsKS l9lIE5rx3nWkFwhpuDo+Kh17wG8ka/Fs7+YgubtKviPH9E3LZ3kdYalWxw4v6HFt5g hAXU0Oa07bFXO1GhmrLkP7q62IxzfcA67cn0yLvnwAUNgiJOdPkkLsbawpZlOy2Zpm WTjdBMyCkDlZGavRbu+oxRBU7F0OUdnRWcsmv/O8Rfpis1d+ZB7Q0g+za+6mrIe1nM BKclLdGCpKA1A== Subject: [PATCH 5/5] xfs_repair: bulk load records into new btree blocks From: "Darrick J. Wong" To: cem@kernel.org, djwong@kernel.org Cc: linux-xfs@vger.kernel.org Date: Fri, 30 Dec 2022 14:17:19 -0800 Message-ID: <167243863970.707598.5919935346304694103.stgit@magnolia> In-Reply-To: <167243863904.707598.12385476439101029022.stgit@magnolia> References: <167243863904.707598.12385476439101029022.stgit@magnolia> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Darrick J. Wong Amortize the cost of indirect calls further by loading a batch of records into a new btree block instead of one record per ->get_record call. On a rmap btree with 3.9 million records, this reduces the runtime of xfs_btree_bload by 3% for xfsprogs. For the upcoming online repair functionality, this will reduce runtime by 6% when spectre mitigations are enabled in the kernel. Signed-off-by: Darrick J. 
Wong --- repair/agbtree.c | 161 ++++++++++++++++++++++++++++++------------------------ 1 file changed, 90 insertions(+), 71 deletions(-) diff --git a/repair/agbtree.c b/repair/agbtree.c index 70ad042f832..cba67c5fbf4 100644 --- a/repair/agbtree.c +++ b/repair/agbtree.c @@ -220,15 +220,19 @@ get_bnobt_records( struct bt_rebuild *btr = priv; struct xfs_alloc_rec_incore *arec = &cur->bc_rec.a; union xfs_btree_rec *block_rec; + unsigned int loaded; - btr->bno_rec = get_bno_rec(cur, btr->bno_rec); - arec->ar_startblock = btr->bno_rec->ex_startblock; - arec->ar_blockcount = btr->bno_rec->ex_blockcount; - btr->freeblks += btr->bno_rec->ex_blockcount; + for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { + btr->bno_rec = get_bno_rec(cur, btr->bno_rec); + arec->ar_startblock = btr->bno_rec->ex_startblock; + arec->ar_blockcount = btr->bno_rec->ex_blockcount; + btr->freeblks += btr->bno_rec->ex_blockcount; - block_rec = libxfs_btree_rec_addr(cur, idx, block); - cur->bc_ops->init_rec_from_cur(cur, block_rec); - return 1; + block_rec = libxfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + } + + return loaded; } void @@ -388,65 +392,72 @@ get_inobt_records( { struct bt_rebuild *btr = priv; struct xfs_inobt_rec_incore *irec = &cur->bc_rec.i; - struct ino_tree_node *ino_rec; - union xfs_btree_rec *block_rec; - int inocnt = 0; - int finocnt = 0; - int k; - - btr->ino_rec = ino_rec = get_ino_rec(cur, btr->ino_rec); - - /* Transform the incore record into an on-disk record. */ - irec->ir_startino = ino_rec->ino_startnum; - irec->ir_free = ino_rec->ir_free; - - for (k = 0; k < sizeof(xfs_inofree_t) * NBBY; k++) { - ASSERT(is_inode_confirmed(ino_rec, k)); - - if (is_inode_sparse(ino_rec, k)) - continue; - if (is_inode_free(ino_rec, k)) - finocnt++; - inocnt++; - } + unsigned int loaded = 0; + + while (loaded < nr_wanted) { + struct ino_tree_node *ino_rec; + union xfs_btree_rec *block_rec; + int inocnt = 0; + int finocnt = 0; + int k; + + btr->ino_rec = ino_rec = get_ino_rec(cur, btr->ino_rec); - irec->ir_count = inocnt; - irec->ir_freecount = finocnt; - - if (xfs_has_sparseinodes(cur->bc_mp)) { - uint64_t sparse; - int spmask; - uint16_t holemask; - - /* - * Convert the 64-bit in-core sparse inode state to the - * 16-bit on-disk holemask. - */ - holemask = 0; - spmask = (1 << XFS_INODES_PER_HOLEMASK_BIT) - 1; - sparse = ino_rec->ir_sparse; - for (k = 0; k < XFS_INOBT_HOLEMASK_BITS; k++) { - if (sparse & spmask) { - ASSERT((sparse & spmask) == spmask); - holemask |= (1 << k); - } else - ASSERT((sparse & spmask) == 0); - sparse >>= XFS_INODES_PER_HOLEMASK_BIT; + /* Transform the incore record into an on-disk record. */ + irec->ir_startino = ino_rec->ino_startnum; + irec->ir_free = ino_rec->ir_free; + + for (k = 0; k < sizeof(xfs_inofree_t) * NBBY; k++) { + ASSERT(is_inode_confirmed(ino_rec, k)); + + if (is_inode_sparse(ino_rec, k)) + continue; + if (is_inode_free(ino_rec, k)) + finocnt++; + inocnt++; } - irec->ir_holemask = holemask; - } else { - irec->ir_holemask = 0; - } + irec->ir_count = inocnt; + irec->ir_freecount = finocnt; - if (btr->first_agino == NULLAGINO) - btr->first_agino = ino_rec->ino_startnum; - btr->freecount += finocnt; - btr->count += inocnt; + if (xfs_has_sparseinodes(cur->bc_mp)) { + uint64_t sparse; + int spmask; + uint16_t holemask; + + /* + * Convert the 64-bit in-core sparse inode state to the + * 16-bit on-disk holemask. 
+ */ + holemask = 0; + spmask = (1 << XFS_INODES_PER_HOLEMASK_BIT) - 1; + sparse = ino_rec->ir_sparse; + for (k = 0; k < XFS_INOBT_HOLEMASK_BITS; k++) { + if (sparse & spmask) { + ASSERT((sparse & spmask) == spmask); + holemask |= (1 << k); + } else + ASSERT((sparse & spmask) == 0); + sparse >>= XFS_INODES_PER_HOLEMASK_BIT; + } + + irec->ir_holemask = holemask; + } else { + irec->ir_holemask = 0; + } + + if (btr->first_agino == NULLAGINO) + btr->first_agino = ino_rec->ino_startnum; + btr->freecount += finocnt; + btr->count += inocnt; + + block_rec = libxfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + loaded++; + idx++; + } - block_rec = libxfs_btree_rec_addr(cur, idx, block); - cur->bc_ops->init_rec_from_cur(cur, block_rec); - return 1; + return loaded; } /* Initialize both inode btree cursors as needed. */ @@ -585,13 +596,17 @@ get_rmapbt_records( struct xfs_rmap_irec *rec; struct bt_rebuild *btr = priv; union xfs_btree_rec *block_rec; + unsigned int loaded; - rec = pop_slab_cursor(btr->slab_cursor); - memcpy(&cur->bc_rec.r, rec, sizeof(struct xfs_rmap_irec)); + for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { + rec = pop_slab_cursor(btr->slab_cursor); + memcpy(&cur->bc_rec.r, rec, sizeof(struct xfs_rmap_irec)); - block_rec = libxfs_btree_rec_addr(cur, idx, block); - cur->bc_ops->init_rec_from_cur(cur, block_rec); - return 1; + block_rec = libxfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + } + + return loaded; } /* Set up the rmap rebuild parameters. */ @@ -663,13 +678,17 @@ get_refcountbt_records( struct xfs_refcount_irec *rec; struct bt_rebuild *btr = priv; union xfs_btree_rec *block_rec; + unsigned int loaded; - rec = pop_slab_cursor(btr->slab_cursor); - memcpy(&cur->bc_rec.rc, rec, sizeof(struct xfs_refcount_irec)); + for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { + rec = pop_slab_cursor(btr->slab_cursor); + memcpy(&cur->bc_rec.rc, rec, sizeof(struct xfs_refcount_irec)); - block_rec = libxfs_btree_rec_addr(cur, idx, block); - cur->bc_ops->init_rec_from_cur(cur, block_rec); - return 1; + block_rec = libxfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + } + + return loaded; } /* Set up the refcount rebuild parameters. */
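All four converted callbacks share the same shape: one indirect call
now fills as many record slots as the caller asks for.  A generic
skeleton of that pattern, assuming the record source can be advanced
once per iteration (the callback name and the fill step are
illustrative):

/* Hypothetical batched ->get_records implementation. */
static int
example_get_records(
	struct xfs_btree_cur	*cur,
	unsigned int		idx,
	struct xfs_btree_block	*block,
	unsigned int		nr_wanted,
	void			*priv)
{
	unsigned int		loaded;

	for (loaded = 0; loaded < nr_wanted; loaded++, idx++) {
		union xfs_btree_rec	*block_rec;

		/* Advance the caller's record source into cur->bc_rec here. */

		block_rec = libxfs_btree_rec_addr(cur, idx, block);
		cur->bc_ops->init_rec_from_cur(cur, block_rec);
	}

	return loaded;	/* number of records loaded, or negative errno */
}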