From patchwork Sun Dec 31 21:44:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507714 Date: Sun, 31 Dec 2023 13:44:49 -0800 Subject: [PATCH 01/44] xfs: prepare refcount btree cursor tracepoints for realtime From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851597.1766284.13553805852762895833.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Rework the refcount btree cursor tracepoints in preparation to handle the realtime refcount btree cursor. 
Mostly this involves renaming the field to "refcbno" and extracting the group number from the cursor when possible. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_trace.c | 9 ++++ fs/xfs/xfs_trace.h | 114 ++++++++++++++++++++++++++++++---------------------- 2 files changed, 74 insertions(+), 49 deletions(-) diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index fe34d7c3882f3..cb712873ce927 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -66,6 +66,15 @@ xfs_rmapbt_crack_agno_opdev( } } +static inline void +xfs_refcountbt_crack_agno_opdev( + struct xfs_btree_cur *cur, + xfs_agnumber_t *agno, + dev_t *opdev) +{ + return xfs_rmapbt_crack_agno_opdev(cur, agno, opdev); +} + /* * We include this last to have the helpers above available for the trace * event implementations. diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index c8227f44a97c0..fd4170a6aea43 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -22,6 +22,8 @@ * * rmapbno: physical block number for a reverse mapping. This is an agbno for * per-AG rmap btrees or a rgbno for realtime rmap btrees. + * refcbno: physical block number for a refcount record. This is an agbno for + * per-AG refcount btrees or a rgbno for realtime refcount btrees. 
* * daddr: physical block number in 512b blocks * bbcount: number of blocks in a physical extent, in 512b blocks @@ -3234,56 +3236,60 @@ DEFINE_AG_ERROR_EVENT(xfs_ag_resv_init_error); /* refcount tracepoint classes */ DECLARE_EVENT_CLASS(xfs_refcount_class, - TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t agbno, + TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t refcbno, xfs_extlen_t len), - TP_ARGS(cur, agbno, len), + TP_ARGS(cur, refcbno, len), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) - __field(xfs_agblock_t, agbno) + __field(xfs_agblock_t, refcbno) __field(xfs_extlen_t, len) ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - __entry->agno = cur->bc_ag.pag->pag_agno; - __entry->agbno = agbno; + xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); + __entry->refcbno = refcbno; __entry->len = len; ), - TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x refcbno 0x%x fsbcount 0x%x", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, - __entry->agbno, + __entry->refcbno, __entry->len) ); #define DEFINE_REFCOUNT_EVENT(name) \ DEFINE_EVENT(xfs_refcount_class, name, \ - TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t agbno, \ + TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t refcbno, \ xfs_extlen_t len), \ - TP_ARGS(cur, agbno, len)) + TP_ARGS(cur, refcbno, len)) TRACE_DEFINE_ENUM(XFS_LOOKUP_EQi); TRACE_DEFINE_ENUM(XFS_LOOKUP_LEi); TRACE_DEFINE_ENUM(XFS_LOOKUP_GEi); TRACE_EVENT(xfs_refcount_lookup, - TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t agbno, + TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t refcbno, xfs_lookup_t dir), - TP_ARGS(cur, agbno, dir), + TP_ARGS(cur, refcbno, dir), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) - __field(xfs_agblock_t, agbno) + __field(xfs_agblock_t, refcbno) __field(xfs_lookup_t, dir) ), 
TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - __entry->agno = cur->bc_ag.pag->pag_agno; - __entry->agbno = agbno; + xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); + __entry->refcbno = refcbno; __entry->dir = dir; ), - TP_printk("dev %d:%d agno 0x%x agbno 0x%x cmp %s(%d)", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x refcbno 0x%x cmp %s(%d)", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, - __entry->agbno, + __entry->refcbno, __print_symbolic(__entry->dir, XFS_AG_BTREE_CMP_FORMAT_STR), __entry->dir) ) @@ -3294,6 +3300,7 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_class, TP_ARGS(cur, irec), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) __field(enum xfs_refc_domain, domain) __field(xfs_agblock_t, startblock) @@ -3302,14 +3309,15 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_class, ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - __entry->agno = cur->bc_ag.pag->pag_agno; + xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); __entry->domain = irec->rc_domain; __entry->startblock = irec->rc_startblock; __entry->blockcount = irec->rc_blockcount; __entry->refcount = irec->rc_refcount; ), - TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x dom %s refcbno 0x%x fsbcount 0x%x refcount %u", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, __print_symbolic(__entry->domain, XFS_REFC_DOMAIN_STRINGS), __entry->startblock, @@ -3325,49 +3333,52 @@ DEFINE_EVENT(xfs_refcount_extent_class, name, \ /* single-rcext and an agbno tracepoint class */ DECLARE_EVENT_CLASS(xfs_refcount_extent_at_class, TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *irec, - xfs_agblock_t agbno), - TP_ARGS(cur, irec, agbno), + xfs_agblock_t refcbno), + TP_ARGS(cur, irec, refcbno), TP_STRUCT__entry( 
__field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) __field(enum xfs_refc_domain, domain) __field(xfs_agblock_t, startblock) __field(xfs_extlen_t, blockcount) __field(xfs_nlink_t, refcount) - __field(xfs_agblock_t, agbno) + __field(xfs_agblock_t, refcbno) ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - __entry->agno = cur->bc_ag.pag->pag_agno; + xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); __entry->domain = irec->rc_domain; __entry->startblock = irec->rc_startblock; __entry->blockcount = irec->rc_blockcount; __entry->refcount = irec->rc_refcount; - __entry->agbno = agbno; + __entry->refcbno = refcbno; ), - TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u @ agbno 0x%x", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x dom %s refcbno 0x%x fsbcount 0x%x refcount %u @ refcbno 0x%x", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, __print_symbolic(__entry->domain, XFS_REFC_DOMAIN_STRINGS), __entry->startblock, __entry->blockcount, __entry->refcount, - __entry->agbno) + __entry->refcbno) ) #define DEFINE_REFCOUNT_EXTENT_AT_EVENT(name) \ DEFINE_EVENT(xfs_refcount_extent_at_class, name, \ TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *irec, \ - xfs_agblock_t agbno), \ - TP_ARGS(cur, irec, agbno)) + xfs_agblock_t refcbno), \ + TP_ARGS(cur, irec, refcbno)) /* double-rcext tracepoint class */ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class, TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1, - struct xfs_refcount_irec *i2), + struct xfs_refcount_irec *i2), TP_ARGS(cur, i1, i2), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) __field(enum xfs_refc_domain, i1_domain) __field(xfs_agblock_t, i1_startblock) @@ -3380,7 +3391,7 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class, ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - __entry->agno = 
cur->bc_ag.pag->pag_agno; + xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); __entry->i1_domain = i1->rc_domain; __entry->i1_startblock = i1->rc_startblock; __entry->i1_blockcount = i1->rc_blockcount; @@ -3390,9 +3401,10 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class, __entry->i2_blockcount = i2->rc_blockcount; __entry->i2_refcount = i2->rc_refcount; ), - TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u -- " - "dom %s agbno 0x%x fsbcount 0x%x refcount %u", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x dom %s refcbno 0x%x fsbcount 0x%x refcount %u -- " + "dom %s refcbno 0x%x fsbcount 0x%x refcount %u", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, __print_symbolic(__entry->i1_domain, XFS_REFC_DOMAIN_STRINGS), __entry->i1_startblock, @@ -3413,10 +3425,11 @@ DEFINE_EVENT(xfs_refcount_double_extent_class, name, \ /* double-rcext and an agbno tracepoint class */ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class, TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1, - struct xfs_refcount_irec *i2, xfs_agblock_t agbno), - TP_ARGS(cur, i1, i2, agbno), + struct xfs_refcount_irec *i2, xfs_agblock_t refcbno), + TP_ARGS(cur, i1, i2, refcbno), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) __field(enum xfs_refc_domain, i1_domain) __field(xfs_agblock_t, i1_startblock) @@ -3426,11 +3439,11 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class, __field(xfs_agblock_t, i2_startblock) __field(xfs_extlen_t, i2_blockcount) __field(xfs_nlink_t, i2_refcount) - __field(xfs_agblock_t, agbno) + __field(xfs_agblock_t, refcbno) ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - __entry->agno = cur->bc_ag.pag->pag_agno; + xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); __entry->i1_domain = i1->rc_domain; __entry->i1_startblock = i1->rc_startblock; __entry->i1_blockcount = 
i1->rc_blockcount; @@ -3439,11 +3452,12 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class, __entry->i2_startblock = i2->rc_startblock; __entry->i2_blockcount = i2->rc_blockcount; __entry->i2_refcount = i2->rc_refcount; - __entry->agbno = agbno; + __entry->refcbno = refcbno; ), - TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u -- " - "dom %s agbno 0x%x fsbcount 0x%x refcount %u @ agbno 0x%x", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x dom %s refcbno 0x%x fsbcount 0x%x refcount %u -- " + "dom %s refcbno 0x%x fsbcount 0x%x refcount %u @ refcbno 0x%x", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, __print_symbolic(__entry->i1_domain, XFS_REFC_DOMAIN_STRINGS), __entry->i1_startblock, @@ -3453,14 +3467,14 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class, __entry->i2_startblock, __entry->i2_blockcount, __entry->i2_refcount, - __entry->agbno) + __entry->refcbno) ) #define DEFINE_REFCOUNT_DOUBLE_EXTENT_AT_EVENT(name) \ DEFINE_EVENT(xfs_refcount_double_extent_at_class, name, \ TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1, \ - struct xfs_refcount_irec *i2, xfs_agblock_t agbno), \ - TP_ARGS(cur, i1, i2, agbno)) + struct xfs_refcount_irec *i2, xfs_agblock_t refcbno), \ + TP_ARGS(cur, i1, i2, refcbno)) /* triple-rcext tracepoint class */ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class, @@ -3469,6 +3483,7 @@ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class, TP_ARGS(cur, i1, i2, i3), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) __field(enum xfs_refc_domain, i1_domain) __field(xfs_agblock_t, i1_startblock) @@ -3485,7 +3500,7 @@ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class, ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - __entry->agno = cur->bc_ag.pag->pag_agno; + xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); __entry->i1_domain = i1->rc_domain; 
__entry->i1_startblock = i1->rc_startblock; __entry->i1_blockcount = i1->rc_blockcount; @@ -3499,10 +3514,11 @@ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class, __entry->i3_blockcount = i3->rc_blockcount; __entry->i3_refcount = i3->rc_refcount; ), - TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u -- " - "dom %s agbno 0x%x fsbcount 0x%x refcount %u -- " - "dom %s agbno 0x%x fsbcount 0x%x refcount %u", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x dom %s refcbno 0x%x fsbcount 0x%x refcount %u -- " + "dom %s refcbno 0x%x fsbcount 0x%x refcount %u -- " + "dom %s refcbno 0x%x fsbcount 0x%x refcount %u", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, __print_symbolic(__entry->i1_domain, XFS_REFC_DOMAIN_STRINGS), __entry->i1_startblock, @@ -3572,21 +3588,21 @@ DECLARE_EVENT_CLASS(xfs_refcount_deferred_class, __field(dev_t, dev) __field(xfs_agnumber_t, agno) __field(int, op) - __field(xfs_agblock_t, agbno) + __field(xfs_agblock_t, refcbno) __field(xfs_extlen_t, len) ), TP_fast_assign( __entry->dev = mp->m_super->s_dev; __entry->agno = XFS_FSB_TO_AGNO(mp, refc->ri_startblock); __entry->op = refc->ri_type; - __entry->agbno = XFS_FSB_TO_AGBNO(mp, refc->ri_startblock); + __entry->refcbno = XFS_FSB_TO_AGBNO(mp, refc->ri_startblock); __entry->len = refc->ri_blockcount; ), - TP_printk("dev %d:%d op %s agno 0x%x agbno 0x%x fsbcount 0x%x", + TP_printk("dev %d:%d op %s agno 0x%x refcbno 0x%x fsbcount 0x%x", MAJOR(__entry->dev), MINOR(__entry->dev), __print_symbolic(__entry->op, XFS_REFCOUNT_INTENT_STRINGS), __entry->agno, - __entry->agbno, + __entry->refcbno, __entry->len) ); #define DEFINE_REFCOUNT_DEFERRED_EVENT(name) \ From patchwork Sun Dec 31 21:45:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507715 Date: Sun, 31 Dec 2023 13:45:05 -0800 Subject: [PATCH 02/44] xfs: introduce realtime refcount btree definitions From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851613.1766284.11276299938653255299.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Add new realtime refcount btree definitions. The realtime refcount btree will be rooted from a hidden inode, but has its own shape and therefore needs to have most of its own separate types. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_btree.h | 1 + fs/xfs/libxfs/xfs_format.h | 6 ++++++ fs/xfs/libxfs/xfs_types.h | 5 +++-- fs/xfs/scrub/trace.h | 1 + fs/xfs/xfs_trace.h | 1 + 5 files changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index 4753a5c847616..f58240adda6f4 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -65,6 +65,7 @@ union xfs_btree_rec { #define XFS_BTNUM_REFC ((xfs_btnum_t)XFS_BTNUM_REFCi) #define XFS_BTNUM_RCBAG ((xfs_btnum_t)XFS_BTNUM_RCBAGi) #define XFS_BTNUM_RTRMAP ((xfs_btnum_t)XFS_BTNUM_RTRMAPi) +#define XFS_BTNUM_RTREFC ((xfs_btnum_t)XFS_BTNUM_RTREFCi) struct xfs_btree_ops; uint32_t xfs_btree_magic(struct xfs_mount *mp, const struct xfs_btree_ops *ops); diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 1c1910256a927..0dc169fde2e3d 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1815,6 +1815,12 @@ struct xfs_refcount_key { /* btree pointer type */ typedef __be32 xfs_refcount_ptr_t; +/* + * Realtime Reference Count btree format definitions + * + * This is a btree for reference count records for realtime volumes + */ +#define XFS_RTREFC_CRC_MAGIC 0x52434e54 /* 'RCNT' */ /* * BMAP Btree format definitions diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index b3edc57dc65bd..4147ba288ec18 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -126,7 +126,7 @@ typedef enum { typedef enum { XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi, XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_REFCi, XFS_BTNUM_RCBAGi, - XFS_BTNUM_RTRMAPi, XFS_BTNUM_MAX + XFS_BTNUM_RTRMAPi, XFS_BTNUM_RTREFCi, XFS_BTNUM_MAX } xfs_btnum_t; #define XFS_BTNUM_STRINGS \ @@ -138,7 +138,8 @@ typedef enum { { XFS_BTNUM_FINOi, "finobt" }, \ { XFS_BTNUM_REFCi, "refcbt" }, \ { XFS_BTNUM_RCBAGi, "rcbagbt" }, \ - { XFS_BTNUM_RTRMAPi, "rtrmapbt" } + { XFS_BTNUM_RTRMAPi, "rtrmapbt" }, \ + { XFS_BTNUM_RTREFCi, 
"rtrefcbt" } struct xfs_name { const unsigned char *name; diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 65e0872792e1f..72b5277f4ba6d 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -49,6 +49,7 @@ TRACE_DEFINE_ENUM(XFS_BTNUM_RMAPi); TRACE_DEFINE_ENUM(XFS_BTNUM_REFCi); TRACE_DEFINE_ENUM(XFS_BTNUM_RCBAGi); TRACE_DEFINE_ENUM(XFS_BTNUM_RTRMAPi); +TRACE_DEFINE_ENUM(XFS_BTNUM_RTREFCi); TRACE_DEFINE_ENUM(XFS_REFC_DOMAIN_SHARED); TRACE_DEFINE_ENUM(XFS_REFC_DOMAIN_COW); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index fd4170a6aea43..86e0aa946aa00 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2555,6 +2555,7 @@ TRACE_DEFINE_ENUM(XFS_BTNUM_RMAPi); TRACE_DEFINE_ENUM(XFS_BTNUM_REFCi); TRACE_DEFINE_ENUM(XFS_BTNUM_RCBAGi); TRACE_DEFINE_ENUM(XFS_BTNUM_RTRMAPi); +TRACE_DEFINE_ENUM(XFS_BTNUM_RTREFCi); DECLARE_EVENT_CLASS(xfs_btree_cur_class, TP_PROTO(struct xfs_btree_cur *cur, int level, struct xfs_buf *bp), From patchwork Sun Dec 31 21:45:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507716 Date: Sun, 31 Dec 2023 13:45:21 -0800 Subject: [PATCH 03/44] xfs: namespace the maximum length/refcount symbols From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851629.1766284.4083085525437258188.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Actually namespace these variables properly, so that readers can tell that this is an XFS symbol, and that it's for the refcount functionality. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_format.h | 4 ++-- fs/xfs/libxfs/xfs_refcount.c | 18 +++++++++--------- fs/xfs/scrub/refcount.c | 2 +- fs/xfs/scrub/refcount_repair.c | 4 ++-- 4 files changed, 14 insertions(+), 14 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 0dc169fde2e3d..473bdc2a1ad10 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1809,8 +1809,8 @@ struct xfs_refcount_key { __be32 rc_startblock; /* starting block number */ }; -#define MAXREFCOUNT ((xfs_nlink_t)~0U) -#define MAXREFCEXTLEN ((xfs_extlen_t)~0U) +#define XFS_REFC_REFCOUNT_MAX ((xfs_nlink_t)~0U) +#define XFS_REFC_LEN_MAX ((xfs_extlen_t)~0U) /* btree pointer type */ typedef __be32 xfs_refcount_ptr_t; diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index edabdf1a97d4b..b29a718737c59 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -128,7 +128,7 @@ xfs_refcount_check_irec( struct xfs_perag *pag, const struct xfs_refcount_irec *irec) { - if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN) + if (irec->rc_blockcount == 0 || irec->rc_blockcount > XFS_REFC_LEN_MAX) return __this_address; if (!xfs_refcount_check_domain(irec)) @@ -138,7 +138,7 @@ xfs_refcount_check_irec( if (!xfs_verify_agbext(pag, irec->rc_startblock, irec->rc_blockcount)) return __this_address; - if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT) + if (irec->rc_refcount == 0 || irec->rc_refcount > XFS_REFC_REFCOUNT_MAX) return __this_address; return NULL; @@ -853,9 +853,9 @@ xfs_refc_merge_refcount( const struct xfs_refcount_irec *irec, enum xfs_refc_adjust_op adjust) { - /* Once a record hits MAXREFCOUNT, it is pinned there forever */ - if (irec->rc_refcount == MAXREFCOUNT) - return MAXREFCOUNT; + /* Once a record hits XFS_REFC_REFCOUNT_MAX, it is pinned forever */ + if (irec->rc_refcount == XFS_REFC_REFCOUNT_MAX) + return XFS_REFC_REFCOUNT_MAX; return irec->rc_refcount + adjust; } @@ -898,7 
+898,7 @@ xfs_refc_want_merge_center( * hence we need to catch u32 addition overflows here. */ ulen += cleft->rc_blockcount + right->rc_blockcount; - if (ulen >= MAXREFCEXTLEN) + if (ulen >= XFS_REFC_LEN_MAX) return false; *ulenp = ulen; @@ -933,7 +933,7 @@ xfs_refc_want_merge_left( * hence we need to catch u32 addition overflows here. */ ulen += cleft->rc_blockcount; - if (ulen >= MAXREFCEXTLEN) + if (ulen >= XFS_REFC_LEN_MAX) return false; return true; @@ -967,7 +967,7 @@ xfs_refc_want_merge_right( * hence we need to catch u32 addition overflows here. */ ulen += cright->rc_blockcount; - if (ulen >= MAXREFCEXTLEN) + if (ulen >= XFS_REFC_LEN_MAX) return false; return true; @@ -1197,7 +1197,7 @@ xfs_refcount_adjust_extents( * Adjust the reference count and either update the tree * (incr) or free the blocks (decr). */ - if (ext.rc_refcount == MAXREFCOUNT) + if (ext.rc_refcount == XFS_REFC_REFCOUNT_MAX) goto skip; ext.rc_refcount += adj; trace_xfs_refcount_modify_extent(cur, &ext); diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c index d0c7d4a29c0fe..31a0461e2a0e3 100644 --- a/fs/xfs/scrub/refcount.c +++ b/fs/xfs/scrub/refcount.c @@ -421,7 +421,7 @@ xchk_refcount_mergeable( if (r1->rc_refcount != r2->rc_refcount) return false; if ((unsigned long long)r1->rc_blockcount + r2->rc_blockcount > - MAXREFCEXTLEN) + XFS_REFC_LEN_MAX) return false; return true; diff --git a/fs/xfs/scrub/refcount_repair.c b/fs/xfs/scrub/refcount_repair.c index b485c5cc9f290..d4a42f8b7e95b 100644 --- a/fs/xfs/scrub/refcount_repair.c +++ b/fs/xfs/scrub/refcount_repair.c @@ -183,7 +183,7 @@ xrep_refc_stash( if (xchk_should_terminate(sc, &error)) return error; - irec.rc_refcount = min_t(uint64_t, MAXREFCOUNT, refcount); + irec.rc_refcount = min_t(uint64_t, XFS_REFC_REFCOUNT_MAX, refcount); error = xrep_refc_check_ext(rr->sc, &irec); if (error) @@ -422,7 +422,7 @@ xrep_refc_find_refcounts( /* * Set up a bag to store all the rmap records that we're tracking to * generate a reference 
count record. If the size of the bag exceeds - * MAXREFCOUNT, we clamp rc_refcount. + * XFS_REFC_REFCOUNT_MAX, we clamp rc_refcount. */ error = rcbag_init(sc->mp, sc->xfile_buftarg, &rcstack); if (error) From patchwork Sun Dec 31 21:45:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507717 Date: Sun, 31 Dec 2023 13:45:36 -0800 Subject: [PATCH 04/44] xfs: define the on-disk realtime refcount btree format From: "Darrick J. 
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851646.1766284.16245401597535436245.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Start filling out the rtrefcount btree implementation. Start with the on-disk btree format; add everything needed to read, write and manipulate refcount btree blocks. This prepares the way for connecting the btree operations implementation. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_btree.c | 5 + fs/xfs/libxfs/xfs_btree.h | 11 + fs/xfs/libxfs/xfs_format.h | 3 fs/xfs/libxfs/xfs_ondisk.h | 1 fs/xfs/libxfs/xfs_rtrefcount_btree.c | 312 ++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrefcount_btree.h | 71 ++++++++ fs/xfs/libxfs/xfs_sb.c | 8 + fs/xfs/libxfs/xfs_shared.h | 2 fs/xfs/xfs_mount.c | 7 + fs/xfs/xfs_mount.h | 9 + 11 files changed, 425 insertions(+), 5 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_rtrefcount_btree.c create mode 100644 fs/xfs/libxfs/xfs_rtrefcount_btree.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index f9092ae77e684..783fc053d7be9 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -48,6 +48,7 @@ xfs-y += $(addprefix libxfs/, \ xfs_rmap_btree.o \ xfs_refcount.o \ xfs_refcount_btree.o \ + xfs_rtrefcount_btree.o \ xfs_rtrmap_btree.o \ xfs_sb.o \ xfs_swapext.o \ diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 2a181cf30299f..be484f86da985 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -37,6 +37,7 @@ #include "xfs_rmap.h" #include "xfs_quota.h" #include "xfs_imeta.h" +#include "xfs_rtrefcount_btree.h" /* * Btree magic numbers. 
@@ -5538,6 +5539,9 @@ xfs_btree_init_cur_caches(void) if (error) goto err; error = xfs_rtrmapbt_init_cur_cache(); + if (error) + goto err; + error = xfs_rtrefcountbt_init_cur_cache(); if (error) goto err; @@ -5557,6 +5561,7 @@ xfs_btree_destroy_cur_caches(void) xfs_rmapbt_destroy_cur_cache(); xfs_refcountbt_destroy_cur_cache(); xfs_rtrmapbt_destroy_cur_cache(); + xfs_rtrefcountbt_destroy_cur_cache(); } /* Move the btree cursor before the first record. */ diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index f58240adda6f4..64e37a0ffb78e 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -229,6 +229,11 @@ union xfs_btree_irec { struct xfs_refcount_irec rc; }; +struct xbtree_refc { + unsigned int nr_ops; /* # record updates */ + unsigned int shape_changes; /* # of extent splits */ +}; + /* Per-AG btree information. */ struct xfs_btree_cur_ag { struct xfs_perag *pag; @@ -237,10 +242,7 @@ struct xfs_btree_cur_ag { struct xbtree_afakeroot *afake; /* for staging cursor */ }; union { - struct { - unsigned int nr_ops; /* # record updates */ - unsigned int shape_changes; /* # of extent splits */ - } refc; + struct xbtree_refc refc; struct { bool active; /* allocation cursor state */ } abt; @@ -261,6 +263,7 @@ struct xfs_btree_cur_ino { /* For extent swap, ignore owner check in verifier */ #define XFS_BTCUR_BMBT_INVALID_OWNER (1 << 1) + struct xbtree_refc refc; }; /* In-memory btree information */ diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 473bdc2a1ad10..c938b814c430d 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1822,6 +1822,9 @@ typedef __be32 xfs_refcount_ptr_t; */ #define XFS_RTREFC_CRC_MAGIC 0x52434e54 /* 'RCNT' */ +/* inode-rooted btree pointer type */ +typedef __be64 xfs_rtrefcount_ptr_t; + /* * BMAP Btree format definitions * diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h index 102a3574fc682..242b683125662 100644 --- 
a/fs/xfs/libxfs/xfs_ondisk.h +++ b/fs/xfs/libxfs/xfs_ondisk.h @@ -79,6 +79,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo, 48); XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t, 8); XFS_CHECK_STRUCT_SIZE(struct xfs_rtrmap_root, 4); + XFS_CHECK_STRUCT_SIZE(xfs_rtrefcount_ptr_t, 8); /* * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c new file mode 100644 index 0000000000000..f99c7167183e5 --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -0,0 +1,312 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2021-2024 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_alloc.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_rtrefcount_btree.h" +#include "xfs_trace.h" +#include "xfs_cksum.h" +#include "xfs_error.h" +#include "xfs_extent_busy.h" +#include "xfs_rtgroup.h" +#include "xfs_rtbitmap.h" + +static struct kmem_cache *xfs_rtrefcountbt_cur_cache; + +/* + * Realtime Reference Count btree. + * + * This is a btree used to track the owner(s) of a given extent in the realtime + * device. See the comments in xfs_refcount_btree.c for more information. + * + * This tree is basically the same as the regular refcount btree except that + * it's rooted in an inode. + */ + +static struct xfs_btree_cur * +xfs_rtrefcountbt_dup_cursor( + struct xfs_btree_cur *cur) +{ + struct xfs_btree_cur *new; + + new = xfs_rtrefcountbt_init_cursor(cur->bc_mp, cur->bc_tp, + cur->bc_ino.rtg, cur->bc_ino.ip); + + /* Copy the flags values since init cursor doesn't get them. 
*/ + new->bc_ino.flags = cur->bc_ino.flags; + + return new; +} + +static xfs_failaddr_t +xfs_rtrefcountbt_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_target->bt_mount; + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + xfs_failaddr_t fa; + int level; + + if (!xfs_verify_magic(bp, block->bb_magic)) + return __this_address; + + if (!xfs_has_reflink(mp)) + return __this_address; + fa = xfs_btree_lblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN); + if (fa) + return fa; + level = be16_to_cpu(block->bb_level); + if (level > mp->m_rtrefc_maxlevels) + return __this_address; + + return xfs_btree_lblock_verify(bp, mp->m_rtrefc_mxr[level != 0]); +} + +static void +xfs_rtrefcountbt_read_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + if (!xfs_btree_lblock_verify_crc(bp)) + xfs_verifier_error(bp, -EFSBADCRC, __this_address); + else { + fa = xfs_rtrefcountbt_verify(bp); + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + } + + if (bp->b_error) + trace_xfs_btree_corrupt(bp, _RET_IP_); +} + +static void +xfs_rtrefcountbt_write_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + fa = xfs_rtrefcountbt_verify(bp); + if (fa) { + trace_xfs_btree_corrupt(bp, _RET_IP_); + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + return; + } + xfs_btree_lblock_calc_crc(bp); + +} + +const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops = { + .name = "xfs_rtrefcountbt", + .magic = { 0, cpu_to_be32(XFS_RTREFC_CRC_MAGIC) }, + .verify_read = xfs_rtrefcountbt_read_verify, + .verify_write = xfs_rtrefcountbt_write_verify, + .verify_struct = xfs_rtrefcountbt_verify, +}; + +const struct xfs_btree_ops xfs_rtrefcountbt_ops = { + .rec_len = sizeof(struct xfs_refcount_rec), + .key_len = sizeof(struct xfs_refcount_key), + .lru_refs = XFS_REFC_BTREE_REF, + .geom_flags = XFS_BTREE_LONG_PTRS | XFS_BTREE_ROOT_IN_INODE | + XFS_BTREE_CRC_BLOCKS | XFS_BTREE_IROOT_RECORDS, + + .dup_cursor = xfs_rtrefcountbt_dup_cursor, + .buf_ops = &xfs_rtrefcountbt_buf_ops, +}; + +/* Initialize a new 
rt refcount btree cursor. */ +static struct xfs_btree_cur * +xfs_rtrefcountbt_init_common( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)); + + cur = xfs_btree_alloc_cursor(mp, tp, XFS_BTNUM_RTREFC, + &xfs_rtrefcountbt_ops, mp->m_rtrefc_maxlevels, + xfs_rtrefcountbt_cur_cache); + cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_refcbt_2); + + cur->bc_ino.ip = ip; + cur->bc_ino.allocated = 0; + cur->bc_ino.flags = 0; + cur->bc_ino.refc.nr_ops = 0; + cur->bc_ino.refc.shape_changes = 0; + + cur->bc_ino.rtg = xfs_rtgroup_hold(rtg); + return cur; +} + +/* Allocate a new rt refcount btree cursor. */ +struct xfs_btree_cur * +xfs_rtrefcountbt_init_cursor( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + + cur = xfs_rtrefcountbt_init_common(mp, tp, rtg, ip); + cur->bc_nlevels = be16_to_cpu(ifp->if_broot->bb_level) + 1; + cur->bc_ino.forksize = xfs_inode_fork_size(ip, XFS_DATA_FORK); + cur->bc_ino.whichfork = XFS_DATA_FORK; + return cur; +} + +/* Create a new rt refcount btree cursor with a fake root for staging. */ +struct xfs_btree_cur * +xfs_rtrefcountbt_stage_cursor( + struct xfs_mount *mp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake) +{ + struct xfs_btree_cur *cur; + + cur = xfs_rtrefcountbt_init_common(mp, NULL, rtg, ip); + cur->bc_nlevels = ifake->if_levels; + cur->bc_ino.forksize = ifake->if_fork_size; + cur->bc_ino.whichfork = -1; + xfs_btree_stage_ifakeroot(cur, ifake, NULL); + return cur; +} + +/* + * Install a new rt refcount btree root. Caller is responsible for + * invalidating and freeing the old btree blocks. 
+ */ +void +xfs_rtrefcountbt_commit_staged_btree( + struct xfs_btree_cur *cur, + struct xfs_trans *tp) +{ + struct xbtree_ifakeroot *ifake = cur->bc_ino.ifake; + struct xfs_ifork *ifp; + int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; + + ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + + /* + * Free any resources hanging off the real fork, then shallow-copy the + * staging fork's contents into the real fork to transfer everything + * we just built. + */ + ifp = xfs_ifork_ptr(cur->bc_ino.ip, XFS_DATA_FORK); + xfs_idestroy_fork(ifp); + memcpy(ifp, ifake->if_fork, sizeof(struct xfs_ifork)); + + xfs_trans_log_inode(tp, cur->bc_ino.ip, flags); + xfs_btree_commit_ifakeroot(cur, tp, XFS_DATA_FORK, + &xfs_rtrefcountbt_ops); +} + +/* Calculate number of records in a realtime refcount btree block. */ +static inline unsigned int +xfs_rtrefcountbt_block_maxrecs( + unsigned int blocklen, + bool leaf) +{ + + if (leaf) + return blocklen / sizeof(struct xfs_refcount_rec); + return blocklen / (sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); +} + +/* + * Calculate number of records in a refcount btree block. + */ +unsigned int +xfs_rtrefcountbt_maxrecs( + struct xfs_mount *mp, + unsigned int blocklen, + bool leaf) +{ + blocklen -= XFS_RTREFCOUNT_BLOCK_LEN; + return xfs_rtrefcountbt_block_maxrecs(blocklen, leaf); +} + +/* Compute the max possible height for realtime refcount btrees. */ +unsigned int +xfs_rtrefcountbt_maxlevels_ondisk(void) +{ + unsigned int minrecs[2]; + unsigned int blocklen; + + blocklen = XFS_MIN_CRC_BLOCKSIZE - XFS_BTREE_LBLOCK_CRC_LEN; + + minrecs[0] = xfs_rtrefcountbt_block_maxrecs(blocklen, true) / 2; + minrecs[1] = xfs_rtrefcountbt_block_maxrecs(blocklen, false) / 2; + + /* We need at most one record for every block in an rt group. 
*/ + return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS); +} + +int __init +xfs_rtrefcountbt_init_cur_cache(void) +{ + xfs_rtrefcountbt_cur_cache = kmem_cache_create("xfs_rtrefcountbt_cur", + xfs_btree_cur_sizeof( + xfs_rtrefcountbt_maxlevels_ondisk()), + 0, 0, NULL); + + if (!xfs_rtrefcountbt_cur_cache) + return -ENOMEM; + return 0; +} + +void +xfs_rtrefcountbt_destroy_cur_cache(void) +{ + kmem_cache_destroy(xfs_rtrefcountbt_cur_cache); + xfs_rtrefcountbt_cur_cache = NULL; +} + +/* Compute the maximum height of a realtime refcount btree. */ +void +xfs_rtrefcountbt_compute_maxlevels( + struct xfs_mount *mp) +{ + unsigned int d_maxlevels, r_maxlevels; + + if (!xfs_has_rtreflink(mp)) { + mp->m_rtrefc_maxlevels = 0; + return; + } + + /* + * The realtime refcountbt lives on the data device, which means that + * its maximum height is constrained by the size of the data device and + * the height required to store one refcount record for each rtextent + * in an rt group. + */ + d_maxlevels = xfs_btree_space_to_height(mp->m_rtrefc_mnr, + mp->m_sb.sb_dblocks); + r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrefc_mnr, + xfs_rtb_to_rtx(mp, mp->m_sb.sb_rgblocks)); + + /* Add one level to handle the inode root level. */ + mp->m_rtrefc_maxlevels = min(d_maxlevels, r_maxlevels) + 1; +} diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.h b/fs/xfs/libxfs/xfs_rtrefcount_btree.h new file mode 100644 index 0000000000000..6d23ab3a9ad41 --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.h @@ -0,0 +1,71 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (c) 2021-2024 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#ifndef __XFS_RTREFCOUNT_BTREE_H__ +#define __XFS_RTREFCOUNT_BTREE_H__ + +struct xfs_buf; +struct xfs_btree_cur; +struct xfs_mount; +struct xbtree_ifakeroot; +struct xfs_rtgroup; + +/* refcounts only exist on crc enabled filesystems */ +#define XFS_RTREFCOUNT_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN + +struct xfs_btree_cur *xfs_rtrefcountbt_init_cursor(struct xfs_mount *mp, + struct xfs_trans *tp, struct xfs_rtgroup *rtg, + struct xfs_inode *ip); +struct xfs_btree_cur *xfs_rtrefcountbt_stage_cursor(struct xfs_mount *mp, + struct xfs_rtgroup *rtg, struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake); +void xfs_rtrefcountbt_commit_staged_btree(struct xfs_btree_cur *cur, + struct xfs_trans *tp); +unsigned int xfs_rtrefcountbt_maxrecs(struct xfs_mount *mp, + unsigned int blocklen, bool leaf); +void xfs_rtrefcountbt_compute_maxlevels(struct xfs_mount *mp); + +/* + * Addresses of records, keys, and pointers within an incore rtrefcountbt block. + * + * (note that some of these may appear unused, but they are used in userspace) + */ +static inline struct xfs_refcount_rec * +xfs_rtrefcount_rec_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_refcount_rec *) + ((char *)block + XFS_RTREFCOUNT_BLOCK_LEN + + (index - 1) * sizeof(struct xfs_refcount_rec)); +} + +static inline struct xfs_refcount_key * +xfs_rtrefcount_key_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_refcount_key *) + ((char *)block + XFS_RTREFCOUNT_BLOCK_LEN + + (index - 1) * sizeof(struct xfs_refcount_key)); +} + +static inline xfs_rtrefcount_ptr_t * +xfs_rtrefcount_ptr_addr( + struct xfs_btree_block *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrefcount_ptr_t *) + ((char *)block + XFS_RTREFCOUNT_BLOCK_LEN + + maxrecs * sizeof(struct xfs_refcount_key) + + (index - 1) * sizeof(xfs_rtrefcount_ptr_t)); +} + +unsigned int xfs_rtrefcountbt_maxlevels_ondisk(void); +int __init 
xfs_rtrefcountbt_init_cur_cache(void); +void xfs_rtrefcountbt_destroy_cur_cache(void); + +#endif /* __XFS_RTREFCOUNT_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index a5ca8f4f8699f..b75a5bcbdf19e 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -29,6 +29,7 @@ #include "xfs_swapext.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" /* * Physical superblock buffer manipulations. Shared with libxfs in userspace. @@ -1136,6 +1137,13 @@ xfs_sb_mount_common( mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2; mp->m_refc_mnr[1] = mp->m_refc_mxr[1] / 2; + mp->m_rtrefc_mxr[0] = xfs_rtrefcountbt_maxrecs(mp, sbp->sb_blocksize, + true); + mp->m_rtrefc_mxr[1] = xfs_rtrefcountbt_maxrecs(mp, sbp->sb_blocksize, + false); + mp->m_rtrefc_mnr[0] = mp->m_rtrefc_mxr[0] / 2; + mp->m_rtrefc_mnr[1] = mp->m_rtrefc_mxr[1] / 2; + mp->m_bsize = XFS_FSB_TO_BB(mp, 1); mp->m_alloc_set_aside = xfs_alloc_set_aside(mp); mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp); diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index adb742267c9d0..3a7f92a9ac761 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -42,6 +42,7 @@ extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops; extern const struct xfs_buf_ops xfs_rtsummary_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; extern const struct xfs_buf_ops xfs_rtsb_buf_ops; +extern const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops; extern const struct xfs_buf_ops xfs_rtrmapbt_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops; @@ -56,6 +57,7 @@ extern const struct xfs_btree_ops xfs_bmbt_ops; extern const struct xfs_btree_ops xfs_refcountbt_ops; extern const struct xfs_btree_ops xfs_rmapbt_ops; extern const struct xfs_btree_ops xfs_rtrmapbt_ops; +extern const struct xfs_btree_ops xfs_rtrefcountbt_ops; /* log size calculation functions */ int xfs_log_calc_unit_res(struct 
xfs_mount *mp, int unit_bytes); diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 30d2d0c5e5e53..b20e0e6410512 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -37,6 +37,7 @@ #include "xfs_imeta.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" #include "scrub/stats.h" static DEFINE_MUTEX(xfs_uuid_table_mutex); @@ -665,7 +666,10 @@ static inline void xfs_rtbtree_compute_maxlevels( struct xfs_mount *mp) { - mp->m_rtbtree_maxlevels = mp->m_rtrmap_maxlevels; + unsigned int levels; + + levels = max(mp->m_rtrmap_maxlevels, mp->m_rtrefc_maxlevels); + mp->m_rtbtree_maxlevels = levels; } /* @@ -739,6 +743,7 @@ xfs_mountfs( xfs_rmapbt_compute_maxlevels(mp); xfs_rtrmapbt_compute_maxlevels(mp); xfs_refcountbt_compute_maxlevels(mp); + xfs_rtrefcountbt_compute_maxlevels(mp); xfs_agbtree_compute_maxlevels(mp); xfs_rtbtree_compute_maxlevels(mp); diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 21c889a723820..1c99d0630364f 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -138,11 +138,14 @@ typedef struct xfs_mount { uint m_rtrmap_mnr[2]; /* min rtrmap btree records */ uint m_refc_mxr[2]; /* max refc btree records */ uint m_refc_mnr[2]; /* min refc btree records */ + uint m_rtrefc_mxr[2]; /* max rtrefc btree records */ + uint m_rtrefc_mnr[2]; /* min rtrefc btree records */ uint m_alloc_maxlevels; /* max alloc btree levels */ uint m_bm_maxlevels[2]; /* max bmap btree levels */ uint m_rmap_maxlevels; /* max rmap btree levels */ uint m_rtrmap_maxlevels; /* max rtrmap btree level */ uint m_refc_maxlevels; /* max refcount btree level */ + uint m_rtrefc_maxlevels; /* max rtrefc btree level */ unsigned int m_agbtree_maxlevels; /* max level of all AG btrees */ unsigned int m_rtbtree_maxlevels; /* max level of all rt btrees */ xfs_extlen_t m_ag_prealloc_blocks; /* reserved ag blocks */ @@ -379,6 +382,12 @@ static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp) xfs_has_rmapbt(mp); } +static inline bool 
xfs_has_rtreflink(struct xfs_mount *mp) +{ + return xfs_has_metadir(mp) && xfs_has_realtime(mp) && + xfs_has_reflink(mp); +} + /* * Mount features * From patchwork Sun Dec 31 21:45:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507718 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6D0BDBA2E for ; Sun, 31 Dec 2023 21:45:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TQc9bZ6v" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E5A91C433C7; Sun, 31 Dec 2023 21:45:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059152; bh=2LcZaPwPmp6PFPdz+dYMQLgf1E19YPCw1kRIHfY+4Mw=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=TQc9bZ6vzX16Yki7R64GWi41MxN+bXpZ/cejv2c7NfyN+a/zUt+sNx34HHW6Bum03 qSekCqOb7AN/C35s5B563FdY3oIlGAvL0q+Sh310+6oE27gkTCTl0ZSeYyAKAAfsVq I5k4Luv+6SydGtPhtRufUpx+aCoLAZsG+B31yvnpOAX7nZ47Wnh21PQo90mLnsLdIO cao5bjFWJBvYNbNaqs6LfPyx8yf+DFAP19/8NiLLnuhWPBeb5dFRUQNUZk8f4Qm+wH hDzbZ/jilnj5iUIr5vPV+R6QZ5lty+CDArqq+1fQNKbOXyUyVJKkWaWb3kZ56ZBMdq 3ZhjoblZEe/gw== Date: Sun, 31 Dec 2023 13:45:52 -0800 Subject: [PATCH 05/44] xfs: realtime refcount btree transaction reservations From: "Darrick J. 
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851663.1766284.16226171166478684089.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Make sure that there's enough log reservation to handle mapping and unmapping realtime extents. We have to reserve enough space to handle a split in the rtrefcountbt to add the record and a second split in the regular refcountbt to record the rtrefcountbt split. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_trans_resv.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index 423b0cede71cb..5b42603de966f 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -93,6 +93,14 @@ xfs_refcountbt_block_count( return num_ops * (2 * mp->m_refc_maxlevels - 1); } +static unsigned int +xfs_rtrefcountbt_block_count( + struct xfs_mount *mp, + unsigned int num_ops) +{ + return num_ops * (2 * mp->m_rtrefc_maxlevels - 1); +} + /* * Logging inodes is really tricksy. They are logged in memory format, * which means that what we write into the log doesn't directly translate into @@ -260,10 +268,13 @@ xfs_rtalloc_block_count( * Compute the log reservation required to handle the refcount update * transaction. Refcount updates are always done via deferred log items. 
* - * This is calculated as: + * This is calculated as the max of: * Data device refcount updates (t1): * the agfs of the ags containing the blocks: nr_ops * sector size * the refcount btrees: nr_ops * 1 trees * (2 * max depth - 1) * block size + * Realtime refcount updates (t2): + * the rt refcount inode + * the rtrefcount btrees: nr_ops * 1 trees * (2 * max depth - 1) * block size */ static unsigned int xfs_calc_refcountbt_reservation( @@ -271,12 +282,20 @@ xfs_calc_refcountbt_reservation( unsigned int nr_ops) { unsigned int blksz = XFS_FSB_TO_B(mp, 1); + unsigned int t1, t2 = 0; if (!xfs_has_reflink(mp)) return 0; - return xfs_calc_buf_res(nr_ops, mp->m_sb.sb_sectsize) + - xfs_calc_buf_res(xfs_refcountbt_block_count(mp, nr_ops), blksz); + t1 = xfs_calc_buf_res(nr_ops, mp->m_sb.sb_sectsize) + + xfs_calc_buf_res(xfs_refcountbt_block_count(mp, nr_ops), blksz); + + if (xfs_has_realtime(mp)) + t2 = xfs_calc_inode_res(mp, 1) + + xfs_calc_buf_res(xfs_rtrefcountbt_block_count(mp, nr_ops), + blksz); + + return max(t1, t2); } /* From patchwork Sun Dec 31 21:46:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507719 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F16CDBE48 for ; Sun, 31 Dec 2023 21:46:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="h92VJgij" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7A3A1C433C8; Sun, 31 Dec 2023 21:46:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059168; bh=mGuFCys8SsihyQFSgUUmXYgyE8Aw6atUZc6FbwH6Kl0=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=h92VJgij1OS80rdsTdpf5DUiEV5zj5EY8B19AQJWEC9wSJuZiWcqRnNFPu8c4xixr 3a/wOMe6MzK6RVUezWCpprROozJA2a7UbLvrilnnTxt1V12Pd2hXVM0hUO2SXLI/Zm l9/7rLDudOMEWL3bif9B31ecV5yL1nQOT0HWFzMQwBvjn2rVwzytFJTp+bnBJtOkWW 9Bub9H7fihwasttIcot6G6mnMJCU+F5BjDkXn++3hWNaLTGYgvh0x0pVduxOsR6rJH uH2DKw33BFPIHpSVfzBT/xmKAagxTLnY1rEDGsyQtQ3lkUdvm71K/QJn+/+Y6GOlhh 28DOmc1w3MC+Q== Date: Sun, 31 Dec 2023 13:46:08 -0800 Subject: [PATCH 06/44] xfs: add realtime refcount btree operations From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851679.1766284.16783207471035270784.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Implement the generic btree operations needed to manipulate rtrefcount btree blocks. This is different from the regular refcountbt in that we allocate space from the filesystem at large, and are neither constrained to the free space nor any particular AG. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_rtrefcount_btree.c | 148 ++++++++++++++++++++++++++++++++++ 1 file changed, 148 insertions(+) diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c index f99c7167183e5..0892c4ddc7adf 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -19,6 +19,7 @@ #include "xfs_btree.h" #include "xfs_btree_staging.h" #include "xfs_rtrefcount_btree.h" +#include "xfs_refcount.h" #include "xfs_trace.h" #include "xfs_cksum.h" #include "xfs_error.h" @@ -53,6 +54,106 @@ xfs_rtrefcountbt_dup_cursor( return new; } +STATIC int +xfs_rtrefcountbt_get_minrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrefcountbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0) / 2; + } + + return cur->bc_mp->m_rtrefc_mnr[level != 0]; +} + +STATIC int +xfs_rtrefcountbt_get_maxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrefcountbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0); + } + + return cur->bc_mp->m_rtrefc_mxr[level != 0]; +} + +STATIC void +xfs_rtrefcountbt_init_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + key->refc.rc_startblock = rec->refc.rc_startblock; +} + +STATIC void +xfs_rtrefcountbt_init_high_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + __u32 x; + + x = be32_to_cpu(rec->refc.rc_startblock); + x += be32_to_cpu(rec->refc.rc_blockcount) - 1; + key->refc.rc_startblock = cpu_to_be32(x); +} + +STATIC void +xfs_rtrefcountbt_init_rec_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_rec *rec) +{ + const struct xfs_refcount_irec *irec = &cur->bc_rec.rc; + uint32_t start; + + start = xfs_refcount_encode_startblock(irec->rc_startblock, + irec->rc_domain); + rec->refc.rc_startblock = 
cpu_to_be32(start); + rec->refc.rc_blockcount = cpu_to_be32(cur->bc_rec.rc.rc_blockcount); + rec->refc.rc_refcount = cpu_to_be32(cur->bc_rec.rc.rc_refcount); +} + +STATIC void +xfs_rtrefcountbt_init_ptr_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr) +{ + ptr->l = 0; +} + +STATIC int64_t +xfs_rtrefcountbt_key_diff( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key) +{ + const struct xfs_refcount_key *kp = &key->refc; + const struct xfs_refcount_irec *irec = &cur->bc_rec.rc; + uint32_t start; + + start = xfs_refcount_encode_startblock(irec->rc_startblock, + irec->rc_domain); + return (int64_t)be32_to_cpu(kp->rc_startblock) - start; +} + +STATIC int64_t +xfs_rtrefcountbt_diff_two_keys( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, + const union xfs_btree_key *k2, + const union xfs_btree_key *mask) +{ + ASSERT(!mask || mask->refc.rc_startblock); + + return (int64_t)be32_to_cpu(k1->refc.rc_startblock) - + be32_to_cpu(k2->refc.rc_startblock); +} + static xfs_failaddr_t xfs_rtrefcountbt_verify( struct xfs_buf *bp) @@ -119,6 +220,40 @@ const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops = { .verify_struct = xfs_rtrefcountbt_verify, }; +STATIC int +xfs_rtrefcountbt_keys_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, + const union xfs_btree_key *k2) +{ + return be32_to_cpu(k1->refc.rc_startblock) < + be32_to_cpu(k2->refc.rc_startblock); +} + +STATIC int +xfs_rtrefcountbt_recs_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_rec *r1, + const union xfs_btree_rec *r2) +{ + return be32_to_cpu(r1->refc.rc_startblock) + + be32_to_cpu(r1->refc.rc_blockcount) <= + be32_to_cpu(r2->refc.rc_startblock); +} + +STATIC enum xbtree_key_contig +xfs_rtrefcountbt_keys_contiguous( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key1, + const union xfs_btree_key *key2, + const union xfs_btree_key *mask) +{ + ASSERT(!mask || mask->refc.rc_startblock); + + return 
xbtree_key_contig(be32_to_cpu(key1->refc.rc_startblock), + be32_to_cpu(key2->refc.rc_startblock)); +} + const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .rec_len = sizeof(struct xfs_refcount_rec), .key_len = sizeof(struct xfs_refcount_key), @@ -127,7 +262,20 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = { XFS_BTREE_CRC_BLOCKS | XFS_BTREE_IROOT_RECORDS, .dup_cursor = xfs_rtrefcountbt_dup_cursor, + .alloc_block = xfs_btree_alloc_imeta_block, + .free_block = xfs_btree_free_imeta_block, + .get_minrecs = xfs_rtrefcountbt_get_minrecs, + .get_maxrecs = xfs_rtrefcountbt_get_maxrecs, + .init_key_from_rec = xfs_rtrefcountbt_init_key_from_rec, + .init_high_key_from_rec = xfs_rtrefcountbt_init_high_key_from_rec, + .init_rec_from_cur = xfs_rtrefcountbt_init_rec_from_cur, + .init_ptr_from_cur = xfs_rtrefcountbt_init_ptr_from_cur, + .key_diff = xfs_rtrefcountbt_key_diff, .buf_ops = &xfs_rtrefcountbt_buf_ops, + .diff_two_keys = xfs_rtrefcountbt_diff_two_keys, + .keys_inorder = xfs_rtrefcountbt_keys_inorder, + .recs_inorder = xfs_rtrefcountbt_recs_inorder, + .keys_contiguous = xfs_rtrefcountbt_keys_contiguous, }; /* Initialize a new rt refcount btree cursor. */ From patchwork Sun Dec 31 21:46:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507720 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4B60FC147 for ; Sun, 31 Dec 2023 21:46:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="lx/6Avp3" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 17872C433C8; Sun, 31 Dec 2023 21:46:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059184; bh=pRFGtQGLibC8wfaokqTD/GXOcQqskZY77SX2EwFMbsI=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=lx/6Avp3yK4YDBTt23m2YFnN8NyKHurS/QP0kcFZHlOiZw0jqnP3em2D+2vvPt9R5 eBd1ropSZqrlr0cmDvCWddBSKX411p5FByppYvlqreBRqX9OXAqwDk6JG455aIqPXu /OxqqTUkvuJ1onza/Yv9Fdw+xyhRzJEUgPu2BEKm3qcQgqgha1CsyCAImj/V68sbiO tPwzk4bpc9xm6NVojS8RtjKB/uHSeEQ0E0sbQnTwt/sHwShazaR5NUpzwhq76ZqFwM ffqsw0l4AJYobhv0fGEerBfsDuzV5mOM+MhlB4yE/yvktAz8HHFmMIzr7mTU0rfi1y BnriEjAg0PGyg== Date: Sun, 31 Dec 2023 13:46:23 -0800 Subject: [PATCH 07/44] xfs: prepare refcount functions to deal with rtrefcountbt From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851694.1766284.12214907704820439385.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Prepare the high-level refcount functions to deal with the new realtime refcountbt and its slightly different conventions. Provide the ability to talk to either refcountbt or rtrefcountbt formats from the same high level code. 
Note that we leave the _recover_cow_leftovers functions for a separate patch so that we can convert it all at once. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_refcount.c | 93 ++++++++++++++++++++++++++++++++++-------- fs/xfs/libxfs/xfs_refcount.h | 3 + 2 files changed, 78 insertions(+), 18 deletions(-) diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index b29a718737c59..269b950399071 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -25,6 +25,7 @@ #include "xfs_ag.h" #include "xfs_health.h" #include "xfs_refcount_item.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_refcount_intent_cache; @@ -41,6 +42,16 @@ STATIC int __xfs_refcount_cow_alloc(struct xfs_btree_cur *rcur, STATIC int __xfs_refcount_cow_free(struct xfs_btree_cur *rcur, xfs_agblock_t agbno, xfs_extlen_t aglen); +/* Return the maximum startblock number of the refcountbt. */ +static inline xfs_agblock_t +xrefc_max_startblock( + struct xfs_btree_cur *cur) +{ + if (cur->bc_btnum == XFS_BTNUM_RTREFC) + return cur->bc_mp->m_sb.sb_rgblocks; + return cur->bc_mp->m_sb.sb_agblocks; +} + /* * Look up the first record less than or equal to [bno, len] in the btree * given by cur. 
@@ -144,6 +155,37 @@ xfs_refcount_check_irec( return NULL; } +xfs_failaddr_t +xfs_rtrefcount_check_irec( + struct xfs_rtgroup *rtg, + const struct xfs_refcount_irec *irec) +{ + if (irec->rc_blockcount == 0 || irec->rc_blockcount > XFS_REFC_LEN_MAX) + return __this_address; + + if (!xfs_refcount_check_domain(irec)) + return __this_address; + + /* check for valid extent range, including overflow */ + if (!xfs_verify_rgbext(rtg, irec->rc_startblock, irec->rc_blockcount)) + return __this_address; + + if (irec->rc_refcount == 0 || irec->rc_refcount > XFS_REFC_REFCOUNT_MAX) + return __this_address; + + return NULL; +} + +static inline xfs_failaddr_t +xfs_refcount_check_btrec( + struct xfs_btree_cur *cur, + const struct xfs_refcount_irec *irec) +{ + if (cur->bc_btnum == XFS_BTNUM_RTREFC) + return xfs_rtrefcount_check_irec(cur->bc_ino.rtg, irec); + return xfs_refcount_check_irec(cur->bc_ag.pag, irec); +} + static inline int xfs_refcount_complain_bad_rec( struct xfs_btree_cur *cur, @@ -152,9 +194,15 @@ xfs_refcount_complain_bad_rec( { struct xfs_mount *mp = cur->bc_mp; - xfs_warn(mp, + if (cur->bc_btnum == XFS_BTNUM_RTREFC) { + xfs_warn(mp, + "RT Refcount BTree record corruption in rtgroup %u detected at %pS!", + cur->bc_ino.rtg->rtg_rgno, fa); + } else { + xfs_warn(mp, "Refcount BTree record corruption in AG %d detected at %pS!", cur->bc_ag.pag->pag_agno, fa); + } xfs_warn(mp, "Start block 0x%x, block count 0x%x, references 0x%x", irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount); @@ -180,7 +228,7 @@ xfs_refcount_get_rec( return error; xfs_refcount_btrec_to_irec(rec, irec); - fa = xfs_refcount_check_irec(cur->bc_ag.pag, irec); + fa = xfs_refcount_check_btrec(cur, irec); if (fa) return xfs_refcount_complain_bad_rec(cur, fa, irec); @@ -1047,6 +1095,15 @@ xfs_refcount_merge_extents( return 0; } +static inline struct xbtree_refc * +xrefc_btree_state( + struct xfs_btree_cur *cur) +{ + if (cur->bc_btnum == XFS_BTNUM_RTREFC) + return &cur->bc_ino.refc; + return 
&cur->bc_ag.refc; +} + /* * XXX: This is a pretty hand-wavy estimate. The penalty for guessing * true incorrectly is a shutdown FS; the penalty for guessing false @@ -1064,25 +1121,25 @@ xfs_refcount_still_have_space( * to handle each of the shape changes to the refcount btree. */ overhead = xfs_allocfree_block_count(cur->bc_mp, - cur->bc_ag.refc.shape_changes); - overhead += cur->bc_mp->m_refc_maxlevels; + xrefc_btree_state(cur)->shape_changes); + overhead += cur->bc_maxlevels; overhead *= cur->bc_mp->m_sb.sb_blocksize; /* * Only allow 2 refcount extent updates per transaction if the * refcount continue update "error" has been injected. */ - if (cur->bc_ag.refc.nr_ops > 2 && + if (xrefc_btree_state(cur)->nr_ops > 2 && XFS_TEST_ERROR(false, cur->bc_mp, XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE)) return false; - if (cur->bc_ag.refc.nr_ops == 0) + if (xrefc_btree_state(cur)->nr_ops == 0) return true; else if (overhead > cur->bc_tp->t_log_res) return false; return cur->bc_tp->t_log_res - overhead > - cur->bc_ag.refc.nr_ops * XFS_REFCOUNT_ITEM_OVERHEAD; + xrefc_btree_state(cur)->nr_ops * XFS_REFCOUNT_ITEM_OVERHEAD; } /* @@ -1117,7 +1174,7 @@ xfs_refcount_adjust_extents( if (error) goto out_error; if (!found_rec || ext.rc_domain != XFS_REFC_DOMAIN_SHARED) { - ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks; + ext.rc_startblock = xrefc_max_startblock(cur); ext.rc_blockcount = 0; ext.rc_refcount = 0; ext.rc_domain = XFS_REFC_DOMAIN_SHARED; @@ -1141,7 +1198,7 @@ xfs_refcount_adjust_extents( * Either cover the hole (increment) or * delete the range (decrement). 
*/ - cur->bc_ag.refc.nr_ops++; + xrefc_btree_state(cur)->nr_ops++; if (tmp.rc_refcount) { error = xfs_refcount_insert(cur, &tmp, &found_tmp); @@ -1201,7 +1258,7 @@ xfs_refcount_adjust_extents( goto skip; ext.rc_refcount += adj; trace_xfs_refcount_modify_extent(cur, &ext); - cur->bc_ag.refc.nr_ops++; + xrefc_btree_state(cur)->nr_ops++; if (ext.rc_refcount > 1) { error = xfs_refcount_update(cur, &ext); if (error) @@ -1287,7 +1344,7 @@ xfs_refcount_adjust( if (shape_changed) shape_changes++; if (shape_changes) - cur->bc_ag.refc.shape_changes++; + xrefc_btree_state(cur)->shape_changes++; /* Now that we've taken care of the ends, adjust the middle extents */ error = xfs_refcount_adjust_extents(cur, agbno, aglen, adj); @@ -1361,8 +1418,8 @@ xfs_refcount_finish_one( * the startblock, get one now. */ if (rcur != NULL && rcur->bc_ag.pag != ri->ri_pag) { - nr_ops = rcur->bc_ag.refc.nr_ops; - shape_changes = rcur->bc_ag.refc.shape_changes; + nr_ops = xrefc_btree_state(rcur)->nr_ops; + shape_changes = xrefc_btree_state(rcur)->shape_changes; xfs_btree_del_cursor(rcur, 0); rcur = NULL; *pcur = NULL; @@ -1375,8 +1432,8 @@ xfs_refcount_finish_one( *pcur = rcur = xfs_refcountbt_init_cursor(mp, tp, agbp, ri->ri_pag); - rcur->bc_ag.refc.nr_ops = nr_ops; - rcur->bc_ag.refc.shape_changes = shape_changes; + xrefc_btree_state(rcur)->nr_ops = nr_ops; + xrefc_btree_state(rcur)->shape_changes = shape_changes; } switch (ri->ri_type) { @@ -1667,7 +1724,7 @@ xfs_refcount_adjust_cow_extents( goto out_error; } if (!found_rec) { - ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks; + ext.rc_startblock = xrefc_max_startblock(cur); ext.rc_blockcount = 0; ext.rc_refcount = 0; ext.rc_domain = XFS_REFC_DOMAIN_COW; @@ -1878,7 +1935,7 @@ xfs_refcount_recover_extent( INIT_LIST_HEAD(&rr->rr_list); xfs_refcount_btrec_to_irec(rec, &rr->rr_rrec); - if (xfs_refcount_check_irec(cur->bc_ag.pag, &rr->rr_rrec) != NULL || + if (xfs_refcount_check_btrec(cur, &rr->rr_rrec) != NULL || XFS_IS_CORRUPT(cur->bc_mp, 
rr->rr_rrec.rc_domain != XFS_REFC_DOMAIN_COW)) { xfs_btree_mark_sick(cur); @@ -2027,7 +2084,7 @@ xfs_refcount_query_range_helper( xfs_failaddr_t fa; xfs_refcount_btrec_to_irec(rec, &irec); - fa = xfs_refcount_check_irec(cur->bc_ag.pag, &irec); + fa = xfs_refcount_check_btrec(cur, &irec); if (fa) return xfs_refcount_complain_bad_rec(cur, fa, &irec); diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h index 68acb0b1b4a87..13344b402a72c 100644 --- a/fs/xfs/libxfs/xfs_refcount.h +++ b/fs/xfs/libxfs/xfs_refcount.h @@ -12,6 +12,7 @@ struct xfs_perag; struct xfs_btree_cur; struct xfs_bmbt_irec; struct xfs_refcount_irec; +struct xfs_rtgroup; extern int xfs_refcount_lookup_le(struct xfs_btree_cur *cur, enum xfs_refc_domain domain, xfs_agblock_t bno, int *stat); @@ -120,6 +121,8 @@ extern void xfs_refcount_btrec_to_irec(const union xfs_btree_rec *rec, struct xfs_refcount_irec *irec); xfs_failaddr_t xfs_refcount_check_irec(struct xfs_perag *pag, const struct xfs_refcount_irec *irec); +xfs_failaddr_t xfs_rtrefcount_check_irec(struct xfs_rtgroup *rtg, + const struct xfs_refcount_irec *irec); extern int xfs_refcount_insert(struct xfs_btree_cur *cur, struct xfs_refcount_irec *irec, int *stat); From patchwork Sun Dec 31 21:46:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507721 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 55E39B645 for ; Sun, 31 Dec 2023 21:46:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="XMuA0h9V" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B4E4EC433C7; Sun, 31 Dec 2023 21:46:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059199; bh=rdMSgtqrB0GGQgN2cAYt7ZeVcVYdKCTtqB2yATG13dE=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=XMuA0h9Vdo50LZdacZTTTVCkNzClPkguew2lVcrnj3HlZsQ6QZHaiCkVbZcOWk3q/ tje7TkkS8AjSMkMRNuxRCsuvTMLK7NSKmHxqckq5ykbNED9GDFxMWyMlK7xqO/ipYt gfAU7b72T/KxBo5ZXJJmS19OCZwbV4ZTbkNFz6W+wrxdfr6534h+IKoTjcnkx+0NB6 PMt9iuMR1HgjrIxsA/rVdA1JOPEragtvu3ZeOZW5mGLFjLb+V+3Io0LzpUgLmTwprg /AIeyxAcRY6FO/OsRykW/fX7AAhTOCE7BJTJLgDMgUmv6CMZ9RDC1kDgT5LQwLVqYs /ynjWWeCUo5kQ== Date: Sun, 31 Dec 2023 13:46:39 -0800 Subject: [PATCH 08/44] xfs: add a realtime flag to the refcount update log redo items From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851710.1766284.16254295578848537162.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Extend the refcount update (CUI) log items with a new realtime flag that indicates that the updates apply against the realtime refcountbt. We'll wire up the actual refcount code later. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_bmap.c | 10 +- fs/xfs/libxfs/xfs_defer.h | 1 fs/xfs/libxfs/xfs_log_format.h | 6 + fs/xfs/libxfs/xfs_log_recover.h | 2 fs/xfs/libxfs/xfs_refcount.c | 72 ++++++++---- fs/xfs/libxfs/xfs_refcount.h | 22 ++-- fs/xfs/scrub/cow_repair.c | 2 fs/xfs/scrub/reap.c | 5 - fs/xfs/xfs_log_recover.c | 2 fs/xfs/xfs_refcount_item.c | 242 +++++++++++++++++++++++++++++++++++++-- fs/xfs/xfs_reflink.c | 19 ++- 11 files changed, 330 insertions(+), 53 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 992e492972e76..9a285f38da4cd 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4492,8 +4492,9 @@ xfs_bmapi_write( * the refcount btree for orphan recovery. */ if (whichfork == XFS_COW_FORK) - xfs_refcount_alloc_cow_extent(tp, bma.blkno, - bma.length); + xfs_refcount_alloc_cow_extent(tp, + XFS_IS_REALTIME_INODE(ip), + bma.blkno, bma.length); } /* Deal with the allocated space we found. */ @@ -4659,7 +4660,8 @@ xfs_bmapi_convert_delalloc( *seq = READ_ONCE(ifp->if_seq); if (whichfork == XFS_COW_FORK) - xfs_refcount_alloc_cow_extent(tp, bma.blkno, bma.length); + xfs_refcount_alloc_cow_extent(tp, XFS_IS_REALTIME_INODE(ip), + bma.blkno, bma.length); error = xfs_bmap_btree_to_extents(tp, ip, bma.cur, &bma.logflags, whichfork); @@ -5271,7 +5273,7 @@ xfs_bmap_del_extent_real( */ if (want_free) { if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { - xfs_refcount_decrease_extent(tp, del); + xfs_refcount_decrease_extent(tp, isrt, del); } else { unsigned int efi_flags = 0; diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h index fddcb4cccbcc2..a351f00e6d78d 100644 --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -68,6 +68,7 @@ struct xfs_defer_op_type { extern const struct xfs_defer_op_type xfs_bmap_update_defer_type; extern const struct xfs_defer_op_type xfs_refcount_update_defer_type; +extern const struct xfs_defer_op_type xfs_rtrefcount_update_defer_type; extern const struct 
xfs_defer_op_type xfs_rmap_update_defer_type; extern const struct xfs_defer_op_type xfs_rtrmap_update_defer_type; extern const struct xfs_defer_op_type xfs_extent_free_defer_type; diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index ea4e88d665707..a888e3d98a3de 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -252,6 +252,8 @@ typedef struct xfs_trans_header { #define XFS_LI_EFD_RT 0x124b /* realtime extent free done */ #define XFS_LI_RUI_RT 0x124c /* realtime rmap update intent */ #define XFS_LI_RUD_RT 0x124d /* realtime rmap update done */ +#define XFS_LI_CUI_RT 0x124e /* realtime refcount update intent */ +#define XFS_LI_CUD_RT 0x124f /* realtime refcount update done */ #define XFS_LI_TYPE_DESC \ { XFS_LI_EFI, "XFS_LI_EFI" }, \ @@ -275,7 +277,9 @@ typedef struct xfs_trans_header { { XFS_LI_EFI_RT, "XFS_LI_EFI_RT" }, \ { XFS_LI_EFD_RT, "XFS_LI_EFD_RT" }, \ { XFS_LI_RUI_RT, "XFS_LI_RUI_RT" }, \ - { XFS_LI_RUD_RT, "XFS_LI_RUD_RT" } + { XFS_LI_RUD_RT, "XFS_LI_RUD_RT" }, \ + { XFS_LI_CUI_RT, "XFS_LI_CUI_RT" }, \ + { XFS_LI_CUD_RT, "XFS_LI_CUD_RT" } /* * Inode Log Item Format definitions. diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h index 433974693d10b..0ea9a6db24b84 100644 --- a/fs/xfs/libxfs/xfs_log_recover.h +++ b/fs/xfs/libxfs/xfs_log_recover.h @@ -81,6 +81,8 @@ extern const struct xlog_recover_item_ops xlog_rtefi_item_ops; extern const struct xlog_recover_item_ops xlog_rtefd_item_ops; extern const struct xlog_recover_item_ops xlog_rtrui_item_ops; extern const struct xlog_recover_item_ops xlog_rtrud_item_ops; +extern const struct xlog_recover_item_ops xlog_rtcui_item_ops; +extern const struct xlog_recover_item_ops xlog_rtcud_item_ops; /* * Macros, structures, prototypes for internal log manager use. 
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 269b950399071..2ae126d3bd7ff 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -26,6 +26,7 @@ #include "xfs_health.h" #include "xfs_refcount_item.h" #include "xfs_rtgroup.h" +#include "xfs_rtalloc.h" struct kmem_cache *xfs_refcount_intent_cache; @@ -1142,6 +1143,28 @@ xfs_refcount_still_have_space( xrefc_btree_state(cur)->nr_ops * XFS_REFCOUNT_ITEM_OVERHEAD; } +/* Schedule an extent free. */ +static int +xrefc_free_extent( + struct xfs_btree_cur *cur, + struct xfs_refcount_irec *rec) +{ + xfs_fsblock_t fsbno; + unsigned int flags = 0; + + if (cur->bc_btnum == XFS_BTNUM_RTREFC) { + flags |= XFS_FREE_EXTENT_REALTIME; + fsbno = xfs_rgbno_to_rtb(cur->bc_mp, cur->bc_ino.rtg->rtg_rgno, + rec->rc_startblock); + } else { + fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno, + rec->rc_startblock); + } + + return xfs_free_extent_later(cur->bc_tp, fsbno, rec->rc_blockcount, + NULL, XFS_AG_RESV_NONE, flags); +} + /* * Adjust the refcounts of middle extents. At this point we should have * split extents that crossed the adjustment range; merged with adjacent @@ -1158,7 +1181,6 @@ xfs_refcount_adjust_extents( struct xfs_refcount_irec ext, tmp; int error; int found_rec, found_tmp; - xfs_fsblock_t fsbno; /* Merging did all the work already. 
*/ if (*aglen == 0) @@ -1211,12 +1233,7 @@ xfs_refcount_adjust_extents( goto out_error; } } else { - fsbno = XFS_AGB_TO_FSB(cur->bc_mp, - cur->bc_ag.pag->pag_agno, - tmp.rc_startblock); - error = xfs_free_extent_later(cur->bc_tp, fsbno, - tmp.rc_blockcount, NULL, - XFS_AG_RESV_NONE, 0); + error = xrefc_free_extent(cur, &tmp); if (error) goto out_error; } @@ -1274,12 +1291,7 @@ xfs_refcount_adjust_extents( } goto advloop; } else { - fsbno = XFS_AGB_TO_FSB(cur->bc_mp, - cur->bc_ag.pag->pag_agno, - ext.rc_startblock); - error = xfs_free_extent_later(cur->bc_tp, fsbno, - ext.rc_blockcount, NULL, - XFS_AG_RESV_NONE, 0); + error = xrefc_free_extent(cur, &ext); if (error) goto out_error; } @@ -1474,6 +1486,20 @@ xfs_refcount_finish_one( return error; } +/* + * Process one of the deferred realtime refcount operations. We pass back the + * btree cursor to maintain our lock on the btree between calls. + */ +int +xfs_rtrefcount_finish_one( + struct xfs_trans *tp, + struct xfs_refcount_intent *ri, + struct xfs_btree_cur **pcur) +{ + ASSERT(0); + return -EFSCORRUPTED; +} + /* * Record a refcount intent for later processing. 
*/ @@ -1481,6 +1507,7 @@ static void __xfs_refcount_add( struct xfs_trans *tp, enum xfs_refcount_intent_type type, + bool isrt, xfs_fsblock_t startblock, xfs_extlen_t blockcount) { @@ -1492,6 +1519,7 @@ __xfs_refcount_add( ri->ri_type = type; ri->ri_startblock = startblock; ri->ri_blockcount = blockcount; + ri->ri_realtime = isrt; xfs_refcount_defer_add(tp, ri); } @@ -1502,12 +1530,13 @@ __xfs_refcount_add( void xfs_refcount_increase_extent( struct xfs_trans *tp, + bool isrt, struct xfs_bmbt_irec *PREV) { if (!xfs_has_reflink(tp->t_mountp)) return; - __xfs_refcount_add(tp, XFS_REFCOUNT_INCREASE, PREV->br_startblock, + __xfs_refcount_add(tp, XFS_REFCOUNT_INCREASE, isrt, PREV->br_startblock, PREV->br_blockcount); } @@ -1517,12 +1546,13 @@ xfs_refcount_increase_extent( void xfs_refcount_decrease_extent( struct xfs_trans *tp, + bool isrt, struct xfs_bmbt_irec *PREV) { if (!xfs_has_reflink(tp->t_mountp)) return; - __xfs_refcount_add(tp, XFS_REFCOUNT_DECREASE, PREV->br_startblock, + __xfs_refcount_add(tp, XFS_REFCOUNT_DECREASE, isrt, PREV->br_startblock, PREV->br_blockcount); } @@ -1878,6 +1908,7 @@ __xfs_refcount_cow_free( void xfs_refcount_alloc_cow_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsb, xfs_extlen_t len) { @@ -1886,16 +1917,17 @@ xfs_refcount_alloc_cow_extent( if (!xfs_has_reflink(mp)) return; - __xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len); + __xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, isrt, fsb, len); /* Add rmap entry */ - xfs_rmap_alloc_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); + xfs_rmap_alloc_extent(tp, isrt, fsb, len, XFS_RMAP_OWN_COW); } /* Forget a CoW staging event in the refcount btree. 
*/ void xfs_refcount_free_cow_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsb, xfs_extlen_t len) { @@ -1905,8 +1937,8 @@ xfs_refcount_free_cow_extent( return; /* Remove rmap entry */ - xfs_rmap_free_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); - __xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len); + xfs_rmap_free_extent(tp, isrt, fsb, len, XFS_RMAP_OWN_COW); + __xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, isrt, fsb, len); } struct xfs_refcount_recovery { @@ -2013,7 +2045,7 @@ xfs_refcount_recover_cow_leftovers( /* Free the orphan record */ fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, rr->rr_rrec.rc_startblock); - xfs_refcount_free_cow_extent(tp, fsb, + xfs_refcount_free_cow_extent(tp, false, fsb, rr->rr_rrec.rc_blockcount); /* Free the block. */ diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h index 13344b402a72c..56e5834feb624 100644 --- a/fs/xfs/libxfs/xfs_refcount.h +++ b/fs/xfs/libxfs/xfs_refcount.h @@ -57,10 +57,14 @@ enum xfs_refcount_intent_type { struct xfs_refcount_intent { struct list_head ri_list; - struct xfs_perag *ri_pag; + union { + struct xfs_perag *ri_pag; + struct xfs_rtgroup *ri_rtg; + }; enum xfs_refcount_intent_type ri_type; xfs_extlen_t ri_blockcount; xfs_fsblock_t ri_startblock; + bool ri_realtime; }; /* Check that the refcount is appropriate for the record domain. 
*/ @@ -75,22 +79,24 @@ xfs_refcount_check_domain( return true; } -void xfs_refcount_increase_extent(struct xfs_trans *tp, +void xfs_refcount_increase_extent(struct xfs_trans *tp, bool isrt, struct xfs_bmbt_irec *irec); -void xfs_refcount_decrease_extent(struct xfs_trans *tp, +void xfs_refcount_decrease_extent(struct xfs_trans *tp, bool isrt, struct xfs_bmbt_irec *irec); -extern int xfs_refcount_finish_one(struct xfs_trans *tp, +int xfs_refcount_finish_one(struct xfs_trans *tp, + struct xfs_refcount_intent *ri, struct xfs_btree_cur **pcur); +int xfs_rtrefcount_finish_one(struct xfs_trans *tp, struct xfs_refcount_intent *ri, struct xfs_btree_cur **pcur); extern int xfs_refcount_find_shared(struct xfs_btree_cur *cur, xfs_agblock_t agbno, xfs_extlen_t aglen, xfs_agblock_t *fbno, xfs_extlen_t *flen, bool find_end_of_shared); -void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, xfs_fsblock_t fsb, - xfs_extlen_t len); -void xfs_refcount_free_cow_extent(struct xfs_trans *tp, xfs_fsblock_t fsb, - xfs_extlen_t len); +void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, bool isrt, + xfs_fsblock_t fsb, xfs_extlen_t len); +void xfs_refcount_free_cow_extent(struct xfs_trans *tp, bool isrt, + xfs_fsblock_t fsb, xfs_extlen_t len); extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp, struct xfs_perag *pag); diff --git a/fs/xfs/scrub/cow_repair.c b/fs/xfs/scrub/cow_repair.c index 1e82c727af8ed..e48d869986f34 100644 --- a/fs/xfs/scrub/cow_repair.c +++ b/fs/xfs/scrub/cow_repair.c @@ -344,7 +344,7 @@ xrep_cow_alloc( if (args.fsbno == NULLFSBLOCK) return -ENOSPC; - xfs_refcount_alloc_cow_extent(sc->tp, args.fsbno, args.len); + xfs_refcount_alloc_cow_extent(sc->tp, false, args.fsbno, args.len); repl->fsbno = args.fsbno; repl->len = args.len; diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c index b8c48e36d2a8d..bb28c2d2b8780 100644 --- a/fs/xfs/scrub/reap.c +++ b/fs/xfs/scrub/reap.c @@ -420,7 +420,8 @@ xreap_agextent_iter( * records from the refcountbt, which 
will remove the * rmap record as well. */ - xfs_refcount_free_cow_extent(sc->tp, fsbno, *aglenp); + xfs_refcount_free_cow_extent(sc->tp, false, fsbno, + *aglenp); return 0; } @@ -452,7 +453,7 @@ xreap_agextent_iter( if (rs->oinfo == &XFS_RMAP_OINFO_COW) { ASSERT(rs->resv == XFS_AG_RESV_NONE); - xfs_refcount_free_cow_extent(sc->tp, fsbno, *aglenp); + xfs_refcount_free_cow_extent(sc->tp, false, fsbno, *aglenp); error = xfs_free_extent_later(sc->tp, fsbno, *aglenp, NULL, rs->resv, XFS_FREE_EXTENT_SKIP_DISCARD); if (error) diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 1efb69fcadf10..46b4ea4cce15a 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -1797,6 +1797,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = { &xlog_rtefd_item_ops, &xlog_rtrui_item_ops, &xlog_rtrud_item_ops, + &xlog_rtcui_item_ops, + &xlog_rtcud_item_ops, }; static const struct xlog_recover_item_ops * diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index bec3b91e826a4..4d5941335bc75 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -23,6 +23,7 @@ #include "xfs_ag.h" #include "xfs_btree.h" #include "xfs_trace.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_cui_cache; struct kmem_cache *xfs_cud_cache; @@ -94,8 +95,9 @@ xfs_cui_item_format( ASSERT(atomic_read(&cuip->cui_next_extent) == cuip->cui_format.cui_nextents); + ASSERT(lip->li_type == XFS_LI_CUI || lip->li_type == XFS_LI_CUI_RT); - cuip->cui_format.cui_type = XFS_LI_CUI; + cuip->cui_format.cui_type = lip->li_type; cuip->cui_format.cui_size = 1; xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_CUI_FORMAT, &cuip->cui_format, @@ -138,12 +140,15 @@ xfs_cui_item_release( STATIC struct xfs_cui_log_item * xfs_cui_init( struct xfs_mount *mp, + unsigned short item_type, uint nextents) { struct xfs_cui_log_item *cuip; ASSERT(nextents > 0); + ASSERT(item_type == XFS_LI_CUI || item_type == XFS_LI_CUI_RT); + if (nextents > XFS_CUI_MAX_FAST_EXTENTS) 
cuip = kmem_zalloc(xfs_cui_log_item_sizeof(nextents), 0); @@ -151,7 +156,7 @@ xfs_cui_init( cuip = kmem_cache_zalloc(xfs_cui_cache, GFP_KERNEL | __GFP_NOFAIL); - xfs_log_item_init(mp, &cuip->cui_item, XFS_LI_CUI, &xfs_cui_item_ops); + xfs_log_item_init(mp, &cuip->cui_item, item_type, &xfs_cui_item_ops); cuip->cui_format.cui_nextents = nextents; cuip->cui_format.cui_id = (uintptr_t)(void *)cuip; atomic_set(&cuip->cui_next_extent, 0); @@ -190,7 +195,9 @@ xfs_cud_item_format( struct xfs_cud_log_item *cudp = CUD_ITEM(lip); struct xfs_log_iovec *vecp = NULL; - cudp->cud_format.cud_type = XFS_LI_CUD; + ASSERT(lip->li_type == XFS_LI_CUD || lip->li_type == XFS_LI_CUD_RT); + + cudp->cud_format.cud_type = lip->li_type; cudp->cud_format.cud_size = 1; xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_CUD_FORMAT, &cudp->cud_format, @@ -234,6 +241,14 @@ static inline struct xfs_refcount_intent *ci_entry(const struct list_head *e) return list_entry(e, struct xfs_refcount_intent, ri_list); } +static inline bool +xfs_cui_item_isrt(const struct xfs_log_item *lip) +{ + ASSERT(lip->li_type == XFS_LI_CUI || lip->li_type == XFS_LI_CUI_RT); + + return lip->li_type == XFS_LI_CUI_RT; +} + /* Sort refcount intents by AG. */ static int xfs_refcount_update_diff_items( @@ -289,11 +304,12 @@ xfs_refcount_update_create_intent( bool sort) { struct xfs_mount *mp = tp->t_mountp; - struct xfs_cui_log_item *cuip = xfs_cui_init(mp, count); + struct xfs_cui_log_item *cuip; struct xfs_refcount_intent *ri; ASSERT(count > 0); + cuip = xfs_cui_init(mp, XFS_LI_CUI, count); if (sort) list_sort(mp, items, xfs_refcount_update_diff_items); list_for_each_entry(ri, items, ri_list) @@ -301,6 +317,12 @@ xfs_refcount_update_create_intent( return &cuip->cui_item; } +static inline unsigned short +xfs_cud_type_from_cui(const struct xfs_cui_log_item *cuip) +{ + return xfs_cui_item_isrt(&cuip->cui_item) ? XFS_LI_CUD_RT : XFS_LI_CUD; +} + /* Get an CUD so we can process all the deferred refcount updates. 
*/ static struct xfs_log_item * xfs_refcount_update_create_done( @@ -312,8 +334,8 @@ xfs_refcount_update_create_done( struct xfs_cud_log_item *cudp; cudp = kmem_cache_zalloc(xfs_cud_cache, GFP_KERNEL | __GFP_NOFAIL); - xfs_log_item_init(tp->t_mountp, &cudp->cud_item, XFS_LI_CUD, - &xfs_cud_item_ops); + xfs_log_item_init(tp->t_mountp, &cudp->cud_item, + xfs_cud_type_from_cui(cuip), &xfs_cud_item_ops); cudp->cud_cuip = cuip; cudp->cud_format.cud_cui_id = cuip->cui_format.cui_id; @@ -330,8 +352,22 @@ xfs_refcount_defer_add( trace_xfs_refcount_defer(mp, ri); - ri->ri_pag = xfs_perag_intent_get(mp, ri->ri_startblock); - xfs_defer_add(tp, &ri->ri_list, &xfs_refcount_update_defer_type); + /* + * Deferred refcount updates for the realtime and data sections must + * use separate transactions to finish deferred work because updates to + * realtime metadata files can lock AGFs to allocate btree blocks and + * we don't want that mixing with the AGF locks taken to finish data + * section updates. + */ + if (ri->ri_realtime) { + ri->ri_rtg = xfs_rtgroup_intent_get(mp, ri->ri_startblock); + xfs_defer_add(tp, &ri->ri_list, + &xfs_rtrefcount_update_defer_type); + } else { + ri->ri_pag = xfs_perag_intent_get(mp, ri->ri_startblock); + xfs_defer_add(tp, &ri->ri_list, + &xfs_refcount_update_defer_type); + } } /* Cancel a deferred refcount update. 
*/ @@ -514,10 +550,12 @@ xfs_refcount_relog_intent( struct xfs_phys_extent *pmap; unsigned int count; + ASSERT(intent->li_type == XFS_LI_CUI || intent->li_type == XFS_LI_CUI_RT); + count = CUI_ITEM(intent)->cui_format.cui_nextents; pmap = CUI_ITEM(intent)->cui_format.cui_extents; - cuip = xfs_cui_init(tp->t_mountp, count); + cuip = xfs_cui_init(tp->t_mountp, intent->li_type, count); memcpy(cuip->cui_format.cui_extents, pmap, count * sizeof(*pmap)); atomic_set(&cuip->cui_next_extent, count); @@ -537,6 +575,105 @@ const struct xfs_defer_op_type xfs_refcount_update_defer_type = { .relog_intent = xfs_refcount_relog_intent, }; +#ifdef CONFIG_XFS_RT +/* Sort refcount intents by rtgroup. */ +static int +xfs_rtrefcount_update_diff_items( + void *priv, + const struct list_head *a, + const struct list_head *b) +{ + struct xfs_refcount_intent *ra = ci_entry(a); + struct xfs_refcount_intent *rb = ci_entry(b); + + return ra->ri_rtg->rtg_rgno - rb->ri_rtg->rtg_rgno; +} + +static struct xfs_log_item * +xfs_rtrefcount_update_create_intent( + struct xfs_trans *tp, + struct list_head *items, + unsigned int count, + bool sort) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_cui_log_item *cuip; + struct xfs_refcount_intent *ri; + + ASSERT(count > 0); + + cuip = xfs_cui_init(mp, XFS_LI_CUI_RT, count); + if (sort) + list_sort(mp, items, xfs_rtrefcount_update_diff_items); + list_for_each_entry(ri, items, ri_list) + xfs_refcount_update_log_item(tp, cuip, ri); + return &cuip->cui_item; +} + +/* Cancel a deferred realtime refcount update. */ +STATIC void +xfs_rtrefcount_update_cancel_item( + struct list_head *item) +{ + struct xfs_refcount_intent *ri = ci_entry(item); + + xfs_rtgroup_intent_put(ri->ri_rtg); + kmem_cache_free(xfs_refcount_intent_cache, ri); +} + +/* Process a deferred realtime refcount update. 
*/ +STATIC int +xfs_rtrefcount_update_finish_item( + struct xfs_trans *tp, + struct xfs_log_item *done, + struct list_head *item, + struct xfs_btree_cur **state) +{ + struct xfs_refcount_intent *ri = ci_entry(item); + int error; + + error = xfs_rtrefcount_finish_one(tp, ri, state); + + /* Did we run out of reservation? Requeue what we didn't finish. */ + if (!error && ri->ri_blockcount > 0) { + ASSERT(ri->ri_type == XFS_REFCOUNT_INCREASE || + ri->ri_type == XFS_REFCOUNT_DECREASE); + return -EAGAIN; + } + + xfs_rtrefcount_update_cancel_item(item); + return error; +} + +/* Clean up after calling xfs_rtrefcount_finish_one. */ +STATIC void +xfs_rtrefcount_finish_one_cleanup( + struct xfs_trans *tp, + struct xfs_btree_cur *rcur, + int error) +{ + if (rcur) + xfs_btree_del_cursor(rcur, error); +} + +const struct xfs_defer_op_type xfs_rtrefcount_update_defer_type = { + .name = "rtrefcount", + .max_items = XFS_CUI_MAX_FAST_EXTENTS, + .create_intent = xfs_rtrefcount_update_create_intent, + .abort_intent = xfs_refcount_update_abort_intent, + .create_done = xfs_refcount_update_create_done, + .finish_item = xfs_rtrefcount_update_finish_item, + .finish_cleanup = xfs_rtrefcount_finish_one_cleanup, + .cancel_item = xfs_rtrefcount_update_cancel_item, + .recover_work = xfs_refcount_recover_work, + .relog_intent = xfs_refcount_relog_intent, +}; +#else +const struct xfs_defer_op_type xfs_rtrefcount_update_defer_type = { + .name = "rtrefcount", +}; +#endif /* CONFIG_XFS_RT */ + STATIC bool xfs_cui_item_match( struct xfs_log_item *lip, @@ -602,7 +739,7 @@ xlog_recover_cui_commit_pass2( return -EFSCORRUPTED; } - cuip = xfs_cui_init(mp, cui_formatp->cui_nextents); + cuip = xfs_cui_init(mp, ITEM_TYPE(item), cui_formatp->cui_nextents); xfs_cui_copy_format(&cuip->cui_format, cui_formatp); atomic_set(&cuip->cui_next_extent, cui_formatp->cui_nextents); @@ -616,6 +753,61 @@ const struct xlog_recover_item_ops xlog_cui_item_ops = { .commit_pass2 = xlog_recover_cui_commit_pass2, }; +#ifdef 
CONFIG_XFS_RT +STATIC int +xlog_recover_rtcui_commit_pass2( + struct xlog *log, + struct list_head *buffer_list, + struct xlog_recover_item *item, + xfs_lsn_t lsn) +{ + struct xfs_mount *mp = log->l_mp; + struct xfs_cui_log_item *cuip; + struct xfs_cui_log_format *cui_formatp; + size_t len; + + cui_formatp = item->ri_buf[0].i_addr; + + if (item->ri_buf[0].i_len < xfs_cui_log_format_sizeof(0)) { + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, + item->ri_buf[0].i_addr, item->ri_buf[0].i_len); + return -EFSCORRUPTED; + } + + len = xfs_cui_log_format_sizeof(cui_formatp->cui_nextents); + if (item->ri_buf[0].i_len != len) { + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, + item->ri_buf[0].i_addr, item->ri_buf[0].i_len); + return -EFSCORRUPTED; + } + + cuip = xfs_cui_init(mp, ITEM_TYPE(item), cui_formatp->cui_nextents); + xfs_cui_copy_format(&cuip->cui_format, cui_formatp); + atomic_set(&cuip->cui_next_extent, cui_formatp->cui_nextents); + + xlog_recover_intent_item(log, &cuip->cui_item, lsn, + &xfs_rtrefcount_update_defer_type); + return 0; +} +#else +STATIC int +xlog_recover_rtcui_commit_pass2( + struct xlog *log, + struct list_head *buffer_list, + struct xlog_recover_item *item, + xfs_lsn_t lsn) +{ + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, log->l_mp, + item->ri_buf[0].i_addr, item->ri_buf[0].i_len); + return -EFSCORRUPTED; +} +#endif + +const struct xlog_recover_item_ops xlog_rtcui_item_ops = { + .item_type = XFS_LI_CUI_RT, + .commit_pass2 = xlog_recover_rtcui_commit_pass2, +}; + /* * This routine is called when an CUD format structure is found in a committed * transaction in the log. 
Its purpose is to cancel the corresponding CUI if it @@ -647,3 +839,33 @@ const struct xlog_recover_item_ops xlog_cud_item_ops = { .item_type = XFS_LI_CUD, .commit_pass2 = xlog_recover_cud_commit_pass2, }; + +#ifdef CONFIG_XFS_RT +STATIC int +xlog_recover_rtcud_commit_pass2( + struct xlog *log, + struct list_head *buffer_list, + struct xlog_recover_item *item, + xfs_lsn_t lsn) +{ + struct xfs_cud_log_format *cud_formatp; + + cud_formatp = item->ri_buf[0].i_addr; + if (item->ri_buf[0].i_len != sizeof(struct xfs_cud_log_format)) { + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, log->l_mp, + item->ri_buf[0].i_addr, item->ri_buf[0].i_len); + return -EFSCORRUPTED; + } + + xlog_recover_release_intent(log, XFS_LI_CUI_RT, + cud_formatp->cud_cui_id); + return 0; +} +#else +# define xlog_recover_rtcud_commit_pass2 xlog_recover_rtcui_commit_pass2 +#endif + +const struct xlog_recover_item_ops xlog_rtcud_item_ops = { + .item_type = XFS_LI_CUD_RT, + .commit_pass2 = xlog_recover_rtcud_commit_pass2, +}; diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 7e9273cc16e14..591782ca7d284 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -585,6 +585,7 @@ xfs_reflink_cancel_cow_blocks( struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_COW_FORK); struct xfs_bmbt_irec got, del; struct xfs_iext_cursor icur; + bool isrt = XFS_IS_REALTIME_INODE(ip); int error = 0; if (!xfs_inode_has_cow_data(ip)) @@ -614,12 +615,13 @@ xfs_reflink_cancel_cow_blocks( ASSERT((*tpp)->t_highest_agno == NULLAGNUMBER); /* Free the CoW orphan record. */ - xfs_refcount_free_cow_extent(*tpp, del.br_startblock, - del.br_blockcount); + xfs_refcount_free_cow_extent(*tpp, isrt, + del.br_startblock, del.br_blockcount); error = xfs_free_extent_later(*tpp, del.br_startblock, del.br_blockcount, NULL, - XFS_AG_RESV_NONE, 0); + XFS_AG_RESV_NONE, + isrt ? 
XFS_FREE_EXTENT_REALTIME : 0); if (error) break; @@ -729,6 +731,7 @@ xfs_reflink_end_cow_extent( struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_COW_FORK); unsigned int resblks; int nmaps; + bool isrt = XFS_IS_REALTIME_INODE(ip); int error; /* No COW extents? That's easy! */ @@ -807,7 +810,7 @@ xfs_reflink_end_cow_extent( * or not), unmap the extent and drop its refcount. */ xfs_bmap_unmap_extent(tp, ip, XFS_DATA_FORK, &data); - xfs_refcount_decrease_extent(tp, &data); + xfs_refcount_decrease_extent(tp, isrt, &data); xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -data.br_blockcount); } else if (data.br_startblock == DELAYSTARTBLOCK) { @@ -827,7 +830,8 @@ xfs_reflink_end_cow_extent( } /* Free the CoW orphan record. */ - xfs_refcount_free_cow_extent(tp, del.br_startblock, del.br_blockcount); + xfs_refcount_free_cow_extent(tp, isrt, del.br_startblock, + del.br_blockcount); /* Map the new blocks into the data fork. */ xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &del); @@ -1164,6 +1168,7 @@ xfs_reflink_remap_extent( bool quota_reserved = true; bool smap_real; bool dmap_written = xfs_bmap_is_written_extent(dmap); + bool isrt = XFS_IS_REALTIME_INODE(ip); int iext_delta = 0; int nimaps; int error; @@ -1295,7 +1300,7 @@ xfs_reflink_remap_extent( * or not), unmap the extent and drop its refcount. */ xfs_bmap_unmap_extent(tp, ip, XFS_DATA_FORK, &smap); - xfs_refcount_decrease_extent(tp, &smap); + xfs_refcount_decrease_extent(tp, isrt, &smap); qdelta -= smap.br_blockcount; } else if (smap.br_startblock == DELAYSTARTBLOCK) { int done; @@ -1318,7 +1323,7 @@ xfs_reflink_remap_extent( * its refcount and map it into the file. 
*/ if (dmap_written) { - xfs_refcount_increase_extent(tp, dmap); + xfs_refcount_increase_extent(tp, isrt, dmap); xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, dmap); qdelta += dmap->br_blockcount; } From patchwork Sun Dec 31 21:46:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507722 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9AE49BE4A for ; Sun, 31 Dec 2023 21:46:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="edcOqYKx" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 72A59C433C8; Sun, 31 Dec 2023 21:46:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059215; bh=zmZX8SYGsrwnvNQcHgZ4qxtShWHy8z+EB7e94IYIKk0=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=edcOqYKx0W0PaeC3A/HQSV2OHssUDMF4vlFLaEqeXPC3Y6bAG4a0MckQbLrf8BEzz UasQoxs6h7D6BRrWy0DBEHZ6//DbwBYgOgwkCMQCLmr/+JT12Xw/yhCConggBs2l8m zkD4gvWaM3dQS1Y5rhDqzTd70LRsMZuE/nDE/oJifRrwJbqNIA2UbrNuChEch9vwhc dGMr1CUId1WkaOvss1e+9gGIQYaleWkP7jHVqG95M538VKB0HpKPB9cx0XoRR1BNFL 5CSS4PJlsf5A+ITA107KUTF2qxG/9csjZuYoE76RGXqFP1kfheOmj3CNshlXRGwdse CSahirHoF0YrQ== Date: Sun, 31 Dec 2023 13:46:54 -0800 Subject: [PATCH 09/44] xfs: support recovering refcount intent items targetting realtime extents From: "Darrick J. 
Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404851727.1766284.18367389082794551051.stgit@frogsfrogsfrogs>
In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs>
References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Now that we have reflink on the realtime device, refcount intent items
have to support remapping extents on the realtime volume.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_refcount_item.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 4d5941335bc75..11df3aaa7b3f7 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -433,6 +433,7 @@ xfs_refcount_update_abort_intent(
 static inline bool
 xfs_cui_validate_phys(
 	struct xfs_mount *mp,
+	bool isrt,
 	struct xfs_phys_extent *pmap)
 {
 	if (!xfs_has_reflink(mp))
@@ -451,6 +452,9 @@ xfs_cui_validate_phys(
 		return false;
 	}
 
+	if (isrt)
+		return xfs_verify_rtbext(mp, pmap->pe_startblock, pmap->pe_len);
+
 	return xfs_verify_fsbext(mp, pmap->pe_startblock, pmap->pe_len);
 }
 
@@ -458,6 +462,7 @@ static inline void
 xfs_cui_recover_work(
 	struct xfs_mount *mp,
 	struct xfs_defer_pending *dfp,
+	bool isrt,
 	struct xfs_phys_extent *pmap)
 {
 	struct xfs_refcount_intent *ri;
@@ -467,7 +472,12 @@ xfs_cui_recover_work(
 	ri->ri_type = pmap->pe_flags & XFS_REFCOUNT_EXTENT_TYPE_MASK;
 	ri->ri_startblock = pmap->pe_startblock;
 	ri->ri_blockcount = pmap->pe_len;
-	ri->ri_pag = xfs_perag_intent_get(mp, pmap->pe_startblock);
+	ri->ri_realtime = isrt;
+	if (isrt) {
+		ri->ri_rtg = xfs_rtgroup_intent_get(mp, pmap->pe_startblock);
+	} else {
+		ri->ri_pag = xfs_perag_intent_get(mp, pmap->pe_startblock);
+	}
 
 	xfs_defer_add_item(dfp, &ri->ri_list);
 }
@@ -486,6 +496,7 @@ xfs_refcount_recover_work(
struct xfs_cui_log_item *cuip = CUI_ITEM(lip); struct xfs_trans *tp; struct xfs_mount *mp = lip->li_log->l_mp; + bool isrt = xfs_cui_item_isrt(lip); int i; int error = 0; @@ -495,7 +506,7 @@ xfs_refcount_recover_work( * just toss the CUI. */ for (i = 0; i < cuip->cui_format.cui_nextents; i++) { - if (!xfs_cui_validate_phys(mp, + if (!xfs_cui_validate_phys(mp, isrt, &cuip->cui_format.cui_extents[i])) { XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, &cuip->cui_format, @@ -503,7 +514,8 @@ xfs_refcount_recover_work( return -EFSCORRUPTED; } - xfs_cui_recover_work(mp, dfp, &cuip->cui_format.cui_extents[i]); + xfs_cui_recover_work(mp, dfp, isrt, + &cuip->cui_format.cui_extents[i]); } /* From patchwork Sun Dec 31 21:47:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507723 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 52CEEBE48 for ; Sun, 31 Dec 2023 21:47:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Pd4jPxM8" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1DD62C433C7; Sun, 31 Dec 2023 21:47:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059231; bh=JHmeHBKLmKbnpHhi6WprZOGcLny5LV4Sa19vbbGv7B4=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Pd4jPxM8IPRFszW+jAVRPKuMn3Yc4NW+QbXNH7VICz2cQ1OO774HyuucE4jwmePTk 1HL1cwfk16bVsAzYIw4idDiB3ufZc2/NeRNy2HO1fZ0YA602KmEgp4Je/CsRaiKdPR MvkTIjsBOSZJuWLUjLgtZT9yNkvhJV54QVTEJMm4U2t7SELajGvhxKanMR/qdN3avk nl0UMeSN1jvAnlUN/Zn9whr1c5Fx4O/zAn2L/6hne5tnn27GvloiXUlUHnlOgcZZNY ShMMXICkpUgIucjghr86wgswj/ly1uKFQAOUlHTlF+y1GzIEHSKl2EQSLR5Pp1RC6e WQWAsCGMsGHiw== Date: 
Sun, 31 Dec 2023 13:47:10 -0800 Subject: [PATCH 10/44] xfs: add realtime refcount btree block detection to log recovery From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851743.1766284.10397022615627765088.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Identify rt refcount btree blocks in the log correctly so that we can validate them during log recovery. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_buf_item_recover.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index c7d86636bd312..ebe7f2c3cf635 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -268,6 +268,9 @@ xlog_recover_validate_buf_type( case XFS_REFC_CRC_MAGIC: bp->b_ops = &xfs_refcountbt_buf_ops; break; + case XFS_RTREFC_CRC_MAGIC: + bp->b_ops = &xfs_rtrefcountbt_buf_ops; + break; default: warnmsg = "Bad btree block magic!"; break; @@ -772,6 +775,7 @@ xlog_recover_get_buf_lsn( break; } case XFS_RTRMAP_CRC_MAGIC: + case XFS_RTREFC_CRC_MAGIC: case XFS_BMAP_CRC_MAGIC: case XFS_BMAP_MAGIC: { struct xfs_btree_block *btb = blk; From patchwork Sun Dec 31 21:47:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507724 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 41407BE47 for ; Sun, 31 Dec 2023 21:47:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="INYB9HLh" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BB5E4C433C7; Sun, 31 Dec 2023 21:47:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059246; bh=6eVUvKBbRXJDGQRPfWoPr2Wx8W21OJraHdeR/5aXpHI=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=INYB9HLhnMh/GpT05jkkN/04KN8gw072aBYrYACUvqOD4sudbNnOF6YL1Ek0l6Ti7 gNvOrRawLiGgRs1FSWdW4cjDdsu2551bQPGlqXgUaZ4XQ+rClj+hEq5BqRzqujNUmQ oEI+9xLlMSrXtxXva9F5rN3N4aAwsL8zKC8K9MyQIa6hhzGPxRSXYrUxSvluHOV+CG Is7i3eTKXUsxL63IcEl8ZLS4ndix7RgkStXGyMjJgMCI77qGUu3upoHHwLxHu1iY0s jS/W2daMgk9dCjKgiSOXPXFAhhhqNBYrvybs4uwbWo3eRKV2gqcJyN4kqqp5nSArff y/0bec9+NmgpA== Date: Sun, 31 Dec 2023 13:47:26 -0800 Subject: [PATCH 11/44] xfs: add realtime refcount btree inode to metadata directory From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851758.1766284.13403280825223969168.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Add a metadir path to select the realtime refcount btree inode and load it at mount time. The rtrefcountbt inode will have a unique extent format code, which means that we also have to update the inode validation and flush routines to look for it. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_bmap.c | 8 +++- fs/xfs/libxfs/xfs_format.h | 4 ++ fs/xfs/libxfs/xfs_inode_buf.c | 8 ++++ fs/xfs/libxfs/xfs_inode_fork.c | 9 +++++ fs/xfs/libxfs/xfs_rtgroup.h | 3 ++ fs/xfs/libxfs/xfs_rtrefcount_btree.c | 33 ++++++++++++++++++ fs/xfs/libxfs/xfs_rtrefcount_btree.h | 4 ++ fs/xfs/xfs_inode.c | 13 +++++++ fs/xfs/xfs_inode_item.c | 2 + fs/xfs/xfs_inode_item_recover.c | 1 + fs/xfs/xfs_rtalloc.c | 63 ++++++++++++++++++++++++++++++++++ fs/xfs/xfs_trace.h | 1 + 12 files changed, 146 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 9a285f38da4cd..27f992dc6d2d6 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5111,9 +5111,13 @@ xfs_bmap_del_extent_real( * the same order of operations as the data device, which is: * Remove the file mapping, remove the reverse mapping, and * then free the blocks. This means that we must delay the - * freeing until after we've scheduled the rmap update. + * freeing until after we've scheduled the rmap update. If + * realtime reflink is enabled, use deferred refcount intent + * items to decide what to do with the extent, just like we do + * for the data device. 
*/ - if (want_free && !xfs_has_rtrmapbt(mp)) { + if (want_free && !xfs_has_rtrmapbt(mp) && + !xfs_has_rtreflink(mp)) { error = xfs_rtfree_blocks(tp, del->br_startblock, del->br_blockcount); if (error) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index c938b814c430d..93a9b8e3b5694 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1028,6 +1028,7 @@ enum xfs_dinode_fmt { XFS_DINODE_FMT_BTREE, /* struct xfs_bmdr_block */ XFS_DINODE_FMT_UUID, /* added long ago, but never used */ XFS_DINODE_FMT_RMAP, /* reverse mapping btree */ + XFS_DINODE_FMT_REFCOUNT, /* reference count btree */ }; #define XFS_INODE_FORMAT_STR \ @@ -1036,7 +1037,8 @@ enum xfs_dinode_fmt { { XFS_DINODE_FMT_EXTENTS, "extent" }, \ { XFS_DINODE_FMT_BTREE, "btree" }, \ { XFS_DINODE_FMT_UUID, "uuid" }, \ - { XFS_DINODE_FMT_RMAP, "rmap" } + { XFS_DINODE_FMT_RMAP, "rmap" }, \ + { XFS_DINODE_FMT_REFCOUNT, "refcount" } /* * Max values for extnum and aextnum. diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index dae2efec1d5d0..6e08ff8d8e239 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -417,6 +417,12 @@ xfs_dinode_verify_fork( if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADIR))) return __this_address; break; + case XFS_DINODE_FMT_REFCOUNT: + if (!xfs_has_rtreflink(mp)) + return __this_address; + if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADIR))) + return __this_address; + break; default: return __this_address; } @@ -437,6 +443,7 @@ xfs_dinode_verify_forkoff( return __this_address; break; case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: if (!(xfs_has_metadir(mp) && xfs_has_parent(mp))) return __this_address; fallthrough; @@ -708,6 +715,7 @@ xfs_dinode_verify( if (flags2 & XFS_DIFLAG2_METADIR) { switch (XFS_DFORK_FORMAT(dip, XFS_DATA_FORK)) { case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: break; default: if (nextents + naextents == 0 && nblocks != 0) diff --git 
a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index e7ab04aea2db6..ae6e7deb04106 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -271,6 +271,11 @@ xfs_iformat_data_fork( return -EFSCORRUPTED; } return xfs_iformat_rtrmap(ip, dip); + case XFS_DINODE_FMT_REFCOUNT: + if (!xfs_has_rtreflink(ip->i_mount)) + return -EFSCORRUPTED; + ASSERT(0); /* to be implemented later */ + return -EFSCORRUPTED; default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -666,6 +671,10 @@ xfs_iflush_fork( xfs_iflush_rtrmap(ip, dip); break; + case XFS_DINODE_FMT_REFCOUNT: + ASSERT(0); /* to be implemented later */ + break; + default: ASSERT(0); break; diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 3522527e553b8..bd88a4d728135 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -25,6 +25,9 @@ struct xfs_rtgroup { /* reverse mapping btree inode */ struct xfs_inode *rtg_rmapip; + /* refcount btree inode */ + struct xfs_inode *rtg_refcountip; + /* Number of blocks in this group */ xfs_rgblock_t rtg_blockcount; diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c index 0892c4ddc7adf..ead6baf6de7e4 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -26,6 +26,7 @@ #include "xfs_extent_busy.h" #include "xfs_rtgroup.h" #include "xfs_rtbitmap.h" +#include "xfs_imeta.h" static struct kmem_cache *xfs_rtrefcountbt_cur_cache; @@ -355,6 +356,7 @@ xfs_rtrefcountbt_commit_staged_btree( int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + ASSERT(ifake->if_fork->if_format == XFS_DINODE_FMT_REFCOUNT); /* * Free any resources hanging off the real fork, then shallow-copy the @@ -458,3 +460,34 @@ xfs_rtrefcountbt_compute_maxlevels( /* Add one level to handle the inode root level. 
*/ mp->m_rtrefc_maxlevels = min(d_maxlevels, r_maxlevels) + 1; } + +#define XFS_RTREFC_NAMELEN 21 + +/* Create the metadata directory path for an rtrefcount btree inode. */ +int +xfs_rtrefcountbt_create_path( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp) +{ + struct xfs_imeta_path *path; + unsigned char *fname; + int error; + + error = xfs_imeta_create_file_path(mp, 2, &path); + if (error) + return error; + + fname = kmalloc(XFS_RTREFC_NAMELEN, GFP_KERNEL); + if (!fname) { + xfs_imeta_free_path(path); + return -ENOMEM; + } + + snprintf(fname, XFS_RTREFC_NAMELEN, "%u.refcount", rgno); + path->im_path[0] = "realtime"; + path->im_path[1] = fname; + path->im_dynamicmask = 0x2; + *pathp = path; + return 0; +} diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.h b/fs/xfs/libxfs/xfs_rtrefcount_btree.h index 6d23ab3a9ad41..ff49e95d1a490 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.h +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.h @@ -11,6 +11,7 @@ struct xfs_btree_cur; struct xfs_mount; struct xbtree_ifakeroot; struct xfs_rtgroup; +struct xfs_imeta_path; /* refcounts only exist on crc enabled filesystems */ #define XFS_RTREFCOUNT_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN @@ -68,4 +69,7 @@ unsigned int xfs_rtrefcountbt_maxlevels_ondisk(void); int __init xfs_rtrefcountbt_init_cur_cache(void); void xfs_rtrefcountbt_destroy_cur_cache(void); +int xfs_rtrefcountbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index b7cda81161fb5..334f87f7a8f3f 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2509,6 +2509,14 @@ xfs_iflush( __func__, ip->i_ino, ip); goto flush_out; } + } else if (ip->i_df.if_format == XFS_DINODE_FMT_REFCOUNT) { + if (!S_ISREG(VFS_I(ip)->i_mode) || + !(ip->i_diflags2 & XFS_DIFLAG2_METADIR)) { + xfs_alert_tag(mp, XFS_PTAG_IFLUSH, + "%s: Bad rt refcountbt inode %Lu, ptr "PTR_FMT, + __func__, 
ip->i_ino, ip); + goto flush_out; + } } else if (S_ISREG(VFS_I(ip)->i_mode)) { if (XFS_TEST_ERROR( ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS && @@ -2555,6 +2563,11 @@ xfs_iflush( "%s: rt rmapbt in inode %Lu attr fork, ptr "PTR_FMT, __func__, ip->i_ino, ip); goto flush_out; + } else if (ip->i_af.if_format == XFS_DINODE_FMT_REFCOUNT) { + xfs_alert_tag(mp, XFS_PTAG_IFLUSH, + "%s: rt refcountbt in inode %Lu attr fork, ptr "PTR_FMT, + __func__, ip->i_ino, ip); + goto flush_out; } } diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index 2903c101505f5..fdc0b14bb9fbb 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -211,6 +211,7 @@ xfs_inode_item_data_fork_size( break; case XFS_DINODE_FMT_BTREE: case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: if ((iip->ili_fields & XFS_ILOG_DBROOT) && ip->i_df.if_broot_bytes > 0) { *nbytes += ip->i_df.if_broot_bytes; @@ -332,6 +333,7 @@ xfs_inode_item_format_data_fork( break; case XFS_DINODE_FMT_BTREE: case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: iip->ili_fields &= ~(XFS_ILOG_DDATA | XFS_ILOG_DEXT | XFS_ILOG_DEV); diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index 4cf967df28ef1..2b5f7a143c479 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -420,6 +420,7 @@ xlog_recover_inode_commit_pass2( if (unlikely(S_ISREG(ldip->di_mode))) { if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) && (ldip->di_format != XFS_DINODE_FMT_RMAP) && + (ldip->di_format != XFS_DINODE_FMT_REFCOUNT) && (ldip->di_format != XFS_DINODE_FMT_BTREE)) { XFS_CORRUPTION_ERROR( "Bad log dinode data fork format for regular file", diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 37156cf8acd25..4102a46ed274d 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -31,6 +31,7 @@ #include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" #include "xfs_trace.h" +#include "xfs_rtrefcount_btree.h" /* * Realtime metadata files are not quite 
regular files because userspace can't @@ -42,6 +43,7 @@ static struct lock_class_key xfs_rbmip_key; static struct lock_class_key xfs_rsumip_key; static struct lock_class_key xfs_rrmapip_key; +static struct lock_class_key xfs_rrefcountip_key; /* * Read and return the summary information for a given extent size, @@ -1835,6 +1837,53 @@ xfs_rtmount_iread_extents( return error; } +/* Load realtime refcount btree inode. */ +STATIC int +xfs_rtmount_refcountbt( + struct xfs_rtgroup *rtg, + struct xfs_trans *tp) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_imeta_path *path; + struct xfs_inode *ip; + xfs_ino_t ino; + int error; + + if (!xfs_has_rtreflink(mp)) + return 0; + + error = xfs_rtrefcountbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + return error; + + error = xfs_imeta_lookup(tp, path, &ino); + if (error) + goto out_path; + + if (ino == NULLFSINO) { + error = -EFSCORRUPTED; + goto out_path; + } + + error = xfs_rt_iget(tp, ino, &xfs_rrefcountip_key, &ip); + if (error) + goto out_path; + + if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_REFCOUNT)) { + error = -EFSCORRUPTED; + goto out_rele; + } + + rtg->rtg_refcountip = ip; + ip = NULL; +out_rele: + if (ip) + xfs_imeta_irele(ip); +out_path: + xfs_imeta_free_path(path); + return error; +} + /* * Get the bitmap and summary inodes and the summary cache into the mount * structure at mount time. 
@@ -1886,6 +1935,12 @@ xfs_rtmount_inodes( xfs_rtgroup_rele(rtg); goto out_rele_rtgroup; } + + error = xfs_rtmount_refcountbt(rtg, tp); + if (error) { + xfs_rtgroup_rele(rtg); + goto out_rele_rtgroup; + } } xfs_alloc_rsum_cache(mp, sbp->sb_rbmblocks); @@ -1894,6 +1949,10 @@ xfs_rtmount_inodes( out_rele_rtgroup: for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_refcountip) + xfs_imeta_irele(rtg->rtg_refcountip); + rtg->rtg_refcountip = NULL; + if (rtg->rtg_rmapip) xfs_imeta_irele(rtg->rtg_rmapip); rtg->rtg_rmapip = NULL; @@ -1917,6 +1976,10 @@ xfs_rtunmount_inodes( kmem_free(mp->m_rsum_cache); for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_refcountip) + xfs_imeta_irele(rtg->rtg_refcountip); + rtg->rtg_refcountip = NULL; + if (rtg->rtg_rmapip) xfs_imeta_irele(rtg->rtg_rmapip); rtg->rtg_rmapip = NULL; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 86e0aa946aa00..f94f144f9a39d 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2225,6 +2225,7 @@ TRACE_DEFINE_ENUM(XFS_DINODE_FMT_EXTENTS); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_BTREE); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_UUID); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_RMAP); +TRACE_DEFINE_ENUM(XFS_DINODE_FMT_REFCOUNT); DECLARE_EVENT_CLASS(xfs_swap_extent_class, TP_PROTO(struct xfs_inode *ip, int which), From patchwork Sun Dec 31 21:47:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507725 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 89888BE47 for ; Sun, 31 Dec 2023 21:47:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Up15CKmJ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 58CA8C433C8; Sun, 31 Dec 2023 21:47:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059262; bh=2V6R5jbIGh5H1Han7a9SUUCwTBzDY2NZAyDOWoivsM0=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Up15CKmJFZ/RKREco6AC3HHy4hOTSjNvSgAMfaJ9/G4vIwvRZ9b3CYZoXUtadi9B/ V7MP33hjh25u39crexqYsm5NWaeWqZtgeXXx1aatBmuskYBpy3zh7ncHFaZG9El7Xl gIsILzp1wB+oYAtyDMsrVUhT+mVBMZFMGmjemN+ccb73Kzz3WvLb2fJyWcNPw3m4G6 5zRgAUeJmpacmcOdpaMwF3OzrAX/H0FTeA8XrFCpCpMDFh8/oBDhHiUug9ki1FP3Sv YOubJb74y47oDUnZSawrnq+NIQrLYj/5SWWpm90tglDTJG50vS6DQsQxPzia46KVKn iqSVaYED2hXdw== Date: Sun, 31 Dec 2023 13:47:41 -0800 Subject: [PATCH 12/44] xfs: add metadata reservations for realtime refcount btree From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851776.1766284.11118894100439868574.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Reserve some free blocks so that we will always have enough free blocks in the data volume to handle expansion of the realtime refcount btree. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_rtrefcount_btree.c | 39 ++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrefcount_btree.h | 4 +++ fs/xfs/xfs_rtalloc.c | 9 +++++++- 3 files changed, 51 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c index ead6baf6de7e4..e1e8a3ea32091 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -491,3 +491,42 @@ xfs_rtrefcountbt_create_path( *pathp = path; return 0; } + +/* Calculate the rtrefcount btree size for some records. */ +unsigned long long +xfs_rtrefcountbt_calc_size( + struct xfs_mount *mp, + unsigned long long len) +{ + return xfs_btree_calc_size(mp->m_rtrefc_mnr, len); +} + +/* + * Calculate the maximum refcount btree size. + */ +static unsigned long long +xfs_rtrefcountbt_max_size( + struct xfs_mount *mp, + xfs_rtblock_t rtblocks) +{ + /* Bail out if we're uninitialized, which can happen in mkfs. */ + if (mp->m_rtrefc_mxr[0] == 0) + return 0; + + return xfs_rtrefcountbt_calc_size(mp, rtblocks); +} + +/* + * Figure out how many blocks to reserve and how many are used by this btree. + * We need enough space to hold one record for every rt extent in the rtgroup. 
+ */ +xfs_filblks_t +xfs_rtrefcountbt_calc_reserves( + struct xfs_mount *mp) +{ + if (!xfs_has_rtreflink(mp)) + return 0; + + return xfs_rtrefcountbt_max_size(mp, + xfs_rtb_to_rtx(mp, mp->m_sb.sb_rgblocks)); +} diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.h b/fs/xfs/libxfs/xfs_rtrefcount_btree.h index ff49e95d1a490..045f7b1f72833 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.h +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.h @@ -72,4 +72,8 @@ void xfs_rtrefcountbt_destroy_cur_cache(void); int xfs_rtrefcountbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, struct xfs_imeta_path **pathp); +xfs_filblks_t xfs_rtrefcountbt_calc_reserves(struct xfs_mount *mp); +unsigned long long xfs_rtrefcountbt_calc_size(struct xfs_mount *mp, + unsigned long long len); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 4102a46ed274d..14e17c2b39ef0 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1711,8 +1711,10 @@ xfs_rt_resv_free( struct xfs_rtgroup *rtg; xfs_rgnumber_t rgno; - for_each_rtgroup(mp, rgno, rtg) + for_each_rtgroup(mp, rgno, rtg) { + xfs_imeta_resv_free_inode(rtg->rtg_refcountip); xfs_imeta_resv_free_inode(rtg->rtg_rmapip); + } } /* Reserve space for rt metadata inodes' space expansion. */ @@ -1732,6 +1734,11 @@ xfs_rt_resv_init( err2 = xfs_imeta_resv_init_inode(rtg->rtg_rmapip, ask); if (err2 && !error) error = err2; + + ask = xfs_rtrefcountbt_calc_reserves(mp); + err2 = xfs_imeta_resv_init_inode(rtg->rtg_refcountip, ask); + if (err2 && !error) + error = err2; } return error; From patchwork Sun Dec 31 21:47:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507726 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E8F7BE4D for ; Sun, 31 Dec 2023 21:47:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YVTf8zG8" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 18170C433C7; Sun, 31 Dec 2023 21:47:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059278; bh=woua6IpqqqlyjVZ9b74wmYzhNy0KRmv2OM6hIs3bF0g=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=YVTf8zG8ZcO+QjWN9s2GSu7cqa3ouYELKvcyCnT6u+bFy6jK5TZUJOYPd2uLR9ejd 3ffqz1JkVkv7pNSEa7YFa+0SPU6Opm5yPAS30l+Lwo+FeJU869DJ/EMdsK60d2WGot yLJz9j1I9aTypUWk8vdm1ywbgrleGxynZr4D+O/a8jGw9ZkH5GRlJjfWCAF0CQJ3az oXAbowd7cILir+nK3VJd9SoCebhzDGOnxkKLL9IF1DumuGxUaDj3c6dp4SH5yWCk3Q 3JHCQWlbG4qP8WJgor7F8JNGterWyxX5C3VaYIVCPB5j5kK2nmlu7Ua64uzFZoXhPH 0HF9yN3HolMzg== Date: Sun, 31 Dec 2023 13:47:57 -0800 Subject: [PATCH 13/44] xfs: wire up a new inode fork type for the realtime refcount From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851792.1766284.1788850346864434897.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Plumb in the pieces we need to embed the root of the realtime refcount btree in an inode's data fork, complete with new fork type and on-disk interpretation functions. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_format.h | 8 + fs/xfs/libxfs/xfs_inode_fork.c | 8 + fs/xfs/libxfs/xfs_ondisk.h | 1 fs/xfs/libxfs/xfs_rtrefcount_btree.c | 236 ++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrefcount_btree.h | 112 ++++++++++++++++ fs/xfs/xfs_inode_item_recover.c | 4 + 6 files changed, 366 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 93a9b8e3b5694..ca964befb51cf 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1824,6 +1824,14 @@ typedef __be32 xfs_refcount_ptr_t; */ #define XFS_RTREFC_CRC_MAGIC 0x52434e54 /* 'RCNT' */ +/* + * rt refcount root header, on-disk form only. + */ +struct xfs_rtrefcount_root { + __be16 bb_level; /* 0 is a leaf */ + __be16 bb_numrecs; /* current # of data records */ +}; + /* inode-rooted btree pointer type */ typedef __be64 xfs_rtrefcount_ptr_t; diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index ae6e7deb04106..df42ffa15d96e 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -28,6 +28,7 @@ #include "xfs_health.h" #include "xfs_symlink_remote.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" struct kmem_cache *xfs_ifork_cache; @@ -274,8 +275,7 @@ xfs_iformat_data_fork( case XFS_DINODE_FMT_REFCOUNT: if (!xfs_has_rtreflink(ip->i_mount)) return -EFSCORRUPTED; - ASSERT(0); /* to be implemented later */ - return -EFSCORRUPTED; + return xfs_iformat_rtrefcount(ip, dip); default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -672,7 +672,9 @@ xfs_iflush_fork( break; case XFS_DINODE_FMT_REFCOUNT: - ASSERT(0); /* to be implemented later */ + ASSERT(whichfork == XFS_DATA_FORK); + if (iip->ili_fields & brootflag[whichfork]) + xfs_iflush_rtrefcount(ip, dip); break; default: diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h index 242b683125662..3a5581ecb36d4 100644 --- a/fs/xfs/libxfs/xfs_ondisk.h +++ 
b/fs/xfs/libxfs/xfs_ondisk.h @@ -80,6 +80,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t, 8); XFS_CHECK_STRUCT_SIZE(struct xfs_rtrmap_root, 4); XFS_CHECK_STRUCT_SIZE(xfs_rtrefcount_ptr_t, 8); + XFS_CHECK_STRUCT_SIZE(struct xfs_rtrefcount_root, 4); /* * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c index e1e8a3ea32091..ae8dea035d29f 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -85,6 +85,41 @@ xfs_rtrefcountbt_get_maxrecs( return cur->bc_mp->m_rtrefc_mxr[level != 0]; } +/* + * Calculate number of records in a realtime refcount btree inode root. + */ +unsigned int +xfs_rtrefcountbt_droot_maxrecs( + unsigned int blocklen, + bool leaf) +{ + blocklen -= sizeof(struct xfs_rtrefcount_root); + + if (leaf) + return blocklen / sizeof(struct xfs_refcount_rec); + return blocklen / (2 * sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); +} + +/* + * Get the maximum records we could store in the on-disk format. + * + * For non-root nodes this is equivalent to xfs_rtrefcountbt_get_maxrecs, but + * for the root node this checks the available space in the dinode fork so that + * we can resize the in-memory buffer to match it. After a resize to the + * maximum size this function returns the same value as + * xfs_rtrefcountbt_get_maxrecs for the root node, too. + */ +STATIC int +xfs_rtrefcountbt_get_dmaxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level != cur->bc_nlevels - 1) + return cur->bc_mp->m_rtrefc_mxr[level != 0]; + return xfs_rtrefcountbt_droot_maxrecs(cur->bc_ino.forksize, level == 0); +} + STATIC void xfs_rtrefcountbt_init_key_from_rec( union xfs_btree_key *key, @@ -255,6 +290,68 @@ xfs_rtrefcountbt_keys_contiguous( be32_to_cpu(key2->refc.rc_startblock)); } +/* Move the rt refcount btree root from one incore buffer to another. 
*/ +static void +xfs_rtrefcountbt_broot_move( + struct xfs_inode *ip, + int whichfork, + struct xfs_btree_block *dst_broot, + size_t dst_bytes, + struct xfs_btree_block *src_broot, + size_t src_bytes, + unsigned int level, + unsigned int numrecs) +{ + struct xfs_mount *mp = ip->i_mount; + void *dptr; + void *sptr; + + ASSERT(xfs_rtrefcount_droot_space(src_broot) <= + xfs_inode_fork_size(ip, whichfork)); + + /* + * We always have to move the pointers because they are not butted + * against the btree block header. + */ + if (numrecs && level > 0) { + sptr = xfs_rtrefcount_broot_ptr_addr(mp, src_broot, 1, + src_bytes); + dptr = xfs_rtrefcount_broot_ptr_addr(mp, dst_broot, 1, + dst_bytes); + memmove(dptr, sptr, numrecs * sizeof(xfs_fsblock_t)); + } + + if (src_broot == dst_broot) + return; + + /* + * If the root is being totally relocated, we have to migrate the block + * header and the keys/records that come after it. + */ + memcpy(dst_broot, src_broot, XFS_RTREFCOUNT_BLOCK_LEN); + + if (!numrecs) + return; + + if (level == 0) { + sptr = xfs_rtrefcount_rec_addr(src_broot, 1); + dptr = xfs_rtrefcount_rec_addr(dst_broot, 1); + memcpy(dptr, sptr, + numrecs * sizeof(struct xfs_refcount_rec)); + } else { + sptr = xfs_rtrefcount_key_addr(src_broot, 1); + dptr = xfs_rtrefcount_key_addr(dst_broot, 1); + memcpy(dptr, sptr, + numrecs * sizeof(struct xfs_refcount_key)); + } +} + +static const struct xfs_ifork_broot_ops xfs_rtrefcountbt_iroot_ops = { + .maxrecs = xfs_rtrefcountbt_maxrecs, + .size = xfs_rtrefcount_broot_space_calc, + .move = xfs_rtrefcountbt_broot_move, +}; + const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .rec_len = sizeof(struct xfs_refcount_rec), .key_len = sizeof(struct xfs_refcount_key), @@ -267,6 +364,7 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .free_block = xfs_btree_free_imeta_block, .get_minrecs = xfs_rtrefcountbt_get_minrecs, .get_maxrecs = xfs_rtrefcountbt_get_maxrecs, + .get_dmaxrecs = xfs_rtrefcountbt_get_dmaxrecs, .init_key_from_rec 
= xfs_rtrefcountbt_init_key_from_rec, .init_high_key_from_rec = xfs_rtrefcountbt_init_high_key_from_rec, .init_rec_from_cur = xfs_rtrefcountbt_init_rec_from_cur, @@ -277,6 +375,7 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .keys_inorder = xfs_rtrefcountbt_keys_inorder, .recs_inorder = xfs_rtrefcountbt_recs_inorder, .keys_contiguous = xfs_rtrefcountbt_keys_contiguous, + .iroot_ops = &xfs_rtrefcountbt_iroot_ops, }; /* Initialize a new rt refcount btree cursor. */ @@ -530,3 +629,140 @@ xfs_rtrefcountbt_calc_reserves( return xfs_rtrefcountbt_max_size(mp, xfs_rtb_to_rtx(mp, mp->m_sb.sb_rgblocks)); } + +/* + * Convert on-disk form of btree root to in-memory form. + */ +STATIC void +xfs_rtrefcountbt_from_disk( + struct xfs_inode *ip, + struct xfs_rtrefcount_root *dblock, + int dblocklen, + struct xfs_btree_block *rblock) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_refcount_key *fkp; + __be64 *fpp; + struct xfs_refcount_key *tkp; + __be64 *tpp; + struct xfs_refcount_rec *frp; + struct xfs_refcount_rec *trp; + unsigned int numrecs; + unsigned int maxrecs; + unsigned int rblocklen; + + rblocklen = xfs_rtrefcount_broot_space(mp, dblock); + + xfs_btree_init_block(mp, rblock, &xfs_rtrefcountbt_ops, 0, 0, + ip->i_ino); + + rblock->bb_level = dblock->bb_level; + rblock->bb_numrecs = dblock->bb_numrecs; + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrefcountbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrefcount_droot_key_addr(dblock, 1); + tkp = xfs_rtrefcount_key_addr(rblock, 1); + fpp = xfs_rtrefcount_droot_ptr_addr(dblock, 1, maxrecs); + tpp = xfs_rtrefcount_broot_ptr_addr(mp, rblock, 1, rblocklen); + numrecs = be16_to_cpu(dblock->bb_numrecs); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrefcount_droot_rec_addr(dblock, 1); + trp = xfs_rtrefcount_rec_addr(rblock, 1); + numrecs = be16_to_cpu(dblock->bb_numrecs); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* 
Load a realtime reference count btree root in from disk. */ +int +xfs_iformat_rtrefcount( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrefcount_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + unsigned int numrecs; + unsigned int level; + int dsize; + + dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK); + numrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (level > mp->m_rtrefc_maxlevels || + xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize) + return -EFSCORRUPTED; + + xfs_iroot_alloc(ip, XFS_DATA_FORK, + xfs_rtrefcount_broot_space_calc(mp, level, numrecs)); + xfs_rtrefcountbt_from_disk(ip, dfp, dsize, ifp->if_broot); + return 0; +} + +/* + * Convert in-memory form of btree root to on-disk form. + */ +void +xfs_rtrefcountbt_to_disk( + struct xfs_mount *mp, + struct xfs_btree_block *rblock, + int rblocklen, + struct xfs_rtrefcount_root *dblock, + int dblocklen) +{ + struct xfs_refcount_key *fkp; + __be64 *fpp; + struct xfs_refcount_key *tkp; + __be64 *tpp; + struct xfs_refcount_rec *frp; + struct xfs_refcount_rec *trp; + unsigned int maxrecs; + unsigned int numrecs; + + ASSERT(rblock->bb_magic == cpu_to_be32(XFS_RTREFC_CRC_MAGIC)); + ASSERT(uuid_equal(&rblock->bb_u.l.bb_uuid, &mp->m_sb.sb_meta_uuid)); + ASSERT(rblock->bb_u.l.bb_blkno == cpu_to_be64(XFS_BUF_DADDR_NULL)); + ASSERT(rblock->bb_u.l.bb_leftsib == cpu_to_be64(NULLFSBLOCK)); + ASSERT(rblock->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK)); + + dblock->bb_level = rblock->bb_level; + dblock->bb_numrecs = rblock->bb_numrecs; + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrefcountbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrefcount_key_addr(rblock, 1); + tkp = xfs_rtrefcount_droot_key_addr(dblock, 1); + fpp = xfs_rtrefcount_broot_ptr_addr(mp, rblock, 1, rblocklen); + tpp = xfs_rtrefcount_droot_ptr_addr(dblock, 1, maxrecs); + numrecs = 
be16_to_cpu(rblock->bb_numrecs); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrefcount_rec_addr(rblock, 1); + trp = xfs_rtrefcount_droot_rec_addr(dblock, 1); + numrecs = be16_to_cpu(rblock->bb_numrecs); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Flush a realtime reference count btree root out to disk. */ +void +xfs_iflush_rtrefcount( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrefcount_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + + ASSERT(ifp->if_broot != NULL); + ASSERT(ifp->if_broot_bytes > 0); + ASSERT(xfs_rtrefcount_droot_space(ifp->if_broot) <= + xfs_inode_fork_size(ip, XFS_DATA_FORK)); + xfs_rtrefcountbt_to_disk(ip->i_mount, ifp->if_broot, + ifp->if_broot_bytes, dfp, + XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); +} diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.h b/fs/xfs/libxfs/xfs_rtrefcount_btree.h index 045f7b1f72833..bd070e54781a1 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.h +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.h @@ -27,6 +27,7 @@ void xfs_rtrefcountbt_commit_staged_btree(struct xfs_btree_cur *cur, unsigned int xfs_rtrefcountbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, bool leaf); void xfs_rtrefcountbt_compute_maxlevels(struct xfs_mount *mp); +unsigned int xfs_rtrefcountbt_droot_maxrecs(unsigned int blocklen, bool leaf); /* * Addresses of records, keys, and pointers within an incore rtrefcountbt block. @@ -76,4 +77,115 @@ xfs_filblks_t xfs_rtrefcountbt_calc_reserves(struct xfs_mount *mp); unsigned long long xfs_rtrefcountbt_calc_size(struct xfs_mount *mp, unsigned long long len); +/* Addresses of key, pointers, and records within an ondisk rtrefcount block. 
*/ + +static inline struct xfs_refcount_rec * +xfs_rtrefcount_droot_rec_addr( + struct xfs_rtrefcount_root *block, + unsigned int index) +{ + return (struct xfs_refcount_rec *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_refcount_rec)); +} + +static inline struct xfs_refcount_key * +xfs_rtrefcount_droot_key_addr( + struct xfs_rtrefcount_root *block, + unsigned int index) +{ + return (struct xfs_refcount_key *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_refcount_key)); +} + +static inline xfs_rtrefcount_ptr_t * +xfs_rtrefcount_droot_ptr_addr( + struct xfs_rtrefcount_root *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrefcount_ptr_t *) + ((char *)(block + 1) + + maxrecs * sizeof(struct xfs_refcount_key) + + (index - 1) * sizeof(xfs_rtrefcount_ptr_t)); +} + +/* + * Address of pointers within the incore btree root. + * + * These are to be used when we know the size of the block and + * we don't have a cursor. + */ +static inline xfs_rtrefcount_ptr_t * +xfs_rtrefcount_broot_ptr_addr( + struct xfs_mount *mp, + struct xfs_btree_block *bb, + unsigned int index, + unsigned int block_size) +{ + return xfs_rtrefcount_ptr_addr(bb, index, + xfs_rtrefcountbt_maxrecs(mp, block_size, false)); +} + +/* + * Compute the space required for the incore btree root containing the given + * number of records. + */ +static inline size_t +xfs_rtrefcount_broot_space_calc( + struct xfs_mount *mp, + unsigned int level, + unsigned int nrecs) +{ + size_t sz = XFS_RTREFCOUNT_BLOCK_LEN; + + if (level > 0) + return sz + nrecs * (sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); + return sz + nrecs * sizeof(struct xfs_refcount_rec); +} + +/* + * Compute the space required for the incore btree root given the ondisk + * btree root block. 
+ */ +static inline size_t +xfs_rtrefcount_broot_space(struct xfs_mount *mp, struct xfs_rtrefcount_root *bb) +{ + return xfs_rtrefcount_broot_space_calc(mp, be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +/* Compute the space required for the ondisk root block. */ +static inline size_t +xfs_rtrefcount_droot_space_calc( + unsigned int level, + unsigned int nrecs) +{ + size_t sz = sizeof(struct xfs_rtrefcount_root); + + if (level > 0) + return sz + nrecs * (sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); + return sz + nrecs * sizeof(struct xfs_refcount_rec); +} + +/* + * Compute the space required for the ondisk root block given an incore root + * block. + */ +static inline size_t +xfs_rtrefcount_droot_space(struct xfs_btree_block *bb) +{ + return xfs_rtrefcount_droot_space_calc(be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +int xfs_iformat_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip); +void xfs_rtrefcountbt_to_disk(struct xfs_mount *mp, + struct xfs_btree_block *rblock, int rblocklen, + struct xfs_rtrefcount_root *dblock, int dblocklen); +void xfs_iflush_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index 2b5f7a143c479..317a27e6a5a4b 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -23,6 +23,7 @@ #include "xfs_icache.h" #include "xfs_bmap_btree.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" STATIC void xlog_recover_inode_ra_pass2( @@ -284,6 +285,9 @@ xlog_recover_inode_dbroot( case XFS_DINODE_FMT_RMAP: xfs_rtrmapbt_to_disk(mp, src, len, dfork, dsize); break; + case XFS_DINODE_FMT_REFCOUNT: + xfs_rtrefcountbt_to_disk(mp, src, len, dfork, dsize); + break; default: ASSERT(0); return -EFSCORRUPTED; From patchwork Sun Dec 31 21:48:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507727 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E170BBE47 for ; Sun, 31 Dec 2023 21:48:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="g0C3xqsm" Received: by smtp.kernel.org (Postfix) with ESMTPSA id AE588C433C8; Sun, 31 Dec 2023 21:48:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059293; bh=ZD8N+nCWdNTvfc53yO/w2Dq1ZUe/JEGRyt4HQRPKm1M=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=g0C3xqsm0PMKZtE2c0O8GSJ20/aWomhfuixoDwsIMIezpgso23RD/rMNzkoW6HGRq JztgQWvt+kVoecLqPPW4HODOEDPFWcAHCnpJswloi45ospobIB2B2aewgxUR+mxhxB YM7V/BCF5gaRspR/EIe3IKzKjNAyFVPlEU8seMJLFLnQzRCX0G0RjKrpB1JOvoRnP8 KAx4TrR7SYJ8sR5H1eL1SYXMwBOpkpjQzFt1O++KcipMiUarMZ4eJV+Fb6jCs/KzUD xdbfE4Fq04PvA+VqUWIQUvJsMmzYXghtzuhDxVSY2x0/65veDfbYtzoketmaBAsILD sVsA9XuSrYOxA== Date: Sun, 31 Dec 2023 13:48:13 -0800 Subject: [PATCH 14/44] xfs: wire up realtime refcount btree cursors From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851809.1766284.6640915780614547754.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Wire up realtime refcount btree cursors wherever they're needed throughout the code base. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_refcount.c | 101 +++++++++++++++++++++++++++++++++++++++++- fs/xfs/libxfs/xfs_rtgroup.c | 10 ++++ fs/xfs/libxfs/xfs_rtgroup.h | 5 ++ fs/xfs/xfs_fsmap.c | 22 ++++++--- fs/xfs/xfs_reflink.c | 99 +++++++++++++++++++++++++++++++++-------- 5 files changed, 206 insertions(+), 31 deletions(-) diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 2ae126d3bd7ff..60e838adde0d8 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -27,6 +27,7 @@ #include "xfs_refcount_item.h" #include "xfs_rtgroup.h" #include "xfs_rtalloc.h" +#include "xfs_rtrefcount_btree.h" struct kmem_cache *xfs_refcount_intent_cache; @@ -1486,6 +1487,33 @@ xfs_refcount_finish_one( return error; } +/* + * Set up a continuation of a deferred rtrefcount operation by updating the + * intent. Checks to make sure we're not going to run off the end of the + * rtgroup. + */ +static inline int +xfs_rtrefcount_continue_op( + struct xfs_btree_cur *cur, + struct xfs_refcount_intent *ri, + xfs_agblock_t new_agbno) +{ + struct xfs_mount *mp = cur->bc_mp; + struct xfs_rtgroup *rtg = ri->ri_rtg; + + if (XFS_IS_CORRUPT(mp, !xfs_verify_rgbext(rtg, new_agbno, + ri->ri_blockcount))) { + xfs_btree_mark_sick(cur); + return -EFSCORRUPTED; + } + + ri->ri_startblock = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, new_agbno); + + ASSERT(xfs_verify_rtbext(mp, ri->ri_startblock, ri->ri_blockcount)); + ASSERT(rtg->rtg_rgno == xfs_rtb_to_rgno(mp, ri->ri_startblock)); + return 0; +} + /* * Process one of the deferred realtime refcount operations. We pass back the * btree cursor to maintain our lock on the btree between calls. 
@@ -1496,8 +1524,77 @@ xfs_rtrefcount_finish_one( struct xfs_refcount_intent *ri, struct xfs_btree_cur **pcur) { - ASSERT(0); - return -EFSCORRUPTED; + struct xfs_mount *mp = tp->t_mountp; + struct xfs_btree_cur *rcur = *pcur; + int error = 0; + xfs_rgnumber_t rgno; + xfs_rgblock_t bno; + unsigned long nr_ops = 0; + int shape_changes = 0; + + bno = xfs_rtb_to_rgbno(mp, ri->ri_startblock, &rgno); + + trace_xfs_refcount_deferred(mp, ri); + + if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REFCOUNT_FINISH_ONE)) + return -EIO; + + /* + * If we haven't gotten a cursor or the cursor AG doesn't match + * the startblock, get one now. + */ + if (rcur != NULL && rcur->bc_ino.rtg != ri->ri_rtg) { + nr_ops = xrefc_btree_state(rcur)->nr_ops; + shape_changes = xrefc_btree_state(rcur)->shape_changes; + xfs_btree_del_cursor(rcur, 0); + rcur = NULL; + *pcur = NULL; + } + if (rcur == NULL) { + xfs_rtgroup_lock(tp, ri->ri_rtg, XFS_RTGLOCK_REFCOUNT); + *pcur = rcur = xfs_rtrefcountbt_init_cursor(mp, tp, ri->ri_rtg, + ri->ri_rtg->rtg_refcountip); + + xrefc_btree_state(rcur)->nr_ops = nr_ops; + xrefc_btree_state(rcur)->shape_changes = shape_changes; + } + + switch (ri->ri_type) { + case XFS_REFCOUNT_INCREASE: + error = xfs_refcount_adjust(rcur, &bno, &ri->ri_blockcount, + XFS_REFCOUNT_ADJUST_INCREASE); + if (error) + return error; + if (ri->ri_blockcount > 0) + error = xfs_rtrefcount_continue_op(rcur, ri, bno); + break; + case XFS_REFCOUNT_DECREASE: + error = xfs_refcount_adjust(rcur, &bno, &ri->ri_blockcount, + XFS_REFCOUNT_ADJUST_DECREASE); + if (error) + return error; + if (ri->ri_blockcount > 0) + error = xfs_rtrefcount_continue_op(rcur, ri, bno); + break; + case XFS_REFCOUNT_ALLOC_COW: + error = __xfs_refcount_cow_alloc(rcur, bno, ri->ri_blockcount); + if (error) + return error; + ri->ri_blockcount = 0; + break; + case XFS_REFCOUNT_FREE_COW: + error = __xfs_refcount_cow_free(rcur, bno, ri->ri_blockcount); + if (error) + return error; + ri->ri_blockcount = 0; + break; + default: + 
ASSERT(0); + return -EFSCORRUPTED; + } + if (!error && ri->ri_blockcount > 0) + trace_xfs_refcount_finish_one_leftover(mp, ri); + return error; } /* diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index 8bc97c9aa4c9c..173c3887788f7 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -561,6 +561,13 @@ xfs_rtgroup_lock( if (tp) xfs_trans_ijoin(tp, rtg->rtg_rmapip, XFS_ILOCK_EXCL); } + + if ((rtglock_flags & XFS_RTGLOCK_REFCOUNT) && rtg->rtg_refcountip) { + xfs_ilock(rtg->rtg_refcountip, XFS_ILOCK_EXCL); + if (tp) + xfs_trans_ijoin(tp, rtg->rtg_refcountip, + XFS_ILOCK_EXCL); + } } /* Unlock metadata inodes associated with this rt group. */ @@ -573,6 +580,9 @@ xfs_rtgroup_unlock( ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) || !(rtglock_flags & XFS_RTGLOCK_BITMAP)); + if ((rtglock_flags & XFS_RTGLOCK_REFCOUNT) && rtg->rtg_refcountip) + xfs_iunlock(rtg->rtg_refcountip, XFS_ILOCK_EXCL); + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg->rtg_rmapip) xfs_iunlock(rtg->rtg_rmapip, XFS_ILOCK_EXCL); diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index bd88a4d728135..659d0c15d2ade 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -248,10 +248,13 @@ int xfs_rtgroup_init_secondary_super(struct xfs_mount *mp, xfs_rgnumber_t rgno, #define XFS_RTGLOCK_BITMAP_SHARED (1U << 1) /* Lock the rt rmap inode in exclusive mode */ #define XFS_RTGLOCK_RMAP (1U << 2) +/* Lock the rt refcount inode in exclusive mode */ +#define XFS_RTGLOCK_REFCOUNT (1U << 3) #define XFS_RTGLOCK_ALL_FLAGS (XFS_RTGLOCK_BITMAP | \ XFS_RTGLOCK_BITMAP_SHARED | \ - XFS_RTGLOCK_RMAP) + XFS_RTGLOCK_RMAP | \ + XFS_RTGLOCK_REFCOUNT) void xfs_rtgroup_lock(struct xfs_trans *tp, struct xfs_rtgroup *rtg, unsigned int rtglock_flags); diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index b0eabc76eb28a..cc8175af986aa 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -27,6 +27,7 @@ #include "xfs_ag.h" #include 
"xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" /* Convert an xfs_fsmap to an fsmap. */ static void @@ -216,14 +217,16 @@ xfs_getfsmap_is_shared( *stat = false; if (!xfs_has_reflink(mp)) return 0; - /* rt files will have no perag structure */ - if (!info->pag) - return 0; + + if (info->rtg) + cur = xfs_rtrefcountbt_init_cursor(mp, tp, info->rtg, + info->rtg->rtg_refcountip); + else + cur = xfs_refcountbt_init_cursor(mp, tp, info->agf_bp, + info->pag); /* Are there any shared blocks here? */ flen = 0; - cur = xfs_refcountbt_init_cursor(mp, tp, info->agf_bp, info->pag); - error = xfs_refcount_find_shared(cur, rec->rm_startblock, rec->rm_blockcount, &fbno, &flen, false); @@ -828,7 +831,8 @@ xfs_getfsmap_rtdev_rmapbt_query( info); /* Query the rtrmapbt */ - xfs_rtgroup_lock(NULL, info->rtg, XFS_RTGLOCK_RMAP); + xfs_rtgroup_lock(NULL, info->rtg, XFS_RTGLOCK_RMAP | + XFS_RTGLOCK_REFCOUNT); *curpp = xfs_rtrmapbt_init_cursor(mp, tp, info->rtg, info->rtg->rtg_rmapip); return xfs_rmap_query_range(*curpp, &info->low, &info->high, @@ -917,7 +921,8 @@ xfs_getfsmap_rtdev_rmapbt( if (bt_cur) { xfs_rtgroup_unlock(bt_cur->bc_ino.rtg, - XFS_RTGLOCK_RMAP); + XFS_RTGLOCK_RMAP | + XFS_RTGLOCK_REFCOUNT); xfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR); bt_cur = NULL; } @@ -954,7 +959,8 @@ xfs_getfsmap_rtdev_rmapbt( } if (bt_cur) { - xfs_rtgroup_unlock(bt_cur->bc_ino.rtg, XFS_RTGLOCK_RMAP); + xfs_rtgroup_unlock(bt_cur->bc_ino.rtg, XFS_RTGLOCK_RMAP | + XFS_RTGLOCK_REFCOUNT); xfs_btree_del_cursor(bt_cur, error < 0 ? 
XFS_BTREE_ERROR : XFS_BTREE_NOERROR); } diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 591782ca7d284..a10d43a1a7da4 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -30,6 +30,9 @@ #include "xfs_ag.h" #include "xfs_ag_resv.h" #include "xfs_health.h" +#include "xfs_rtrefcount_btree.h" +#include "xfs_rtalloc.h" +#include "xfs_rtgroup.h" /* * Copy on Write of Shared Blocks @@ -155,6 +158,38 @@ xfs_reflink_find_shared( return error; } +/* + * Given an RT extent, find the lowest-numbered run of shared blocks + * within that range and return the range in fbno/flen. If + * find_end_of_shared is true, return the longest contiguous extent of + * shared blocks. If there are no shared extents, fbno and flen will + * be set to NULLRGBLOCK and 0, respectively. + */ +static int +xfs_reflink_find_rtshared( + struct xfs_rtgroup *rtg, + struct xfs_trans *tp, + xfs_agblock_t rtbno, + xfs_extlen_t rtlen, + xfs_agblock_t *fbno, + xfs_extlen_t *flen, + bool find_end_of_shared) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_btree_cur *cur; + int error; + + BUILD_BUG_ON(NULLRGBLOCK != NULLAGBLOCK); + + xfs_rtgroup_lock(NULL, rtg, XFS_RTGLOCK_REFCOUNT); + cur = xfs_rtrefcountbt_init_cursor(mp, tp, rtg, rtg->rtg_refcountip); + error = xfs_refcount_find_shared(cur, rtbno, rtlen, fbno, flen, + find_end_of_shared); + xfs_btree_del_cursor(cur, error); + xfs_rtgroup_unlock(rtg, XFS_RTGLOCK_REFCOUNT); + return error; +} + /* * Trim the mapping to the next block where there's a change in the * shared/unshared status. 
More specifically, this means that we @@ -172,9 +207,7 @@ xfs_reflink_trim_around_shared( bool *shared) { struct xfs_mount *mp = ip->i_mount; - struct xfs_perag *pag; - xfs_agblock_t agbno; - xfs_extlen_t aglen; + xfs_agblock_t orig_bno; xfs_agblock_t fbno; xfs_extlen_t flen; int error = 0; @@ -187,13 +220,25 @@ xfs_reflink_trim_around_shared( trace_xfs_reflink_trim_around_shared(ip, irec); - pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, irec->br_startblock)); - agbno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock); - aglen = irec->br_blockcount; + if (XFS_IS_REALTIME_INODE(ip)) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; - error = xfs_reflink_find_shared(pag, NULL, agbno, aglen, &fbno, &flen, - true); - xfs_perag_put(pag); + orig_bno = xfs_rtb_to_rgbno(mp, irec->br_startblock, &rgno); + rtg = xfs_rtgroup_get(mp, rgno); + error = xfs_reflink_find_rtshared(rtg, NULL, orig_bno, + irec->br_blockcount, &fbno, &flen, true); + xfs_rtgroup_put(rtg); + } else { + struct xfs_perag *pag; + + pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, + irec->br_startblock)); + orig_bno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock); + error = xfs_reflink_find_shared(pag, NULL, orig_bno, + irec->br_blockcount, &fbno, &flen, true); + xfs_perag_put(pag); + } if (error) return error; @@ -203,7 +248,7 @@ xfs_reflink_trim_around_shared( return 0; } - if (fbno == agbno) { + if (fbno == orig_bno) { /* * The start of this extent is shared. Truncate the * mapping at the end of the shared region so that a @@ -221,7 +266,7 @@ xfs_reflink_trim_around_shared( * extent so that a subsequent iteration starts at the * start of the shared region. 
*/ - irec->br_blockcount = fbno - agbno; + irec->br_blockcount = fbno - orig_bno; return 0; } @@ -1582,9 +1627,6 @@ xfs_reflink_inode_has_shared_extents( *has_shared = false; found = xfs_iext_lookup_extent(ip, ifp, 0, &icur, &got); while (found) { - struct xfs_perag *pag; - xfs_agblock_t agbno; - xfs_extlen_t aglen; xfs_agblock_t rbno; xfs_extlen_t rlen; @@ -1592,12 +1634,29 @@ xfs_reflink_inode_has_shared_extents( got.br_state != XFS_EXT_NORM) goto next; - pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, got.br_startblock)); - agbno = XFS_FSB_TO_AGBNO(mp, got.br_startblock); - aglen = got.br_blockcount; - error = xfs_reflink_find_shared(pag, tp, agbno, aglen, - &rbno, &rlen, false); - xfs_perag_put(pag); + if (XFS_IS_REALTIME_INODE(ip)) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + xfs_rgblock_t rgbno; + + rgbno = xfs_rtb_to_rgbno(mp, got.br_startblock, &rgno); + rtg = xfs_rtgroup_get(mp, rgno); + error = xfs_reflink_find_rtshared(rtg, tp, rgbno, + got.br_blockcount, &rbno, &rlen, + false); + xfs_rtgroup_put(rtg); + } else { + struct xfs_perag *pag; + xfs_agblock_t agbno; + + pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, + got.br_startblock)); + agbno = XFS_FSB_TO_AGBNO(mp, got.br_startblock); + error = xfs_reflink_find_shared(pag, tp, agbno, + got.br_blockcount, &rbno, &rlen, + false); + xfs_perag_put(pag); + } if (error) return error; From patchwork Sun Dec 31 21:48:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507728 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D2D0BBA43 for ; Sun, 31 Dec 2023 21:48:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="jF1va4DS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 512F9C433C8; Sun, 31 Dec 2023 21:48:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059309; bh=1MHmQkZxX1hKvwPSr06Xi5JxzD8pn/gFw7k8Yd58yK4=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=jF1va4DSAmIg2CgE97DP3v0JrbT4BYelHs4r/l0GjTdk9WH2+l3vNpAziMojnCt9M fshYm9w97d7k0NCukONWTBjYGmDoUqn7Kq7cyUAZfcKBgpf1BwB6PUsT7UEHfGeSOW qXJGDmXrfrKnDqCTrG1j2ZLSiI/nacsU12G3HRcs/Qh0YGd5wJIAVL43/ydqMLzcMB XJ0bK5fraLo/j5UX4Pt/MpUzw2Uwr4X+JLIfJVt0hJI38YLCXf+wq7cE6YTMDLpfo2 J1qKu7ow9NaXvhCk9fY8B5ps8v8Whf3qPCIdVdm6ZbZaZcyeFGpmeD9siZLICiM+qS mLrFQL2pIAvTg== Date: Sun, 31 Dec 2023 13:48:28 -0800 Subject: [PATCH 15/44] xfs: create routine to allocate and initialize a realtime refcount btree inode From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851825.1766284.164144626420718315.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Create a library routine to allocate and initialize an empty realtime refcountbt inode. We'll use this for growfs, mkfs, and repair. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_rtrefcount_btree.c | 34 ++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrefcount_btree.h | 5 +++++ 2 files changed, 39 insertions(+) diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c index ae8dea035d29f..fb0e4abcd6f6a 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -766,3 +766,37 @@ xfs_iflush_rtrefcount( ifp->if_broot_bytes, dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); } + +/* + * Create a realtime refcount btree inode. + * + * Regardless of the return value, the caller must clean up @upd. If a new + * inode is returned through *ipp, the caller must finish setting up the incore + * inode and release it. + */ +int +xfs_rtrefcountbt_create( + struct xfs_imeta_update *upd, + struct xfs_inode **ipp) +{ + struct xfs_mount *mp = upd->mp; + struct xfs_ifork *ifp; + int error; + + error = xfs_imeta_create(upd, S_IFREG, ipp); + if (error) + return error; + + ifp = xfs_ifork_ptr(upd->ip, XFS_DATA_FORK); + ifp->if_format = XFS_DINODE_FMT_REFCOUNT; + ASSERT(ifp->if_broot_bytes == 0); + ASSERT(ifp->if_bytes == 0); + + /* Initialize the empty incore btree root. 
*/ + xfs_iroot_alloc(upd->ip, XFS_DATA_FORK, + xfs_rtrefcount_broot_space_calc(mp, 0, 0)); + xfs_btree_init_block(mp, ifp->if_broot, &xfs_rtrefcountbt_ops, + 0, 0, upd->ip->i_ino); + xfs_trans_log_inode(upd->tp, upd->ip, XFS_ILOG_CORE | XFS_ILOG_DBROOT); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.h b/fs/xfs/libxfs/xfs_rtrefcount_btree.h index bd070e54781a1..749c6fd02f837 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.h +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.h @@ -188,4 +188,9 @@ void xfs_rtrefcountbt_to_disk(struct xfs_mount *mp, struct xfs_rtrefcount_root *dblock, int dblocklen); void xfs_iflush_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip); +struct xfs_imeta_update; + +int xfs_rtrefcountbt_create(struct xfs_imeta_update *upd, + struct xfs_inode **ipp); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ From patchwork Sun Dec 31 21:48:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507729 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 30053BE48 for ; Sun, 31 Dec 2023 21:48:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="srzeFgxf" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 04C73C433C8; Sun, 31 Dec 2023 21:48:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059325; bh=Rb40IupmMO4A0f84lfwwcei8JkEIHlY9wfK/fS8CDig=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=srzeFgxfX1C4xuk8kwoVY3CC9SwQDYdyFQwQmmi5g3Nr2sgzjOGf1svxNvsu4DzPK ASpcnTbTI+JJ58ic62r9fGeGbKWnEvsvRyoVWLSFN8ILl9AhW4wE47ipTTp3wdUOW7 oXUJMJd7GtROBPy35gn1jl14QHW8+Ag/fZXIsUp4FiMhCH3LUvJGfb5BRuRBQEJZKG OFQz2QfwPn+/zwoVHtf48Hrgu+6ZaAWnkZ6Ml3y92t6bPIJUDLfVwPgxBp76JhFSnm uFxc+ip78RY7Tmp83TotHFOguUKY1aGfyFYlgUmkrwYhvchr5XfSMv4kJjFvgv1EuM JeXpWf0ARdqfw== Date: Sun, 31 Dec 2023 13:48:44 -0800 Subject: [PATCH 16/44] xfs: update rmap to allow cow staging extents in the rt rmap From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851841.1766284.13724389503376391627.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Don't error out on CoW staging extent records when realtime reflink is enabled. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_rmap.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 43108a195004c..501427d2f38cb 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -276,6 +276,7 @@ xfs_rtrmap_check_irec( bool is_unwritten; bool is_bmbt; bool is_attr; + bool is_cow; if (irec->rm_blockcount == 0) return __this_address; @@ -287,6 +288,12 @@ xfs_rtrmap_check_irec( return __this_address; if (irec->rm_offset != 0) return __this_address; + } else if (irec->rm_owner == XFS_RMAP_OWN_COW) { + if (!xfs_has_rtreflink(mp)) + return __this_address; + if (!xfs_verify_rgbext(rtg, irec->rm_startblock, + irec->rm_blockcount)) + return __this_address; } else { if (!xfs_verify_rgbext(rtg, irec->rm_startblock, irec->rm_blockcount)) @@ -303,8 +310,10 @@ xfs_rtrmap_check_irec( is_bmbt = irec->rm_flags & XFS_RMAP_BMBT_BLOCK; is_attr = irec->rm_flags & XFS_RMAP_ATTR_FORK; is_unwritten = irec->rm_flags & XFS_RMAP_UNWRITTEN; + is_cow = xfs_has_rtreflink(mp) && + irec->rm_owner == XFS_RMAP_OWN_COW; - if (!is_inode && irec->rm_owner != XFS_RMAP_OWN_FS) + if (!is_inode && !is_cow && irec->rm_owner != XFS_RMAP_OWN_FS) return __this_address; if (!is_inode && irec->rm_offset != 0) @@ -316,6 +325,9 @@ xfs_rtrmap_check_irec( if (is_unwritten && !is_inode) return __this_address; + if (is_unwritten && is_cow) + return __this_address; + /* Check for a valid fork offset, if applicable. */ if (is_inode && !xfs_verify_fileext(mp, irec->rm_offset, irec->rm_blockcount)) From patchwork Sun Dec 31 21:49:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507730 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 191B6BE4A for ; Sun, 31 Dec 2023 21:49:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YYtcvK8g" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 998A7C433C8; Sun, 31 Dec 2023 21:49:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059340; bh=g37AKD5TOaDimrqUdEYXJ49MNbyRZg8ykHpAFqHlW2k=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=YYtcvK8gNVGtUMpmwZJRBRJ5QK7A9bPB5C4Ub/xjQw7ZrnVFZy0V2iKwP9n+I2E7I e8B+10Q8MHJtYn5QliHO8LxhRVSrobC+J2rWDM+/hww/85BWTkvlMle6gZjPWFE9zz 53AdurI3Ig6iGboK9RVqAE/XMc4GxbOEMjyoFLhPt1JPShBzb90ltdtu7M5rVGhT7B OPkOY/zO7l+DqYM4TiiWGonnU2ObyMGVXTxkUZ0w0Th5Ap+wAM2Y9DCBjlh71TXfPQ mbfEZMfeFD65EuELzTL3s2MSZbUYzvSQwKO3ictfqCWsT5Oh2NJbh+ISzhMjggy6TD +qRYIJaPx1Wrw== Date: Sun, 31 Dec 2023 13:49:00 -0800 Subject: [PATCH 17/44] xfs: compute rtrmap btree max levels when reflink enabled From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851857.1766284.10018189499117315739.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Compute the maximum possible height of the realtime rmap btree when reflink is enabled. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_rtrmap_btree.c | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 3084153af3a43..87ea5e3ca8937 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -738,6 +738,7 @@ xfs_rtrmapbt_maxrecs( unsigned int xfs_rtrmapbt_maxlevels_ondisk(void) { + unsigned long long max_dblocks; unsigned int minrecs[2]; unsigned int blocklen; @@ -746,8 +747,20 @@ xfs_rtrmapbt_maxlevels_ondisk(void) minrecs[0] = xfs_rtrmapbt_block_maxrecs(blocklen, true) / 2; minrecs[1] = xfs_rtrmapbt_block_maxrecs(blocklen, false) / 2; - /* We need at most one record for every block in an rt group. */ - return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS); + /* + * Compute the asymptotic maxlevels for an rtrmapbt on any rtreflink fs. + * + * On a reflink filesystem, each block in an rtgroup can have up to + * 2^32 (per the refcount record format) owners, which means that + * theoretically we could face up to 2^64 rmap records. However, we're + * likely to run out of blocks in the data device long before that + * happens, which means that we must compute the max height based on + * what the btree will look like if it consumes almost all the blocks + * in the data device due to maximal sharing factor. + */ + max_dblocks = -1U; /* max ag count */ + max_dblocks *= XFS_MAX_CRC_AG_BLOCKS; + return xfs_btree_space_to_height(minrecs, max_dblocks); } int __init @@ -786,9 +799,20 @@ xfs_rtrmapbt_compute_maxlevels( * maximum height is constrained by the size of the data device and * the height required to store one rmap record for each block in an * rt group. + * + * On a reflink filesystem, each rt block can have up to 2^32 (per the + * refcount record format) owners, which means that theoretically we + * could face up to 2^64 rmap records. 
This makes the computation of + * maxlevels based on record count meaningless, so we only consider the + * size of the data device. */ d_maxlevels = xfs_btree_space_to_height(mp->m_rtrmap_mnr, mp->m_sb.sb_dblocks); + if (xfs_has_rtreflink(mp)) { + mp->m_rtrmap_maxlevels = d_maxlevels + 1; + return; + } + r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrmap_mnr, mp->m_sb.sb_rgblocks); From patchwork Sun Dec 31 21:49:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507731 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3168BE48 for ; Sun, 31 Dec 2023 21:49:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="LMXrFhVc" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 46819C433C8; Sun, 31 Dec 2023 21:49:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059356; bh=l64/J+v96V9Ccvk6RUoJhwDQRCQuVppz7OXNr2GAT+U=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=LMXrFhVcXbqyr9QAxAsI2cjLiSp8/QRWey2PWwvQqJX1mYICoJVvymKYSDtgTnT0Q TdrYoQqGrYBcpT/wovDVCQCFpzjcqaL73Srbj5QjyRFSDpwrsAVVheePb1YAZMuDdX FZuDQ4dd6c5Hzwe+Q+JHEAJsWhyqVEbaTPOV7JzybvMGj7fqj/UnTAelH7FrT8Y9aL oWcW7fJ7ShsFm53gzplA2QgwuAvN64BhJIhIBVAcKQDuxz9zAMbUbTqAOTSkZ11uN4 HS8gcynCcOuEv/gOChH1RKpuZhKcgW6/gdndZhIV97IGrHDoR95Lt03sonAtJqhw3t U6KCMHkYQcG/w== Date: Sun, 31 Dec 2023 13:49:15 -0800 Subject: [PATCH 18/44] xfs: refactor reflink quota updates From: "Darrick J. 
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851872.1766284.4369986026573308387.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Hoist all quota updates for reflink into a helper function, since things are about to become more complicated. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_reflink.c | 37 ++++++++++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index a10d43a1a7da4..8e352b23dacf2 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -753,6 +753,35 @@ xfs_reflink_cancel_cow_range( return error; } +#ifdef CONFIG_XFS_QUOTA +/* + * Update quota accounting for a remapping operation. When we're remapping + * something from the CoW fork to the data fork, we must update the quota + * accounting for delayed allocations. For remapping from the data fork to the + * data fork, use regular block accounting. + */ +static inline void +xfs_reflink_update_quota( + struct xfs_trans *tp, + struct xfs_inode *ip, + bool is_cow, + int64_t blocks) +{ + unsigned int qflag; + + if (XFS_IS_REALTIME_INODE(ip)) { + qflag = is_cow ? XFS_TRANS_DQ_DELRTBCOUNT : + XFS_TRANS_DQ_RTBCOUNT; + } else { + qflag = is_cow ? XFS_TRANS_DQ_DELBCOUNT : + XFS_TRANS_DQ_BCOUNT; + } + xfs_trans_mod_dquot_byino(tp, ip, qflag, blocks); +} +#else +# define xfs_reflink_update_quota(tp, ip, is_cow, blocks) ((void)0) +#endif + /* * Remap part of the CoW fork into the data fork. 
* @@ -856,8 +885,7 @@ xfs_reflink_end_cow_extent( */ xfs_bmap_unmap_extent(tp, ip, XFS_DATA_FORK, &data); xfs_refcount_decrease_extent(tp, isrt, &data); - xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, - -data.br_blockcount); + xfs_reflink_update_quota(tp, ip, false, -data.br_blockcount); } else if (data.br_startblock == DELAYSTARTBLOCK) { int done; @@ -882,8 +910,7 @@ xfs_reflink_end_cow_extent( xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &del); /* Charge this new data fork mapping to the on-disk quota. */ - xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT, - (long)del.br_blockcount); + xfs_reflink_update_quota(tp, ip, true, del.br_blockcount); /* Remove the mapping from the CoW fork. */ xfs_bmap_del_extent_cow(ip, &icur, &got, &del); @@ -1373,7 +1400,7 @@ xfs_reflink_remap_extent( qdelta += dmap->br_blockcount; } - xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, qdelta); + xfs_reflink_update_quota(tp, ip, false, qdelta); /* Update dest isize if needed. */ newlen = XFS_FSB_TO_B(mp, dmap->br_startoff + dmap->br_blockcount); From patchwork Sun Dec 31 21:49:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507732 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DBA5BE47 for ; Sun, 31 Dec 2023 21:49:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="A7oFt3Dt" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 02A41C433C8; Sun, 31 Dec 2023 21:49:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059372; bh=vDnwoHAB1eBho+zBA+qQfED9aA/Botcm3gDy2ekM6Z0=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=A7oFt3DtVV2XKm1y6z3KFV8H0tZW5t5Gn2P91u4Gb/OmzlWNfO5Z45zleBaWy8STh 63v5VtSBOk8IeZ59eljgOrgV6VhZ3715GOeFwoHTWylEzshT0975eboCwoQDNDmbfR vZ4/7TZe5j5TG/PA1mZfEfk+WG2FW0DLRsTPtJzdt/1uZcCGmN4ywWb/DLzWfTbiK8 w90WE42riL4FppBpm8iNO1MWvRWEuPPkHnlcUEomNoPpwlZmoVzPtGGtUVxCRtfo3I fGo/gAoIPdJKk0z6iMtV1HT2r96B3XHaO7flfL6SkC0qleZ3elFMqBXJ2H3Lxp1OYJ frDxbHGw//MJQ== Date: Sun, 31 Dec 2023 13:49:31 -0800 Subject: [PATCH 19/44] xfs: enable CoW for realtime data From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851888.1766284.906833932864734256.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Update our write paths to support copy on write on the rt volume. This works in more or less the same way as it does on the data device, with the major exception that we never do delalloc on the rt volume. 
Because we consider unwritten CoW fork staging extents to be incore quota reservation, we update xfs_quota_reserve_blkres to support this case. Though xfs doesn't allow rt and quota together, the change is trivial and we shouldn't leave a logic bomb here. While we're at it, add a missing xfs_mod_delalloc call when we remove delalloc block reservation from the inode. This is largely irrelevant since realtime files do not use delalloc, but we want to avoid leaving logic bombs. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_bmap_util.c | 61 ++++++++++++++++++++++++++++++++++++++-------- fs/xfs/xfs_quota.h | 6 +---- fs/xfs/xfs_reflink.c | 36 +++++++++++++++++++++------ fs/xfs/xfs_trans_dquot.c | 11 ++++++++ 4 files changed, 90 insertions(+), 24 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index ef8658a9724dd..a7a99177bbf8b 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -71,6 +71,55 @@ xfs_zero_extent( } #ifdef CONFIG_XFS_RT + +/* Update all inode and quota accounting for the allocation we just did. */ +static void +xfs_bmap_rtalloc_accounting( + struct xfs_bmalloca *ap) +{ + if (ap->flags & XFS_BMAPI_COWFORK) { + /* + * COW fork blocks are in-core only and thus are treated as + * in-core quota reservation (like delalloc blocks) even when + * converted to real blocks. The quota reservation is not + * accounted to disk until blocks are remapped to the data + * fork. So if these blocks were previously delalloc, we + * already have quota reservation and there's nothing to do + * yet. + */ + if (ap->wasdel) { + xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length); + return; + } + + /* + * Otherwise, we've allocated blocks in a hole. The transaction + * has acquired in-core quota reservation for this extent. + * Rather than account these as real blocks, however, we reduce + * the transaction quota reservation based on the allocation. 
+ * This essentially transfers the transaction quota reservation + * to that of a delalloc extent. + */ + ap->ip->i_delayed_blks += ap->length; + xfs_trans_mod_dquot_byino(ap->tp, ap->ip, + XFS_TRANS_DQ_RES_RTBLKS, -(long)ap->length); + return; + } + + /* data fork only */ + ap->ip->i_nblocks += ap->length; + xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE); + if (ap->wasdel) { + ap->ip->i_delayed_blks -= ap->length; + xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length); + } + + /* Adjust the disk quota also. This was reserved earlier. */ + xfs_trans_mod_dquot_byino(ap->tp, ap->ip, + ap->wasdel ? XFS_TRANS_DQ_DELRTBCOUNT : + XFS_TRANS_DQ_RTBCOUNT, ap->length); +} + int xfs_bmap_rtalloc( struct xfs_bmalloca *ap) @@ -166,17 +215,7 @@ xfs_bmap_rtalloc( if (rtx != NULLRTEXTNO) { ap->blkno = xfs_rtx_to_rtb(mp, rtx); ap->length = xfs_rtxlen_to_extlen(mp, ralen); - ap->ip->i_nblocks += ap->length; - xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE); - if (ap->wasdel) - ap->ip->i_delayed_blks -= ap->length; - /* - * Adjust the disk quota also. This was reserved - * earlier. - */ - xfs_trans_mod_dquot_byino(ap->tp, ap->ip, - ap->wasdel ? 
XFS_TRANS_DQ_DELRTBCOUNT : - XFS_TRANS_DQ_RTBCOUNT, ap->length); + xfs_bmap_rtalloc_accounting(ap); return 0; } diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h index 55320c9ff1367..165013f03db9e 100644 --- a/fs/xfs/xfs_quota.h +++ b/fs/xfs/xfs_quota.h @@ -129,11 +129,7 @@ extern void xfs_qm_mount_quotas(struct xfs_mount *); extern void xfs_qm_unmount(struct xfs_mount *); extern void xfs_qm_unmount_quotas(struct xfs_mount *); -static inline int -xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t blocks) -{ - return xfs_trans_reserve_quota_nblks(NULL, ip, blocks, 0, false); -} +int xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t blocks); bool xfs_inode_near_dquot_enforcement(struct xfs_inode *ip, xfs_dqtype_t type); # ifdef CONFIG_XFS_LIVE_HOOKS diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 8e352b23dacf2..ed9f4ca34fcea 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -434,20 +434,26 @@ xfs_reflink_fill_cow_hole( struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp; xfs_filblks_t resaligned; - xfs_extlen_t resblks; + unsigned int dblocks = 0, rblocks = 0; int nimaps; int error; bool found; resaligned = xfs_aligned_fsb_count(imap->br_startoff, imap->br_blockcount, xfs_get_cowextsz_hint(ip)); - resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned); + if (XFS_IS_REALTIME_INODE(ip)) { + dblocks = XFS_DIOSTRAT_SPACE_RES(mp, 0); + rblocks = resaligned; + } else { + dblocks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned); + rblocks = 0; + } xfs_iunlock(ip, *lockmode); *lockmode = 0; - error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, resblks, 0, - false, &tp); + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, dblocks, + rblocks, false, &tp); if (error) return error; @@ -1236,7 +1242,7 @@ xfs_reflink_remap_extent( struct xfs_trans *tp; xfs_off_t newlen; int64_t qdelta = 0; - unsigned int resblks; + unsigned int dblocks, rblocks, resblks; bool quota_reserved = true; bool smap_real; bool dmap_written = 
xfs_bmap_is_written_extent(dmap); @@ -1267,8 +1273,15 @@ xfs_reflink_remap_extent( * we're remapping. */ resblks = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK); + if (XFS_IS_REALTIME_INODE(ip)) { + dblocks = resblks; + rblocks = dmap->br_blockcount; + } else { + dblocks = resblks + dmap->br_blockcount; + rblocks = 0; + } error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, - resblks + dmap->br_blockcount, 0, false, &tp); + dblocks, rblocks, false, &tp); if (error == -EDQUOT || error == -ENOSPC) { quota_reserved = false; error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, @@ -1348,8 +1361,15 @@ xfs_reflink_remap_extent( * done. */ if (!quota_reserved && !smap_real && dmap_written) { - error = xfs_trans_reserve_quota_nblks(tp, ip, - dmap->br_blockcount, 0, false); + if (XFS_IS_REALTIME_INODE(ip)) { + dblocks = 0; + rblocks = dmap->br_blockcount; + } else { + dblocks = dmap->br_blockcount; + rblocks = 0; + } + error = xfs_trans_reserve_quota_nblks(tp, ip, dblocks, rblocks, + false); if (error) goto out_cancel; } diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c index 6983e35b7c2b7..5a0bdc8e06fca 100644 --- a/fs/xfs/xfs_trans_dquot.c +++ b/fs/xfs/xfs_trans_dquot.c @@ -1020,3 +1020,14 @@ xfs_trans_free_dqinfo( kmem_cache_free(xfs_dqtrx_cache, tp->t_dqinfo); tp->t_dqinfo = NULL; } + +int +xfs_quota_reserve_blkres( + struct xfs_inode *ip, + int64_t blocks) +{ + if (XFS_IS_REALTIME_INODE(ip)) + return xfs_trans_reserve_quota_nblks(NULL, ip, 0, blocks, + false); + return xfs_trans_reserve_quota_nblks(NULL, ip, blocks, 0, false); +} From patchwork Sun Dec 31 21:49:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507733 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BF832BE47 for ; Sun, 31 Dec 2023 21:49:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OCjJ/5zP" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8EAF9C433C7; Sun, 31 Dec 2023 21:49:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059387; bh=bNOETkBHVW83kXWD/wKV9Oq2nW9WtdnEulw8izNR6PQ=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=OCjJ/5zPM99Fz3yWZfvjCRm49TKYOJf2orZ97c/P7fDA/1dkX/4xr6JE2mvSfl6RX fbOdGKn0oz3xHbvI9by5KoFbE+he0nsqS2r9C3Gn5QIwVzLZiZHDzZT9O8v275g2Q0 eWLdvTIUjaZH9BuwwIaZ1rJn7Ea24Skg6+5qacqP1htnGfidJMy/+EJNGpqcOabKW/ d5QcSNyGJsKT1JAL0/QHJNPCe0VTaWLVR4sImDzUOb98Pw28KY4JLYJj6BrB71658T caG2rCqrkDA2z8eJm+p5QCIB4PobOCQToPsDkUrCsVWEZQp4x754jEuPz8KujutJep QVV9JC1dsBSJw== Date: Sun, 31 Dec 2023 13:49:47 -0800 Subject: [PATCH 20/44] xfs: enable sharing of realtime file blocks From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851904.1766284.7432229795185385143.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Update the remapping routines to be able to handle realtime files. Signed-off-by: Darrick J. 
Wong --- fs/xfs/xfs_reflink.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index ed9f4ca34fcea..d05ab2c91bd98 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -33,6 +33,7 @@ #include "xfs_rtrefcount_btree.h" #include "xfs_rtalloc.h" #include "xfs_rtgroup.h" +#include "xfs_imeta.h" /* * Copy on Write of Shared Blocks @@ -1211,14 +1212,29 @@ xfs_reflink_update_dest( static int xfs_reflink_ag_has_free_space( struct xfs_mount *mp, - xfs_agnumber_t agno) + struct xfs_inode *ip, + xfs_fsblock_t fsb) { struct xfs_perag *pag; + xfs_agnumber_t agno; int error = 0; if (!xfs_has_rmapbt(mp)) return 0; + if (XFS_IS_REALTIME_INODE(ip)) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + rgno = xfs_rtb_to_rgno(mp, fsb); + rtg = xfs_rtgroup_get(mp, rgno); + if (xfs_imeta_resv_critical(rtg->rtg_rmapip) || + xfs_imeta_resv_critical(rtg->rtg_refcountip)) + error = -ENOSPC; + xfs_rtgroup_put(rtg); + return error; + } + + agno = XFS_FSB_TO_AGNO(mp, fsb); pag = xfs_perag_get(mp, agno); if (xfs_ag_resv_critical(pag, XFS_AG_RESV_RMAPBT) || xfs_ag_resv_critical(pag, XFS_AG_RESV_METADATA)) @@ -1332,8 +1348,8 @@ xfs_reflink_remap_extent( /* No reflinking if the AG of the dest mapping is low on space. */ if (dmap_written) { - error = xfs_reflink_ag_has_free_space(mp, - XFS_FSB_TO_AGNO(mp, dmap->br_startblock)); + error = xfs_reflink_ag_has_free_space(mp, ip, + dmap->br_startblock); if (error) goto out_cancel; } @@ -1593,8 +1609,8 @@ xfs_reflink_remap_prep( /* Check file eligibility and prepare for block sharing. */ ret = -EINVAL; - /* Don't reflink realtime inodes */ - if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest)) + /* Can't reflink between data and rt volumes */ + if (XFS_IS_REALTIME_INODE(src) != XFS_IS_REALTIME_INODE(dest)) goto out_unlock; /* Don't share DAX file data with non-DAX file. 
*/ From patchwork Sun Dec 31 21:50:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507734 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4EC62BE4A for ; Sun, 31 Dec 2023 21:50:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YstyTwCM" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1BCD2C433C8; Sun, 31 Dec 2023 21:50:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059403; bh=25eNxN3ivXryMrjBbLwTSaziGLWYh5G43ccyMTiIolY=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=YstyTwCMBLJ3R6LndMdd/VhfB8DujfNppFwkLYKS7U5eopbyqAsRI3z+JgwMZCF05 J/oyETvGA+GubfqI1K8X66fFeIXWnNz/0W1ayDrgCWeSAfM2gvuJTXphUlJfd78SH4 l5xc8qjSl3zK+wm2tGjKpvUqykisOabiROZROjStXV+X8rLCYNthng5MF6/QAi3nDr rssOpj0WZaL8ad7Vs/CR3GHkq2zGaI2YvWwZh1Iz3wvCkfv67h90Zfzg44h2lhkAyN 03Ir7+9zklekpTEWOOL+/7ON7VXUGEVq4luVWFTEmfC60KFlIm2ob55vHZ51KPtkES CqMJQEp1g/SbA== Date: Sun, 31 Dec 2023 13:50:02 -0800 Subject: [PATCH 21/44] xfs: allow inodes to have the realtime and reflink flags From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851920.1766284.5827263861111762378.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Now that we can share blocks between realtime files, allow this combination. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_inode_buf.c | 3 ++- fs/xfs/scrub/inode.c | 5 +++-- fs/xfs/scrub/inode_repair.c | 6 ------ fs/xfs/xfs_ioctl.c | 4 ---- 4 files changed, 5 insertions(+), 13 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 6e08ff8d8e239..ba37b864f6a8b 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -691,7 +691,8 @@ xfs_dinode_verify( return __this_address; /* don't let reflink and realtime mix */ - if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME)) + if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME) && + !xfs_has_rtreflink(mp)) return __this_address; /* COW extent size hint validation */ diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index 5fc10e02b9c41..705865ec6c1c0 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -359,8 +359,9 @@ xchk_inode_flags2( if ((flags2 & XFS_DIFLAG2_REFLINK) && !S_ISREG(mode)) goto bad; - /* realtime and reflink make no sense, currently */ - if ((flags & XFS_DIFLAG_REALTIME) && (flags2 & XFS_DIFLAG2_REFLINK)) + /* realtime and reflink don't always go together */ + if ((flags & XFS_DIFLAG_REALTIME) && (flags2 & XFS_DIFLAG2_REFLINK) && + !xfs_has_rtreflink(mp)) goto bad; /* no bigtime iflag without the bigtime feature */ diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index f4f6ed6ef5120..8d67ae257e597 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -509,8 +509,6 @@ xrep_dinode_flags( flags2 |= XFS_DIFLAG2_REFLINK; else flags2 &= ~(XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE); - if (flags & XFS_DIFLAG_REALTIME) - flags2 &= ~XFS_DIFLAG2_REFLINK; if (!xfs_has_bigtime(mp)) flags2 &= ~XFS_DIFLAG2_BIGTIME; if (!xfs_has_large_extent_counts(mp)) @@ -1716,10 +1714,6 @@ xrep_inode_flags( /* DAX only applies to files and dirs. 
*/ if (!(S_ISREG(mode) || S_ISDIR(mode))) sc->ip->i_diflags2 &= ~XFS_DIFLAG2_DAX; - - /* No reflink files on the realtime device. */ - if (sc->ip->i_diflags & XFS_DIFLAG_REALTIME) - sc->ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; } /* diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 32a52799ae826..4559d122101cd 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1119,10 +1119,6 @@ xfs_ioctl_setattr_xflags( if (mp->m_sb.sb_rblocks == 0 || mp->m_sb.sb_rextsize == 0 || xfs_extlen_to_rtxmod(mp, ip->i_extsize)) return -EINVAL; - - /* Clear reflink if we are actually able to set the rt flag. */ - if (xfs_is_reflink_inode(ip)) - ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; } /* diflags2 only valid for v3 inodes. */ From patchwork Sun Dec 31 21:50:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507735 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F3B0BBE4A for ; Sun, 31 Dec 2023 21:50:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hm/UXfuK" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C7448C433C7; Sun, 31 Dec 2023 21:50:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059418; bh=VWMG0J6Oias3h+XfVej18h2xIfXCbz8i71qkZJWDo8w=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=hm/UXfuKbOFQyKL+C1z2p4HW26GXn+On4YoJeifbZnPnnxzuT7+2oUP/oO9udFZVk CG4BWwTFulRHEQN1eGuwDwQo4fVG29a0GAsd077uhUKl8c79Cv4Tb3afeT+de5ecZi CErFqfG9NaK/4duNtAikdHadjVNiHpla5OXNLRBfPjKYnYk0IojMcziqudH7kynd70 fjLELvZlYlmiu/42xQzqs2UQgWXIjh5FqctYXY3FRVv8AHb99kPvyr9nSH76UiuR6C 
btpaY7ZTAuIAu13w0V1sY5/ysmbao6II4mheQW/9fE2RiuhPccanNinu9YZDrxiD3i oe6g8rCHzqXNA== Date: Sun, 31 Dec 2023 13:50:18 -0800 Subject: [PATCH 22/44] xfs: recover CoW leftovers in the realtime volume From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851936.1766284.12778272460691965326.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Scan the realtime refcount tree at mount time to get rid of leftover CoW staging extents. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_refcount.c | 64 +++++++++++++++++++++++++++++++++--------- fs/xfs/libxfs/xfs_refcount.h | 4 ++- fs/xfs/xfs_reflink.c | 14 ++++++++- 3 files changed, 65 insertions(+), 17 deletions(-) diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 60e838adde0d8..b4ec6077270a7 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -2077,14 +2077,15 @@ xfs_refcount_recover_extent( } /* Find and remove leftover CoW reservations. 
*/ -int -xfs_refcount_recover_cow_leftovers( +static int +xfs_refcount_recover_group_cow_leftovers( struct xfs_mount *mp, - struct xfs_perag *pag) + struct xfs_perag *pag, + struct xfs_rtgroup *rtg) { struct xfs_trans *tp; struct xfs_btree_cur *cur; - struct xfs_buf *agbp; + struct xfs_buf *agbp = NULL; struct xfs_refcount_recovery *rr, *n; struct list_head debris; union xfs_btree_irec low = { @@ -2099,7 +2100,12 @@ xfs_refcount_recover_cow_leftovers( /* reflink filesystems mustn't have AGs larger than 2^31-1 blocks */ BUILD_BUG_ON(XFS_MAX_CRC_AG_BLOCKS >= XFS_REFC_COWFLAG); - if (mp->m_sb.sb_agblocks > XFS_MAX_CRC_AG_BLOCKS) + if (pag && mp->m_sb.sb_agblocks > XFS_MAX_CRC_AG_BLOCKS) + return -EOPNOTSUPP; + + /* rtreflink filesystems can't have rtgroups larger than 2^31-1 blocks */ + BUILD_BUG_ON(XFS_MAX_RGBLOCKS >= XFS_REFC_COWFLAG); + if (rtg && mp->m_sb.sb_rgblocks >= XFS_MAX_RGBLOCKS) return -EOPNOTSUPP; INIT_LIST_HEAD(&debris); @@ -2118,16 +2124,25 @@ xfs_refcount_recover_cow_leftovers( if (error) return error; - error = xfs_alloc_read_agf(pag, tp, 0, &agbp); - if (error) - goto out_trans; - cur = xfs_refcountbt_init_cursor(mp, tp, agbp, pag); + if (rtg) { + xfs_rtgroup_lock(NULL, rtg, XFS_RTGLOCK_REFCOUNT); + cur = xfs_rtrefcountbt_init_cursor(mp, tp, rtg, + rtg->rtg_refcountip); + } else { + error = xfs_alloc_read_agf(pag, tp, 0, &agbp); + if (error) + goto out_trans; + cur = xfs_refcountbt_init_cursor(mp, tp, agbp, pag); + } /* Find all the leftover CoW staging extents. 
*/ error = xfs_btree_query_range(cur, &low, &high, xfs_refcount_recover_extent, &debris); xfs_btree_del_cursor(cur, error); - xfs_trans_brelse(tp, agbp); + if (agbp) + xfs_trans_brelse(tp, agbp); + else + xfs_rtgroup_unlock(rtg, XFS_RTGLOCK_REFCOUNT); xfs_trans_cancel(tp); if (error) goto out_free; @@ -2140,15 +2155,20 @@ xfs_refcount_recover_cow_leftovers( goto out_free; /* Free the orphan record */ - fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, - rr->rr_rrec.rc_startblock); - xfs_refcount_free_cow_extent(tp, false, fsb, + if (rtg) + fsb = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, + rr->rr_rrec.rc_startblock); + else + fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, + rr->rr_rrec.rc_startblock); + xfs_refcount_free_cow_extent(tp, rtg != NULL, fsb, rr->rr_rrec.rc_blockcount); /* Free the block. */ error = xfs_free_extent_later(tp, fsb, rr->rr_rrec.rc_blockcount, NULL, - XFS_AG_RESV_NONE, 0); + XFS_AG_RESV_NONE, + rtg != NULL ? XFS_FREE_EXTENT_REALTIME : 0); if (error) goto out_trans; @@ -2172,6 +2192,22 @@ xfs_refcount_recover_cow_leftovers( return error; } +int +xfs_refcount_recover_cow_leftovers( + struct xfs_mount *mp, + struct xfs_perag *pag) +{ + return xfs_refcount_recover_group_cow_leftovers(mp, pag, NULL); +} + +int +xfs_refcount_recover_rtcow_leftovers( + struct xfs_mount *mp, + struct xfs_rtgroup *rtg) +{ + return xfs_refcount_recover_group_cow_leftovers(mp, NULL, rtg); +} + /* * Scan part of the keyspace of the refcount records and tell us if the area * has no records, is fully mapped by records, or is partially filled. 
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h index 56e5834feb624..18e0479d3d1d0 100644 --- a/fs/xfs/libxfs/xfs_refcount.h +++ b/fs/xfs/libxfs/xfs_refcount.h @@ -97,8 +97,10 @@ void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsb, xfs_extlen_t len); void xfs_refcount_free_cow_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsb, xfs_extlen_t len); -extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp, +int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp, struct xfs_perag *pag); +int xfs_refcount_recover_rtcow_leftovers(struct xfs_mount *mp, + struct xfs_rtgroup *rtg); /* * While we're adjusting the refcounts records of an extent, we have diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index d05ab2c91bd98..b0f3170c11c8b 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1006,7 +1006,9 @@ xfs_reflink_recover_cow( struct xfs_mount *mp) { struct xfs_perag *pag; + struct xfs_rtgroup *rtg; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; int error = 0; if (!xfs_has_reflink(mp)) @@ -1016,11 +1018,19 @@ xfs_reflink_recover_cow( error = xfs_refcount_recover_cow_leftovers(mp, pag); if (error) { xfs_perag_rele(pag); - break; + return error; } } - return error; + for_each_rtgroup(mp, rgno, rtg) { + error = xfs_refcount_recover_rtcow_leftovers(mp, rtg); + if (error) { + xfs_rtgroup_rele(rtg); + return error; + } + } + + return 0; } /* From patchwork Sun Dec 31 21:50:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507736 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E2218C8E7 for ; Sun, 31 Dec 2023 21:50:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="maRW++zr" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 57FDEC433C8; Sun, 31 Dec 2023 21:50:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059434; bh=SJKce4HGwJcbnxSceucA7bcghmbQtbaGx7fwItsuGbI=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=maRW++zrR6EFMgznMrkFM+KDySkxpnlJ3+WKXo7EEE0rdIUsb2u7H5wrB3QYGwt43 k2fPhLZABSWp7tvFIFoTNOiaubX8jL/yB4zeQNicbDHXFXWLPswpiLyAMHuI9k5Fhb d/Ab0j1EW3gcQ2oLhw54WgCyhCBPjFwq3C2GdaCKrMPVc/cFdMro7Tuzq1dfD0p1Sb 7McHSdd2sUVglKerKHpBkaY4kvXfEUb9uK8NIngoMP4dVvtpx592Ku1pZ1t7TeUt10 I+YrfXAhET7eLVqOAgL0EdfcTTXzNupChicTzNo2iMma2poRsYurOnHeKiNYn8Jr8B ZbFWyDxR399Lw== Date: Sun, 31 Dec 2023 13:50:33 -0800 Subject: [PATCH 23/44] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851951.1766284.12589916918123658780.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Currently, we (ab)use xfs_get_extsz_hint so that it always returns a nonzero value for realtime files. This apparently was done to disable delayed allocation for realtime files. 
However, once we enable realtime reflink, we can also turn on the alwayscow flag to force CoW writes to realtime files. In this case, the logic will incorrectly send the write through the delalloc write path. Fix this by adjusting the logic slightly. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_bmap.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 27f992dc6d2d6..316b574b34b8a 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -6384,9 +6384,8 @@ xfs_get_extsz_hint( * No point in aligning allocations if we need to COW to actually * write to them. */ - if (xfs_is_always_cow_inode(ip)) - return 0; - if ((ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize) + if (!xfs_is_always_cow_inode(ip) && + (ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize) return ip->i_extsize; if (XFS_IS_REALTIME_INODE(ip)) return ip->i_mount->m_sb.sb_rextsize; From patchwork Sun Dec 31 21:50:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507737 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2CBB9BE47 for ; Sun, 31 Dec 2023 21:50:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="VcQLTvDD" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EE4E3C433C8; Sun, 31 Dec 2023 21:50:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059450; bh=D29AdvAM3W6TZFoxrtqV08LxVSoZfiToUABFj7RVwu4=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=VcQLTvDDmDVJUCktp5h4UeqPaCFlbv29RnzuPTcvAGuB0P/JbNByt7r0piDGWdBcX n/UfudDDkDLlQL4CmUsLotN5+Og1r/8H9ABL2PuPVCBCBPN7I9xy8/4vz8yqXz1ExP 3LCW+UYq4e2tGHSGJIx4RmglAbzuL2QzIeUj0u6oGEYvtH4aZogYVkCDcMho27+bjs QI/4KaHV8Tb2oew+QK9HjCMyC0WnWGUwrJ3uHjfvWvipxxpDNDzAtSz0LQ2ZHVTTYx J4WVg0Lpw8FEwpRS7Ie0IqlT2OoplGChq+g9I47Klv4pVvTGq8xvfwZQaiye91fTjF rkltQLmYuIeGg== Date: Sun, 31 Dec 2023 13:50:49 -0800 Subject: [PATCH 24/44] xfs: apply rt extent alignment constraints to CoW extsize hint From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851967.1766284.11315488240210718409.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong The copy-on-write extent size hint is subject to the same alignment constraints as the regular extent size hint. 
Since we're in the process of adding reflink (and therefore CoW) to the realtime device, we must apply the same scattered rextsize alignment validation strategies to both hints to deal with the possibility of rextsize changing. Therefore, fix the inode validator to perform rextsize alignment checks on regular realtime files, and to remove misaligned directory hints. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_inode_buf.c | 25 ++++++++++++++++++++----- fs/xfs/xfs_inode_item.c | 14 ++++++++++++++ fs/xfs/xfs_ioctl.c | 17 +++++++++++++++-- 3 files changed, 49 insertions(+), 7 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index ba37b864f6a8b..81a12ca8ec434 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -861,11 +861,29 @@ xfs_inode_validate_cowextsize( bool rt_flag; bool hint_flag; uint32_t cowextsize_bytes; + uint32_t blocksize_bytes; rt_flag = (flags & XFS_DIFLAG_REALTIME); hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE); cowextsize_bytes = XFS_FSB_TO_B(mp, cowextsize); + /* + * Similar to extent size hints, a directory can be configured to + * propagate realtime status and a CoW extent size hint to newly + * created files even if there is no realtime device, and the hints on + * disk can become misaligned if the sysadmin changes the rt extent + * size while adding the realtime device. + * + * Therefore, we can only enforce the rextsize alignment check against + * regular realtime files, and rely on callers to decide when alignment + * checks are appropriate, and fix things up as needed. 
+ */ + + if (rt_flag) + blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); + else + blocksize_bytes = mp->m_sb.sb_blocksize; + if (hint_flag && !xfs_has_reflink(mp)) return __this_address; @@ -879,16 +897,13 @@ xfs_inode_validate_cowextsize( if (mode && !hint_flag && cowextsize != 0) return __this_address; - if (hint_flag && rt_flag) - return __this_address; - - if (cowextsize_bytes % mp->m_sb.sb_blocksize) + if (cowextsize_bytes % blocksize_bytes) return __this_address; if (cowextsize > XFS_MAX_BMBT_EXTLEN) return __this_address; - if (cowextsize > mp->m_sb.sb_agblocks / 2) + if (!rt_flag && cowextsize > mp->m_sb.sb_agblocks / 2) return __this_address; return NULL; diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index fdc0b14bb9fbb..16d7d934da6e8 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -127,6 +127,20 @@ xfs_inode_item_precommit( if (flags & XFS_ILOG_IVERSION) flags = ((flags & ~XFS_ILOG_IVERSION) | XFS_ILOG_CORE); + /* + * Inode verifiers do not check that the CoW extent size hint is an + * integer multiple of the rt extent size on a directory with both + * rtinherit and cowextsize flags set. If we're logging a directory + * that is misconfigured in this way, clear the hint. 
+ */ + if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && + (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) && + xfs_extlen_to_rtxmod(ip->i_mount, ip->i_cowextsize) > 0) { + ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE; + ip->i_cowextsize = 0; + flags |= XFS_ILOG_CORE; + } + if (!iip->ili_item.li_buf) { struct xfs_buf *bp; int error; diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 4559d122101cd..f85d5f142d180 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1058,8 +1058,21 @@ xfs_fill_fsxattr( } } - if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) - fa->fsx_cowextsize = XFS_FSB_TO_B(mp, ip->i_cowextsize); + if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) { + /* + * Don't let a misaligned CoW extent size hint on a directory + * escape to userspace if it won't pass the setattr checks + * later. + */ + if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && + ip->i_cowextsize % mp->m_sb.sb_rextsize > 0) { + fa->fsx_xflags &= ~FS_XFLAG_COWEXTSIZE; + fa->fsx_cowextsize = 0; + } else { + fa->fsx_cowextsize = XFS_FSB_TO_B(mp, ip->i_cowextsize); + } + } + fa->fsx_projid = ip->i_projid; if (ifp && !xfs_need_iread_extents(ifp)) fa->fsx_nextents = xfs_iext_count(ifp); From patchwork Sun Dec 31 21:51:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507738 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1D977BE47 for ; Sun, 31 Dec 2023 21:51:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="b3h9DCfz" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 97A7CC433C8; Sun, 31 Dec 2023 21:51:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059465; bh=ac7R9HhmFrwmG4ZLfYO90F2cm7LUO8fAaDCY7PoL/r4=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=b3h9DCfztNdWEYIkQbSZHbbmaMMs70ki8kXl47QhgaQoYPTkSXVfKxJFmO8fNiCzV 2H+284yjIIOlUlNYSIvkYzlIso1YiUjUEQaMu6EdpEypxnXcpQY1EQBju4FZ/Eb+J2 pkpmGq6lv/rd0IJC31LxUpkkPe4qJlR9dzkIiR6k7i3tOSZOR1zdFjaRXXZMEFQq12 HZXPu8eliPbxsvlZhCjNkepv7lbv6hEKrNW7EkkitY1qBgYoPeZ5h7yumIDf2zsF1U cmwNSMjurU+ZhBxJXl9UXNY2DdMwuyXVZ8BX/juwMgbee2CL5JIjLLIvriD/7eT2Ya QftRcIN5vLw4w== Date: Sun, 31 Dec 2023 13:51:05 -0800 Subject: [PATCH 25/44] xfs: enable extent size hints for CoW operations From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404851984.1766284.14375473929572405267.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Wire up the copy-on-write extent size hint for realtime files, and connect it to the rt allocator so that we avoid fragmentation on rt filesystems. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_bmap.c | 8 +++++++- fs/xfs/xfs_bmap_util.c | 5 ++++- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 316b574b34b8a..41354bdbbc90f 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -6407,7 +6407,13 @@ xfs_get_cowextsz_hint( a = 0; if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) a = ip->i_cowextsize; - b = xfs_get_extsz_hint(ip); + if (XFS_IS_REALTIME_INODE(ip)) { + b = 0; + if (ip->i_diflags & XFS_DIFLAG_EXTSIZE) + b = ip->i_extsize; + } else { + b = xfs_get_extsz_hint(ip); + } a = max(a, b); if (a == 0) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index a7a99177bbf8b..60992f8e86adf 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -138,7 +138,10 @@ xfs_bmap_rtalloc( bool ignore_locality = false; int error; - align = xfs_get_extsz_hint(ap->ip); + if (ap->flags & XFS_BMAPI_COWFORK) + align = xfs_get_cowextsz_hint(ap->ip); + else + align = xfs_get_extsz_hint(ap->ip); retry: prod = xfs_extlen_to_rtxlen(mp, align); error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, From patchwork Sun Dec 31 21:51:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507739 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BBA95BE48 for ; Sun, 31 Dec 2023 21:51:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Pz7Yn81I" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 48979C433C8; Sun, 31 Dec 2023 21:51:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059481; bh=jE3bx2dMNGItyZYpGwK+8chzBjtJJgek0r/vokaBt0M=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Pz7Yn81Ir7PbMY7hof5vya3wmIsV6/58Rhtidu29WuQPzlayuoAkk5u94tt4QWgtE 7O2J2j+RV2pj8Jnh6asKn8bDtGLIa6h1xAzG1SYBc6u7dVLMD9o+JlUnZyP1QqFfEw ZxtwgYUZASitSb8UcgK/IL1v6WSIhDGw9seLsBmKUYOyOgFuZKORlStvlV4g1NrLvt RgtwqKYnzOL5Nti4yig0F5LhapxJdo1Y/JmOpi5TlUVSBGURLfI2BSwYMZXfvKD+Q/ xGgRUkEvtXJJnjgxvnlOh5pO71CAjyzWxEiuHdD59eammGlsJ4q1iXtF3EcES97g8b 0iSmrjM/GHbWA== Date: Sun, 31 Dec 2023 13:51:20 -0800 Subject: [PATCH 26/44] xfs: check that the rtrefcount maxlevels doesn't increase when growing fs From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852000.1766284.10949871237508606131.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong The size of filesystem transaction reservations depends on the maximum height (maxlevels) of the realtime btrees. 
Since we don't want a grow operation to increase the reservation size enough that we'll fail the minimum log size checks on the next mount, constrain growfs operations if they would cause an increase in the rt refcount btree maxlevels. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_fsops.c | 2 ++ fs/xfs/xfs_rtalloc.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index e78ee67b9dd12..9584c08480f75 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -24,6 +24,7 @@ #include "xfs_rtgroup.h" #include "xfs_rtalloc.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" /* * Write new AG headers to disk. Non-transactional, but need to be @@ -238,6 +239,7 @@ xfs_growfs_data_private( /* Compute new maxlevels for rt btrees. */ xfs_rtrmapbt_compute_maxlevels(mp); + xfs_rtrefcountbt_compute_maxlevels(mp); } return error; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 14e17c2b39ef0..54859b32d37fc 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1128,6 +1128,7 @@ xfs_growfs_check_rtgeom( fake_mp->m_features |= XFS_FEAT_REALTIME; xfs_rtrmapbt_compute_maxlevels(fake_mp); + xfs_rtrefcountbt_compute_maxlevels(fake_mp); xfs_trans_resv_calc(fake_mp, M_RES(fake_mp)); min_logfsbs = xfs_log_calc_minimum_size(fake_mp); @@ -1451,6 +1452,7 @@ xfs_growfs_rt( */ mp->m_features |= XFS_FEAT_REALTIME; xfs_rtrmapbt_compute_maxlevels(mp); + xfs_rtrefcountbt_compute_maxlevels(mp); } if (error) goto out_free; From patchwork Sun Dec 31 21:51:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507740 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8BE2CBE47 for ; Sun, 31 Dec 2023 21:51:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="vHs4HvX5" Received: by smtp.kernel.org (Postfix) with ESMTPSA id F3C5CC433C7; Sun, 31 Dec 2023 21:51:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059497; bh=eIsKs7fnOAuY3pD6cqbmmFfP7whr98u+fYHBGpjxzpA=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=vHs4HvX5gth7ivzkNvAhSaqD8FAO5tz2OXgcqGP5EgwS1mtthcAjnmMv69hz3dMRr Zn9leB+zA/hPUZR+3XrbNpKJ+XUPenEGo6lKbBsqTiftMTn8HwvAFwZTtWxYfGHWdu 9RFhK9Hq1LDnx9y8Qjl92UGAtQzZRP7C3q3h+XHoyMGz6g0gsC0kz4tZUct26PLFta fq0Y+0M34YYh/BPcG+UM17XYSb5c8lX1svieS5gcbVdvdIV2tVIYqp5HJnfXZWzz1F oUXjnWRJlily7VYO+yB/ZrsRenA9+pJYXCLCf3WIXWLUgJjRuaXFzIj6Pyjh30GdI6 bf56SIMkprcUA== Date: Sun, 31 Dec 2023 13:51:36 -0800 Subject: [PATCH 27/44] xfs: add realtime refcount btree when adding rt volume From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852016.1766284.14171174568346315649.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong If we're adding enough space to the realtime section to require the creation of new realtime groups, create the rt refcount btree inode before we start adding the space. Signed-off-by: Darrick J. 
Wong --- fs/xfs/xfs_rtalloc.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 63 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 54859b32d37fc..1dd76cb757534 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1090,6 +1090,59 @@ xfs_growfsrt_create_rtrmap( return error; } +/* Add a metadata inode for a realtime refcount btree. */ +static int +xfs_growfsrt_create_rtrefcount( + struct xfs_rtgroup *rtg) +{ + struct xfs_imeta_update upd; + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_imeta_path *path; + struct xfs_inode *ip = NULL; + int error; + + if (!xfs_has_rtreflink(mp) || rtg->rtg_refcountip) + return 0; + + error = xfs_rtrefcountbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + return error; + + error = xfs_imeta_ensure_dirpath(mp, path); + if (error) + goto out_path; + + error = xfs_imeta_start_create(mp, path, &upd); + if (error) + goto out_path; + + error = xfs_rtrefcountbt_create(&upd, &ip); + if (error) + goto out_cancel; + + lockdep_set_class(&ip->i_lock.mr_lock, &xfs_rrefcountip_key); + + error = xfs_imeta_commit_update(&upd); + if (error) + goto out_path; + + xfs_imeta_free_path(path); + xfs_finish_inode_setup(ip); + rtg->rtg_refcountip = ip; + return 0; + +out_cancel: + xfs_imeta_cancel_update(&upd, error); + /* Have to finish setting up the inode to ensure it's deleted. */ + if (ip) { + xfs_finish_inode_setup(ip); + xfs_irele(ip); + } +out_path: + xfs_imeta_free_path(path); + return error; +} + /* * Check that changes to the realtime geometry won't affect the minimum * log size, which would cause the fs to become unusable. @@ -1196,9 +1249,11 @@ xfs_growfs_rt( return -EINVAL; /* Unsupported realtime features. 
*/ - if (!xfs_has_rtgroups(mp) && xfs_has_rmapbt(mp)) + if (!xfs_has_rtgroups(mp) && (xfs_has_rmapbt(mp) || xfs_has_reflink(mp))) return -EOPNOTSUPP; - if (xfs_has_reflink(mp) || xfs_has_quota(mp)) + if (xfs_has_quota(mp)) + return -EOPNOTSUPP; + if (xfs_has_reflink(mp) && in->extsize != 1) return -EOPNOTSUPP; nrblocks = in->newblocks; @@ -1349,6 +1404,12 @@ xfs_growfs_rt( xfs_rtgroup_rele(rtg); break; } + + error = xfs_growfsrt_create_rtrefcount(rtg); + if (error) { + xfs_rtgroup_rele(rtg); + break; + } } } From patchwork Sun Dec 31 21:51:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507741 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2EC4AC8CA for ; Sun, 31 Dec 2023 21:51:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="dJG66Y3e" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 95817C433C7; Sun, 31 Dec 2023 21:51:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059512; bh=zZ2tiHsQhAWPnqxOyOyy7TQf2FeeVF+COYbY/hlBjKo=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=dJG66Y3eHAWVajNbK19rKNUhcfkGrbgmeprrI2qoC5vGlSluckmdlhSMa54D26eT/ VHyWhDOs3fatYhm3qJPNW9XnKd0yaZZtf68BIX+W0BsnRMMqsRL51UohSAVXdqgLzU DfHpE5Eek7i9fKzrReYm1tyxQq3gX8Z3f7gCCHCeJ0PBq7dk5A7ojfurikQrvaJcX2 hVlMK/0ox1B2aN9IdNp6Hua/Zro68u7Zs4d70oEzAKdVL2tYVcGAJTBNJ+ysRrEdtW 1fXjoxZPfFkVeJGgmiILp3Vd0zPP0Yejie9fwC+7akNT93xkJ3hxTmnwqvsp6Xdt9A zpT6H/7pJQUAw== Date: Sun, 31 Dec 2023 13:51:52 -0800 Subject: [PATCH 28/44] xfs: report realtime refcount btree corruption errors to the health system From: "Darrick J. 
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852031.1766284.15779582679192002528.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Whenever we encounter corrupt realtime refcount btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_fs_staging.h | 1 + fs/xfs/libxfs/xfs_health.h | 4 +++- fs/xfs/libxfs/xfs_inode_fork.c | 4 +++- fs/xfs/libxfs/xfs_rtrefcount_btree.c | 5 ++++- fs/xfs/xfs_health.c | 4 ++++ fs/xfs/xfs_rtalloc.c | 2 ++ 6 files changed, 17 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_fs_staging.h b/fs/xfs/libxfs/xfs_fs_staging.h index 9d5d6af62b616..9f0c03103f05b 100644 --- a/fs/xfs/libxfs/xfs_fs_staging.h +++ b/fs/xfs/libxfs/xfs_fs_staging.h @@ -217,6 +217,7 @@ struct xfs_rtgroup_geometry { #define XFS_RTGROUP_GEOM_SICK_SUPER (1 << 0) /* superblock */ #define XFS_RTGROUP_GEOM_SICK_BITMAP (1 << 1) /* rtbitmap for this group */ #define XFS_RTGROUP_GEOM_SICK_RMAPBT (1 << 2) /* reverse mappings */ +#define XFS_RTGROUP_GEOM_SICK_REFCNTBT (1 << 3) /* reference counts */ #define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 63, struct xfs_rtgroup_geometry) diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h index aeeb62769773f..4fe4daca4c4f4 100644 --- a/fs/xfs/libxfs/xfs_health.h +++ b/fs/xfs/libxfs/xfs_health.h @@ -69,6 +69,7 @@ struct xfs_rtgroup; #define XFS_SICK_RT_SUMMARY (1 << 1) /* realtime summary */ #define XFS_SICK_RT_SUPER (1 << 2) /* rt group superblock */ #define XFS_SICK_RT_RMAPBT (1 << 3) /* reverse mappings */ +#define XFS_SICK_RT_REFCNTBT (1 << 4) /* reference counts */ /* Observable health issues for AG metadata. 
*/ #define XFS_SICK_AG_SB (1 << 0) /* superblock */ @@ -115,7 +116,8 @@ struct xfs_rtgroup; #define XFS_SICK_RT_PRIMARY (XFS_SICK_RT_BITMAP | \ XFS_SICK_RT_SUMMARY | \ XFS_SICK_RT_SUPER | \ - XFS_SICK_RT_RMAPBT) + XFS_SICK_RT_RMAPBT | \ + XFS_SICK_RT_REFCNTBT) #define XFS_SICK_AG_PRIMARY (XFS_SICK_AG_SB | \ XFS_SICK_AG_AGF | \ diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index df42ffa15d96e..9ff913d0fa140 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -273,8 +273,10 @@ xfs_iformat_data_fork( } return xfs_iformat_rtrmap(ip, dip); case XFS_DINODE_FMT_REFCOUNT: - if (!xfs_has_rtreflink(ip->i_mount)) + if (!xfs_has_rtreflink(ip->i_mount)) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } return xfs_iformat_rtrefcount(ip, dip); default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c index fb0e4abcd6f6a..47ce0acd92a19 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -27,6 +27,7 @@ #include "xfs_rtgroup.h" #include "xfs_rtbitmap.h" #include "xfs_imeta.h" +#include "xfs_health.h" static struct kmem_cache *xfs_rtrefcountbt_cur_cache; @@ -694,8 +695,10 @@ xfs_iformat_rtrefcount( level = be16_to_cpu(dfp->bb_level); if (level > mp->m_rtrefc_maxlevels || - xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize) + xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } xfs_iroot_alloc(ip, XFS_DATA_FORK, xfs_rtrefcount_broot_space_calc(mp, level, numrecs)); diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c index ed0767a6fa15a..6f40e2b728e27 100644 --- a/fs/xfs/xfs_health.c +++ b/fs/xfs/xfs_health.c @@ -533,6 +533,7 @@ static const struct ioctl_sick_map rtgroup_map[] = { { XFS_SICK_RT_SUPER, XFS_RTGROUP_GEOM_SICK_SUPER }, { XFS_SICK_RT_BITMAP, 
XFS_RTGROUP_GEOM_SICK_BITMAP }, { XFS_SICK_RT_RMAPBT, XFS_RTGROUP_GEOM_SICK_RMAPBT }, + { XFS_SICK_RT_REFCNTBT, XFS_RTGROUP_GEOM_SICK_REFCNTBT }, { 0, 0 }, }; @@ -640,6 +641,9 @@ xfs_btree_mark_sick( case XFS_BTNUM_RTRMAP: xfs_rtgroup_mark_sick(cur->bc_ino.rtg, XFS_SICK_RT_RMAPBT); return; + case XFS_BTNUM_RTREFC: + xfs_rtgroup_mark_sick(cur->bc_ino.rtg, XFS_SICK_RT_REFCNTBT); + return; case XFS_BTNUM_BNO: mask = XFS_SICK_AG_BNOBT; break; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 1dd76cb757534..11b6645a5a534 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1931,6 +1931,7 @@ xfs_rtmount_refcountbt( goto out_path; if (ino == NULLFSINO) { + xfs_rtgroup_mark_sick(rtg, XFS_SICK_RT_REFCNTBT); error = -EFSCORRUPTED; goto out_path; } @@ -1940,6 +1941,7 @@ xfs_rtmount_refcountbt( goto out_path; if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_REFCOUNT)) { + xfs_rtgroup_mark_sick(rtg, XFS_SICK_RT_REFCNTBT); error = -EFSCORRUPTED; goto out_rele; } From patchwork Sun Dec 31 21:52:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507742 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7417DC8CB for ; Sun, 31 Dec 2023 21:52:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="oJkB19KP" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3D982C433C7; Sun, 31 Dec 2023 21:52:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059528; bh=aqLkI7brpZeg8APPXF04qXiabbIHQcvccm7Bidzi/gE=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=oJkB19KPrLOqAX7wcMSJnsK9gxN54TQQaB+i09JBtjSdExrzRNaaw7fz/BXMO/CEj uBgrleTlvkFEetk0J+VWmNBWS8lbiNhU3qDSttr+2PBGGXNzFKiR7bxUCWgGZ4cNeS ia5W6MH8v9X3C5nr2Gj6UWZ9lkzX9AqR4HgrhGSP5TwFlcUhrH9Ex6mFuWdQitqmxJ cAUlBHewigdYzUguZltD3kFhczg4haEwNgHt2y/2Dfek2KGUWxgC0OeNH0AXBO7PbK 3CY9gCf+z0O/OqC5zrMuxkM6VmkHCRmt8fqzcFXBNBNA71AVRpxe3KRoYlXxBXKxay oZquLX8+Xn1VQ== Date: Sun, 31 Dec 2023 13:52:07 -0800 Subject: [PATCH 29/44] xfs: scrub the realtime refcount btree From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852048.1766284.18012262642412571038.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Add code to scrub realtime refcount btrees. Signed-off-by: Darrick J. 
Wong --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_fs.h | 3 fs/xfs/scrub/bmap.c | 1 fs/xfs/scrub/bmap_repair.c | 1 fs/xfs/scrub/common.c | 40 +++- fs/xfs/scrub/common.h | 5 fs/xfs/scrub/health.c | 1 fs/xfs/scrub/inode.c | 1 fs/xfs/scrub/rtrefcount.c | 497 ++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.c | 7 + fs/xfs/scrub/scrub.h | 3 fs/xfs/scrub/stats.c | 1 fs/xfs/scrub/trace.h | 4 13 files changed, 551 insertions(+), 14 deletions(-) create mode 100644 fs/xfs/scrub/rtrefcount.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 783fc053d7be9..4d4f340d904fc 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -194,6 +194,7 @@ xfs-$(CONFIG_XFS_ONLINE_SCRUB_STATS) += scrub/stats.o xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \ rgsuper.o \ rtbitmap.o \ + rtrefcount.o \ rtrmap.o \ rtsummary.o \ ) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 0bbdbfb0a8ae7..7847da61db232 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -738,9 +738,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_TYPE_RGSUPER 30 /* realtime superblock */ #define XFS_SCRUB_TYPE_RGBITMAP 31 /* realtime group bitmap */ #define XFS_SCRUB_TYPE_RTRMAPBT 32 /* rtgroup reverse mapping btree */ +#define XFS_SCRUB_TYPE_RTREFCBT 33 /* realtime reference count btree */ /* Number of scrub subcommands. */ -#define XFS_SCRUB_TYPE_NR 33 +#define XFS_SCRUB_TYPE_NR 34 /* * This special type code only applies to the vectored scrub implementation. diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 696ac7208c4d5..5531ba2295b12 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -1038,6 +1038,7 @@ xchk_bmap( case XFS_DINODE_FMT_DEV: case XFS_DINODE_FMT_LOCAL: case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: /* No mappings to check. 
*/ if (whichfork == XFS_COW_FORK) xchk_fblock_set_corrupt(sc, whichfork, 0); diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 25c52caae58a4..26dc04923944b 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -852,6 +852,7 @@ xrep_bmap_check_inputs( case XFS_DINODE_FMT_LOCAL: case XFS_DINODE_FMT_UUID: case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: return -ECANCELED; case XFS_DINODE_FMT_EXTENTS: case XFS_DINODE_FMT_BTREE: diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 558e267924399..a9801a5bb0383 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -37,6 +37,7 @@ #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" #include "xfs_bmap_util.h" +#include "xfs_rtrefcount_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -885,6 +886,10 @@ xchk_rtgroup_init( sr->rmap_cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, sr->rtg, sr->rtg->rtg_rmapip); + if (xfs_has_rtreflink(sc->mp) && (rtglock_flags & XFS_RTGLOCK_REFCOUNT)) + sr->refc_cur = xfs_rtrefcountbt_init_cursor(sc->mp, sc->tp, + sr->rtg, sr->rtg->rtg_refcountip); + return 0; } @@ -899,7 +904,10 @@ xchk_rtgroup_btcur_free( { if (sr->rmap_cur) xfs_btree_del_cursor(sr->rmap_cur, XFS_BTREE_ERROR); + if (sr->refc_cur) + xfs_btree_del_cursor(sr->refc_cur, XFS_BTREE_ERROR); + sr->refc_cur = NULL; sr->rmap_cur = NULL; } @@ -1762,16 +1770,26 @@ xchk_inode_count_blocks( } cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, sc->sr.rtg, sc->ip); - error = xfs_btree_count_blocks(cur, &btblocks); - xfs_btree_del_cursor(cur, error); - if (error) - return error; - - *nextents = 0; - *count = btblocks - 1; - return 0; - default: - return xfs_bmap_count_blocks(sc->tp, sc->ip, whichfork, - nextents, count); + goto meta_btree; + case XFS_DINODE_FMT_REFCOUNT: + if (!sc->sr.rtg) { + ASSERT(0); + return -EFSCORRUPTED; + } + cur = xfs_rtrefcountbt_init_cursor(sc->mp, sc->tp, sc->sr.rtg, + sc->ip); + goto meta_btree; } + + 
return xfs_bmap_count_blocks(sc->tp, sc->ip, whichfork, nextents, + count); +meta_btree: + error = xfs_btree_count_blocks(cur, &btblocks); + xfs_btree_del_cursor(cur, error); + if (error) + return error; + + *nextents = 0; + *count = btblocks - 1; + return 0; } diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 5dc481f69d160..ed22e1403d0f0 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -85,12 +85,14 @@ int xchk_setup_rtsummary(struct xfs_scrub *sc); int xchk_setup_rgsuperblock(struct xfs_scrub *sc); int xchk_setup_rgbitmap(struct xfs_scrub *sc); int xchk_setup_rtrmapbt(struct xfs_scrub *sc); +int xchk_setup_rtrefcountbt(struct xfs_scrub *sc); #else # define xchk_setup_rtbitmap xchk_setup_nothing # define xchk_setup_rtsummary xchk_setup_nothing # define xchk_setup_rgsuperblock xchk_setup_nothing # define xchk_setup_rgbitmap xchk_setup_nothing # define xchk_setup_rtrmapbt xchk_setup_nothing +# define xchk_setup_rtrefcountbt xchk_setup_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_ino_dqattach(struct xfs_scrub *sc); @@ -152,7 +154,8 @@ void xchk_rt_unlock_rtbitmap(struct xfs_scrub *sc); /* All the locks we need to check an rtgroup. */ #define XCHK_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP_SHARED | \ - XFS_RTGLOCK_RMAP) + XFS_RTGLOCK_RMAP | \ + XFS_RTGLOCK_REFCOUNT) int xchk_rtgroup_init(struct xfs_scrub *sc, xfs_rgnumber_t rgno, struct xchk_rt *sr, unsigned int rtglock_flags); diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c index 0fec833057697..10af83eb2dbd7 100644 --- a/fs/xfs/scrub/health.c +++ b/fs/xfs/scrub/health.c @@ -116,6 +116,7 @@ static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = { [XFS_SCRUB_TYPE_METAPATH] = { XHG_FS, XFS_SICK_FS_METAPATH }, [XFS_SCRUB_TYPE_RGSUPER] = { XHG_RTGROUP, XFS_SICK_RT_SUPER }, [XFS_SCRUB_TYPE_RTRMAPBT] = { XHG_RTGROUP, XFS_SICK_RT_RMAPBT }, + [XFS_SCRUB_TYPE_RTREFCBT] = { XHG_RTGROUP, XFS_SICK_RT_REFCNTBT }, }; /* Return the health status mask for this scrub type. 
*/ diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index 705865ec6c1c0..f746032a9db88 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -498,6 +498,7 @@ xchk_dinode( xchk_ino_set_corrupt(sc, ino); break; case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: if (!S_ISREG(mode)) xchk_ino_set_corrupt(sc, ino); break; diff --git a/fs/xfs/scrub/rtrefcount.c b/fs/xfs/scrub/rtrefcount.c new file mode 100644 index 0000000000000..cc44438eece76 --- /dev/null +++ b/fs/xfs/scrub/rtrefcount.c @@ -0,0 +1,497 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2021-2024 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_btree.h" +#include "xfs_rmap.h" +#include "xfs_refcount.h" +#include "xfs_inode.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" +#include "xfs_imeta.h" +#include "xfs_rtrefcount_btree.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" + +/* Set us up with the realtime refcount metadata locked. */ +int +xchk_setup_rtrefcountbt( + struct xfs_scrub *sc) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_rtgroup *rtg; + int error; + + if (xchk_need_intent_drain(sc)) + xchk_fsgates_enable(sc, XCHK_FSGATES_DRAIN); + + rtg = xfs_rtgroup_get(mp, sc->sm->sm_agno); + if (!rtg) + return -ENOENT; + + error = xchk_setup_rt(sc); + if (error) + goto out_rtg; + + error = xchk_install_live_inode(sc, rtg->rtg_refcountip); + if (error) + goto out_rtg; + + error = xchk_ino_dqattach(sc); + if (error) + goto out_rtg; + + error = xchk_rtgroup_init(sc, rtg->rtg_rgno, &sc->sr, XCHK_RTGLOCK_ALL); +out_rtg: + xfs_rtgroup_put(rtg); + return error; +} + +/* Realtime Reference count btree scrubber. 
*/ + +/* + * Confirming Reference Counts via Reverse Mappings + * + * We want to count the reverse mappings overlapping a refcount record + * (bno, len, refcount), allowing for the possibility that some of the + * overlap may come from smaller adjoining reverse mappings, while some + * comes from single extents which overlap the range entirely. The + * outer loop is as follows: + * + * 1. For all reverse mappings overlapping the refcount extent, + * a. If a given rmap completely overlaps, mark it as seen. + * b. Otherwise, record the fragment (in agbno order) for later + * processing. + * + * Once we've seen all the rmaps, we know that for all blocks in the + * refcount record we want to find $refcount owners and we've already + * visited $seen extents that overlap all the blocks. Therefore, we + * need to find ($refcount - $seen) owners for every block in the + * extent; call that quantity $target_nr. Proceed as follows: + * + * 2. Pull the first $target_nr fragments from the list; all of them + * should start at or before the start of the extent. + * Call this subset of fragments the working set. + * 3. Until there are no more unprocessed fragments, + * a. Find the shortest fragments in the set and remove them. + * b. Note the block number of the end of these fragments. + * c. Pull the same number of fragments from the list. All of these + * fragments should start at the block number recorded in the + * previous step. + * d. Put those fragments in the set. + * 4. Check that there are $target_nr fragments remaining in the list, + * and that they all end at or beyond the end of the refcount extent. + * + * If the refcount is correct, all the check conditions in the algorithm + * should always hold true. If not, the refcount is incorrect. 
+ */ +struct xchk_rtrefcnt_frag { + struct list_head list; + struct xfs_rmap_irec rm; +}; + +struct xchk_rtrefcnt_check { + struct xfs_scrub *sc; + struct list_head fragments; + + /* refcount extent we're examining */ + xfs_rgblock_t bno; + xfs_extlen_t len; + xfs_nlink_t refcount; + + /* number of owners seen */ + xfs_nlink_t seen; +}; + +/* + * Decide if the given rmap is large enough that we can redeem it + * towards refcount verification now, or if it's a fragment, in + * which case we'll hang onto it in the hopes that we'll later + * discover that we've collected exactly the correct number of + * fragments as the rtrefcountbt says we should have. + */ +STATIC int +xchk_rtrefcountbt_rmap_check( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xchk_rtrefcnt_check *refchk = priv; + struct xchk_rtrefcnt_frag *frag; + xfs_rgblock_t rm_last; + xfs_rgblock_t rc_last; + int error = 0; + + if (xchk_should_terminate(refchk->sc, &error)) + return error; + + rm_last = rec->rm_startblock + rec->rm_blockcount - 1; + rc_last = refchk->bno + refchk->len - 1; + + /* Confirm that a single-owner refc extent is a CoW stage. */ + if (refchk->refcount == 1 && rec->rm_owner != XFS_RMAP_OWN_COW) { + xchk_btree_xref_set_corrupt(refchk->sc, cur, 0); + return 0; + } + + if (rec->rm_startblock <= refchk->bno && rm_last >= rc_last) { + /* + * The rmap overlaps the refcount record, so we can confirm + * one refcount owner seen. + */ + refchk->seen++; + } else { + /* + * This rmap covers only part of the refcount record, so + * save the fragment for later processing. If the rmapbt + * is healthy each rmap_irec we see will be in agbno order + * so we don't need insertion sort here. 
+ */ + frag = kmalloc(sizeof(struct xchk_rtrefcnt_frag), + XCHK_GFP_FLAGS); + if (!frag) + return -ENOMEM; + memcpy(&frag->rm, rec, sizeof(frag->rm)); + list_add_tail(&frag->list, &refchk->fragments); + } + + return 0; +} + +/* + * Given a bunch of rmap fragments, iterate through them, keeping + * a running tally of the refcount. If this ever deviates from + * what we expect (which is the rtrefcountbt's refcount minus the + * number of extents that totally covered the rtrefcountbt extent), + * we have a rtrefcountbt error. + */ +STATIC void +xchk_rtrefcountbt_process_rmap_fragments( + struct xchk_rtrefcnt_check *refchk) +{ + struct list_head worklist; + struct xchk_rtrefcnt_frag *frag; + struct xchk_rtrefcnt_frag *n; + xfs_rgblock_t bno; + xfs_rgblock_t rbno; + xfs_rgblock_t next_rbno; + xfs_nlink_t nr; + xfs_nlink_t target_nr; + + target_nr = refchk->refcount - refchk->seen; + if (target_nr == 0) + return; + + /* + * There are (refchk->refcount - refchk->seen) + * references we haven't found yet. Pull that many off the + * fragment list and figure out where the smallest rmap ends + * (and therefore the next rmap should start). All the rmaps + * we pull off should start at or before the beginning of the + * refcount record's range. + */ + INIT_LIST_HEAD(&worklist); + rbno = NULLRGBLOCK; + + /* Make sure the fragments actually /are/ in bno order. */ + bno = 0; + list_for_each_entry(frag, &refchk->fragments, list) { + if (frag->rm.rm_startblock < bno) + goto done; + bno = frag->rm.rm_startblock; + } + + /* + * Find all the rmaps that start at or before the refc extent, + * and put them on the worklist. 
+ */ + nr = 0; + list_for_each_entry_safe(frag, n, &refchk->fragments, list) { + if (frag->rm.rm_startblock > refchk->bno || nr > target_nr) + break; + bno = frag->rm.rm_startblock + frag->rm.rm_blockcount; + if (bno < rbno) + rbno = bno; + list_move_tail(&frag->list, &worklist); + nr++; + } + + /* + * We should have found exactly $target_nr rmap fragments starting + * at or before the refcount extent. + */ + if (nr != target_nr) + goto done; + + while (!list_empty(&refchk->fragments)) { + /* Discard any fragments ending at rbno from the worklist. */ + nr = 0; + next_rbno = NULLRGBLOCK; + list_for_each_entry_safe(frag, n, &worklist, list) { + bno = frag->rm.rm_startblock + frag->rm.rm_blockcount; + if (bno != rbno) { + if (bno < next_rbno) + next_rbno = bno; + continue; + } + list_del(&frag->list); + kfree(frag); + nr++; + } + + /* Try to add nr rmaps starting at rbno to the worklist. */ + list_for_each_entry_safe(frag, n, &refchk->fragments, list) { + bno = frag->rm.rm_startblock + frag->rm.rm_blockcount; + if (frag->rm.rm_startblock != rbno) + goto done; + list_move_tail(&frag->list, &worklist); + if (next_rbno > bno) + next_rbno = bno; + nr--; + if (nr == 0) + break; + } + + /* + * If we get here and nr > 0, this means that we added fewer + * items to the worklist than we discarded because the fragment + * list ran out of items. Therefore, we cannot maintain the + * required refcount. Something is wrong, so we're done. + */ + if (nr) + goto done; + + rbno = next_rbno; + } + + /* + * Make sure the last extent we processed ends at or beyond + * the end of the refcount extent. + */ + if (rbno < refchk->bno + refchk->len) + goto done; + + /* Actually record us having seen the remaining refcount. */ + refchk->seen = refchk->refcount; +done: + /* Delete fragments and work list. 
*/ + list_for_each_entry_safe(frag, n, &worklist, list) { + list_del(&frag->list); + kfree(frag); + } + list_for_each_entry_safe(frag, n, &refchk->fragments, list) { + list_del(&frag->list); + kfree(frag); + } +} + +/* Use the rmap entries covering this extent to verify the refcount. */ +STATIC void +xchk_rtrefcountbt_xref_rmap( + struct xfs_scrub *sc, + const struct xfs_refcount_irec *irec) +{ + struct xchk_rtrefcnt_check refchk = { + .sc = sc, + .bno = irec->rc_startblock, + .len = irec->rc_blockcount, + .refcount = irec->rc_refcount, + .seen = 0, + }; + struct xfs_rmap_irec low; + struct xfs_rmap_irec high; + struct xchk_rtrefcnt_frag *frag; + struct xchk_rtrefcnt_frag *n; + int error; + + if (!sc->sr.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + /* Cross-reference with the rmapbt to confirm the refcount. */ + memset(&low, 0, sizeof(low)); + low.rm_startblock = irec->rc_startblock; + memset(&high, 0xFF, sizeof(high)); + high.rm_startblock = irec->rc_startblock + irec->rc_blockcount - 1; + + INIT_LIST_HEAD(&refchk.fragments); + error = xfs_rmap_query_range(sc->sr.rmap_cur, &low, &high, + xchk_rtrefcountbt_rmap_check, &refchk); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + goto out_free; + + xchk_rtrefcountbt_process_rmap_fragments(&refchk); + if (irec->rc_refcount != refchk.seen) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); + +out_free: + list_for_each_entry_safe(frag, n, &refchk.fragments, list) { + list_del(&frag->list); + kfree(frag); + } +} + +/* Cross-reference with the other btrees. */ +STATIC void +xchk_rtrefcountbt_xref( + struct xfs_scrub *sc, + const struct xfs_refcount_irec *irec) +{ + xfs_rtblock_t rtbno; + + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, + irec->rc_startblock); + xchk_xref_is_used_rt_space(sc, rtbno, irec->rc_blockcount); + xchk_rtrefcountbt_xref_rmap(sc, irec); +} + +struct xchk_rtrefcbt_records { + /* Previous refcount record. 
*/ + struct xfs_refcount_irec prev_rec; + + /* Number of CoW blocks we expect. */ + xfs_extlen_t cow_blocks; +}; + +static inline bool +xchk_rtrefcount_mergeable( + struct xchk_rtrefcbt_records *rrc, + const struct xfs_refcount_irec *r2) +{ + const struct xfs_refcount_irec *r1 = &rrc->prev_rec; + + /* Ignore if prev_rec is not yet initialized. */ + if (r1->rc_blockcount == 0) + return false; + + if (r1->rc_startblock + r1->rc_blockcount != r2->rc_startblock) + return false; + if (r1->rc_refcount != r2->rc_refcount) + return false; + if ((unsigned long long)r1->rc_blockcount + r2->rc_blockcount > + XFS_REFC_LEN_MAX) + return false; + + return true; +} + +/* Flag failures for records that could be merged. */ +STATIC void +xchk_rtrefcountbt_check_mergeable( + struct xchk_btree *bs, + struct xchk_rtrefcbt_records *rrc, + const struct xfs_refcount_irec *irec) +{ + if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + if (xchk_rtrefcount_mergeable(rrc, irec)) + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + + memcpy(&rrc->prev_rec, irec, sizeof(struct xfs_refcount_irec)); +} + +/* Scrub a rtrefcountbt record. */ +STATIC int +xchk_rtrefcountbt_rec( + struct xchk_btree *bs, + const union xfs_btree_rec *rec) +{ + struct xfs_mount *mp = bs->cur->bc_mp; + struct xchk_rtrefcbt_records *rrc = bs->private; + struct xfs_refcount_irec irec; + u32 mod; + + xfs_refcount_btrec_to_irec(rec, &irec); + if (xfs_rtrefcount_check_irec(bs->cur->bc_ino.rtg, &irec) != NULL) { + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + return 0; + } + + /* We can only share full rt extents. 
*/ + mod = xfs_rtb_to_rtxoff(mp, irec.rc_startblock); + if (mod) + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + mod = xfs_rtb_to_rtxoff(mp, irec.rc_blockcount); + if (mod) + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + + if (irec.rc_domain == XFS_REFC_DOMAIN_COW) + rrc->cow_blocks += irec.rc_blockcount; + + xchk_rtrefcountbt_check_mergeable(bs, rrc, &irec); + xchk_rtrefcountbt_xref(bs->sc, &irec); + + return 0; +} + +/* Make sure we have as many refc blocks as the rmap says. */ +STATIC void +xchk_refcount_xref_rmap( + struct xfs_scrub *sc, + const struct xfs_owner_info *btree_oinfo, + xfs_extlen_t cow_blocks) +{ + xfs_extlen_t refcbt_blocks = 0; + xfs_filblks_t blocks; + int error; + + if (!sc->sr.rmap_cur || !sc->sa.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + /* Check that we saw as many refcbt blocks as the rmap knows about. */ + error = xfs_btree_count_blocks(sc->sr.refc_cur, &refcbt_blocks); + if (!xchk_btree_process_error(sc, sc->sr.refc_cur, 0, &error)) + return; + error = xchk_count_rmap_ownedby_ag(sc, sc->sa.rmap_cur, btree_oinfo, + &blocks); + if (!xchk_should_check_xref(sc, &error, &sc->sa.rmap_cur)) + return; + if (blocks != refcbt_blocks) + xchk_btree_xref_set_corrupt(sc, sc->sa.rmap_cur, 0); + + /* Check that we saw as many cow blocks as the rmap knows about. */ + error = xchk_count_rmap_ownedby_ag(sc, sc->sr.rmap_cur, + &XFS_RMAP_OINFO_COW, &blocks); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + return; + if (blocks != cow_blocks) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); +} + +/* Scrub the refcount btree for some AG. 
*/ +int +xchk_rtrefcountbt( + struct xfs_scrub *sc) +{ + struct xfs_owner_info btree_oinfo; + struct xchk_rtrefcbt_records rrc = { + .cow_blocks = 0, + }; + int error; + + error = xchk_metadata_inode_forks(sc); + if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) + return error; + + xfs_rmap_ino_bmbt_owner(&btree_oinfo, sc->sr.rtg->rtg_refcountip->i_ino, + XFS_DATA_FORK); + error = xchk_btree(sc, sc->sr.refc_cur, xchk_rtrefcountbt_rec, + &btree_oinfo, &rrc); + if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) + return error; + + xchk_refcount_xref_rmap(sc, &btree_oinfo, rrc.cow_blocks); + + return 0; +} diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 8193ad6702b4d..2611e0223489c 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -481,6 +481,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .has = xfs_has_rtrmapbt, .repair = xrep_rtrmapbt, }, + [XFS_SCRUB_TYPE_RTREFCBT] = { /* realtime refcountbt */ + .type = ST_RTGROUP, + .setup = xchk_setup_rtrefcountbt, + .scrub = xchk_rtrefcountbt, + .has = xfs_has_rtreflink, + .repair = xrep_notsupported, + }, }; static int diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 38731b99625d6..7f94f31a5236e 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -125,6 +125,7 @@ struct xchk_rt { /* rtgroup btrees */ struct xfs_btree_cur *rmap_cur; + struct xfs_btree_cur *refc_cur; }; struct xfs_scrub { @@ -282,12 +283,14 @@ int xchk_rtsummary(struct xfs_scrub *sc); int xchk_rgsuperblock(struct xfs_scrub *sc); int xchk_rgbitmap(struct xfs_scrub *sc); int xchk_rtrmapbt(struct xfs_scrub *sc); +int xchk_rtrefcountbt(struct xfs_scrub *sc); #else # define xchk_rtbitmap xchk_nothing # define xchk_rtsummary xchk_nothing # define xchk_rgsuperblock xchk_nothing # define xchk_rgbitmap xchk_nothing # define xchk_rtrmapbt xchk_nothing +# define xchk_rtrefcountbt xchk_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_quota(struct xfs_scrub *sc); diff --git 
a/fs/xfs/scrub/stats.c b/fs/xfs/scrub/stats.c index 0da7ecabfe9d9..0e0be23adfcb4 100644 --- a/fs/xfs/scrub/stats.c +++ b/fs/xfs/scrub/stats.c @@ -84,6 +84,7 @@ static const char *name_map[XFS_SCRUB_TYPE_NR] = { [XFS_SCRUB_TYPE_RGSUPER] = "rgsuper", [XFS_SCRUB_TYPE_RGBITMAP] = "rgbitmap", [XFS_SCRUB_TYPE_RTRMAPBT] = "rtrmapbt", + [XFS_SCRUB_TYPE_RTREFCBT] = "rtrefcountbt", }; /* Format the scrub stats into a text buffer, similar to pcp style. */ diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 72b5277f4ba6d..2d373c9a53ad6 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -88,6 +88,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_METAPATH); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGSUPER); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGBITMAP); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RTRMAPBT); +TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RTREFCBT); #define XFS_SCRUB_TYPE_STRINGS \ { XFS_SCRUB_TYPE_PROBE, "probe" }, \ @@ -123,7 +124,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RTRMAPBT); { XFS_SCRUB_TYPE_METAPATH, "metapath" }, \ { XFS_SCRUB_TYPE_RGSUPER, "rgsuper" }, \ { XFS_SCRUB_TYPE_RGBITMAP, "rgbitmap" }, \ - { XFS_SCRUB_TYPE_RTRMAPBT, "rtrmapbt" } + { XFS_SCRUB_TYPE_RTRMAPBT, "rtrmapbt" }, \ + { XFS_SCRUB_TYPE_RTREFCBT, "rtrefcountbt" } #define XFS_SCRUB_FLAG_STRINGS \ { XFS_SCRUB_IFLAG_REPAIR, "repair" }, \ From patchwork Sun Dec 31 21:52:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507743 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A4DCC8CA for ; Sun, 31 Dec 2023 21:52:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="mJpHaldM" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DCBEBC433C7; Sun, 31 Dec 2023 21:52:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059543; bh=YjE6odRmo5fDdxD3+B213t3nX3TE0CXtZgR458zQWNc=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=mJpHaldMeZ/4aJG2uEUw864vGuHWpLoxbut2cGFvYZPzkfOmDAdno0x1PbFmwK+PJ miofy8xtV1r88ql7sWsT8B28kSgw/kMQAg0ZaitZjvvEvWIaWLfFS/oAG82gogNhPB 1Wl7LZGwVRnRbKOxcgULPJ69xmAZDhqshplRe6eOoUprC5H5x5L8+nMESEegc78okD pUb7oa8iBbuCCMFVnk5ZtPsVpmQkPA+bulVoa6XuloRnPCKmUy3mPcdbWA3+eIKN52 xhS8eXhhbwFD2YfHFJtrkhfy87eo1YnNYqhGikOWd/auGn+Wp8gKYdo/u9CrOstg7c L9UaHdoyhxFfg== Date: Sun, 31 Dec 2023 13:52:23 -0800 Subject: [PATCH 30/44] xfs: cross-reference checks with the rt refcount btree From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852065.1766284.2087369433078609917.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Use the realtime refcount btree to implement cross-reference checks in other data structures. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/bmap.c | 30 +++++++++++++--- fs/xfs/scrub/rtbitmap.c | 2 + fs/xfs/scrub/rtrefcount.c | 86 +++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/rtrmap.c | 37 +++++++++++++++++++ fs/xfs/scrub/scrub.h | 9 +++++ 5 files changed, 158 insertions(+), 6 deletions(-) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 5531ba2295b12..41b83515b4491 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -348,12 +348,30 @@ xchk_bmap_rt_iextent_xref( xchk_xref_is_used_rt_space(info->sc, irec->br_startblock, irec->br_blockcount); - xchk_bmap_xref_rmap(info, irec, rgbno); - - xfs_rmap_ino_owner(&oinfo, info->sc->ip->i_ino, info->whichfork, - irec->br_startoff); - xchk_xref_is_only_rt_owned_by(info->sc, rgbno, irec->br_blockcount, - &oinfo); + switch (info->whichfork) { + case XFS_DATA_FORK: + xchk_bmap_xref_rmap(info, irec, rgbno); + if (!xfs_is_reflink_inode(info->sc->ip)) { + xfs_rmap_ino_owner(&oinfo, info->sc->ip->i_ino, + info->whichfork, irec->br_startoff); + xchk_xref_is_only_rt_owned_by(info->sc, rgbno, + irec->br_blockcount, &oinfo); + xchk_xref_is_not_rt_shared(info->sc, rgbno, + irec->br_blockcount); + } + xchk_xref_is_not_rt_cow_staging(info->sc, rgbno, + irec->br_blockcount); + break; + case XFS_COW_FORK: + xchk_bmap_xref_rmap_cow(info, irec, rgbno); + xchk_xref_is_only_rt_owned_by(info->sc, rgbno, + irec->br_blockcount, &XFS_RMAP_OINFO_COW); + xchk_xref_is_rt_cow_staging(info->sc, rgbno, + irec->br_blockcount); + xchk_xref_is_not_rt_shared(info->sc, rgbno, + irec->br_blockcount); + break; + } out_free: xchk_rtgroup_btcur_free(&info->sc->sr); diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 47463ef336eed..234c9a1f224d5 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -158,6 +158,8 @@ xchk_rgbitmap_xref( rgbno = xfs_rtb_to_rgbno(sc->mp, startblock, &rgno); xchk_xref_has_no_rt_owner(sc, rgbno, blockcount); + xchk_xref_is_not_rt_shared(sc, rgbno, blockcount); + 
xchk_xref_is_not_rt_cow_staging(sc, rgbno, blockcount); if (rgb->next_free_rtblock < startblock) { xfs_rgblock_t next_rgbno; diff --git a/fs/xfs/scrub/rtrefcount.c b/fs/xfs/scrub/rtrefcount.c index cc44438eece76..360b9c6a4c04d 100644 --- a/fs/xfs/scrub/rtrefcount.c +++ b/fs/xfs/scrub/rtrefcount.c @@ -495,3 +495,89 @@ xchk_rtrefcountbt( return 0; } + +/* xref check that a cow staging extent is marked in the rtrefcountbt. */ +void +xchk_xref_is_rt_cow_staging( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + struct xfs_refcount_irec rc; + int has_refcount; + int error; + + if (!sc->sr.refc_cur || xchk_skip_xref(sc->sm)) + return; + + /* Find the CoW staging extent. */ + error = xfs_refcount_lookup_le(sc->sr.refc_cur, XFS_REFC_DOMAIN_COW, + bno, &has_refcount); + if (!xchk_should_check_xref(sc, &error, &sc->sr.refc_cur)) + return; + if (!has_refcount) { + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); + return; + } + + error = xfs_refcount_get_rec(sc->sr.refc_cur, &rc, &has_refcount); + if (!xchk_should_check_xref(sc, &error, &sc->sr.refc_cur)) + return; + if (!has_refcount) { + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); + return; + } + + /* CoW lookup returned a shared extent record? */ + if (rc.rc_domain != XFS_REFC_DOMAIN_COW) + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); + + /* Must be at least as long as what was passed in */ + if (rc.rc_blockcount < len) + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); +} + +/* + * xref check that the extent is not shared. Only file data blocks + * can have multiple owners. 
+ */ +void +xchk_xref_is_not_rt_shared( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + enum xbtree_recpacking outcome; + int error; + + if (!sc->sr.refc_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_refcount_has_records(sc->sr.refc_cur, + XFS_REFC_DOMAIN_SHARED, bno, len, &outcome); + if (!xchk_should_check_xref(sc, &error, &sc->sr.refc_cur)) + return; + if (outcome != XBTREE_RECPACKING_EMPTY) + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); +} + +/* xref check that the extent is not being used for CoW staging. */ +void +xchk_xref_is_not_rt_cow_staging( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + enum xbtree_recpacking outcome; + int error; + + if (!sc->sr.refc_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_refcount_has_records(sc->sr.refc_cur, XFS_REFC_DOMAIN_COW, + bno, len, &outcome); + if (!xchk_should_check_xref(sc, &error, &sc->sr.refc_cur)) + return; + if (outcome != XBTREE_RECPACKING_EMPTY) + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); +} diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index ce21fa95a5da7..60751c21fe52e 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -22,6 +22,7 @@ #include "xfs_rtalloc.h" #include "xfs_rtgroup.h" #include "xfs_imeta.h" +#include "xfs_refcount.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -158,6 +159,37 @@ xchk_rtrmapbt_check_mergeable( memcpy(&cr->prev_rec, irec, sizeof(struct xfs_rmap_irec)); } +/* Cross-reference a rmap against the refcount btree. 
*/ +STATIC void +xchk_rtrmapbt_xref_rtrefc( + struct xfs_scrub *sc, + struct xfs_rmap_irec *irec) +{ + xfs_rgblock_t fbno; + xfs_extlen_t flen; + bool is_inode; + bool is_bmbt; + bool is_attr; + bool is_unwritten; + int error; + + if (!sc->sr.refc_cur || xchk_skip_xref(sc->sm)) + return; + + is_inode = !XFS_RMAP_NON_INODE_OWNER(irec->rm_owner); + is_bmbt = irec->rm_flags & XFS_RMAP_BMBT_BLOCK; + is_attr = irec->rm_flags & XFS_RMAP_ATTR_FORK; + is_unwritten = irec->rm_flags & XFS_RMAP_UNWRITTEN; + + /* If this is shared, must be a data fork extent. */ + error = xfs_refcount_find_shared(sc->sr.refc_cur, irec->rm_startblock, + irec->rm_blockcount, &fbno, &flen, false); + if (!xchk_should_check_xref(sc, &error, &sc->sr.refc_cur)) + return; + if (flen != 0 && (!is_inode || is_attr || is_bmbt || is_unwritten)) + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); +} + /* Cross-reference with other metadata. */ STATIC void xchk_rtrmapbt_xref( @@ -173,6 +205,11 @@ xchk_rtrmapbt_xref( irec->rm_startblock); xchk_xref_is_used_rt_space(sc, rtbno, irec->rm_blockcount); + if (irec->rm_owner == XFS_RMAP_OWN_COW) + xchk_xref_is_rt_cow_staging(sc, irec->rm_startblock, + irec->rm_blockcount); + else + xchk_rtrmapbt_xref_rtrefc(sc, irec); } /* Scrub a realtime rmapbt record. 
*/ diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 7f94f31a5236e..93a4aba096d57 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -330,11 +330,20 @@ void xchk_xref_has_rt_owner(struct xfs_scrub *sc, xfs_rgblock_t rgbno, xfs_extlen_t len); void xchk_xref_is_only_rt_owned_by(struct xfs_scrub *sc, xfs_rgblock_t rgbno, xfs_extlen_t len, const struct xfs_owner_info *oinfo); +void xchk_xref_is_rt_cow_staging(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len); +void xchk_xref_is_not_rt_shared(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len); +void xchk_xref_is_not_rt_cow_staging(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len); #else # define xchk_xref_is_used_rt_space(sc, rtbno, len) do { } while (0) # define xchk_xref_has_no_rt_owner(sc, rtbno, len) do { } while (0) # define xchk_xref_has_rt_owner(sc, rtbno, len) do { } while (0) # define xchk_xref_is_only_rt_owned_by(sc, bno, len, oinfo) do { } while (0) +# define xchk_xref_is_rt_cow_staging(sc, bno, len) do { } while (0) +# define xchk_xref_is_not_rt_shared(sc, bno, len) do { } while (0) +# define xchk_xref_is_not_rt_cow_staging(sc, bno, len) do { } while (0) #endif #endif /* __XFS_SCRUB_SCRUB_H__ */ From patchwork Sun Dec 31 21:52:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507744 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BE8E4C8CB for ; Sun, 31 Dec 2023 21:52:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="g91tbVwq" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8FBBBC433C8; Sun, 31 Dec 2023 21:52:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059559; bh=MnzpG1AuU4zOHOFyjFX+FAxXdwC2ovwMXJj+3j65ezA=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=g91tbVwqoIxICQL8lfcin4UW6h0C6WmRIYuxKG2vgF+9I03Qt/WsYGKWEZXZc00q8 RT8wtwZ1fPa7aGIyljImtT58tOUpM0VtO2qs+Tjhu6EPFNtoiTsBlMJslzO/Zup8bH Wuk3boLQJqrpkcQ71Z5QAfIThFiEvj2XzvIWVLPe/1rVxdC0y08HNc9lx4EHPmA50e E2Dv9io4VeJcGGp8M4OONwQ7KDHmlIMlGkBLI1fL8LjMgC86O4EEIdTeLIjxzArE7f a1Njv7H/rOhF+JKM1WUPcKWmqoKwtnJAyUe2S0ZyXMlqLZ3dYxqyncFezjaHF9TR0R N3rM+mR7iMemQ== Date: Sun, 31 Dec 2023 13:52:39 -0800 Subject: [PATCH 31/44] xfs: allow overlapping rtrmapbt records for shared data extents From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852081.1766284.6803325257041389359.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Allow overlapping realtime reverse mapping records if they both describe shared data extents and the fs supports reflink on the realtime volume. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/rtrmap.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index 60751c21fe52e..773be13c3be74 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -87,6 +87,18 @@ struct xchk_rtrmap { struct xfs_rmap_irec prev_rec; }; +static inline bool +xchk_rtrmapbt_is_shareable( + struct xfs_scrub *sc, + const struct xfs_rmap_irec *irec) +{ + if (!xfs_has_rtreflink(sc->mp)) + return false; + if (irec->rm_flags & XFS_RMAP_UNWRITTEN) + return false; + return true; +} + /* Flag failures for records that overlap but cannot. */ STATIC void xchk_rtrmapbt_check_overlapping( @@ -108,7 +120,10 @@ xchk_rtrmapbt_check_overlapping( if (pnext <= irec->rm_startblock) goto set_prev; - xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + /* Overlap is only allowed if both records are data fork mappings. */ + if (!xchk_rtrmapbt_is_shareable(bs->sc, &cr->overlap_rec) || + !xchk_rtrmapbt_is_shareable(bs->sc, irec)) + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); /* Save whichever rmap record extends furthest. */ inext = irec->rm_startblock + irec->rm_blockcount; From patchwork Sun Dec 31 21:52:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507745 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AD6B4C8CA for ; Sun, 31 Dec 2023 21:52:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Hrl1iOOr" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3F1A7C433C8; Sun, 31 Dec 2023 21:52:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059575; bh=aJ0KbX9v7rHAnbQFOJ0Yya2pr1X1r0rduz5OP/3lepc=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Hrl1iOOrwZ4rEwYZ9zoawQXpzLrE+ZwyDsCCx5FSMhauHb+b2CFJ2dFpJY9KqRogm ol10NekKlkzLvmcfvK2YzOP8lEq7oqohoxNpOA3XbD3B+UDPIYGkgfeORTGoJ9ZTak KBCBsN2XQk2poF/RmRcld0RwA/e0Za/FncemL8JYvI2/2JUCZhSfzbdVZc7oKKrAmF 51hSa1R3rBQMXc/rgxxKtLU2dpz2mnnC0+zTXZ6xeRFUEx682jy8qB8KpBHHP/cy2H cuv+qK9PHAVGJe+83wcvgEd7h0AGe6FZqLnhb4lctK84lC24/b748yFneCJwNafLgG qNjAv/MiAlAFw== Date: Sun, 31 Dec 2023 13:52:54 -0800 Subject: [PATCH 32/44] xfs: check reference counts of gaps between rt refcount records From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852096.1766284.10510831147720898764.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong If there's a gap between records in the rt refcount btree, we ought to cross-reference the gap with the rtrmap records to make sure that there aren't any overlapping records for a region that doesn't have any shared ownership. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/rtrefcount.c | 81 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 80 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/rtrefcount.c b/fs/xfs/scrub/rtrefcount.c index 360b9c6a4c04d..058a0ea18e09b 100644 --- a/fs/xfs/scrub/rtrefcount.c +++ b/fs/xfs/scrub/rtrefcount.c @@ -17,6 +17,7 @@ #include "xfs_rtgroup.h" #include "xfs_imeta.h" #include "xfs_rtrefcount_btree.h" +#include "xfs_rtalloc.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/btree.h" @@ -359,8 +360,14 @@ struct xchk_rtrefcbt_records { /* Previous refcount record. */ struct xfs_refcount_irec prev_rec; + /* The next rtgroup block where we aren't expecting shared extents. */ + xfs_rgblock_t next_unshared_rgbno; + /* Number of CoW blocks we expect. */ xfs_extlen_t cow_blocks; + + /* Was the last record a shared or CoW staging extent? */ + enum xfs_refc_domain prev_domain; }; static inline bool @@ -401,6 +408,53 @@ xchk_rtrefcountbt_check_mergeable( memcpy(&rrc->prev_rec, irec, sizeof(struct xfs_refcount_irec)); } +STATIC int +xchk_rtrefcountbt_rmap_check_gap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + xfs_rgblock_t *next_bno = priv; + + if (*next_bno != NULLRGBLOCK && rec->rm_startblock < *next_bno) + return -ECANCELED; + + *next_bno = rec->rm_startblock + rec->rm_blockcount; + return 0; +} + +/* + * Make sure that a gap in the reference count records does not correspond to + * overlapping records (i.e. shared extents) in the reverse mappings. 
+ */ +static inline void +xchk_rtrefcountbt_xref_gaps( + struct xfs_scrub *sc, + struct xchk_rtrefcbt_records *rrc, + xfs_rtblock_t bno) +{ + struct xfs_rmap_irec low; + struct xfs_rmap_irec high; + xfs_rgblock_t next_bno = NULLRGBLOCK; + int error; + + if (bno <= rrc->next_unshared_rgbno || !sc->sr.rmap_cur || + xchk_skip_xref(sc->sm)) + return; + + memset(&low, 0, sizeof(low)); + low.rm_startblock = rrc->next_unshared_rgbno; + memset(&high, 0xFF, sizeof(high)); + high.rm_startblock = bno - 1; + + error = xfs_rmap_query_range(sc->sr.rmap_cur, &low, &high, + xchk_rtrefcountbt_rmap_check_gap, &next_bno); + if (error == -ECANCELED) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); + else + xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur); +} + /* Scrub a rtrefcountbt record. */ STATIC int xchk_rtrefcountbt_rec( @@ -429,9 +483,26 @@ xchk_rtrefcountbt_rec( if (irec.rc_domain == XFS_REFC_DOMAIN_COW) rrc->cow_blocks += irec.rc_blockcount; + /* Shared records always come before CoW records. */ + if (irec.rc_domain == XFS_REFC_DOMAIN_SHARED && + rrc->prev_domain == XFS_REFC_DOMAIN_COW) + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + rrc->prev_domain = irec.rc_domain; + xchk_rtrefcountbt_check_mergeable(bs, rrc, &irec); xchk_rtrefcountbt_xref(bs->sc, &irec); + /* + * If this is a record for a shared extent, check that all blocks + * between the previous record and this one have at most one reverse + * mapping. 
+ */ + if (irec.rc_domain == XFS_REFC_DOMAIN_SHARED) { + xchk_rtrefcountbt_xref_gaps(bs->sc, rrc, irec.rc_startblock); + rrc->next_unshared_rgbno = irec.rc_startblock + + irec.rc_blockcount; + } + return 0; } @@ -476,7 +547,9 @@ xchk_rtrefcountbt( { struct xfs_owner_info btree_oinfo; struct xchk_rtrefcbt_records rrc = { - .cow_blocks = 0, + .cow_blocks = 0, + .next_unshared_rgbno = 0, + .prev_domain = XFS_REFC_DOMAIN_SHARED, }; int error; @@ -491,6 +564,12 @@ xchk_rtrefcountbt( if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) return error; + /* + * Check that all blocks between the last refcount > 1 record and the + * end of the rt volume have at most one reverse mapping. + */ + xchk_rtrefcountbt_xref_gaps(sc, &rrc, sc->mp->m_sb.sb_rblocks); + xchk_refcount_xref_rmap(sc, &btree_oinfo, rrc.cow_blocks); return 0; From patchwork Sun Dec 31 21:53:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507746 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 54C93C8CA for ; Sun, 31 Dec 2023 21:53:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="EgHKWe0K" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CA347C433C7; Sun, 31 Dec 2023 21:53:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059590; bh=ZzA4MZcasB7a5n5aR64NjjVnANol6BBBxaoohBNZjSA=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=EgHKWe0KGxH3T/sDfO+/41dM2DFjAr4qBT0LB5hqLTL262oOU252D3wwv9DpYApy/ J/LEtEqZiS/rlIEjSI6ctAzn8MT0QHMeLICqIzkW1env1U493f3VGRzUdMriYkIeHE F6K/B6dtb8aqHp9ciMz10ZRP6cG3txkOck79YMLtvHLNBAQUqjivk8aZp/vsXWQx9v 
FGR7ZmUmhNkfWnRpOA5CkiQ1thlJkZX5w7eDvhx+MrqT0UzFTJVAvCd7eHCpaw3aF9 PwztLc98D4nMzH9nAZuZ3vhFrx6e+B4u4xQHFg+3u+MSDG2ru3VyPW3h4Aqzs5Rljc n3Hbpl1tT1Krg== Date: Sun, 31 Dec 2023 13:53:10 -0800 Subject: [PATCH 33/44] xfs: allow dquot rt block count to exceed rt blocks on reflink fs From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852112.1766284.5409003953055384924.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Update the quota scrubber to allow dquots where the realtime block count exceeds the block count of the rt volume if reflink is enabled. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/quota.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c index 183d531875eae..58d6d4ed2853b 100644 --- a/fs/xfs/scrub/quota.c +++ b/fs/xfs/scrub/quota.c @@ -212,12 +212,18 @@ xchk_quota_item( if (mp->m_sb.sb_dblocks < dq->q_blk.count) xchk_fblock_set_warning(sc, XFS_DATA_FORK, offset); + if (mp->m_sb.sb_rblocks < dq->q_rtb.count) + xchk_fblock_set_warning(sc, XFS_DATA_FORK, + offset); } else { if (mp->m_sb.sb_dblocks < dq->q_blk.count) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset); + if (mp->m_sb.sb_rblocks < dq->q_rtb.count) + xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, + offset); } - if (dq->q_ino.count > fs_icount || dq->q_rtb.count > mp->m_sb.sb_rblocks) + if (dq->q_ino.count > fs_icount) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset); /* From patchwork Sun Dec 31 21:53:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507747 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B32CDD50B for ; Sun, 31 Dec 2023 21:53:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Tgu0rBBn" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7CF85C433C8; Sun, 31 Dec 2023 21:53:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059606; bh=RV5c9mQ5dHXqZREnqNyLxaVqTzzPPNx433+lW8OGg7U=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Tgu0rBBng1eyH+2d+SxDSteKhSpGznnJ1OzCbkLgx7xKGQI3L2be+JLUsmeT+NiH5 61I2hEzmhJwOyHoL9h4m0pEEbytRdL91+U0/AGcqyEtX1sM//X+W4PSnAaA4N8N5PB uAHDFGgWPIaKbVGJaHLp0MUtzD70aspdk+FJ0lHokMtxBkgVSSNDvbP3w1o1WOPrqX 6ecj1EwLcRX1PYimSJ6619cf2QRqmK5MvYNOCkEoSo7vwTwZPTZSKvW8NOFk/nq78k wR8dVOl6rcHCnbNmVDQsu2dmsuRzP008bcFmEXW5bB5j7Rar6CEyur7kp/21VJRVg/ CNc/wi03w+c1g== Date: Sun, 31 Dec 2023 13:53:26 -0800 Subject: [PATCH 34/44] xfs: detect and repair misaligned rtinherit directory cowextsize hints From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852128.1766284.6785654642024257461.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. 
Wong If we encounter a directory that has been configured to pass on a CoW extent size hint to a new realtime file and the hint isn't an integer multiple of the rt extent size, we should flag the hint for administrative review and/or turn it off because that is a misconfiguration. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/inode.c | 26 +++++++++++++++++--------- fs/xfs/scrub/inode_repair.c | 15 +++++++++++++++ 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index f746032a9db88..e52e12e9a1b4b 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -259,12 +259,7 @@ xchk_inode_extsize( xchk_ino_set_warning(sc, ino); } -/* - * Validate di_cowextsize hint. - * - * The rules are documented at xfs_ioctl_setattr_check_cowextsize(). - * These functions must be kept in sync with each other. - */ +/* Validate di_cowextsize hint. */ STATIC void xchk_inode_cowextsize( struct xfs_scrub *sc, @@ -275,12 +270,25 @@ xchk_inode_cowextsize( uint64_t flags2) { xfs_failaddr_t fa; + uint32_t value = be32_to_cpu(dip->di_cowextsize); - fa = xfs_inode_validate_cowextsize(sc->mp, - be32_to_cpu(dip->di_cowextsize), mode, flags, - flags2); + fa = xfs_inode_validate_cowextsize(sc->mp, value, mode, flags, flags2); if (fa) xchk_ino_set_corrupt(sc, ino); + + /* + * XFS allows a sysadmin to change the rt extent size when adding a rt + * section to a filesystem after formatting. If there are any + * directories with cowextsize and rtinherit set, the hint could become + * misaligned with the new rextsize. The verifier doesn't check this, + * because we allow rtinherit directories even without an rt device. + * Flag this as an administrative warning since we will clean this up + * eventually. + */ + if ((flags & XFS_DIFLAG_RTINHERIT) && + (flags2 & XFS_DIFLAG2_COWEXTSIZE) && + value % sc->mp->m_sb.sb_rextsize > 0) + xchk_ino_set_warning(sc, ino); } /* Make sure the di_flags make sense for the inode. 
*/ diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 8d67ae257e597..b6b37652c4a35 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -1836,6 +1836,20 @@ xrep_inode_pptr( sizeof(struct xfs_attr_sf_hdr), true); } +/* Fix COW extent size hint problems. */ +STATIC void +xrep_inode_cowextsize( + struct xfs_scrub *sc) +{ + /* Fix misaligned CoW extent size hints on a directory. */ + if ((sc->ip->i_diflags & XFS_DIFLAG_RTINHERIT) && + (sc->ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) && + sc->ip->i_cowextsize % sc->mp->m_sb.sb_rextsize > 0) { + sc->ip->i_cowextsize = 0; + sc->ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE; + } +} + /* Fix any irregularities in an inode that the verifiers don't catch. */ STATIC int xrep_inode_problems( @@ -1859,6 +1873,7 @@ xrep_inode_problems( if (S_ISDIR(VFS_I(sc->ip)->i_mode)) xrep_inode_dir_size(sc); xrep_inode_extsize(sc); + xrep_inode_cowextsize(sc); trace_xrep_inode_fixed(sc); xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE); From patchwork Sun Dec 31 21:53:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507748 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B4D7CDDC9 for ; Sun, 31 Dec 2023 21:53:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="nJfQpP8k" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 26B33C433C7; Sun, 31 Dec 2023 21:53:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059622; bh=yjMc713lFK8as+VvzfCOMa2Cg1yCuw3PTjb4D8+DjTM=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=nJfQpP8k1mViKrGPNll11lGHE9snEzPR+etOrzkbZO2PABxghjUJ2/QxjKPsZLLQv V25+FMBFROg+FUxcYgGs/yqiiO3t+vGsZkZfIw1oCQLUnJYozKJkmeMdHo+HCZQDln hS8SMAn/e2zDtK1Bs8DPHxdCkJKPuE6i/Q3n8EWLz34/HpRdCbi8rZ1iPq9DATF0Lf sWQVa+vE2hJMMxyfGsYYJWyzexmT8/o8DIMLkqnPYrWgs+JCvrgp7gvAAjeUf/V4Ba ieZ/7iTsm51w3eqU8h5qYZ6ymrhS6CCjiOsX2+0utFjoY77J5ll3ABf/b398sPWt0w gN9b33TXdJ1bQ== Date: Sun, 31 Dec 2023 13:53:41 -0800 Subject: [PATCH 35/44] xfs: scrub the metadir path of rt refcount btree files From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852144.1766284.6583152944581273036.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata directory tree path to the refcount btree file for each rt group. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_fs.h | 3 ++- fs/xfs/scrub/metapath.c | 14 ++++++++++++++ 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 7847da61db232..4159e96d01ae6 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -806,9 +806,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_METAPATH_GRPQUOTA 3 #define XFS_SCRUB_METAPATH_PRJQUOTA 4 #define XFS_SCRUB_METAPATH_RTRMAPBT 5 +#define XFS_SCRUB_METAPATH_RTREFCBT 6 /* Number of metapath sm_ino values */ -#define XFS_SCRUB_METAPATH_NR 6 +#define XFS_SCRUB_METAPATH_NR 7 /* * ioctl limits diff --git a/fs/xfs/scrub/metapath.c b/fs/xfs/scrub/metapath.c index 6afd117c890e9..447bdbea210d2 100644 --- a/fs/xfs/scrub/metapath.c +++ b/fs/xfs/scrub/metapath.c @@ -23,6 +23,7 @@ #include "xfs_attr.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -106,6 +107,7 @@ xchk_setup_metapath( switch (sc->sm->sm_ino) { case XFS_SCRUB_METAPATH_RTRMAPBT: + case XFS_SCRUB_METAPATH_RTREFCBT: /* empty */ break; default: @@ -157,6 +159,18 @@ xchk_setup_metapath( xfs_rtgroup_put(rtg); } break; + case XFS_SCRUB_METAPATH_RTREFCBT: + error = xfs_rtrefcountbt_create_path(mp, sc->sm->sm_agno, + &path); + if (error) + return error; + mpath->path = path; + rtg = xfs_rtgroup_get(mp, sc->sm->sm_agno); + if (rtg) { + ip = rtg->rtg_refcountip; + xfs_rtgroup_put(rtg); + } + break; default: return -EINVAL; } From patchwork Sun Dec 31 21:53:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507749 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F046ADF42 for ; Sun, 31 Dec 2023 21:53:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Br0ulkxr" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BFE6AC433C7; Sun, 31 Dec 2023 21:53:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059637; bh=DDNO4FndOxDLHSPKZP/cOZqLcB8hXXNNY+W37XP+w9c=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Br0ulkxrnChKKnaBcl8di8ALeWeenG0eRgYX2owaLM3AVTteIcRkAEaURFw2d9kKB +y38+ufvpJiuBBcEO/33Ei4TDI9fCyKd8vc6K2hvTOaAGFrZETerHIM256aBnXX9Il pNu5CtgfDdC57FHCeiKjq1jAagdFZGuZogcf4Z3e+aNGx7oaf03yYwCi2k9PHH2T4+ uHApcX+xyKuiIiKN8nAPssveiOJF+YIglK15KnEa+lcpZyhvSrittUS5oHe4+mv/wo fhbE4C8oPrszPebZsvzNuKtdD2M188KrLRUfP5E8jz8CLnGJal8w0ZdhdnXY84V/Wj v0b9JDKTT34DA== Date: Sun, 31 Dec 2023 13:53:57 -0800 Subject: [PATCH 36/44] xfs: don't flag quota rt block usage on rtreflink filesystems From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852160.1766284.10569753561139191538.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Quota space usage is allowed to exceed the size of the physical storage when reflink is enabled. Now that we have reflink for the realtime volume, apply this same logic to the rtb repair logic. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/quota_repair.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/scrub/quota_repair.c b/fs/xfs/scrub/quota_repair.c index 0bab4c30cb85a..4e7198073a64a 100644 --- a/fs/xfs/scrub/quota_repair.c +++ b/fs/xfs/scrub/quota_repair.c @@ -236,7 +236,7 @@ xrep_quota_item( rqi->need_quotacheck = true; dirty = true; } - if (dq->q_rtb.count > mp->m_sb.sb_rblocks) { + if (!xfs_has_reflink(mp) && dq->q_rtb.count > mp->m_sb.sb_rblocks) { dq->q_rtb.reserved -= dq->q_rtb.count; dq->q_rtb.reserved += mp->m_sb.sb_rblocks; dq->q_rtb.count = mp->m_sb.sb_rblocks; From patchwork Sun Dec 31 21:54:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507750 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9F9BFDF51 for ; Sun, 31 Dec 2023 21:54:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="mwR6GhDm" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6C7DAC433C7; Sun, 31 Dec 2023 21:54:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059653; bh=WmWM0LxfWAFIYIsKnjfwElglz+CZ/h9IU8iQDb48Dfc=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=mwR6GhDm4QumFYeOjYKrvLY+5vkKo1ghqBNCRt4Kr8+gxHXD6f7jEYBnb2v78ebvC YGvktT2UHWprRewvRKrUIRL5txmrtjSVh7wBkCtJay2lGjUlfC4wZy1/V4eBLC/jHO X6BkfoZ8FL1fJRvJBjeMSgl3MRNwdiGIuo1TvIfVeIDTT3QE9PA5EREBo+oiMEESxa hDXaWVI0iGONbB/Ug7sSG8RFyynBzt6BJxGvhrZABKfzNVQPUTOCrNo3sJuZ6cgdLI 05nprVXUmLvR6H25n0cCDU91E0dnXx4wmhE4WyQnED7lcmbukUXOKQ55Znl5D4iZDs hK7c5qOkF0x8w== Date: Sun, 31 Dec 2023 13:54:12 -0800 Subject: [PATCH 37/44] xfs: check new rtbitmap records against rt 
refcount btree From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852175.1766284.18016890871197639875.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong When we're rebuilding the realtime bitmap, check the proposed free extents against the rt refcount btree to make sure we don't commit any grievous errors. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/repair.c | 7 +++++++ fs/xfs/scrub/rtbitmap_repair.c | 24 +++++++++++++++++++++++- 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index c9c538083c722..fdd12933fd47f 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -41,6 +41,7 @@ #include "xfs_rtgroup.h" #include "xfs_rtalloc.h" #include "xfs_imeta.h" +#include "xfs_rtrefcount_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -1008,6 +1009,12 @@ xrep_rtgroup_btcur_init( xfs_has_rtrmapbt(mp)) sr->rmap_cur = xfs_rtrmapbt_init_cursor(mp, sc->tp, sr->rtg, sr->rtg->rtg_rmapip); + + if (sc->sm->sm_type != XFS_SCRUB_TYPE_RTREFCBT && + (sr->rtlock_flags & XFS_RTGLOCK_REFCOUNT) && + xfs_has_rtreflink(mp)) + sr->refc_cur = xfs_rtrefcountbt_init_cursor(mp, sc->tp, + sr->rtg, sr->rtg->rtg_refcountip); } /* diff --git a/fs/xfs/scrub/rtbitmap_repair.c b/fs/xfs/scrub/rtbitmap_repair.c index db87ce51c35fc..24d7d0ff49ea3 100644 --- a/fs/xfs/scrub/rtbitmap_repair.c +++ b/fs/xfs/scrub/rtbitmap_repair.c @@ -22,6 +22,7 @@ #include "xfs_swapext.h" #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" +#include "xfs_refcount.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -418,7 +419,8 @@ xrep_rgbitmap_mark_free( 
xfs_rgblock_t rgbno) { struct xfs_mount *mp = rgb->sc->mp; - struct xfs_rtgroup *rtg = rgb->sc->sr.rtg; + struct xchk_rt *sr = &rgb->sc->sr; + struct xfs_rtgroup *rtg = sr->rtg; xfs_rtblock_t rtbno; xfs_rtxnum_t startrtx; xfs_rtxnum_t nextrtx; @@ -427,6 +429,7 @@ xrep_rgbitmap_mark_free( unsigned int bufwsize; xfs_extlen_t mod; xfs_rtword_t mask; + enum xbtree_recpacking outcome; int error; if (!xfs_verify_rgbext(rtg, rgb->next_rgbno, rgbno - rgb->next_rgbno)) @@ -446,6 +449,25 @@ xrep_rgbitmap_mark_free( if (mod != mp->m_sb.sb_rextsize - 1) return -EFSCORRUPTED; + /* Must not be shared or CoW staging. */ + if (sr->refc_cur) { + error = xfs_refcount_has_records(sr->refc_cur, + XFS_REFC_DOMAIN_SHARED, rgb->next_rgbno, + rgbno - rgb->next_rgbno, &outcome); + if (error) + return error; + if (outcome != XBTREE_RECPACKING_EMPTY) + return -EFSCORRUPTED; + + error = xfs_refcount_has_records(sr->refc_cur, + XFS_REFC_DOMAIN_COW, rgb->next_rgbno, + rgbno - rgb->next_rgbno, &outcome); + if (error) + return error; + if (outcome != XBTREE_RECPACKING_EMPTY) + return -EFSCORRUPTED; + } + trace_xrep_rgbitmap_record_free(mp, startrtx, nextrtx - 1); /* Set bits as needed to round startrtx up to the nearest word. */ From patchwork Sun Dec 31 21:54:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507751 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3BE5CDF55 for ; Sun, 31 Dec 2023 21:54:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Q7Iij1HQ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0808BC433C9; Sun, 31 Dec 2023 21:54:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059669; bh=gAL6YZhhGcnrZFs+LduFyWW5xTPuE+KaZqVjk9XpD7o=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Q7Iij1HQcIqj1MTU1F2rBQS3wVV+bkL1gh2Vw0uFqhP56EjUq5UI8qpRhBrtNrN83 BP8AlgARvZXvJtqIep3KjcxfhbOqCiso6M2MFOVkA1J2Y6pDzIrEFwj/XPLqvNunxi SorNfPcxn8KYYc+tfwfUdAXcOe5R9/nTxFY4io8jzLOM4rmu4DhTpIW+u1mS5gsH6c MZTCDAMsgARDjDDC6yw/NuIsvR/NCNP9e10UB+wE4xWx84BVxdMas22cn1UgJAzSDr xIwkH+I115acomdw3yWd7LNFN/WImwvP3nt0Gsk5QaCbP3tYFHJW9enOY3OoyB3rcw iP/Q5mC73pJlQ== Date: Sun, 31 Dec 2023 13:54:28 -0800 Subject: [PATCH 38/44] xfs: walk the rt reference count tree when rebuilding rmap From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852191.1766284.17012719067221063791.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong When we're rebuilding the data device rmap, if we encounter a "refcount" format fork, we have to walk the (realtime) refcount btree inode to build the appropriate mappings. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/rmap_repair.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/fs/xfs/scrub/rmap_repair.c b/fs/xfs/scrub/rmap_repair.c index 7733334a1faa9..0f3783aaaa997 100644 --- a/fs/xfs/scrub/rmap_repair.c +++ b/fs/xfs/scrub/rmap_repair.c @@ -32,6 +32,7 @@ #include "xfs_ag.h" #include "xfs_rtrmap_btree.h" #include "xfs_rtgroup.h" +#include "xfs_rtrefcount_btree.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -547,6 +548,39 @@ xrep_rmap_scan_rtrmapbt( return -EFSCORRUPTED; } +static int +xrep_rmap_scan_rtrefcountbt( + struct xrep_rmap_ifork *rf, + struct xfs_inode *ip) +{ + struct xfs_scrub *sc = rf->rr->sc; + struct xfs_btree_cur *cur; + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + int error; + + if (rf->whichfork != XFS_DATA_FORK) + return -EFSCORRUPTED; + + for_each_rtgroup(sc->mp, rgno, rtg) { + if (ip == rtg->rtg_refcountip) { + cur = xfs_rtrefcountbt_init_cursor(sc->mp, sc->tp, rtg, + ip); + error = xrep_rmap_scan_iroot_btree(rf, cur); + xfs_btree_del_cursor(cur, error); + xfs_rtgroup_rele(rtg); + return error; + } + } + + /* + * We shouldn't find a refcount format inode that isn't associated with + * an rtgroup! + */ + ASSERT(0); + return -EFSCORRUPTED; +} + /* Find all the extents from a given AG in an inode fork. */ STATIC int xrep_rmap_scan_ifork( @@ -578,6 +612,8 @@ xrep_rmap_scan_ifork( return error; } else if (ifp->if_format == XFS_DINODE_FMT_RMAP) { return xrep_rmap_scan_rtrmapbt(&rf, ip); + } else if (ifp->if_format == XFS_DINODE_FMT_REFCOUNT) { + return xrep_rmap_scan_rtrefcountbt(&rf, ip); } else if (ifp->if_format != XFS_DINODE_FMT_EXTENTS) { return 0; } From patchwork Sun Dec 31 21:54:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507752 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2E009F9CC for ; Sun, 31 Dec 2023 21:54:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="U0old2M0" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9FDD1C433C7; Sun, 31 Dec 2023 21:54:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059684; bh=63KsGTUQmtVOV0315Krqy30xUHhFdvQrjj0iiOwAh/I=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=U0old2M0xhLXRuzV5aT6uPYejyvIBQJbvTHmOq5j1XJUDo/l1yBQbfDZ1gkKHuewZ 5SN1RBp+Btas/WHC4B0ISVtFHSw+VGCB/UNp1ur1lUApzu06IYM4pRUircX5RN9SRP R1Cpwksnvp2akT+7CrCzs8lnILwFzJ0nWhffNoAtSpepks6gVc2UBoPDkqlqUUqvrK 30fWxM5hz7ouYe+hEWpH/zgMayetYLEmmnUVEDEHUOMaRu8y4/6mgc7u3eL2ym8G6D WCbVcNmMhyrptShjfXWdHnnmlt87/SGEZV+mUitKZO5Lv3fiLDMsCbP6VDTtS12lZn Y2Otzxik3HPtg== Date: Sun, 31 Dec 2023 13:54:44 -0800 Subject: [PATCH 39/44] xfs: capture realtime CoW staging extents when rebuilding rt rmapbt From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852207.1766284.3176388270914377798.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Walk the realtime refcount btree to find the CoW staging extents when we're rebuilding the realtime rmap btree. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/repair.h | 1 fs/xfs/scrub/rgb_bitmap.h | 37 +++++++++++++++ fs/xfs/scrub/rtrmap_repair.c | 103 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 141 insertions(+) create mode 100644 fs/xfs/scrub/rgb_bitmap.h diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index a382ba0478aa0..61c6bc31a266b 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -49,6 +49,7 @@ xrep_trans_commit( struct xbitmap; struct xagb_bitmap; +struct xrgb_bitmap; struct xfsb_bitmap; int xrep_fix_freelist(struct xfs_scrub *sc, int alloc_flags); diff --git a/fs/xfs/scrub/rgb_bitmap.h b/fs/xfs/scrub/rgb_bitmap.h new file mode 100644 index 0000000000000..47a5caf3a230d --- /dev/null +++ b/fs/xfs/scrub/rgb_bitmap.h @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2020-2024 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __XFS_SCRUB_RGB_BITMAP_H__ +#define __XFS_SCRUB_RGB_BITMAP_H__ + +/* Bitmaps, but type-checked for xfs_rgblock_t */ + +struct xrgb_bitmap { + struct xbitmap32 rgbitmap; +}; + +static inline void xrgb_bitmap_init(struct xrgb_bitmap *bitmap) +{ + xbitmap32_init(&bitmap->rgbitmap); +} + +static inline void xrgb_bitmap_destroy(struct xrgb_bitmap *bitmap) +{ + xbitmap32_destroy(&bitmap->rgbitmap); +} + +static inline int xrgb_bitmap_set(struct xrgb_bitmap *bitmap, + xfs_rgblock_t start, xfs_extlen_t len) +{ + return xbitmap32_set(&bitmap->rgbitmap, start, len); +} + +static inline int xrgb_bitmap_walk(struct xrgb_bitmap *bitmap, + xbitmap32_walk_fn fn, void *priv) +{ + return xbitmap32_walk(&bitmap->rgbitmap, fn, priv); +} + +#endif /* __XFS_SCRUB_RGB_BITMAP_H__ */ diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c index 42df1cf45ae0b..a40afa571b981 100644 --- a/fs/xfs/scrub/rtrmap_repair.c +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -29,6 +29,7 @@ #include "xfs_rtalloc.h" #include "xfs_ag.h" #include "xfs_rtgroup.h" +#include "xfs_refcount.h" #include 
"scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -37,6 +38,7 @@ #include "scrub/repair.h" #include "scrub/bitmap.h" #include "scrub/fsb_bitmap.h" +#include "scrub/rgb_bitmap.h" #include "scrub/xfile.h" #include "scrub/xfarray.h" #include "scrub/iscan.h" @@ -436,6 +438,100 @@ xrep_rtrmap_scan_ag( return error; } +struct xrep_rtrmap_stash_run { + struct xrep_rtrmap *rr; + uint64_t owner; +}; + +static int +xrep_rtrmap_stash_run( + uint32_t start, + uint32_t len, + void *priv) +{ + struct xrep_rtrmap_stash_run *rsr = priv; + struct xrep_rtrmap *rr = rsr->rr; + xfs_rgblock_t rgbno = start; + + return xrep_rtrmap_stash(rr, rgbno, len, rsr->owner, 0, 0); +} + +/* + * Emit rmaps for every extent of bits set in the bitmap. Caller must ensure + * that the ranges are in units of FS blocks. + */ +STATIC int +xrep_rtrmap_stash_bitmap( + struct xrep_rtrmap *rr, + struct xrgb_bitmap *bitmap, + const struct xfs_owner_info *oinfo) +{ + struct xrep_rtrmap_stash_run rsr = { + .rr = rr, + .owner = oinfo->oi_owner, + }; + + return xrgb_bitmap_walk(bitmap, xrep_rtrmap_stash_run, &rsr); +} + +/* Record a CoW staging extent. */ +STATIC int +xrep_rtrmap_walk_cowblocks( + struct xfs_btree_cur *cur, + const struct xfs_refcount_irec *irec, + void *priv) +{ + struct xrgb_bitmap *bitmap = priv; + + if (!xfs_refcount_check_domain(irec) || + irec->rc_domain != XFS_REFC_DOMAIN_COW) + return -EFSCORRUPTED; + + return xrgb_bitmap_set(bitmap, irec->rc_startblock, + irec->rc_blockcount); +} + +/* + * Collect rmaps for the blocks containing the refcount btree, and all CoW + * staging extents. 
+ */ +STATIC int +xrep_rtrmap_find_refcount_rmaps( + struct xrep_rtrmap *rr) +{ + struct xrgb_bitmap cow_blocks; /* COWBIT */ + struct xfs_refcount_irec low = { + .rc_startblock = 0, + .rc_domain = XFS_REFC_DOMAIN_COW, + }; + struct xfs_refcount_irec high = { + .rc_startblock = -1U, + .rc_domain = XFS_REFC_DOMAIN_COW, + }; + struct xfs_scrub *sc = rr->sc; + int error; + + if (!xfs_has_rtreflink(sc->mp)) + return 0; + + xrgb_bitmap_init(&cow_blocks); + + /* Collect rmaps for CoW staging extents. */ + error = xfs_refcount_query_range(sc->sr.refc_cur, &low, &high, + xrep_rtrmap_walk_cowblocks, &cow_blocks); + if (error) + goto out_bitmap; + + /* Generate rmaps for everything. */ + error = xrep_rtrmap_stash_bitmap(rr, &cow_blocks, &XFS_RMAP_OINFO_COW); + if (error) + goto out_bitmap; + +out_bitmap: + xrgb_bitmap_destroy(&cow_blocks); + return error; +} + /* Count and check all collected records. */ STATIC int xrep_rtrmap_check_record( @@ -483,6 +579,13 @@ xrep_rtrmap_find_rmaps( if (error) return error; + /* Find CoW staging extents. */ + xrep_rtgroup_btcur_init(sc, &sc->sr); + error = xrep_rtrmap_find_refcount_rmaps(rr); + xchk_rtgroup_btcur_free(&sc->sr); + if (error) + return error; + /* * Set up for a potentially lengthy filesystem scan by reducing our * transaction resource usage for the duration. Specifically: From patchwork Sun Dec 31 21:54:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507753 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFB97F9C3 for ; Sun, 31 Dec 2023 21:55:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="VgyZFOw/" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4C1CFC433C7; Sun, 31 Dec 2023 21:55:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059700; bh=2Y+pKHnKVaUKglU8PLyDOKArqXAIV7IgaahFzp9X51c=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=VgyZFOw/JOWdQw6fUOdy7A7N6EwZm6LiY6F+c3lw7kj1w/0yXl1Uol+yhBVD/wLVz 23GOUqeRIiXklyRUCB75wccPiY2YF2RiYh661cIqZoaeGatklPTUfq8p2dof6fw4qP RsOqvflHnb6UC//WKu10g+nxXE3Fs7/VDZenbczlzMwHqj+PPR1KstE60oWF8nvAvO oKdzaYfh/3T6sYl4oYIfLNy7xPLwb4t3adN8FHDoi0yFx2mUqRQ6l3LZC4Ozd+zJFt WtBEdafWHb+becF27nVvAkXdBrPh0HGz5x3GnWJqgoG0hJOKulBZs0HoCkAqvCalXx VP8S7y++fR72A== Date: Sun, 31 Dec 2023 13:54:59 -0800 Subject: [PATCH 40/44] xfs: online repair of the realtime refcount btree From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852223.1766284.15728102213006370259.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Port the data device's refcount btree repair code to the realtime refcount btree. Signed-off-by: Darrick J. 
Wong --- fs/xfs/Makefile | 1 fs/xfs/scrub/bmap_repair.c | 2 fs/xfs/scrub/repair.c | 20 + fs/xfs/scrub/repair.h | 7 fs/xfs/scrub/rtrefcount.c | 9 fs/xfs/scrub/rtrefcount_repair.c | 781 ++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/rtrmap_repair.c | 2 fs/xfs/scrub/scrub.c | 2 fs/xfs/scrub/trace.h | 31 ++ 9 files changed, 847 insertions(+), 8 deletions(-) create mode 100644 fs/xfs/scrub/rtrefcount_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 4d4f340d904fc..a779030fd91db 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -237,6 +237,7 @@ xfs-y += $(addprefix scrub/, \ xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \ rgsuper_repair.o \ rtbitmap_repair.o \ + rtrefcount_repair.o \ rtrmap_repair.o \ rtsummary_repair.o \ ) diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 26dc04923944b..98618821ad975 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -397,7 +397,7 @@ xrep_bmap_check_rtfork_rmap( /* Make sure this isn't free space. */ rtbno = xfs_rgbno_to_rtb(sc->mp, cur->bc_ino.rtg->rtg_rgno, rec->rm_startblock); - return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount); + return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount, false); } /* Record realtime extents that belong to this inode's fork. */ diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index fdd12933fd47f..72aa1e522910b 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -1039,21 +1039,31 @@ xrep_rtgroup_init( return 0; } -/* Ensure that all rt blocks in the given range are not marked free. */ +/* + * Ensure that all rt blocks in the given range are not marked free. If + * @must_align is true, then both ends must be aligned to a rt extent. 
+ */ int xrep_require_rtext_inuse( struct xfs_scrub *sc, xfs_rtblock_t rtbno, - xfs_filblks_t len) + xfs_filblks_t len, + bool must_align) { struct xfs_mount *mp = sc->mp; xfs_rtxnum_t startrtx; xfs_rtxnum_t endrtx; + xfs_extlen_t mod; bool is_free = false; int error; - startrtx = xfs_rtb_to_rtx(mp, rtbno); - endrtx = xfs_rtb_to_rtx(mp, rtbno + len - 1); + startrtx = xfs_rtb_to_rtxrem(mp, rtbno, &mod); + if (must_align && mod != 0) + return -EFSCORRUPTED; + + endrtx = xfs_rtb_to_rtxrem(mp, rtbno + len - 1, &mod); + if (must_align && mod != mp->m_sb.sb_rextsize - 1) + return -EFSCORRUPTED; error = xfs_rtalloc_extent_is_free(mp, sc->tp, startrtx, endrtx - startrtx + 1, &is_free); @@ -1341,6 +1351,8 @@ xrep_is_rtmeta_ino( /* Newer rt metadata files are not guaranteed to exist */ if (rtg->rtg_rmapip && ino == rtg->rtg_rmapip->i_ino) return true; + if (rtg->rtg_refcountip && ino == rtg->rtg_refcountip->i_ino) + return true; return false; } diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 61c6bc31a266b..969317d429db6 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -98,6 +98,7 @@ int xrep_setup_nlinks(struct xfs_scrub *sc); int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *resblks); int xrep_setup_dirtree(struct xfs_scrub *sc); int xrep_setup_rtrmapbt(struct xfs_scrub *sc); +int xrep_setup_rtrefcountbt(struct xfs_scrub *sc); /* Repair setup functions */ int xrep_setup_ag_allocbt(struct xfs_scrub *sc); @@ -113,7 +114,7 @@ int xrep_rtgroup_init(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, struct xchk_rt *sr, unsigned int rtglock_flags); void xrep_rtgroup_btcur_init(struct xfs_scrub *sc, struct xchk_rt *sr); int xrep_require_rtext_inuse(struct xfs_scrub *sc, xfs_rtblock_t rtbno, - xfs_filblks_t len); + xfs_filblks_t len, bool must_align); xfs_extlen_t xrep_calc_rtgroup_resblks(struct xfs_scrub *sc); #else # define xrep_rtgroup_init(sc, rtg, sr, lockflags) (-ENOSYS) @@ -160,12 +161,14 @@ int xrep_rtsummary(struct xfs_scrub *sc); 
int xrep_rgsuperblock(struct xfs_scrub *sc); int xrep_rgbitmap(struct xfs_scrub *sc); int xrep_rtrmapbt(struct xfs_scrub *sc); +int xrep_rtrefcountbt(struct xfs_scrub *sc); #else # define xrep_rtbitmap xrep_notsupported # define xrep_rtsummary xrep_notsupported # define xrep_rgsuperblock xrep_notsupported # define xrep_rgbitmap xrep_notsupported # define xrep_rtrmapbt xrep_notsupported +# define xrep_rtrefcountbt xrep_notsupported #endif /* CONFIG_XFS_RT */ #ifdef CONFIG_XFS_QUOTA @@ -239,6 +242,7 @@ xrep_setup_nothing( #define xrep_setup_dirtree xrep_setup_nothing #define xrep_setup_metapath xrep_setup_nothing #define xrep_setup_rtrmapbt xrep_setup_nothing +#define xrep_setup_rtrefcountbt xrep_setup_nothing #define xrep_setup_inode(sc, imap) ((void)0) @@ -278,6 +282,7 @@ static inline int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *x) #define xrep_rgsuperblock xrep_notsupported #define xrep_rgbitmap xrep_notsupported #define xrep_rtrmapbt xrep_notsupported +#define xrep_rtrefcountbt xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/rtrefcount.c b/fs/xfs/scrub/rtrefcount.c index 058a0ea18e09b..8924ad8b680f0 100644 --- a/fs/xfs/scrub/rtrefcount.c +++ b/fs/xfs/scrub/rtrefcount.c @@ -7,8 +7,10 @@ #include "xfs_fs.h" #include "xfs_shared.h" #include "xfs_format.h" +#include "xfs_log_format.h" #include "xfs_trans_resv.h" #include "xfs_mount.h" +#include "xfs_trans.h" #include "xfs_btree.h" #include "xfs_rmap.h" #include "xfs_refcount.h" @@ -21,6 +23,7 @@ #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/btree.h" +#include "scrub/repair.h" /* Set us up with the realtime refcount metadata locked. 
*/ int @@ -34,6 +37,12 @@ xchk_setup_rtrefcountbt( if (xchk_need_intent_drain(sc)) xchk_fsgates_enable(sc, XCHK_FSGATES_DRAIN); + if (xchk_could_repair(sc)) { + error = xrep_setup_rtrefcountbt(sc); + if (error) + return error; + } + rtg = xfs_rtgroup_get(mp, sc->sm->sm_agno); if (!rtg) return -ENOENT; diff --git a/fs/xfs/scrub/rtrefcount_repair.c b/fs/xfs/scrub/rtrefcount_repair.c new file mode 100644 index 0000000000000..3034ca88be8b4 --- /dev/null +++ b/fs/xfs/scrub/rtrefcount_repair.c @@ -0,0 +1,781 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2021-2024 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_alloc.h" +#include "xfs_ialloc.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_refcount.h" +#include "xfs_rtrefcount_btree.h" +#include "xfs_error.h" +#include "xfs_health.h" +#include "xfs_inode.h" +#include "xfs_quota.h" +#include "xfs_rtalloc.h" +#include "xfs_ag.h" +#include "xfs_rtgroup.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" +#include "scrub/fsb_bitmap.h" +#include "scrub/xfile.h" +#include "scrub/xfarray.h" +#include "scrub/newbt.h" +#include "scrub/reap.h" +#include "scrub/rcbag.h" + +/* + * Rebuilding the Reference Count Btree + * ==================================== + * + * This algorithm is "borrowed" from xfs_repair. 
Imagine the rmap + * entries as rectangles representing extents of physical blocks, and + * that the rectangles can be laid down to allow them to overlap each + * other; then we know that we must emit a refcnt btree entry wherever + * the amount of overlap changes, i.e. the emission stimulus is + * level-triggered: + * + * - --- + * -- ----- ---- --- ------ + * -- ---- ----------- ---- --------- + * -------------------------------- ----------- + * ^ ^ ^^ ^^ ^ ^^ ^^^ ^^^^ ^ ^^ ^ ^ ^ + * 2 1 23 21 3 43 234 2123 1 01 2 3 0 + * + * For our purposes, a rmap is a tuple (startblock, len, fileoff, owner). + * + * Note that in the actual refcnt btree we don't store the refcount < 2 + * cases because the bnobt tells us which blocks are free; single-use + * blocks aren't recorded in the bnobt or the refcntbt. If the rmapbt + * supports storing multiple entries covering a given block we could + * theoretically dispense with the refcntbt and simply count rmaps, but + * that's inefficient in the (hot) write path, so we'll take the cost of + * the extra tree to save time. Also there's no guarantee that rmap + * will be enabled. + * + * Given an array of rmaps sorted by physical block number, a starting + * physical block (sp), a bag to hold rmaps that cover sp, and the next + * physical block where the level changes (np), we can reconstruct the + * rt refcount btree as follows: + * + * While there are still unprocessed rmaps in the array, + * - Set sp to the physical block (pblk) of the next unprocessed rmap. + * - Add to the bag all rmaps in the array where startblock == sp. + * - Set np to the physical block where the bag size will change. This + * is the minimum of (the pblk of the next unprocessed rmap) and + * (startblock + len of each rmap in the bag). + * - Record the bag size as old_bag_size. + * + * - While the bag isn't empty, + * - Remove from the bag all rmaps where startblock + len == np. + * - Add to the bag all rmaps in the array where startblock == np. 
+ * - If the bag size isn't old_bag_size, store the refcount entry + * (sp, np - sp, bag_size) in the refcnt btree. + * - If the bag is empty, break out of the inner loop. + * - Set old_bag_size to the bag size + * - Set sp = np. + * - Set np to the physical block where the bag size will change. + * This is the minimum of (the pblk of the next unprocessed rmap) + * and (startblock + len of each rmap in the bag). + * + * Like all the other repairers, we make a list of all the refcount + * records we need, then reinitialize the rt refcount btree root and + * insert all the records. + */ + +struct xrep_rtrefc { + /* refcount extents */ + struct xfarray *refcount_records; + + /* new refcountbt information */ + struct xrep_newbt new_btree; + + /* old refcountbt blocks */ + struct xfsb_bitmap old_rtrefcountbt_blocks; + + struct xfs_scrub *sc; + + /* get_records()'s position in the rt refcount record array. */ + xfarray_idx_t array_cur; + + /* # of refcountbt blocks */ + xfs_filblks_t btblocks; +}; + +/* Set us up to repair refcount btrees. */ +int +xrep_setup_rtrefcountbt( + struct xfs_scrub *sc) +{ + char *descr; + int error; + + descr = xchk_xfile_ag_descr(sc, "rmap record bag"); + error = xrep_setup_buftarg(sc, descr); + kfree(descr); + return error; +} + +/* Check for any obvious conflicts with this shared/CoW staging extent. */ +STATIC int +xrep_rtrefc_check_ext( + struct xfs_scrub *sc, + const struct xfs_refcount_irec *rec) +{ + xfs_rtblock_t rtbno; + + if (xfs_rtrefcount_check_irec(sc->sr.rtg, rec) != NULL) + return -EFSCORRUPTED; + + /* Make sure this isn't free space or misaligned. */ + rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, + rec->rc_startblock); + return xrep_require_rtext_inuse(sc, rtbno, rec->rc_blockcount, true); +} + +/* Record a reference count extent. 
*/ +STATIC int +xrep_rtrefc_stash( + struct xrep_rtrefc *rr, + enum xfs_refc_domain domain, + xfs_rgblock_t bno, + xfs_extlen_t len, + uint64_t refcount) +{ + struct xfs_refcount_irec irec = { + .rc_startblock = bno, + .rc_blockcount = len, + .rc_refcount = refcount, + .rc_domain = domain, + }; + int error = 0; + + if (xchk_should_terminate(rr->sc, &error)) + return error; + + irec.rc_refcount = min_t(uint64_t, XFS_REFC_REFCOUNT_MAX, refcount); + + error = xrep_rtrefc_check_ext(rr->sc, &irec); + if (error) + return error; + + trace_xrep_rtrefc_found(rr->sc->sr.rtg, &irec); + + return xfarray_append(rr->refcount_records, &irec); +} + +/* Record a CoW staging extent. */ +STATIC int +xrep_rtrefc_stash_cow( + struct xrep_rtrefc *rr, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + return xrep_rtrefc_stash(rr, XFS_REFC_DOMAIN_COW, bno, len, 1); +} + +/* Decide if an rmap could describe a shared extent. */ +static inline bool +xrep_rtrefc_rmap_shareable( + const struct xfs_rmap_irec *rmap) +{ + /* rt metadata are never sharable */ + if (XFS_RMAP_NON_INODE_OWNER(rmap->rm_owner)) + return false; + + /* Unwritten file blocks are not shareable. */ + if (rmap->rm_flags & XFS_RMAP_UNWRITTEN) + return false; + + return true; +} + +/* Grab the next (abbreviated) rmap record from the rmapbt. */ +STATIC int +xrep_rtrefc_walk_rmaps( + struct xrep_rtrefc *rr, + struct xfs_rmap_irec *rmap, + bool *have_rec) +{ + struct xfs_btree_cur *cur = rr->sc->sr.rmap_cur; + struct xfs_mount *mp = cur->bc_mp; + int have_gt; + int error = 0; + + *have_rec = false; + + /* + * Loop through the remaining rmaps. Remember CoW staging + * extents and the refcountbt blocks from the old tree for later + * disposal. We can only share written data fork extents, so + * keep looping until we find an rmap for one. 
+ */ + do { + if (xchk_should_terminate(rr->sc, &error)) + return error; + + error = xfs_btree_increment(cur, 0, &have_gt); + if (error) + return error; + if (!have_gt) + return 0; + + error = xfs_rmap_get_rec(cur, rmap, &have_gt); + if (error) + return error; + if (XFS_IS_CORRUPT(mp, !have_gt)) { + xfs_btree_mark_sick(cur); + return -EFSCORRUPTED; + } + + if (rmap->rm_owner == XFS_RMAP_OWN_COW) { + error = xrep_rtrefc_stash_cow(rr, rmap->rm_startblock, + rmap->rm_blockcount); + if (error) + return error; + } else if (xfs_internal_inum(mp, rmap->rm_owner) || + (rmap->rm_flags & (XFS_RMAP_ATTR_FORK | + XFS_RMAP_BMBT_BLOCK))) { + xfs_btree_mark_sick(cur); + return -EFSCORRUPTED; + } + } while (!xrep_rtrefc_rmap_shareable(rmap)); + + *have_rec = true; + return 0; +} + +static inline uint32_t +xrep_rtrefc_encode_startblock( + const struct xfs_refcount_irec *irec) +{ + uint32_t start; + + start = irec->rc_startblock & ~XFS_REFC_COWFLAG; + if (irec->rc_domain == XFS_REFC_DOMAIN_COW) + start |= XFS_REFC_COWFLAG; + + return start; +} + +/* + * Compare two refcount records. We want to sort in order of increasing block + * number. + */ +static int +xrep_rtrefc_extent_cmp( + const void *a, + const void *b) +{ + const struct xfs_refcount_irec *ap = a; + const struct xfs_refcount_irec *bp = b; + uint32_t sa, sb; + + sa = xrep_rtrefc_encode_startblock(ap); + sb = xrep_rtrefc_encode_startblock(bp); + + if (sa > sb) + return 1; + if (sa < sb) + return -1; + return 0; +} + +/* + * Sort the refcount extents by startblock or else the btree records will be in + * the wrong order. Make sure the records do not overlap in physical space. 
+ */ +STATIC int +xrep_rtrefc_sort_records( + struct xrep_rtrefc *rr) +{ + struct xfs_refcount_irec irec; + xfarray_idx_t cur; + enum xfs_refc_domain dom = XFS_REFC_DOMAIN_SHARED; + xfs_rgblock_t next_rgbno = 0; + int error; + + error = xfarray_sort(rr->refcount_records, xrep_rtrefc_extent_cmp, + XFARRAY_SORT_KILLABLE); + if (error) + return error; + + foreach_xfarray_idx(rr->refcount_records, cur) { + if (xchk_should_terminate(rr->sc, &error)) + return error; + + error = xfarray_load(rr->refcount_records, cur, &irec); + if (error) + return error; + + if (dom == XFS_REFC_DOMAIN_SHARED && + irec.rc_domain == XFS_REFC_DOMAIN_COW) { + dom = irec.rc_domain; + next_rgbno = 0; + } + + if (dom != irec.rc_domain) + return -EFSCORRUPTED; + if (irec.rc_startblock < next_rgbno) + return -EFSCORRUPTED; + + next_rgbno = irec.rc_startblock + irec.rc_blockcount; + } + + return error; +} + +/* Record extents that belong to the realtime refcount inode. */ +STATIC int +xrep_rtrefc_walk_rmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_rtrefc *rr = priv; + struct xfs_mount *mp = cur->bc_mp; + xfs_fsblock_t fsbno; + int error = 0; + + if (xchk_should_terminate(rr->sc, &error)) + return error; + + /* Skip extents which are not owned by this inode and fork. */ + if (rec->rm_owner != rr->sc->ip->i_ino) + return 0; + + error = xrep_check_ino_btree_mapping(rr->sc, rec); + if (error) + return error; + + fsbno = XFS_AGB_TO_FSB(mp, cur->bc_ag.pag->pag_agno, + rec->rm_startblock); + + return xfsb_bitmap_set(&rr->old_rtrefcountbt_blocks, fsbno, + rec->rm_blockcount); +} + +/* + * Walk forward through the rmap btree to collect all rmaps starting at + * @bno in @rmap_bag. These represent the file(s) that share ownership of + * the current block. Upon return, the rmap cursor points to the last record + * satisfying the startblock constraint. 
+ */ +static int +xrep_rtrefc_push_rmaps_at( + struct xrep_rtrefc *rr, + struct rcbag *rcstack, + xfs_rgblock_t bno, + struct xfs_rmap_irec *rmap, + bool *have) +{ + struct xfs_scrub *sc = rr->sc; + int have_gt; + int error; + + while (*have && rmap->rm_startblock == bno) { + error = rcbag_add(rcstack, rr->sc->tp, rmap); + if (error) + return error; + + error = xrep_rtrefc_walk_rmaps(rr, rmap, have); + if (error) + return error; + } + + error = xfs_btree_decrement(sc->sr.rmap_cur, 0, &have_gt); + if (error) + return error; + if (XFS_IS_CORRUPT(sc->mp, !have_gt)) { + xfs_btree_mark_sick(sc->sr.rmap_cur); + return -EFSCORRUPTED; + } + + return 0; +} + +/* Scan one AG for reverse mappings for the realtime refcount btree. */ +STATIC int +xrep_rtrefc_scan_ag( + struct xrep_rtrefc *rr, + struct xfs_perag *pag) +{ + struct xfs_scrub *sc = rr->sc; + int error; + + error = xrep_ag_init(sc, pag, &sc->sa); + if (error) + return error; + + error = xfs_rmap_query_all(sc->sa.rmap_cur, xrep_rtrefc_walk_rmap, rr); + xchk_ag_free(sc, &sc->sa); + return error; +} + +/* Iterate all the rmap records to generate reference count data. */ +STATIC int +xrep_rtrefc_find_refcounts( + struct xrep_rtrefc *rr) +{ + struct xfs_scrub *sc = rr->sc; + struct rcbag *rcstack; + struct xfs_perag *pag; + uint64_t old_stack_height; + xfs_rgblock_t sbno; + xfs_rgblock_t cbno; + xfs_rgblock_t nbno; + xfs_agnumber_t agno; + bool have; + int error; + + /* Scan for old rtrefc btree blocks. */ + for_each_perag(sc->mp, agno, pag) { + error = xrep_rtrefc_scan_ag(rr, pag); + if (error) { + xfs_perag_rele(pag); + return error; + } + } + + xrep_rtgroup_btcur_init(sc, &sc->sr); + + /* + * Set up a bag to store all the rmap records that we're tracking to + * generate a reference count record. If this exceeds + * XFS_REFC_REFCOUNT_MAX, we clamp rc_refcount. + */ + error = rcbag_init(sc->mp, sc->xfile_buftarg, &rcstack); + if (error) + goto out_cur; + + /* Start the rtrmapbt cursor to the left of all records. 
*/ + error = xfs_btree_goto_left_edge(sc->sr.rmap_cur); + if (error) + goto out_bag; + + /* Process reverse mappings into refcount data. */ + while (xfs_btree_has_more_records(sc->sr.rmap_cur)) { + struct xfs_rmap_irec rmap; + + /* Push all rmaps with pblk == sbno onto the stack */ + error = xrep_rtrefc_walk_rmaps(rr, &rmap, &have); + if (error) + goto out_bag; + if (!have) + break; + sbno = cbno = rmap.rm_startblock; + error = xrep_rtrefc_push_rmaps_at(rr, rcstack, sbno, &rmap, + &have); + if (error) + goto out_bag; + + /* Set nbno to the bno of the next refcount change */ + error = rcbag_next_edge(rcstack, sc->tp, &rmap, have, &nbno); + if (error) + goto out_bag; + + ASSERT(nbno > sbno); + old_stack_height = rcbag_count(rcstack); + + /* While stack isn't empty... */ + while (rcbag_count(rcstack) > 0) { + /* Pop all rmaps that end at nbno */ + error = rcbag_remove_ending_at(rcstack, sc->tp, nbno); + if (error) + goto out_bag; + + /* Push array items that start at nbno */ + error = xrep_rtrefc_walk_rmaps(rr, &rmap, &have); + if (error) + goto out_bag; + if (have) { + error = xrep_rtrefc_push_rmaps_at(rr, rcstack, + nbno, &rmap, &have); + if (error) + goto out_bag; + } + + /* Emit refcount if necessary */ + ASSERT(nbno > cbno); + if (rcbag_count(rcstack) != old_stack_height) { + if (old_stack_height > 1) { + error = xrep_rtrefc_stash(rr, + XFS_REFC_DOMAIN_SHARED, + cbno, nbno - cbno, + old_stack_height); + if (error) + goto out_bag; + } + cbno = nbno; + } + + /* Stack empty, go find the next rmap */ + if (rcbag_count(rcstack) == 0) + break; + old_stack_height = rcbag_count(rcstack); + sbno = nbno; + + /* Set nbno to the bno of the next refcount change */ + error = rcbag_next_edge(rcstack, sc->tp, &rmap, have, + &nbno); + if (error) + goto out_bag; + + ASSERT(nbno > sbno); + } + } + + ASSERT(rcbag_count(rcstack) == 0); +out_bag: + rcbag_free(&rcstack); +out_cur: + xchk_rtgroup_btcur_free(&sc->sr); + return error; +} + +/* Retrieve refcountbt data for bulk load. 
*/ +STATIC int +xrep_rtrefc_get_records( + struct xfs_btree_cur *cur, + unsigned int idx, + struct xfs_btree_block *block, + unsigned int nr_wanted, + void *priv) +{ + struct xrep_rtrefc *rr = priv; + union xfs_btree_rec *block_rec; + unsigned int loaded; + int error; + + for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { + error = xfarray_load(rr->refcount_records, rr->array_cur++, + &cur->bc_rec.rc); + if (error) + return error; + + block_rec = xfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + } + + return loaded; +} + +/* Feed one of the new btree blocks to the bulk loader. */ +STATIC int +xrep_rtrefc_claim_block( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr, + void *priv) +{ + struct xrep_rtrefc *rr = priv; + + return xrep_newbt_claim_block(cur, &rr->new_btree, ptr); +} + +/* Figure out how much space we need to create the incore btree root block. */ +STATIC size_t +xrep_rtrefc_iroot_size( + struct xfs_btree_cur *cur, + unsigned int level, + unsigned int nr_this_level, + void *priv) +{ + return xfs_rtrefcount_broot_space_calc(cur->bc_mp, level, + nr_this_level); +} + +/* + * Use the collected refcount information to stage a new rt refcount btree. If + * this is successful we'll return with the new btree root information logged + * to the repair transaction but not yet committed. + */ +STATIC int +xrep_rtrefc_build_new_tree( + struct xrep_rtrefc *rr) +{ + struct xfs_scrub *sc = rr->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_rtgroup *rtg = sc->sr.rtg; + struct xfs_btree_cur *refc_cur; + int error; + + error = xrep_rtrefc_sort_records(rr); + if (error) + return error; + + /* + * Prepare to construct the new btree by reserving disk space for the + * new btree and setting up all the accounting information we'll need + * to root the new btree while it's under construction and before we + * attach it to the realtime refcount inode. 
+ */ + error = xrep_newbt_init_metadir_inode(&rr->new_btree, sc); + if (error) + return error; + + rr->new_btree.bload.get_records = xrep_rtrefc_get_records; + rr->new_btree.bload.claim_block = xrep_rtrefc_claim_block; + rr->new_btree.bload.iroot_size = xrep_rtrefc_iroot_size; + + /* Compute how many blocks we'll need. */ + refc_cur = xfs_rtrefcountbt_stage_cursor(mp, rtg, rtg->rtg_refcountip, + &rr->new_btree.ifake); + error = xfs_btree_bload_compute_geometry(refc_cur, &rr->new_btree.bload, + xfarray_length(rr->refcount_records)); + if (error) + goto err_cur; + + /* Last chance to abort before we start committing fixes. */ + if (xchk_should_terminate(sc, &error)) + goto err_cur; + + /* + * Guess how many blocks we're going to need to rebuild an entire + * rtrefcountbt from the number of extents we found, and pump up our + * transaction to have sufficient block reservation. We're allowed + * to exceed quota to repair inconsistent metadata, though this is + * unlikely. + */ + error = xfs_trans_reserve_more_inode(sc->tp, rtg->rtg_refcountip, + rr->new_btree.bload.nr_blocks, 0, true); + if (error) + goto err_cur; + + /* Reserve the space we'll need for the new btree. */ + error = xrep_newbt_alloc_blocks(&rr->new_btree, + rr->new_btree.bload.nr_blocks); + if (error) + goto err_cur; + + /* Add all observed refcount records. */ + rr->new_btree.ifake.if_fork->if_format = XFS_DINODE_FMT_REFCOUNT; + rr->array_cur = XFARRAY_CURSOR_INIT; + error = xfs_btree_bload(refc_cur, &rr->new_btree.bload, rr); + if (error) + goto err_cur; + + /* + * Install the new rtrefc btree in the inode. After this point the old + * btree is no longer accessible, the new tree is live, and we can + * delete the cursor. + */ + xfs_rtrefcountbt_commit_staged_btree(refc_cur, sc->tp); + xrep_inode_set_nblocks(rr->sc, rr->new_btree.ifake.if_blocks); + xfs_btree_del_cursor(refc_cur, 0); + + /* Dispose of any unused blocks and the accounting information. 
*/ + error = xrep_newbt_commit(&rr->new_btree); + if (error) + return error; + + return xrep_roll_trans(sc); +err_cur: + xfs_btree_del_cursor(refc_cur, error); + xrep_newbt_cancel(&rr->new_btree); + return error; +} + +/* + * Now that we've logged the roots of the new btrees, invalidate all of the + * old blocks and free them. + */ +STATIC int +xrep_rtrefc_remove_old_tree( + struct xrep_rtrefc *rr) +{ + int error; + + /* + * Free all the extents that were allocated to the former rtrefcountbt + * and aren't cross-linked with something else. + */ + error = xrep_reap_metadir_fsblocks(rr->sc, + &rr->old_rtrefcountbt_blocks); + if (error) + return error; + + /* + * Ensure the proper reservation for the rtrefcount inode so that we + * don't fail to expand the btree. + */ + return xrep_reset_imeta_reservation(rr->sc); +} + +/* Rebuild the rt refcount btree. */ +int +xrep_rtrefcountbt( + struct xfs_scrub *sc) +{ + struct xrep_rtrefc *rr; + struct xfs_mount *mp = sc->mp; + char *descr; + int error; + + /* We require the rmapbt to rebuild anything. */ + if (!xfs_has_rtrmapbt(mp)) + return -EOPNOTSUPP; + + /* Make sure any problems with the fork are fixed. */ + error = xrep_metadata_inode_forks(sc); + if (error) + return error; + + rr = kzalloc(sizeof(struct xrep_rtrefc), XCHK_GFP_FLAGS); + if (!rr) + return -ENOMEM; + rr->sc = sc; + + /* Set up enough storage to handle one refcount record per rt extent. */ + descr = xchk_xfile_ag_descr(sc, "reference count records"); + error = xfarray_create(descr, mp->m_sb.sb_rextents, + sizeof(struct xfs_refcount_irec), + &rr->refcount_records); + kfree(descr); + if (error) + goto out_rr; + + /* Collect all reference counts. */ + xfsb_bitmap_init(&rr->old_rtrefcountbt_blocks); + error = xrep_rtrefc_find_refcounts(rr); + if (error) + goto out_bitmap; + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + /* Rebuild the refcount information. */ + error = xrep_rtrefc_build_new_tree(rr); + if (error) + goto out_bitmap; + + /* Kill the old tree. 
*/ + error = xrep_rtrefc_remove_old_tree(rr); + if (error) + goto out_bitmap; + +out_bitmap: + xfsb_bitmap_destroy(&rr->old_rtrefcountbt_blocks); + xfarray_destroy(rr->refcount_records); +out_rr: + kfree(rr); + return error; +} diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c index a40afa571b981..c29e398283018 100644 --- a/fs/xfs/scrub/rtrmap_repair.c +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -137,7 +137,7 @@ xrep_rtrmap_check_mapping( /* Make sure this isn't free space. */ rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, rec->rm_startblock); - return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount); + return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount, false); } /* Store a reverse-mapping record. */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 2611e0223489c..94a733975879a 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -486,7 +486,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .setup = xchk_setup_rtrefcountbt, .scrub = xchk_rtrefcountbt, .has = xfs_has_rtreflink, - .repair = xrep_notsupported, + .repair = xrep_rtrefcountbt, }, }; diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 2d373c9a53ad6..aa943433cbb02 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -3995,6 +3995,37 @@ TRACE_EVENT(xrep_rtrmap_live_update, __entry->offset, __entry->flags) ); + +TRACE_EVENT(xrep_rtrefc_found, + TP_PROTO(struct xfs_rtgroup *rtg, const struct xfs_refcount_irec *rec), + TP_ARGS(rtg, rec), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgnumber_t, rgno) + __field(enum xfs_refc_domain, domain) + __field(xfs_rgblock_t, startblock) + __field(xfs_extlen_t, blockcount) + __field(xfs_nlink_t, refcount) + ), + TP_fast_assign( + __entry->dev = rtg->rtg_mount->m_super->s_dev; + __entry->rtdev = rtg->rtg_mount->m_rtdev_targp->bt_dev; + __entry->rgno = rtg->rtg_rgno; + __entry->domain = rec->rc_domain; + __entry->startblock = 
rec->rc_startblock; + __entry->blockcount = rec->rc_blockcount; + __entry->refcount = rec->rc_refcount; + ), + TP_printk("dev %d:%d rtdev %d:%d rgno 0x%x dom %s rgbno 0x%x fsbcount 0x%x refcount %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgno, + __print_symbolic(__entry->domain, XFS_REFC_DOMAIN_STRINGS), + __entry->startblock, + __entry->blockcount, + __entry->refcount) +); #endif /* CONFIG_XFS_RT */ #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */ From patchwork Sun Dec 31 21:55:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507754 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6A7B7F9C3 for ; Sun, 31 Dec 2023 21:55:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="H7AOGOng" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E6C90C433C8; Sun, 31 Dec 2023 21:55:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059716; bh=elzKaMcuOW4ifxnuGyV+M6fsuiIhxRHqnQLChFBGfcI=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=H7AOGOng7Z9wsMcQ/tK6CWz5xRTu1zQsZnLL6Q8BOSTGD25je/4ydqI7v/3zU8st5 iPqx/XPUG2jZAAhRnYkSMoQ9ewTozigAA2aj9psWmSGMX+R8rgsKZY+JZFHzW0dLAj z2SgnrA3R5CowHYgavkgSdvVpd263h+bcLb918bVVHMATPNEVC36GW4Pbsf0re7P9Q Jv9LX31ckhlIoL95xAwB//UeHwVLniRML82TEPxlFg8ji/QkwfUOyXiBkpLaWptwte LFfo6ZKzTv/kjV9xdn4QDVOrgTxzTxeDseTguRs7nNhf6oYll3gIXnnwsXpGLcJixF fENeLdXWQRAgg== Date: Sun, 31 Dec 2023 13:55:15 -0800 Subject: [PATCH 41/44] xfs: repair inodes that have a refcount btree in the data fork From: "Darrick J. 
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852239.1766284.2392902277010863021.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Plumb knowledge of refcount btrees into the inode core repair code. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/inode_repair.c | 47 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index b6b37652c4a35..a182a9551c08c 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -40,6 +40,7 @@ #include "xfs_symlink_remote.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -924,6 +925,37 @@ xrep_dinode_bad_rmapbt_fork( return false; } +/* Return true if this refcount-format ifork looks like garbage. */ +STATIC bool +xrep_dinode_bad_refcountbt_fork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + unsigned int dfork_size, + int whichfork) +{ + struct xfs_rtrefcount_root *dfp; + unsigned int nrecs; + unsigned int level; + + if (whichfork != XFS_DATA_FORK) + return true; + if (dfork_size < sizeof(struct xfs_rtrefcount_root)) + return true; + + dfp = XFS_DFORK_PTR(dip, whichfork); + nrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (level > sc->mp->m_rtrefc_maxlevels) + return true; + if (xfs_rtrefcount_droot_space_calc(level, nrecs) > dfork_size) + return true; + if (level > 0 && nrecs == 0) + return true; + + return false; +} + /* * Check the data fork for things that will fail the ifork verifiers or the * ifork formatters. 
@@ -1009,6 +1041,11 @@ xrep_dinode_check_dfork( XFS_DATA_FORK)) return true; break; + case XFS_DINODE_FMT_REFCOUNT: + if (xrep_dinode_bad_refcountbt_fork(sc, dip, dfork_size, + XFS_DATA_FORK)) + return true; + break; default: return true; } @@ -1134,6 +1171,11 @@ xrep_dinode_check_afork( XFS_ATTR_FORK)) return true; break; + case XFS_DINODE_FMT_REFCOUNT: + if (xrep_dinode_bad_refcountbt_fork(sc, dip, afork_size, + XFS_ATTR_FORK)) + return true; + break; default: return true; } @@ -1182,6 +1224,7 @@ xrep_dinode_ensure_forkoff( { struct xfs_bmdr_block *bmdr; struct xfs_rtrmap_root *rmdr; + struct xfs_rtrefcount_root *rcdr; struct xfs_scrub *sc = ri->sc; xfs_extnum_t attr_extents, data_extents; size_t bmdr_minsz = xfs_bmdr_space_calc(1); @@ -1292,6 +1335,10 @@ xrep_dinode_ensure_forkoff( rmdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK); dfork_min = xfs_rtrmap_broot_space(sc->mp, rmdr); break; + case XFS_DINODE_FMT_REFCOUNT: + rcdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + dfork_min = xfs_rtrefcount_broot_space(sc->mp, rcdr); + break; default: dfork_min = 0; break; From patchwork Sun Dec 31 21:55:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507755 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 21EECFBEA for ; Sun, 31 Dec 2023 21:55:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DdIlks7A" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9DD3EC433C7; Sun, 31 Dec 2023 21:55:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059731; bh=kZARMGHDb3CF9VOQC4TkmLKkGtBTylDrTVMQsPVr/pk=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=DdIlks7AlCoRGvPTMUrihFVpBC6C3U+MEzDiNxs4/djyY3Q07GRt9X4pcIUNVz0GO RePJcmzQ3obZxpQekDlElAf1z/qKOsLLXiyzZaZftIgXhdDWJttZBXiPt+kqaEmU+d XLavOQhMQ4nL3ZxwhgsD9SEPhLnpFcDHe9+mRhLK9wEta1YNfOkN5vgmKEGgfiSX+/ +5vxA1ShnviiPykDdXK1y+sbJcvSCybOefVQdixTRxQyycXGA2JfVJgZcHYSBRXwA7 nodt/FNvhSjZ8wibEqV0uyBt1RDYOI0DAtEREGppTUu1vCUeKwWhsrhm2bwTaf+SDa uc1k3Xen3Jd+A== Date: Sun, 31 Dec 2023 13:55:31 -0800 Subject: [PATCH 42/44] xfs: check for shared rt extents when rebuilding rt file's data fork From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852255.1766284.11484868844063893927.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong When we're rebuilding the data fork of a realtime file, we need to cross-reference each mapping with the rt refcount btree to ensure that the reflink flag is set if there are any shared extents found. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/bmap_repair.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 98618821ad975..c4f9a8ba8cf73 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -101,14 +101,23 @@ xrep_bmap_discover_shared( xfs_filblks_t blockcount) { struct xfs_scrub *sc = rb->sc; + struct xfs_btree_cur *cur; xfs_agblock_t agbno; xfs_agblock_t fbno; xfs_extlen_t flen; int error; - agbno = XFS_FSB_TO_AGBNO(sc->mp, startblock); - error = xfs_refcount_find_shared(sc->sa.refc_cur, agbno, blockcount, - &fbno, &flen, false); + if (XFS_IS_REALTIME_INODE(sc->ip)) { + xfs_rgnumber_t rgno; + + agbno = xfs_rtb_to_rgbno(sc->mp, startblock, &rgno); + cur = sc->sr.refc_cur; + } else { + agbno = XFS_FSB_TO_AGBNO(sc->mp, startblock); + cur = sc->sa.refc_cur; + } + error = xfs_refcount_find_shared(cur, agbno, blockcount, &fbno, &flen, + false); if (error) return error; @@ -456,7 +465,9 @@ xrep_bmap_scan_rtgroup( return 0; error = xrep_rtgroup_init(sc, rtg, &sc->sr, - XFS_RTGLOCK_RMAP | XFS_RTGLOCK_BITMAP_SHARED); + XFS_RTGLOCK_RMAP | + XFS_RTGLOCK_REFCOUNT | + XFS_RTGLOCK_BITMAP_SHARED); if (error) return error; @@ -900,10 +911,6 @@ xrep_bmap_init_reflink_scan( if (whichfork != XFS_DATA_FORK) return RLS_IRRELEVANT; - /* cannot share realtime extents */ - if (XFS_IS_REALTIME_INODE(sc->ip)) - return RLS_IRRELEVANT; - return RLS_UNKNOWN; } From patchwork Sun Dec 31 21:55:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507756 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 84738FBE5 for ; Sun, 31 Dec 2023 21:55:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="lrCpxu+N" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 48728C433C7; Sun, 31 Dec 2023 21:55:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059747; bh=OF9Y9sz6jCbsEJbVW3ajEUNqdwgjjHH9LrDrMxgZB8U=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=lrCpxu+NTY/XPz2/ytd3p3Q+Z11FtJkry7mGW/8RJl7Za3cWR9r8Ir1tE42t232sr h9fGP6kTJ+qqPVd+FDqyXbxhD+iSxh37u7zeHnhwPhzzF4gU9fbitc/ak3Wn7dwK7/ b/wYOk3draUs3jMb0pzCHmckCKPMrxabbEbeLW3fAO6RxUJBElVxjvyQhF8lA2Gqpn N6fnbS77SxZIbo0jdL+kzSQlOiJPcz43Ap4XZLF6ya7Vab1oBKmTw9mGpFXwWG2yem 8ujjxp/ZGvJ4QKZRjSAaD3vdZxiV5jVAxOeiIHzH2rLdyMM3m2FdIotNo0qLVH8lHy 6CabbcdhRCEDg== Date: Sun, 31 Dec 2023 13:55:46 -0800 Subject: [PATCH 43/44] xfs: fix CoW forks for realtime files From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852271.1766284.5224402997263588022.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Port the copy on write fork repair to realtime files. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/cow_repair.c | 210 +++++++++++++++++++++++++++++++++++++++---- fs/xfs/scrub/reap.c | 222 +++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/reap.h | 7 + fs/xfs/scrub/repair.h | 1 fs/xfs/scrub/rtb_bitmap.h | 37 ++++++++ fs/xfs/scrub/trace.h | 72 +++++++++++++++ 6 files changed, 532 insertions(+), 17 deletions(-) create mode 100644 fs/xfs/scrub/rtb_bitmap.h diff --git a/fs/xfs/scrub/cow_repair.c b/fs/xfs/scrub/cow_repair.c index e48d869986f34..4be05fd8d0490 100644 --- a/fs/xfs/scrub/cow_repair.c +++ b/fs/xfs/scrub/cow_repair.c @@ -26,6 +26,9 @@ #include "xfs_errortag.h" #include "xfs_icache.h" #include "xfs_refcount_btree.h" +#include "xfs_rtalloc.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -34,6 +37,7 @@ #include "scrub/bitmap.h" #include "scrub/off_bitmap.h" #include "scrub/fsb_bitmap.h" +#include "scrub/rtb_bitmap.h" #include "scrub/reap.h" /* @@ -61,7 +65,10 @@ struct xrep_cow { struct xoff_bitmap bad_fileoffs; /* Bitmap of fsblocks that were removed from the CoW fork. */ - struct xfsb_bitmap old_cowfork_fsblocks; + union { + struct xfsb_bitmap old_cowfork_fsblocks; + struct xrtb_bitmap old_cowfork_rtblocks; + }; /* CoW fork mappings used to scan for bad CoW staging extents. 
*/ struct xfs_bmbt_irec irec; @@ -145,8 +152,12 @@ xrep_cow_mark_shared_staging( xrep_cow_trim_refcount(xc, &rrec, rec); - fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, - rrec.rc_startblock); + if (XFS_IS_REALTIME_INODE(xc->sc->ip)) + fsbno = xfs_rgbno_to_rtb(xc->sc->mp, cur->bc_ino.rtg->rtg_rgno, + rrec.rc_startblock); + else + fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, + rrec.rc_startblock); return xrep_cow_mark_file_range(xc, fsbno, rrec.rc_blockcount); } @@ -166,6 +177,7 @@ xrep_cow_mark_missing_staging( { struct xrep_cow *xc = priv; struct xfs_refcount_irec rrec; + xfs_fsblock_t fsbno; int error; if (!xfs_refcount_check_domain(rec) || @@ -177,9 +189,13 @@ xrep_cow_mark_missing_staging( if (xc->next_bno >= rrec.rc_startblock) goto next; - error = xrep_cow_mark_file_range(xc, - XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, - xc->next_bno), + if (XFS_IS_REALTIME_INODE(xc->sc->ip)) + fsbno = xfs_rgbno_to_rtb(xc->sc->mp, cur->bc_ino.rtg->rtg_rgno, + xc->next_bno); + else + fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, + xc->next_bno); + error = xrep_cow_mark_file_range(xc, fsbno, rrec.rc_startblock - xc->next_bno); if (error) return error; @@ -222,7 +238,12 @@ xrep_cow_mark_missing_staging_rmap( rec_len -= adj; } - fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, rec_bno); + if (XFS_IS_REALTIME_INODE(xc->sc->ip)) + fsbno = xfs_rgbno_to_rtb(xc->sc->mp, cur->bc_ino.rtg->rtg_rgno, + rec_bno); + else + fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, + rec_bno); return xrep_cow_mark_file_range(xc, fsbno, rec_len); } @@ -311,6 +332,97 @@ xrep_cow_find_bad( return 0; } +/* + * Find any part of the CoW fork mapping that isn't a single-owner CoW staging + * extent and mark the corresponding part of the file range in the bitmap. 
+ */ +STATIC int +xrep_cow_find_bad_rt( + struct xrep_cow *xc) +{ + struct xfs_refcount_irec rc_low = { 0 }; + struct xfs_refcount_irec rc_high = { 0 }; + struct xfs_rmap_irec rm_low = { 0 }; + struct xfs_rmap_irec rm_high = { 0 }; + struct xfs_scrub *sc = xc->sc; + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + int error = 0; + + xc->irec_startbno = xfs_rtb_to_rgbno(sc->mp, xc->irec.br_startblock, + &rgno); + + rtg = xfs_rtgroup_get(sc->mp, rgno); + if (!rtg) + return -EFSCORRUPTED; + + if (xrep_is_rtmeta_ino(sc, rtg, sc->ip->i_ino)) + goto out_rtg; + + error = xrep_rtgroup_init(sc, rtg, &sc->sr, + XFS_RTGLOCK_RMAP | XFS_RTGLOCK_REFCOUNT); + if (error) + goto out_rtg; + + /* Mark any CoW fork extents that are shared. */ + rc_low.rc_startblock = xc->irec_startbno; + rc_high.rc_startblock = xc->irec_startbno + xc->irec.br_blockcount - 1; + rc_low.rc_domain = rc_high.rc_domain = XFS_REFC_DOMAIN_SHARED; + error = xfs_refcount_query_range(sc->sr.refc_cur, &rc_low, &rc_high, + xrep_cow_mark_shared_staging, xc); + if (error) + goto out_sr; + + /* Make sure there are CoW staging extents for the whole mapping. */ + rc_low.rc_startblock = xc->irec_startbno; + rc_high.rc_startblock = xc->irec_startbno + xc->irec.br_blockcount - 1; + rc_low.rc_domain = rc_high.rc_domain = XFS_REFC_DOMAIN_COW; + xc->next_bno = xc->irec_startbno; + error = xfs_refcount_query_range(sc->sr.refc_cur, &rc_low, &rc_high, + xrep_cow_mark_missing_staging, xc); + if (error) + goto out_sr; + + if (xc->next_bno < xc->irec_startbno + xc->irec.br_blockcount) { + error = xrep_cow_mark_file_range(xc, + xfs_rgbno_to_rtb(sc->mp, rtg->rtg_rgno, + xc->next_bno), + xc->irec_startbno + xc->irec.br_blockcount - + xc->next_bno); + if (error) + goto out_sr; + } + + /* Mark any area that has an rmap that isn't a COW staging extent.
*/ + rm_low.rm_startblock = xc->irec_startbno; + memset(&rm_high, 0xFF, sizeof(rm_high)); + rm_high.rm_startblock = xc->irec_startbno + xc->irec.br_blockcount - 1; + error = xfs_rmap_query_range(sc->sr.rmap_cur, &rm_low, &rm_high, + xrep_cow_mark_missing_staging_rmap, xc); + if (error) + goto out_sr; + + /* + * If userspace is forcing us to rebuild the CoW fork or someone + * turned on the debugging knob, replace everything in the + * CoW fork and then scan for staging extents in the refcountbt. + */ + if ((sc->sm->sm_flags & XFS_SCRUB_IFLAG_FORCE_REBUILD) || + XFS_TEST_ERROR(false, sc->mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR)) { + error = xrep_cow_mark_file_range(xc, xc->irec.br_startblock, + xc->irec.br_blockcount); + if (error) + goto out_rtg; + } + +out_sr: + xchk_rtgroup_btcur_free(&sc->sr); + xchk_rtgroup_free(sc, &sc->sr); +out_rtg: + xfs_rtgroup_put(rtg); + return error; +} + /* * Allocate a replacement CoW staging extent of up to the given number of * blocks, and fill out the mapping. @@ -351,6 +463,46 @@ xrep_cow_alloc( return 0; } +/* + * Allocate a replacement rt CoW staging extent of up to the given number of + * blocks, and fill out the mapping. 
+ */ +STATIC int +xrep_cow_alloc_rt( + struct xfs_scrub *sc, + xfs_extlen_t maxlen, + struct xrep_cow_extent *repl) +{ + xfs_rtxnum_t rtx = NULLRTEXTNO; + xfs_rtxlen_t maxrtx; + xfs_rtxlen_t rtxlen = 0; + xfs_rtblock_t rtbno; + xfs_extlen_t len; + int error; + + maxrtx = xfs_rtb_to_rtx(sc->mp, maxlen); + error = xfs_trans_reserve_more(sc->tp, 0, maxrtx); + if (error) + return error; + + xfs_rtbitmap_lock(sc->tp, sc->mp); + + error = xfs_rtallocate_extent(sc->tp, 0, 1, maxrtx, &rtxlen, 0, 1, + &rtx); + if (error) + return error; + if (rtx == NULLRTEXTNO) + return -ENOSPC; + + rtbno = xfs_rtx_to_rtb(sc->mp, rtx); + len = xfs_rtxlen_to_extlen(sc->mp, rtxlen); + xfs_refcount_alloc_cow_extent(sc->tp, true, rtbno, len); + + repl->fsbno = rtbno; + repl->len = len; + return 0; +} + /* * Look up the current CoW fork mapping so that we only allocate enough to * replace a single mapping. If we don't find a mapping that covers the start @@ -468,7 +620,10 @@ xrep_cow_replace_range( */ alloc_len = min_t(xfs_fileoff_t, XFS_MAX_BMBT_EXTLEN, nextoff - startoff); - error = xrep_cow_alloc(sc, alloc_len, &repl); + if (XFS_IS_REALTIME_INODE(sc->ip)) + error = xrep_cow_alloc_rt(sc, alloc_len, &repl); + else + error = xrep_cow_alloc(sc, alloc_len, &repl); if (error) return error; @@ -484,8 +639,12 @@ xrep_cow_replace_range( return error; /* Note the old CoW staging extents; we'll reap them all later. 
*/ - error = xfsb_bitmap_set(&xc->old_cowfork_fsblocks, got.br_startblock, - repl.len); + if (XFS_IS_REALTIME_INODE(sc->ip)) + error = xrtb_bitmap_set(&xc->old_cowfork_rtblocks, + got.br_startblock, repl.len); + else + error = xfsb_bitmap_set(&xc->old_cowfork_fsblocks, + got.br_startblock, repl.len); if (error) return error; @@ -541,8 +700,12 @@ xrep_bmap_cow( if (!ifp) return 0; - /* realtime files aren't supported yet */ - if (XFS_IS_REALTIME_INODE(sc->ip)) + /* + * Realtime files with large extent sizes are not supported because + * we could encounter a CoW mapping that has been partially written + * out *and* requires replacement, and there's no solution to that. + */ + if (xfs_inode_has_bigallocunit(sc->ip)) return -EOPNOTSUPP; /* @@ -563,7 +726,10 @@ xrep_bmap_cow( xc->sc = sc; xoff_bitmap_init(&xc->bad_fileoffs); - xfsb_bitmap_init(&xc->old_cowfork_fsblocks); + if (XFS_IS_REALTIME_INODE(sc->ip)) + xrtb_bitmap_init(&xc->old_cowfork_rtblocks); + else + xfsb_bitmap_init(&xc->old_cowfork_fsblocks); for_each_xfs_iext(ifp, &icur, &xc->irec) { if (xchk_should_terminate(sc, &error)) @@ -586,7 +752,10 @@ xrep_bmap_cow( if (xfs_bmap_is_written_extent(&xc->irec)) continue; - error = xrep_cow_find_bad(xc); + if (XFS_IS_REALTIME_INODE(sc->ip)) + error = xrep_cow_find_bad_rt(xc); + else + error = xrep_cow_find_bad(xc); if (error) goto out_bitmap; } @@ -601,13 +770,20 @@ xrep_bmap_cow( * by the refcount btree, not the inode, so it is correct to treat them * like inode metadata.
*/ - error = xrep_reap_fsblocks(sc, &xc->old_cowfork_fsblocks, - &XFS_RMAP_OINFO_COW); + if (XFS_IS_REALTIME_INODE(sc->ip)) + error = xrep_reap_rtblocks(sc, &xc->old_cowfork_rtblocks, + &XFS_RMAP_OINFO_COW); + else + error = xrep_reap_fsblocks(sc, &xc->old_cowfork_fsblocks, + &XFS_RMAP_OINFO_COW); if (error) goto out_bitmap; out_bitmap: - xfsb_bitmap_destroy(&xc->old_cowfork_fsblocks); + if (XFS_IS_REALTIME_INODE(sc->ip)) + xrtb_bitmap_destroy(&xc->old_cowfork_rtblocks); + else + xfsb_bitmap_destroy(&xc->old_cowfork_fsblocks); xoff_bitmap_destroy(&xc->bad_fileoffs); kmem_free(xc); return error; diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c index bb28c2d2b8780..b8166e19726a4 100644 --- a/fs/xfs/scrub/reap.c +++ b/fs/xfs/scrub/reap.c @@ -34,6 +34,8 @@ #include "xfs_attr_remote.h" #include "xfs_defer.h" #include "xfs_imeta.h" +#include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -41,6 +43,7 @@ #include "scrub/bitmap.h" #include "scrub/agb_bitmap.h" #include "scrub/fsb_bitmap.h" +#include "scrub/rtb_bitmap.h" #include "scrub/reap.h" /* @@ -680,6 +683,225 @@ xrep_reap_fsblocks( return 0; } +#ifdef CONFIG_XFS_RT +/* + * Figure out the longest run of blocks that we can dispose of with a single + * call. Cross-linked blocks should have their reverse mappings removed, but + * single-owner extents can be freed. Units are rt blocks, not rt extents. + */ +STATIC int +xreap_rgextent_select( + struct xreap_state *rs, + xfs_rgblock_t rgbno, + xfs_rgblock_t rgbno_next, + bool *crosslinked, + xfs_extlen_t *rglenp) +{ + struct xfs_scrub *sc = rs->sc; + struct xfs_btree_cur *cur; + xfs_rgblock_t bno = rgbno + 1; + xfs_extlen_t len = 1; + int error; + + /* + * Determine if there are any other rmap records covering the first + * block of this extent. If so, the block is crosslinked. 
+ */ + cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, sc->sr.rtg, + sc->sr.rtg->rtg_rmapip); + error = xfs_rmap_has_other_keys(cur, rgbno, 1, rs->oinfo, + crosslinked); + if (error) + goto out_cur; + + /* + * Figure out how many of the subsequent blocks have the same crosslink + * status. + */ + while (bno < rgbno_next) { + bool also_crosslinked; + + error = xfs_rmap_has_other_keys(cur, bno, 1, rs->oinfo, + &also_crosslinked); + if (error) + goto out_cur; + + if (*crosslinked != also_crosslinked) + break; + + len++; + bno++; + } + + *rglenp = len; + trace_xreap_rgextent_select(sc->sr.rtg, rgbno, len, *crosslinked); +out_cur: + xfs_btree_del_cursor(cur, error); + return error; +} + +/* + * Dispose of as much of the beginning of this rtgroup extent as possible. + * The number of blocks disposed of will be returned in @rglenp. + */ +STATIC int +xreap_rgextent_iter( + struct xreap_state *rs, + xfs_rgblock_t rgbno, + xfs_extlen_t *rglenp, + bool crosslinked) +{ + struct xfs_scrub *sc = rs->sc; + xfs_rtblock_t rtbno; + int error; + + /* + * The only caller so far is CoW fork repair, so we only know how to + * unlink or free CoW staging extents. Here we don't have to worry + * about invalidating buffers! + */ + if (rs->oinfo != &XFS_RMAP_OINFO_COW) { + ASSERT(rs->oinfo == &XFS_RMAP_OINFO_COW); + return -EFSCORRUPTED; + } + ASSERT(rs->resv == XFS_AG_RESV_NONE); + + rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, rgbno); + + /* + * If there are other rmappings, this block is cross linked and must + * not be freed. Remove the forward and reverse mapping and move on. + */ + if (crosslinked) { + trace_xreap_dispose_unmap_rtextent(sc->sr.rtg, rgbno, *rglenp); + + xfs_refcount_free_cow_extent(sc->tp, true, rtbno, *rglenp); + rs->deferred++; + return 0; + } + + trace_xreap_dispose_free_rtextent(sc->sr.rtg, rgbno, *rglenp); + + /* + * The CoW staging extent is not crosslinked. 
Use deferred work items + * to remove the refcountbt records (which removes the rmap records) + * and free the extent. We're not worried about the system going down + * here because log recovery walks the refcount btree to clean out the + * CoW staging extents. + */ + xfs_refcount_free_cow_extent(sc->tp, true, rtbno, *rglenp); + error = xfs_free_extent_later(sc->tp, rtbno, *rglenp, NULL, + rs->resv, + XFS_FREE_EXTENT_REALTIME | + XFS_FREE_EXTENT_SKIP_DISCARD); + if (error) + return error; + + rs->deferred++; + return 0; +} + +#define XREAP_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP | \ + XFS_RTGLOCK_RMAP | \ + XFS_RTGLOCK_REFCOUNT) + +/* + * Break a rt file metadata extent into sub-extents by fate (crosslinked, not + * crosslinked), and dispose of each sub-extent separately. The extent must + * be aligned to a realtime extent. + */ +STATIC int +xreap_rtmeta_extent( + uint64_t rtbno, + uint64_t len, + void *priv) +{ + struct xreap_state *rs = priv; + struct xfs_scrub *sc = rs->sc; + xfs_rgnumber_t rgno; + xfs_rgblock_t rgbno = xfs_rtb_to_rgbno(sc->mp, rtbno, &rgno); + xfs_rgblock_t rgbno_next = rgbno + len; + int error = 0; + + ASSERT(sc->ip != NULL); + ASSERT(!sc->sr.rtg); + + /* + * We're reaping blocks after repairing file metadata, which means that + * we have to init the xchk_ag structure ourselves. 
+ */ + sc->sr.rtg = xfs_rtgroup_get(sc->mp, rgno); + if (!sc->sr.rtg) + return -EFSCORRUPTED; + + xfs_rtgroup_lock(NULL, sc->sr.rtg, XREAP_RTGLOCK_ALL); + + while (rgbno < rgbno_next) { + xfs_extlen_t rglen; + bool crosslinked; + + error = xreap_rgextent_select(rs, rgbno, rgbno_next, + &crosslinked, &rglen); + if (error) + goto out_unlock; + + error = xreap_rgextent_iter(rs, rgbno, &rglen, crosslinked); + if (error) + goto out_unlock; + + if (xreap_want_defer_finish(rs)) { + error = xfs_defer_finish(&sc->tp); + if (error) + goto out_unlock; + xreap_defer_finish_reset(rs); + } else if (xreap_want_roll(rs)) { + error = xfs_trans_roll_inode(&sc->tp, sc->ip); + if (error) + goto out_unlock; + xreap_reset(rs); + } + + rgbno += rglen; + } + +out_unlock: + xfs_rtgroup_unlock(sc->sr.rtg, XREAP_RTGLOCK_ALL); + xfs_rtgroup_put(sc->sr.rtg); + sc->sr.rtg = NULL; + return error; +} + +/* + * Dispose of every block of every rt metadata extent in the bitmap. + * Do not use this to dispose of the mappings in an ondisk inode fork. + */ +int +xrep_reap_rtblocks( + struct xfs_scrub *sc, + struct xrtb_bitmap *bitmap, + const struct xfs_owner_info *oinfo) +{ + struct xreap_state rs = { + .sc = sc, + .oinfo = oinfo, + .resv = XFS_AG_RESV_NONE, + }; + int error; + + ASSERT(xfs_has_rmapbt(sc->mp)); + ASSERT(sc->ip != NULL); + + error = xrtb_bitmap_walk(bitmap, xreap_rtmeta_extent, &rs); + if (error) + return error; + + if (xreap_dirty(&rs)) + return xrep_defer_finish(sc); + + return 0; +} +#endif /* CONFIG_XFS_RT */ + /* * Dispose of every block of an old metadata btree that used to be rooted in a * metadata directory file. 
diff --git a/fs/xfs/scrub/reap.h b/fs/xfs/scrub/reap.h index 70e5e6bbb8d38..4c8f62701fb36 100644 --- a/fs/xfs/scrub/reap.h +++ b/fs/xfs/scrub/reap.h @@ -17,6 +17,13 @@ int xrep_reap_ifork(struct xfs_scrub *sc, struct xfs_inode *ip, int whichfork); int xrep_reap_metadir_fsblocks(struct xfs_scrub *sc, struct xfsb_bitmap *bitmap); +#ifdef CONFIG_XFS_RT +int xrep_reap_rtblocks(struct xfs_scrub *sc, struct xrtb_bitmap *bitmap, + const struct xfs_owner_info *oinfo); +#else +# define xrep_reap_rtblocks(...) (-EOPNOTSUPP) +#endif /* CONFIG_XFS_RT */ + /* Buffer cache scan context. */ struct xrep_bufscan { /* Disk address for the buffers we want to scan. */ diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 969317d429db6..801a9013f37f1 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -51,6 +51,7 @@ struct xbitmap; struct xagb_bitmap; struct xrgb_bitmap; struct xfsb_bitmap; +struct xrtb_bitmap; int xrep_fix_freelist(struct xfs_scrub *sc, int alloc_flags); diff --git a/fs/xfs/scrub/rtb_bitmap.h b/fs/xfs/scrub/rtb_bitmap.h new file mode 100644 index 0000000000000..1313ef605511e --- /dev/null +++ b/fs/xfs/scrub/rtb_bitmap.h @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2022-2024 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#ifndef __XFS_SCRUB_RTB_BITMAP_H__ +#define __XFS_SCRUB_RTB_BITMAP_H__ + +/* Bitmaps, but type-checked for xfs_rtblock_t */ + +struct xrtb_bitmap { + struct xbitmap64 rtbitmap; +}; + +static inline void xrtb_bitmap_init(struct xrtb_bitmap *bitmap) +{ + xbitmap64_init(&bitmap->rtbitmap); +} + +static inline void xrtb_bitmap_destroy(struct xrtb_bitmap *bitmap) +{ + xbitmap64_destroy(&bitmap->rtbitmap); +} + +static inline int xrtb_bitmap_set(struct xrtb_bitmap *bitmap, + xfs_rtblock_t start, xfs_filblks_t len) +{ + return xbitmap64_set(&bitmap->rtbitmap, start, len); +} + +static inline int xrtb_bitmap_walk(struct xrtb_bitmap *bitmap, + xbitmap64_walk_fn fn, void *priv) +{ + return xbitmap64_walk(&bitmap->rtbitmap, fn, priv); +} + +#endif /* __XFS_SCRUB_RTB_BITMAP_H__ */ diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index aa943433cbb02..2c6f7e3b7578d 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -2013,6 +2013,41 @@ DEFINE_REPAIR_EXTENT_EVENT(xreap_agextent_binval); DEFINE_REPAIR_EXTENT_EVENT(xreap_bmapi_binval); DEFINE_REPAIR_EXTENT_EVENT(xrep_agfl_insert); +#ifdef CONFIG_XFS_RT +DECLARE_EVENT_CLASS(xrep_rtgroup_extent_class, + TP_PROTO(struct xfs_rtgroup *rtg, xfs_rgblock_t rgbno, + xfs_extlen_t len), + TP_ARGS(rtg, rgbno, len), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgnumber_t, rgno) + __field(xfs_rgblock_t, rgbno) + __field(xfs_extlen_t, len) + ), + TP_fast_assign( + __entry->dev = rtg->rtg_mount->m_super->s_dev; + __entry->rtdev = rtg->rtg_mount->m_rtdev_targp->bt_dev; + __entry->rgno = rtg->rtg_rgno; + __entry->rgbno = rgbno; + __entry->len = len; + ), + TP_printk("dev %d:%d rtdev %d:%d rgno 0x%x rgbno 0x%x fsbcount 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgno, + __entry->rgbno, + __entry->len) +); +#define DEFINE_REPAIR_RTGROUP_EXTENT_EVENT(name) \ +DEFINE_EVENT(xrep_rtgroup_extent_class, name, \ +
TP_PROTO(struct xfs_rtgroup *rtg, xfs_rgblock_t rgbno, \ + xfs_extlen_t len), \ + TP_ARGS(rtg, rgbno, len)) +DEFINE_REPAIR_RTGROUP_EXTENT_EVENT(xreap_dispose_unmap_rtextent); +DEFINE_REPAIR_RTGROUP_EXTENT_EVENT(xreap_dispose_free_rtextent); +#endif /* CONFIG_XFS_RT */ + DECLARE_EVENT_CLASS(xrep_reap_find_class, TP_PROTO(struct xfs_perag *pag, xfs_agblock_t agbno, xfs_extlen_t len, bool crosslinked), @@ -2046,6 +2081,43 @@ DEFINE_EVENT(xrep_reap_find_class, name, \ DEFINE_REPAIR_REAP_FIND_EVENT(xreap_agextent_select); DEFINE_REPAIR_REAP_FIND_EVENT(xreap_bmapi_select); +#ifdef CONFIG_XFS_RT +DECLARE_EVENT_CLASS(xrep_rtgroup_reap_find_class, + TP_PROTO(struct xfs_rtgroup *rtg, xfs_rgblock_t rgbno, xfs_extlen_t len, + bool crosslinked), + TP_ARGS(rtg, rgbno, len, crosslinked), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgnumber_t, rgno) + __field(xfs_rgblock_t, rgbno) + __field(xfs_extlen_t, len) + __field(bool, crosslinked) + ), + TP_fast_assign( + __entry->dev = rtg->rtg_mount->m_super->s_dev; + __entry->rtdev = rtg->rtg_mount->m_rtdev_targp->bt_dev; + __entry->rgno = rtg->rtg_rgno; + __entry->rgbno = rgbno; + __entry->len = len; + __entry->crosslinked = crosslinked; + ), + TP_printk("dev %d:%d rtdev %d:%d rgno 0x%x rgbno 0x%x fsbcount 0x%x crosslinked %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgno, + __entry->rgbno, + __entry->len, + __entry->crosslinked ? 
1 : 0) +); +#define DEFINE_REPAIR_RTGROUP_REAP_FIND_EVENT(name) \ +DEFINE_EVENT(xrep_rtgroup_reap_find_class, name, \ + TP_PROTO(struct xfs_rtgroup *rtg, xfs_rgblock_t rgbno, \ + xfs_extlen_t len, bool crosslinked), \ + TP_ARGS(rtg, rgbno, len, crosslinked)) +DEFINE_REPAIR_RTGROUP_REAP_FIND_EVENT(xreap_rgextent_select); +#endif /* CONFIG_XFS_RT */ + DECLARE_EVENT_CLASS(xrep_rmap_class, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno, xfs_extlen_t len, From patchwork Sun Dec 31 21:56:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507757 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 96042FBE1 for ; Sun, 31 Dec 2023 21:56:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CS1A6oQf" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 05FCFC433C7; Sun, 31 Dec 2023 21:56:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704059763; bh=p64KpcigM9if05gLOHitPzr5PqARFA7vcetQz767JG4=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=CS1A6oQfoaSkp/HvpJZ2TGO7t6BtAe2hKEFvxLNlbK9zub+JNv6/ttLSzSJLnnweO g8EKOZ/tcepHQZiUAii+gVwQgclvnBzAUri9/xmDJZNnZuv/q+QId/P/pnSRewpiyG Hvv/YQIB4rBsjS92Z+wkMhRU5eL3+3edBNQ8SIcTGyKXjoeWyVBBO45bVjpprMjeas FW7xM/nneLOOBosPBrq+MhMWRVxSl0m/rLAslLQ9PjZho+l5/oAvJzSHxIJGYVXAJ4 PnHE6sKXJrtyBrlRF01j8ZiXDXn74QjHXaHuAXSR7YSpZbb7LWRq0PM1N95gvEA551 HVCo3nWtVwMoQ== Date: Sun, 31 Dec 2023 13:56:02 -0800 Subject: [PATCH 44/44] xfs: enable realtime reflink From: "Darrick J. 
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404852287.1766284.966275252347175997.stgit@frogsfrogsfrogs> In-Reply-To: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> References: <170404851479.1766284.4860754291017677928.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Enable reflink for realtime devices, sort of. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_super.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index fbfd2963b2e2c..bf0c0ce9a54b9 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1728,14 +1728,27 @@ xfs_fs_fill_super( "EXPERIMENTAL realtime allocation group feature in use. Use at your own risk!"); if (xfs_has_reflink(mp)) { - if (mp->m_sb.sb_rblocks) { + /* + * Reflink doesn't support rt extent sizes larger than a single + * block because we would have to perform unshare-around for + * rtext-unaligned write requests. + */ + if (xfs_has_realtime(mp) && mp->m_sb.sb_rextsize != 1) { xfs_alert(mp, - "reflink not compatible with realtime device!"); + "reflink not compatible with realtime extent size %u!", + mp->m_sb.sb_rextsize); error = -EINVAL; goto out_filestream_unmount; } - if (xfs_globals.always_cow) { + /* + * always-cow mode is not supported on filesystems with rt + * extent sizes larger than a single block because we'd have + * to perform write-around for unaligned writes because remap + * requests must be aligned to an rt extent. + */ + if (xfs_globals.always_cow && + (!xfs_has_realtime(mp) || mp->m_sb.sb_rextsize == 1)) { xfs_info(mp, "using DEBUG-only always_cow mode."); mp->m_always_cow = true; }