From patchwork Sun Dec 31 21:32:03 2023
Date: Sun, 31 Dec 2023 13:32:03 -0800
Subject: [PATCH 01/39] xfs: prepare rmap btree cursor tracepoints for realtime
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404849916.1764998.16216956664784728981.stgit@frogsfrogsfrogs>
In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs>

Rework the rmap btree cursor tracepoints in preparation to handle the realtime rmap btree cursor.
Mostly this involves renaming the field to "rmapbno" and extracting the group number from the cursor when possible. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_trace.c | 18 ++++++++++++++++ fs/xfs/xfs_trace.h | 58 +++++++++++++++++++++++++++------------------------- 2 files changed, 48 insertions(+), 28 deletions(-) diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index 8a990a2fa132f..f35927ebe2450 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -47,6 +47,24 @@ #include "xfs_rtgroup.h" #include "xfs_rmap.h" +static inline void +xfs_rmapbt_crack_agno_opdev( + struct xfs_btree_cur *cur, + xfs_agnumber_t *agno, + dev_t *opdev) +{ + if (cur->bc_flags & XFS_BTREE_IN_XFILE) { + *agno = 0; + *opdev = xfbtree_target(cur->bc_mem.xfbtree)->bt_dev; + } else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) { + *agno = cur->bc_ino.rtg->rtg_rgno; + *opdev = cur->bc_mp->m_rtdev_targp->bt_dev; + } else { + *agno = cur->bc_ag.pag->pag_agno; + *opdev = cur->bc_mp->m_super->s_dev; + } +} + /* * We include this last to have the helpers above available for the trace * event implementations. diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 9290a02bac99e..43ecf98f86558 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -14,11 +14,15 @@ * ino: filesystem inode number * * agbno: per-AG block number in fs blocks + * rgbno: per-rtgroup block number in fs blocks * startblock: physical block number for file mappings. This is either a * segmented fsblock for data device mappings, or a rfsblock * for realtime device mappings * fsbcount: number of blocks in an extent, in fs blocks * + * rmapbno: physical block number for a reverse mapping. This is an agbno for + * per-AG rmap btrees or a rgbno for realtime rmap btrees. 
+ * * daddr: physical block number in 512b blocks * bbcount: number of blocks in a physical extent, in 512b blocks * @@ -2840,13 +2844,14 @@ DEFINE_DEFER_PENDING_ITEM_EVENT(xfs_defer_finish_item); /* rmap tracepoints */ DECLARE_EVENT_CLASS(xfs_rmap_class, TP_PROTO(struct xfs_btree_cur *cur, - xfs_agblock_t agbno, xfs_extlen_t len, bool unwritten, + xfs_agblock_t rmapbno, xfs_extlen_t len, bool unwritten, const struct xfs_owner_info *oinfo), - TP_ARGS(cur, agbno, len, unwritten, oinfo), + TP_ARGS(cur, rmapbno, len, unwritten, oinfo), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) - __field(xfs_agblock_t, agbno) + __field(xfs_agblock_t, rmapbno) __field(xfs_extlen_t, len) __field(uint64_t, owner) __field(uint64_t, offset) @@ -2854,8 +2859,8 @@ DECLARE_EVENT_CLASS(xfs_rmap_class, ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - __entry->agno = cur->bc_ag.pag->pag_agno; - __entry->agbno = agbno; + xfs_rmapbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); + __entry->rmapbno = rmapbno; __entry->len = len; __entry->owner = oinfo->oi_owner; __entry->offset = oinfo->oi_offset; @@ -2863,10 +2868,11 @@ DECLARE_EVENT_CLASS(xfs_rmap_class, if (unwritten) __entry->flags |= XFS_RMAP_UNWRITTEN; ), - TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%lx", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x rmapbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%lx", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, - __entry->agbno, + __entry->rmapbno, __entry->len, __entry->owner, __entry->offset, @@ -2875,9 +2881,9 @@ DECLARE_EVENT_CLASS(xfs_rmap_class, #define DEFINE_RMAP_EVENT(name) \ DEFINE_EVENT(xfs_rmap_class, name, \ TP_PROTO(struct xfs_btree_cur *cur, \ - xfs_agblock_t agbno, xfs_extlen_t len, bool unwritten, \ + xfs_agblock_t rmapbno, xfs_extlen_t len, bool unwritten, \ const struct xfs_owner_info *oinfo), \ 
- TP_ARGS(cur, agbno, len, unwritten, oinfo)) + TP_ARGS(cur, rmapbno, len, unwritten, oinfo)) /* btree cursor error/%ip tracepoint class */ DECLARE_EVENT_CLASS(xfs_btree_error_class, @@ -2936,40 +2942,35 @@ TRACE_EVENT(xfs_rmap_convert_state, TP_ARGS(cur, state, caller_ip), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) - __field(xfs_ino_t, ino) __field(int, state) __field(unsigned long, caller_ip) ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) { - __entry->agno = 0; - __entry->ino = cur->bc_ino.ip->i_ino; - } else { - __entry->agno = cur->bc_ag.pag->pag_agno; - __entry->ino = 0; - } + xfs_rmapbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); __entry->state = state; __entry->caller_ip = caller_ip; ), - TP_printk("dev %d:%d agno 0x%x ino 0x%llx state %d caller %pS", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x state %d caller %pS", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, - __entry->ino, __entry->state, (char *)__entry->caller_ip) ); DECLARE_EVENT_CLASS(xfs_rmapbt_class, TP_PROTO(struct xfs_btree_cur *cur, - xfs_agblock_t agbno, xfs_extlen_t len, + xfs_agblock_t rmapbno, xfs_extlen_t len, uint64_t owner, uint64_t offset, unsigned int flags), - TP_ARGS(cur, agbno, len, owner, offset, flags), + TP_ARGS(cur, rmapbno, len, owner, offset, flags), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) - __field(xfs_agblock_t, agbno) + __field(xfs_agblock_t, rmapbno) __field(xfs_extlen_t, len) __field(uint64_t, owner) __field(uint64_t, offset) @@ -2977,17 +2978,18 @@ DECLARE_EVENT_CLASS(xfs_rmapbt_class, ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - __entry->agno = cur->bc_ag.pag->pag_agno; - __entry->agbno = agbno; + xfs_rmapbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); + __entry->rmapbno = rmapbno; __entry->len = len; 
__entry->owner = owner; __entry->offset = offset; __entry->flags = flags; ), - TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x rmapbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, - __entry->agbno, + __entry->rmapbno, __entry->len, __entry->owner, __entry->offset, @@ -2996,9 +2998,9 @@ DECLARE_EVENT_CLASS(xfs_rmapbt_class, #define DEFINE_RMAPBT_EVENT(name) \ DEFINE_EVENT(xfs_rmapbt_class, name, \ TP_PROTO(struct xfs_btree_cur *cur, \ - xfs_agblock_t agbno, xfs_extlen_t len, \ + xfs_agblock_t rmapbno, xfs_extlen_t len, \ uint64_t owner, uint64_t offset, unsigned int flags), \ - TP_ARGS(cur, agbno, len, owner, offset, flags)) + TP_ARGS(cur, rmapbno, len, owner, offset, flags)) TRACE_DEFINE_ENUM(XFS_RMAP_MAP); TRACE_DEFINE_ENUM(XFS_RMAP_MAP_SHARED);

From patchwork Sun Dec 31 21:32:18 2023
Date: Sun, 31 Dec 2023 13:32:18 -0800
Subject: [PATCH 02/39] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404849932.1764998.15853029901796499799.stgit@frogsfrogsfrogs>
In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs>

Simplify the calling conventions by allowing callers to pass a fsbno (xfs_fsblock_t) directly into these functions, since we're just going to set it in a struct anyway.

Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_refcount.c | 6 ++---- fs/xfs/libxfs/xfs_rmap.c | 12 +++++------- fs/xfs/libxfs/xfs_rmap.h | 8 ++++---- fs/xfs/scrub/alloc_repair.c | 10 +++++++--- 4 files changed, 18 insertions(+), 18 deletions(-) diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index e30c4cdfaa392..6f7ec83281656 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -1887,8 +1887,7 @@ xfs_refcount_alloc_cow_extent( __xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len); /* Add rmap entry */ - xfs_rmap_alloc_extent(tp, XFS_FSB_TO_AGNO(mp, fsb), - XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW); + xfs_rmap_alloc_extent(tp, fsb, len, XFS_RMAP_OWN_COW); } /* Forget a CoW staging event in the refcount btree. */ @@ -1904,8 +1903,7 @@ xfs_refcount_free_cow_extent( return; /* Remove rmap entry */ - xfs_rmap_free_extent(tp, XFS_FSB_TO_AGNO(mp, fsb), - XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW); + xfs_rmap_free_extent(tp, fsb, len, XFS_RMAP_OWN_COW); __xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len); } diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 0f552cb737c8b..b3383cc474492 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -527,7 +527,7 @@ xfs_rmap_free_check_owner( struct xfs_btree_cur *cur, uint64_t ltoff, struct xfs_rmap_irec *rec, - xfs_filblks_t len, + xfs_extlen_t len, uint64_t owner, uint64_t offset, unsigned int flags) @@ -2718,8 +2718,7 @@ xfs_rmap_convert_extent( void xfs_rmap_alloc_extent( struct xfs_trans *tp, - xfs_agnumber_t agno, - xfs_agblock_t bno, + xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner) { @@ -2728,7 +2727,7 @@ xfs_rmap_alloc_extent( if (!xfs_rmap_update_is_needed(tp->t_mountp, XFS_DATA_FORK)) return; - bmap.br_startblock = XFS_AGB_TO_FSB(tp->t_mountp, agno, bno); + bmap.br_startblock = fsbno; bmap.br_blockcount = len; bmap.br_startoff = 0; bmap.br_state = XFS_EXT_NORM; @@ -2740,8 +2739,7 @@ xfs_rmap_alloc_extent( void 
xfs_rmap_free_extent( struct xfs_trans *tp, - xfs_agnumber_t agno, - xfs_agblock_t bno, + xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner) { @@ -2750,7 +2748,7 @@ xfs_rmap_free_extent( if (!xfs_rmap_update_is_needed(tp->t_mountp, XFS_DATA_FORK)) return; - bmap.br_startblock = XFS_AGB_TO_FSB(tp->t_mountp, agno, bno); + bmap.br_startblock = fsbno; bmap.br_blockcount = len; bmap.br_startoff = 0; bmap.br_state = XFS_EXT_NORM; diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h index e6240efd6fe75..0ccfd7d88e56e 100644 --- a/fs/xfs/libxfs/xfs_rmap.h +++ b/fs/xfs/libxfs/xfs_rmap.h @@ -184,10 +184,10 @@ void xfs_rmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip, void xfs_rmap_convert_extent(struct xfs_mount *mp, struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *imap); -void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_agnumber_t agno, - xfs_agblock_t bno, xfs_extlen_t len, uint64_t owner); -void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_agnumber_t agno, - xfs_agblock_t bno, xfs_extlen_t len, uint64_t owner); +void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno, + xfs_extlen_t len, uint64_t owner); +void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno, + xfs_extlen_t len, uint64_t owner); int xfs_rmap_finish_one(struct xfs_trans *tp, struct xfs_rmap_intent *ri, struct xfs_btree_cur **pcur); diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c index 45edda096869c..3805099cb578b 100644 --- a/fs/xfs/scrub/alloc_repair.c +++ b/fs/xfs/scrub/alloc_repair.c @@ -542,9 +542,13 @@ xrep_abt_dispose_one( ASSERT(pag == resv->pag); /* Add a deferred rmap for each extent we used. 
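After this change the callers, not `xfs_rmap_alloc_extent()` or `xfs_rmap_free_extent()`, build the segmented fsblock number, as the repair hunk does with `XFS_AGB_TO_FSB()`. Below is a minimal sketch of that encoding, assuming the usual XFS layout in which the AG number sits above `sb_agblklog` bits of AG block number. The helper names are hypothetical and only mimic `XFS_AGB_TO_FSB`, `XFS_FSB_TO_AGNO` and `XFS_FSB_TO_AGBNO`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t fsblock_t;
typedef uint32_t agnumber_t;
typedef uint32_t agblock_t;

/* Pack (agno, agbno) into a segmented fsblock number (assumed layout). */
static fsblock_t agb_to_fsb(unsigned int agblklog, agnumber_t agno,
			    agblock_t agbno)
{
	assert(agblklog < 32);
	/* The AG block number must fit entirely in the low agblklog bits. */
	assert(((uint64_t)agbno >> agblklog) == 0);
	return ((fsblock_t)agno << agblklog) | agbno;
}

/* High bits: AG number. */
static agnumber_t fsb_to_agno(unsigned int agblklog, fsblock_t fsbno)
{
	return (agnumber_t)(fsbno >> agblklog);
}

/* Low bits: block number within the AG. */
static agblock_t fsb_to_agbno(unsigned int agblklog, fsblock_t fsbno)
{
	return (agblock_t)(fsbno & ((1ULL << agblklog) - 1));
}
```

With `agblklog = 16`, AG 3 block `0x1234` packs to `0x31234` and unpacks back to the same pair. That round trip is why the callee can store `fsbno` in `br_startblock` unchanged.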
*/ - if (resv->used > 0) - xfs_rmap_alloc_extent(sc->tp, pag->pag_agno, resv->agbno, - resv->used, XFS_RMAP_OWN_AG); + if (resv->used > 0) { + xfs_fsblock_t fsbno; + + fsbno = XFS_AGB_TO_FSB(sc->mp, pag->pag_agno, resv->agbno); + xfs_rmap_alloc_extent(sc->tp, fsbno, resv->used, + XFS_RMAP_OWN_AG); + } /* * For each reserved btree block we didn't use, add it to the free

From patchwork Sun Dec 31 21:32:34 2023
Date: Sun, 31 Dec 2023 13:32:34 -0800
Subject: [PATCH 03/39] xfs: introduce realtime rmap btree definitions
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404849949.1764998.560975757192969969.stgit@frogsfrogsfrogs>
In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs>

Add new realtime rmap btree definitions. The realtime rmap btree will be rooted from a hidden inode, but has its own shape and therefore needs to have most of its own separate types.

Signed-off-by: Darrick J. Wong
--- fs/xfs/libxfs/xfs_btree.h | 1 + fs/xfs/libxfs/xfs_format.h | 7 +++++++ fs/xfs/libxfs/xfs_types.h | 5 +++-- fs/xfs/scrub/trace.h | 1 + fs/xfs/xfs_trace.h | 1 + 5 files changed, 13 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index ce0bc5dfffe1c..e6571c9157d1e 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -64,6 +64,7 @@ union xfs_btree_rec { #define XFS_BTNUM_RMAP ((xfs_btnum_t)XFS_BTNUM_RMAPi) #define XFS_BTNUM_REFC ((xfs_btnum_t)XFS_BTNUM_REFCi) #define XFS_BTNUM_RCBAG ((xfs_btnum_t)XFS_BTNUM_RCBAGi) +#define XFS_BTNUM_RTRMAP ((xfs_btnum_t)XFS_BTNUM_RTRMAPi) struct xfs_btree_ops; uint32_t xfs_btree_magic(struct xfs_mount *mp, const struct xfs_btree_ops *ops); diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 87476c6bb6c64..b47d4f16143a6 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1746,6 +1746,13 @@ typedef __be32 xfs_rmap_ptr_t; XFS_FIBT_BLOCK(mp) + 1 : \ XFS_IBT_BLOCK(mp) + 1) +/* + * Realtime Reverse mapping btree format definitions + * + * This is a btree for reverse mapping records for realtime volumes + */ +#define XFS_RTRMAP_CRC_MAGIC 0x4d415052 /* 'MAPR' */ + /* * Reference Count Btree format definitions * diff --git a/fs/xfs/libxfs/xfs_types.h
b/fs/xfs/libxfs/xfs_types.h index ad2ce83874f9f..b3edc57dc65bd 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -126,7 +126,7 @@ typedef enum { typedef enum { XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi, XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_REFCi, XFS_BTNUM_RCBAGi, - XFS_BTNUM_MAX + XFS_BTNUM_RTRMAPi, XFS_BTNUM_MAX } xfs_btnum_t; #define XFS_BTNUM_STRINGS \ @@ -137,7 +137,8 @@ typedef enum { { XFS_BTNUM_INOi, "inobt" }, \ { XFS_BTNUM_FINOi, "finobt" }, \ { XFS_BTNUM_REFCi, "refcbt" }, \ - { XFS_BTNUM_RCBAGi, "rcbagbt" } + { XFS_BTNUM_RCBAGi, "rcbagbt" }, \ + { XFS_BTNUM_RTRMAPi, "rtrmapbt" } struct xfs_name { const unsigned char *name; diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index dd809042a6041..bdcd77c839317 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -48,6 +48,7 @@ TRACE_DEFINE_ENUM(XFS_BTNUM_FINOi); TRACE_DEFINE_ENUM(XFS_BTNUM_RMAPi); TRACE_DEFINE_ENUM(XFS_BTNUM_REFCi); TRACE_DEFINE_ENUM(XFS_BTNUM_RCBAGi); +TRACE_DEFINE_ENUM(XFS_BTNUM_RTRMAPi); TRACE_DEFINE_ENUM(XFS_REFC_DOMAIN_SHARED); TRACE_DEFINE_ENUM(XFS_REFC_DOMAIN_COW); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 43ecf98f86558..1c89d38b85446 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2550,6 +2550,7 @@ TRACE_DEFINE_ENUM(XFS_BTNUM_FINOi); TRACE_DEFINE_ENUM(XFS_BTNUM_RMAPi); TRACE_DEFINE_ENUM(XFS_BTNUM_REFCi); TRACE_DEFINE_ENUM(XFS_BTNUM_RCBAGi); +TRACE_DEFINE_ENUM(XFS_BTNUM_RTRMAPi); DECLARE_EVENT_CLASS(xfs_btree_cur_class, TP_PROTO(struct xfs_btree_cur *cur, int level, struct xfs_buf *bp),

From patchwork Sun Dec 31 21:32:50 2023
Date: Sun, 31 Dec 2023 13:32:50 -0800
Subject: [PATCH 04/39] xfs: define the on-disk realtime rmap btree format
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404849965.1764998.13102413727560102964.stgit@frogsfrogsfrogs>
In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs>

Start filling out the rtrmap btree implementation. Start with the on-disk btree format; add everything needed to read, write and manipulate rmap btree blocks. This prepares the way for connecting the btree operations implementation.

Signed-off-by: Darrick J.
Wong --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_btree.c | 5 + fs/xfs/libxfs/xfs_format.h | 3 fs/xfs/libxfs/xfs_ondisk.h | 1 fs/xfs/libxfs/xfs_rtrmap_btree.c | 307 ++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 83 ++++++++++ fs/xfs/libxfs/xfs_sb.c | 6 + fs/xfs/libxfs/xfs_shared.h | 2 fs/xfs/xfs_mount.c | 5 - fs/xfs/xfs_mount.h | 9 + 10 files changed, 420 insertions(+), 2 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_rtrmap_btree.c create mode 100644 fs/xfs/libxfs/xfs_rtrmap_btree.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 6dc9e740f8ce5..50a7929982fdd 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -48,6 +48,7 @@ xfs-y += $(addprefix libxfs/, \ xfs_rmap_btree.o \ xfs_refcount.o \ xfs_refcount_btree.o \ + xfs_rtrmap_btree.o \ xfs_sb.o \ xfs_swapext.o \ xfs_symlink_remote.o \ diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 5b4a2c6a3f2ac..c8a2ce10cccc8 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -32,6 +32,7 @@ #include "scrub/xfbtree.h" #include "xfs_btree_mem.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" /* * Btree magic numbers. @@ -5528,6 +5529,9 @@ xfs_btree_init_cur_caches(void) if (error) goto err; error = xfs_refcountbt_init_cur_cache(); + if (error) + goto err; + error = xfs_rtrmapbt_init_cur_cache(); if (error) goto err; @@ -5546,6 +5550,7 @@ xfs_btree_destroy_cur_caches(void) xfs_bmbt_destroy_cur_cache(); xfs_rmapbt_destroy_cur_cache(); xfs_refcountbt_destroy_cur_cache(); + xfs_rtrmapbt_destroy_cur_cache(); } /* Move the btree cursor before the first record. 
*/ diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index b47d4f16143a6..5317c6438f070 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1753,6 +1753,9 @@ typedef __be32 xfs_rmap_ptr_t; */ #define XFS_RTRMAP_CRC_MAGIC 0x4d415052 /* 'MAPR' */ +/* inode-based btree pointer type */ +typedef __be64 xfs_rtrmap_ptr_t; + /* * Reference Count Btree format definitions * diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h index 70b96efa26973..897a1b72f8d12 100644 --- a/fs/xfs/libxfs/xfs_ondisk.h +++ b/fs/xfs/libxfs/xfs_ondisk.h @@ -77,6 +77,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(union xfs_rtword_raw, 4); XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_raw, 4); XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo, 48); + XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t, 8); /* * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c new file mode 100644 index 0000000000000..ecacb457cd27f --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -0,0 +1,307 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2018-2024 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_alloc.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_trace.h" +#include "xfs_cksum.h" +#include "xfs_error.h" +#include "xfs_extent_busy.h" +#include "xfs_rtgroup.h" + +static struct kmem_cache *xfs_rtrmapbt_cur_cache; + +/* + * Realtime Reverse Map btree. + * + * This is a btree used to track the owner(s) of a given extent in the realtime + * device. 
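The new magic `0x4d415052` is the big-endian encoding of the ASCII string "MAPR", and `xfs_rtrmapbt_buf_ops` lists it only in the CRC slot (the other slot is 0). The sketch below shows how such a magic can be written and recognized in on-disk byte order. `put_be32`, `get_be32`, `magic_to_str` and `block_has_rtrmap_magic` are hypothetical helpers for illustration, not the kernel's `xfs_verify_magic()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RTRMAP_CRC_MAGIC 0x4d415052u /* value from the patch: 'MAPR' */

/* Store a 32-bit value in big-endian (on-disk) byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Load a big-endian 32-bit value. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Render a magic number as the four ASCII bytes stored on disk. */
static void magic_to_str(uint32_t magic, char out[5])
{
	unsigned char raw[4];

	assert(out != NULL);
	put_be32(raw, magic);
	memcpy(out, raw, 4);
	out[4] = '\0';
}

/* Hypothetical check: does this block begin with the rtrmapbt magic? */
static int block_has_rtrmap_magic(const unsigned char *block)
{
	return get_be32(block) == RTRMAP_CRC_MAGIC;
}
```

Because the comparison is done on the decoded big-endian value, the check behaves the same on little- and big-endian hosts.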
See the comments in xfs_rmap_btree.c for more information. + * + * This tree is basically the same as the regular rmap btree except that it + * is rooted in an inode and does not live in free space. + */ + +static struct xfs_btree_cur * +xfs_rtrmapbt_dup_cursor( + struct xfs_btree_cur *cur) +{ + struct xfs_btree_cur *new; + + new = xfs_rtrmapbt_init_cursor(cur->bc_mp, cur->bc_tp, cur->bc_ino.rtg, + cur->bc_ino.ip); + + /* Copy the flags values since init cursor doesn't get them. */ + new->bc_ino.flags = cur->bc_ino.flags; + + return new; +} + +static xfs_failaddr_t +xfs_rtrmapbt_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_target->bt_mount; + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + xfs_failaddr_t fa; + int level; + + if (!xfs_verify_magic(bp, block->bb_magic)) + return __this_address; + + if (!xfs_has_rmapbt(mp)) + return __this_address; + fa = xfs_btree_lblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN); + if (fa) + return fa; + level = be16_to_cpu(block->bb_level); + if (level > mp->m_rtrmap_maxlevels) + return __this_address; + + return xfs_btree_lblock_verify(bp, mp->m_rtrmap_mxr[level != 0]); +} + +static void +xfs_rtrmapbt_read_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + if (!xfs_btree_lblock_verify_crc(bp)) + xfs_verifier_error(bp, -EFSBADCRC, __this_address); + else { + fa = xfs_rtrmapbt_verify(bp); + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + } + + if (bp->b_error) + trace_xfs_btree_corrupt(bp, _RET_IP_); +} + +static void +xfs_rtrmapbt_write_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + fa = xfs_rtrmapbt_verify(bp); + if (fa) { + trace_xfs_btree_corrupt(bp, _RET_IP_); + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + return; + } + xfs_btree_lblock_calc_crc(bp); + +} + +const struct xfs_buf_ops xfs_rtrmapbt_buf_ops = { + .name = "xfs_rtrmapbt", + .magic = { 0, cpu_to_be32(XFS_RTRMAP_CRC_MAGIC) }, + .verify_read = xfs_rtrmapbt_read_verify, + .verify_write = xfs_rtrmapbt_write_verify, + 
.verify_struct = xfs_rtrmapbt_verify, +}; + +const struct xfs_btree_ops xfs_rtrmapbt_ops = { + .rec_len = sizeof(struct xfs_rmap_rec), + .key_len = 2 * sizeof(struct xfs_rmap_key), + .lru_refs = XFS_RMAP_BTREE_REF, + .geom_flags = XFS_BTREE_LONG_PTRS | XFS_BTREE_ROOT_IN_INODE | + XFS_BTREE_CRC_BLOCKS | XFS_BTREE_OVERLAPPING | + XFS_BTREE_IROOT_RECORDS, + + .dup_cursor = xfs_rtrmapbt_dup_cursor, + .buf_ops = &xfs_rtrmapbt_buf_ops, +}; + +/* Initialize a new rt rmap btree cursor. */ +static struct xfs_btree_cur * +xfs_rtrmapbt_init_common( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)); + + cur = xfs_btree_alloc_cursor(mp, tp, XFS_BTNUM_RTRMAP, + &xfs_rtrmapbt_ops, mp->m_rtrmap_maxlevels, + xfs_rtrmapbt_cur_cache); + cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_rmap_2); + + cur->bc_ino.ip = ip; + cur->bc_ino.allocated = 0; + cur->bc_ino.flags = 0; + + cur->bc_ino.rtg = xfs_rtgroup_hold(rtg); + return cur; +} + +/* Allocate a new rt rmap btree cursor. */ +struct xfs_btree_cur * +xfs_rtrmapbt_init_cursor( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + + cur = xfs_rtrmapbt_init_common(mp, tp, rtg, ip); + cur->bc_nlevels = be16_to_cpu(ifp->if_broot->bb_level) + 1; + cur->bc_ino.forksize = xfs_inode_fork_size(ip, XFS_DATA_FORK); + cur->bc_ino.whichfork = XFS_DATA_FORK; + return cur; +} + +/* Create a new rt reverse mapping btree cursor with a fake root for staging. 
*/ +struct xfs_btree_cur * +xfs_rtrmapbt_stage_cursor( + struct xfs_mount *mp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake) +{ + struct xfs_btree_cur *cur; + + cur = xfs_rtrmapbt_init_common(mp, NULL, rtg, ip); + cur->bc_nlevels = ifake->if_levels; + cur->bc_ino.forksize = ifake->if_fork_size; + cur->bc_ino.whichfork = -1; + xfs_btree_stage_ifakeroot(cur, ifake, NULL); + return cur; +} + +/* + * Install a new rt reverse mapping btree root. Caller is responsible for + * invalidating and freeing the old btree blocks. + */ +void +xfs_rtrmapbt_commit_staged_btree( + struct xfs_btree_cur *cur, + struct xfs_trans *tp) +{ + struct xbtree_ifakeroot *ifake = cur->bc_ino.ifake; + struct xfs_ifork *ifp; + int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; + + ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + + /* + * Free any resources hanging off the real fork, then shallow-copy the + * staging fork's contents into the real fork to transfer everything + * we just built. + */ + ifp = xfs_ifork_ptr(cur->bc_ino.ip, XFS_DATA_FORK); + xfs_idestroy_fork(ifp); + memcpy(ifp, ifake->if_fork, sizeof(struct xfs_ifork)); + + xfs_trans_log_inode(tp, cur->bc_ino.ip, flags); + xfs_btree_commit_ifakeroot(cur, tp, XFS_DATA_FORK, &xfs_rtrmapbt_ops); +} + +/* Calculate number of records in a rt reverse mapping btree block. */ +static inline unsigned int +xfs_rtrmapbt_block_maxrecs( + unsigned int blocklen, + bool leaf) +{ + if (leaf) + return blocklen / sizeof(struct xfs_rmap_rec); + return blocklen / + (2 * sizeof(struct xfs_rmap_key) + sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Calculate number of records in an rt reverse mapping btree block. + */ +unsigned int +xfs_rtrmapbt_maxrecs( + struct xfs_mount *mp, + unsigned int blocklen, + bool leaf) +{ + blocklen -= XFS_RTRMAP_BLOCK_LEN; + return xfs_rtrmapbt_block_maxrecs(blocklen, leaf); +} + +/* Compute the max possible height for realtime reverse mapping btrees. 
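`xfs_rtrmapbt_maxrecs()` subtracts the block header and divides what is left by the record size (leaves) or by two keys plus one pointer (nodes, because the tree is overlapping), and `xfs_sb_mount_common()` then takes half of each as the minimum. The sketch below redoes that arithmetic and a worst-case height calculation in plain C. The 72-byte header, 24-byte record and 20-byte packed key sizes are assumptions about the on-disk structures rather than values stated in the patch, and `compute_maxlevels()` is a re-derivation of the idea behind `xfs_btree_compute_maxlevels()`, not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed on-disk sizes: long-format CRC btree block header, rmap record,
 * packed rmap key. The 8-byte pointer is the size the patch checks.
 */
#define LBLOCK_CRC_LEN 72u
#define RMAP_REC_LEN   24u
#define RMAP_KEY_LEN   20u
#define RTRMAP_PTR_LEN  8u

/* Leaf records, or node (low key, high key, pointer) slots, in blocklen bytes. */
static unsigned int block_maxrecs(unsigned int blocklen, bool leaf)
{
	if (leaf)
		return blocklen / RMAP_REC_LEN;
	return blocklen / (2 * RMAP_KEY_LEN + RTRMAP_PTR_LEN);
}

/* Same, after removing the block header from a full block. */
static unsigned int rtrmap_maxrecs(unsigned int blocksize, bool leaf)
{
	assert(blocksize > LBLOCK_CRC_LEN);
	return block_maxrecs(blocksize - LBLOCK_CRC_LEN, leaf);
}

/*
 * Height needed for `records` records when each leaf holds at least
 * limits[0] records and each node at least limits[1] pointers.
 */
static unsigned int compute_maxlevels(const unsigned int limits[2],
				      unsigned long long records)
{
	unsigned long long blocks = (records + limits[0] - 1) / limits[0];
	unsigned int height = 1;

	while (blocks > 1) {
		blocks = (blocks + limits[1] - 1) / limits[1];
		height++;
	}
	return height;
}
```

With 4096-byte blocks this gives 167 records per leaf and 83 slots per node, hence minimums of 83 and 41. Indexing 2^20 records at those minimums needs a height of 4. The patch's `xfs_rtrmapbt_compute_maxlevels()` additionally caps the result by what the data device could hold and then adds one level for the root that lives in the inode.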
*/ +unsigned int +xfs_rtrmapbt_maxlevels_ondisk(void) +{ + unsigned int minrecs[2]; + unsigned int blocklen; + + blocklen = XFS_MIN_CRC_BLOCKSIZE - XFS_BTREE_LBLOCK_CRC_LEN; + + minrecs[0] = xfs_rtrmapbt_block_maxrecs(blocklen, true) / 2; + minrecs[1] = xfs_rtrmapbt_block_maxrecs(blocklen, false) / 2; + + /* We need at most one record for every block in an rt group. */ + return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS); +} + +int __init +xfs_rtrmapbt_init_cur_cache(void) +{ + xfs_rtrmapbt_cur_cache = kmem_cache_create("xfs_rtrmapbt_cur", + xfs_btree_cur_sizeof(xfs_rtrmapbt_maxlevels_ondisk()), + 0, 0, NULL); + + if (!xfs_rtrmapbt_cur_cache) + return -ENOMEM; + return 0; +} + +void +xfs_rtrmapbt_destroy_cur_cache(void) +{ + kmem_cache_destroy(xfs_rtrmapbt_cur_cache); + xfs_rtrmapbt_cur_cache = NULL; +} + +/* Compute the maximum height of an rt reverse mapping btree. */ +void +xfs_rtrmapbt_compute_maxlevels( + struct xfs_mount *mp) +{ + unsigned int d_maxlevels, r_maxlevels; + + if (!xfs_has_rtrmapbt(mp)) { + mp->m_rtrmap_maxlevels = 0; + return; + } + + /* + * The realtime rmapbt lives on the data device, which means that its + * maximum height is constrained by the size of the data device and + * the height required to store one rmap record for each block in an + * rt group. + */ + d_maxlevels = xfs_btree_space_to_height(mp->m_rtrmap_mnr, + mp->m_sb.sb_dblocks); + r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrmap_mnr, + mp->m_sb.sb_rgblocks); + + /* Add one level to handle the inode root level. */ + mp->m_rtrmap_maxlevels = min(d_maxlevels, r_maxlevels) + 1; +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h new file mode 100644 index 0000000000000..0f73267971924 --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (c) 2018-2024 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#ifndef __XFS_RTRMAP_BTREE_H__ +#define __XFS_RTRMAP_BTREE_H__ + +struct xfs_buf; +struct xfs_btree_cur; +struct xfs_mount; +struct xbtree_ifakeroot; +struct xfs_rtgroup; + +/* rmaps only exist on crc enabled filesystems */ +#define XFS_RTRMAP_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN + +struct xfs_btree_cur *xfs_rtrmapbt_init_cursor(struct xfs_mount *mp, + struct xfs_trans *tp, struct xfs_rtgroup *rtg, + struct xfs_inode *ip); +struct xfs_btree_cur *xfs_rtrmapbt_stage_cursor(struct xfs_mount *mp, + struct xfs_rtgroup *rtg, struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake); +void xfs_rtrmapbt_commit_staged_btree(struct xfs_btree_cur *cur, + struct xfs_trans *tp); +unsigned int xfs_rtrmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, + bool leaf); +void xfs_rtrmapbt_compute_maxlevels(struct xfs_mount *mp); + +/* + * Addresses of records, keys, and pointers within an incore rtrmapbt block. + * + * (note that some of these may appear unused, but they are used in userspace) + */ +static inline struct xfs_rmap_rec * +xfs_rtrmap_rec_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_rec *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + (index - 1) * sizeof(struct xfs_rmap_rec)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_key_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_high_key_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + sizeof(struct xfs_rmap_key) + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_ptr_addr( + struct xfs_btree_block *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrmap_ptr_t *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + 
maxrecs * 2 * sizeof(struct xfs_rmap_key) + + (index - 1) * sizeof(xfs_rtrmap_ptr_t)); +} + +unsigned int xfs_rtrmapbt_maxlevels_ondisk(void); + +int __init xfs_rtrmapbt_init_cur_cache(void); +void xfs_rtrmapbt_destroy_cur_cache(void); + +#endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 8178a8e8097ff..a5ca8f4f8699f 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -28,6 +28,7 @@ #include "xfs_rtbitmap.h" #include "xfs_swapext.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" /* * Physical superblock buffer manipulations. Shared with libxfs in userspace. @@ -1125,6 +1126,11 @@ xfs_sb_mount_common( mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2; mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2; + mp->m_rtrmap_mxr[0] = xfs_rtrmapbt_maxrecs(mp, sbp->sb_blocksize, true); + mp->m_rtrmap_mxr[1] = xfs_rtrmapbt_maxrecs(mp, sbp->sb_blocksize, false); + mp->m_rtrmap_mnr[0] = mp->m_rtrmap_mxr[0] / 2; + mp->m_rtrmap_mnr[1] = mp->m_rtrmap_mxr[1] / 2; + mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, true); mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, false); mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2; diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index 8ad4b67d6febb..adb742267c9d0 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -42,6 +42,7 @@ extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops; extern const struct xfs_buf_ops xfs_rtsummary_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; extern const struct xfs_buf_ops xfs_rtsb_buf_ops; +extern const struct xfs_buf_ops xfs_rtrmapbt_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops; extern const struct xfs_buf_ops xfs_symlink_buf_ops; @@ -54,6 +55,7 @@ extern const struct xfs_btree_ops xfs_finobt_ops; extern const struct xfs_btree_ops xfs_bmbt_ops; extern const struct xfs_btree_ops xfs_refcountbt_ops; extern 
const struct xfs_btree_ops xfs_rmapbt_ops; +extern const struct xfs_btree_ops xfs_rtrmapbt_ops; /* log size calculation functions */ int xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes); diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index c774ac73bdec5..30d2d0c5e5e53 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -36,6 +36,7 @@ #include "xfs_ag.h" #include "xfs_imeta.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" #include "scrub/stats.h" static DEFINE_MUTEX(xfs_uuid_table_mutex); @@ -664,8 +665,7 @@ static inline void xfs_rtbtree_compute_maxlevels( struct xfs_mount *mp) { - /* This will be filled in later. */ - mp->m_rtbtree_maxlevels = 0; + mp->m_rtbtree_maxlevels = mp->m_rtrmap_maxlevels; } /* @@ -737,6 +737,7 @@ xfs_mountfs( xfs_bmap_compute_maxlevels(mp, XFS_ATTR_FORK); xfs_mount_setup_inode_geom(mp); xfs_rmapbt_compute_maxlevels(mp); + xfs_rtrmapbt_compute_maxlevels(mp); xfs_refcountbt_compute_maxlevels(mp); xfs_agbtree_compute_maxlevels(mp); diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 7ef8d5f706883..ada8b281d74e2 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -133,11 +133,14 @@ typedef struct xfs_mount { uint m_bmap_dmnr[2]; /* min bmap btree records */ uint m_rmap_mxr[2]; /* max rmap btree records */ uint m_rmap_mnr[2]; /* min rmap btree records */ + uint m_rtrmap_mxr[2]; /* max rtrmap btree records */ + uint m_rtrmap_mnr[2]; /* min rtrmap btree records */ uint m_refc_mxr[2]; /* max refc btree records */ uint m_refc_mnr[2]; /* min refc btree records */ uint m_alloc_maxlevels; /* max alloc btree levels */ uint m_bm_maxlevels[2]; /* max bmap btree levels */ uint m_rmap_maxlevels; /* max rmap btree levels */ + uint m_rtrmap_maxlevels; /* max rtrmap btree level */ uint m_refc_maxlevels; /* max refcount btree level */ unsigned int m_agbtree_maxlevels; /* max level of all AG btrees */ unsigned int m_rtbtree_maxlevels; /* max level of all rt btrees */ @@ -369,6 +372,12 @@ 
__XFS_HAS_FEAT(large_extent_counts, NREXT64) __XFS_HAS_FEAT(metadir, METADIR) __XFS_HAS_FEAT(rtgroups, RTGROUPS) +static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp) +{ + return xfs_has_rtgroups(mp) && xfs_has_realtime(mp) && + xfs_has_rmapbt(mp); +} + /* * Mount features *

From patchwork Sun Dec 31 21:33:05 2023
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13507669
Date: Sun, 31 Dec 2023 13:33:05 -0800
Subject: [PATCH 05/39] xfs: realtime rmap btree transaction reservations
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404849982.1764998.2025526432916669148.stgit@frogsfrogsfrogs>
In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Make sure that there's enough log reservation to handle mapping and unmapping realtime extents. We have to reserve enough space to handle a split in the rtrmapbt to add the record and a second split in the regular rmapbt to record the rtrmapbt split.

Signed-off-by: Darrick J. Wong
--- fs/xfs/libxfs/xfs_swapext.c | 4 +++- fs/xfs/libxfs/xfs_trans_resv.c | 12 ++++++++++-- fs/xfs/libxfs/xfs_trans_space.h | 13 +++++++++++++ 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c index 244ef3d8431fd..80d4aa1ec6399 100644 --- a/fs/xfs/libxfs/xfs_swapext.c +++ b/fs/xfs/libxfs/xfs_swapext.c @@ -761,7 +761,9 @@ xfs_swapext_rmapbt_blocks( if (!xfs_has_rmapbt(mp)) return 0; if (XFS_IS_REALTIME_INODE(req->ip1)) - return 0; + return howmany_64(req->nr_exchanges, + XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)) * + XFS_RTRMAPADD_SPACE_RES(mp); return howmany_64(req->nr_exchanges, XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)) * diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index 83922f5e54aed..423b0cede71cb 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -214,7 +214,9 @@ xfs_calc_inode_chunk_res( * Per-extent log reservation for the btree changes involved in freeing or * allocating a realtime extent. We have to be able to log as many rtbitmap * blocks as needed to mark inuse XFS_BMBT_MAX_EXTLEN blocks' worth of realtime - * extents, as well as the realtime summary block.
+ * extents, as well as the realtime summary block (t1). Realtime rmap btree + * operations happen in a second transaction, so factor in a couple of rtrmapbt + * splits (t2). */ static unsigned int xfs_rtalloc_block_count( @@ -223,10 +225,16 @@ xfs_rtalloc_block_count( { unsigned int rtbmp_blocks; xfs_rtxlen_t rtxlen; + unsigned int t1, t2 = 0; rtxlen = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN); rtbmp_blocks = xfs_rtbitmap_blockcount(mp, rtxlen); - return (rtbmp_blocks + 1) * num_ops; + t1 = (rtbmp_blocks + 1) * num_ops; + + if (xfs_has_rmapbt(mp)) + t2 = num_ops * (2 * mp->m_rtrmap_maxlevels - 1); + + return max(t1, t2); } /* diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h index 1155ff2d37e29..d89b570aafcc6 100644 --- a/fs/xfs/libxfs/xfs_trans_space.h +++ b/fs/xfs/libxfs/xfs_trans_space.h @@ -14,6 +14,19 @@ #define XFS_MAX_CONTIG_BMAPS_PER_BLOCK(mp) \ (((mp)->m_bmap_dmxr[0]) - ((mp)->m_bmap_dmnr[0])) +/* Worst case number of realtime rmaps that can be held in a block. */ +#define XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp) \ + (((mp)->m_rtrmap_mxr[0]) - ((mp)->m_rtrmap_mnr[0])) + +/* Adding one realtime rmap could split every level to the top of the tree. */ +#define XFS_RTRMAPADD_SPACE_RES(mp) ((mp)->m_rtrmap_maxlevels) + +/* Blocks we might need to add "b" realtime rmaps to a tree. */ +#define XFS_NRTRMAPADD_SPACE_RES(mp, b) \ + ((((b) + XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp) - 1) / \ + XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)) * \ + XFS_RTRMAPADD_SPACE_RES(mp)) + /* Worst case number of rmaps that can be held in a block. */ #define XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp) \ (((mp)->m_rmap_mxr[0]) - ((mp)->m_rmap_mnr[0]))

From patchwork Sun Dec 31 21:33:21 2023
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13507670
Date: Sun, 31 Dec 2023 13:33:21 -0800
Subject: [PATCH 06/39] xfs: add realtime rmap btree operations
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404849998.1764998.10573874480161884224.stgit@frogsfrogsfrogs>
In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Implement the generic btree operations needed to manipulate rtrmap btree blocks. This is different from the regular rmapbt in that we allocate space from the filesystem at large, and are constrained neither to the free space nor to any particular AG.

Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_btree.c | 71 ++++++++++ fs/xfs/libxfs/xfs_btree.h | 5 + fs/xfs/libxfs/xfs_rtrmap_btree.c | 271 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 347 insertions(+) diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index c8a2ce10cccc8..a294641d91832 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -33,6 +33,10 @@ #include "xfs_btree_mem.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_bmap.h" +#include "xfs_rmap.h" +#include "xfs_quota.h" +#include "xfs_imeta.h" /* * Btree magic numbers. @@ -5579,3 +5583,70 @@ xfs_btree_goto_left_edge( return 0; } + +/* Allocate a block for an inode-rooted metadata btree. */ +int +xfs_btree_alloc_imeta_block( + struct xfs_btree_cur *cur, + const union xfs_btree_ptr *start, + union xfs_btree_ptr *new, + int *stat) +{ + struct xfs_alloc_arg args = { + .mp = cur->bc_mp, + .tp = cur->bc_tp, + .resv = XFS_AG_RESV_IMETA, + .minlen = 1, + .maxlen = 1, + .prod = 1, + }; + struct xfs_inode *ip = cur->bc_ino.ip; + int error; + + ASSERT(xfs_is_metadir_inode(ip)); + ASSERT(XFS_IS_DQDETACHED(cur->bc_mp, ip)); + + xfs_rmap_ino_bmbt_owner(&args.oinfo, ip->i_ino, cur->bc_ino.whichfork); + error = xfs_alloc_vextent_start_ag(&args, + XFS_INO_TO_FSB(cur->bc_mp, ip->i_ino)); + if (error) + return error; + if (args.fsbno == NULLFSBLOCK) { + *stat = 0; + return 0; + } + ASSERT(args.len == 1); + + xfs_imeta_resv_alloc_extent(ip, &args); + cur->bc_ino.allocated++; + + new->l = cpu_to_be64(args.fsbno); + *stat = 1; + return 0; +} + +/* Free a block from an inode-rooted metadata btree. 
*/ +int +xfs_btree_free_imeta_block( + struct xfs_btree_cur *cur, + struct xfs_buf *bp) +{ + struct xfs_owner_info oinfo; + struct xfs_mount *mp = cur->bc_mp; + struct xfs_inode *ip = cur->bc_ino.ip; + struct xfs_trans *tp = cur->bc_tp; + xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); + int error; + + ASSERT(xfs_is_metadir_inode(ip)); + ASSERT(XFS_IS_DQDETACHED(cur->bc_mp, ip)); + + xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork); + error = xfs_free_extent_later(tp, fsbno, 1, &oinfo, XFS_AG_RESV_IMETA, + 0); + if (error) + return error; + + xfs_imeta_resv_free_extent(ip, tp, 1); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index e6571c9157d1e..3559cf5d3a653 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -764,4 +764,9 @@ void xfs_btree_destroy_cur_caches(void); int xfs_btree_goto_left_edge(struct xfs_btree_cur *cur); +int xfs_btree_alloc_imeta_block(struct xfs_btree_cur *cur, + const union xfs_btree_ptr *start, union xfs_btree_ptr *newp, + int *stat); +int xfs_btree_free_imeta_block(struct xfs_btree_cur *cur, struct xfs_buf *bp); + #endif /* __XFS_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index ecacb457cd27f..5be3e1af55684 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -18,12 +18,14 @@ #include "xfs_alloc.h" #include "xfs_btree.h" #include "xfs_btree_staging.h" +#include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" #include "xfs_trace.h" #include "xfs_cksum.h" #include "xfs_error.h" #include "xfs_extent_busy.h" #include "xfs_rtgroup.h" +#include "xfs_bmap.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -52,6 +54,182 @@ xfs_rtrmapbt_dup_cursor( return new; } +STATIC int +xfs_rtrmapbt_get_minrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrmapbt_maxrecs(cur->bc_mp, 
ifp->if_broot_bytes, + level == 0) / 2; + } + + return cur->bc_mp->m_rtrmap_mnr[level != 0]; +} + +STATIC int +xfs_rtrmapbt_get_maxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrmapbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0); + } + + return cur->bc_mp->m_rtrmap_mxr[level != 0]; +} + +/* + * Convert the ondisk record's offset field into the ondisk key's offset field. + * Fork and bmbt are significant parts of the rmap record key, but written + * status is merely a record attribute. + */ +static inline __be64 ondisk_rec_offset_to_key(const union xfs_btree_rec *rec) +{ + return rec->rmap.rm_offset & ~cpu_to_be64(XFS_RMAP_OFF_UNWRITTEN); +} + +STATIC void +xfs_rtrmapbt_init_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + key->rmap.rm_startblock = rec->rmap.rm_startblock; + key->rmap.rm_owner = rec->rmap.rm_owner; + key->rmap.rm_offset = ondisk_rec_offset_to_key(rec); +} + +STATIC void +xfs_rtrmapbt_init_high_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + uint64_t off; + int adj; + + adj = be32_to_cpu(rec->rmap.rm_blockcount) - 1; + + key->rmap.rm_startblock = rec->rmap.rm_startblock; + be32_add_cpu(&key->rmap.rm_startblock, adj); + key->rmap.rm_owner = rec->rmap.rm_owner; + key->rmap.rm_offset = ondisk_rec_offset_to_key(rec); + if (XFS_RMAP_NON_INODE_OWNER(be64_to_cpu(rec->rmap.rm_owner)) || + XFS_RMAP_IS_BMBT_BLOCK(be64_to_cpu(rec->rmap.rm_offset))) + return; + off = be64_to_cpu(key->rmap.rm_offset); + off = (XFS_RMAP_OFF(off) + adj) | (off & ~XFS_RMAP_OFF_MASK); + key->rmap.rm_offset = cpu_to_be64(off); +} + +STATIC void +xfs_rtrmapbt_init_rec_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_rec *rec) +{ + rec->rmap.rm_startblock = cpu_to_be32(cur->bc_rec.r.rm_startblock); + rec->rmap.rm_blockcount = cpu_to_be32(cur->bc_rec.r.rm_blockcount); + rec->rmap.rm_owner = 
cpu_to_be64(cur->bc_rec.r.rm_owner); + rec->rmap.rm_offset = cpu_to_be64( + xfs_rmap_irec_offset_pack(&cur->bc_rec.r)); +} + +STATIC void +xfs_rtrmapbt_init_ptr_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr) +{ + ptr->l = 0; +} + +/* + * Mask the appropriate parts of the ondisk key field for a key comparison. + * Fork and bmbt are significant parts of the rmap record key, but written + * status is merely a record attribute. + */ +static inline uint64_t offset_keymask(uint64_t offset) +{ + return offset & ~XFS_RMAP_OFF_UNWRITTEN; +} + +STATIC int64_t +xfs_rtrmapbt_key_diff( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key) +{ + struct xfs_rmap_irec *rec = &cur->bc_rec.r; + const struct xfs_rmap_key *kp = &key->rmap; + __u64 x, y; + int64_t d; + + d = (int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock; + if (d) + return d; + + x = be64_to_cpu(kp->rm_owner); + y = rec->rm_owner; + if (x > y) + return 1; + else if (y > x) + return -1; + + x = offset_keymask(be64_to_cpu(kp->rm_offset)); + y = offset_keymask(xfs_rmap_irec_offset_pack(rec)); + if (x > y) + return 1; + else if (y > x) + return -1; + return 0; +} + +STATIC int64_t +xfs_rtrmapbt_diff_two_keys( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, + const union xfs_btree_key *k2, + const union xfs_btree_key *mask) +{ + const struct xfs_rmap_key *kp1 = &k1->rmap; + const struct xfs_rmap_key *kp2 = &k2->rmap; + int64_t d; + __u64 x, y; + + /* Doesn't make sense to mask off the physical space part */ + ASSERT(!mask || mask->rmap.rm_startblock); + + d = (int64_t)be32_to_cpu(kp1->rm_startblock) - + be32_to_cpu(kp2->rm_startblock); + if (d) + return d; + + if (!mask || mask->rmap.rm_owner) { + x = be64_to_cpu(kp1->rm_owner); + y = be64_to_cpu(kp2->rm_owner); + if (x > y) + return 1; + else if (y > x) + return -1; + } + + if (!mask || mask->rmap.rm_offset) { + /* Doesn't make sense to allow offset but not owner */ + ASSERT(!mask || mask->rmap.rm_owner); + + x = 
offset_keymask(be64_to_cpu(kp1->rm_offset)); + y = offset_keymask(be64_to_cpu(kp2->rm_offset)); + if (x > y) + return 1; + else if (y > x) + return -1; + } + + return 0; +} + static xfs_failaddr_t xfs_rtrmapbt_verify( struct xfs_buf *bp) @@ -118,6 +296,86 @@ const struct xfs_buf_ops xfs_rtrmapbt_buf_ops = { .verify_struct = xfs_rtrmapbt_verify, }; +STATIC int +xfs_rtrmapbt_keys_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, + const union xfs_btree_key *k2) +{ + uint32_t x; + uint32_t y; + uint64_t a; + uint64_t b; + + x = be32_to_cpu(k1->rmap.rm_startblock); + y = be32_to_cpu(k2->rmap.rm_startblock); + if (x < y) + return 1; + else if (x > y) + return 0; + a = be64_to_cpu(k1->rmap.rm_owner); + b = be64_to_cpu(k2->rmap.rm_owner); + if (a < b) + return 1; + else if (a > b) + return 0; + a = offset_keymask(be64_to_cpu(k1->rmap.rm_offset)); + b = offset_keymask(be64_to_cpu(k2->rmap.rm_offset)); + if (a <= b) + return 1; + return 0; +} + +STATIC int +xfs_rtrmapbt_recs_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_rec *r1, + const union xfs_btree_rec *r2) +{ + uint32_t x; + uint32_t y; + uint64_t a; + uint64_t b; + + x = be32_to_cpu(r1->rmap.rm_startblock); + y = be32_to_cpu(r2->rmap.rm_startblock); + if (x < y) + return 1; + else if (x > y) + return 0; + a = be64_to_cpu(r1->rmap.rm_owner); + b = be64_to_cpu(r2->rmap.rm_owner); + if (a < b) + return 1; + else if (a > b) + return 0; + a = offset_keymask(be64_to_cpu(r1->rmap.rm_offset)); + b = offset_keymask(be64_to_cpu(r2->rmap.rm_offset)); + if (a <= b) + return 1; + return 0; +} + +STATIC enum xbtree_key_contig +xfs_rtrmapbt_keys_contiguous( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key1, + const union xfs_btree_key *key2, + const union xfs_btree_key *mask) +{ + ASSERT(!mask || mask->rmap.rm_startblock); + + /* + * We only support checking contiguity of the physical space component. 
+ * If any callers ever need more specificity than that, they'll have to + * implement it here. + */ + ASSERT(!mask || (!mask->rmap.rm_owner && !mask->rmap.rm_offset)); + + return xbtree_key_contig(be32_to_cpu(key1->rmap.rm_startblock), + be32_to_cpu(key2->rmap.rm_startblock)); +} + const struct xfs_btree_ops xfs_rtrmapbt_ops = { .rec_len = sizeof(struct xfs_rmap_rec), .key_len = 2 * sizeof(struct xfs_rmap_key), @@ -127,7 +385,20 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { XFS_BTREE_IROOT_RECORDS, .dup_cursor = xfs_rtrmapbt_dup_cursor, + .alloc_block = xfs_btree_alloc_imeta_block, + .free_block = xfs_btree_free_imeta_block, + .get_minrecs = xfs_rtrmapbt_get_minrecs, + .get_maxrecs = xfs_rtrmapbt_get_maxrecs, + .init_key_from_rec = xfs_rtrmapbt_init_key_from_rec, + .init_high_key_from_rec = xfs_rtrmapbt_init_high_key_from_rec, + .init_rec_from_cur = xfs_rtrmapbt_init_rec_from_cur, + .init_ptr_from_cur = xfs_rtrmapbt_init_ptr_from_cur, + .key_diff = xfs_rtrmapbt_key_diff, .buf_ops = &xfs_rtrmapbt_buf_ops, + .diff_two_keys = xfs_rtrmapbt_diff_two_keys, + .keys_inorder = xfs_rtrmapbt_keys_inorder, + .recs_inorder = xfs_rtrmapbt_recs_inorder, + .keys_contiguous = xfs_rtrmapbt_keys_contiguous, }; /* Initialize a new rt rmap btree cursor. */

From patchwork Sun Dec 31 21:33:37 2023
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13507671
Date: Sun, 31 Dec 2023 13:33:37 -0800
Subject: [PATCH 07/39] xfs: prepare rmap functions to deal with rtrmapbt
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404850014.1764998.4302501890890069104.stgit@frogsfrogsfrogs>
In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Prepare the high-level rmap functions to deal with the new realtime rmapbt and its slightly different conventions. Provide the ability to talk to either rmapbt or rtrmapbt formats from the same high-level code.

Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_rmap.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rmap.h | 3 ++ 2 files changed, 69 insertions(+) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index b3383cc474492..35df4e832996e 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -25,6 +25,7 @@ #include "xfs_ag.h" #include "xfs_health.h" #include "xfs_rmap_item.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_rmap_intent_cache; @@ -264,11 +265,72 @@ xfs_rmap_check_irec( return NULL; } +xfs_failaddr_t +xfs_rtrmap_check_irec( + struct xfs_rtgroup *rtg, + const struct xfs_rmap_irec *irec) +{ + struct xfs_mount *mp = rtg->rtg_mount; + bool is_inode; + bool is_unwritten; + bool is_bmbt; + bool is_attr; + + if (irec->rm_blockcount == 0) + return __this_address; + + if (irec->rm_owner == XFS_RMAP_OWN_FS) { + if (irec->rm_startblock != 0) + return __this_address; + if (irec->rm_blockcount != mp->m_sb.sb_rextsize) + return __this_address; + if (irec->rm_offset != 0) + return __this_address; + } else { + if (!xfs_verify_rgbext(rtg, irec->rm_startblock, + irec->rm_blockcount)) + return __this_address; + } + + if (!(xfs_verify_ino(mp, irec->rm_owner) || + (irec->rm_owner <= XFS_RMAP_OWN_FS && + irec->rm_owner >= XFS_RMAP_OWN_MIN))) + return __this_address; + + /* Check flags. */ + is_inode = !XFS_RMAP_NON_INODE_OWNER(irec->rm_owner); + is_bmbt = irec->rm_flags & XFS_RMAP_BMBT_BLOCK; + is_attr = irec->rm_flags & XFS_RMAP_ATTR_FORK; + is_unwritten = irec->rm_flags & XFS_RMAP_UNWRITTEN; + + if (!is_inode && irec->rm_owner != XFS_RMAP_OWN_FS) + return __this_address; + + if (!is_inode && irec->rm_offset != 0) + return __this_address; + + if (is_bmbt || is_attr) + return __this_address; + + if (is_unwritten && !is_inode) + return __this_address; + + /* Check for a valid fork offset, if applicable. 
*/ + if (is_inode && + !xfs_verify_fileext(mp, irec->rm_offset, irec->rm_blockcount)) + return __this_address; + + return NULL; +} + static inline xfs_failaddr_t xfs_rmap_check_btrec( struct xfs_btree_cur *cur, const struct xfs_rmap_irec *irec) { + if (cur->bc_btnum == XFS_BTNUM_RTRMAP) + return xfs_rtrmap_check_irec(cur->bc_ino.rtg, irec); + if (cur->bc_flags & XFS_BTREE_IN_XFILE) return xfs_rmap_check_irec(cur->bc_mem.pag, irec); return xfs_rmap_check_irec(cur->bc_ag.pag, irec); @@ -285,6 +347,10 @@ xfs_rmap_complain_bad_rec( if (cur->bc_flags & XFS_BTREE_IN_XFILE) xfs_warn(mp, "In-Memory Reverse Mapping BTree record corruption detected at %pS!", fa); + else if (cur->bc_btnum == XFS_BTNUM_RTRMAP) + xfs_warn(mp, + "RT Reverse Mapping BTree record corruption in rtgroup %u detected at %pS!", + cur->bc_ino.rtg->rtg_rgno, fa); else xfs_warn(mp, "Reverse Mapping BTree record corruption in AG %d detected at %pS!", diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h index 0ccfd7d88e56e..762f2f40b6e47 100644 --- a/fs/xfs/libxfs/xfs_rmap.h +++ b/fs/xfs/libxfs/xfs_rmap.h @@ -7,6 +7,7 @@ #define __XFS_RMAP_H__ struct xfs_perag; +struct xfs_rtgroup; static inline void xfs_rmap_ino_bmbt_owner( @@ -206,6 +207,8 @@ xfs_failaddr_t xfs_rmap_btrec_to_irec(const union xfs_btree_rec *rec, struct xfs_rmap_irec *irec); xfs_failaddr_t xfs_rmap_check_irec(struct xfs_perag *pag, const struct xfs_rmap_irec *irec); +xfs_failaddr_t xfs_rtrmap_check_irec(struct xfs_rtgroup *rtg, + const struct xfs_rmap_irec *irec); int xfs_rmap_has_records(struct xfs_btree_cur *cur, xfs_agblock_t bno, xfs_extlen_t len, enum xbtree_recpacking *outcome);

From patchwork Sun Dec 31 21:33:52 2023
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13507672
Date: Sun, 31 Dec 2023 13:33:52 -0800
Subject: [PATCH 08/39] xfs: add a realtime flag to the rmap update log redo items
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <170404850030.1764998.6821619909784140893.stgit@frogsfrogsfrogs>
In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Extend the rmap update (RUI) log items with a new realtime flag that indicates that the updates apply against the realtime rmapbt. We'll wire up the actual rmap code later.

Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_defer.h | 1 fs/xfs/libxfs/xfs_log_format.h | 6 + fs/xfs/libxfs/xfs_log_recover.h | 2 fs/xfs/libxfs/xfs_refcount.c | 4 - fs/xfs/libxfs/xfs_rmap.c | 32 ++++- fs/xfs/libxfs/xfs_rmap.h | 12 +- fs/xfs/scrub/alloc_repair.c | 2 fs/xfs/xfs_log_recover.c | 2 fs/xfs/xfs_rmap_item.c | 237 +++++++++++++++++++++++++++++++++++++-- fs/xfs/xfs_trace.h | 23 +++- 10 files changed, 293 insertions(+), 28 deletions(-) diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h index b4e1c386768c9..fddcb4cccbcc2 100644 --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -69,6 +69,7 @@ struct xfs_defer_op_type { extern const struct xfs_defer_op_type xfs_bmap_update_defer_type; extern const struct xfs_defer_op_type xfs_refcount_update_defer_type; extern const struct xfs_defer_op_type xfs_rmap_update_defer_type; +extern const struct xfs_defer_op_type xfs_rtrmap_update_defer_type; extern const struct xfs_defer_op_type xfs_extent_free_defer_type; extern const struct xfs_defer_op_type xfs_agfl_free_defer_type; extern const struct xfs_defer_op_type xfs_rtextent_free_defer_type; diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index 1f5fe4a588eca..ea4e88d665707 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -250,6 +250,8 @@ typedef struct xfs_trans_header { #define XFS_LI_SXD 0x1249 /* extent swap done */ #define XFS_LI_EFI_RT 0x124a /* realtime extent free intent */ #define XFS_LI_EFD_RT 0x124b /* realtime extent free done */ +#define XFS_LI_RUI_RT 0x124c /* realtime rmap update intent */ +#define XFS_LI_RUD_RT 0x124d /* realtime rmap update done */ #define XFS_LI_TYPE_DESC \ { XFS_LI_EFI, "XFS_LI_EFI" }, \ @@ -271,7 +273,9 @@ typedef struct xfs_trans_header { { XFS_LI_SXI, "XFS_LI_SXI" }, \ { XFS_LI_SXD, "XFS_LI_SXD" }, \ { XFS_LI_EFI_RT, "XFS_LI_EFI_RT" }, \ - { XFS_LI_EFD_RT, "XFS_LI_EFD_RT" } + { XFS_LI_EFD_RT, "XFS_LI_EFD_RT" }, \ + { XFS_LI_RUI_RT, "XFS_LI_RUI_RT" }, \ + { 
XFS_LI_RUD_RT, "XFS_LI_RUD_RT" } /* * Inode Log Item Format definitions. diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h index 811c37026d251..433974693d10b 100644 --- a/fs/xfs/libxfs/xfs_log_recover.h +++ b/fs/xfs/libxfs/xfs_log_recover.h @@ -79,6 +79,8 @@ extern const struct xlog_recover_item_ops xlog_sxi_item_ops; extern const struct xlog_recover_item_ops xlog_sxd_item_ops; extern const struct xlog_recover_item_ops xlog_rtefi_item_ops; extern const struct xlog_recover_item_ops xlog_rtefd_item_ops; +extern const struct xlog_recover_item_ops xlog_rtrui_item_ops; +extern const struct xlog_recover_item_ops xlog_rtrud_item_ops; /* * Macros, structures, prototypes for internal log manager use. diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 6f7ec83281656..7f4433b2a5dd3 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -1887,7 +1887,7 @@ xfs_refcount_alloc_cow_extent( __xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len); /* Add rmap entry */ - xfs_rmap_alloc_extent(tp, fsb, len, XFS_RMAP_OWN_COW); + xfs_rmap_alloc_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); } /* Forget a CoW staging event in the refcount btree. */ @@ -1903,7 +1903,7 @@ xfs_refcount_free_cow_extent( return; /* Remove rmap entry */ - xfs_rmap_free_extent(tp, fsb, len, XFS_RMAP_OWN_COW); + xfs_rmap_free_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); __xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len); } diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 35df4e832996e..daef2d67eb7a0 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -2682,6 +2682,21 @@ xfs_rmap_finish_one( return 0; } +/* + * Process one of the deferred realtime rmap operations. We pass back the + * btree cursor to reduce overhead. 
+ */ +int +xfs_rtrmap_finish_one( + struct xfs_trans *tp, + struct xfs_rmap_intent *ri, + struct xfs_btree_cur **pcur) +{ + /* coming in a subsequent patch */ + ASSERT(0); + return -EFSCORRUPTED; +} + /* * Don't defer an rmap if we aren't an rmap filesystem. */ @@ -2702,6 +2717,7 @@ __xfs_rmap_add( struct xfs_trans *tp, enum xfs_rmap_intent_type type, uint64_t owner, + bool isrt, int whichfork, struct xfs_bmbt_irec *bmap) { @@ -2713,6 +2729,7 @@ __xfs_rmap_add( ri->ri_owner = owner; ri->ri_whichfork = whichfork; ri->ri_bmap = *bmap; + ri->ri_realtime = isrt; xfs_rmap_defer_add(tp, ri); } @@ -2726,6 +2743,7 @@ xfs_rmap_map_extent( struct xfs_bmbt_irec *PREV) { enum xfs_rmap_intent_type type = XFS_RMAP_MAP; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); if (!xfs_rmap_update_is_needed(tp->t_mountp, whichfork)) return; @@ -2733,7 +2751,7 @@ xfs_rmap_map_extent( if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip)) type = XFS_RMAP_MAP_SHARED; - __xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV); + __xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV); } /* Unmap an extent out of a file. 
*/ @@ -2745,6 +2763,7 @@ xfs_rmap_unmap_extent( struct xfs_bmbt_irec *PREV) { enum xfs_rmap_intent_type type = XFS_RMAP_UNMAP; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); if (!xfs_rmap_update_is_needed(tp->t_mountp, whichfork)) return; @@ -2752,7 +2771,7 @@ xfs_rmap_unmap_extent( if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip)) type = XFS_RMAP_UNMAP_SHARED; - __xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV); + __xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV); } /* @@ -2770,6 +2789,7 @@ xfs_rmap_convert_extent( struct xfs_bmbt_irec *PREV) { enum xfs_rmap_intent_type type = XFS_RMAP_CONVERT; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); if (!xfs_rmap_update_is_needed(mp, whichfork)) return; @@ -2777,13 +2797,14 @@ xfs_rmap_convert_extent( if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip)) type = XFS_RMAP_CONVERT_SHARED; - __xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV); + __xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV); } /* Schedule the creation of an rmap for non-file data. */ void xfs_rmap_alloc_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner) @@ -2798,13 +2819,14 @@ xfs_rmap_alloc_extent( bmap.br_startoff = 0; bmap.br_state = XFS_EXT_NORM; - __xfs_rmap_add(tp, XFS_RMAP_ALLOC, owner, XFS_DATA_FORK, &bmap); + __xfs_rmap_add(tp, XFS_RMAP_ALLOC, owner, isrt, XFS_DATA_FORK, &bmap); } /* Schedule the deletion of an rmap for non-file data. */ void xfs_rmap_free_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner) @@ -2819,7 +2841,7 @@ xfs_rmap_free_extent( bmap.br_startoff = 0; bmap.br_state = XFS_EXT_NORM; - __xfs_rmap_add(tp, XFS_RMAP_FREE, owner, XFS_DATA_FORK, &bmap); + __xfs_rmap_add(tp, XFS_RMAP_FREE, owner, isrt, XFS_DATA_FORK, &bmap); } /* Compare rmap records. Returns -1 if a < b, 1 if a > b, and 0 if equal. 
*/ diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h index 762f2f40b6e47..3719fc4cbc26b 100644 --- a/fs/xfs/libxfs/xfs_rmap.h +++ b/fs/xfs/libxfs/xfs_rmap.h @@ -174,7 +174,11 @@ struct xfs_rmap_intent { int ri_whichfork; uint64_t ri_owner; struct xfs_bmbt_irec ri_bmap; - struct xfs_perag *ri_pag; + union { + struct xfs_perag *ri_pag; + struct xfs_rtgroup *ri_rtg; + }; + bool ri_realtime; }; /* functions for updating the rmapbt based on bmbt map/unmap operations */ @@ -185,11 +189,13 @@ void xfs_rmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip, void xfs_rmap_convert_extent(struct xfs_mount *mp, struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *imap); -void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno, +void xfs_rmap_alloc_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner); -void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno, +void xfs_rmap_free_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner); +int xfs_rtrmap_finish_one(struct xfs_trans *tp, struct xfs_rmap_intent *ri, + struct xfs_btree_cur **pcur); int xfs_rmap_finish_one(struct xfs_trans *tp, struct xfs_rmap_intent *ri, struct xfs_btree_cur **pcur); int __xfs_rmap_finish_intent(struct xfs_btree_cur *rcur, diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c index 3805099cb578b..d4ea1afb238a0 100644 --- a/fs/xfs/scrub/alloc_repair.c +++ b/fs/xfs/scrub/alloc_repair.c @@ -546,7 +546,7 @@ xrep_abt_dispose_one( xfs_fsblock_t fsbno; fsbno = XFS_AGB_TO_FSB(sc->mp, pag->pag_agno, resv->agbno); - xfs_rmap_alloc_extent(sc->tp, fsbno, resv->used, + xfs_rmap_alloc_extent(sc->tp, false, fsbno, resv->used, XFS_RMAP_OWN_AG); } diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 0aeca77d511d0..1efb69fcadf10 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -1795,6 +1795,8 @@ static const 
struct xlog_recover_item_ops *xlog_recover_item_ops[] = { &xlog_sxd_item_ops, &xlog_rtefi_item_ops, &xlog_rtefd_item_ops, + &xlog_rtrui_item_ops, + &xlog_rtrud_item_ops, }; static const struct xlog_recover_item_ops * diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index e2ee1b6719202..229b5127d4716 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -23,6 +23,7 @@ #include "xfs_ag.h" #include "xfs_btree.h" #include "xfs_trace.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_rui_cache; struct kmem_cache *xfs_rud_cache; @@ -94,7 +95,9 @@ xfs_rui_item_format( ASSERT(atomic_read(&ruip->rui_next_extent) == ruip->rui_format.rui_nextents); - ruip->rui_format.rui_type = XFS_LI_RUI; + ASSERT(lip->li_type == XFS_LI_RUI || lip->li_type == XFS_LI_RUI_RT); + + ruip->rui_format.rui_type = lip->li_type; ruip->rui_format.rui_size = 1; xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_RUI_FORMAT, &ruip->rui_format, @@ -137,19 +140,22 @@ xfs_rui_item_release( STATIC struct xfs_rui_log_item * xfs_rui_init( struct xfs_mount *mp, + unsigned short item_type, uint nextents) { struct xfs_rui_log_item *ruip; ASSERT(nextents > 0); + ASSERT(item_type == XFS_LI_RUI || item_type == XFS_LI_RUI_RT); + if (nextents > XFS_RUI_MAX_FAST_EXTENTS) ruip = kmem_zalloc(xfs_rui_log_item_sizeof(nextents), 0); else ruip = kmem_cache_zalloc(xfs_rui_cache, GFP_KERNEL | __GFP_NOFAIL); - xfs_log_item_init(mp, &ruip->rui_item, XFS_LI_RUI, &xfs_rui_item_ops); + xfs_log_item_init(mp, &ruip->rui_item, item_type, &xfs_rui_item_ops); ruip->rui_format.rui_nextents = nextents; ruip->rui_format.rui_id = (uintptr_t)(void *)ruip; atomic_set(&ruip->rui_next_extent, 0); @@ -188,7 +194,9 @@ xfs_rud_item_format( struct xfs_rud_log_item *rudp = RUD_ITEM(lip); struct xfs_log_iovec *vecp = NULL; - rudp->rud_format.rud_type = XFS_LI_RUD; + ASSERT(lip->li_type == XFS_LI_RUD || lip->li_type == XFS_LI_RUD_RT); + + rudp->rud_format.rud_type = lip->li_type; rudp->rud_format.rud_size = 1; xlog_copy_iovec(lv, 
&vecp, XLOG_REG_TYPE_RUD_FORMAT, &rudp->rud_format, @@ -232,6 +240,14 @@ static inline struct xfs_rmap_intent *ri_entry(const struct list_head *e) return list_entry(e, struct xfs_rmap_intent, ri_list); } +static inline bool +xfs_rui_item_isrt(const struct xfs_log_item *lip) +{ + ASSERT(lip->li_type == XFS_LI_RUI || lip->li_type == XFS_LI_RUI_RT); + + return lip->li_type == XFS_LI_RUI_RT; +} + /* Sort rmap intents by AG. */ static int xfs_rmap_update_diff_items( @@ -311,11 +327,12 @@ xfs_rmap_update_create_intent( bool sort) { struct xfs_mount *mp = tp->t_mountp; - struct xfs_rui_log_item *ruip = xfs_rui_init(mp, count); + struct xfs_rui_log_item *ruip; struct xfs_rmap_intent *ri; ASSERT(count > 0); + ruip = xfs_rui_init(mp, XFS_LI_RUI, count); if (sort) list_sort(mp, items, xfs_rmap_update_diff_items); list_for_each_entry(ri, items, ri_list) @@ -323,6 +340,12 @@ xfs_rmap_update_create_intent( return &ruip->rui_item; } +static inline unsigned short +xfs_rud_type_from_rui(const struct xfs_rui_log_item *ruip) +{ + return xfs_rui_item_isrt(&ruip->rui_item) ? XFS_LI_RUD_RT : XFS_LI_RUD; +} + /* Get an RUD so we can process all the deferred rmap updates. 
*/ static struct xfs_log_item * xfs_rmap_update_create_done( @@ -334,8 +357,8 @@ xfs_rmap_update_create_done( struct xfs_rud_log_item *rudp; rudp = kmem_cache_zalloc(xfs_rud_cache, GFP_KERNEL | __GFP_NOFAIL); - xfs_log_item_init(tp->t_mountp, &rudp->rud_item, XFS_LI_RUD, - &xfs_rud_item_ops); + xfs_log_item_init(tp->t_mountp, &rudp->rud_item, + xfs_rud_type_from_rui(ruip), &xfs_rud_item_ops); rudp->rud_ruip = ruip; rudp->rud_format.rud_rui_id = ruip->rui_format.rui_id; @@ -352,8 +375,23 @@ xfs_rmap_defer_add( trace_xfs_rmap_defer(mp, ri); - ri->ri_pag = xfs_perag_intent_get(mp, ri->ri_bmap.br_startblock); - xfs_defer_add(tp, &ri->ri_list, &xfs_rmap_update_defer_type); + /* + * Deferred rmap updates for the realtime and data sections must use + * separate transactions to finish deferred work because updates to + * realtime metadata files can lock AGFs to allocate btree blocks and + * we don't want that mixing with the AGF locks taken to finish data + * section updates. + */ + if (ri->ri_realtime) { + xfs_rgnumber_t rgno; + + rgno = xfs_rtb_to_rgno(mp, ri->ri_bmap.br_startblock); + ri->ri_rtg = xfs_rtgroup_get(mp, rgno); + xfs_defer_add(tp, &ri->ri_list, &xfs_rtrmap_update_defer_type); + } else { + ri->ri_pag = xfs_perag_intent_get(mp, ri->ri_bmap.br_startblock); + xfs_defer_add(tp, &ri->ri_list, &xfs_rmap_update_defer_type); + } } /* Cancel a deferred rmap update. 
*/ @@ -564,10 +602,12 @@ xfs_rmap_relog_intent( struct xfs_map_extent *map; unsigned int count; + ASSERT(intent->li_type == XFS_LI_RUI || intent->li_type == XFS_LI_RUI_RT); + count = RUI_ITEM(intent)->rui_format.rui_nextents; map = RUI_ITEM(intent)->rui_format.rui_extents; - ruip = xfs_rui_init(tp->t_mountp, count); + ruip = xfs_rui_init(tp->t_mountp, intent->li_type, count); memcpy(ruip->rui_format.rui_extents, map, count * sizeof(*map)); atomic_set(&ruip->rui_next_extent, count); @@ -587,6 +627,98 @@ const struct xfs_defer_op_type xfs_rmap_update_defer_type = { .relog_intent = xfs_rmap_relog_intent, }; +#ifdef CONFIG_XFS_RT +/* Sort rmap intents by rtgroup. */ +static int +xfs_rtrmap_update_diff_items( + void *priv, + const struct list_head *a, + const struct list_head *b) +{ + struct xfs_rmap_intent *ra = ri_entry(a); + struct xfs_rmap_intent *rb = ri_entry(b); + + return ra->ri_rtg->rtg_rgno - rb->ri_rtg->rtg_rgno; +} + +static struct xfs_log_item * +xfs_rtrmap_update_create_intent( + struct xfs_trans *tp, + struct list_head *items, + unsigned int count, + bool sort) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_rui_log_item *ruip; + struct xfs_rmap_intent *ri; + + ASSERT(count > 0); + + ruip = xfs_rui_init(mp, XFS_LI_RUI_RT, count); + if (sort) + list_sort(mp, items, xfs_rtrmap_update_diff_items); + list_for_each_entry(ri, items, ri_list) + xfs_rmap_update_log_item(tp, ruip, ri); + return &ruip->rui_item; +} + +/* Cancel a deferred realtime rmap update. */ +STATIC void +xfs_rtrmap_update_cancel_item( + struct list_head *item) +{ + struct xfs_rmap_intent *ri = ri_entry(item); + + xfs_rtgroup_put(ri->ri_rtg); + kmem_cache_free(xfs_rmap_intent_cache, ri); +} + +/* Process a deferred realtime rmap update. 
*/ +STATIC int +xfs_rtrmap_update_finish_item( + struct xfs_trans *tp, + struct xfs_log_item *done, + struct list_head *item, + struct xfs_btree_cur **state) +{ + struct xfs_rmap_intent *ri = ri_entry(item); + int error; + + error = xfs_rtrmap_finish_one(tp, ri, state); + + xfs_rtrmap_update_cancel_item(item); + return error; +} + +/* Clean up after calling xfs_rtrmap_finish_one. */ +STATIC void +xfs_rtrmap_finish_one_cleanup( + struct xfs_trans *tp, + struct xfs_btree_cur *rcur, + int error) +{ + if (rcur) + xfs_btree_del_cursor(rcur, error); +} + +const struct xfs_defer_op_type xfs_rtrmap_update_defer_type = { + .name = "rtrmap", + .max_items = XFS_RUI_MAX_FAST_EXTENTS, + .create_intent = xfs_rtrmap_update_create_intent, + .abort_intent = xfs_rmap_update_abort_intent, + .create_done = xfs_rmap_update_create_done, + .finish_item = xfs_rtrmap_update_finish_item, + .finish_cleanup = xfs_rtrmap_finish_one_cleanup, + .cancel_item = xfs_rtrmap_update_cancel_item, + .recover_work = xfs_rmap_recover_work, + .relog_intent = xfs_rmap_relog_intent, +}; +#else +const struct xfs_defer_op_type xfs_rtrmap_update_defer_type = { + .name = "rtrmap", +}; +#endif + STATIC bool xfs_rui_item_match( struct xfs_log_item *lip, @@ -652,7 +784,7 @@ xlog_recover_rui_commit_pass2( return -EFSCORRUPTED; } - ruip = xfs_rui_init(mp, rui_formatp->rui_nextents); + ruip = xfs_rui_init(mp, ITEM_TYPE(item), rui_formatp->rui_nextents); xfs_rui_copy_format(&ruip->rui_format, rui_formatp); atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents); @@ -666,6 +798,61 @@ const struct xlog_recover_item_ops xlog_rui_item_ops = { .commit_pass2 = xlog_recover_rui_commit_pass2, }; +#ifdef CONFIG_XFS_RT +STATIC int +xlog_recover_rtrui_commit_pass2( + struct xlog *log, + struct list_head *buffer_list, + struct xlog_recover_item *item, + xfs_lsn_t lsn) +{ + struct xfs_mount *mp = log->l_mp; + struct xfs_rui_log_item *ruip; + struct xfs_rui_log_format *rui_formatp; + size_t len; + + rui_formatp = 
item->ri_buf[0].i_addr; + + if (item->ri_buf[0].i_len < xfs_rui_log_format_sizeof(0)) { + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, + item->ri_buf[0].i_addr, item->ri_buf[0].i_len); + return -EFSCORRUPTED; + } + + len = xfs_rui_log_format_sizeof(rui_formatp->rui_nextents); + if (item->ri_buf[0].i_len != len) { + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, + item->ri_buf[0].i_addr, item->ri_buf[0].i_len); + return -EFSCORRUPTED; + } + + ruip = xfs_rui_init(mp, ITEM_TYPE(item), rui_formatp->rui_nextents); + xfs_rui_copy_format(&ruip->rui_format, rui_formatp); + atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents); + + xlog_recover_intent_item(log, &ruip->rui_item, lsn, + &xfs_rtrmap_update_defer_type); + return 0; +} +#else +STATIC int +xlog_recover_rtrui_commit_pass2( + struct xlog *log, + struct list_head *buffer_list, + struct xlog_recover_item *item, + xfs_lsn_t lsn) +{ + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, log->l_mp, + item->ri_buf[0].i_addr, item->ri_buf[0].i_len); + return -EFSCORRUPTED; +} +#endif + +const struct xlog_recover_item_ops xlog_rtrui_item_ops = { + .item_type = XFS_LI_RUI_RT, + .commit_pass2 = xlog_recover_rtrui_commit_pass2, +}; + /* * This routine is called when an RUD format structure is found in a committed * transaction in the log. 
Its purpose is to cancel the corresponding RUI if it @@ -697,3 +884,33 @@ const struct xlog_recover_item_ops xlog_rud_item_ops = { .item_type = XFS_LI_RUD, .commit_pass2 = xlog_recover_rud_commit_pass2, }; + +#ifdef CONFIG_XFS_RT +STATIC int +xlog_recover_rtrud_commit_pass2( + struct xlog *log, + struct list_head *buffer_list, + struct xlog_recover_item *item, + xfs_lsn_t lsn) +{ + struct xfs_rud_log_format *rud_formatp; + + rud_formatp = item->ri_buf[0].i_addr; + if (item->ri_buf[0].i_len != sizeof(struct xfs_rud_log_format)) { + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, log->l_mp, + rud_formatp, item->ri_buf[0].i_len); + return -EFSCORRUPTED; + } + + xlog_recover_release_intent(log, XFS_LI_RUI_RT, + rud_formatp->rud_rui_id); + return 0; +} +#else +# define xlog_recover_rtrud_commit_pass2 xlog_recover_rtrui_commit_pass2 +#endif + +const struct xlog_recover_item_ops xlog_rtrud_item_ops = { + .item_type = XFS_LI_RUD_RT, + .commit_pass2 = xlog_recover_rtrud_commit_pass2, +}; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 1c89d38b85446..10eeceb8b9e7f 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3017,9 +3017,10 @@ DECLARE_EVENT_CLASS(xfs_rmap_deferred_class, TP_ARGS(mp, ri), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(unsigned long long, owner) __field(xfs_agnumber_t, agno) - __field(xfs_agblock_t, agbno) + __field(xfs_agblock_t, rmapbno) __field(int, whichfork) __field(xfs_fileoff_t, l_loff) __field(xfs_filblks_t, l_len) @@ -3028,9 +3029,18 @@ DECLARE_EVENT_CLASS(xfs_rmap_deferred_class, ), TP_fast_assign( __entry->dev = mp->m_super->s_dev; - __entry->agno = XFS_FSB_TO_AGNO(mp, ri->ri_bmap.br_startblock); - __entry->agbno = XFS_FSB_TO_AGBNO(mp, - ri->ri_bmap.br_startblock); + if (ri->ri_realtime) { + __entry->opdev = mp->m_rtdev_targp->bt_dev; + __entry->rmapbno = xfs_rtb_to_rgbno(mp, + ri->ri_bmap.br_startblock, + &__entry->agno); + } else { + __entry->agno = XFS_FSB_TO_AGNO(mp, + 
ri->ri_bmap.br_startblock); + __entry->opdev = __entry->dev; + __entry->rmapbno = XFS_FSB_TO_AGBNO(mp, + ri->ri_bmap.br_startblock); + } __entry->owner = ri->ri_owner; __entry->whichfork = ri->ri_whichfork; __entry->l_loff = ri->ri_bmap.br_startoff; @@ -3038,11 +3048,12 @@ DECLARE_EVENT_CLASS(xfs_rmap_deferred_class, __entry->l_state = ri->ri_bmap.br_state; __entry->op = ri->ri_type; ), - TP_printk("dev %d:%d op %s agno 0x%x agbno 0x%x owner 0x%llx %s fileoff 0x%llx fsbcount 0x%llx state %d", + TP_printk("dev %d:%d op %s opdev %d:%d agno 0x%x rmapbno 0x%x owner 0x%llx %s fileoff 0x%llx fsbcount 0x%llx state %d", MAJOR(__entry->dev), MINOR(__entry->dev), __print_symbolic(__entry->op, XFS_RMAP_INTENT_STRINGS), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, - __entry->agbno, + __entry->rmapbno, __entry->owner, __print_symbolic(__entry->whichfork, XFS_WHICHFORK_STRINGS), __entry->l_loff, From patchwork Sun Dec 31 21:34:08 2023 X-Patchwork-Submitter: "Darrick J.
Wong" X-Patchwork-Id: 13507673 Date: Sun, 31 Dec 2023 13:34:08 -0800 Subject: [PATCH 09/39] xfs: support recovering rmap intent items targetting realtime extents From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850047.1764998.4734291595014223853.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> From: Darrick J. Wong Now that we have rmap on the realtime device, log recovery has to support remapping extents on the realtime volume. Make this work. Signed-off-by: Darrick J.
Wong --- fs/xfs/xfs_rmap_item.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 229b5127d4716..580baf3b1b1d3 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -451,6 +451,7 @@ xfs_rmap_update_abort_intent( static inline bool xfs_rui_validate_map( struct xfs_mount *mp, + bool isrt, struct xfs_map_extent *map) { if (!xfs_has_rmapbt(mp)) @@ -480,6 +481,9 @@ xfs_rui_validate_map( if (!xfs_verify_fileext(mp, map->me_startoff, map->me_len)) return false; + if (isrt) + return xfs_verify_rtbext(mp, map->me_startblock, map->me_len); + return xfs_verify_fsbext(mp, map->me_startblock, map->me_len); } @@ -487,6 +491,7 @@ static inline void xfs_rui_recover_work( struct xfs_mount *mp, struct xfs_defer_pending *dfp, + bool isrt, const struct xfs_map_extent *map) { struct xfs_rmap_intent *ri; @@ -531,7 +536,15 @@ xfs_rui_recover_work( ri->ri_bmap.br_blockcount = map->me_len; ri->ri_bmap.br_state = (map->me_flags & XFS_RMAP_EXTENT_UNWRITTEN) ? XFS_EXT_UNWRITTEN : XFS_EXT_NORM; - ri->ri_pag = xfs_perag_intent_get(mp, map->me_startblock); + ri->ri_realtime = isrt; + if (isrt) { + xfs_rgnumber_t rgno; + + rgno = xfs_rtb_to_rgno(mp, map->me_startblock); + ri->ri_rtg = xfs_rtgroup_get(mp, rgno); + } else { + ri->ri_pag = xfs_perag_intent_get(mp, map->me_startblock); + } xfs_defer_add_item(dfp, &ri->ri_list); } @@ -550,6 +563,7 @@ xfs_rmap_recover_work( struct xfs_rui_log_item *ruip = RUI_ITEM(lip); struct xfs_trans *tp; struct xfs_mount *mp = lip->li_log->l_mp; + bool isrt = xfs_rui_item_isrt(lip); int i; int error = 0; @@ -559,7 +573,7 @@ xfs_rmap_recover_work( * just toss the RUI. 
*/ for (i = 0; i < ruip->rui_format.rui_nextents; i++) { - if (!xfs_rui_validate_map(mp, + if (!xfs_rui_validate_map(mp, isrt, &ruip->rui_format.rui_extents[i])) { XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, &ruip->rui_format, @@ -567,7 +581,8 @@ xfs_rmap_recover_work( return -EFSCORRUPTED; } - xfs_rui_recover_work(mp, dfp, &ruip->rui_format.rui_extents[i]); + xfs_rui_recover_work(mp, dfp, isrt, + &ruip->rui_format.rui_extents[i]); } resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate); From patchwork Sun Dec 31 21:34:23 2023 X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507674 Date: Sun, 31 Dec 2023 13:34:23 -0800 Subject: [PATCH 10/39] xfs: add realtime rmap btree block detection to log recovery From: "Darrick J.
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850062.1764998.3010689403938620693.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> From: Darrick J. Wong Identify rtrmapbt blocks in the log correctly so that we can validate them during log recovery. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_buf_item_recover.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index 2e617041161e0..c7d86636bd312 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -259,6 +259,9 @@ xlog_recover_validate_buf_type( case XFS_BMAP_MAGIC: bp->b_ops = &xfs_bmbt_buf_ops; break; + case XFS_RTRMAP_CRC_MAGIC: + bp->b_ops = &xfs_rtrmapbt_buf_ops; + break; case XFS_RMAP_CRC_MAGIC: bp->b_ops = &xfs_rmapbt_buf_ops; break; @@ -768,6 +771,7 @@ xlog_recover_get_buf_lsn( uuid = &btb->bb_u.s.bb_uuid; break; } + case XFS_RTRMAP_CRC_MAGIC: case XFS_BMAP_CRC_MAGIC: case XFS_BMAP_MAGIC: { struct xfs_btree_block *btb = blk; From patchwork Sun Dec 31 21:34:39 2023 X-Patchwork-Submitter: "Darrick J.
Wong" X-Patchwork-Id: 13507675 Date: Sun, 31 Dec 2023 13:34:39 -0800 Subject: [PATCH 11/39] xfs: add realtime reverse map inode to metadata directory From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850078.1764998.2120978271688136761.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> From: Darrick J. Wong Add a metadir path to select the realtime rmap btree inode and load it at mount time. The rtrmapbt inode will have a unique extent format code, which means that we also have to update the inode validation and flush routines to look for it. Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_format.h | 6 ++- fs/xfs/libxfs/xfs_inode_buf.c | 10 +++++ fs/xfs/libxfs/xfs_inode_fork.c | 9 +++++ fs/xfs/libxfs/xfs_rtgroup.h | 3 ++ fs/xfs/libxfs/xfs_rtrmap_btree.c | 33 ++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 4 ++ fs/xfs/xfs_inode.c | 19 ++++++++++ fs/xfs/xfs_inode_item.c | 2 + fs/xfs/xfs_inode_item_recover.c | 1 + fs/xfs/xfs_rtalloc.c | 71 ++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_trace.h | 1 + 11 files changed, 156 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 5317c6438f070..d374240fc58c0 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1026,7 +1026,8 @@ enum xfs_dinode_fmt { XFS_DINODE_FMT_LOCAL, /* bulk data */ XFS_DINODE_FMT_EXTENTS, /* struct xfs_bmbt_rec */ XFS_DINODE_FMT_BTREE, /* struct xfs_bmdr_block */ - XFS_DINODE_FMT_UUID /* added long ago, but never used */ + XFS_DINODE_FMT_UUID, /* added long ago, but never used */ + XFS_DINODE_FMT_RMAP, /* reverse mapping btree */ }; #define XFS_INODE_FORMAT_STR \ @@ -1034,7 +1035,8 @@ enum xfs_dinode_fmt { { XFS_DINODE_FMT_LOCAL, "local" }, \ { XFS_DINODE_FMT_EXTENTS, "extent" }, \ { XFS_DINODE_FMT_BTREE, "btree" }, \ - { XFS_DINODE_FMT_UUID, "uuid" } + { XFS_DINODE_FMT_UUID, "uuid" }, \ + { XFS_DINODE_FMT_RMAP, "rmap" } /* * Max values for extnum and aextnum. 
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 18d9e71a3bb30..5ed779cbe6f9f 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -411,6 +411,12 @@ xfs_dinode_verify_fork( if (di_nextents > max_extents) return __this_address; break; + case XFS_DINODE_FMT_RMAP: + if (!xfs_has_rtrmapbt(mp)) + return __this_address; + if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADIR))) + return __this_address; + break; default: return __this_address; } @@ -430,6 +436,10 @@ xfs_dinode_verify_forkoff( if (dip->di_forkoff != (roundup(sizeof(xfs_dev_t), 8) >> 3)) return __this_address; break; + case XFS_DINODE_FMT_RMAP: + if (!(xfs_has_metadir(mp) && xfs_has_parent(mp))) + return __this_address; + fallthrough; case XFS_DINODE_FMT_LOCAL: /* fall through ... */ case XFS_DINODE_FMT_EXTENTS: /* fall through ... */ case XFS_DINODE_FMT_BTREE: diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 16543bb873a81..eb7e733b30638 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -264,6 +264,11 @@ xfs_iformat_data_fork( return xfs_iformat_extents(ip, dip, XFS_DATA_FORK); case XFS_DINODE_FMT_BTREE: return xfs_iformat_btree(ip, dip, XFS_DATA_FORK); + case XFS_DINODE_FMT_RMAP: + if (!xfs_has_rtrmapbt(ip->i_mount)) + return -EFSCORRUPTED; + ASSERT(0); /* to be implemented later */ + return -EFSCORRUPTED; default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -653,6 +658,10 @@ xfs_iflush_fork( } break; + case XFS_DINODE_FMT_RMAP: + ASSERT(0); /* to be implemented later */ + break; + default: ASSERT(0); break; diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 0a63f14b5aa0f..77503bda35563 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -22,6 +22,9 @@ struct xfs_rtgroup { /* for rcu-safe freeing */ struct rcu_head rcu_head; + /* reverse mapping btree inode */ + struct xfs_inode 
*rtg_rmapip; + /* Number of blocks in this group */ xfs_rgblock_t rtg_blockcount; diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 5be3e1af55684..e60864dd15030 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -18,6 +18,7 @@ #include "xfs_alloc.h" #include "xfs_btree.h" #include "xfs_btree_staging.h" +#include "xfs_imeta.h" #include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" #include "xfs_trace.h" @@ -476,6 +477,7 @@ xfs_rtrmapbt_commit_staged_btree( int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + ASSERT(ifake->if_fork->if_format == XFS_DINODE_FMT_RMAP); /* * Free any resources hanging off the real fork, then shallow-copy the @@ -576,3 +578,34 @@ xfs_rtrmapbt_compute_maxlevels( /* Add one level to handle the inode root level. */ mp->m_rtrmap_maxlevels = min(d_maxlevels, r_maxlevels) + 1; } + +#define XFS_RTRMAP_NAMELEN 17 + +/* Create the metadata directory path for an rtrmap btree inode. 
*/ +int +xfs_rtrmapbt_create_path( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp) +{ + struct xfs_imeta_path *path; + unsigned char *fname; + int error; + + error = xfs_imeta_create_file_path(mp, 2, &path); + if (error) + return error; + + fname = kmalloc(XFS_RTRMAP_NAMELEN, GFP_KERNEL); + if (!fname) { + xfs_imeta_free_path(path); + return -ENOMEM; + } + + snprintf(fname, XFS_RTRMAP_NAMELEN, "%u.rmap", rgno); + path->im_path[0] = "realtime"; + path->im_path[1] = fname; + path->im_dynamicmask = 0x2; + *pathp = path; + return 0; +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 0f73267971924..29b698660182d 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -11,6 +11,7 @@ struct xfs_btree_cur; struct xfs_mount; struct xbtree_ifakeroot; struct xfs_rtgroup; +struct xfs_imeta_path; /* rmaps only exist on crc enabled filesystems */ #define XFS_RTRMAP_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN @@ -80,4 +81,7 @@ unsigned int xfs_rtrmapbt_maxlevels_ondisk(void); int __init xfs_rtrmapbt_init_cur_cache(void); void xfs_rtrmapbt_destroy_cur_cache(void); +int xfs_rtrmapbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 78afc5b8e11c6..b7cda81161fb5 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2501,7 +2501,15 @@ xfs_iflush( __func__, ip->i_ino, be16_to_cpu(dip->di_magic), dip); goto flush_out; } - if (S_ISREG(VFS_I(ip)->i_mode)) { + if (ip->i_df.if_format == XFS_DINODE_FMT_RMAP) { + if (!S_ISREG(VFS_I(ip)->i_mode) || + !(ip->i_diflags2 & XFS_DIFLAG2_METADIR)) { + xfs_alert_tag(mp, XFS_PTAG_IFLUSH, + "%s: Bad rt rmapbt inode %Lu, ptr "PTR_FMT, + __func__, ip->i_ino, ip); + goto flush_out; + } + } else if (S_ISREG(VFS_I(ip)->i_mode)) { if (XFS_TEST_ERROR( ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS && ip->i_df.if_format != 
XFS_DINODE_FMT_BTREE, @@ -2541,6 +2549,15 @@ xfs_iflush( goto flush_out; } + if (xfs_inode_has_attr_fork(ip)) { + if (ip->i_af.if_format == XFS_DINODE_FMT_RMAP) { + xfs_alert_tag(mp, XFS_PTAG_IFLUSH, + "%s: rt rmapbt in inode %Lu attr fork, ptr "PTR_FMT, + __func__, ip->i_ino, ip); + goto flush_out; + } + } + /* * Inode item log recovery for v2 inodes are dependent on the flushiter * count for correct sequencing. We bump the flush iteration count so diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index b35335e20342c..2903c101505f5 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -210,6 +210,7 @@ xfs_inode_item_data_fork_size( } break; case XFS_DINODE_FMT_BTREE: + case XFS_DINODE_FMT_RMAP: if ((iip->ili_fields & XFS_ILOG_DBROOT) && ip->i_df.if_broot_bytes > 0) { *nbytes += ip->i_df.if_broot_bytes; @@ -330,6 +331,7 @@ xfs_inode_item_format_data_fork( } break; case XFS_DINODE_FMT_BTREE: + case XFS_DINODE_FMT_RMAP: iip->ili_fields &= ~(XFS_ILOG_DDATA | XFS_ILOG_DEXT | XFS_ILOG_DEV); diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index 144198a6b2702..0ec61d17f98e6 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -393,6 +393,7 @@ xlog_recover_inode_commit_pass2( if (unlikely(S_ISREG(ldip->di_mode))) { if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) && + (ldip->di_format != XFS_DINODE_FMT_RMAP) && (ldip->di_format != XFS_DINODE_FMT_BTREE)) { XFS_CORRUPTION_ERROR( "Bad log dinode data fork format for regular file", diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 49528f901d047..5308554fa93ec 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -25,6 +25,8 @@ #include "xfs_da_format.h" #include "xfs_imeta.h" #include "xfs_rtgroup.h" +#include "xfs_error.h" +#include "xfs_rtrmap_btree.h" /* * Realtime metadata files are not quite regular files because userspace can't @@ -35,6 +37,7 @@ */ static struct lock_class_key xfs_rbmip_key; static struct 
lock_class_key xfs_rsumip_key; +static struct lock_class_key xfs_rrmapip_key; /* * Read and return the summary information for a given extent size, @@ -1589,6 +1592,53 @@ __xfs_rt_iget( #define xfs_rt_iget(tp, ino, lockdep_key, ipp) \ __xfs_rt_iget((tp), (ino), (lockdep_key), #lockdep_key, (ipp)) +/* Load realtime rmap btree inode. */ +STATIC int +xfs_rtmount_rmapbt( + struct xfs_rtgroup *rtg, + struct xfs_trans *tp) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_imeta_path *path; + struct xfs_inode *ip; + xfs_ino_t ino; + int error; + + if (!xfs_has_rtrmapbt(mp)) + return 0; + + error = xfs_rtrmapbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + return error; + + error = xfs_imeta_lookup(tp, path, &ino); + if (error) + goto out_path; + + if (ino == NULLFSINO) { + error = -EFSCORRUPTED; + goto out_path; + } + + error = xfs_rt_iget(tp, ino, &xfs_rrmapip_key, &ip); + if (error) + goto out_path; + + if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_RMAP)) { + error = -EFSCORRUPTED; + goto out_rele; + } + + rtg->rtg_rmapip = ip; + ip = NULL; +out_rele: + if (ip) + xfs_imeta_irele(ip); +out_path: + xfs_imeta_free_path(path); + return error; +} + /* * Read in the bmbt of an rt metadata inode so that we never have to load them * at runtime. This enables the use of shared ILOCKs for rtbitmap scans. 
Use @@ -1663,12 +1713,24 @@ xfs_rtmount_inodes( for_each_rtgroup(mp, rgno, rtg) { rtg->rtg_blockcount = xfs_rtgroup_block_count(mp, rtg->rtg_rgno); + + error = xfs_rtmount_rmapbt(rtg, tp); + if (error) { + xfs_rtgroup_rele(rtg); + goto out_rele_rtgroup; + } } xfs_alloc_rsum_cache(mp, sbp->sb_rbmblocks); xfs_trans_cancel(tp); return 0; +out_rele_rtgroup: + for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_rmapip) + xfs_imeta_irele(rtg->rtg_rmapip); + rtg->rtg_rmapip = NULL; + } out_rele_summary: xfs_imeta_irele(mp->m_rsumip); out_rele_bitmap: @@ -1682,7 +1744,16 @@ void xfs_rtunmount_inodes( struct xfs_mount *mp) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + kmem_free(mp->m_rsum_cache); + + for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_rmapip) + xfs_imeta_irele(rtg->rtg_rmapip); + rtg->rtg_rmapip = NULL; + } if (mp->m_rbmip) xfs_imeta_irele(mp->m_rbmip); if (mp->m_rsumip) diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 10eeceb8b9e7f..8ebdfb216266c 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2221,6 +2221,7 @@ TRACE_DEFINE_ENUM(XFS_DINODE_FMT_LOCAL); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_EXTENTS); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_BTREE); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_UUID); +TRACE_DEFINE_ENUM(XFS_DINODE_FMT_RMAP); DECLARE_EVENT_CLASS(xfs_swap_extent_class, TP_PROTO(struct xfs_inode *ip, int which), From patchwork Sun Dec 31 21:34:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507676 Date: Sun, 31 Dec 2023 13:34:55 -0800 Subject: [PATCH 12/39] xfs: add metadata reservations for realtime rmap btrees From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850095.1764998.3698658457137800745.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Reserve some free blocks so that we will always have enough free blocks in the data volume to handle expansion of the realtime rmap btree. Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_rtrmap_btree.c | 39 ++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 2 ++ fs/xfs/xfs_rtalloc.c | 21 +++++++++++++++++++- 3 files changed, 61 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index e60864dd15030..9d2087962c53a 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -609,3 +609,42 @@ xfs_rtrmapbt_create_path( *pathp = path; return 0; } + +/* Calculate the rtrmap btree size for some records. */ +static unsigned long long +xfs_rtrmapbt_calc_size( + struct xfs_mount *mp, + unsigned long long len) +{ + return xfs_btree_calc_size(mp->m_rtrmap_mnr, len); +} + +/* + * Calculate the maximum rmap btree size. + */ +static unsigned long long +xfs_rtrmapbt_max_size( + struct xfs_mount *mp, + xfs_rtblock_t rtblocks) +{ + /* Bail out if we're uninitialized, which can happen in mkfs. */ + if (mp->m_rtrmap_mxr[0] == 0) + return 0; + + return xfs_rtrmapbt_calc_size(mp, rtblocks); +} + +/* + * Figure out how many blocks to reserve and how many are used by this btree. + */ +xfs_filblks_t +xfs_rtrmapbt_calc_reserves( + struct xfs_mount *mp) +{ + if (!xfs_has_rtrmapbt(mp)) + return 0; + + /* 1/64th (~1.5%) of the space, and enough for 1 record per block. 
*/ + return max_t(xfs_filblks_t, mp->m_sb.sb_rgblocks >> 6, + xfs_rtrmapbt_max_size(mp, mp->m_sb.sb_rgblocks)); +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 29b698660182d..b7950e6d45d40 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -84,4 +84,6 @@ void xfs_rtrmapbt_destroy_cur_cache(void); int xfs_rtrmapbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, struct xfs_imeta_path **pathp); +xfs_filblks_t xfs_rtrmapbt_calc_reserves(struct xfs_mount *mp); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 5308554fa93ec..2f2a92672de9a 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1560,6 +1560,11 @@ void xfs_rt_resv_free( struct xfs_mount *mp) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + + for_each_rtgroup(mp, rgno, rtg) + xfs_imeta_resv_free_inode(rtg->rtg_rmapip); } /* Reserve space for rt metadata inodes' space expansion. */ @@ -1567,7 +1572,21 @@ int xfs_rt_resv_init( struct xfs_mount *mp) { - return 0; + struct xfs_rtgroup *rtg; + xfs_filblks_t ask; + xfs_rgnumber_t rgno; + int error = 0; + + for_each_rtgroup(mp, rgno, rtg) { + int err2; + + ask = xfs_rtrmapbt_calc_reserves(mp); + err2 = xfs_imeta_resv_init_inode(rtg->rtg_rmapip, ask); + if (err2 && !error) + error = err2; + } + + return error; } static inline int From patchwork Sun Dec 31 21:35:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507677 Date: Sun, 31 Dec 2023 13:35:10 -0800 Subject: [PATCH 13/39] xfs: wire up a new inode fork type for the realtime rmap From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850111.1764998.11779580998834604071.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Plumb in the pieces we need to embed the root of the realtime rmap btree in an inode's data fork, complete with new fork type and on-disk interpretation functions. Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_format.h | 8 + fs/xfs/libxfs/xfs_inode_fork.c | 8 + fs/xfs/libxfs/xfs_ondisk.h | 1 fs/xfs/libxfs/xfs_rtrmap_btree.c | 220 ++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 112 +++++++++++++++++++ fs/xfs/xfs_inode_item_recover.c | 32 +++++- 6 files changed, 375 insertions(+), 6 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index d374240fc58c0..1c1910256a927 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1755,6 +1755,14 @@ typedef __be32 xfs_rmap_ptr_t; */ #define XFS_RTRMAP_CRC_MAGIC 0x4d415052 /* 'MAPR' */ +/* + * rtrmap root header, on-disk form only. + */ +struct xfs_rtrmap_root { + __be16 bb_level; /* 0 is a leaf */ + __be16 bb_numrecs; /* current # of data records */ +}; + /* inode-based btree pointer type */ typedef __be64 xfs_rtrmap_ptr_t; diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index eb7e733b30638..ab54a9b27c781 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -27,6 +27,7 @@ #include "xfs_errortag.h" #include "xfs_health.h" #include "xfs_symlink_remote.h" +#include "xfs_rtrmap_btree.h" struct kmem_cache *xfs_ifork_cache; @@ -267,8 +268,7 @@ xfs_iformat_data_fork( case XFS_DINODE_FMT_RMAP: if (!xfs_has_rtrmapbt(ip->i_mount)) return -EFSCORRUPTED; - ASSERT(0); /* to be implemented later */ - return -EFSCORRUPTED; + return xfs_iformat_rtrmap(ip, dip); default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -659,7 +659,9 @@ xfs_iflush_fork( break; case XFS_DINODE_FMT_RMAP: - ASSERT(0); /* to be implemented later */ + ASSERT(whichfork == XFS_DATA_FORK); + if (iip->ili_fields & brootflag[whichfork]) + xfs_iflush_rtrmap(ip, dip); break; default: diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h index 897a1b72f8d12..102a3574fc682 100644 --- a/fs/xfs/libxfs/xfs_ondisk.h +++ b/fs/xfs/libxfs/xfs_ondisk.h @@ -78,6 +78,7 
@@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_raw, 4); XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo, 48); XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t, 8); + XFS_CHECK_STRUCT_SIZE(struct xfs_rtrmap_root, 4); /* * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 9d2087962c53a..e2ee2a500ca38 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -85,6 +85,39 @@ xfs_rtrmapbt_get_maxrecs( return cur->bc_mp->m_rtrmap_mxr[level != 0]; } +/* Calculate number of records in the ondisk realtime rmap btree inode root. */ +unsigned int +xfs_rtrmapbt_droot_maxrecs( + unsigned int blocklen, + bool leaf) +{ + blocklen -= sizeof(struct xfs_rtrmap_root); + + if (leaf) + return blocklen / sizeof(struct xfs_rmap_rec); + return blocklen / (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Get the maximum records we could store in the on-disk format. + * + * For non-root nodes this is equivalent to xfs_rtrmapbt_get_maxrecs, but + * for the root node this checks the available space in the dinode fork + * so that we can resize the in-memory buffer to match it. After a + * resize to the maximum size this function returns the same value + * as xfs_rtrmapbt_get_maxrecs for the root node, too. + */ +STATIC int +xfs_rtrmapbt_get_dmaxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level != cur->bc_nlevels - 1) + return cur->bc_mp->m_rtrmap_mxr[level != 0]; + return xfs_rtrmapbt_droot_maxrecs(cur->bc_ino.forksize, level == 0); +} + /* * Convert the ondisk record's offset field into the ondisk key's offset field. * Fork and bmbt are significant parts of the rmap record key, but written @@ -377,6 +410,64 @@ xfs_rtrmapbt_keys_contiguous( be32_to_cpu(key2->rmap.rm_startblock)); } +/* Move the rtrmap btree root from one incore buffer to another. 
*/ +static void +xfs_rtrmapbt_broot_move( + struct xfs_inode *ip, + int whichfork, + struct xfs_btree_block *dst_broot, + size_t dst_bytes, + struct xfs_btree_block *src_broot, + size_t src_bytes, + unsigned int level, + unsigned int numrecs) +{ + struct xfs_mount *mp = ip->i_mount; + void *dptr; + void *sptr; + + ASSERT(xfs_rtrmap_droot_space(src_broot) <= + xfs_inode_fork_size(ip, whichfork)); + + /* + * We always have to move the pointers because they are not butted + * against the btree block header. + */ + if (numrecs && level > 0) { + sptr = xfs_rtrmap_broot_ptr_addr(mp, src_broot, 1, src_bytes); + dptr = xfs_rtrmap_broot_ptr_addr(mp, dst_broot, 1, dst_bytes); + memmove(dptr, sptr, numrecs * sizeof(xfs_fsblock_t)); + } + + if (src_broot == dst_broot) + return; + + /* + * If the root is being totally relocated, we have to migrate the block + * header and the keys/records that come after it. + */ + memcpy(dst_broot, src_broot, XFS_RTRMAP_BLOCK_LEN); + + if (!numrecs) + return; + + if (level == 0) { + sptr = xfs_rtrmap_rec_addr(src_broot, 1); + dptr = xfs_rtrmap_rec_addr(dst_broot, 1); + memcpy(dptr, sptr, numrecs * sizeof(struct xfs_rmap_rec)); + } else { + sptr = xfs_rtrmap_key_addr(src_broot, 1); + dptr = xfs_rtrmap_key_addr(dst_broot, 1); + memcpy(dptr, sptr, numrecs * 2 * sizeof(struct xfs_rmap_key)); + } +} + +static const struct xfs_ifork_broot_ops xfs_rtrmapbt_iroot_ops = { + .maxrecs = xfs_rtrmapbt_maxrecs, + .size = xfs_rtrmap_broot_space_calc, + .move = xfs_rtrmapbt_broot_move, +}; + const struct xfs_btree_ops xfs_rtrmapbt_ops = { .rec_len = sizeof(struct xfs_rmap_rec), .key_len = 2 * sizeof(struct xfs_rmap_key), @@ -390,6 +481,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { .free_block = xfs_btree_free_imeta_block, .get_minrecs = xfs_rtrmapbt_get_minrecs, .get_maxrecs = xfs_rtrmapbt_get_maxrecs, + .get_dmaxrecs = xfs_rtrmapbt_get_dmaxrecs, .init_key_from_rec = xfs_rtrmapbt_init_key_from_rec, .init_high_key_from_rec = 
xfs_rtrmapbt_init_high_key_from_rec, .init_rec_from_cur = xfs_rtrmapbt_init_rec_from_cur, @@ -400,6 +492,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { .keys_inorder = xfs_rtrmapbt_keys_inorder, .recs_inorder = xfs_rtrmapbt_recs_inorder, .keys_contiguous = xfs_rtrmapbt_keys_contiguous, + .iroot_ops = &xfs_rtrmapbt_iroot_ops, }; /* Initialize a new rt rmap btree cursor. */ @@ -648,3 +741,130 @@ xfs_rtrmapbt_calc_reserves( return max_t(xfs_filblks_t, mp->m_sb.sb_rgblocks >> 6, xfs_rtrmapbt_max_size(mp, mp->m_sb.sb_rgblocks)); } + +/* Convert on-disk form of btree root to in-memory form. */ +STATIC void +xfs_rtrmapbt_from_disk( + struct xfs_inode *ip, + struct xfs_rtrmap_root *dblock, + unsigned int dblocklen, + struct xfs_btree_block *rblock) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_rmap_key *fkp; + __be64 *fpp; + struct xfs_rmap_key *tkp; + __be64 *tpp; + struct xfs_rmap_rec *frp; + struct xfs_rmap_rec *trp; + unsigned int rblocklen = xfs_rtrmap_broot_space(mp, dblock); + unsigned int numrecs; + unsigned int maxrecs; + + xfs_btree_init_block(mp, rblock, &xfs_rtrmapbt_ops, 0, 0, ip->i_ino); + + rblock->bb_level = dblock->bb_level; + rblock->bb_numrecs = dblock->bb_numrecs; + numrecs = be16_to_cpu(dblock->bb_numrecs); + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrmapbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrmap_droot_key_addr(dblock, 1); + tkp = xfs_rtrmap_key_addr(rblock, 1); + fpp = xfs_rtrmap_droot_ptr_addr(dblock, 1, maxrecs); + tpp = xfs_rtrmap_broot_ptr_addr(mp, rblock, 1, rblocklen); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrmap_droot_rec_addr(dblock, 1); + trp = xfs_rtrmap_rec_addr(rblock, 1); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Load a realtime reverse mapping btree root in from disk. 
*/ +int +xfs_iformat_rtrmap( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrmap_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + unsigned int numrecs; + unsigned int level; + int dsize; + + dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK); + numrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (level > mp->m_rtrmap_maxlevels || + xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) + return -EFSCORRUPTED; + + xfs_iroot_alloc(ip, XFS_DATA_FORK, + xfs_rtrmap_broot_space_calc(mp, level, numrecs)); + xfs_rtrmapbt_from_disk(ip, dfp, dsize, ifp->if_broot); + return 0; +} + +/* Convert in-memory form of btree root to on-disk form. */ +void +xfs_rtrmapbt_to_disk( + struct xfs_mount *mp, + struct xfs_btree_block *rblock, + unsigned int rblocklen, + struct xfs_rtrmap_root *dblock, + unsigned int dblocklen) +{ + struct xfs_rmap_key *fkp; + __be64 *fpp; + struct xfs_rmap_key *tkp; + __be64 *tpp; + struct xfs_rmap_rec *frp; + struct xfs_rmap_rec *trp; + unsigned int numrecs; + unsigned int maxrecs; + + ASSERT(rblock->bb_magic == cpu_to_be32(XFS_RTRMAP_CRC_MAGIC)); + ASSERT(uuid_equal(&rblock->bb_u.l.bb_uuid, &mp->m_sb.sb_meta_uuid)); + ASSERT(rblock->bb_u.l.bb_blkno == cpu_to_be64(XFS_BUF_DADDR_NULL)); + ASSERT(rblock->bb_u.l.bb_leftsib == cpu_to_be64(NULLFSBLOCK)); + ASSERT(rblock->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK)); + + dblock->bb_level = rblock->bb_level; + dblock->bb_numrecs = rblock->bb_numrecs; + numrecs = be16_to_cpu(rblock->bb_numrecs); + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrmapbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrmap_key_addr(rblock, 1); + tkp = xfs_rtrmap_droot_key_addr(dblock, 1); + fpp = xfs_rtrmap_broot_ptr_addr(mp, rblock, 1, rblocklen); + tpp = xfs_rtrmap_droot_ptr_addr(dblock, 1, maxrecs); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, 
sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrmap_rec_addr(rblock, 1); + trp = xfs_rtrmap_droot_rec_addr(dblock, 1); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Flush a realtime reverse mapping btree root out to disk. */ +void +xfs_iflush_rtrmap( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrmap_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + + ASSERT(ifp->if_broot != NULL); + ASSERT(ifp->if_broot_bytes > 0); + ASSERT(xfs_rtrmap_droot_space(ifp->if_broot) <= + xfs_inode_fork_size(ip, XFS_DATA_FORK)); + xfs_rtrmapbt_to_disk(ip->i_mount, ifp->if_broot, ifp->if_broot_bytes, + dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index b7950e6d45d40..2b26a05f90779 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -27,6 +27,7 @@ void xfs_rtrmapbt_commit_staged_btree(struct xfs_btree_cur *cur, unsigned int xfs_rtrmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, bool leaf); void xfs_rtrmapbt_compute_maxlevels(struct xfs_mount *mp); +unsigned int xfs_rtrmapbt_droot_maxrecs(unsigned int blocklen, bool leaf); /* * Addresses of records, keys, and pointers within an incore rtrmapbt block. @@ -86,4 +87,115 @@ int xfs_rtrmapbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, xfs_filblks_t xfs_rtrmapbt_calc_reserves(struct xfs_mount *mp); +/* Addresses of key, pointers, and records within an ondisk rtrmapbt block. 
*/ + +static inline struct xfs_rmap_rec * +xfs_rtrmap_droot_rec_addr( + struct xfs_rtrmap_root *block, + unsigned int index) +{ + return (struct xfs_rmap_rec *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_rmap_rec)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_droot_key_addr( + struct xfs_rtrmap_root *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)(block + 1) + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_droot_ptr_addr( + struct xfs_rtrmap_root *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrmap_ptr_t *) + ((char *)(block + 1) + + maxrecs * 2 * sizeof(struct xfs_rmap_key) + + (index - 1) * sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Address of pointers within the incore btree root. + * + * These are to be used when we know the size of the block and + * we don't have a cursor. + */ +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_broot_ptr_addr( + struct xfs_mount *mp, + struct xfs_btree_block *bb, + unsigned int index, + unsigned int block_size) +{ + return xfs_rtrmap_ptr_addr(bb, index, + xfs_rtrmapbt_maxrecs(mp, block_size, false)); +} + +/* + * Compute the space required for the incore btree root containing the given + * number of records. + */ +static inline size_t +xfs_rtrmap_broot_space_calc( + struct xfs_mount *mp, + unsigned int level, + unsigned int nrecs) +{ + size_t sz = XFS_RTRMAP_BLOCK_LEN; + + if (level > 0) + return sz + nrecs * (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); + return sz + nrecs * sizeof(struct xfs_rmap_rec); +} + +/* + * Compute the space required for the incore btree root given the ondisk + * btree root block. + */ +static inline size_t +xfs_rtrmap_broot_space(struct xfs_mount *mp, struct xfs_rtrmap_root *bb) +{ + return xfs_rtrmap_broot_space_calc(mp, be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +/* Compute the space required for the ondisk root block. 
*/ +static inline size_t +xfs_rtrmap_droot_space_calc( + unsigned int level, + unsigned int nrecs) +{ + size_t sz = sizeof(struct xfs_rtrmap_root); + + if (level > 0) + return sz + nrecs * (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); + return sz + nrecs * sizeof(struct xfs_rmap_rec); +} + +/* + * Compute the space required for the ondisk root block given an incore root + * block. + */ +static inline size_t +xfs_rtrmap_droot_space(struct xfs_btree_block *bb) +{ + return xfs_rtrmap_droot_space_calc(be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +int xfs_iformat_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); +void xfs_rtrmapbt_to_disk(struct xfs_mount *mp, struct xfs_btree_block *rblock, + unsigned int rblocklen, struct xfs_rtrmap_root *dblock, + unsigned int dblocklen); +void xfs_iflush_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index 0ec61d17f98e6..4cf967df28ef1 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -22,6 +22,7 @@ #include "xfs_log_recover.h" #include "xfs_icache.h" #include "xfs_bmap_btree.h" +#include "xfs_rtrmap_btree.h" STATIC void xlog_recover_inode_ra_pass2( @@ -266,6 +267,31 @@ xlog_dinode_verify_extent_counts( return 0; } +static inline int +xlog_recover_inode_dbroot( + struct xfs_mount *mp, + void *src, + unsigned int len, + struct xfs_dinode *dip) +{ + void *dfork = XFS_DFORK_DPTR(dip); + unsigned int dsize = XFS_DFORK_DSIZE(dip, mp); + + switch (dip->di_format) { + case XFS_DINODE_FMT_BTREE: + xfs_bmbt_to_bmdr(mp, src, len, dfork, dsize); + break; + case XFS_DINODE_FMT_RMAP: + xfs_rtrmapbt_to_disk(mp, src, len, dfork, dsize); + break; + default: + ASSERT(0); + return -EFSCORRUPTED; + } + + return 0; +} + STATIC int xlog_recover_inode_commit_pass2( struct xlog *log, @@ -475,9 +501,9 @@ xlog_recover_inode_commit_pass2( break; case 
XFS_ILOG_DBROOT: - xfs_bmbt_to_bmdr(mp, (struct xfs_btree_block *)src, len, - (struct xfs_bmdr_block *)XFS_DFORK_DPTR(dip), - XFS_DFORK_DSIZE(dip, mp)); + error = xlog_recover_inode_dbroot(mp, src, len, dip); + if (error) + goto out_release; break; default: From patchwork Sun Dec 31 21:35:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507678 Date: Sun, 31 Dec 2023 13:35:26 -0800 Subject: [PATCH 14/39] xfs: allow inodes with zero extents but nonzero nblocks From: "Darrick J.
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850127.1764998.1732124242804706299.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Metadata inodes that store btrees will have zero extents and a nonzero nblocks. Adjust the inode verifier so that this combination is not flagged. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_inode_buf.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 5ed779cbe6f9f..dae2efec1d5d0 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -601,9 +601,6 @@ xfs_dinode_verify( if (mode && nextents + naextents > nblocks) return __this_address; - if (nextents + naextents == 0 && nblocks != 0) - return __this_address; - if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents) return __this_address; @@ -707,6 +704,19 @@ xfs_dinode_verify( return fa; } + /* metadata inodes containing btrees always have zero extent count */ + if (flags2 & XFS_DIFLAG2_METADIR) { + switch (XFS_DFORK_FORMAT(dip, XFS_DATA_FORK)) { + case XFS_DINODE_FMT_RMAP: + break; + default: + if (nextents + naextents == 0 && nblocks != 0) + return __this_address; + break; + } + } else if (nextents + naextents == 0 && nblocks != 0) + return __this_address; + return NULL; } From patchwork Sun Dec 31 21:35:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507679 Date: Sun, 31 Dec 2023 13:35:42 -0800 Subject: [PATCH 15/39] xfs: use realtime EFI to free extents when realtime rmap is enabled From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850143.1764998.796898768023798748.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong When rmap is enabled, XFS expects a certain order of operations, which is: 1) remove the file mapping, 2) remove the reverse mapping, and then 3) free the blocks.
xfs_bmap_del_extent_real tries to do 1 and 3 in the same transaction, which means that when rtrmap is enabled, we have to use realtime EFIs to maintain the expected order. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_bmap.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index ce9c247525a3d..992e492972e76 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5057,7 +5057,6 @@ xfs_bmap_del_extent_real( { xfs_fsblock_t del_endblock=0; /* first block past del */ xfs_fileoff_t del_endoff; /* first offset past del */ - int do_fx; /* free extent at end of routine */ int error; /* error return value */ struct xfs_bmbt_irec got; /* current extent entry */ xfs_fileoff_t got_endoff; /* first offset past got */ @@ -5070,6 +5069,8 @@ xfs_bmap_del_extent_real( uint qfield; /* quota field to update */ uint32_t state = xfs_bmap_fork_to_state(whichfork); struct xfs_bmbt_irec old; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); + bool want_free = !(bflags & XFS_BMAPI_REMAP); *logflagsp = 0; @@ -5101,18 +5102,24 @@ xfs_bmap_del_extent_real( return -ENOSPC; *logflagsp = XFS_ILOG_CORE; - if (xfs_ifork_is_realtime(ip, whichfork)) { - if (!(bflags & XFS_BMAPI_REMAP)) { + if (isrt) { + /* + * Historically, we did not use EFIs to free realtime extents. + * However, when reverse mapping is enabled, we must maintain + * the same order of operations as the data device, which is: + * Remove the file mapping, remove the reverse mapping, and + * then free the blocks. This means that we must delay the + * freeing until after we've scheduled the rmap update. 
+ */ + if (want_free && !xfs_has_rtrmapbt(mp)) { error = xfs_rtfree_blocks(tp, del->br_startblock, del->br_blockcount); if (error) return error; + want_free = false; } - - do_fx = 0; qfield = XFS_TRANS_DQ_RTBCOUNT; } else { - do_fx = 1; qfield = XFS_TRANS_DQ_BCOUNT; } nblks = del->br_blockcount; @@ -5262,7 +5269,7 @@ xfs_bmap_del_extent_real( /* * If we need to, add to list of extents to delete. */ - if (do_fx && !(bflags & XFS_BMAPI_REMAP)) { + if (want_free) { if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { xfs_refcount_decrease_extent(tp, del); } else { @@ -5271,6 +5278,8 @@ xfs_bmap_del_extent_real( if ((bflags & XFS_BMAPI_NODISCARD) || del->br_state == XFS_EXT_UNWRITTEN) efi_flags |= XFS_FREE_EXTENT_SKIP_DISCARD; + if (isrt) + efi_flags |= XFS_FREE_EXTENT_REALTIME; error = xfs_free_extent_later(tp, del->br_startblock, del->br_blockcount, NULL, From patchwork Sun Dec 31 21:35:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507680 Date: Sun, 31 Dec 2023 13:35:57 -0800 Subject: [PATCH 16/39] xfs: wire up rmap map and unmap to the realtime rmapbt From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850159.1764998.17772260162651574654.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Connect the map and unmap reverse-mapping operations to the realtime rmapbt via the deferred operation callbacks. This enables us to perform rmap operations against the correct btree. [Contains a minor bugfix from hch] Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_rmap.c | 37 ++++++++++++++++++++++++++++++++++--- fs/xfs/libxfs/xfs_rtgroup.c | 9 +++++++++ fs/xfs/libxfs/xfs_rtgroup.h | 5 ++++- 3 files changed, 47 insertions(+), 4 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index daef2d67eb7a0..8766805ed1343 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -26,6 +26,7 @@ #include "xfs_health.h" #include "xfs_rmap_item.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" struct kmem_cache *xfs_rmap_intent_cache; @@ -2692,9 +2693,39 @@ xfs_rtrmap_finish_one( struct xfs_rmap_intent *ri, struct xfs_btree_cur **pcur) { - /* coming in a subsequent patch */ - ASSERT(0); - return -EFSCORRUPTED; + struct xfs_owner_info oinfo; + struct xfs_mount *mp = tp->t_mountp; + struct xfs_btree_cur *rcur = *pcur; + xfs_rgnumber_t rgno; + xfs_rgblock_t bno; + bool unwritten; + + trace_xfs_rmap_deferred(mp, ri); + + if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_RMAP_FINISH_ONE)) + return -EIO; + + /* + * If we haven't gotten a cursor or the cursor rtgroup doesn't match + * the startblock, get one now. 
+ */ + if (rcur != NULL && rcur->bc_ino.rtg != ri->ri_rtg) { + xfs_btree_del_cursor(rcur, 0); + rcur = NULL; + } + if (rcur == NULL) { + xfs_rtgroup_lock(tp, ri->ri_rtg, XFS_RTGLOCK_RMAP); + *pcur = rcur = xfs_rtrmapbt_init_cursor(mp, tp, ri->ri_rtg, + ri->ri_rtg->rtg_rmapip); + } + + xfs_rmap_ino_owner(&oinfo, ri->ri_owner, ri->ri_whichfork, + ri->ri_bmap.br_startoff); + unwritten = ri->ri_bmap.br_state == XFS_EXT_UNWRITTEN; + bno = xfs_rtb_to_rgbno(mp, ri->ri_bmap.br_startblock, &rgno); + + return __xfs_rmap_finish_intent(rcur, ri->ri_type, bno, + ri->ri_bmap.br_blockcount, &oinfo, unwritten); } /* diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index 7a45a16cbbab8..6ffe1c21a1ea4 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -552,6 +552,12 @@ xfs_rtgroup_lock( xfs_rtbitmap_lock(tp, rtg->rtg_mount); else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) xfs_rtbitmap_lock_shared(rtg->rtg_mount, XFS_RBMLOCK_BITMAP); + + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg->rtg_rmapip) { + xfs_ilock(rtg->rtg_rmapip, XFS_ILOCK_EXCL); + if (tp) + xfs_trans_ijoin(tp, rtg->rtg_rmapip, XFS_ILOCK_EXCL); + } } /* Unlock metadata inodes associated with this rt group. 
*/ @@ -564,6 +570,9 @@ xfs_rtgroup_unlock( ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) || !(rtglock_flags & XFS_RTGLOCK_BITMAP)); + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg->rtg_rmapip) + xfs_iunlock(rtg->rtg_rmapip, XFS_ILOCK_EXCL); + if (rtglock_flags & XFS_RTGLOCK_BITMAP) xfs_rtbitmap_unlock(rtg->rtg_mount); else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 77503bda35563..559a5135820d3 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -231,9 +231,12 @@ int xfs_rtgroup_init_secondary_super(struct xfs_mount *mp, xfs_rgnumber_t rgno, #define XFS_RTGLOCK_BITMAP (1U << 0) /* Lock the rt bitmap inode in shared mode */ #define XFS_RTGLOCK_BITMAP_SHARED (1U << 1) +/* Lock the rt rmap inode in exclusive mode */ +#define XFS_RTGLOCK_RMAP (1U << 2) #define XFS_RTGLOCK_ALL_FLAGS (XFS_RTGLOCK_BITMAP | \ - XFS_RTGLOCK_BITMAP_SHARED) + XFS_RTGLOCK_BITMAP_SHARED | \ + XFS_RTGLOCK_RMAP) void xfs_rtgroup_lock(struct xfs_trans *tp, struct xfs_rtgroup *rtg, unsigned int rtglock_flags); From patchwork Sun Dec 31 21:36:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507681 Date: Sun, 31 Dec 2023 13:36:13 -0800 Subject: [PATCH 17/39] xfs: create routine to allocate and initialize a realtime rmap btree inode From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850174.1764998.2153400423503289445.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Create a library routine to allocate and initialize an empty realtime rmapbt inode. We'll use this for mkfs and repair. Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_rtrmap_btree.c | 34 ++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 4 ++++ 2 files changed, 38 insertions(+) diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index e2ee2a500ca38..b824562bdc2ec 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -868,3 +868,37 @@ xfs_iflush_rtrmap( xfs_rtrmapbt_to_disk(ip->i_mount, ifp->if_broot, ifp->if_broot_bytes, dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); } + +/* + * Create a realtime rmap btree inode. + * + * Regardless of the return value, the caller must clean up @upd. If a new + * inode is returned through @*ipp, the caller must finish setting up the incore + * inode and release it. + */ +int +xfs_rtrmapbt_create( + struct xfs_imeta_update *upd, + struct xfs_inode **ipp) +{ + struct xfs_mount *mp = upd->mp; + struct xfs_ifork *ifp; + int error; + + error = xfs_imeta_create(upd, S_IFREG, ipp); + if (error) + return error; + + ifp = xfs_ifork_ptr(upd->ip, XFS_DATA_FORK); + ifp->if_format = XFS_DINODE_FMT_RMAP; + ASSERT(ifp->if_broot_bytes == 0); + ASSERT(ifp->if_bytes == 0); + + /* Initialize the empty incore btree root. 
*/ + xfs_iroot_alloc(upd->ip, XFS_DATA_FORK, + xfs_rtrmap_broot_space_calc(mp, 0, 0)); + xfs_btree_init_block(mp, ifp->if_broot, &xfs_rtrmapbt_ops, 0, 0, + upd->ip->i_ino); + xfs_trans_log_inode(upd->tp, upd->ip, XFS_ILOG_CORE | XFS_ILOG_DBROOT); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 2b26a05f90779..108ab8c0aea44 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -198,4 +198,8 @@ void xfs_rtrmapbt_to_disk(struct xfs_mount *mp, struct xfs_btree_block *rblock, unsigned int dblocklen); void xfs_iflush_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); +struct xfs_imeta_update; + +int xfs_rtrmapbt_create(struct xfs_imeta_update *upd, struct xfs_inode **ipp); + #endif /* __XFS_RTRMAP_BTREE_H__ */ From patchwork Sun Dec 31 21:36:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J.
Wong" X-Patchwork-Id: 13507682 Date: Sun, 31 Dec 2023 13:36:28 -0800 Subject: [PATCH 18/39] xfs: rearrange xfs_fsmap.c a little bit From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850190.1764998.8305564725972480539.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong The order of the functions in this file has gotten a little confusing over the years. Specifically, the two data device implementations (bnobt and rmapbt) could be adjacent in the source code instead of split in two by the logdev and rtdev fsmap implementations. We're about to add more functionality to this file, so rearrange things now. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_fsmap.c | 270 ++++++++++++++++++++++++++-------------------------- 1 file changed, 135 insertions(+), 135 deletions(-) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index c49a5b01c3e0c..34769cb35c908 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -427,141 +427,6 @@ xfs_getfsmap_set_irec_flags( irec->rm_flags |= XFS_RMAP_UNWRITTEN; } -/* Execute a getfsmap query against the log device.
*/ -STATIC int -xfs_getfsmap_logdev( - struct xfs_trans *tp, - const struct xfs_fsmap *keys, - struct xfs_getfsmap_info *info) -{ - struct xfs_mount *mp = tp->t_mountp; - struct xfs_rmap_irec rmap; - xfs_daddr_t rec_daddr, len_daddr; - xfs_fsblock_t start_fsb, end_fsb; - uint64_t eofs; - - eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks); - if (keys[0].fmr_physical >= eofs) - return 0; - start_fsb = XFS_BB_TO_FSBT(mp, - keys[0].fmr_physical + keys[0].fmr_length); - end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); - - /* Adjust the low key if we are continuing from where we left off. */ - if (keys[0].fmr_length > 0) - info->low_daddr = XFS_FSB_TO_BB(mp, start_fsb); - - trace_xfs_fsmap_low_key_linear(mp, info->dev, start_fsb); - trace_xfs_fsmap_high_key_linear(mp, info->dev, end_fsb); - - if (start_fsb > 0) - return 0; - - /* Fabricate an rmap entry for the external log device. */ - rmap.rm_startblock = 0; - rmap.rm_blockcount = mp->m_sb.sb_logblocks; - rmap.rm_owner = XFS_RMAP_OWN_LOG; - rmap.rm_offset = 0; - rmap.rm_flags = 0; - - rec_daddr = XFS_FSB_TO_BB(mp, rmap.rm_startblock); - len_daddr = XFS_FSB_TO_BB(mp, rmap.rm_blockcount); - return xfs_getfsmap_helper(tp, info, &rmap, rec_daddr, len_daddr); -} - -#ifdef CONFIG_XFS_RT -/* Transform a rtbitmap "record" into a fsmap */ -STATIC int -xfs_getfsmap_rtdev_rtbitmap_helper( - struct xfs_mount *mp, - struct xfs_trans *tp, - const struct xfs_rtalloc_rec *rec, - void *priv) -{ - struct xfs_getfsmap_info *info = priv; - struct xfs_rmap_irec irec; - xfs_rtblock_t rtbno; - xfs_daddr_t rec_daddr, len_daddr; - - rtbno = xfs_rtx_to_rtb(mp, rec->ar_startext); - rec_daddr = XFS_FSB_TO_BB(mp, rtbno); - irec.rm_startblock = rtbno; - - rtbno = xfs_rtx_to_rtb(mp, rec->ar_extcount); - len_daddr = XFS_FSB_TO_BB(mp, rtbno); - irec.rm_blockcount = rtbno; - - irec.rm_owner = XFS_RMAP_OWN_NULL; /* "free" */ - irec.rm_offset = 0; - irec.rm_flags = 0; - - return xfs_getfsmap_helper(tp, info, &irec, rec_daddr, len_daddr); 
-} - -/* Execute a getfsmap query against the realtime device rtbitmap. */ -STATIC int -xfs_getfsmap_rtdev_rtbitmap( - struct xfs_trans *tp, - const struct xfs_fsmap *keys, - struct xfs_getfsmap_info *info) -{ - - struct xfs_rtalloc_rec alow = { 0 }; - struct xfs_rtalloc_rec ahigh = { 0 }; - struct xfs_mount *mp = tp->t_mountp; - xfs_rtblock_t start_rtb; - xfs_rtblock_t end_rtb; - uint64_t eofs; - int error; - - eofs = XFS_FSB_TO_BB(mp, xfs_rtx_to_rtb(mp, mp->m_sb.sb_rextents)); - if (keys[0].fmr_physical >= eofs) - return 0; - start_rtb = XFS_BB_TO_FSBT(mp, - keys[0].fmr_physical + keys[0].fmr_length); - end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); - - info->missing_owner = XFS_FMR_OWN_UNKNOWN; - - /* Adjust the low key if we are continuing from where we left off. */ - if (keys[0].fmr_length > 0) { - info->low_daddr = XFS_FSB_TO_BB(mp, start_rtb); - if (info->low_daddr >= eofs) - return 0; - } - - trace_xfs_fsmap_low_key_linear(mp, info->dev, start_rtb); - trace_xfs_fsmap_high_key_linear(mp, info->dev, end_rtb); - - xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP); - - /* - * Set up query parameters to return free rtextents covering the range - * we want. - */ - alow.ar_startext = xfs_rtb_to_rtx(mp, start_rtb); - ahigh.ar_startext = xfs_rtb_to_rtxup(mp, end_rtb); - error = xfs_rtalloc_query_range(mp, tp, &alow, &ahigh, - xfs_getfsmap_rtdev_rtbitmap_helper, info); - if (error) - goto err; - - /* - * Report any gaps at the end of the rtbitmap by simulating a null - * rmap starting at the block after the end of the query range. 
- */ - info->last = true; - ahigh.ar_startext = min(mp->m_sb.sb_rextents, ahigh.ar_startext); - - error = xfs_getfsmap_rtdev_rtbitmap_helper(mp, tp, &ahigh, info); - if (error) - goto err; -err: - xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP); - return error; -} -#endif /* CONFIG_XFS_RT */ - static inline bool rmap_not_shareable(struct xfs_mount *mp, const struct xfs_rmap_irec *r) { @@ -786,6 +651,141 @@ xfs_getfsmap_datadev_bnobt( xfs_getfsmap_datadev_bnobt_query, &akeys[0]); } +/* Execute a getfsmap query against the log device. */ +STATIC int +xfs_getfsmap_logdev( + struct xfs_trans *tp, + const struct xfs_fsmap *keys, + struct xfs_getfsmap_info *info) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_rmap_irec rmap; + xfs_daddr_t rec_daddr, len_daddr; + xfs_fsblock_t start_fsb, end_fsb; + uint64_t eofs; + + eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks); + if (keys[0].fmr_physical >= eofs) + return 0; + start_fsb = XFS_BB_TO_FSBT(mp, + keys[0].fmr_physical + keys[0].fmr_length); + end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); + + /* Adjust the low key if we are continuing from where we left off. */ + if (keys[0].fmr_length > 0) + info->low_daddr = XFS_FSB_TO_BB(mp, start_fsb); + + trace_xfs_fsmap_low_key_linear(mp, info->dev, start_fsb); + trace_xfs_fsmap_high_key_linear(mp, info->dev, end_fsb); + + if (start_fsb > 0) + return 0; + + /* Fabricate an rmap entry for the external log device. 
*/ + rmap.rm_startblock = 0; + rmap.rm_blockcount = mp->m_sb.sb_logblocks; + rmap.rm_owner = XFS_RMAP_OWN_LOG; + rmap.rm_offset = 0; + rmap.rm_flags = 0; + + rec_daddr = XFS_FSB_TO_BB(mp, rmap.rm_startblock); + len_daddr = XFS_FSB_TO_BB(mp, rmap.rm_blockcount); + return xfs_getfsmap_helper(tp, info, &rmap, rec_daddr, len_daddr); +} + +#ifdef CONFIG_XFS_RT +/* Transform a rtbitmap "record" into a fsmap */ +STATIC int +xfs_getfsmap_rtdev_rtbitmap_helper( + struct xfs_mount *mp, + struct xfs_trans *tp, + const struct xfs_rtalloc_rec *rec, + void *priv) +{ + struct xfs_getfsmap_info *info = priv; + struct xfs_rmap_irec irec; + xfs_rtblock_t rtbno; + xfs_daddr_t rec_daddr, len_daddr; + + rtbno = xfs_rtx_to_rtb(mp, rec->ar_startext); + rec_daddr = XFS_FSB_TO_BB(mp, rtbno); + irec.rm_startblock = rtbno; + + rtbno = xfs_rtx_to_rtb(mp, rec->ar_extcount); + len_daddr = XFS_FSB_TO_BB(mp, rtbno); + irec.rm_blockcount = rtbno; + + irec.rm_owner = XFS_RMAP_OWN_NULL; /* "free" */ + irec.rm_offset = 0; + irec.rm_flags = 0; + + return xfs_getfsmap_helper(tp, info, &irec, rec_daddr, len_daddr); +} + +/* Execute a getfsmap query against the realtime device rtbitmap. */ +STATIC int +xfs_getfsmap_rtdev_rtbitmap( + struct xfs_trans *tp, + const struct xfs_fsmap *keys, + struct xfs_getfsmap_info *info) +{ + + struct xfs_rtalloc_rec alow = { 0 }; + struct xfs_rtalloc_rec ahigh = { 0 }; + struct xfs_mount *mp = tp->t_mountp; + xfs_rtblock_t start_rtb; + xfs_rtblock_t end_rtb; + uint64_t eofs; + int error; + + eofs = XFS_FSB_TO_BB(mp, xfs_rtx_to_rtb(mp, mp->m_sb.sb_rextents)); + if (keys[0].fmr_physical >= eofs) + return 0; + start_rtb = XFS_BB_TO_FSBT(mp, + keys[0].fmr_physical + keys[0].fmr_length); + end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); + + info->missing_owner = XFS_FMR_OWN_UNKNOWN; + + /* Adjust the low key if we are continuing from where we left off. 
*/ + if (keys[0].fmr_length > 0) { + info->low_daddr = XFS_FSB_TO_BB(mp, start_rtb); + if (info->low_daddr >= eofs) + return 0; + } + + trace_xfs_fsmap_low_key_linear(mp, info->dev, start_rtb); + trace_xfs_fsmap_high_key_linear(mp, info->dev, end_rtb); + + xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP); + + /* + * Set up query parameters to return free rtextents covering the range + * we want. + */ + alow.ar_startext = xfs_rtb_to_rtx(mp, start_rtb); + ahigh.ar_startext = xfs_rtb_to_rtxup(mp, end_rtb); + error = xfs_rtalloc_query_range(mp, tp, &alow, &ahigh, + xfs_getfsmap_rtdev_rtbitmap_helper, info); + if (error) + goto err; + + /* + * Report any gaps at the end of the rtbitmap by simulating a null + * rmap starting at the block after the end of the query range. + */ + info->last = true; + ahigh.ar_startext = min(mp->m_sb.sb_rextents, ahigh.ar_startext); + + error = xfs_getfsmap_rtdev_rtbitmap_helper(mp, tp, &ahigh, info); + if (error) + goto err; +err: + xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP); + return error; +} +#endif /* CONFIG_XFS_RT */ + /* Do we recognize the device? */ STATIC bool xfs_getfsmap_is_valid_device( From patchwork Sun Dec 31 21:36:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507683 Date: Sun, 31 Dec 2023 13:36:44 -0800 Subject: [PATCH 19/39] xfs: wire up getfsmap to the realtime reverse mapping btree From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850206.1764998.3851642767596046037.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Connect the getfsmap ioctl to the realtime rmapbt. Signed-off-by: Darrick J.
Wong --- fs/xfs/xfs_fsmap.c | 194 +++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 191 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 34769cb35c908..b0eabc76eb28a 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -25,6 +25,8 @@ #include "xfs_alloc_btree.h" #include "xfs_rtbitmap.h" #include "xfs_ag.h" +#include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" /* Convert an xfs_fsmap to an fsmap. */ static void @@ -158,6 +160,7 @@ struct xfs_getfsmap_info { struct xfs_fsmap_head *head; struct fsmap *fsmap_recs; /* mapping records */ struct xfs_buf *agf_bp; /* AGF, for refcount queries */ + struct xfs_rtgroup *rtg; /* rt group info, if needed */ struct xfs_perag *pag; /* AG info, if applicable */ xfs_daddr_t next_daddr; /* next daddr we expect */ /* daddr of low fsmap key when we're using the rtbitmap */ @@ -338,8 +341,14 @@ xfs_getfsmap_helper( if (info->head->fmh_entries >= info->head->fmh_count) return -ECANCELED; - trace_xfs_fsmap_mapping(mp, info->dev, - info->pag ? 
info->pag->pag_agno : NULLAGNUMBER, rec); + if (info->pag) + trace_xfs_fsmap_mapping(mp, info->dev, info->pag->pag_agno, + rec); + else if (info->rtg) + trace_xfs_fsmap_mapping(mp, info->dev, info->rtg->rtg_rgno, + rec); + else + trace_xfs_fsmap_mapping(mp, info->dev, NULLAGNUMBER, rec); fmr.fmr_device = info->dev; fmr.fmr_physical = rec_daddr; @@ -784,6 +793,181 @@ xfs_getfsmap_rtdev_rtbitmap( xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP); return error; } + +/* Transform a realtime rmapbt record into a fsmap */ +STATIC int +xfs_getfsmap_rtdev_rmapbt_helper( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xfs_mount *mp = cur->bc_mp; + struct xfs_getfsmap_info *info = priv; + xfs_rtblock_t rtbno; + xfs_daddr_t rec_daddr; + + rtbno = xfs_rgbno_to_rtb(mp, cur->bc_ino.rtg->rtg_rgno, + rec->rm_startblock); + rec_daddr = xfs_rtb_to_daddr(mp, rtbno); + + return xfs_getfsmap_helper(cur->bc_tp, info, rec, rec_daddr, 0); +} + +/* Actually query the rtrmap btree. */ +STATIC int +xfs_getfsmap_rtdev_rmapbt_query( + struct xfs_trans *tp, + struct xfs_getfsmap_info *info, + struct xfs_btree_cur **curpp) +{ + struct xfs_mount *mp = tp->t_mountp; + + /* Report any gap at the end of the last rtgroup. */ + if (info->last) + return xfs_getfsmap_rtdev_rmapbt_helper(*curpp, &info->high, + info); + + /* Query the rtrmapbt */ + xfs_rtgroup_lock(NULL, info->rtg, XFS_RTGLOCK_RMAP); + *curpp = xfs_rtrmapbt_init_cursor(mp, tp, info->rtg, + info->rtg->rtg_rmapip); + return xfs_rmap_query_range(*curpp, &info->low, &info->high, + xfs_getfsmap_rtdev_rmapbt_helper, info); +} + +/* Execute a getfsmap query against the realtime device rmapbt. 
*/ +STATIC int +xfs_getfsmap_rtdev_rmapbt( + struct xfs_trans *tp, + const struct xfs_fsmap *keys, + struct xfs_getfsmap_info *info) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_rtgroup *rtg; + struct xfs_btree_cur *bt_cur = NULL; + xfs_rtblock_t start_rtb; + xfs_rtblock_t end_rtb; + xfs_rgnumber_t start_rg, end_rg; + uint64_t eofs; + int error = 0; + + eofs = XFS_FSB_TO_BB(mp, xfs_rtx_to_rtb(mp, mp->m_sb.sb_rextents)); + if (keys[0].fmr_physical >= eofs) + return 0; + start_rtb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); + end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); + + info->missing_owner = XFS_FMR_OWN_FREE; + + /* + * Convert the fsmap low/high keys to rtgroup based keys. Initialize + * low to the fsmap low key and max out the high key to the end + * of the rtgroup. + */ + info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); + error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); + if (error) + return error; + info->low.rm_blockcount = XFS_BB_TO_FSBT(mp, keys[0].fmr_length); + xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); + + /* Adjust the low key if we are continuing from where we left off. 
*/ + if (info->low.rm_blockcount == 0) { + /* No previous record from which to continue */ + } else if (rmap_not_shareable(mp, &info->low)) { + /* Last record seen was an unshareable extent */ + info->low.rm_owner = 0; + info->low.rm_offset = 0; + + start_rtb += info->low.rm_blockcount; + if (xfs_rtb_to_daddr(mp, start_rtb) >= eofs) + return 0; + } else { + /* Last record seen was a shareable file data extent */ + info->low.rm_offset += info->low.rm_blockcount; + } + info->low.rm_startblock = xfs_rtb_to_rgbno(mp, start_rtb, &start_rg); + + info->high.rm_startblock = -1U; + info->high.rm_owner = ULLONG_MAX; + info->high.rm_offset = ULLONG_MAX; + info->high.rm_blockcount = 0; + info->high.rm_flags = XFS_RMAP_KEY_FLAGS | XFS_RMAP_REC_FLAGS; + + end_rg = xfs_rtb_to_rgno(mp, end_rtb); + + for_each_rtgroup_range(mp, start_rg, end_rg, rtg) { + /* + * Set the rtgroup high key from the fsmap high key if this + * is the last rtgroup that we're querying. + */ + info->rtg = rtg; + if (rtg->rtg_rgno == end_rg) { + xfs_rgnumber_t junk; + + info->high.rm_startblock = xfs_rtb_to_rgbno(mp, + end_rtb, &junk); + info->high.rm_offset = XFS_BB_TO_FSBT(mp, + keys[1].fmr_offset); + error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]); + if (error) + break; + xfs_getfsmap_set_irec_flags(&info->high, &keys[1]); + } + + if (bt_cur) { + xfs_rtgroup_unlock(bt_cur->bc_ino.rtg, + XFS_RTGLOCK_RMAP); + xfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR); + bt_cur = NULL; + } + + trace_xfs_fsmap_low_key(mp, info->dev, rtg->rtg_rgno, + &info->low); + trace_xfs_fsmap_high_key(mp, info->dev, rtg->rtg_rgno, + &info->high); + + error = xfs_getfsmap_rtdev_rmapbt_query(tp, info, &bt_cur); + if (error) + break; + + /* + * Set the rtgroup low key to the start of the rtgroup prior to + * moving on to the next rtgroup. 
+ */ + if (rtg->rtg_rgno == start_rg) + memset(&info->low, 0, sizeof(info->low)); + + /* + * If this is the last rtgroup, report any gap at the end of it + * before we drop the reference to the rtgroup when the loop + * terminates. + */ + if (rtg->rtg_rgno == end_rg) { + info->last = true; + error = xfs_getfsmap_rtdev_rmapbt_query(tp, info, + &bt_cur); + if (error) + break; + } + info->rtg = NULL; + } + + if (bt_cur) { + xfs_rtgroup_unlock(bt_cur->bc_ino.rtg, XFS_RTGLOCK_RMAP); + xfs_btree_del_cursor(bt_cur, error < 0 ? XFS_BTREE_ERROR : + XFS_BTREE_NOERROR); + } + if (info->rtg) { + xfs_rtgroup_rele(info->rtg); + info->rtg = NULL; + } else if (rtg) { + /* loop termination case */ + xfs_rtgroup_rele(rtg); + } + + return error; +} #endif /* CONFIG_XFS_RT */ /* Do we recognize the device? */ @@ -916,7 +1100,10 @@ xfs_getfsmap( #ifdef CONFIG_XFS_RT if (mp->m_rtdev_targp) { handlers[2].dev = new_encode_dev(mp->m_rtdev_targp->bt_dev); - handlers[2].fn = xfs_getfsmap_rtdev_rtbitmap; + if (use_rmap) + handlers[2].fn = xfs_getfsmap_rtdev_rmapbt; + else + handlers[2].fn = xfs_getfsmap_rtdev_rtbitmap; } #endif /* CONFIG_XFS_RT */ @@ -983,6 +1170,7 @@ xfs_getfsmap( info.dev = handlers[i].dev; info.last = false; info.pag = NULL; + info.rtg = NULL; info.low_daddr = -1ULL; info.low.rm_blockcount = 0; error = handlers[i].fn(tp, dkeys, &info); From patchwork Sun Dec 31 21:37:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J.
Wong" X-Patchwork-Id: 13507684 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 37FF4BA22 for ; Sun, 31 Dec 2023 21:37:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ntZG3vhs" Received: by smtp.kernel.org (Postfix) with ESMTPSA id AB1F9C433C8; Sun, 31 Dec 2023 21:37:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704058620; bh=+XNhfkSrBFegwmzhPOYnyf9QQ5vP60c2ra534jTk3UY=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=ntZG3vhsTq0PiOEjPfCGK/ft1i4K+gSXyAM682ktJDP1oxZCT/zWQAifQMtP1jI/B NhxrgZ7UzRIMPIkzwJreikdbWtbCVO8l02LzqXkVJbPLeraS38siarkflcieP5uZlY VbtLf4s41jnLBpgvXyan0ts7uiIzZTmbknflV8dKGfV6bBnAweU9rNDXxysrzDZ8OW S1pI8ucJ1X1NXaI+ZXmBrk8Yp/oU4qiy2C7EKZDXY71noneX2tUHhXoiZ2H6ZF3FfX VtokYuxQ8yh2/t+rTFFdIbjaJmL7N1vfdZ0qufHpQ142VPOCQl+vVPaW83VvYoeJlD QbID31etuN9yQ== Date: Sun, 31 Dec 2023 13:37:00 -0800 Subject: [PATCH 20/39] xfs: check that the rtrmapbt maxlevels doesn't increase when growing fs From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850221.1764998.5077676356604159966.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong The size of filesystem transaction reservations depends on the maximum height (maxlevels) of the realtime btrees. 
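As a rough illustration of why a change in realtime geometry can change maxlevels, here is a minimal standalone C sketch (not kernel code) of a worst-case btree height calculation: divide the record count by the minimum number of records per leaf, then keep dividing the resulting block count by the minimum number of pointers per interior node until a single root block remains. The function name `compute_maxlevels`, its parameters, and the fanout values used below are hypothetical; XFS performs this kind of computation with its own helpers (such as xfs_btree_compute_maxlevels), whose inputs differ from this simplified version.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division, in the spirit of the kernel's howmany_64(). */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Hypothetical sketch: smallest btree height that can index @records
 * when each leaf holds at least @leaf_recs records and each interior
 * node holds at least @node_recs pointers. Using minimum fanouts gives
 * the worst-case (maximum) height.
 */
static unsigned int compute_maxlevels(uint64_t records,
				      unsigned int leaf_recs,
				      unsigned int node_recs)
{
	uint64_t blocks;
	unsigned int height = 1;

	/* A node fanout below 2 would never shrink the level count. */
	assert(leaf_recs >= 1 && node_recs >= 2);

	/* Number of leaf blocks needed for all records. */
	blocks = div_round_up(records, leaf_recs);

	/* Add interior levels until one root block covers everything. */
	while (blocks > 1) {
		blocks = div_round_up(blocks, node_recs);
		height++;
	}
	return height;
}
```

With a made-up minimum fanout of 100 at every level, 10000 records fit in two levels, but 10001 records need a third. A geometry change that pushes the worst-case record count past such a boundary raises maxlevels, and with it the size of the transaction reservations that must fit in the log.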
Since we don't want a grow operation to increase the reservation size enough that we'll fail the minimum log size checks on the next mount, constrain growfs operations if they would cause an increase in those maxlevels. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_fsops.c | 12 +++++++++ fs/xfs/xfs_rtalloc.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_rtalloc.h | 5 ++++ fs/xfs/xfs_trace.h | 21 ++++++++++++++++ 4 files changed, 101 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 99f9f5c8d9b6e..e78ee67b9dd12 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -23,6 +23,7 @@ #include "xfs_trace.h" #include "xfs_rtgroup.h" #include "xfs_rtalloc.h" +#include "xfs_rtrmap_btree.h" /* * Write new AG headers to disk. Non-transactional, but need to be @@ -115,6 +116,13 @@ xfs_growfs_data_private( xfs_buf_relse(bp); } + /* Make sure the new fs size won't cause problems with the log. */ + error = xfs_growfs_check_rtgeom(mp, nb, mp->m_sb.sb_rblocks, + mp->m_sb.sb_rextsize, mp->m_sb.sb_rextents, + mp->m_sb.sb_rbmblocks, mp->m_sb.sb_rextslog); + if (error) + return error; + nb_div = nb; nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks); if (nb_mod && nb_mod >= XFS_MIN_AG_BLOCKS) @@ -227,7 +235,11 @@ xfs_growfs_data_private( error = xfs_fs_reserve_ag_blocks(mp); if (error == -ENOSPC) error = 0; + + /* Compute new maxlevels for rt btrees. 
*/ + xfs_rtrmapbt_compute_maxlevels(mp); } + return error; out_trans_cancel: diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 2f2a92672de9a..ac6580bda2052 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -27,6 +27,7 @@ #include "xfs_rtgroup.h" #include "xfs_error.h" #include "xfs_rtrmap_btree.h" +#include "xfs_trace.h" /* * Realtime metadata files are not quite regular files because userspace can't @@ -1016,6 +1017,57 @@ xfs_growfs_rt_init_primary( return 0; } +/* + * Check that changes to the realtime geometry won't affect the minimum + * log size, which would cause the fs to become unusable. + */ +int +xfs_growfs_check_rtgeom( + const struct xfs_mount *mp, + xfs_rfsblock_t dblocks, + xfs_rfsblock_t rblocks, + xfs_agblock_t rextsize, + xfs_rtblock_t rextents, + xfs_extlen_t rbmblocks, + uint8_t rextslog) +{ + struct xfs_mount *fake_mp; + int min_logfsbs; + + fake_mp = kmem_alloc(sizeof(struct xfs_mount), KM_MAYFAIL); + if (!fake_mp) + return -ENOMEM; + + /* + * Create a dummy xfs_mount with the new rt geometry, and compute the + * new minimum log size. This ensures that the log is big enough to + * handle the larger transactions that we could start sending. + */ + memcpy(fake_mp, mp, sizeof(struct xfs_mount)); + + fake_mp->m_sb.sb_dblocks = dblocks; + fake_mp->m_sb.sb_rblocks = rblocks; + fake_mp->m_sb.sb_rextents = rextents; + fake_mp->m_sb.sb_rextsize = rextsize; + fake_mp->m_sb.sb_rbmblocks = rbmblocks; + fake_mp->m_sb.sb_rextslog = rextslog; + if (rblocks > 0) + fake_mp->m_features |= XFS_FEAT_REALTIME; + + xfs_rtrmapbt_compute_maxlevels(fake_mp); + + xfs_trans_resv_calc(fake_mp, M_RES(fake_mp)); + min_logfsbs = xfs_log_calc_minimum_size(fake_mp); + trace_xfs_growfs_check_rtgeom(mp, min_logfsbs); + + kmem_free(fake_mp); + + if (mp->m_sb.sb_logblocks < min_logfsbs) + return -ENOSPC; + + return 0; +} + /* * Grow the realtime area of the filesystem. 
*/ @@ -1107,6 +1159,12 @@ xfs_growfs_rt( if (nrsumblocks > (mp->m_sb.sb_logblocks >> 1)) return -EINVAL; + /* Make sure the new fs size won't cause problems with the log. */ + error = xfs_growfs_check_rtgeom(mp, mp->m_sb.sb_dblocks, nrblocks, + in->extsize, nrextents, nrbmblocks, nrextslog); + if (error) + return error; + /* Allocate the new rt group structures */ if (xfs_has_rtgroups(mp)) { uint64_t new_rgcount; @@ -1301,8 +1359,12 @@ xfs_growfs_rt( rtg->rtg_blockcount = xfs_rtgroup_block_count(mp, rtg->rtg_rgno); - /* Ensure the mount RT feature flag is now set. */ + /* + * Ensure the mount RT feature flag is now set, and compute new + * maxlevels for rt btrees. + */ mp->m_features |= XFS_FEAT_REALTIME; + xfs_rtrmapbt_compute_maxlevels(mp); } if (error) goto out_free; diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index c01ca192646a9..8a7b6cfa13cf0 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -80,6 +80,10 @@ xfs_growfs_rt( xfs_growfs_rt_t *in); /* user supplied growfs struct */ int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp); +int xfs_growfs_check_rtgeom(const struct xfs_mount *mp, xfs_rfsblock_t dblocks, + xfs_rfsblock_t rblocks, xfs_agblock_t rextsize, + xfs_rtblock_t rextents, xfs_extlen_t rbmblocks, + uint8_t rextslog); #else # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (-ENOSYS) # define xfs_rtpick_extent(m,t,l,rb) (-ENOSYS) @@ -101,6 +105,7 @@ xfs_rtmount_init( # define xfs_rtunmount_inodes(m) # define xfs_rt_resv_free(mp) ((void)0) # define xfs_rt_resv_init(mp) (0) +# define xfs_growfs_check_rtgeom(mp, d, r, rs, rx, rb, rl) (0) #endif /* CONFIG_XFS_RT */ #endif /* __XFS_RTALLOC_H__ */ diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 8ebdfb216266c..2dc991730c303 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -5406,6 +5406,27 @@ DEFINE_IMETA_RESV_EVENT(xfs_imeta_resv_free_extent); DEFINE_IMETA_RESV_EVENT(xfs_imeta_resv_critical); DEFINE_INODE_ERROR_EVENT(xfs_imeta_resv_init_error); +#ifdef 
CONFIG_XFS_RT +TRACE_EVENT(xfs_growfs_check_rtgeom, + TP_PROTO(const struct xfs_mount *mp, unsigned int min_logfsbs), + TP_ARGS(mp, min_logfsbs), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned int, logblocks) + __field(unsigned int, min_logfsbs) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->logblocks = mp->m_sb.sb_logblocks; + __entry->min_logfsbs = min_logfsbs; + ), + TP_printk("dev %d:%d logblocks %u min_logfsbs %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->logblocks, + __entry->min_logfsbs) +); +#endif /* CONFIG_XFS_RT */ + #endif /* _TRACE_XFS_H */ #undef TRACE_INCLUDE_PATH From patchwork Sun Dec 31 21:37:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507685 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9626DBA2B for ; Sun, 31 Dec 2023 21:37:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="P+Z0yjm7" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 685CCC433C8; Sun, 31 Dec 2023 21:37:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704058636; bh=Fqg75wC5yfmbLY+AKLukU8nCMWwwMd1aDdLF7GUqxQs=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=P+Z0yjm7RFKYJmPeTUJ3By3Q2DpEVqIB/eeVmWpCRYYcSgUoAQ30KPbwRQn/BqmhF 2mRoAHFztTc8NW7ih67JkjT5zdx0m+vGxy81YQJCv6d0JEFf9Cty3TQqN0ojP6K6OJ gLw1jRtlMjxwZ5cx1kTLLGlrWRv/3KdOYgGLht4XU21z6QAyvX7I7IXV7S/GwcC0Ta EwnOlSnv4R1DVFO09H4MKo0Dc8qOOkVtG6KwT2iast+LPzkJ0ClMFBtyvViGPxSl2H mYH3uF438SHrNRTWL9yyreDDSXKwify7J375mX5sJDVSmJRWvHAL5MY6oo++oDbrbN /5/1CYISpMXBA== Date: Sun, 31 Dec 2023 13:37:15 -0800 Subject: [PATCH 21/39] xfs: 
add realtime rmap btree when adding rt volume From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850238.1764998.14223965488625049404.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong If we're adding enough space to the realtime section to require the creation of new realtime groups, create the rt rmap btree inode before we start adding the space. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_rtalloc.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 86 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index ac6580bda2052..f6b23439b674d 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -24,8 +24,11 @@ #include "xfs_health.h" #include "xfs_da_format.h" #include "xfs_imeta.h" +#include "xfs_imeta_utils.h" #include "xfs_rtgroup.h" #include "xfs_error.h" +#include "xfs_btree.h" +#include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" #include "xfs_trace.h" @@ -1017,6 +1020,74 @@ xfs_growfs_rt_init_primary( return 0; } +/* Add a metadata inode for a realtime rmap btree. 
*/ +static int +xfs_growfsrt_create_rtrmap( + struct xfs_rtgroup *rtg) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_imeta_update upd; + struct xfs_rmap_irec rmap = { + .rm_startblock = 0, + .rm_blockcount = mp->m_sb.sb_rextsize, + .rm_owner = XFS_RMAP_OWN_FS, + .rm_offset = 0, + .rm_flags = 0, + }; + struct xfs_btree_cur *cur; + struct xfs_imeta_path *path; + struct xfs_inode *ip = NULL; + int error; + + if (!xfs_has_rtrmapbt(mp) || rtg->rtg_rmapip) + return 0; + + error = xfs_rtrmapbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + return error; + + error = xfs_imeta_ensure_dirpath(mp, path); + if (error) + goto out_path; + + error = xfs_imeta_start_create(mp, path, &upd); + if (error) + goto out_path; + + error = xfs_rtrmapbt_create(&upd, &ip); + if (error) + goto out_cancel; + + lockdep_set_class(&ip->i_lock.mr_lock, &xfs_rrmapip_key); + + /* Rmap the rtgroup superblock; this had better fit in the data fork. */ + cur = xfs_rtrmapbt_init_cursor(mp, upd.tp, rtg, ip); + error = xfs_rmap_map_raw(cur, &rmap); + xfs_btree_del_cursor(cur, error); + if (error) + goto out_cancel; + + error = xfs_imeta_commit_update(&upd); + if (error) + goto out_path; + + xfs_imeta_free_path(path); + xfs_finish_inode_setup(ip); + rtg->rtg_rmapip = ip; + return 0; + +out_cancel: + xfs_imeta_cancel_update(&upd, error); + /* Have to finish setting up the inode to ensure it's deleted. */ + if (ip) { + xfs_finish_inode_setup(ip); + xfs_irele(ip); + } +out_path: + xfs_imeta_free_path(path); + return error; +} + /* * Check that changes to the realtime geometry won't affect the minimum * log size, which would cause the fs to become unusable. @@ -1122,7 +1193,9 @@ xfs_growfs_rt( return -EINVAL; /* Unsupported realtime features. 
*/ - if (xfs_has_rmapbt(mp) || xfs_has_reflink(mp) || xfs_has_quota(mp)) + if (!xfs_has_rtgroups(mp) && xfs_has_rmapbt(mp)) + return -EOPNOTSUPP; + if (xfs_has_reflink(mp) || xfs_has_quota(mp)) return -EOPNOTSUPP; nrblocks = in->newblocks; @@ -1261,10 +1334,21 @@ xfs_growfs_rt( /* recompute growfsrt reservation from new rsumsize */ xfs_trans_resv_calc(nmp, &nmp->m_resv); - if (xfs_has_rtgroups(mp)) + if (xfs_has_rtgroups(mp)) { + xfs_rgnumber_t rgno = last_rgno; + nsbp->sb_rgcount = howmany_64(nsbp->sb_rblocks, nsbp->sb_rgblocks); + for_each_rtgroup_range(mp, rgno, nsbp->sb_rgcount, rtg) { + error = xfs_growfsrt_create_rtrmap(rtg); + if (error) { + xfs_rtgroup_rele(rtg); + break; + } + } + } + /* * Start a transaction, get the log reservation. */ From patchwork Sun Dec 31 21:37:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507686 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4E4B3BA2B for ; Sun, 31 Dec 2023 21:37:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="jGxpYVpm" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1BDFEC433C7; Sun, 31 Dec 2023 21:37:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704058652; bh=mVFtsuYnkp5sJOAr7e7qjuzc1/ugqe/dcfPuCSZ1Rao=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=jGxpYVpmZeA5d/Juu57M0pZ+jJtfZdO/QPnXQZCyCiBIIvHGAfVUH3dfdmmUZJgPo ERXRrX43ReEMEls3Ypbz86a9auLgPSUwrUB10HfARTxq75hdTo7FYYXHIj1IGTod1K +YupqpGTpdsfuAq1HM2PEqPoWhNp8XZPThcKJDinAAA/aJMhFAGwL16DwG9/FVPYjp wHExEBOIAh8K6krooPNjLqkFpn7OVnIdVMk6WZle+t9iFRPBbyMxnUqGqy4fWZ9Im1 
/9pxjGZOMgUihG4UNw64aeLsFo/QL0W+baJePRSJbVGcaNMRxMFJouYbZNq9fj4870 6qw898P3tamZg== Date: Sun, 31 Dec 2023 13:37:31 -0800 Subject: [PATCH 22/39] xfs: report realtime rmap btree corruption errors to the health system From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850253.1764998.8292417915212123414.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Whenever we encounter corrupt realtime rmap btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_fs_staging.h | 1 + fs/xfs/libxfs/xfs_health.h | 4 +++- fs/xfs/libxfs/xfs_inode_fork.c | 4 +++- fs/xfs/libxfs/xfs_rtrmap_btree.c | 5 ++++- fs/xfs/xfs_health.c | 4 ++++ fs/xfs/xfs_rtalloc.c | 2 ++ 6 files changed, 17 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_fs_staging.h b/fs/xfs/libxfs/xfs_fs_staging.h index 1f57331487791..9d5d6af62b616 100644 --- a/fs/xfs/libxfs/xfs_fs_staging.h +++ b/fs/xfs/libxfs/xfs_fs_staging.h @@ -216,6 +216,7 @@ struct xfs_rtgroup_geometry { }; #define XFS_RTGROUP_GEOM_SICK_SUPER (1 << 0) /* superblock */ #define XFS_RTGROUP_GEOM_SICK_BITMAP (1 << 1) /* rtbitmap for this group */ +#define XFS_RTGROUP_GEOM_SICK_RMAPBT (1 << 2) /* reverse mappings */ #define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 63, struct xfs_rtgroup_geometry) diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h index 1e9938a417b42..aeeb62769773f 100644 --- a/fs/xfs/libxfs/xfs_health.h +++ b/fs/xfs/libxfs/xfs_health.h @@ -68,6 +68,7 @@ struct xfs_rtgroup; #define XFS_SICK_RT_BITMAP (1 << 0) /* realtime bitmap */ #define XFS_SICK_RT_SUMMARY (1 << 1) /* realtime summary */ #define 
XFS_SICK_RT_SUPER (1 << 2) /* rt group superblock */ +#define XFS_SICK_RT_RMAPBT (1 << 3) /* reverse mappings */ /* Observable health issues for AG metadata. */ #define XFS_SICK_AG_SB (1 << 0) /* superblock */ @@ -113,7 +114,8 @@ struct xfs_rtgroup; #define XFS_SICK_RT_PRIMARY (XFS_SICK_RT_BITMAP | \ XFS_SICK_RT_SUMMARY | \ - XFS_SICK_RT_SUPER) + XFS_SICK_RT_SUPER | \ + XFS_SICK_RT_RMAPBT) #define XFS_SICK_AG_PRIMARY (XFS_SICK_AG_SB | \ XFS_SICK_AG_AGF | \ diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index ab54a9b27c781..e7ab04aea2db6 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -266,8 +266,10 @@ xfs_iformat_data_fork( case XFS_DINODE_FMT_BTREE: return xfs_iformat_btree(ip, dip, XFS_DATA_FORK); case XFS_DINODE_FMT_RMAP: - if (!xfs_has_rtrmapbt(ip->i_mount)) + if (!xfs_has_rtrmapbt(ip->i_mount)) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } return xfs_iformat_rtrmap(ip, dip); default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index b824562bdc2ec..355c50196e986 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -27,6 +27,7 @@ #include "xfs_extent_busy.h" #include "xfs_rtgroup.h" #include "xfs_bmap.h" +#include "xfs_health.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -800,8 +801,10 @@ xfs_iformat_rtrmap( level = be16_to_cpu(dfp->bb_level); if (level > mp->m_rtrmap_maxlevels || - xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) + xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } xfs_iroot_alloc(ip, XFS_DATA_FORK, xfs_rtrmap_broot_space_calc(mp, level, numrecs)); diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c index b53ef95a37a54..ed0767a6fa15a 100644 --- a/fs/xfs/xfs_health.c +++ b/fs/xfs/xfs_health.c @@ -532,6 +532,7 @@ xfs_ag_geom_health( static 
const struct ioctl_sick_map rtgroup_map[] = { { XFS_SICK_RT_SUPER, XFS_RTGROUP_GEOM_SICK_SUPER }, { XFS_SICK_RT_BITMAP, XFS_RTGROUP_GEOM_SICK_BITMAP }, + { XFS_SICK_RT_RMAPBT, XFS_RTGROUP_GEOM_SICK_RMAPBT }, { 0, 0 }, }; @@ -636,6 +637,9 @@ xfs_btree_mark_sick( case XFS_BTNUM_BMAP: xfs_bmap_mark_sick(cur->bc_ino.ip, cur->bc_ino.whichfork); return; + case XFS_BTNUM_RTRMAP: + xfs_rtgroup_mark_sick(cur->bc_ino.rtg, XFS_SICK_RT_RMAPBT); + return; case XFS_BTNUM_BNO: mask = XFS_SICK_AG_BNOBT; break; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index f6b23439b674d..37156cf8acd25 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1781,6 +1781,7 @@ xfs_rtmount_rmapbt( goto out_path; if (ino == NULLFSINO) { + xfs_rtgroup_mark_sick(rtg, XFS_SICK_RT_RMAPBT); error = -EFSCORRUPTED; goto out_path; } @@ -1790,6 +1791,7 @@ xfs_rtmount_rmapbt( goto out_path; if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_RMAP)) { + xfs_rtgroup_mark_sick(rtg, XFS_SICK_RT_RMAPBT); error = -EFSCORRUPTED; goto out_rele; } From patchwork Sun Dec 31 21:37:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507687 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0005FBA2B for ; Sun, 31 Dec 2023 21:37:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NfMeMtrc" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C36CEC433C8; Sun, 31 Dec 2023 21:37:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704058667; bh=2l5EA8P06w4WwqQGFIT3KULiKQq0oPbFZKhc+MEl+Co=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=NfMeMtrcl4UCr36XD66Sr0bnd2WkmkkPig760Pft6h26xDvveAUNPQ50FmFhdA6fq h5m8D1DAv/B3ZSKeQVFHg10dgTSfc0h5qCLAdvXMtJNVW3eFyqbfrcmyoVnFCZoVRE KQVB9zH7hBivd1XJl3qT3mRA1yqrtH4dHc6mV49BN0peUPDFO6PFv1MNVcmy43bGTo i1qteQFXnVCbzhUVe+NfaHxmx1oUcJg+o6JMH46p9mD7nsPQRNXySomWwhirbqAHIz qKQmAfAlZrQ99mv4okcJC5UmyExJju8awWRcUXS3XUmzmYet3EYmOmu9I/OboKsRfa Vil0ahLQgn14w== Date: Sun, 31 Dec 2023 13:37:47 -0800 Subject: [PATCH 23/39] xfs: fix scrub tracepoints when inode-rooted btrees are involved From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850269.1764998.4993791802508458626.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Fix minor mistakes in the scrub tracepoints that can manifest when inode-rooted btrees are enabled. The existing code worked fine for bmap btrees, but we should tighten the code up to be less sloppy. Signed-off-by: Darrick J.
Wong --- fs/xfs/scrub/trace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index bdcd77c839317..822fcdfd89a4b 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -611,7 +611,7 @@ TRACE_EVENT(xchk_ifork_btree_op_error, TP_fast_assign( xfs_fsblock_t fsbno = xchk_btree_cur_fsbno(cur, level); __entry->dev = sc->mp->m_super->s_dev; - __entry->ino = sc->ip->i_ino; + __entry->ino = cur->bc_ino.ip->i_ino; __entry->whichfork = cur->bc_ino.whichfork; __entry->type = sc->sm->sm_type; __entry->btnum = cur->bc_btnum; From patchwork Sun Dec 31 21:38:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507688 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9D4FDBA2E for ; Sun, 31 Dec 2023 21:38:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RZWJvPxz" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6B82BC433C8; Sun, 31 Dec 2023 21:38:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704058683; bh=N5qLY73T6eOO9JSWXK+3nZ6WgNIEXfmb7ftdZazJofc=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=RZWJvPxz4v+fNXM5Sg6ShVXtLkv4loBECcCwHqOFG40T360yWAwhSCZMDWdWSDjCx sUuC2x3kuD5FbHnS1ZFe6f4bLLH1cWkwKmuJ6O5MXZiUaYGs8MZOWbDVMAoC/SIO2U qVoWZhhz+OW5eoK8eH0OEtA2UyBhjazSpt2fd0E2GiX/tdNDYBsrV7hVNN1eYbheSi 5chEqt45iEA5zgONgFWIqE/7yAxSv11Jbx4/U51roW08tuc2IOUSOxcrl//8IWd4ne GqqRYCSw069DIMkYgxQoAEt+iVR8vrrgbUAcnHEs1ERT7L9i8MDRqpBhEJzzUpKod3 fGqMGqQbhFAyQ== Date: Sun, 31 Dec 2023 13:38:02 -0800 Subject: [PATCH 24/39] xfs: allow queued realtime intents to drain before 
scrubbing From: "Darrick J. Wong" To: djwong@kernel.org Cc: Christoph Hellwig , linux-xfs@vger.kernel.org Message-ID: <170404850285.1764998.16445367274469895177.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong When a writer thread executes a chain of log intent items for the realtime volume, the ILOCKs taken during each step are for each rt metadata file, not the entire rt volume itself. Although scrub takes all rt metadata ILOCKs, this isn't sufficient to guard against scrub checking the rt volume while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding the realtime volume. When there's a collision, cross-referencing between data structures (e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic. Fix this by adding to the mount structure the same drain that we use to protect scrub against concurrent AG updates, but this time for the realtime volume. [Contains a few cleanups from hch] Cc: Christoph Hellwig Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_rtgroup.c | 2 + fs/xfs/libxfs/xfs_rtgroup.h | 9 ++++ fs/xfs/scrub/common.c | 88 +++++++++++++++++++++++++++++++++++++---- fs/xfs/scrub/rtbitmap.c | 3 + fs/xfs/scrub/scrub.c | 2 - fs/xfs/xfs_bmap_item.c | 11 ++--- fs/xfs/xfs_drain.c | 93 +++++++++++++++++++++++++++++++++++++++---- fs/xfs/xfs_drain.h | 28 ++++++++++++- fs/xfs/xfs_extfree_item.c | 14 ++---- fs/xfs/xfs_mount.h | 1 fs/xfs/xfs_rmap_item.c | 16 +++---- fs/xfs/xfs_trace.h | 32 +++++++++++++++ 12 files changed, 255 insertions(+), 44 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index 6ffe1c21a1ea4..7b031f4917349 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -162,6 +162,7 @@ xfs_initialize_rtgroups( /* Place kernel structure only init below this point. */ spin_lock_init(&rtg->rtg_state_lock); init_waitqueue_head(&rtg->rtg_active_wq); + xfs_defer_drain_init(&rtg->rtg_intents_drain); #endif /* __KERNEL__ */ /* Active ref owned by mount indicates rtgroup is online. */ @@ -216,6 +217,7 @@ xfs_free_rtgroups( spin_unlock(&mp->m_rtgroup_lock); ASSERT(rtg); XFS_IS_CORRUPT(mp, atomic_read(&rtg->rtg_ref) != 0); + xfs_defer_drain_free(&rtg->rtg_intents_drain); /* drop the mount's active reference */ xfs_rtgroup_rele(rtg); diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 559a5135820d3..9487c2e00478b 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -39,6 +39,15 @@ struct xfs_rtgroup { #ifdef __KERNEL__ /* -- kernel only structures below this line -- */ spinlock_t rtg_state_lock; + + /* + * We use xfs_drain to track the number of deferred log intent items + * that have been queued (but not yet processed) so that waiters (e.g. + * scrub) will not lock resources when other threads are in the middle + * of processing a chain of intent items only to find momentary + * inconsistencies. 
+ */ + struct xfs_defer_drain rtg_intents_drain; #endif /* __KERNEL__ */ }; diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 3dee7d717073e..efce0dff2e937 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -776,12 +776,88 @@ xchk_rt_unlock_rtbitmap( } #ifdef CONFIG_XFS_RT +/* Lock all the rt group metadata inode ILOCKs and wait for intents. */ +static int +xchk_rtgroup_drain_and_lock( + struct xfs_scrub *sc, + struct xchk_rt *sr, + unsigned int rtglock_flags) +{ + int error = 0; + + ASSERT(sr->rtg != NULL); + + /* + * If we're /only/ locking the rtbitmap in shared mode, then we're + * obviously not trying to compare records in two metadata inodes. + * There's no need to drain intents here because the caller (most + * likely the rgsuper scanner) doesn't need that level of consistency. + */ + if (rtglock_flags == XFS_RTGLOCK_BITMAP_SHARED) { + xfs_rtgroup_lock(NULL, sr->rtg, rtglock_flags); + sr->rtlock_flags = rtglock_flags; + return 0; + } + + do { + if (xchk_should_terminate(sc, &error)) + return error; + + xfs_rtgroup_lock(NULL, sr->rtg, rtglock_flags); + + /* + * If we've grabbed a non-metadata file for scrubbing, we + * assume that holding its ILOCK will suffice to coordinate + * with any rt intent chains involving this inode. + */ + if (sc->ip && !xfs_is_metadata_inode(sc->ip)) { + sr->rtlock_flags = rtglock_flags; + return 0; + } + + /* + * Decide if the rt group is quiet enough for all metadata to + * be consistent with each other. Regular file IO doesn't get + * to lock all the rt inodes at the same time, which means that + * there could be other threads in the middle of processing a + * chain of deferred ops. + * + * We just locked all the metadata inodes for this rt group; + * now take a look to see if there are any intents in progress. + * If there are, drop the rt group inode locks and wait for the + * intents to drain. 
Since we hold the rt group inode locks + * for the duration of the scrub, this is the only time we have + * to sample the intents counter; any threads increasing it + * after this point can't possibly be in the middle of a chain + * of rt metadata updates. + * + * Obviously, this should be slanted against scrub and in favor + * of runtime threads. + */ + if (!xfs_rtgroup_intent_busy(sr->rtg)) { + sr->rtlock_flags = rtglock_flags; + return 0; + } + + xfs_rtgroup_unlock(sr->rtg, rtglock_flags); + + if (!(sc->flags & XCHK_FSGATES_DRAIN)) + return -ECHRNG; + error = xfs_rtgroup_intent_drain(sr->rtg); + if (error == -ERESTARTSYS) + error = -EINTR; + } while (!error); + + return error; +} + /* * For scrubbing a realtime group, grab all the in-core resources we'll need to * check the metadata, which means taking the ILOCK of the realtime group's - * metadata inodes. Callers must not join these inodes to the transaction with - * non-zero lockflags or concurrency problems will result. The @rtglock_flags - * argument takes XFS_RTGLOCK_* flags. + * metadata inodes and draining any running intent chains. Callers must not + * join these inodes to the transaction with non-zero lockflags or concurrency + * problems will result. The @rtglock_flags argument takes XFS_RTGLOCK_* + * flags. 
*/ int xchk_rtgroup_init( @@ -797,9 +873,7 @@ xchk_rtgroup_init( if (!sr->rtg) return -ENOENT; - xfs_rtgroup_lock(NULL, sr->rtg, rtglock_flags); - sr->rtlock_flags = rtglock_flags; - return 0; + return xchk_rtgroup_drain_and_lock(sc, sr, rtglock_flags); } /* @@ -1457,7 +1531,7 @@ xchk_fsgates_enable( trace_xchk_fsgates_enable(sc, scrub_fsgates); if (scrub_fsgates & XCHK_FSGATES_DRAIN) - xfs_drain_wait_enable(); + xfs_defer_drain_wait_enable(); if (scrub_fsgates & XCHK_FSGATES_QUOTA) xfs_dqtrx_hook_enable(); diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index aae8b0e6ff281..5bedb09387495 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -84,6 +84,9 @@ xchk_setup_rtbitmap( struct xchk_rtbitmap *rtb; int error; + if (xchk_need_intent_drain(sc)) + xchk_fsgates_enable(sc, XCHK_FSGATES_DRAIN); + rtb = kzalloc(sizeof(struct xchk_rtbitmap), XCHK_GFP_FLAGS); if (!rtb) return -ENOMEM; diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index a6b7b57fc1df7..58bb52a63782e 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -165,7 +165,7 @@ xchk_fsgates_disable( trace_xchk_fsgates_disable(sc, sc->flags & FSGATES_MASK); if (sc->flags & XCHK_FSGATES_DRAIN) - xfs_drain_wait_disable(); + xfs_defer_drain_wait_disable(); if (sc->flags & XCHK_FSGATES_QUOTA) xfs_dqtrx_hook_disable(); diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index e50276ceceae9..4522de5f13a8d 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -327,10 +327,8 @@ xfs_bmap_update_get_group( { if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) { if (xfs_has_rtgroups(mp)) { - xfs_rgnumber_t rgno; - - rgno = xfs_rtb_to_rgno(mp, bi->bi_bmap.br_startblock); - bi->bi_rtg = xfs_rtgroup_get(mp, rgno); + bi->bi_rtg = xfs_rtgroup_intent_get(mp, + bi->bi_bmap.br_startblock); } else { bi->bi_rtg = NULL; } @@ -366,8 +364,9 @@ xfs_bmap_update_put_group( struct xfs_bmap_intent *bi) { if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) 
{ - if (xfs_has_rtgroups(bi->bi_owner->i_mount)) - xfs_rtgroup_put(bi->bi_rtg); + if (xfs_has_rtgroups(bi->bi_owner->i_mount)) { + xfs_rtgroup_intent_put(bi->bi_rtg); + } return; } diff --git a/fs/xfs/xfs_drain.c b/fs/xfs/xfs_drain.c index 7bdb9688c0f5e..306cd133b9a27 100644 --- a/fs/xfs/xfs_drain.c +++ b/fs/xfs/xfs_drain.c @@ -11,11 +11,12 @@ #include "xfs_mount.h" #include "xfs_ag.h" #include "xfs_trace.h" +#include "xfs_rtgroup.h" /* - * Use a static key here to reduce the overhead of xfs_drain_rele. If the + * Use a static key here to reduce the overhead of xfs_defer_drain_rele. If the * compiler supports jump labels, the static branch will be replaced by a nop - * sled when there are no xfs_drain_wait callers. Online fsck is currently + * sled when there are no xfs_defer_drain_wait callers. Online fsck is currently * the only caller, so this is a reasonable tradeoff. * * Note: Patching the kernel code requires taking the cpu hotplug lock. Other @@ -23,18 +24,18 @@ * XFS callers cannot hold any locks that might be used by memory reclaim or * writeback when calling the static_branch_{inc,dec} functions. 
*/ -static DEFINE_STATIC_KEY_FALSE(xfs_drain_waiter_gate); +static DEFINE_STATIC_KEY_FALSE(xfs_defer_drain_waiter_gate); void -xfs_drain_wait_disable(void) +xfs_defer_drain_wait_disable(void) { - static_branch_dec(&xfs_drain_waiter_gate); + static_branch_dec(&xfs_defer_drain_waiter_gate); } void -xfs_drain_wait_enable(void) +xfs_defer_drain_wait_enable(void) { - static_branch_inc(&xfs_drain_waiter_gate); + static_branch_inc(&xfs_defer_drain_waiter_gate); } void @@ -71,7 +72,7 @@ static inline bool has_waiters(struct wait_queue_head *wq_head) static inline void xfs_defer_drain_rele(struct xfs_defer_drain *dr) { if (atomic_dec_and_test(&dr->dr_count) && - static_branch_unlikely(&xfs_drain_waiter_gate) && + static_branch_unlikely(&xfs_defer_drain_waiter_gate) && has_waiters(&dr->dr_waiters)) wake_up(&dr->dr_waiters); } @@ -164,3 +165,79 @@ xfs_perag_intent_busy( { return xfs_defer_drain_busy(&pag->pag_intents_drain); } + +#ifdef CONFIG_XFS_RT + +/* + * Get a passive reference to an rtgroup and declare an intent to update its + * metadata. + */ +struct xfs_rtgroup * +xfs_rtgroup_intent_get( + struct xfs_mount *mp, + xfs_rtblock_t rtbno) +{ + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + + rgno = xfs_rtb_to_rgno(mp, rtbno); + rtg = xfs_rtgroup_get(mp, rgno); + if (!rtg) + return NULL; + + xfs_rtgroup_intent_hold(rtg); + return rtg; +} + +/* + * Release our intent to update this rtgroup's metadata, and then release our + * passive ref to the rtgroup. + */ +void +xfs_rtgroup_intent_put( + struct xfs_rtgroup *rtg) +{ + xfs_rtgroup_intent_rele(rtg); + xfs_rtgroup_put(rtg); +} +/* + * Declare an intent to update rtgroup metadata. Other threads that need + * exclusive access can decide to back off if they see declared intentions. + */ +void +xfs_rtgroup_intent_hold( + struct xfs_rtgroup *rtg) +{ + trace_xfs_rtgroup_intent_hold(rtg, __return_address); + xfs_defer_drain_grab(&rtg->rtg_intents_drain); +} + +/* Release our intent to update this rtgroup's metadata. 
*/ +void +xfs_rtgroup_intent_rele( + struct xfs_rtgroup *rtg) +{ + trace_xfs_rtgroup_intent_rele(rtg, __return_address); + xfs_defer_drain_rele(&rtg->rtg_intents_drain); +} + +/* + * Wait for the intent update count for this rtgroup to hit zero. + * Callers must not hold any rt metadata inode locks. + */ +int +xfs_rtgroup_intent_drain( + struct xfs_rtgroup *rtg) +{ + trace_xfs_rtgroup_wait_intents(rtg, __return_address); + return xfs_defer_drain_wait(&rtg->rtg_intents_drain); +} + +/* Has anyone declared an intent to update this rtgroup? */ +bool +xfs_rtgroup_intent_busy( + struct xfs_rtgroup *rtg) +{ + return xfs_defer_drain_busy(&rtg->rtg_intents_drain); +} +#endif /* CONFIG_XFS_RT */ diff --git a/fs/xfs/xfs_drain.h b/fs/xfs/xfs_drain.h index 775164f54ea6d..b02ff87a873a1 100644 --- a/fs/xfs/xfs_drain.h +++ b/fs/xfs/xfs_drain.h @@ -7,6 +7,7 @@ #define XFS_DRAIN_H_ struct xfs_perag; +struct xfs_rtgroup; #ifdef CONFIG_XFS_DRAIN_INTENTS /* @@ -25,8 +26,8 @@ struct xfs_defer_drain { void xfs_defer_drain_init(struct xfs_defer_drain *dr); void xfs_defer_drain_free(struct xfs_defer_drain *dr); -void xfs_drain_wait_disable(void); -void xfs_drain_wait_enable(void); +void xfs_defer_drain_wait_disable(void); +void xfs_defer_drain_wait_enable(void); /* * Deferred Work Intent Drains @@ -60,6 +61,9 @@ void xfs_drain_wait_enable(void); * All functions that create work items must increment the intent counter as * soon as the item is added to the transaction and cannot drop the counter * until the item is finished or cancelled. + * + * The same principles apply to realtime groups because the rt metadata inode + * ILOCKs are not held across transaction rolls. 
*/ struct xfs_perag *xfs_perag_intent_get(struct xfs_mount *mp, xfs_fsblock_t fsbno); @@ -70,6 +74,7 @@ void xfs_perag_intent_rele(struct xfs_perag *pag); int xfs_perag_intent_drain(struct xfs_perag *pag); bool xfs_perag_intent_busy(struct xfs_perag *pag); + #else struct xfs_defer_drain { /* empty */ }; @@ -85,4 +90,23 @@ static inline void xfs_perag_intent_rele(struct xfs_perag *pag) { } #endif /* CONFIG_XFS_DRAIN_INTENTS */ +#if defined(CONFIG_XFS_DRAIN_INTENTS) && defined(CONFIG_XFS_RT) +struct xfs_rtgroup *xfs_rtgroup_intent_get(struct xfs_mount *mp, + xfs_rtblock_t rtbno); +void xfs_rtgroup_intent_put(struct xfs_rtgroup *rtg); + +void xfs_rtgroup_intent_hold(struct xfs_rtgroup *rtg); +void xfs_rtgroup_intent_rele(struct xfs_rtgroup *rtg); + +int xfs_rtgroup_intent_drain(struct xfs_rtgroup *rtg); +bool xfs_rtgroup_intent_busy(struct xfs_rtgroup *rtg); +#else +#define xfs_rtgroup_intent_get(mp, rtbno) \ + xfs_rtgroup_get(mp, xfs_rtb_to_rgno((mp), (rtbno))) +#define xfs_rtgroup_intent_put(rtg) xfs_rtgroup_put(rtg) +static inline void xfs_rtgroup_intent_hold(struct xfs_rtgroup *rtg) { } +static inline void xfs_rtgroup_intent_rele(struct xfs_rtgroup *rtg) { } +#endif /* CONFIG_XFS_DRAIN_INTENTS && CONFIG_XFS_RT */ + + #endif /* XFS_DRAIN_H_ */ diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 51a363d85978f..a17d2bf05b414 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -468,11 +468,8 @@ xfs_extent_free_defer_add( trace_xfs_extent_free_defer(mp, xefi); if (xfs_efi_is_realtime(xefi)) { - xfs_rgnumber_t rgno; - - rgno = xfs_rtb_to_rgno(mp, xefi->xefi_startblock); - xefi->xefi_rtg = xfs_rtgroup_get(mp, rgno); - + xefi->xefi_rtg = xfs_rtgroup_intent_get(mp, + xefi->xefi_startblock); *dfpp = xfs_defer_add(tp, &xefi->xefi_list, &xfs_rtextent_free_defer_type); return; @@ -615,11 +612,8 @@ xfs_efi_recover_work( xefi->xefi_agresv = XFS_AG_RESV_NONE; xefi->xefi_owner = XFS_RMAP_OWN_UNKNOWN; if (isrt) { - xfs_rgnumber_t rgno; - 
xefi->xefi_flags |= XFS_EFI_REALTIME; - rgno = xfs_rtb_to_rgno(mp, extp->ext_start); - xefi->xefi_rtg = xfs_rtgroup_get(mp, rgno); + xefi->xefi_rtg = xfs_rtgroup_intent_get(mp, extp->ext_start); } else { xefi->xefi_pag = xfs_perag_intent_get(mp, extp->ext_start); } @@ -778,7 +772,7 @@ xfs_rtextent_free_cancel_item( { struct xfs_extent_free_item *xefi = xefi_entry(item); - xfs_rtgroup_put(xefi->xefi_rtg); + xfs_rtgroup_intent_put(xefi->xefi_rtg); kmem_cache_free(xfs_extfree_item_cache, xefi); } diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index ada8b281d74e2..21c889a723820 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -13,6 +13,7 @@ struct xfs_ail; struct xfs_quotainfo; struct xfs_da_geometry; struct xfs_perag; +struct xfs_rtgroup; /* dynamic preallocation free space thresholds, 5% down to 1% */ enum { diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 580baf3b1b1d3..f4270efc9dc5e 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -383,13 +383,12 @@ xfs_rmap_defer_add( * section updates. 
*/ if (ri->ri_realtime) { - xfs_rgnumber_t rgno; - - rgno = xfs_rtb_to_rgno(mp, ri->ri_bmap.br_startblock); - ri->ri_rtg = xfs_rtgroup_get(mp, rgno); + ri->ri_rtg = xfs_rtgroup_intent_get(mp, + ri->ri_bmap.br_startblock); xfs_defer_add(tp, &ri->ri_list, &xfs_rtrmap_update_defer_type); } else { - ri->ri_pag = xfs_perag_intent_get(mp, ri->ri_bmap.br_startblock); + ri->ri_pag = xfs_perag_intent_get(mp, + ri->ri_bmap.br_startblock); xfs_defer_add(tp, &ri->ri_list, &xfs_rmap_update_defer_type); } } @@ -538,10 +537,7 @@ xfs_rui_recover_work( XFS_EXT_UNWRITTEN : XFS_EXT_NORM; ri->ri_realtime = isrt; if (isrt) { - xfs_rgnumber_t rgno; - - rgno = xfs_rtb_to_rgno(mp, map->me_startblock); - ri->ri_rtg = xfs_rtgroup_get(mp, rgno); + ri->ri_rtg = xfs_rtgroup_intent_get(mp, map->me_startblock); } else { ri->ri_pag = xfs_perag_intent_get(mp, map->me_startblock); } @@ -684,7 +680,7 @@ xfs_rtrmap_update_cancel_item( { struct xfs_rmap_intent *ri = ri_entry(item); - xfs_rtgroup_put(ri->ri_rtg); + xfs_rtgroup_intent_put(ri->ri_rtg); kmem_cache_free(xfs_rmap_intent_cache, ri); } diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 2dc991730c303..05d8ff68b09e2 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -4952,6 +4952,38 @@ DEFINE_PERAG_INTENTS_EVENT(xfs_perag_intent_hold); DEFINE_PERAG_INTENTS_EVENT(xfs_perag_intent_rele); DEFINE_PERAG_INTENTS_EVENT(xfs_perag_wait_intents); +#ifdef CONFIG_XFS_RT +DECLARE_EVENT_CLASS(xfs_rtgroup_intents_class, + TP_PROTO(struct xfs_rtgroup *rtg, void *caller_ip), + TP_ARGS(rtg, caller_ip), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(long, nr_intents) + __field(void *, caller_ip) + ), + TP_fast_assign( + __entry->dev = rtg->rtg_mount->m_super->s_dev; + __entry->rtdev = rtg->rtg_mount->m_rtdev_targp->bt_dev; + __entry->nr_intents = atomic_read(&rtg->rtg_intents_drain.dr_count); + __entry->caller_ip = caller_ip; + ), + TP_printk("dev %d:%d rtdev %d:%d intents %ld caller %pS", + MAJOR(__entry->dev), 
MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->nr_intents, + __entry->caller_ip) +); + +#define DEFINE_RTGROUP_INTENTS_EVENT(name) \ +DEFINE_EVENT(xfs_rtgroup_intents_class, name, \ + TP_PROTO(struct xfs_rtgroup *rtg, void *caller_ip), \ + TP_ARGS(rtg, caller_ip)) +DEFINE_RTGROUP_INTENTS_EVENT(xfs_rtgroup_intent_hold); +DEFINE_RTGROUP_INTENTS_EVENT(xfs_rtgroup_intent_rele); +DEFINE_RTGROUP_INTENTS_EVENT(xfs_rtgroup_wait_intents); +#endif /* CONFIG_XFS_RT */ + #endif /* CONFIG_XFS_DRAIN_INTENTS */ TRACE_EVENT(xfs_swapext_overhead, From patchwork Sun Dec 31 21:38:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507689 Date: Sun, 31 Dec 2023 13:38:18 -0800 Subject: [PATCH 25/39] xfs: scrub the realtime rmapbt From: "Darrick J.
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850302.1764998.14932806738322962852.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Check the realtime reverse mapping btree against the rtbitmap, and modify the rtbitmap scrub to check against the rtrmapbt. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_fs.h | 3 - fs/xfs/scrub/bmap.c | 1 fs/xfs/scrub/bmap_repair.c | 1 fs/xfs/scrub/common.c | 78 +++++++++++++++++ fs/xfs/scrub/common.h | 10 ++ fs/xfs/scrub/health.c | 1 fs/xfs/scrub/inode.c | 10 +- fs/xfs/scrub/inode_repair.c | 7 +- fs/xfs/scrub/repair.c | 1 fs/xfs/scrub/rtrmap.c | 193 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.c | 9 ++ fs/xfs/scrub/scrub.h | 5 + fs/xfs/scrub/stats.c | 1 fs/xfs/scrub/trace.h | 4 + 15 files changed, 313 insertions(+), 12 deletions(-) create mode 100644 fs/xfs/scrub/rtrmap.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 50a7929982fdd..69ca5cb7e4000 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -193,6 +193,7 @@ xfs-$(CONFIG_XFS_ONLINE_SCRUB_STATS) += scrub/stats.o xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \ rgsuper.o \ rtbitmap.o \ + rtrmap.o \ rtsummary.o \ ) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 102b927336057..dcf048aae8c17 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -737,9 +737,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_TYPE_METAPATH 29 /* metadata directory tree paths */ #define XFS_SCRUB_TYPE_RGSUPER 30 /* realtime superblock */ #define XFS_SCRUB_TYPE_RGBITMAP 31 /* realtime group bitmap */ +#define XFS_SCRUB_TYPE_RTRMAPBT 32 /* rtgroup reverse mapping btree */ /* Number of scrub subcommands. 
*/ -#define XFS_SCRUB_TYPE_NR 32 +#define XFS_SCRUB_TYPE_NR 33 /* * This special type code only applies to the vectored scrub implementation. diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index b90f0d8540ba5..8fa51350facc4 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -953,6 +953,7 @@ xchk_bmap( case XFS_DINODE_FMT_UUID: case XFS_DINODE_FMT_DEV: case XFS_DINODE_FMT_LOCAL: + case XFS_DINODE_FMT_RMAP: /* No mappings to check. */ if (whichfork == XFS_COW_FORK) xchk_fblock_set_corrupt(sc, whichfork, 0); diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 8a6dc7ff2f79e..9dddc5997b4fc 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -728,6 +728,7 @@ xrep_bmap_check_inputs( case XFS_DINODE_FMT_DEV: case XFS_DINODE_FMT_LOCAL: case XFS_DINODE_FMT_UUID: + case XFS_DINODE_FMT_RMAP: return -ECANCELED; case XFS_DINODE_FMT_EXTENTS: case XFS_DINODE_FMT_BTREE: diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index efce0dff2e937..5ea9f3f310335 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -35,6 +35,8 @@ #include "xfs_swapext.h" #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_bmap_util.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -866,6 +868,8 @@ xchk_rtgroup_init( struct xchk_rt *sr, unsigned int rtglock_flags) { + int error; + ASSERT(sr->rtg == NULL); ASSERT(sr->rtlock_flags == 0); @@ -873,7 +877,30 @@ xchk_rtgroup_init( if (!sr->rtg) return -ENOENT; - return xchk_rtgroup_drain_and_lock(sc, sr, rtglock_flags); + error = xchk_rtgroup_drain_and_lock(sc, sr, rtglock_flags); + if (error) + return error; + + if (xfs_has_rtrmapbt(sc->mp) && (rtglock_flags & XFS_RTGLOCK_RMAP)) + sr->rmap_cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, + sr->rtg, sr->rtg->rtg_rmapip); + + return 0; +} + +/* + * Free all the btree cursors and other incore data relating to the realtime + * group. 
This has to be done /before/ committing (or cancelling) the scrub + * transaction. + */ +void +xchk_rtgroup_btcur_free( + struct xchk_rt *sr) +{ + if (sr->rmap_cur) + xfs_btree_del_cursor(sr->rmap_cur, XFS_BTREE_ERROR); + + sr->rmap_cur = NULL; } /* @@ -961,6 +988,14 @@ xchk_setup_fs( return xchk_trans_alloc(sc, resblks); } +/* Set us up with a transaction and an empty context to repair rt metadata. */ +int +xchk_setup_rt( + struct xfs_scrub *sc) +{ + return xchk_trans_alloc(sc, 0); +} + /* Set us up with AG headers and btree cursors. */ int xchk_setup_ag_btree( @@ -1696,3 +1731,44 @@ xchk_inode_is_allocated( rcu_read_unlock(); return error; } + +/* Count the blocks used by a file, even if it's a metadata inode. */ +int +xchk_inode_count_blocks( + struct xfs_scrub *sc, + int whichfork, + xfs_extnum_t *nextents, + xfs_filblks_t *count) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(sc->ip, whichfork); + struct xfs_btree_cur *cur; + xfs_extlen_t btblocks; + int error; + + if (!ifp) { + *nextents = 0; + *count = 0; + return 0; + } + + switch (ifp->if_format) { + case XFS_DINODE_FMT_RMAP: + if (!sc->sr.rtg) { + ASSERT(0); + return -EFSCORRUPTED; + } + cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, sc->sr.rtg, + sc->ip); + error = xfs_btree_count_blocks(cur, &btblocks); + xfs_btree_del_cursor(cur, error); + if (error) + return error; + + *nextents = 0; + *count = btblocks - 1; + return 0; + default: + return xfs_bmap_count_blocks(sc->tp, sc->ip, whichfork, + nextents, count); + } +} diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index f96dd658feab9..3251f80c00e15 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -65,6 +65,7 @@ static inline int xchk_setup_nothing(struct xfs_scrub *sc) /* Setup functions */ int xchk_setup_agheader(struct xfs_scrub *sc); int xchk_setup_fs(struct xfs_scrub *sc); +int xchk_setup_rt(struct xfs_scrub *sc); int xchk_setup_ag_allocbt(struct xfs_scrub *sc); int xchk_setup_ag_iallocbt(struct xfs_scrub *sc); int 
xchk_setup_ag_rmapbt(struct xfs_scrub *sc); @@ -83,11 +84,13 @@ int xchk_setup_rtbitmap(struct xfs_scrub *sc); int xchk_setup_rtsummary(struct xfs_scrub *sc); int xchk_setup_rgsuperblock(struct xfs_scrub *sc); int xchk_setup_rgbitmap(struct xfs_scrub *sc); +int xchk_setup_rtrmapbt(struct xfs_scrub *sc); #else # define xchk_setup_rtbitmap xchk_setup_nothing # define xchk_setup_rtsummary xchk_setup_nothing # define xchk_setup_rgsuperblock xchk_setup_nothing # define xchk_setup_rgbitmap xchk_setup_nothing +# define xchk_setup_rtrmapbt xchk_setup_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_ino_dqattach(struct xfs_scrub *sc); @@ -148,14 +151,17 @@ void xchk_rt_unlock_rtbitmap(struct xfs_scrub *sc); #ifdef CONFIG_XFS_RT /* All the locks we need to check an rtgroup. */ -#define XCHK_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP_SHARED) +#define XCHK_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP_SHARED | \ + XFS_RTGLOCK_RMAP) int xchk_rtgroup_init(struct xfs_scrub *sc, xfs_rgnumber_t rgno, struct xchk_rt *sr, unsigned int rtglock_flags); void xchk_rtgroup_unlock(struct xfs_scrub *sc, struct xchk_rt *sr); +void xchk_rtgroup_btcur_free(struct xchk_rt *sr); void xchk_rtgroup_free(struct xfs_scrub *sc, struct xchk_rt *sr); #else # define xchk_rtgroup_init(sc, rgno, sr, lockflags) (-ENOSYS) +# define xchk_rtgroup_btcur_free(sr) ((void)0) # define xchk_rtgroup_free(sc, sr) ((void)0) #endif /* CONFIG_XFS_RT */ @@ -282,5 +288,7 @@ void xchk_fsgates_enable(struct xfs_scrub *sc, unsigned int scrub_fshooks); int xchk_inode_is_allocated(struct xfs_scrub *sc, xfs_agino_t agino, bool *inuse); +int xchk_inode_count_blocks(struct xfs_scrub *sc, int whichfork, + xfs_extnum_t *nextents, xfs_filblks_t *count); #endif /* __XFS_SCRUB_COMMON_H__ */ diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c index 7fccb1a03060a..0fec833057697 100644 --- a/fs/xfs/scrub/health.c +++ b/fs/xfs/scrub/health.c @@ -115,6 +115,7 @@ static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = { 
[XFS_SCRUB_TYPE_DIRTREE] = { XHG_INO, XFS_SICK_INO_DIRTREE }, [XFS_SCRUB_TYPE_METAPATH] = { XHG_FS, XFS_SICK_FS_METAPATH }, [XFS_SCRUB_TYPE_RGSUPER] = { XHG_RTGROUP, XFS_SICK_RT_SUPER }, + [XFS_SCRUB_TYPE_RTRMAPBT] = { XHG_RTGROUP, XFS_SICK_RT_RMAPBT }, }; /* Return the health status mask for this scrub type. */ diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index 7357bf5156ba3..5fc10e02b9c41 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -496,6 +496,10 @@ xchk_dinode( if (!S_ISREG(mode) && !S_ISDIR(mode)) xchk_ino_set_corrupt(sc, ino); break; + case XFS_DINODE_FMT_RMAP: + if (!S_ISREG(mode)) + xchk_ino_set_corrupt(sc, ino); + break; case XFS_DINODE_FMT_UUID: default: xchk_ino_set_corrupt(sc, ino); @@ -680,15 +684,13 @@ xchk_inode_xref_bmap( return; /* Walk all the extents to check nextents/naextents/nblocks. */ - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_DATA_FORK, - &nextents, &count); + error = xchk_inode_count_blocks(sc, XFS_DATA_FORK, &nextents, &count); if (!xchk_should_check_xref(sc, &error, NULL)) return; if (nextents < xfs_dfork_data_extents(dip)) xchk_ino_xref_set_corrupt(sc, sc->ip->i_ino); - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK, - &nextents, &acount); + error = xchk_inode_count_blocks(sc, XFS_ATTR_FORK, &nextents, &acount); if (!xchk_should_check_xref(sc, &error, NULL)) return; if (nextents != xfs_dfork_attr_extents(dip)) diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 30be423b3716e..2b3a6cbadae71 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -1446,8 +1446,7 @@ xrep_inode_blockcounts( trace_xrep_inode_blockcounts(sc); /* Set data fork counters from the data fork mappings. 
*/ - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_DATA_FORK, - &nextents, &count); + error = xchk_inode_count_blocks(sc, XFS_DATA_FORK, &nextents, &count); if (error) return error; if (xfs_is_reflink_inode(sc->ip)) { @@ -1471,8 +1470,8 @@ xrep_inode_blockcounts( /* Set attr fork counters from the attr fork mappings. */ ifp = xfs_ifork_ptr(sc->ip, XFS_ATTR_FORK); if (ifp) { - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK, - &nextents, &acount); + error = xchk_inode_count_blocks(sc, XFS_ATTR_FORK, &nextents, + &acount); if (error) return error; if (count >= sc->mp->m_sb.sb_dblocks) diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 24d99891a634c..cad45d5b86e2e 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -61,6 +61,7 @@ xrep_attempt( trace_xrep_attempt(XFS_I(file_inode(sc->file)), sc->sm, error); xchk_ag_btcur_free(&sc->sa); + xchk_rtgroup_btcur_free(&sc->sr); /* Repair whatever's broken. */ ASSERT(sc->ops->repair); diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c new file mode 100644 index 0000000000000..7c7da0b232321 --- /dev/null +++ b/fs/xfs/scrub/rtrmap.c @@ -0,0 +1,193 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2018-2024 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_inode.h" +#include "xfs_rtalloc.h" +#include "xfs_rtgroup.h" +#include "xfs_imeta.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" + +/* Set us up with the realtime metadata locked. 
*/ +int +xchk_setup_rtrmapbt( + struct xfs_scrub *sc) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_rtgroup *rtg; + int error = 0; + + if (xchk_need_intent_drain(sc)) + xchk_fsgates_enable(sc, XCHK_FSGATES_DRAIN); + + rtg = xfs_rtgroup_get(mp, sc->sm->sm_agno); + if (!rtg) + return -ENOENT; + + error = xchk_setup_rt(sc); + if (error) + goto out_rtg; + + error = xchk_install_live_inode(sc, rtg->rtg_rmapip); + if (error) + goto out_rtg; + + error = xchk_ino_dqattach(sc); + if (error) + goto out_rtg; + + error = xchk_rtgroup_init(sc, rtg->rtg_rgno, &sc->sr, XCHK_RTGLOCK_ALL); +out_rtg: + xfs_rtgroup_put(rtg); + return error; +} + +/* Realtime reverse mapping. */ + +struct xchk_rtrmap { + /* + * The furthest-reaching of the rmapbt records that we've already + * processed. This enables us to detect overlapping records for space + * allocations that cannot be shared. + */ + struct xfs_rmap_irec overlap_rec; + + /* + * The previous rmapbt record, so that we can check for two records + * that could be one. + */ + struct xfs_rmap_irec prev_rec; +}; + +/* Flag failures for records that overlap but cannot. */ +STATIC void +xchk_rtrmapbt_check_overlapping( + struct xchk_btree *bs, + struct xchk_rtrmap *cr, + const struct xfs_rmap_irec *irec) +{ + xfs_rtblock_t pnext, inext; + + if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + /* No previous record? */ + if (cr->overlap_rec.rm_blockcount == 0) + goto set_prev; + + /* Do overlap_rec and irec overlap? */ + pnext = cr->overlap_rec.rm_startblock + cr->overlap_rec.rm_blockcount; + if (pnext <= irec->rm_startblock) + goto set_prev; + + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + + /* Save whichever rmap record extends furthest. */ + inext = irec->rm_startblock + irec->rm_blockcount; + if (pnext > inext) + return; + +set_prev: + memcpy(&cr->overlap_rec, irec, sizeof(struct xfs_rmap_irec)); +} + +/* Decide if two reverse-mapping records can be merged. 
*/ +static inline bool +xchk_rtrmap_mergeable( + struct xchk_rtrmap *cr, + const struct xfs_rmap_irec *r2) +{ + const struct xfs_rmap_irec *r1 = &cr->prev_rec; + + /* Ignore if prev_rec is not yet initialized. */ + if (cr->prev_rec.rm_blockcount == 0) + return false; + + if (r1->rm_owner != r2->rm_owner) + return false; + if (r1->rm_startblock + r1->rm_blockcount != r2->rm_startblock) + return false; + if ((unsigned long long)r1->rm_blockcount + r2->rm_blockcount > + XFS_RMAP_LEN_MAX) + return false; + if (r1->rm_flags != r2->rm_flags) + return false; + return r1->rm_offset + r1->rm_blockcount == r2->rm_offset; +} + +/* Flag failures for records that could be merged. */ +STATIC void +xchk_rtrmapbt_check_mergeable( + struct xchk_btree *bs, + struct xchk_rtrmap *cr, + const struct xfs_rmap_irec *irec) +{ + if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + if (xchk_rtrmap_mergeable(cr, irec)) + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + + memcpy(&cr->prev_rec, irec, sizeof(struct xfs_rmap_irec)); +} + +/* Scrub a realtime rmapbt record. */ +STATIC int +xchk_rtrmapbt_rec( + struct xchk_btree *bs, + const union xfs_btree_rec *rec) +{ + struct xchk_rtrmap *cr = bs->private; + struct xfs_rmap_irec irec; + + if (xfs_rmap_btrec_to_irec(rec, &irec) != NULL || + xfs_rtrmap_check_irec(bs->cur->bc_ino.rtg, &irec) != NULL) { + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + return 0; + } + + if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return 0; + + xchk_rtrmapbt_check_mergeable(bs, cr, &irec); + xchk_rtrmapbt_check_overlapping(bs, cr, &irec); + return 0; +} + +/* Scrub the realtime rmap btree. 
*/ +int +xchk_rtrmapbt( + struct xfs_scrub *sc) +{ + struct xfs_owner_info oinfo; + struct xchk_rtrmap cr = { }; + int error; + + error = xchk_metadata_inode_forks(sc); + if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) + return error; + + xfs_rmap_ino_bmbt_owner(&oinfo, sc->sr.rtg->rtg_rmapip->i_ino, + XFS_DATA_FORK); + return xchk_btree(sc, sc->sr.rmap_cur, xchk_rtrmapbt_rec, &oinfo, &cr); +} diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 58bb52a63782e..428624d4f0c45 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -223,6 +223,8 @@ xchk_teardown( int error) { xchk_ag_free(sc, &sc->sa); + xchk_rtgroup_btcur_free(&sc->sr); + if (sc->tp) { if (error == 0 && (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)) error = xfs_trans_commit(sc->tp); @@ -472,6 +474,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .has = xfs_has_rtgroups, .repair = xrep_notsupported, }, + [XFS_SCRUB_TYPE_RTRMAPBT] = { /* realtime group rmapbt */ + .type = ST_RTGROUP, + .setup = xchk_setup_rtrmapbt, + .scrub = xchk_rtrmapbt, + .has = xfs_has_rtrmapbt, + .repair = xrep_notsupported, + }, }; static int diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 76eca41a8995a..52bbc1fbcc560 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -122,6 +122,9 @@ struct xchk_rt { * if rtg != NULL. 
*/ unsigned int rtlock_flags; + + /* rtgroup btrees */ + struct xfs_btree_cur *rmap_cur; }; struct xfs_scrub { @@ -278,11 +281,13 @@ int xchk_rtbitmap(struct xfs_scrub *sc); int xchk_rtsummary(struct xfs_scrub *sc); int xchk_rgsuperblock(struct xfs_scrub *sc); int xchk_rgbitmap(struct xfs_scrub *sc); +int xchk_rtrmapbt(struct xfs_scrub *sc); #else # define xchk_rtbitmap xchk_nothing # define xchk_rtsummary xchk_nothing # define xchk_rgsuperblock xchk_nothing # define xchk_rgbitmap xchk_nothing +# define xchk_rtrmapbt xchk_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_quota(struct xfs_scrub *sc); diff --git a/fs/xfs/scrub/stats.c b/fs/xfs/scrub/stats.c index 4bdff9a19dd6c..0da7ecabfe9d9 100644 --- a/fs/xfs/scrub/stats.c +++ b/fs/xfs/scrub/stats.c @@ -83,6 +83,7 @@ static const char *name_map[XFS_SCRUB_TYPE_NR] = { [XFS_SCRUB_TYPE_METAPATH] = "metapath", [XFS_SCRUB_TYPE_RGSUPER] = "rgsuper", [XFS_SCRUB_TYPE_RGBITMAP] = "rgbitmap", + [XFS_SCRUB_TYPE_RTRMAPBT] = "rtrmapbt", }; /* Format the scrub stats into a text buffer, similar to pcp style. 
*/ diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 822fcdfd89a4b..f02914f129605 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -86,6 +86,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_BARRIER); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_METAPATH); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGSUPER); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGBITMAP); +TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RTRMAPBT); #define XFS_SCRUB_TYPE_STRINGS \ { XFS_SCRUB_TYPE_PROBE, "probe" }, \ @@ -120,7 +121,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGBITMAP); { XFS_SCRUB_TYPE_BARRIER, "barrier" }, \ { XFS_SCRUB_TYPE_METAPATH, "metapath" }, \ { XFS_SCRUB_TYPE_RGSUPER, "rgsuper" }, \ - { XFS_SCRUB_TYPE_RGBITMAP, "rgbitmap" } + { XFS_SCRUB_TYPE_RGBITMAP, "rgbitmap" }, \ + { XFS_SCRUB_TYPE_RTRMAPBT, "rtrmapbt" } #define XFS_SCRUB_FLAG_STRINGS \ { XFS_SCRUB_IFLAG_REPAIR, "repair" }, \ From patchwork Sun Dec 31 21:38:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507690 Date: Sun, 31 Dec 2023 13:38:34 -0800 Subject: [PATCH 26/39] xfs: cross-reference realtime bitmap to realtime rmapbt scrubber From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850320.1764998.10421725619605918249.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org MIME-Version: 1.0 From: Darrick J. Wong When we're checking the realtime rmap btree entries, cross-reference those entries with the realtime bitmap too. Signed-off-by: Darrick J.
Wong --- fs/xfs/scrub/rtrmap.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index 7c7da0b232321..6009d458605e4 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -151,6 +151,23 @@ xchk_rtrmapbt_check_mergeable( memcpy(&cr->prev_rec, irec, sizeof(struct xfs_rmap_irec)); } +/* Cross-reference with other metadata. */ +STATIC void +xchk_rtrmapbt_xref( + struct xfs_scrub *sc, + struct xfs_rmap_irec *irec) +{ + xfs_rtblock_t rtbno; + + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, + irec->rm_startblock); + + xchk_xref_is_used_rt_space(sc, rtbno, irec->rm_blockcount); +} + /* Scrub a realtime rmapbt record. */ STATIC int xchk_rtrmapbt_rec( @@ -171,6 +188,7 @@ xchk_rtrmapbt_rec( xchk_rtrmapbt_check_mergeable(bs, cr, &irec); xchk_rtrmapbt_check_overlapping(bs, cr, &irec); + xchk_rtrmapbt_xref(bs->sc, &irec); return 0; } From patchwork Sun Dec 31 21:38:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507691 Date: Sun, 31 Dec 2023 13:38:50 -0800 Subject: [PATCH 27/39] xfs: cross-reference the realtime rmapbt From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850335.1764998.14778110195238361933.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> From: Darrick J. Wong Teach the data fork and realtime bitmap scrubbers to cross-reference information with the realtime rmap btree. Signed-off-by: Darrick J.
Wong --- fs/xfs/scrub/bmap.c | 72 ++++++++++++++++++++++++++++++++++++----------- fs/xfs/scrub/rtbitmap.c | 58 ++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/rtbitmap.h | 3 ++ fs/xfs/scrub/rtrmap.c | 65 ++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.h | 9 ++++++ 5 files changed, 190 insertions(+), 17 deletions(-) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 8fa51350facc4..604c4df2fdb34 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -20,6 +20,7 @@ #include "xfs_rmap.h" #include "xfs_rmap_btree.h" #include "xfs_health.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/btree.h" @@ -142,15 +143,22 @@ static inline bool xchk_bmap_get_rmap( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec, - xfs_agblock_t agbno, + xfs_agblock_t bno, uint64_t owner, struct xfs_rmap_irec *rmap) { + struct xfs_btree_cur **curp = &info->sc->sa.rmap_cur; xfs_fileoff_t offset; unsigned int rflags = 0; int has_rmap; int error; + if (xfs_ifork_is_realtime(info->sc->ip, info->whichfork)) + curp = &info->sc->sr.rmap_cur; + + if (*curp == NULL) + return false; + if (info->whichfork == XFS_ATTR_FORK) rflags |= XFS_RMAP_ATTR_FORK; if (irec->br_state == XFS_EXT_UNWRITTEN) @@ -171,13 +179,13 @@ xchk_bmap_get_rmap( * range rmap lookup to make sure we get the correct owner/offset. 
*/ if (info->is_shared) { - error = xfs_rmap_lookup_le_range(info->sc->sa.rmap_cur, agbno, - owner, offset, rflags, rmap, &has_rmap); + error = xfs_rmap_lookup_le_range(*curp, bno, owner, offset, + rflags, rmap, &has_rmap); } else { - error = xfs_rmap_lookup_le(info->sc->sa.rmap_cur, agbno, - owner, offset, rflags, rmap, &has_rmap); + error = xfs_rmap_lookup_le(*curp, bno, owner, offset, + rflags, rmap, &has_rmap); } - if (!xchk_should_check_xref(info->sc, &error, &info->sc->sa.rmap_cur)) + if (!xchk_should_check_xref(info->sc, &error, curp)) return false; if (!has_rmap) @@ -191,29 +199,29 @@ STATIC void xchk_bmap_xref_rmap( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec, - xfs_agblock_t agbno) + xfs_agblock_t bno) { struct xfs_rmap_irec rmap; unsigned long long rmap_end; uint64_t owner = info->sc->ip->i_ino; - if (!info->sc->sa.rmap_cur || xchk_skip_xref(info->sc->sm)) + if (xchk_skip_xref(info->sc->sm)) return; /* Find the rmap record for this irec. */ - if (!xchk_bmap_get_rmap(info, irec, agbno, owner, &rmap)) + if (!xchk_bmap_get_rmap(info, irec, bno, owner, &rmap)) return; /* * The rmap must be an exact match for this incore file mapping record, * which may have arisen from multiple ondisk records. */ - if (rmap.rm_startblock != agbno) + if (rmap.rm_startblock != bno) xchk_fblock_xref_set_corrupt(info->sc, info->whichfork, irec->br_startoff); rmap_end = (unsigned long long)rmap.rm_startblock + rmap.rm_blockcount; - if (rmap_end != agbno + irec->br_blockcount) + if (rmap_end != bno + irec->br_blockcount) xchk_fblock_xref_set_corrupt(info->sc, info->whichfork, irec->br_startoff); @@ -258,7 +266,7 @@ STATIC void xchk_bmap_xref_rmap_cow( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec, - xfs_agblock_t agbno) + xfs_agblock_t bno) { struct xfs_rmap_irec rmap; unsigned long long rmap_end; @@ -268,7 +276,7 @@ xchk_bmap_xref_rmap_cow( return; /* Find the rmap record for this irec. 
*/ - if (!xchk_bmap_get_rmap(info, irec, agbno, owner, &rmap)) + if (!xchk_bmap_get_rmap(info, irec, bno, owner, &rmap)) return; /* @@ -276,12 +284,12 @@ xchk_bmap_xref_rmap_cow( * can start before and end after the physical space allocated to this * mapping. There are no offsets to check. */ - if (rmap.rm_startblock > agbno) + if (rmap.rm_startblock > bno) xchk_fblock_xref_set_corrupt(info->sc, info->whichfork, irec->br_startoff); rmap_end = (unsigned long long)rmap.rm_startblock + rmap.rm_blockcount; - if (rmap_end < agbno + irec->br_blockcount) + if (rmap_end < bno + irec->br_blockcount) xchk_fblock_xref_set_corrupt(info->sc, info->whichfork, irec->br_startoff); @@ -314,10 +322,40 @@ xchk_bmap_rt_iextent_xref( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec) { - xchk_rt_init(info->sc, &info->sc->sr, XCHK_RTLOCK_BITMAP_SHARED); + struct xfs_owner_info oinfo; + struct xfs_mount *mp = ip->i_mount; + xfs_rgnumber_t rgno; + xfs_rgblock_t rgbno; + int error; + + if (!xfs_has_rtrmapbt(mp)) { + xchk_rt_init(info->sc, &info->sc->sr, + XCHK_RTLOCK_BITMAP_SHARED); + xchk_xref_is_used_rt_space(info->sc, irec->br_startblock, + irec->br_blockcount); + xchk_rt_unlock(info->sc, &info->sc->sr); + return; + } + + rgbno = xfs_rtb_to_rgbno(mp, irec->br_startblock, &rgno); + error = xchk_rtgroup_init(info->sc, rgno, &info->sc->sr, + XCHK_RTGLOCK_ALL); + if (!xchk_fblock_process_error(info->sc, info->whichfork, + irec->br_startoff, &error)) + goto out_free; + xchk_xref_is_used_rt_space(info->sc, irec->br_startblock, irec->br_blockcount); - xchk_rt_unlock(info->sc, &info->sc->sr); + xchk_bmap_xref_rmap(info, irec, rgbno); + + xfs_rmap_ino_owner(&oinfo, info->sc->ip->i_ino, info->whichfork, + irec->br_startoff); + xchk_xref_is_only_rt_owned_by(info->sc, rgbno, irec->br_blockcount, + &oinfo); + +out_free: + xchk_rtgroup_btcur_free(&info->sc->sr); + xchk_rtgroup_free(info->sc, &info->sc->sr); } /* Cross-reference a single datadev extent record. 
*/ diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 5bedb09387495..0d823eadbdba0 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -9,6 +9,7 @@ #include "xfs_format.h" #include "xfs_trans_resv.h" #include "xfs_mount.h" +#include "xfs_btree.h" #include "xfs_log_format.h" #include "xfs_trans.h" #include "xfs_rtbitmap.h" @@ -16,10 +17,13 @@ #include "xfs_bmap.h" #include "xfs_bit.h" #include "xfs_rtgroup.h" +#include "xfs_rmap.h" +#include "xfs_rtrmap_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/repair.h" #include "scrub/rtbitmap.h" +#include "scrub/btree.h" static inline void xchk_rtbitmap_compute_geometry( @@ -123,6 +127,36 @@ xchk_setup_rtbitmap( /* Per-rtgroup bitmap contents. */ +/* Cross-reference rtbitmap entries with other metadata. */ +STATIC void +xchk_rgbitmap_xref( + struct xchk_rgbitmap *rgb, + xfs_rtblock_t startblock, + xfs_rtblock_t blockcount) +{ + struct xfs_scrub *sc = rgb->sc; + xfs_rgnumber_t rgno; + xfs_rgblock_t rgbno; + + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + if (!sc->sr.rmap_cur) + return; + + rgbno = xfs_rtb_to_rgbno(sc->mp, startblock, &rgno); + xchk_xref_has_no_rt_owner(sc, rgbno, blockcount); + + if (rgb->next_free_rtblock < startblock) { + xfs_rgblock_t next_rgbno; + + next_rgbno = xfs_rtb_to_rgbno(sc->mp, rgb->next_free_rtblock, + &rgno); + xchk_xref_has_rt_owner(sc, next_rgbno, rgbno - next_rgbno); + } + + rgb->next_free_rtblock = startblock + blockcount; +} + /* Scrub a free extent record from the realtime bitmap. 
*/ STATIC int xchk_rgbitmap_rec( @@ -141,6 +175,12 @@ xchk_rgbitmap_rec( if (!xfs_verify_rtbext(mp, startblock, blockcount)) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0); + + xchk_rgbitmap_xref(rgb, startblock, blockcount); + + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return -ECANCELED; + return 0; } @@ -154,6 +194,7 @@ xchk_rgbitmap( struct xfs_rtgroup *rtg = sc->sr.rtg; struct xchk_rgbitmap *rgb = sc->buf; xfs_rtblock_t rtbno; + xfs_rtblock_t last_rtbno; xfs_rgblock_t last_rgbno = rtg->rtg_blockcount - 1; int error; @@ -168,6 +209,7 @@ xchk_rgbitmap( * realtime group. */ rtbno = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, 0); + rgb->next_free_rtblock = rtbno; keys[0].ar_startext = xfs_rtb_to_rtx(mp, rtbno); rtbno = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, last_rgbno); @@ -179,6 +221,22 @@ xchk_rgbitmap( if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error)) return error; + /* + * Check that there are rmappings for all rt extents between the end of + * the last free extent we saw and the last possible extent in the rt + * group. + */ + last_rtbno = xfs_rgbno_to_rtb(sc->mp, rtg->rtg_rgno, last_rgbno); + if (rgb->next_free_rtblock < last_rtbno) { + xfs_rgnumber_t rgno; + xfs_rgblock_t next_rgbno; + + next_rgbno = xfs_rtb_to_rgbno(sc->mp, rgb->next_free_rtblock, + &rgno); + xchk_xref_has_rt_owner(sc, next_rgbno, + last_rgbno - next_rgbno); + } + return 0; } diff --git a/fs/xfs/scrub/rtbitmap.h b/fs/xfs/scrub/rtbitmap.h index f659f0e76b4fa..42d50fa48a0ec 100644 --- a/fs/xfs/scrub/rtbitmap.h +++ b/fs/xfs/scrub/rtbitmap.h @@ -17,6 +17,9 @@ struct xchk_rgbitmap { struct xfs_scrub *sc; struct xchk_rtbitmap rtb; + + /* The next free rt block that we expect to see.
*/ + xfs_rtblock_t next_free_rtblock; }; #ifdef CONFIG_XFS_ONLINE_REPAIR diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index 6009d458605e4..c3e1cee81b6d2 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -209,3 +209,68 @@ xchk_rtrmapbt( XFS_DATA_FORK); return xchk_btree(sc, sc->sr.rmap_cur, xchk_rtrmapbt_rec, &oinfo, &cr); } + +/* xref check that the extent has no realtime reverse mapping at all */ +void +xchk_xref_has_no_rt_owner( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + enum xbtree_recpacking outcome; + int error; + + if (!sc->sr.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_rmap_has_records(sc->sr.rmap_cur, bno, len, &outcome); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + return; + if (outcome != XBTREE_RECPACKING_EMPTY) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); +} + +/* xref check that the extent is completely mapped */ +void +xchk_xref_has_rt_owner( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + enum xbtree_recpacking outcome; + int error; + + if (!sc->sr.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_rmap_has_records(sc->sr.rmap_cur, bno, len, &outcome); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + return; + if (outcome != XBTREE_RECPACKING_FULL) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); +} + +/* xref check that the extent is only owned by a given owner */ +void +xchk_xref_is_only_rt_owned_by( + struct xfs_scrub *sc, + xfs_agblock_t bno, + xfs_extlen_t len, + const struct xfs_owner_info *oinfo) +{ + struct xfs_rmap_matches res; + int error; + + if (!sc->sr.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_rmap_count_owners(sc->sr.rmap_cur, bno, len, oinfo, &res); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + return; + if (res.matches != 1) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); + if (res.bad_non_owner_matches) + xchk_btree_xref_set_corrupt(sc, 
sc->sr.rmap_cur, 0); + if (res.non_owner_matches) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); +} diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 52bbc1fbcc560..38731b99625d6 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -321,8 +321,17 @@ void xchk_xref_is_not_cow_staging(struct xfs_scrub *sc, xfs_agblock_t bno, #ifdef CONFIG_XFS_RT void xchk_xref_is_used_rt_space(struct xfs_scrub *sc, xfs_rtblock_t rtbno, xfs_extlen_t len); +void xchk_xref_has_no_rt_owner(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len); +void xchk_xref_has_rt_owner(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len); +void xchk_xref_is_only_rt_owned_by(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len, const struct xfs_owner_info *oinfo); #else # define xchk_xref_is_used_rt_space(sc, rtbno, len) do { } while (0) +# define xchk_xref_has_no_rt_owner(sc, rtbno, len) do { } while (0) +# define xchk_xref_has_rt_owner(sc, rtbno, len) do { } while (0) +# define xchk_xref_is_only_rt_owned_by(sc, bno, len, oinfo) do { } while (0) #endif #endif /* __XFS_SCRUB_SCRUB_H__ */ From patchwork Sun Dec 31 21:39:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507692 Date: Sun, 31 Dec 2023 13:39:05 -0800 Subject: [PATCH 28/39] xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850351.1764998.5239025918043401996.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> From: Darrick J. Wong Teach the bmbt scrubber how to perform a comprehensive check that the rmapbt does not contain /any/ mappings that are not described by bmbt records when it's dealing with a realtime file. Signed-off-by: Darrick J.
Wong --- fs/xfs/scrub/bmap.c | 60 +++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 53 insertions(+), 7 deletions(-) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 604c4df2fdb34..696ac7208c4d5 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -21,6 +21,8 @@ #include "xfs_rmap_btree.h" #include "xfs_health.h" #include "xfs_rtgroup.h" +#include "xfs_rtalloc.h" +#include "xfs_rtrmap_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/btree.h" @@ -637,12 +639,20 @@ xchk_bmap_check_rmap( */ check_rec = *rec; while (have_map) { + xfs_fsblock_t startblock; + if (irec.br_startoff != check_rec.rm_offset) xchk_fblock_set_corrupt(sc, sbcri->whichfork, check_rec.rm_offset); - if (irec.br_startblock != XFS_AGB_TO_FSB(sc->mp, - cur->bc_ag.pag->pag_agno, - check_rec.rm_startblock)) + if (cur->bc_btnum == XFS_BTNUM_RMAP) + startblock = XFS_AGB_TO_FSB(sc->mp, + cur->bc_ag.pag->pag_agno, + check_rec.rm_startblock); + else + startblock = xfs_rgbno_to_rtb(sc->mp, + cur->bc_ino.rtg->rtg_rgno, + check_rec.rm_startblock); + if (irec.br_startblock != startblock) xchk_fblock_set_corrupt(sc, sbcri->whichfork, check_rec.rm_offset); if (irec.br_blockcount > check_rec.rm_blockcount) @@ -696,6 +706,30 @@ xchk_bmap_check_ag_rmaps( return error; } +/* Make sure each rt rmap has a corresponding bmbt entry. 
*/ +STATIC int +xchk_bmap_check_rt_rmaps( + struct xfs_scrub *sc, + struct xfs_rtgroup *rtg) +{ + struct xchk_bmap_check_rmap_info sbcri; + struct xfs_btree_cur *cur; + int error; + + xfs_rtgroup_lock(NULL, rtg, XFS_RTGLOCK_RMAP); + cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, rtg, rtg->rtg_rmapip); + + sbcri.sc = sc; + sbcri.whichfork = XFS_DATA_FORK; + error = xfs_rmap_query_all(cur, xchk_bmap_check_rmap, &sbcri); + if (error == -ECANCELED) + error = 0; + + xfs_btree_del_cursor(cur, error); + xfs_rtgroup_unlock(rtg, XFS_RTGLOCK_RMAP); + return error; +} + /* * Decide if we want to scan the reverse mappings to determine if the attr * fork /really/ has zero space mappings. @@ -750,10 +784,6 @@ xchk_bmap_check_empty_datafork( { struct xfs_ifork *ifp = &ip->i_df; - /* Don't support realtime rmap checks yet. */ - if (XFS_IS_REALTIME_INODE(ip)) - return false; - /* * If the dinode repair found a bad data fork, it will reset the fork * to extents format with zero records and wait for the this scrubber @@ -805,6 +835,22 @@ xchk_bmap_check_rmaps( xfs_agnumber_t agno; int error; + if (xfs_ifork_is_realtime(sc->ip, whichfork)) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + + for_each_rtgroup(sc->mp, rgno, rtg) { + error = xchk_bmap_check_rt_rmaps(sc, rtg); + if (error || + (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) { + xfs_rtgroup_rele(rtg); + return error; + } + } + + return 0; + } + for_each_perag(sc->mp, agno, pag) { error = xchk_bmap_check_ag_rmaps(sc, whichfork, pag); if (error || From patchwork Sun Dec 31 21:39:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507693 Date: Sun, 31 Dec 2023 13:39:21 -0800 Subject: [PATCH 29/39] xfs: scrub the metadir path of rt rmap btree files From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850367.1764998.11204405675323358106.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> From: Darrick J. Wong Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata directory tree path to the rmap btree file for each rt group. Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_fs.h | 3 ++- fs/xfs/scrub/metapath.c | 27 ++++++++++++++++++++++++++- 2 files changed, 28 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index dcf048aae8c17..0bbdbfb0a8ae7 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -804,9 +804,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_METAPATH_USRQUOTA 2 #define XFS_SCRUB_METAPATH_GRPQUOTA 3 #define XFS_SCRUB_METAPATH_PRJQUOTA 4 +#define XFS_SCRUB_METAPATH_RTRMAPBT 5 /* Number of metapath sm_ino values */ -#define XFS_SCRUB_METAPATH_NR 5 +#define XFS_SCRUB_METAPATH_NR 6 /* * ioctl limits diff --git a/fs/xfs/scrub/metapath.c b/fs/xfs/scrub/metapath.c index 5a669a1a8ad17..6afd117c890e9 100644 --- a/fs/xfs/scrub/metapath.c +++ b/fs/xfs/scrub/metapath.c @@ -21,6 +21,8 @@ #include "xfs_bmap_btree.h" #include "xfs_trans_space.h" #include "xfs_attr.h" +#include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -93,13 +95,25 @@ xchk_setup_metapath( struct xchk_metapath *mpath; struct xfs_mount *mp = sc->mp; struct xfs_inode *ip = NULL; + struct xfs_rtgroup *rtg; + struct xfs_imeta_path *path; int error; if (!xfs_has_metadir(mp)) return -ENOENT; - if (sc->sm->sm_gen || sc->sm->sm_agno) + if (sc->sm->sm_gen) return -EINVAL; + switch (sc->sm->sm_ino) { + case XFS_SCRUB_METAPATH_RTRMAPBT: + /* empty */ + break; + default: + if (sc->sm->sm_agno) + return -EINVAL; + break; + } + mpath = kzalloc(sizeof(struct xchk_metapath), XCHK_GFP_FLAGS); if (!mpath) return -ENOMEM; @@ -132,6 +146,17 @@ xchk_setup_metapath( if (XFS_IS_PQUOTA_ON(mp)) ip = xfs_quota_inode(mp, XFS_DQTYPE_PROJ); break; + case XFS_SCRUB_METAPATH_RTRMAPBT: + error = xfs_rtrmapbt_create_path(mp, sc->sm->sm_agno, &path); + if (error) + return error; + mpath->path = path; + rtg = xfs_rtgroup_get(mp, sc->sm->sm_agno); + if (rtg) { + ip = rtg->rtg_rmapip; + xfs_rtgroup_put(rtg); + } + break; default: return 
-EINVAL; } From patchwork Sun Dec 31 21:39:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507694 Date: Sun, 31 Dec 2023 13:39:36 -0800 Subject: [PATCH 30/39] xfs: walk the rt reverse mapping tree when rebuilding rmap From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850383.1764998.7674261539966586592.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> From: Darrick J.
Wong When we're rebuilding the data device rmap, if we encounter an "rmap" format fork, we have to walk the (realtime) rmap btree inode to build the appropriate mappings. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/rmap_repair.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/fs/xfs/scrub/rmap_repair.c b/fs/xfs/scrub/rmap_repair.c index c6bb90fa43cca..7733334a1faa9 100644 --- a/fs/xfs/scrub/rmap_repair.c +++ b/fs/xfs/scrub/rmap_repair.c @@ -30,6 +30,8 @@ #include "xfs_refcount.h" #include "xfs_refcount_btree.h" #include "xfs_ag.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_rtgroup.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -513,6 +515,38 @@ xrep_rmap_scan_iext( return xrep_rmap_stash_accumulated(rf); } +static int +xrep_rmap_scan_rtrmapbt( + struct xrep_rmap_ifork *rf, + struct xfs_inode *ip) +{ + struct xfs_scrub *sc = rf->rr->sc; + struct xfs_btree_cur *cur; + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + int error; + + if (rf->whichfork != XFS_DATA_FORK) + return -EFSCORRUPTED; + + for_each_rtgroup(sc->mp, rgno, rtg) { + if (ip == rtg->rtg_rmapip) { + cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, rtg, ip); + error = xrep_rmap_scan_iroot_btree(rf, cur); + xfs_btree_del_cursor(cur, error); + xfs_rtgroup_rele(rtg); + return error; + } + } + + /* + * We shouldn't find an rmap format inode that isn't associated with + * an rtgroup! + */ + ASSERT(0); + return -EFSCORRUPTED; +} + /* Find all the extents from a given AG in an inode fork. 
*/ STATIC int xrep_rmap_scan_ifork( @@ -542,6 +576,8 @@ xrep_rmap_scan_ifork( error = xrep_rmap_scan_bmbt(&rf, ip, &mappings_done); if (error || mappings_done) return error; + } else if (ifp->if_format == XFS_DINODE_FMT_RMAP) { + return xrep_rmap_scan_rtrmapbt(&rf, ip); } else if (ifp->if_format != XFS_DINODE_FMT_EXTENTS) { return 0; } From patchwork Sun Dec 31 21:39:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507695 Date: Sun, 31 Dec 2023 13:39:52 -0800 Subject: [PATCH 31/39] xfs: online repair of realtime file bmaps From: "Darrick J.
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850399.1764998.13883768550246417804.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Repair the block mappings of realtime files. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/bmap_repair.c | 127 +++++++++++++++++++++++++++++++++++++++++++- fs/xfs/scrub/common.c | 2 - fs/xfs/scrub/common.h | 3 + fs/xfs/scrub/repair.c | 93 ++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.h | 11 ++++ 5 files changed, 231 insertions(+), 5 deletions(-) diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 9dddc5997b4fc..25c52caae58a4 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -25,11 +25,13 @@ #include "xfs_bmap_btree.h" #include "xfs_rmap.h" #include "xfs_rmap_btree.h" +#include "xfs_rtrmap_btree.h" #include "xfs_refcount.h" #include "xfs_quota.h" #include "xfs_ialloc.h" #include "xfs_ag.h" #include "xfs_reflink.h" +#include "xfs_rtgroup.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -361,6 +363,116 @@ xrep_bmap_scan_ag( return error; } +#ifdef CONFIG_XFS_RT +/* Check for any obvious errors or conflicts in the file mapping. */ +STATIC int +xrep_bmap_check_rtfork_rmap( + struct xfs_scrub *sc, + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec) +{ + xfs_rtblock_t rtbno; + + /* xattr extents are never stored on realtime devices */ + if (rec->rm_flags & XFS_RMAP_ATTR_FORK) + return -EFSCORRUPTED; + + /* bmbt blocks are never stored on realtime devices */ + if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) + return -EFSCORRUPTED; + + /* Data extents for non-rt files are never stored on the rt device. 
*/ + if (!XFS_IS_REALTIME_INODE(sc->ip)) + return -EFSCORRUPTED; + + /* Check the file offsets and physical extents. */ + if (!xfs_verify_fileext(sc->mp, rec->rm_offset, rec->rm_blockcount)) + return -EFSCORRUPTED; + + /* Check that this is within the rtgroup. */ + if (!xfs_verify_rgbext(cur->bc_ino.rtg, rec->rm_startblock, + rec->rm_blockcount)) + return -EFSCORRUPTED; + + /* Make sure this isn't free space. */ + rtbno = xfs_rgbno_to_rtb(sc->mp, cur->bc_ino.rtg->rtg_rgno, + rec->rm_startblock); + return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount); +} + +/* Record realtime extents that belong to this inode's fork. */ +STATIC int +xrep_bmap_walk_rtrmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_bmap *rb = priv; + xfs_rtblock_t rtbno; + int error = 0; + + if (xchk_should_terminate(rb->sc, &error)) + return error; + + /* Skip extents which are not owned by this inode and fork. */ + if (rec->rm_owner != rb->sc->ip->i_ino) + return 0; + + error = xrep_bmap_check_rtfork_rmap(rb->sc, cur, rec); + if (error) + return error; + + /* + * Record all blocks allocated to this file even if the extent isn't + * for the fork we're rebuilding so that we can reset di_nblocks later. + */ + rb->nblocks += rec->rm_blockcount; + + /* If this rmap isn't for the fork we want, we're done. */ + if (rb->whichfork == XFS_DATA_FORK && + (rec->rm_flags & XFS_RMAP_ATTR_FORK)) + return 0; + if (rb->whichfork == XFS_ATTR_FORK && + !(rec->rm_flags & XFS_RMAP_ATTR_FORK)) + return 0; + + rtbno = xfs_rgbno_to_rtb(cur->bc_mp, cur->bc_ino.rtg->rtg_rgno, + rec->rm_startblock); + return xrep_bmap_from_rmap(rb, rec->rm_offset, rtbno, + rec->rm_blockcount, + rec->rm_flags & XFS_RMAP_UNWRITTEN); +} + +/* Scan the realtime reverse mappings to build the new extent map. 
*/ +STATIC int +xrep_bmap_scan_rtgroup( + struct xrep_bmap *rb, + struct xfs_rtgroup *rtg) +{ + struct xfs_scrub *sc = rb->sc; + int error; + + if (xrep_is_rtmeta_ino(sc, rtg, sc->ip->i_ino)) + return 0; + + error = xrep_rtgroup_init(sc, rtg, &sc->sr, + XFS_RTGLOCK_RMAP | XFS_RTGLOCK_BITMAP_SHARED); + if (error) + return error; + + error = xfs_rmap_query_all(sc->sr.rmap_cur, xrep_bmap_walk_rtrmap, rb); + xchk_rtgroup_btcur_free(&sc->sr); + xchk_rtgroup_free(sc, &sc->sr); + return error; +} +#else +static inline int +xrep_bmap_scan_rtgroup(struct xrep_bmap *rb, struct xfs_rtgroup *rtg) +{ + return -EFSCORRUPTED; +} +#endif + /* Find the delalloc extents from the old incore extent tree. */ STATIC int xrep_bmap_find_delalloc( @@ -410,9 +522,20 @@ xrep_bmap_find_mappings( { struct xfs_scrub *sc = rb->sc; struct xfs_perag *pag; + struct xfs_rtgroup *rtg; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; int error = 0; + /* Iterate the rtrmaps for extents. */ + for_each_rtgroup(sc->mp, rgno, rtg) { + error = xrep_bmap_scan_rtgroup(rb, rtg); + if (error) { + xfs_rtgroup_rele(rtg); + return error; + } + } + /* Iterate the rmaps for extents. */ for_each_perag(sc->mp, agno, pag) { error = xrep_bmap_scan_ag(rb, pag); @@ -751,10 +874,6 @@ xrep_bmap_check_inputs( return -EINVAL; } - /* Don't know how to rebuild realtime data forks. */ - if (XFS_IS_REALTIME_INODE(sc->ip)) - return -EOPNOTSUPP; - return 0; } diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 5ea9f3f310335..e16185e9ddd9b 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -779,7 +779,7 @@ xchk_rt_unlock_rtbitmap( #ifdef CONFIG_XFS_RT /* Lock all the rt group metadata inode ILOCKs and wait for intents. 
*/ -static int +int xchk_rtgroup_drain_and_lock( struct xfs_scrub *sc, struct xchk_rt *sr, diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 3251f80c00e15..c0b2c62d4bd82 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -159,10 +159,13 @@ int xchk_rtgroup_init(struct xfs_scrub *sc, xfs_rgnumber_t rgno, void xchk_rtgroup_unlock(struct xfs_scrub *sc, struct xchk_rt *sr); void xchk_rtgroup_btcur_free(struct xchk_rt *sr); void xchk_rtgroup_free(struct xfs_scrub *sc, struct xchk_rt *sr); +int xchk_rtgroup_drain_and_lock(struct xfs_scrub *sc, struct xchk_rt *sr, + unsigned int rtglock_flags); #else # define xchk_rtgroup_init(sc, rgno, sr, lockflags) (-ENOSYS) # define xchk_rtgroup_btcur_free(sr) ((void)0) # define xchk_rtgroup_free(sc, sr) ((void)0) +# define xchk_rtgroup_drain_and_lock(sc, sr, lockflags) (-ENOSYS) #endif /* CONFIG_XFS_RT */ int xchk_ag_read_headers(struct xfs_scrub *sc, xfs_agnumber_t agno, diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index cad45d5b86e2e..5af1a8871de55 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -36,6 +36,9 @@ #include "xfs_da_btree.h" #include "xfs_attr.h" #include "xfs_dir2.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -954,6 +957,73 @@ xrep_ag_init( return 0; } +#ifdef CONFIG_XFS_RT +/* Initialize all the btree cursors for a RT repair. */ +static void +xrep_rtgroup_btcur_init( + struct xfs_scrub *sc, + struct xchk_rt *sr) +{ + struct xfs_mount *mp = sc->mp; + + ASSERT(sr->rtg != NULL); + + if (sc->sm->sm_type != XFS_SCRUB_TYPE_RTRMAPBT && + (sr->rtlock_flags & XFS_RTGLOCK_RMAP) && + xfs_has_rtrmapbt(mp)) + sr->rmap_cur = xfs_rtrmapbt_init_cursor(mp, sc->tp, sr->rtg, + sr->rtg->rtg_rmapip); +} + +/* + * Given a reference to a rtgroup structure, lock rtgroup btree inodes and + * create btree cursors. Must only be called to repair a regular rt file. 
+ */ +int +xrep_rtgroup_init( + struct xfs_scrub *sc, + struct xfs_rtgroup *rtg, + struct xchk_rt *sr, + unsigned int rtglock_flags) +{ + ASSERT(sr->rtg == NULL); + + xfs_rtgroup_lock(NULL, rtg, rtglock_flags); + sr->rtlock_flags = rtglock_flags; + + /* Grab our own passive reference from the caller's ref. */ + sr->rtg = xfs_rtgroup_hold(rtg); + xrep_rtgroup_btcur_init(sc, sr); + return 0; +} + +/* Ensure that all rt blocks in the given range are not marked free. */ +int +xrep_require_rtext_inuse( + struct xfs_scrub *sc, + xfs_rtblock_t rtbno, + xfs_filblks_t len) +{ + struct xfs_mount *mp = sc->mp; + xfs_rtxnum_t startrtx; + xfs_rtxnum_t endrtx; + bool is_free = false; + int error; + + startrtx = xfs_rtb_to_rtx(mp, rtbno); + endrtx = xfs_rtb_to_rtx(mp, rtbno + len - 1); + + error = xfs_rtalloc_extent_is_free(mp, sc->tp, startrtx, + endrtx - startrtx + 1, &is_free); + if (error) + return error; + if (is_free) + return -EFSCORRUPTED; + + return 0; +} +#endif /* CONFIG_XFS_RT */ + /* Reinitialize the per-AG block reservation for the AG we just fixed. */ int xrep_reset_perag_resv( @@ -1209,3 +1279,26 @@ xrep_buf_verify_struct( return fa == NULL; } + +/* Are we looking at a realtime metadata inode? */ +bool +xrep_is_rtmeta_ino( + struct xfs_scrub *sc, + struct xfs_rtgroup *rtg, + xfs_ino_t ino) +{ + /* + * All filesystems have rt bitmap and summary inodes, even if they + * don't have an rt section. 
+ */ + if (ino == sc->mp->m_rbmip->i_ino) + return true; + if (ino == sc->mp->m_rsumip->i_ino) + return true; + + /* Newer rt metadata files are not guaranteed to exist */ + if (rtg->rtg_rmapip && ino == rtg->rtg_rmapip->i_ino) + return true; + + return false; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 3d8ca12b2cc51..e7fe4c5d8de5d 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -106,6 +106,17 @@ int xrep_setup_inode(struct xfs_scrub *sc, const struct xfs_imap *imap); void xrep_ag_btcur_init(struct xfs_scrub *sc, struct xchk_ag *sa); int xrep_ag_init(struct xfs_scrub *sc, struct xfs_perag *pag, struct xchk_ag *sa); +#ifdef CONFIG_XFS_RT +int xrep_rtgroup_init(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, + struct xchk_rt *sr, unsigned int rtglock_flags); +int xrep_require_rtext_inuse(struct xfs_scrub *sc, xfs_rtblock_t rtbno, + xfs_filblks_t len); +#else +# define xrep_rtgroup_init(sc, rtg, sr, lockflags) (-ENOSYS) +#endif /* CONFIG_XFS_RT */ + +bool xrep_is_rtmeta_ino(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, + xfs_ino_t ino); /* Metadata revalidators */ From patchwork Sun Dec 31 21:40:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507696 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AEE02BE47 for ; Sun, 31 Dec 2023 21:40:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QeyKBHo2" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 837C8C433C7; Sun, 31 Dec 2023 21:40:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704058808; bh=jf3w9WMZmFVo2XW5zq/B9PlurFTOu0PZWALJH/rXQnY=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=QeyKBHo2rlDk4vEYYuNS6TdTava5uB/OAZ3xwfI5Hw6LCxN+mGM203EVPYTvcFGSW 9M1+/Z9jS2OJUMK7EwT+A2gcu9NKBQsVDw3RgkBcEweooLRl125C/bCMf1qinnEciX 7D2BEAf0vOZBsGCaWU0gg54KG4nF3pMWb7hEf2/82CQef5+/HI1dt/16Owa+yaOjhj J85gzcabheCL63EK8hKBfWBAgMt04isnL3JEn6p/qHbxpfG6omWIH82+ApShoalXq1 o1oYmqlnrHgfkD1UKkqEoqntkz3fpGqUFKXI+bdj7Tk8FfW0A6u6a+GNdwIDgDa/jI dWBBpLEtYcwYw== Date: Sun, 31 Dec 2023 13:40:08 -0800 Subject: [PATCH 32/39] xfs: repair inodes that have realtime extents From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850415.1764998.13498058839507532597.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Plumb into the inode core repair code the ability to search for extents on realtime devices. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/inode_repair.c | 63 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 62 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 2b3a6cbadae71..45ad78d1e5404 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -38,6 +38,8 @@ #include "xfs_log_priv.h" #include "xfs_health.h" #include "xfs_symlink_remote.h" +#include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -718,18 +720,77 @@ xrep_dinode_count_ag_rmaps( return error; } +/* Count extents and blocks for an inode given an rt rmap. */ +STATIC int +xrep_dinode_walk_rtrmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_inode *ri = priv; + int error = 0; + + if (xchk_should_terminate(ri->sc, &error)) + return error; + + /* We only care about this inode. */ + if (rec->rm_owner != ri->sc->sm->sm_ino) + return 0; + + if (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK)) + return -EFSCORRUPTED; + + ri->rt_blocks += rec->rm_blockcount; + ri->rt_extents++; + return 0; +} + +/* Count extents and blocks for an inode from all realtime rmap data. */ +STATIC int +xrep_dinode_count_rtgroup_rmaps( + struct xrep_inode *ri, + struct xfs_rtgroup *rtg) +{ + struct xfs_scrub *sc = ri->sc; + int error; + + if (!xfs_has_realtime(sc->mp) || + xrep_is_rtmeta_ino(sc, rtg, sc->sm->sm_ino)) + return 0; + + error = xrep_rtgroup_init(sc, rtg, &sc->sr, XFS_RTGLOCK_RMAP); + if (error) + return error; + + error = xfs_rmap_query_all(sc->sr.rmap_cur, xrep_dinode_walk_rtrmap, + ri); + xchk_rtgroup_btcur_free(&sc->sr); + xchk_rtgroup_free(sc, &sc->sr); + return error; +} + /* Count extents and blocks for a given inode from all rmap data. 
*/ STATIC int xrep_dinode_count_rmaps( struct xrep_inode *ri) { struct xfs_perag *pag; + struct xfs_rtgroup *rtg; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; int error; - if (!xfs_has_rmapbt(ri->sc->mp) || xfs_has_realtime(ri->sc->mp)) + if (!xfs_has_rmapbt(ri->sc->mp)) return -EOPNOTSUPP; + for_each_rtgroup(ri->sc->mp, rgno, rtg) { + error = xrep_dinode_count_rtgroup_rmaps(ri, rtg); + if (error) { + xfs_rtgroup_rele(rtg); + return error; + } + } + for_each_perag(ri->sc->mp, agno, pag) { error = xrep_dinode_count_ag_rmaps(ri, pag); if (error) { From patchwork Sun Dec 31 21:40:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507697 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A07D3BE4A for ; Sun, 31 Dec 2023 21:40:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="R+eugcSW" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2C1A6C433C8; Sun, 31 Dec 2023 21:40:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704058824; bh=DdPduB72BynKanXiNEeS3gMgZnmD8MUGewOs/zaEVY0=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=R+eugcSWmHBGtO7xi/X6S6O4j0ayFIF6572HL+J6pq3Yq24c28xqkIFsnMvErMh1C 8gQNW0MasRvKpIgU2Vx9sIIyqUSkTTcUrRZiYmHhvyJxwhQvmgYWvMw1LKu2xuiZ6G cSmHX28X5kqsYxmSVBHW3qwkfKsoGEBkKjbrL1yZqvqIPHJTBgv1uF6slkpai5+NFz q4wE7i4iWqyNhfq+XAwLY8XGdLDdbjOqE9uQ8hDCL9OzU+9oh1ilTTHSPOV6C/8RYt +uXeQxj4+65METWppdU8WWbBpdWqriTUw5A0sm8Q0JTpq5MTiXPEusJRycnHUgvmml HYCnWPmD5zzrg== Date: Sun, 31 Dec 2023 13:40:23 -0800 Subject: [PATCH 33/39] xfs: repair rmap btree inodes From: "Darrick J. 
Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850431.1764998.13516823120574789010.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Teach the inode repair code how to deal with realtime rmap btree inodes that won't load properly. This is most likely moot since the filesystem generally won't mount without the rtrmapbt inodes being usable, but we'll add this for completeness. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/inode_repair.c | 46 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 45ad78d1e5404..f4f6ed6ef5120 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -895,6 +895,37 @@ xrep_dinode_bad_bmbt_fork( return false; } +/* Return true if this rmap-format ifork looks like garbage. */ +STATIC bool +xrep_dinode_bad_rmapbt_fork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + unsigned int dfork_size, + int whichfork) +{ + struct xfs_rtrmap_root *dfp; + unsigned int nrecs; + unsigned int level; + + if (whichfork != XFS_DATA_FORK) + return true; + if (dfork_size < sizeof(struct xfs_rtrmap_root)) + return true; + + dfp = XFS_DFORK_PTR(dip, whichfork); + nrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (level > sc->mp->m_rtrmap_maxlevels) + return true; + if (xfs_rtrmap_droot_space_calc(level, nrecs) > dfork_size) + return true; + if (level > 0 && nrecs == 0) + return true; + + return false; +} + /* * Check the data fork for things that will fail the ifork verifiers or the * ifork formatters. 
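The checks in `xrep_dinode_bad_rmapbt_fork` reject an rmap-format root that sits in the attr fork, is too small to hold its header, claims more levels than `m_rtrmap_maxlevels`, has records that overflow the fork, or is an interior node with no records. The following is a minimal userspace sketch of that decision sequence, not the kernel code: `droot_space_calc`, `bad_rmap_root`, and every size constant are hypothetical stand-ins (the real `xfs_rtrmap_droot_space_calc` and the on-disk record sizes are not shown in this patch), so treat the numbers as illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sizes standing in for the real rtrmap btree format; the
 * actual values come from the XFS on-disk format headers, not this sketch.
 */
#define ROOT_HDR_SIZE	4u	/* assumed: 16-bit level + 16-bit numrecs */
#define REC_SIZE	24u	/* assumed: one leaf rmap record */
#define KEY_SIZE	20u	/* assumed: one rmap key (nodes store low + high) */
#define PTR_SIZE	8u	/* assumed: one 64-bit child pointer */

/* Hypothetical analogue of xfs_rtrmap_droot_space_calc(). */
static size_t droot_space_calc(unsigned int level, unsigned int nrecs)
{
	if (level == 0)
		return ROOT_HDR_SIZE + (size_t)nrecs * REC_SIZE;
	return ROOT_HDR_SIZE + (size_t)nrecs * (2 * KEY_SIZE + PTR_SIZE);
}

/*
 * Same order of checks as xrep_dinode_bad_rmapbt_fork(), applied to header
 * fields that have already been decoded. Returns true if the root is garbage.
 */
static bool bad_rmap_root(bool is_data_fork, size_t fork_size,
			  unsigned int level, unsigned int nrecs,
			  unsigned int maxlevels)
{
	if (!is_data_fork)		/* rmap roots only live in the data fork */
		return true;
	if (fork_size < ROOT_HDR_SIZE)	/* no room for the root header */
		return true;
	if (level > maxlevels)		/* taller than any valid tree */
		return true;
	if (droot_space_calc(level, nrecs) > fork_size)	/* records overflow */
		return true;
	if (level > 0 && nrecs == 0)	/* interior root with no children */
		return true;
	return false;
}
```

With these assumed sizes, a 336-byte data fork can hold a leaf root of at most 13 records (4 + 13 * 24 = 316 bytes); a 14th record needs 340 bytes, so the root is rejected.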
@@ -975,6 +1006,11 @@ xrep_dinode_check_dfork( XFS_DATA_FORK)) return true; break; + case XFS_DINODE_FMT_RMAP: + if (xrep_dinode_bad_rmapbt_fork(sc, dip, dfork_size, + XFS_DATA_FORK)) + return true; + break; default: return true; } @@ -1095,6 +1131,11 @@ xrep_dinode_check_afork( XFS_ATTR_FORK)) return true; break; + case XFS_DINODE_FMT_RMAP: + if (xrep_dinode_bad_rmapbt_fork(sc, dip, afork_size, + XFS_ATTR_FORK)) + return true; + break; default: return true; } @@ -1142,6 +1183,7 @@ xrep_dinode_ensure_forkoff( uint16_t mode) { struct xfs_bmdr_block *bmdr; + struct xfs_rtrmap_root *rmdr; struct xfs_scrub *sc = ri->sc; xfs_extnum_t attr_extents, data_extents; size_t bmdr_minsz = xfs_bmdr_space_calc(1); @@ -1248,6 +1290,10 @@ xrep_dinode_ensure_forkoff( bmdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK); dfork_min = xfs_bmap_broot_space(sc->mp, bmdr); break; + case XFS_DINODE_FMT_RMAP: + rmdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + dfork_min = xfs_rtrmap_broot_space(sc->mp, rmdr); + break; default: dfork_min = 0; break; From patchwork Sun Dec 31 21:40:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507698 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 21550BE48 for ; Sun, 31 Dec 2023 21:40:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="h/IP8dPf" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DC4F8C433C8; Sun, 31 Dec 2023 21:40:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704058840; bh=doGXoTJgJSJDtlNOlCgbG0xS4+EzPXbIRtDvD36gvzc=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=h/IP8dPfiMCE+d79yHVjUDFVMwmyWTPU64vyG76jPmhbcfezqMackYHfBbhgKZhLW mFqF2OM3CM4TNLbkFTZxLoClRCbYUsb0ZV4DS+0WpRDEDacM6UMlq2Jg7PsjPoLo7h /6PP44LXvgaNOjlaqvEidU0UCMGeZWI99qR7EV6HeQkWikjGGN66ffcMWpi64mbqzM HUccbSzx50B0jGkd+lj5YaPXznq0hgG6Z1zDr2LZ8788ALyA1uRvzT3rBq0DQiirSn 9fZQPfalNNRDZ1O7I3Lu+tywaZLbQJexMWk4I6o6Uv+9h/JYqSfFf8RJWZgFedgbUi ieCyIexXFWyMQ== Date: Sun, 31 Dec 2023 13:40:39 -0800 Subject: [PATCH 34/39] xfs: online repair of realtime bitmaps for a realtime group From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850446.1764998.15564224276753988976.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong For a given rt group, regenerate the bitmap contents from the group's realtime rmap btree. Signed-off-by: Darrick J. 
Wong --- fs/xfs/scrub/common.h | 5 fs/xfs/scrub/repair.c | 2 fs/xfs/scrub/repair.h | 4 fs/xfs/scrub/rtbitmap.c | 19 + fs/xfs/scrub/rtbitmap.h | 56 +++ fs/xfs/scrub/rtbitmap_repair.c | 676 +++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/rtsummary_repair.c | 3 fs/xfs/scrub/scrub.c | 2 fs/xfs/scrub/tempfile.c | 15 + fs/xfs/scrub/tempswap.h | 2 fs/xfs/scrub/trace.c | 1 fs/xfs/scrub/trace.h | 149 +++++++++ 12 files changed, 922 insertions(+), 12 deletions(-) diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index c0b2c62d4bd82..5dc481f69d160 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -274,6 +274,11 @@ int xchk_metadata_inode_forks(struct xfs_scrub *sc); (sc)->mp->m_super->s_id, \ (sc)->ip ? (sc)->ip->i_ino : (sc)->sm->sm_ino, \ ##__VA_ARGS__) +#define xchk_xfile_rtgroup_descr(sc, fmt, ...) \ + kasprintf(XCHK_GFP_FLAGS, "XFS (%s): rtgroup 0x%x " fmt, \ + (sc)->mp->m_super->s_id, \ + (sc)->sa.pag ? (sc)->sr.rtg->rtg_rgno : (sc)->sm->sm_agno, \ + ##__VA_ARGS__) /* * Setting up a hook to wait for intents to drain is costly -- we have to take diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 5af1a8871de55..4789014be4a36 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -959,7 +959,7 @@ xrep_ag_init( #ifdef CONFIG_XFS_RT /* Initialize all the btree cursors for a RT repair. 
*/ -static void +void xrep_rtgroup_btcur_init( struct xfs_scrub *sc, struct xchk_rt *sr) diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index e7fe4c5d8de5d..b66e0b5331394 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -109,6 +109,7 @@ int xrep_ag_init(struct xfs_scrub *sc, struct xfs_perag *pag, #ifdef CONFIG_XFS_RT int xrep_rtgroup_init(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, struct xchk_rt *sr, unsigned int rtglock_flags); +void xrep_rtgroup_btcur_init(struct xfs_scrub *sc, struct xchk_rt *sr); int xrep_require_rtext_inuse(struct xfs_scrub *sc, xfs_rtblock_t rtbno, xfs_filblks_t len); #else @@ -151,10 +152,12 @@ int xrep_metapath(struct xfs_scrub *sc); int xrep_rtbitmap(struct xfs_scrub *sc); int xrep_rtsummary(struct xfs_scrub *sc); int xrep_rgsuperblock(struct xfs_scrub *sc); +int xrep_rgbitmap(struct xfs_scrub *sc); #else # define xrep_rtbitmap xrep_notsupported # define xrep_rtsummary xrep_notsupported # define xrep_rgsuperblock xrep_notsupported +# define xrep_rgbitmap xrep_notsupported #endif /* CONFIG_XFS_RT */ #ifdef CONFIG_XFS_QUOTA @@ -260,6 +263,7 @@ static inline int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *x) #define xrep_dirtree xrep_notsupported #define xrep_metapath xrep_notsupported #define xrep_rgsuperblock xrep_notsupported +#define xrep_rgbitmap xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 0d823eadbdba0..47463ef336eed 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -19,9 +19,11 @@ #include "xfs_rtgroup.h" #include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" +#include "xfs_swapext.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/repair.h" +#include "scrub/tempswap.h" #include "scrub/rtbitmap.h" #include "scrub/btree.h" @@ -45,15 +47,26 @@ xchk_setup_rgbitmap( { struct xfs_mount *mp = sc->mp; struct xchk_rgbitmap *rgb; + unsigned int wordcnt = xchk_rgbitmap_wordcnt(sc); int 
error; - rgb = kzalloc(sizeof(struct xchk_rgbitmap), XCHK_GFP_FLAGS); + if (xchk_need_intent_drain(sc)) + xchk_fsgates_enable(sc, XCHK_FSGATES_DRAIN); + + rgb = kzalloc(struct_size(rgb, words, wordcnt), XCHK_GFP_FLAGS); if (!rgb) return -ENOMEM; rgb->sc = sc; sc->buf = rgb; + rgb->rtglock_flags = XCHK_RTGLOCK_ALL; - error = xchk_trans_alloc(sc, 0); + if (xchk_could_repair(sc)) { + error = xrep_setup_rgbitmap(sc, rgb); + if (error) + return error; + } + + error = xchk_trans_alloc(sc, rgb->rtb.resblks); if (error) return error; @@ -66,7 +79,7 @@ xchk_setup_rgbitmap( return error; error = xchk_rtgroup_init(sc, sc->sm->sm_agno, &sc->sr, - XCHK_RTGLOCK_ALL); + rgb->rtglock_flags); if (error) return error; diff --git a/fs/xfs/scrub/rtbitmap.h b/fs/xfs/scrub/rtbitmap.h index 42d50fa48a0ec..6ff897f0871f8 100644 --- a/fs/xfs/scrub/rtbitmap.h +++ b/fs/xfs/scrub/rtbitmap.h @@ -13,19 +13,75 @@ struct xchk_rtbitmap { unsigned int resblks; }; +/* + * We use an xfile to construct new bitmap blocks for the portion of the + * rtbitmap file that we're replacing. Whereas the ondisk bitmap must be + * accessed through the buffer cache, the xfile bitmap supports direct + * word-level accesses. Therefore, we create a small abstraction for linear + * access. + */ +typedef unsigned long long xrep_wordoff_t; +typedef unsigned int xrep_wordcnt_t; + +/* Mask to round an rtx down to the nearest bitmap word. */ +#define XREP_RTBMP_WORDMASK ((1ULL << XFS_NBWORDLOG) - 1) + struct xchk_rgbitmap { struct xfs_scrub *sc; struct xchk_rtbitmap rtb; +#ifdef CONFIG_XFS_ONLINE_REPAIR + struct xfs_rtalloc_args args; + struct xrep_tempswap tempswap; +#endif + /* The next free rt block that we expect to see. */ xfs_rtblock_t next_free_rtblock; + + /* file offset inside the rtbitmap where we start swapping */ + xfs_fileoff_t group_rbmoff; + + /* number of rtbitmap blocks for this group */ + xfs_filblks_t group_rbmlen; + + /* The next rtgroup block we expect to see during our rtrmapbt walk. 
*/ + xfs_rgblock_t next_rgbno; + + /* rtgroup lock flags */ + unsigned int rtglock_flags; + + /* rtword position of xfile as we write buffers to disk. */ + xrep_wordoff_t prep_wordoff; + + /* Memory buffer full of 1s for rgbitmap repair. */ + union xfs_rtword_raw words[]; }; #ifdef CONFIG_XFS_ONLINE_REPAIR int xrep_setup_rtbitmap(struct xfs_scrub *sc, struct xchk_rtbitmap *rtb); +int xrep_setup_rgbitmap(struct xfs_scrub *sc, struct xchk_rgbitmap *rgb); + +/* + * How big should the words[] buffer be? + * + * For repairs, we want a full fsblock worth of space so that we can memcpy a + * buffer full of 1s into the xfile bitmap. The xfile bitmap doesn't have + * rtbitmap block headers, so we don't use blockwsize. Scrub doesn't use the + * words buffer at all. + */ +static inline unsigned int +xchk_rgbitmap_wordcnt( + struct xfs_scrub *sc) +{ + if (xchk_could_repair(sc)) + return sc->mp->m_sb.sb_blocksize >> XFS_WORDLOG; + return 0; +} #else # define xrep_setup_rtbitmap(sc, rtb) (0) +# define xrep_setup_rgbitmap(sc, rgb) (0) +# define xchk_rgbitmap_wordcnt(sc) (0) #endif /* CONFIG_XFS_ONLINE_REPAIR */ #endif /* __XFS_SCRUB_RTBITMAP_H__ */ diff --git a/fs/xfs/scrub/rtbitmap_repair.c b/fs/xfs/scrub/rtbitmap_repair.c index 46f5d5f605c91..db87ce51c35fc 100644 --- a/fs/xfs/scrub/rtbitmap_repair.c +++ b/fs/xfs/scrub/rtbitmap_repair.c @@ -12,17 +12,693 @@ #include "xfs_btree.h" #include "xfs_log_format.h" #include "xfs_trans.h" +#include "xfs_rtalloc.h" #include "xfs_inode.h" #include "xfs_bit.h" #include "xfs_bmap.h" #include "xfs_bmap_btree.h" +#include "xfs_rmap.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_swapext.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" #include "scrub/repair.h" #include "scrub/xfile.h" +#include "scrub/tempfile.h" +#include "scrub/tempswap.h" +#include "scrub/reap.h" #include "scrub/rtbitmap.h" +/* rt bitmap content repairs */ + +/* Set up to repair the realtime 
bitmap for this group. */ +int +xrep_setup_rgbitmap( + struct xfs_scrub *sc, + struct xchk_rgbitmap *rgb) +{ + struct xfs_mount *mp = sc->mp; + char *descr; + unsigned long long blocks = 0; + unsigned long long rtbmp_words; + int error; + + error = xrep_tempfile_create(sc, S_IFREG); + if (error) + return error; + + /* Create an xfile to hold our reconstructed bitmap. */ + rtbmp_words = xfs_rtbitmap_wordcount(mp, mp->m_sb.sb_rextents); + descr = xchk_xfile_rtgroup_descr(sc, "bitmap file"); + error = xfile_create(descr, rtbmp_words << XFS_WORDLOG, &sc->xfile); + kfree(descr); + if (error) + return error; + + /* + * Reserve enough blocks to write out a completely new bitmap file, + * plus twice as many blocks as we would need if we can only allocate + * one block per data fork mapping. This should cover the + * preallocation of the temporary file and swapping the extent + * mappings. + * + * We cannot use xfs_swapext_estimate because we have not yet + * constructed the replacement bitmap and therefore do not know how + * many extents it will use. By the time we do, we will have a dirty + * transaction (which we cannot drop because we cannot drop the + * rtbitmap ILOCK) and cannot ask for more reservation. + */ + blocks = mp->m_sb.sb_rbmblocks; + blocks += xfs_bmbt_calc_size(mp, blocks) * 2; + if (blocks > UINT_MAX) + return -EOPNOTSUPP; + + rgb->rtb.resblks += blocks; + + /* + * Grab support for atomic extent swapping before we allocate any + * transactions or grab ILOCKs. + */ + error = xrep_tempswap_grab_log_assist(sc); + if (error) + return error; + + /* + * We must hold rbmip with ILOCK_EXCL to use the extent swap at the end + * of the repair function. Change the desired rtglock flags. 
+ */ + rgb->rtglock_flags &= ~XFS_RTGLOCK_BITMAP_SHARED; + rgb->rtglock_flags |= XFS_RTGLOCK_BITMAP; + return 0; +} + +static inline xrep_wordoff_t +rtx_to_wordoff( + struct xfs_mount *mp, + xfs_rtxnum_t rtx) +{ + return rtx >> XFS_NBWORDLOG; +} + +static inline xrep_wordcnt_t +rtxlen_to_wordcnt( + xfs_rtxlen_t rtxlen) +{ + return rtxlen >> XFS_NBWORDLOG; +} + +/* Helper functions to record rtwords in an xfile. */ + +static inline int +xfbmp_load( + struct xchk_rgbitmap *rgb, + xrep_wordoff_t wordoff, + xfs_rtword_t *word) +{ + union xfs_rtword_raw urk; + int error; + + ASSERT(xfs_has_rtgroups(rgb->sc->mp)); + + error = xfile_obj_load(rgb->sc->xfile, &urk, + sizeof(union xfs_rtword_raw), + wordoff << XFS_WORDLOG); + if (error) + return error; + + *word = le32_to_cpu(urk.rtg); + return 0; +} + +static inline int +xfbmp_store( + struct xchk_rgbitmap *rgb, + xrep_wordoff_t wordoff, + const xfs_rtword_t word) +{ + union xfs_rtword_raw urk; + + ASSERT(xfs_has_rtgroups(rgb->sc->mp)); + + urk.rtg = cpu_to_le32(word); + return xfile_obj_store(rgb->sc->xfile, &urk, + sizeof(union xfs_rtword_raw), + wordoff << XFS_WORDLOG); +} + +static inline int +xfbmp_copyin( + struct xchk_rgbitmap *rgb, + xrep_wordoff_t wordoff, + const union xfs_rtword_raw *word, + xrep_wordcnt_t nr_words) +{ + return xfile_obj_store(rgb->sc->xfile, word, nr_words << XFS_WORDLOG, + wordoff << XFS_WORDLOG); +} + +static inline int +xfbmp_copyout( + struct xchk_rgbitmap *rgb, + xrep_wordoff_t wordoff, + union xfs_rtword_raw *word, + xrep_wordcnt_t nr_words) +{ + return xfile_obj_load(rgb->sc->xfile, word, nr_words << XFS_WORDLOG, + wordoff << XFS_WORDLOG); +} + +/* + * Preserve the portions of the rtbitmap block for the start of this rtgroup + * that map to the previous rtgroup. 
+ */ +STATIC int +xrep_rgbitmap_load_before( + struct xchk_rgbitmap *rgb) +{ + struct xfs_scrub *sc = rgb->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_rtgroup *rtg = sc->sr.rtg; + xrep_wordoff_t wordoff; + xfs_rtblock_t group_rtbno; + xfs_rtxnum_t group_rtx, rbmoff_rtx; + xfs_rtword_t ondisk_word; + xfs_rtword_t xfile_word; + xfs_rtword_t mask; + xrep_wordcnt_t wordcnt; + int bit; + int error; + + /* + * Compute the file offset within the rtbitmap block that corresponds + * to the start of this group, and decide if we need to read blocks + * from the group before this one. + */ + group_rtbno = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, 0); + group_rtx = xfs_rtb_to_rtx(mp, group_rtbno); + + rgb->group_rbmoff = xfs_rtx_to_rbmblock(mp, group_rtx); + rbmoff_rtx = xfs_rbmblock_to_rtx(mp, rgb->group_rbmoff); + rgb->prep_wordoff = rtx_to_wordoff(mp, rbmoff_rtx); + + trace_xrep_rgbitmap_load(rtg, rgb->group_rbmoff, rbmoff_rtx, + group_rtx - 1); + + if (rbmoff_rtx == group_rtx) + return 0; + + rgb->args.mp = sc->mp; + rgb->args.tp = sc->tp; + error = xfs_rtbitmap_read_buf(&rgb->args, rgb->group_rbmoff); + if (error) { + /* + * Reading the existing rbmblock failed, and we must deal with + * the part of the rtbitmap block that corresponds to the + * previous group. The most conservative option is to fill + * that part of the bitmap with zeroes so that it won't get + * allocated. The xfile contains zeroes already, so we can + * return. + */ + return 0; + } + + /* + * Copy full rtbitmap words into memory from the beginning of the + * ondisk block until we get to the word that corresponds to the start + * of this group. 
+ */ + wordoff = rtx_to_wordoff(mp, rbmoff_rtx); + wordcnt = rtxlen_to_wordcnt(group_rtx - rbmoff_rtx); + if (wordcnt > 0) { + union xfs_rtword_raw *p; + + p = xfs_rbmblock_wordptr(&rgb->args, 0); + error = xfbmp_copyin(rgb, wordoff, p, wordcnt); + if (error) + goto out_rele; + + trace_xrep_rgbitmap_load_words(mp, rgb->group_rbmoff, wordoff, + wordcnt); + wordoff += wordcnt; + } + + /* + * Compute the bit position of the first rtextent of this group. If + * the bit position is zero, we don't have to RMW a partial word and + * move to the next step. + */ + bit = group_rtx & XREP_RTBMP_WORDMASK; + if (bit == 0) + goto out_rele; + + /* + * Create a mask of the bits that we want to load from disk. These + * bits track space in a different rtgroup, which is why we must + * preserve them even as we replace parts of the bitmap. + */ + mask = ~((((xfs_rtword_t)1 << (XFS_NBWORD - bit)) - 1) << bit); + + error = xfbmp_load(rgb, wordoff, &xfile_word); + if (error) + goto out_rele; + ondisk_word = xfs_rtbitmap_getword(&rgb->args, wordcnt); + + trace_xrep_rgbitmap_load_word(mp, wordoff, bit, ondisk_word, + xfile_word, mask); + + xfile_word &= ~mask; + xfile_word |= (ondisk_word & mask); + + error = xfbmp_store(rgb, wordoff, xfile_word); + if (error) + goto out_rele; + +out_rele: + xfs_rtbuf_cache_relse(&rgb->args); + return error; +} + +/* + * Preserve the portions of the rtbitmap block for the end of this rtgroup + * that map to the next rtgroup. 
+ */ +STATIC int +xrep_rgbitmap_load_after( + struct xchk_rgbitmap *rgb) +{ + struct xfs_scrub *sc = rgb->sc; + struct xfs_mount *mp = rgb->sc->mp; + struct xfs_rtgroup *rtg = rgb->sc->sr.rtg; + xrep_wordoff_t wordoff; + xfs_rtblock_t last_rtbno; + xfs_rtxnum_t last_group_rtx, last_rbmblock_rtx; + xfs_fileoff_t last_group_rbmoff; + xfs_rtword_t ondisk_word; + xfs_rtword_t xfile_word; + xfs_rtword_t mask; + xrep_wordcnt_t wordcnt; + unsigned int last_group_word; + int bit; + int error; + + last_rtbno = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, + rtg->rtg_blockcount - 1); + last_group_rtx = xfs_rtb_to_rtx(mp, last_rtbno); + + last_group_rbmoff = xfs_rtx_to_rbmblock(mp, last_group_rtx); + rgb->group_rbmlen = last_group_rbmoff - rgb->group_rbmoff + 1; + last_rbmblock_rtx = xfs_rbmblock_to_rtx(mp, last_group_rbmoff + 1) - 1; + + trace_xrep_rgbitmap_load(rtg, last_group_rbmoff, last_group_rtx + 1, + last_rbmblock_rtx); + + if (last_rbmblock_rtx == last_group_rtx || + rtg->rtg_rgno == mp->m_sb.sb_rgcount - 1) + return 0; + + rgb->args.mp = sc->mp; + rgb->args.tp = sc->tp; + error = xfs_rtbitmap_read_buf(&rgb->args, last_group_rbmoff); + if (error) { + /* + * Reading the existing rbmblock failed, and we must deal with + * the part of the rtbitmap block that corresponds to the + * next group. The most conservative option is to fill + * that part of the bitmap with zeroes so that it won't get + * allocated. The xfile contains zeroes already, so we can + * return. + */ + return 0; + } + + /* + * Compute the bit position of the first rtextent of the next group. + * If the bit position is zero, we don't have to RMW a partial word + * and move to the next step. + */ + wordoff = rtx_to_wordoff(mp, last_group_rtx); + bit = (last_group_rtx + 1) & XREP_RTBMP_WORDMASK; + if (bit == 0) + goto copy_words; + + /* + * Create a mask of the bits that we want to load from disk.
These + * bits track space in a different rtgroup, which is why we must + * preserve them even as we replace parts of the bitmap. + */ + mask = (((xfs_rtword_t)1 << (XFS_NBWORD - bit)) - 1) << bit; + + error = xfbmp_load(rgb, wordoff, &xfile_word); + if (error) + goto out_rele; + last_group_word = xfs_rtx_to_rbmword(mp, last_group_rtx); + ondisk_word = xfs_rtbitmap_getword(&rgb->args, last_group_word); + + trace_xrep_rgbitmap_load_word(mp, wordoff, bit, ondisk_word, + xfile_word, mask); + + xfile_word &= ~mask; + xfile_word |= (ondisk_word & mask); + + error = xfbmp_store(rgb, wordoff, xfile_word); + if (error) + goto out_rele; + +copy_words: + /* Copy as many full words as we can. */ + wordoff++; + wordcnt = rtxlen_to_wordcnt(last_rbmblock_rtx - last_group_rtx); + if (wordcnt > 0) { + union xfs_rtword_raw *p; + + p = xfs_rbmblock_wordptr(&rgb->args, + mp->m_blockwsize - wordcnt); + error = xfbmp_copyin(rgb, wordoff, p, wordcnt); + if (error) + goto out_rele; + + trace_xrep_rgbitmap_load_words(mp, last_group_rbmoff, wordoff, + wordcnt); + } + +out_rele: + xfs_rtbuf_cache_relse(&rgb->args); + return error; +} + +/* Perform a logical OR operation on an rtword in the incore bitmap. */ +static int +xrep_rgbitmap_or( + struct xchk_rgbitmap *rgb, + xrep_wordoff_t wordoff, + xfs_rtword_t mask) +{ + xfs_rtword_t word; + int error; + + error = xfbmp_load(rgb, wordoff, &word); + if (error) + return error; + + trace_xrep_rgbitmap_or(rgb->sc->mp, wordoff, mask, word); + + return xfbmp_store(rgb, wordoff, word | mask); +} + +/* + * Mark as free every rt extent between the next rt block we expected to see + * in the rtrmap records and the given rt block. 
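xrep_rgbitmap_load_after() uses the complementary mask: the bits at and above the position of the next rtgroup's first rt extent are preserved from disk. A minimal sketch, again assuming 32-bit rtwords and assuming XREP_RTBMP_WORDMASK equals XFS_NBWORD - 1 (its definition is not part of this excerpt); next_group_bit() and after_mask() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NBWORD   32u          /* assumed: XFS_NBWORD */
#define WORDMASK (NBWORD - 1) /* assumed: XREP_RTBMP_WORDMASK */

typedef uint32_t rtword_t;
typedef uint64_t rtxnum_t;

/* Position of the next rtgroup's first rt extent within its bitmap word. */
static unsigned int next_group_bit(rtxnum_t last_group_rtx)
{
	return (unsigned int)((last_group_rtx + 1) & WORDMASK);
}

/*
 * Bits at or above 'bit' belong to the next rtgroup and are preserved
 * from disk.  Same expression as the patch:
 *   (((xfs_rtword_t)1 << (XFS_NBWORD - bit)) - 1) << bit
 */
static rtword_t after_mask(unsigned int bit)
{
	assert(bit > 0 && bit < NBWORD);	/* bit == 0 jumps to copy_words */
	return (((rtword_t)1 << (NBWORD - bit)) - 1) << bit;
}
```

If the group's last rt extent is rtx 36, the next group starts at bit 5 of word 1, so bits 0-4 of that word are rebuilt from the rtrmap data and bits 5-31 are kept from disk.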
+ */ +STATIC int +xrep_rgbitmap_mark_free( + struct xchk_rgbitmap *rgb, + xfs_rgblock_t rgbno) +{ + struct xfs_mount *mp = rgb->sc->mp; + struct xfs_rtgroup *rtg = rgb->sc->sr.rtg; + xfs_rtblock_t rtbno; + xfs_rtxnum_t startrtx; + xfs_rtxnum_t nextrtx; + xrep_wordoff_t wordoff, nextwordoff; + unsigned int bit; + unsigned int bufwsize; + xfs_extlen_t mod; + xfs_rtword_t mask; + int error; + + if (!xfs_verify_rgbext(rtg, rgb->next_rgbno, rgbno - rgb->next_rgbno)) + return -EFSCORRUPTED; + + /* + * Convert rt blocks to rt extents. The block range we find must be + * aligned to an rtextent boundary on both ends. + */ + rtbno = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, rgb->next_rgbno); + startrtx = xfs_rtb_to_rtxrem(mp, rtbno, &mod); + if (mod) + return -EFSCORRUPTED; + + rtbno = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, rgbno - 1); + nextrtx = xfs_rtb_to_rtxrem(mp, rtbno, &mod) + 1; + if (mod != mp->m_sb.sb_rextsize - 1) + return -EFSCORRUPTED; + + trace_xrep_rgbitmap_record_free(mp, startrtx, nextrtx - 1); + + /* Set bits as needed to round startrtx up to the nearest word. */ + bit = startrtx & XREP_RTBMP_WORDMASK; + if (bit) { + xfs_rtblock_t len = nextrtx - startrtx; + unsigned int lastbit; + + lastbit = XFS_RTMIN(bit + len, XFS_NBWORD); + mask = (((xfs_rtword_t)1 << (lastbit - bit)) - 1) << bit; + + error = xrep_rgbitmap_or(rgb, rtx_to_wordoff(mp, startrtx), + mask); + if (error || lastbit - bit == len) + return error; + startrtx += XFS_NBWORD - bit; + } + + /* Set bits as needed to round nextrtx down to the nearest word. */ + bit = nextrtx & XREP_RTBMP_WORDMASK; + if (bit) { + mask = ((xfs_rtword_t)1 << bit) - 1; + + error = xrep_rgbitmap_or(rgb, rtx_to_wordoff(mp, nextrtx), + mask); + if (error || startrtx + bit == nextrtx) + return error; + nextrtx -= bit; + } + + trace_xrep_rgbitmap_record_free_bulk(mp, startrtx, nextrtx - 1); + + /* Set all the words in between, up to a whole fs block at once.
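The range-setting logic in xrep_rgbitmap_mark_free() splits [startrtx, nextrtx) into a partial head word, a partial tail word, and the whole words in between. The sketch below applies the same decomposition to a plain array of 32-bit words instead of an xfile and checks the result against a bit-by-bit loop. set_range() and check_range() are hypothetical; the rtextent alignment checks and tracing done by the kernel function are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBWORD   32u          /* assumed: bits per rtbitmap word */
#define WORDMASK (NBWORD - 1) /* assumed equivalent of XREP_RTBMP_WORDMASK */

typedef uint32_t rtword_t;

/* Set bits [start, next) using a head word, a tail word, and bulk words. */
static void set_range(rtword_t *words, uint64_t start, uint64_t next)
{
	unsigned int bit;

	if (start >= next)
		return;

	/* Head: OR in a partial mask to round start up to a word boundary. */
	bit = start & WORDMASK;
	if (bit) {
		uint64_t len = next - start;
		unsigned int lastbit = bit + len < NBWORD ?
				       (unsigned int)(bit + len) : NBWORD;
		rtword_t mask = (((rtword_t)1 << (lastbit - bit)) - 1) << bit;

		words[start / NBWORD] |= mask;
		if (lastbit - bit == len)
			return;	/* range ended inside the head word */
		start += NBWORD - bit;
	}

	/* Tail: OR in a partial mask to round next down to a word boundary. */
	bit = next & WORDMASK;
	if (bit) {
		words[next / NBWORD] |= ((rtword_t)1 << bit) - 1;
		if (start + bit == next)
			return;	/* range ended inside the tail word */
		next -= bit;
	}

	/* Bulk: whole words in [start, next) are simply set to all ones. */
	for (uint64_t w = start / NBWORD; w < next / NBWORD; w++)
		words[w] = ~(rtword_t)0;
}

/* Compare set_range() with a naive per-bit loop on a 4-word bitmap. */
static int check_range(uint64_t start, uint64_t next)
{
	rtword_t fast[4] = {0}, slow[4] = {0};

	assert(next <= 4 * NBWORD);
	set_range(fast, start, next);
	for (uint64_t i = start; i < next; i++)
		slow[i / NBWORD] |= (rtword_t)1 << (i % NBWORD);
	return memcmp(fast, slow, sizeof(fast)) == 0;
}
```

Only the head and tail need a read-modify-write; the middle words can be written blindly, which is why the patch fills rgb->words with 0xFF once and copies it into the xfile in block-sized chunks.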
*/ + wordoff = rtx_to_wordoff(mp, startrtx); + nextwordoff = rtx_to_wordoff(mp, nextrtx); + bufwsize = mp->m_sb.sb_blocksize >> XFS_WORDLOG; + + while (wordoff < nextwordoff) { + xrep_wordoff_t rem; + xrep_wordcnt_t wordcnt; + + wordcnt = min_t(xrep_wordcnt_t, nextwordoff - wordoff, + bufwsize); + + /* + * Try to keep us aligned to the rtwords buffer to reduce the + * number of xfile writes. + */ + rem = wordoff & (bufwsize - 1); + if (rem) + wordcnt = min_t(xrep_wordcnt_t, wordcnt, + bufwsize - rem); + + error = xfbmp_copyin(rgb, wordoff, rgb->words, wordcnt); + if (error) + return error; + + wordoff += wordcnt; + } + + return 0; +} + +/* Set free space in the rtbitmap based on rtrmapbt records. */ +STATIC int +xrep_rgbitmap_walk_rtrmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xchk_rgbitmap *rgb = priv; + int error = 0; + + if (xchk_should_terminate(rgb->sc, &error)) + return error; + + if (rgb->next_rgbno < rec->rm_startblock) { + error = xrep_rgbitmap_mark_free(rgb, rec->rm_startblock); + if (error) + return error; + } + + rgb->next_rgbno = max(rgb->next_rgbno, + rec->rm_startblock + rec->rm_blockcount); + return 0; +} + +/* + * Walk the rtrmapbt to find all the gaps between records, and mark the gaps + * in the realtime bitmap that we're computing. + */ +STATIC int +xrep_rgbitmap_find_freespace( + struct xchk_rgbitmap *rgb) +{ + struct xfs_scrub *sc = rgb->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_rtgroup *rtg = sc->sr.rtg; + int error; + + /* Prepare a buffer of ones so that we can accelerate bulk setting. */ + memset(rgb->words, 0xFF, mp->m_sb.sb_blocksize); + + xrep_rtgroup_btcur_init(sc, &sc->sr); + error = xfs_rmap_query_all(sc->sr.rmap_cur, xrep_rgbitmap_walk_rtrmap, + rgb); + if (error) + goto out; + + /* + * Mark as free every possible rt extent from the last one we saw to + * the end of the rt group. 
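xrep_rgbitmap_walk_rtrmap() and xrep_rgbitmap_find_freespace() derive free space as the gaps between rtrmap records. They carry the furthest record end seen so far in next_rgbno, so a record nested inside an earlier one cannot produce a false gap, and then treat everything past the last record as free. A sketch of that sweep over an in-memory array, assuming the records arrive in ascending startblock order as a btree query returns them; struct rec, struct gaps, mark_free(), and find_gaps() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_rmap_irec (only the fields used). */
struct rec {
	uint32_t startblock;
	uint32_t blockcount;
};

/* Hypothetical collector for the free ranges, end exclusive. */
#define MAX_GAPS 8
struct gaps {
	uint32_t start[MAX_GAPS];
	uint32_t end[MAX_GAPS];
	size_t nr;
};

static void mark_free(struct gaps *g, uint32_t from, uint32_t to)
{
	assert(g->nr < MAX_GAPS);
	g->start[g->nr] = from;
	g->end[g->nr] = to;
	g->nr++;
}

/* recs[] must be sorted by startblock, as a btree query returns them. */
static void find_gaps(const struct rec *recs, size_t nr, uint32_t blockcount,
		      struct gaps *g)
{
	uint32_t next = 0;	/* plays the role of rgb->next_rgbno */

	g->nr = 0;
	for (size_t i = 0; i < nr; i++) {
		uint32_t end = recs[i].startblock + recs[i].blockcount;

		if (next < recs[i].startblock)
			mark_free(g, next, recs[i].startblock);
		/* max(): a nested record must not move next backwards */
		if (end > next)
			next = end;
	}

	/* Everything after the last mapping up to the end of the group is free. */
	if (next < blockcount)
		mark_free(g, next, blockcount);
}
```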
+ */ + if (rgb->next_rgbno < rtg->rtg_blockcount) { + error = xrep_rgbitmap_mark_free(rgb, rtg->rtg_blockcount); + if (error) + goto out; + } + +out: + xchk_rtgroup_btcur_free(&sc->sr); + return error; +} + +static int +xrep_rgbitmap_prep_buf( + struct xfs_scrub *sc, + struct xfs_buf *bp, + void *data) +{ + struct xchk_rgbitmap *rgb = data; + struct xfs_mount *mp = sc->mp; + union xfs_rtword_raw *ondisk; + int error; + + rgb->args.mp = sc->mp; + rgb->args.tp = sc->tp; + rgb->args.rbmbp = bp; + ondisk = xfs_rbmblock_wordptr(&rgb->args, 0); + rgb->args.rbmbp = NULL; + + error = xfbmp_copyout(rgb, rgb->prep_wordoff, ondisk, + mp->m_blockwsize); + if (error) + return error; + + if (xfs_has_rtgroups(sc->mp)) { + struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; + + hdr->rt_magic = cpu_to_be32(XFS_RTBITMAP_MAGIC); + hdr->rt_owner = cpu_to_be64(sc->ip->i_ino); + hdr->rt_blkno = cpu_to_be64(xfs_buf_daddr(bp)); + hdr->rt_lsn = 0; + uuid_copy(&hdr->rt_uuid, &sc->mp->m_sb.sb_meta_uuid); + bp->b_ops = &xfs_rtbitmap_buf_ops; + } else { + bp->b_ops = &xfs_rtbuf_ops; + } + + rgb->prep_wordoff += mp->m_blockwsize; + xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_RTBITMAP_BUF); + return 0; +} + +/* Repair the realtime bitmap for this rt group. */ +int +xrep_rgbitmap( + struct xfs_scrub *sc) +{ + struct xchk_rgbitmap *rgb = sc->buf; + int error; + + /* + * We require the realtime rmapbt (and atomic file updates) to rebuild + * anything. + */ + if (!xfs_has_rtrmapbt(sc->mp)) + return -EOPNOTSUPP; + + /* + * If the start or end of this rt group happens to be in the middle of + * an rtbitmap block, try to read in the parts of the bitmap that are + * from some other group. + */ + error = xrep_rgbitmap_load_before(rgb); + if (error) + return error; + error = xrep_rgbitmap_load_after(rgb); + if (error) + return error; + + /* + * Generate the new rtbitmap data. We don't need the rtbmp information + * once this call is finished. 
+ */ + error = xrep_rgbitmap_find_freespace(rgb); + if (error) + return error; + + /* + * Try to take ILOCK_EXCL of the temporary file. We had better be the + * only ones holding onto this inode, but we can't block while holding + * the rtbitmap file's ILOCK_EXCL. + */ + while (!xrep_tempfile_ilock_nowait(sc)) { + if (xchk_should_terminate(sc, &error)) + return error; + delay(1); + } + + /* + * Make sure we have space allocated for the part of the bitmap + * file that corresponds to this group. + */ + xfs_trans_ijoin(sc->tp, sc->ip, 0); + xfs_trans_ijoin(sc->tp, sc->tempip, 0); + error = xrep_tempfile_prealloc(sc, rgb->group_rbmoff, rgb->group_rbmlen); + if (error) + return error; + + /* Last chance to abort before we start committing fixes. */ + if (xchk_should_terminate(sc, &error)) + return error; + + /* Copy the bitmap file that we generated. */ + error = xrep_tempfile_copyin(sc, rgb->group_rbmoff, rgb->group_rbmlen, + xrep_rgbitmap_prep_buf, rgb); + if (error) + return error; + error = xrep_tempfile_set_isize(sc, + XFS_FSB_TO_B(sc->mp, sc->mp->m_sb.sb_rbmblocks)); + if (error) + return error; + + /* + * Now swap the extents. We're done with the temporary buffer, so + * we can reuse it for the tempfile swapext information. + */ + error = xrep_tempswap_trans_reserve(sc, XFS_DATA_FORK, + rgb->group_rbmoff, rgb->group_rbmlen, &rgb->tempswap); + if (error) + return error; + + error = xrep_tempswap_contents(sc, &rgb->tempswap); + if (error) + return error; + + /* Free the old bitmap blocks if they are free. */ + return xrep_reap_ifork(sc, sc->tempip, XFS_DATA_FORK); +} + +/* rt bitmap file repairs */ + /* Set up to repair the realtime bitmap file metadata. */ int xrep_setup_rtbitmap( diff --git a/fs/xfs/scrub/rtsummary_repair.c b/fs/xfs/scrub/rtsummary_repair.c index c66373eec436d..390c6403e8908 100644 --- a/fs/xfs/scrub/rtsummary_repair.c +++ b/fs/xfs/scrub/rtsummary_repair.c @@ -168,7 +168,8 @@ xrep_rtsummary( * Now swap the extents. 
Nothing in repair uses the temporary buffer, * so we can reuse it for the tempfile swapext information. */ - error = xrep_tempswap_trans_reserve(sc, XFS_DATA_FORK, &rts->tempswap); + error = xrep_tempswap_trans_reserve(sc, XFS_DATA_FORK, 0, rsumblocks, + &rts->tempswap); if (error) return error; diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 428624d4f0c45..435003e5a1e92 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -472,7 +472,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .setup = xchk_setup_rgbitmap, .scrub = xchk_rgbitmap, .has = xfs_has_rtgroups, - .repair = xrep_notsupported, + .repair = xrep_rgbitmap, }, [XFS_SCRUB_TYPE_RTRMAPBT] = { /* realtime group rmapbt */ .type = ST_RTGROUP, diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c index 1dd7c4c5cff0f..e4d1c703c3195 100644 --- a/fs/xfs/scrub/tempfile.c +++ b/fs/xfs/scrub/tempfile.c @@ -608,6 +608,8 @@ STATIC int xrep_tempswap_prep_request( struct xfs_scrub *sc, int whichfork, + xfs_fileoff_t off, + xfs_filblks_t len, struct xrep_tempswap *tx) { struct xfs_swapext_req *req = &tx->req; @@ -631,10 +633,10 @@ xrep_tempswap_prep_request( /* Swap all mappings in both forks. */ req->ip1 = sc->tempip; req->ip2 = sc->ip; - req->startoff1 = 0; - req->startoff2 = 0; + req->startoff1 = off; + req->startoff2 = off; req->whichfork = whichfork; - req->blockcount = XFS_MAX_FILEOFF; + req->blockcount = len; req->req_flags = XFS_SWAP_REQ_LOGGED; /* Always swap sizes when we're swapping data fork mappings. 
*/ @@ -801,6 +803,8 @@ int xrep_tempswap_trans_reserve( struct xfs_scrub *sc, int whichfork, + xfs_fileoff_t off, + xfs_filblks_t len, struct xrep_tempswap *tx) { int error; @@ -809,7 +813,7 @@ xrep_tempswap_trans_reserve( ASSERT(xfs_isilocked(sc->ip, XFS_ILOCK_EXCL)); ASSERT(xfs_isilocked(sc->tempip, XFS_ILOCK_EXCL)); - error = xrep_tempswap_prep_request(sc, whichfork, tx); + error = xrep_tempswap_prep_request(sc, whichfork, off, len, tx); if (error) return error; @@ -846,7 +850,8 @@ xrep_tempswap_trans_alloc( ASSERT(sc->tp == NULL); - error = xrep_tempswap_prep_request(sc, whichfork, tx); + error = xrep_tempswap_prep_request(sc, whichfork, 0, XFS_MAX_FILEOFF, + tx); if (error) return error; diff --git a/fs/xfs/scrub/tempswap.h b/fs/xfs/scrub/tempswap.h index 83900eef8cfc5..0be14f5f6382e 100644 --- a/fs/xfs/scrub/tempswap.h +++ b/fs/xfs/scrub/tempswap.h @@ -13,7 +13,7 @@ struct xrep_tempswap { int xrep_tempswap_grab_log_assist(struct xfs_scrub *sc); int xrep_tempswap_trans_reserve(struct xfs_scrub *sc, int whichfork, - struct xrep_tempswap *ti); + xfs_fileoff_t off, xfs_filblks_t len, struct xrep_tempswap *ti); int xrep_tempswap_trans_alloc(struct xfs_scrub *sc, int whichfork, struct xrep_tempswap *ti); diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c index 06f7b89920e94..4d0a6dceaa6c6 100644 --- a/fs/xfs/scrub/trace.c +++ b/fs/xfs/scrub/trace.c @@ -22,6 +22,7 @@ #include "xfs_rmap.h" #include "xfs_parent.h" #include "xfs_imeta.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/xfile.h" #include "scrub/xfarray.h" diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index f02914f129605..c90324ca86579 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -3752,6 +3752,155 @@ DEFINE_XCHK_METAPATH_EVENT(xrep_metapath_try_unlink); DEFINE_XCHK_METAPATH_EVENT(xrep_metapath_unlink); DEFINE_XCHK_METAPATH_EVENT(xrep_metapath_link); +#ifdef CONFIG_XFS_RT +DECLARE_EVENT_CLASS(xrep_rgbitmap_class, + TP_PROTO(struct xfs_mount *mp, 
xfs_rtxnum_t start, xfs_rtxnum_t end), + TP_ARGS(mp, start, end), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rtxnum_t, start) + __field(xfs_rtxnum_t, end) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->start = start; + __entry->end = end; + ), + TP_printk("dev %d:%d rtdev %d:%d startrtx 0x%llx endrtx 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->start, + __entry->end) +); +#define DEFINE_REPAIR_RGBITMAP_EVENT(name) \ +DEFINE_EVENT(xrep_rgbitmap_class, name, \ + TP_PROTO(struct xfs_mount *mp, xfs_rtxnum_t start, \ + xfs_rtxnum_t end), \ + TP_ARGS(mp, start, end)) +DEFINE_REPAIR_RGBITMAP_EVENT(xrep_rgbitmap_record_free); +DEFINE_REPAIR_RGBITMAP_EVENT(xrep_rgbitmap_record_free_bulk); + +TRACE_EVENT(xrep_rgbitmap_or, + TP_PROTO(struct xfs_mount *mp, unsigned long long wordoff, + xfs_rtword_t mask, xfs_rtword_t word), + TP_ARGS(mp, wordoff, mask, word), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(unsigned long long, wordoff) + __field(unsigned int, mask) + __field(unsigned int, word) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->wordoff = wordoff; + __entry->mask = mask; + __entry->word = word; + ), + TP_printk("dev %d:%d rtdev %d:%d wordoff 0x%llx mask 0x%x word 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->wordoff, + __entry->mask, + __entry->word) +); + +TRACE_EVENT(xrep_rgbitmap_load, + TP_PROTO(struct xfs_rtgroup *rtg, xfs_fileoff_t rbmoff, + xfs_rtxnum_t rtx, xfs_rtxnum_t len), + TP_ARGS(rtg, rbmoff, rtx, len), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgnumber_t, rgno) + __field(xfs_fileoff_t, rbmoff) + __field(xfs_rtxnum_t, rtx) + __field(xfs_rtxnum_t, len) + ), + TP_fast_assign( + 
__entry->dev = rtg->rtg_mount->m_super->s_dev; + __entry->rtdev = rtg->rtg_mount->m_rtdev_targp->bt_dev; + __entry->rgno = rtg->rtg_rgno; + __entry->rbmoff = rbmoff; + __entry->rtx = rtx; + __entry->len = len; + ), + TP_printk("dev %d:%d rtdev %d:%d rgno 0x%x rbmoff 0x%llx rtx 0x%llx rtxcount 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgno, + __entry->rbmoff, + __entry->rtx, + __entry->len) +); + +TRACE_EVENT(xrep_rgbitmap_load_words, + TP_PROTO(struct xfs_mount *mp, xfs_fileoff_t rbmoff, + unsigned long long wordoff, unsigned int wordcnt), + TP_ARGS(mp, rbmoff, wordoff, wordcnt), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_fileoff_t, rbmoff) + __field(unsigned long long, wordoff) + __field(unsigned int, wordcnt) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->rbmoff = rbmoff; + __entry->wordoff = wordoff; + __entry->wordcnt = wordcnt; + ), + TP_printk("dev %d:%d rtdev %d:%d rbmoff 0x%llx wordoff 0x%llx wordcnt 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rbmoff, + __entry->wordoff, + __entry->wordcnt) +); + +TRACE_EVENT(xrep_rgbitmap_load_word, + TP_PROTO(struct xfs_mount *mp, unsigned long long wordoff, + unsigned int bit, xfs_rtword_t ondisk_word, + xfs_rtword_t xfile_word, xfs_rtword_t word_mask), + TP_ARGS(mp, wordoff, bit, ondisk_word, xfile_word, word_mask), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(unsigned long long, wordoff) + __field(unsigned int, bit) + __field(xfs_rtword_t, ondisk_word) + __field(xfs_rtword_t, xfile_word) + __field(xfs_rtword_t, word_mask) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->wordoff = wordoff; + __entry->bit = bit; + __entry->ondisk_word = ondisk_word; + __entry->xfile_word = xfile_word; 
+ __entry->word_mask = word_mask; + ), + TP_printk("dev %d:%d rtdev %d:%d wordoff 0x%llx bit %u ondisk 0x%x(0x%x) inmem 0x%x(0x%x) result 0x%x mask 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->wordoff, + __entry->bit, + __entry->ondisk_word, + __entry->ondisk_word & __entry->word_mask, + __entry->xfile_word, + __entry->xfile_word & ~__entry->word_mask, + (__entry->xfile_word & ~__entry->word_mask) | + (__entry->ondisk_word & __entry->word_mask), + __entry->word_mask) +); +#endif /* CONFIG_XFS_RT */ + #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */ From patchwork Sun Dec 31 21:40:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507699 Date: Sun, 31 Dec 2023 13:40:55 -0800 Subject: [PATCH 35/39] xfs: support
repairing metadata btrees rooted in metadir inodes From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850463.1764998.8425659997382974924.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Adapt the repair code so that we can stage a new btree in the data fork area of a metadir inode and reap the old blocks. We already have nearly all of the infrastructure; the only missing part was the metadata inode reservation handling. Signed-off-by: Darrick J. Wong --- fs/xfs/scrub/newbt.c | 43 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/newbt.h | 1 + fs/xfs/scrub/reap.c | 41 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/reap.h | 2 ++ 4 files changed, 87 insertions(+) diff --git a/fs/xfs/scrub/newbt.c b/fs/xfs/scrub/newbt.c index 8375c8b9752a7..66a780f4e0176 100644 --- a/fs/xfs/scrub/newbt.c +++ b/fs/xfs/scrub/newbt.c @@ -19,6 +19,8 @@ #include "xfs_rmap.h" #include "xfs_ag.h" #include "xfs_defer.h" +#include "xfs_imeta.h" +#include "xfs_quota.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -120,6 +122,44 @@ xrep_newbt_init_inode( return 0; } +/* + * Initialize accounting resources for staging a new metadata inode btree. + * If the inode has an imeta space reservation, the caller must adjust the + * imeta reservation at btree commit.
+ */ +int +xrep_newbt_init_metadir_inode( + struct xrep_newbt *xnr, + struct xfs_scrub *sc) +{ + struct xfs_owner_info oinfo; + struct xfs_ifork *ifp; + + ASSERT(xfs_is_metadir_inode(sc->ip)); + ASSERT(XFS_IS_DQDETACHED(sc->mp, sc->ip)); + + xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK); + + ifp = kmem_cache_zalloc(xfs_ifork_cache, XCHK_GFP_FLAGS); + if (!ifp) + return -ENOMEM; + + /* + * Allocate new metadir btree blocks with XFS_AG_RESV_NONE because the + * inode metadata space reservations can only account allocated space + * to the i_nblocks. We do not want to change the inode core fields + * until we're ready to commit the new tree, so we allocate the blocks + * as if they were regular file blocks. This exposes us to a higher + * risk of the repair being cancelled due to ENOSPC. + */ + xrep_newbt_init_ag(xnr, sc, &oinfo, + XFS_INO_TO_FSB(sc->mp, sc->ip->i_ino), + XFS_AG_RESV_NONE); + xnr->ifake.if_fork = ifp; + xnr->ifake.if_fork_size = xfs_inode_fork_size(sc->ip, XFS_DATA_FORK); + return 0; +} + /* * Initialize accounting resources for staging a new btree. Callers are * expected to add their own reservations (and clean them up) manually. 
@@ -225,6 +265,7 @@ xrep_newbt_alloc_ag_blocks( int error = 0; ASSERT(sc->sa.pag != NULL); + ASSERT(xnr->resv != XFS_AG_RESV_IMETA); while (nr_blocks > 0) { struct xfs_alloc_arg args = { @@ -299,6 +340,8 @@ xrep_newbt_alloc_file_blocks( struct xfs_mount *mp = sc->mp; int error = 0; + ASSERT(xnr->resv != XFS_AG_RESV_IMETA); + while (nr_blocks > 0) { struct xfs_alloc_arg args = { .tp = sc->tp, diff --git a/fs/xfs/scrub/newbt.h b/fs/xfs/scrub/newbt.h index 3d804d31af24a..5ce785599287b 100644 --- a/fs/xfs/scrub/newbt.h +++ b/fs/xfs/scrub/newbt.h @@ -63,6 +63,7 @@ void xrep_newbt_init_ag(struct xrep_newbt *xnr, struct xfs_scrub *sc, enum xfs_ag_resv_type resv); int xrep_newbt_init_inode(struct xrep_newbt *xnr, struct xfs_scrub *sc, int whichfork, const struct xfs_owner_info *oinfo); +int xrep_newbt_init_metadir_inode(struct xrep_newbt *xnr, struct xfs_scrub *sc); int xrep_newbt_alloc_blocks(struct xrep_newbt *xnr, uint64_t nr_blocks); int xrep_newbt_add_extent(struct xrep_newbt *xnr, struct xfs_perag *pag, xfs_agblock_t agbno, xfs_extlen_t len); diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c index 37eb61906be18..b8c48e36d2a8d 100644 --- a/fs/xfs/scrub/reap.c +++ b/fs/xfs/scrub/reap.c @@ -33,6 +33,7 @@ #include "xfs_attr.h" #include "xfs_attr_remote.h" #include "xfs_defer.h" +#include "xfs_imeta.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -391,6 +392,8 @@ xreap_agextent_iter( xfs_fsblock_t fsbno; int error = 0; + ASSERT(rs->resv != XFS_AG_RESV_IMETA); + fsbno = XFS_AGB_TO_FSB(sc->mp, sc->sa.pag->pag_agno, agbno); /* @@ -676,6 +679,44 @@ xrep_reap_fsblocks( return 0; } +/* + * Dispose of every block of an old metadata btree that used to be rooted in a + * metadata directory file. 
+ */ +int +xrep_reap_metadir_fsblocks( + struct xfs_scrub *sc, + struct xfsb_bitmap *bitmap) +{ + /* + * Reap old metadir btree blocks with XFS_AG_RESV_NONE because the old + * blocks are no longer mapped by the inode, and inode metadata space + * reservations can only account freed space to the i_nblocks. + */ + struct xfs_owner_info oinfo; + struct xreap_state rs = { + .sc = sc, + .oinfo = &oinfo, + .resv = XFS_AG_RESV_NONE, + }; + int error; + + ASSERT(xfs_has_rmapbt(sc->mp)); + ASSERT(sc->ip != NULL); + ASSERT(xfs_is_metadir_inode(sc->ip)); + + xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK); + + error = xfsb_bitmap_walk(bitmap, xreap_fsmeta_extent, &rs); + if (error) + return error; + + if (xreap_dirty(&rs)) + return xrep_defer_finish(sc); + + return 0; +} + /* * Metadata files are not supposed to share blocks with anything else. * If blocks are shared, we remove the reverse mapping (thus reducing the diff --git a/fs/xfs/scrub/reap.h b/fs/xfs/scrub/reap.h index 3f2f1775e29db..70e5e6bbb8d38 100644 --- a/fs/xfs/scrub/reap.h +++ b/fs/xfs/scrub/reap.h @@ -14,6 +14,8 @@ int xrep_reap_agblocks(struct xfs_scrub *sc, struct xagb_bitmap *bitmap, int xrep_reap_fsblocks(struct xfs_scrub *sc, struct xfsb_bitmap *bitmap, const struct xfs_owner_info *oinfo); int xrep_reap_ifork(struct xfs_scrub *sc, struct xfs_inode *ip, int whichfork); +int xrep_reap_metadir_fsblocks(struct xfs_scrub *sc, + struct xfsb_bitmap *bitmap); /* Buffer cache scan context. */ struct xrep_bufscan { From patchwork Sun Dec 31 21:41:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507700 Date: Sun, 31 Dec 2023 13:41:10 -0800 Subject: [PATCH 36/39] xfs: online repair of the realtime rmap btree From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850479.1764998.16240055087580879187.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Repair the realtime rmap btree while mounted. Signed-off-by: Darrick J.
Wong --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_rtrmap_btree.c | 2 fs/xfs/libxfs/xfs_rtrmap_btree.h | 3 fs/xfs/scrub/common.c | 5 fs/xfs/scrub/repair.c | 135 +++++++ fs/xfs/scrub/repair.h | 13 + fs/xfs/scrub/rtrmap.c | 7 fs/xfs/scrub/rtrmap_repair.c | 749 ++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.c | 2 fs/xfs/scrub/trace.h | 57 +++ 10 files changed, 971 insertions(+), 3 deletions(-) create mode 100644 fs/xfs/scrub/rtrmap_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 69ca5cb7e4000..f9092ae77e684 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -235,6 +235,7 @@ xfs-y += $(addprefix scrub/, \ xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \ rgsuper_repair.o \ rtbitmap_repair.o \ + rtrmap_repair.o \ rtsummary_repair.o \ ) diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 355c50196e986..51a8ffff1c755 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -705,7 +705,7 @@ xfs_rtrmapbt_create_path( } /* Calculate the rtrmap btree size for some records. 
*/ -static unsigned long long +unsigned long long xfs_rtrmapbt_calc_size( struct xfs_mount *mp, unsigned long long len) diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 108ab8c0aea44..5aec719be053f 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -202,4 +202,7 @@ struct xfs_imeta_update; int xfs_rtrmapbt_create(struct xfs_imeta_update *upd, struct xfs_inode **ipp); +unsigned long long xfs_rtrmapbt_calc_size(struct xfs_mount *mp, + unsigned long long len); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index e16185e9ddd9b..558e267924399 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -993,7 +993,10 @@ int xchk_setup_rt( struct xfs_scrub *sc) { - return xchk_trans_alloc(sc, 0); + uint resblks; + + resblks = xrep_calc_rtgroup_resblks(sc); + return xchk_trans_alloc(sc, resblks); } /* Set us up with AG headers and btree cursors. */ diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 4789014be4a36..c9c538083c722 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -39,6 +39,8 @@ #include "xfs_rtrmap_btree.h" #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" +#include "xfs_rtalloc.h" +#include "xfs_imeta.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -382,6 +384,39 @@ xrep_calc_ag_resblks( return max(max(bnobt_sz, inobt_sz), max(rmapbt_sz, refcbt_sz)); } +#ifdef CONFIG_XFS_RT +/* + * Figure out how many blocks to reserve for a rtgroup repair. We calculate + * the worst case estimate for the number of blocks we'd need to rebuild one of + * any type of per-rtgroup btree. 
+ */ +xfs_extlen_t +xrep_calc_rtgroup_resblks( + struct xfs_scrub *sc) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_scrub_metadata *sm = sc->sm; + struct xfs_rtgroup *rtg; + xfs_extlen_t usedlen; + xfs_extlen_t rmapbt_sz = 0; + + if (!(sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)) + return 0; + + rtg = xfs_rtgroup_get(mp, sm->sm_agno); + usedlen = rtg->rtg_blockcount; + xfs_rtgroup_put(rtg); + + if (xfs_has_rmapbt(mp)) + rmapbt_sz = xfs_rtrmapbt_calc_size(mp, usedlen); + + trace_xrep_calc_rtgroup_resblks_btsize(mp, sm->sm_agno, usedlen, + rmapbt_sz); + + return rmapbt_sz; +} +#endif /* CONFIG_XFS_RT */ + /* * Reconstructing per-AG Btrees * @@ -1302,3 +1337,103 @@ xrep_is_rtmeta_ino( return false; } + +/* Check the sanity of a rmap record for a metadata btree inode. */ +int +xrep_check_ino_btree_mapping( + struct xfs_scrub *sc, + const struct xfs_rmap_irec *rec) +{ + enum xbtree_recpacking outcome; + int error; + + /* + * Metadata btree inodes never have extended attributes, and all blocks + * should have the bmbt block flag set. + */ + if ((rec->rm_flags & XFS_RMAP_ATTR_FORK) || + !(rec->rm_flags & XFS_RMAP_BMBT_BLOCK)) + return -EFSCORRUPTED; + + /* Make sure the block is within the AG. */ + if (!xfs_verify_agbext(sc->sa.pag, rec->rm_startblock, + rec->rm_blockcount)) + return -EFSCORRUPTED; + + /* Make sure this isn't free space. */ + error = xfs_alloc_has_records(sc->sa.bno_cur, rec->rm_startblock, + rec->rm_blockcount, &outcome); + if (error) + return error; + if (outcome != XBTREE_RECPACKING_EMPTY) + return -EFSCORRUPTED; + + return 0; +} + +/* + * Reset the block count of the inode being repaired, and adjust the dquot + * block usage to match. The inode must not have an xattr fork. 
+ */ +void +xrep_inode_set_nblocks( + struct xfs_scrub *sc, + int64_t new_blocks) +{ + int64_t delta; + + delta = new_blocks - sc->ip->i_nblocks; + sc->ip->i_nblocks = new_blocks; + + xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE); + if (delta != 0) + xfs_trans_mod_dquot_byino(sc->tp, sc->ip, XFS_TRANS_DQ_BCOUNT, + delta); +} + +/* Reset the block reservation for a metadata inode. */ +int +xrep_reset_imeta_reservation( + struct xfs_scrub *sc) +{ + struct xfs_inode *ip = sc->ip; + int64_t delta; + int error; + + delta = ip->i_nblocks + ip->i_delayed_blks - ip->i_meta_resv_asked; + if (delta == 0) + return 0; + + if (delta > 0) { + int64_t give_back; + + /* Too many blocks, free from the incore reservation. */ + give_back = min_t(uint64_t, delta, ip->i_delayed_blks); + if (give_back > 0) { + xfs_mod_delalloc(ip->i_mount, -give_back); + xfs_mod_fdblocks(ip->i_mount, give_back, true); + ip->i_delayed_blks -= give_back; + } + + return 0; + } + + /* Not enough reservation, try to add more. @delta is negative here. 
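When the metadata inode's reservation is short, xrep_reset_imeta_reservation() asks xfs_mod_fdblocks() for the missing blocks and, on -ENOSPC, retries with one block fewer until it succeeds or the request reaches zero. The sketch below models that loop against a plain free-block counter; fdblocks, mod_fdblocks(), and grow_reservation() are hypothetical, and the model ignores the reserved-pool semantics of the real xfs_mod_fdblocks().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the filesystem's free block counter. */
static int64_t fdblocks;

/* Hypothetical model of xfs_mod_fdblocks(): a negative delta takes blocks. */
static int mod_fdblocks(int64_t delta)
{
	if (fdblocks + delta < 0)
		return -ENOSPC;
	fdblocks += delta;
	return 0;
}

/*
 * The reservation is short by -delta blocks (delta < 0).  Ask for all of
 * them, and on -ENOSPC ask for one block fewer each time.  Returns the
 * number of blocks actually obtained.
 */
static int64_t grow_reservation(int64_t delta)
{
	int error;

	assert(delta < 0);
	error = mod_fdblocks(delta);
	while (error == -ENOSPC) {
		delta++;
		if (delta == 0)
			return 0;	/* the kernel warns and leaves it at that */
		error = mod_fdblocks(delta);
	}
	assert(error == 0);
	return -delta;
}
```

Each retry costs one counter update, so the loop is linear in the shortfall. The design accepts a partial (or, with a warning, empty) reservation rather than failing a repair that has already been committed.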
*/ + error = xfs_mod_fdblocks(sc->mp, delta, true); + while (error == -ENOSPC) { + delta++; + if (delta == 0) { + xfs_warn(sc->mp, +"Insufficient free space to reset space reservation for inode 0x%llx after repair.", + ip->i_ino); + return 0; + } + error = xfs_mod_fdblocks(sc->mp, delta, true); + } + if (error) + return error; + + xfs_mod_delalloc(sc->mp, -delta); + ip->i_delayed_blks += -delta; + return 0; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index b66e0b5331394..a382ba0478aa0 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -96,6 +96,7 @@ int xrep_setup_parent(struct xfs_scrub *sc); int xrep_setup_nlinks(struct xfs_scrub *sc); int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *resblks); int xrep_setup_dirtree(struct xfs_scrub *sc); +int xrep_setup_rtrmapbt(struct xfs_scrub *sc); /* Repair setup functions */ int xrep_setup_ag_allocbt(struct xfs_scrub *sc); @@ -112,12 +113,16 @@ int xrep_rtgroup_init(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, void xrep_rtgroup_btcur_init(struct xfs_scrub *sc, struct xchk_rt *sr); int xrep_require_rtext_inuse(struct xfs_scrub *sc, xfs_rtblock_t rtbno, xfs_filblks_t len); +xfs_extlen_t xrep_calc_rtgroup_resblks(struct xfs_scrub *sc); #else # define xrep_rtgroup_init(sc, rtg, sr, lockflags) (-ENOSYS) +# define xrep_calc_rtgroup_resblks(sc) (0) #endif /* CONFIG_XFS_RT */ bool xrep_is_rtmeta_ino(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, xfs_ino_t ino); +int xrep_check_ino_btree_mapping(struct xfs_scrub *sc, + const struct xfs_rmap_irec *rec); /* Metadata revalidators */ @@ -153,11 +158,13 @@ int xrep_rtbitmap(struct xfs_scrub *sc); int xrep_rtsummary(struct xfs_scrub *sc); int xrep_rgsuperblock(struct xfs_scrub *sc); int xrep_rgbitmap(struct xfs_scrub *sc); +int xrep_rtrmapbt(struct xfs_scrub *sc); #else # define xrep_rtbitmap xrep_notsupported # define xrep_rtsummary xrep_notsupported # define xrep_rgsuperblock xrep_notsupported # define xrep_rgbitmap xrep_notsupported +# 
define xrep_rtrmapbt xrep_notsupported #endif /* CONFIG_XFS_RT */ #ifdef CONFIG_XFS_QUOTA @@ -176,6 +183,8 @@ int xrep_trans_alloc_hook_dummy(struct xfs_mount *mp, void **cookiep, void xrep_trans_cancel_hook_dummy(void **cookiep, struct xfs_trans *tp); bool xrep_buf_verify_struct(struct xfs_buf *bp, const struct xfs_buf_ops *ops); +void xrep_inode_set_nblocks(struct xfs_scrub *sc, int64_t new_blocks); +int xrep_reset_imeta_reservation(struct xfs_scrub *sc); #else @@ -199,6 +208,8 @@ xrep_calc_ag_resblks( return 0; } +#define xrep_calc_rtgroup_resblks xrep_calc_ag_resblks + static inline int xrep_reset_perag_resv( struct xfs_scrub *sc) @@ -226,6 +237,7 @@ xrep_setup_nothing( #define xrep_setup_nlinks xrep_setup_nothing #define xrep_setup_dirtree xrep_setup_nothing #define xrep_setup_metapath xrep_setup_nothing +#define xrep_setup_rtrmapbt xrep_setup_nothing #define xrep_setup_inode(sc, imap) ((void)0) @@ -264,6 +276,7 @@ static inline int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *x) #define xrep_metapath xrep_notsupported #define xrep_rgsuperblock xrep_notsupported #define xrep_rgbitmap xrep_notsupported +#define xrep_rtrmapbt xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index c3e1cee81b6d2..ce21fa95a5da7 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -27,6 +27,7 @@ #include "scrub/common.h" #include "scrub/btree.h" #include "scrub/trace.h" +#include "scrub/repair.h" /* Set us up with the realtime metadata locked. 
*/ int @@ -44,6 +45,12 @@ xchk_setup_rtrmapbt( if (!rtg) return -ENOENT; + if (xchk_could_repair(sc)) { + error = xrep_setup_rtrmapbt(sc); + if (error) + return error; + } + error = xchk_setup_rt(sc); if (error) goto out_rtg; diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c new file mode 100644 index 0000000000000..c56558ce919ab --- /dev/null +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -0,0 +1,749 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2020-2024 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_alloc.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_inode.h" +#include "xfs_icache.h" +#include "xfs_bmap.h" +#include "xfs_bmap_btree.h" +#include "xfs_quota.h" +#include "xfs_rtalloc.h" +#include "xfs_ag.h" +#include "xfs_rtgroup.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" +#include "scrub/fsb_bitmap.h" +#include "scrub/xfile.h" +#include "scrub/xfarray.h" +#include "scrub/iscan.h" +#include "scrub/newbt.h" +#include "scrub/reap.h" + +/* + * Realtime Reverse Mapping Btree Repair + * ===================================== + * + * This isn't quite as difficult as repairing the rmap btree on the data + * device, since we only store the data fork extents of realtime files on the + * realtime device. We still have to freeze the filesystem and stop the + * background threads like we do for the rmap repair, but we only have to scan + * realtime inodes. 
+ * + * Collecting entries for the new realtime rmap btree is easy -- all we have + * to do is generate rtrmap entries from the data fork mappings of all realtime + * files in the filesystem. We then scan the rmap btrees of the data device + * looking for extents belonging to the old btree and note them in a bitmap. + * + * To rebuild the realtime rmap btree, we bulk-load the collected mappings into + * a new btree cursor and atomically swap that into the realtime inode. Then + * we can free the blocks from the old btree. + * + * We use the 'xrep_rtrmap' prefix for all the rmap functions. + */ + +/* + * Packed rmap record. The UNWRITTEN flags are hidden in the upper bits of + * offset, just like the on-disk record. + */ +struct xrep_rtrmap_extent { + xfs_rgblock_t startblock; + xfs_extlen_t blockcount; + uint64_t owner; + uint64_t offset; +} __packed; + +/* Context for collecting rmaps */ +struct xrep_rtrmap { + /* new rtrmapbt information */ + struct xrep_newbt new_btree; + + /* rmap records generated from primary metadata */ + struct xfarray *rtrmap_records; + + struct xfs_scrub *sc; + + /* bitmap of old rtrmapbt blocks */ + struct xfsb_bitmap old_rtrmapbt_blocks; + + /* inode scan cursor */ + struct xchk_iscan iscan; + + /* get_records()'s position in the rmap record array. */ + xfarray_idx_t array_cur; +}; + +/* Set us up to repair rt reverse mapping btrees. */ +int +xrep_setup_rtrmapbt( + struct xfs_scrub *sc) +{ + struct xrep_rtrmap *rr; + + rr = kzalloc(sizeof(struct xrep_rtrmap), XCHK_GFP_FLAGS); + if (!rr) + return -ENOMEM; + + rr->sc = sc; + sc->buf = rr; + return 0; +} + +/* Make sure there's nothing funny about this mapping. */ +STATIC int +xrep_rtrmap_check_mapping( + struct xfs_scrub *sc, + const struct xfs_rmap_irec *rec) +{ + xfs_rtblock_t rtbno; + + if (xfs_rtrmap_check_irec(sc->sr.rtg, rec) != NULL) + return -EFSCORRUPTED; + + /* Make sure this isn't free space.
*/ + rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, + rec->rm_startblock); + return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount); +} + +/* Store a reverse-mapping record. */ +static inline int +xrep_rtrmap_stash( + struct xrep_rtrmap *rr, + xfs_rgblock_t startblock, + xfs_extlen_t blockcount, + uint64_t owner, + uint64_t offset, + unsigned int flags) +{ + struct xrep_rtrmap_extent rre = { + .startblock = startblock, + .blockcount = blockcount, + .owner = owner, + }; + struct xfs_rmap_irec rmap = { + .rm_startblock = startblock, + .rm_blockcount = blockcount, + .rm_owner = owner, + .rm_offset = offset, + .rm_flags = flags, + }; + struct xfs_scrub *sc = rr->sc; + int error = 0; + + if (xchk_should_terminate(sc, &error)) + return error; + + trace_xrep_rtrmap_found(sc->mp, &rmap); + + rre.offset = xfs_rmap_irec_offset_pack(&rmap); + return xfarray_append(rr->rtrmap_records, &rre); +} + +/* Finding all file and bmbt extents. */ + +/* Context for accumulating rmaps for an inode fork. */ +struct xrep_rtrmap_ifork { + /* + * Accumulate rmap data here to turn multiple adjacent bmaps into a + * single rmap. + */ + struct xfs_rmap_irec accum; + + struct xrep_rtrmap *rr; +}; + +/* Stash an rmap that we accumulated while walking an inode fork. */ +STATIC int +xrep_rtrmap_stash_accumulated( + struct xrep_rtrmap_ifork *rf) +{ + if (rf->accum.rm_blockcount == 0) + return 0; + + return xrep_rtrmap_stash(rf->rr, rf->accum.rm_startblock, + rf->accum.rm_blockcount, rf->accum.rm_owner, + rf->accum.rm_offset, rf->accum.rm_flags); +} + +/* Accumulate a bmbt record. 
*/ +STATIC int +xrep_rtrmap_visit_bmbt( + struct xfs_btree_cur *cur, + struct xfs_bmbt_irec *rec, + void *priv) +{ + struct xrep_rtrmap_ifork *rf = priv; + struct xfs_rmap_irec *accum = &rf->accum; + struct xfs_mount *mp = rf->rr->sc->mp; + xfs_rgnumber_t rgno; + xfs_rgblock_t rgbno; + unsigned int rmap_flags = 0; + int error; + + rgbno = xfs_rtb_to_rgbno(mp, rec->br_startblock, &rgno); + if (rgno != rf->rr->sc->sr.rtg->rtg_rgno) + return 0; + + if (rec->br_state == XFS_EXT_UNWRITTEN) + rmap_flags |= XFS_RMAP_UNWRITTEN; + + /* If this bmap is adjacent to the previous one, just add it. */ + if (accum->rm_blockcount > 0 && + rec->br_startoff == accum->rm_offset + accum->rm_blockcount && + rgbno == accum->rm_startblock + accum->rm_blockcount && + rmap_flags == accum->rm_flags) { + accum->rm_blockcount += rec->br_blockcount; + return 0; + } + + /* Otherwise stash the old rmap and start accumulating a new one. */ + error = xrep_rtrmap_stash_accumulated(rf); + if (error) + return error; + + accum->rm_startblock = rgbno; + accum->rm_blockcount = rec->br_blockcount; + accum->rm_offset = rec->br_startoff; + accum->rm_flags = rmap_flags; + return 0; +} + +/* + * Iterate the block mapping btree to collect rmap records for anything in this + * fork that maps to the rt volume. Sets @mappings_done to true if we've + * scanned the block mappings in this fork. + */ +STATIC int +xrep_rtrmap_scan_bmbt( + struct xrep_rtrmap_ifork *rf, + struct xfs_inode *ip, + bool *mappings_done) +{ + struct xrep_rtrmap *rr = rf->rr; + struct xfs_btree_cur *cur; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + int error = 0; + + *mappings_done = false; + + /* + * If the incore extent cache is already loaded, we'll just use the + * incore extent scanner to record mappings. Don't bother walking the + * ondisk extent tree. + */ + if (!xfs_need_iread_extents(ifp)) + return 0; + + /* Accumulate all the mappings in the bmap btree. 
*/ + cur = xfs_bmbt_init_cursor(rr->sc->mp, rr->sc->tp, ip, XFS_DATA_FORK); + error = xfs_bmap_query_all(cur, xrep_rtrmap_visit_bmbt, rf); + xfs_btree_del_cursor(cur, error); + if (error) + return error; + + /* Stash any remaining accumulated rmaps and exit. */ + *mappings_done = true; + return xrep_rtrmap_stash_accumulated(rf); +} + +/* + * Iterate the in-core extent cache to collect rmap records for anything in + * this fork that maps to this rtgroup. + */ +STATIC int +xrep_rtrmap_scan_iext( + struct xrep_rtrmap_ifork *rf, + struct xfs_ifork *ifp) +{ + struct xfs_bmbt_irec rec; + struct xfs_iext_cursor icur; + int error; + + for_each_xfs_iext(ifp, &icur, &rec) { + if (isnullstartblock(rec.br_startblock)) + continue; + error = xrep_rtrmap_visit_bmbt(NULL, &rec, rf); + if (error) + return error; + } + + return xrep_rtrmap_stash_accumulated(rf); +} + +/* Find all the extents on the realtime device mapped by an inode fork. */ +STATIC int +xrep_rtrmap_scan_dfork( + struct xrep_rtrmap *rr, + struct xfs_inode *ip) +{ + struct xrep_rtrmap_ifork rf = { + .accum = { .rm_owner = ip->i_ino, }, + .rr = rr, + }; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + int error = 0; + + if (ifp->if_format == XFS_DINODE_FMT_BTREE) { + bool mappings_done; + + /* + * Scan the bmbt for mappings. If the incore extent tree is + * loaded, we want to scan the cached mappings since that's + * faster when the extent counts are very high. + */ + error = xrep_rtrmap_scan_bmbt(&rf, ip, &mappings_done); + if (error || mappings_done) + return error; + } else if (ifp->if_format != XFS_DINODE_FMT_EXTENTS) { + /* realtime data forks should only be extents or btree */ + return -EFSCORRUPTED; + } + + /* Scan incore extent cache. */ + return xrep_rtrmap_scan_iext(&rf, ifp); +} + +/* Record reverse mappings for a file. */ +STATIC int +xrep_rtrmap_scan_inode( + struct xrep_rtrmap *rr, + struct xfs_inode *ip) +{ + unsigned int lock_mode; + int error = 0; + + /* Skip the rt rmap btree inode.
*/ + if (rr->sc->ip == ip) + return 0; + + lock_mode = xfs_ilock_data_map_shared(ip); + + /* Check the data fork if it's on the realtime device. */ + if (XFS_IS_REALTIME_INODE(ip)) { + error = xrep_rtrmap_scan_dfork(rr, ip); + if (error) + goto out_unlock; + } + + xchk_iscan_mark_visited(&rr->iscan, ip); +out_unlock: + xfs_iunlock(ip, lock_mode); + return error; +} + +/* Record extents that belong to the realtime rmap inode. */ +STATIC int +xrep_rtrmap_walk_rmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_rtrmap *rr = priv; + struct xfs_mount *mp = cur->bc_mp; + xfs_fsblock_t fsbno; + int error = 0; + + if (xchk_should_terminate(rr->sc, &error)) + return error; + + /* Skip extents which are not owned by this inode and fork. */ + if (rec->rm_owner != rr->sc->ip->i_ino) + return 0; + + error = xrep_check_ino_btree_mapping(rr->sc, rec); + if (error) + return error; + + fsbno = XFS_AGB_TO_FSB(mp, cur->bc_ag.pag->pag_agno, + rec->rm_startblock); + + return xfsb_bitmap_set(&rr->old_rtrmapbt_blocks, fsbno, + rec->rm_blockcount); +} + +/* Scan one AG for reverse mappings for the realtime rmap btree. */ +STATIC int +xrep_rtrmap_scan_ag( + struct xrep_rtrmap *rr, + struct xfs_perag *pag) +{ + struct xfs_scrub *sc = rr->sc; + int error; + + error = xrep_ag_init(sc, pag, &sc->sa); + if (error) + return error; + + error = xfs_rmap_query_all(sc->sa.rmap_cur, xrep_rtrmap_walk_rmap, rr); + xchk_ag_free(sc, &sc->sa); + return error; +} + +STATIC int +xrep_rtrmap_find_super_rmaps( + struct xrep_rtrmap *rr) +{ + struct xfs_scrub *sc = rr->sc; + + /* Create a record for the rtgroup superblock. */ + return xrep_rtrmap_stash(rr, 0, sc->mp->m_sb.sb_rextsize, + XFS_RMAP_OWN_FS, 0, 0); +} + +/* Generate all the reverse-mappings for the realtime device. 
*/ +STATIC int +xrep_rtrmap_find_rmaps( + struct xrep_rtrmap *rr) +{ + struct xfs_scrub *sc = rr->sc; + struct xfs_perag *pag; + struct xfs_inode *ip; + xfs_agnumber_t agno; + int error; + + /* Generate rmaps for the rtgroup superblock */ + error = xrep_rtrmap_find_super_rmaps(rr); + if (error) + return error; + + /* + * Set up for a potentially lengthy filesystem scan by reducing our + * transaction resource usage for the duration. Specifically: + * + * Unlock the realtime metadata inodes and cancel the transaction to + * release the log grant space while we scan the filesystem. + * + * Create a new empty transaction to eliminate the possibility of the + * inode scan deadlocking on cyclical metadata. + * + * We pass the empty transaction to the file scanning function to avoid + * repeatedly cycling empty transactions. This can be done even though + * we take the IOLOCK to quiesce the file because empty transactions + * do not take sb_internal. + */ + xchk_trans_cancel(sc); + xchk_rtgroup_unlock(sc, &sc->sr); + error = xchk_trans_alloc_empty(sc); + if (error) + return error; + + while ((error = xchk_iscan_iter(&rr->iscan, &ip)) == 1) { + error = xrep_rtrmap_scan_inode(rr, ip); + xchk_irele(sc, ip); + if (error) + break; + + if (xchk_should_terminate(sc, &error)) + break; + } + xchk_iscan_iter_finish(&rr->iscan); + if (error) + return error; + + /* + * Switch out for a real transaction and lock the RT metadata in + * preparation for building a new tree. + */ + xchk_trans_cancel(sc); + error = xchk_setup_rt(sc); + if (error) + return error; + error = xchk_rtgroup_drain_and_lock(sc, &sc->sr, XCHK_RTGLOCK_ALL); + if (error) + return error; + + /* Scan for old rtrmap blocks. */ + for_each_perag(sc->mp, agno, pag) { + error = xrep_rtrmap_scan_ag(rr, pag); + if (error) { + xfs_perag_rele(pag); + return error; + } + } + + return 0; +} + +/* Building the new rtrmap btree. */ + +/* Retrieve rtrmapbt data for bulk load. 
*/ +STATIC int +xrep_rtrmap_get_records( + struct xfs_btree_cur *cur, + unsigned int idx, + struct xfs_btree_block *block, + unsigned int nr_wanted, + void *priv) +{ + struct xrep_rtrmap_extent rec; + struct xfs_rmap_irec *irec = &cur->bc_rec.r; + struct xrep_rtrmap *rr = priv; + union xfs_btree_rec *block_rec; + unsigned int loaded; + int error; + + for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { + error = xfarray_load_next(rr->rtrmap_records, &rr->array_cur, + &rec); + if (error) + return error; + + irec->rm_startblock = rec.startblock; + irec->rm_blockcount = rec.blockcount; + irec->rm_owner = rec.owner; + + if (xfs_rmap_irec_offset_unpack(rec.offset, irec) != NULL) + return -EFSCORRUPTED; + + error = xrep_rtrmap_check_mapping(rr->sc, irec); + if (error) + return error; + + block_rec = xfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + } + + return loaded; +} + +/* Feed one of the new btree blocks to the bulk loader. */ +STATIC int +xrep_rtrmap_claim_block( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr, + void *priv) +{ + struct xrep_rtrmap *rr = priv; + + return xrep_newbt_claim_block(cur, &rr->new_btree, ptr); +} + +/* Figure out how much space we need to create the incore btree root block. */ +STATIC size_t +xrep_rtrmap_iroot_size( + struct xfs_btree_cur *cur, + unsigned int level, + unsigned int nr_this_level, + void *priv) +{ + return xfs_rtrmap_broot_space_calc(cur->bc_mp, level, nr_this_level); +} + +/* + * Use the collected rmap information to stage a new rmap btree. If this is + * successful we'll return with the new btree root information logged to the + * repair transaction but not yet committed.
+ */ +STATIC int +xrep_rtrmap_build_new_tree( + struct xrep_rtrmap *rr) +{ + struct xfs_scrub *sc = rr->sc; + struct xfs_rtgroup *rtg = sc->sr.rtg; + struct xfs_btree_cur *rmap_cur; + uint64_t nr_records; + int error; + + /* + * Prepare to construct the new btree by reserving disk space for the + * new btree and setting up all the accounting information we'll need + * to root the new btree while it's under construction and before we + * attach it to the realtime rmapbt inode. + */ + error = xrep_newbt_init_metadir_inode(&rr->new_btree, sc); + if (error) + return error; + + rr->new_btree.bload.get_records = xrep_rtrmap_get_records; + rr->new_btree.bload.claim_block = xrep_rtrmap_claim_block; + rr->new_btree.bload.iroot_size = xrep_rtrmap_iroot_size; + + rmap_cur = xfs_rtrmapbt_stage_cursor(sc->mp, rtg, rtg->rtg_rmapip, + &rr->new_btree.ifake); + + nr_records = xfarray_length(rr->rtrmap_records); + + /* Compute how many blocks we'll need for the rmaps collected. */ + error = xfs_btree_bload_compute_geometry(rmap_cur, + &rr->new_btree.bload, nr_records); + if (error) + goto err_cur; + + /* Last chance to abort before we start committing fixes. */ + if (xchk_should_terminate(sc, &error)) + goto err_cur; + + /* + * Guess how many blocks we're going to need to rebuild an entire + * rtrmapbt from the number of extents we found, and pump up our + * transaction to have sufficient block reservation. We're allowed + * to exceed quota to repair inconsistent metadata, though this is + * unlikely. + */ + error = xfs_trans_reserve_more_inode(sc->tp, rtg->rtg_rmapip, + rr->new_btree.bload.nr_blocks, 0, true); + if (error) + goto err_cur; + + /* Reserve the space we'll need for the new btree. */ + error = xrep_newbt_alloc_blocks(&rr->new_btree, + rr->new_btree.bload.nr_blocks); + if (error) + goto err_cur; + + /* Add all observed rmap records. 
*/ + rr->new_btree.ifake.if_fork->if_format = XFS_DINODE_FMT_RMAP; + rr->array_cur = XFARRAY_CURSOR_INIT; + error = xfs_btree_bload(rmap_cur, &rr->new_btree.bload, rr); + if (error) + goto err_cur; + + /* + * Install the new rtrmap btree in the inode. After this point the old + * btree is no longer accessible, the new tree is live, and we can + * delete the cursor. + */ + xfs_rtrmapbt_commit_staged_btree(rmap_cur, sc->tp); + xrep_inode_set_nblocks(rr->sc, rr->new_btree.ifake.if_blocks); + xfs_btree_del_cursor(rmap_cur, 0); + + /* Dispose of any unused blocks and the accounting information. */ + error = xrep_newbt_commit(&rr->new_btree); + if (error) + return error; + + return xrep_roll_trans(sc); + +err_cur: + xfs_btree_del_cursor(rmap_cur, error); + xrep_newbt_cancel(&rr->new_btree); + return error; +} + +/* Reaping the old btree. */ + +/* Reap the old rtrmapbt blocks. */ +STATIC int +xrep_rtrmap_remove_old_tree( + struct xrep_rtrmap *rr) +{ + int error; + + /* + * Free all the extents that were allocated to the former rtrmapbt and + * aren't cross-linked with something else. + */ + error = xrep_reap_metadir_fsblocks(rr->sc, &rr->old_rtrmapbt_blocks); + if (error) + return error; + + /* + * Ensure the proper reservation for the rtrmap inode so that we don't + * fail to expand the new btree. + */ + return xrep_reset_imeta_reservation(rr->sc); +} + +/* Set up the filesystem scan components. */ +STATIC int +xrep_rtrmap_setup_scan( + struct xrep_rtrmap *rr) +{ + struct xfs_scrub *sc = rr->sc; + char *descr; + int error; + + xfsb_bitmap_init(&rr->old_rtrmapbt_blocks); + + /* Set up some storage */ + descr = xchk_xfile_rtgroup_descr(sc, "reverse mapping records"); + error = xfarray_create(descr, 0, sizeof(struct xrep_rtrmap_extent), + &rr->rtrmap_records); + kfree(descr); + if (error) + goto out_bitmap; + + /* Retry iget every tenth of a second for up to 30 seconds. 
*/ + xchk_iscan_start(sc, 30000, 100, &rr->iscan); + return 0; + +out_bitmap: + xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); + return error; +} + +/* Tear down scan components. */ +STATIC void +xrep_rtrmap_teardown( + struct xrep_rtrmap *rr) +{ + xchk_iscan_teardown(&rr->iscan); + xfarray_destroy(rr->rtrmap_records); + xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); +} + +/* Repair the realtime rmap btree. */ +int +xrep_rtrmapbt( + struct xfs_scrub *sc) +{ + struct xrep_rtrmap *rr = sc->buf; + int error; + + /* Functionality is not yet complete. */ + return xrep_notsupported(sc); + + /* Make sure any problems with the fork are fixed. */ + error = xrep_metadata_inode_forks(sc); + if (error) + return error; + + error = xrep_rtrmap_setup_scan(rr); + if (error) + return error; + + /* Collect rmaps for realtime files. */ + error = xrep_rtrmap_find_rmaps(rr); + if (error) + goto out_records; + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + /* Rebuild the rtrmap information. */ + error = xrep_rtrmap_build_new_tree(rr); + if (error) + goto out_records; + + /* Kill the old tree. 
*/ + error = xrep_rtrmap_remove_old_tree(rr); + if (error) + goto out_records; + +out_records: + xrep_rtrmap_teardown(rr); + return error; +} diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 435003e5a1e92..8193ad6702b4d 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -479,7 +479,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .setup = xchk_setup_rtrmapbt, .scrub = xchk_rtrmapbt, .has = xfs_has_rtrmapbt, - .repair = xrep_notsupported, + .repair = xrep_rtrmapbt, }, }; diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index c90324ca86579..95fdb82660dc3 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -2314,6 +2314,32 @@ TRACE_EVENT(xrep_calc_ag_resblks_btsize, __entry->rmapbt_sz, __entry->refcbt_sz) ) + +#ifdef CONFIG_XFS_RT +TRACE_EVENT(xrep_calc_rtgroup_resblks_btsize, + TP_PROTO(struct xfs_mount *mp, xfs_rgnumber_t rgno, + xfs_rgblock_t usedlen, xfs_rgblock_t rmapbt_sz), + TP_ARGS(mp, rgno, usedlen, rmapbt_sz), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_rgnumber_t, rgno) + __field(xfs_rgblock_t, usedlen) + __field(xfs_rgblock_t, rmapbt_sz) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rgno = rgno; + __entry->usedlen = usedlen; + __entry->rmapbt_sz = rmapbt_sz; + ), + TP_printk("dev %d:%d rgno 0x%x usedlen %u rmapbt %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->rgno, + __entry->usedlen, + __entry->rmapbt_sz) +); +#endif /* CONFIG_XFS_RT */ + TRACE_EVENT(xrep_reset_counters, TP_PROTO(struct xfs_mount *mp, struct xchk_fscounters *fsc), TP_ARGS(mp, fsc), @@ -3899,6 +3925,37 @@ TRACE_EVENT(xrep_rgbitmap_load_word, (__entry->ondisk_word & __entry->word_mask), __entry->word_mask) ); + +TRACE_EVENT(xrep_rtrmap_found, + TP_PROTO(struct xfs_mount *mp, const struct xfs_rmap_irec *rec), + TP_ARGS(mp, rec), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgblock_t, rgbno) + __field(xfs_extlen_t, len) + __field(uint64_t, owner) + 
__field(uint64_t, offset) + __field(unsigned int, flags) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->rgbno = rec->rm_startblock; + __entry->len = rec->rm_blockcount; + __entry->owner = rec->rm_owner; + __entry->offset = rec->rm_offset; + __entry->flags = rec->rm_flags; + ), + TP_printk("dev %d:%d rtdev %d:%d rgbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgbno, + __entry->len, + __entry->owner, + __entry->offset, + __entry->flags) +); #endif /* CONFIG_XFS_RT */ #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */
From patchwork Sun Dec 31 21:41:26 2023 X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13507701 Date: Sun, 31 Dec 2023 13:41:26 -0800 Subject: [PATCH 37/39] xfs: create a shadow rmap btree during realtime rmap repair From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850496.1764998.9469664524257812727.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> From: Darrick J. Wong Create an in-memory btree of rmap records instead of an array. This enables us to do live record collection instead of freezing the fs. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_btree.c | 2 fs/xfs/libxfs/xfs_btree.h | 1 fs/xfs/libxfs/xfs_rmap.c | 5 + fs/xfs/libxfs/xfs_rtrmap_btree.c | 123 ++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 9 ++ fs/xfs/scrub/rtrmap_repair.c | 158 +++++++++++++++++++++++++++----------- fs/xfs/scrub/xfbtree.c | 3 + 7 files changed, 255 insertions(+), 46 deletions(-) diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index a294641d91832..2a181cf30299f 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -491,6 +491,8 @@ xfs_btree_del_cursor( if (cur->bc_flags & XFS_BTREE_IN_XFILE) { if (cur->bc_mem.pag) xfs_perag_put(cur->bc_mem.pag); + if (cur->bc_mem.rtg) + xfs_rtgroup_put(cur->bc_mem.rtg); } kmem_cache_free(cur->bc_cache, cur); } diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index 3559cf5d3a653..4753a5c847616 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -269,6 +269,7 @@ struct xfs_btree_cur_mem { struct xfbtree *xfbtree; struct xfs_buf *head_bp; struct xfs_perag *pag; + struct xfs_rtgroup *rtg; }; struct xfs_btree_level { diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 8766805ed1343..d100e03f9560f
100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -329,8 +329,11 @@ xfs_rmap_check_btrec( struct xfs_btree_cur *cur, const struct xfs_rmap_irec *irec) { - if (cur->bc_btnum == XFS_BTNUM_RTRMAP) + if (cur->bc_btnum == XFS_BTNUM_RTRMAP) { + if (cur->bc_flags & XFS_BTREE_IN_XFILE) + return xfs_rtrmap_check_irec(cur->bc_mem.rtg, irec); return xfs_rtrmap_check_irec(cur->bc_ino.rtg, irec); + } if (cur->bc_flags & XFS_BTREE_IN_XFILE) return xfs_rmap_check_irec(cur->bc_mem.pag, irec); diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 51a8ffff1c755..3084153af3a43 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -28,6 +28,9 @@ #include "xfs_rtgroup.h" #include "xfs_bmap.h" #include "xfs_health.h" +#include "scrub/xfile.h" +#include "scrub/xfbtree.h" +#include "xfs_btree_mem.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -557,6 +560,126 @@ xfs_rtrmapbt_stage_cursor( return cur; } +#ifdef CONFIG_XFS_BTREE_IN_XFILE +/* + * Validate an in-memory realtime rmap btree block. Callers are allowed to + * generate an in-memory btree even if the ondisk feature is not enabled. 
+ */ +static xfs_failaddr_t +xfs_rtrmapbt_mem_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_mount; + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + xfs_failaddr_t fa; + unsigned int level; + + if (!xfs_verify_magic(bp, block->bb_magic)) + return __this_address; + + fa = xfs_btree_lblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN); + if (fa) + return fa; + + level = be16_to_cpu(block->bb_level); + if (xfs_has_rmapbt(mp)) { + if (level >= mp->m_rtrmap_maxlevels) + return __this_address; + } else { + if (level >= xfs_rtrmapbt_maxlevels_ondisk()) + return __this_address; + } + + return xfbtree_lblock_verify(bp, + xfs_rtrmapbt_maxrecs(mp, xfo_to_b(1), level == 0)); +} + +static void +xfs_rtrmapbt_mem_rw_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa = xfs_rtrmapbt_mem_verify(bp); + + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); +} + +/* skip crc checks on in-memory btrees to save time */ +static const struct xfs_buf_ops xfs_rtrmapbt_mem_buf_ops = { + .name = "xfs_rtrmapbt_mem", + .magic = { 0, cpu_to_be32(XFS_RTRMAP_CRC_MAGIC) }, + .verify_read = xfs_rtrmapbt_mem_rw_verify, + .verify_write = xfs_rtrmapbt_mem_rw_verify, + .verify_struct = xfs_rtrmapbt_mem_verify, +}; + +static const struct xfs_btree_ops xfs_rtrmapbt_mem_ops = { + .rec_len = sizeof(struct xfs_rmap_rec), + .key_len = 2 * sizeof(struct xfs_rmap_key), + .lru_refs = XFS_RMAP_BTREE_REF, + .geom_flags = XFS_BTREE_CRC_BLOCKS | XFS_BTREE_OVERLAPPING | + XFS_BTREE_LONG_PTRS | XFS_BTREE_IN_XFILE, + + .dup_cursor = xfbtree_dup_cursor, + .set_root = xfbtree_set_root, + .alloc_block = xfbtree_alloc_block, + .free_block = xfbtree_free_block, + .get_minrecs = xfbtree_get_minrecs, + .get_maxrecs = xfbtree_get_maxrecs, + .init_key_from_rec = xfs_rtrmapbt_init_key_from_rec, + .init_high_key_from_rec = xfs_rtrmapbt_init_high_key_from_rec, + .init_rec_from_cur = xfs_rtrmapbt_init_rec_from_cur, + .init_ptr_from_cur = xfbtree_init_ptr_from_cur, + .key_diff = xfs_rtrmapbt_key_diff, + .buf_ops 
= &xfs_rtrmapbt_mem_buf_ops, + .diff_two_keys = xfs_rtrmapbt_diff_two_keys, + .keys_inorder = xfs_rtrmapbt_keys_inorder, + .recs_inorder = xfs_rtrmapbt_recs_inorder, + .keys_contiguous = xfs_rtrmapbt_keys_contiguous, +}; + +/* Create a cursor for an in-memory btree. */ +struct xfs_btree_cur * +xfs_rtrmapbt_mem_cursor( + struct xfs_rtgroup *rtg, + struct xfs_trans *tp, + struct xfs_buf *head_bp, + struct xfbtree *xfbtree) +{ + struct xfs_btree_cur *cur; + struct xfs_mount *mp = rtg->rtg_mount; + + /* Overlapping btree; 2 keys per pointer. */ + cur = xfs_btree_alloc_cursor(mp, tp, XFS_BTNUM_RTRMAP, + &xfs_rtrmapbt_mem_ops, mp->m_rtrmap_maxlevels, + xfs_rtrmapbt_cur_cache); + cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_rmap_2); + cur->bc_mem.xfbtree = xfbtree; + cur->bc_mem.head_bp = head_bp; + cur->bc_nlevels = xfs_btree_mem_head_nlevels(head_bp); + + cur->bc_mem.rtg = xfs_rtgroup_hold(rtg); + return cur; +} + +int +xfs_rtrmapbt_mem_create( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + struct xfs_buftarg *target, + struct xfbtree **xfbtreep) +{ + struct xfbtree_config cfg = { + .btree_ops = &xfs_rtrmapbt_mem_ops, + .target = target, + .flags = XFBTREE_DIRECT_MAP, + .owner = rgno, + }; + + return xfbtree_create(mp, &cfg, xfbtreep); +} +#endif /* CONFIG_XFS_BTREE_IN_XFILE */ + /* * Install a new rt reverse mapping btree root. Caller is responsible for * invalidating and freeing the old btree blocks. 
diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 5aec719be053f..b0a8e8d89f9eb 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -205,4 +205,13 @@ int xfs_rtrmapbt_create(struct xfs_imeta_update *upd, struct xfs_inode **ipp); unsigned long long xfs_rtrmapbt_calc_size(struct xfs_mount *mp, unsigned long long len); +#ifdef CONFIG_XFS_BTREE_IN_XFILE +struct xfbtree; +struct xfs_btree_cur *xfs_rtrmapbt_mem_cursor(struct xfs_rtgroup *rtg, + struct xfs_trans *tp, struct xfs_buf *mhead_bp, + struct xfbtree *xfbtree); +int xfs_rtrmapbt_mem_create(struct xfs_mount *mp, xfs_rgnumber_t rgno, + struct xfs_buftarg *target, struct xfbtree **xfbtreep); +#endif /* CONFIG_XFS_BTREE_IN_XFILE */ + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c index c56558ce919ab..00e606dfc6842 100644 --- a/fs/xfs/scrub/rtrmap_repair.c +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -12,6 +12,7 @@ #include "xfs_defer.h" #include "xfs_btree.h" #include "xfs_btree_staging.h" +#include "xfs_btree_mem.h" #include "xfs_bit.h" #include "xfs_log_format.h" #include "xfs_trans.h" @@ -41,6 +42,7 @@ #include "scrub/iscan.h" #include "scrub/newbt.h" #include "scrub/reap.h" +#include "scrub/xfbtree.h" /* * Realtime Reverse Mapping Btree Repair @@ -64,24 +66,13 @@ * We use the 'xrep_rtrmap' prefix for all the rmap functions. */ -/* - * Packed rmap record. The UNWRITTEN flags are hidden in the upper bits of - * offset, just like the on-disk record. 
- */ -struct xrep_rtrmap_extent { - xfs_rgblock_t startblock; - xfs_extlen_t blockcount; - uint64_t owner; - uint64_t offset; -} __packed; - /* Context for collecting rmaps */ struct xrep_rtrmap { /* new rtrmapbt information */ struct xrep_newbt new_btree; /* rmap records generated from primary metadata */ - struct xfarray *rtrmap_records; + struct xfbtree *rtrmap_btree; struct xfs_scrub *sc; @@ -91,8 +82,11 @@ struct xrep_rtrmap { /* inode scan cursor */ struct xchk_iscan iscan; - /* get_records()'s position in the free space record array. */ - xfarray_idx_t array_cur; + /* in-memory btree cursor for the ->get_records walk */ + struct xfs_btree_cur *mcur; + + /* Number of records we're staging in the new btree. */ + uint64_t nr_records; }; /* Set us up to repair rt reverse mapping btrees. */ @@ -101,6 +95,14 @@ xrep_setup_rtrmapbt( struct xfs_scrub *sc) { struct xrep_rtrmap *rr; + char *descr; + int error; + + descr = xchk_xfile_rtgroup_descr(sc, "reverse mapping records"); + error = xrep_setup_buftarg(sc, descr); + kfree(descr); + if (error) + return error; rr = kzalloc(sizeof(struct xrep_rtrmap), XCHK_GFP_FLAGS); if (!rr) @@ -138,11 +140,6 @@ xrep_rtrmap_stash( uint64_t offset, unsigned int flags) { - struct xrep_rtrmap_extent rre = { - .startblock = startblock, - .blockcount = blockcount, - .owner = owner, - }; struct xfs_rmap_irec rmap = { .rm_startblock = startblock, .rm_blockcount = blockcount, @@ -151,6 +148,8 @@ xrep_rtrmap_stash( .rm_flags = flags, }; struct xfs_scrub *sc = rr->sc; + struct xfs_btree_cur *mcur; + struct xfs_buf *mhead_bp; int error = 0; if (xchk_should_terminate(sc, &error)) @@ -158,8 +157,23 @@ xrep_rtrmap_stash( trace_xrep_rtrmap_found(sc->mp, &rmap); - rre.offset = xfs_rmap_irec_offset_pack(&rmap); - return xfarray_append(rr->rtrmap_records, &rre); + /* Add entry to in-memory btree.
*/ + error = xfbtree_head_read_buf(rr->rtrmap_btree, sc->tp, &mhead_bp); + if (error) + return error; + + mcur = xfs_rtrmapbt_mem_cursor(sc->sr.rtg, sc->tp, mhead_bp, + rr->rtrmap_btree); + error = xfs_rmap_map_raw(mcur, &rmap); + xfs_btree_del_cursor(mcur, error); + if (error) + goto out_cancel; + + return xfbtree_trans_commit(rr->rtrmap_btree, sc->tp); + +out_cancel: + xfbtree_trans_cancel(rr->rtrmap_btree, sc->tp); + return error; } /* Finding all file and bmbt extents. */ @@ -402,6 +416,24 @@ xrep_rtrmap_scan_ag( return error; } +/* Count and check all collected records. */ +STATIC int +xrep_rtrmap_check_record( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_rtrmap *rr = priv; + int error; + + error = xrep_rtrmap_check_mapping(rr->sc, rec); + if (error) + return error; + + rr->nr_records++; + return 0; +} + STATIC int xrep_rtrmap_find_super_rmaps( struct xrep_rtrmap *rr) @@ -421,6 +453,8 @@ xrep_rtrmap_find_rmaps( struct xfs_scrub *sc = rr->sc; struct xfs_perag *pag; struct xfs_inode *ip; + struct xfs_buf *mhead_bp; + struct xfs_btree_cur *mcur; xfs_agnumber_t agno; int error; @@ -484,7 +518,25 @@ xrep_rtrmap_find_rmaps( } } - return 0; + /* + * Now that we have everything locked again, we need to count the + * number of rmap records stashed in the btree. This should reflect + * all actively-owned rt files in the filesystem. At the same time, + * check all our records before we start building a new btree, which + * requires the rtbitmap lock. + */ + error = xfbtree_head_read_buf(rr->rtrmap_btree, NULL, &mhead_bp); + if (error) + return error; + + mcur = xfs_rtrmapbt_mem_cursor(rr->sc->sr.rtg, NULL, mhead_bp, + rr->rtrmap_btree); + rr->nr_records = 0; + error = xfs_rmap_query_all(mcur, xrep_rtrmap_check_record, rr); + xfs_btree_del_cursor(mcur, error); + xfs_buf_relse(mhead_bp); + + return error; } /* Building the new rtrmap btree. 
*/ @@ -498,29 +550,25 @@ xrep_rtrmap_get_records( unsigned int nr_wanted, void *priv) { - struct xrep_rtrmap_extent rec; - struct xfs_rmap_irec *irec = &cur->bc_rec.r; struct xrep_rtrmap *rr = priv; union xfs_btree_rec *block_rec; unsigned int loaded; int error; for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { - error = xfarray_load_next(rr->rtrmap_records, &rr->array_cur, - &rec); + int stat = 0; + + error = xfs_btree_increment(rr->mcur, 0, &stat); if (error) return error; - - irec->rm_startblock = rec.startblock; - irec->rm_blockcount = rec.blockcount; - irec->rm_owner = rec.owner; - - if (xfs_rmap_irec_offset_unpack(rec.offset, irec) != NULL) + if (!stat) return -EFSCORRUPTED; - error = xrep_rtrmap_check_mapping(rr->sc, irec); + error = xfs_rmap_get_rec(rr->mcur, &cur->bc_rec.r, &stat); if (error) return error; + if (!stat) + return -EFSCORRUPTED; block_rec = xfs_btree_rec_addr(cur, idx, block); cur->bc_ops->init_rec_from_cur(cur, block_rec); @@ -565,7 +613,7 @@ xrep_rtrmap_build_new_tree( struct xfs_scrub *sc = rr->sc; struct xfs_rtgroup *rtg = sc->sr.rtg; struct xfs_btree_cur *rmap_cur; - uint64_t nr_records; + struct xfs_buf *mhead_bp; int error; /* @@ -585,11 +633,9 @@ xrep_rtrmap_build_new_tree( rmap_cur = xfs_rtrmapbt_stage_cursor(sc->mp, rtg, rtg->rtg_rmapip, &rr->new_btree.ifake); - nr_records = xfarray_length(rr->rtrmap_records); - /* Compute how many blocks we'll need for the rmaps collected. */ error = xfs_btree_bload_compute_geometry(rmap_cur, - &rr->new_btree.bload, nr_records); + &rr->new_btree.bload, rr->nr_records); if (error) goto err_cur; @@ -615,12 +661,25 @@ xrep_rtrmap_build_new_tree( if (error) goto err_cur; + /* + * Create a cursor to the in-memory btree so that we can bulk load the + * new btree. 
+ */ + error = xfbtree_head_read_buf(rr->rtrmap_btree, NULL, &mhead_bp); + if (error) + goto err_cur; + + rr->mcur = xfs_rtrmapbt_mem_cursor(sc->sr.rtg, NULL, mhead_bp, + rr->rtrmap_btree); + error = xfs_btree_goto_left_edge(rr->mcur); + if (error) + goto err_mcur; + /* Add all observed rmap records. */ rr->new_btree.ifake.if_fork->if_format = XFS_DINODE_FMT_RMAP; - rr->array_cur = XFARRAY_CURSOR_INIT; error = xfs_btree_bload(rmap_cur, &rr->new_btree.bload, rr); if (error) - goto err_cur; + goto err_mcur; /* * Install the new rtrmap btree in the inode. After this point the old @@ -630,6 +689,15 @@ xrep_rtrmap_build_new_tree( xfs_rtrmapbt_commit_staged_btree(rmap_cur, sc->tp); xrep_inode_set_nblocks(rr->sc, rr->new_btree.ifake.if_blocks); xfs_btree_del_cursor(rmap_cur, 0); + xfs_btree_del_cursor(rr->mcur, 0); + rr->mcur = NULL; + xfs_buf_relse(mhead_bp); + + /* + * Now that we've written the new btree to disk, we don't need to keep + * updating the in-memory btree. Abort the scan to stop live updates. + */ + xchk_iscan_abort(&rr->iscan); /* Dispose of any unused blocks and the accounting information. 
*/ error = xrep_newbt_commit(&rr->new_btree); @@ -638,6 +706,9 @@ xrep_rtrmap_build_new_tree( return xrep_roll_trans(sc); +err_mcur: + xfs_btree_del_cursor(rr->mcur, error); + xfs_buf_relse(mhead_bp); err_cur: xfs_btree_del_cursor(rmap_cur, error); xrep_newbt_cancel(&rr->new_btree); @@ -674,16 +745,13 @@ xrep_rtrmap_setup_scan( struct xrep_rtrmap *rr) { struct xfs_scrub *sc = rr->sc; - char *descr; int error; xfsb_bitmap_init(&rr->old_rtrmapbt_blocks); /* Set up some storage */ - descr = xchk_xfile_rtgroup_descr(sc, "reverse mapping records"); - error = xfarray_create(descr, 0, sizeof(struct xrep_rtrmap_extent), - &rr->rtrmap_records); - kfree(descr); + error = xfs_rtrmapbt_mem_create(sc->mp, sc->sr.rtg->rtg_rgno, + sc->xfile_buftarg, &rr->rtrmap_btree); if (error) goto out_bitmap; @@ -702,7 +770,7 @@ xrep_rtrmap_teardown( struct xrep_rtrmap *rr) { xchk_iscan_teardown(&rr->iscan); - xfarray_destroy(rr->rtrmap_records); + xfbtree_destroy(rr->rtrmap_btree); xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); } diff --git a/fs/xfs/scrub/xfbtree.c b/fs/xfs/scrub/xfbtree.c index 9e557d87d1c9c..7c035ad1f696a 100644 --- a/fs/xfs/scrub/xfbtree.c +++ b/fs/xfs/scrub/xfbtree.c @@ -17,6 +17,7 @@ #include "xfs_error.h" #include "xfs_btree_mem.h" #include "xfs_ag.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/xfile.h" #include "scrub/xfbtree.h" @@ -298,6 +299,8 @@ xfbtree_dup_cursor( if (cur->bc_mem.pag) ncur->bc_mem.pag = xfs_perag_hold(cur->bc_mem.pag); + if (cur->bc_mem.rtg) + ncur->bc_mem.rtg = xfs_rtgroup_hold(cur->bc_mem.rtg); return ncur; } From patchwork Sun Dec 31 21:41:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507702 Date: Sun, 31 Dec 2023 13:41:42 -0800 Subject: [PATCH 38/39] xfs: hook live realtime rmap operations during a repair operation From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850513.1764998.8060614318675586540.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> From: Darrick J. Wong Hook the regular realtime rmap code when an rtrmapbt repair operation is running so that we can drop the rtrmap inode's ILOCK to scan the filesystem and keep the in-memory btree up to date during the scan. Signed-off-by: Darrick J.
Wong --- fs/xfs/libxfs/xfs_rmap.c | 56 ++++++++++++++++- fs/xfs/libxfs/xfs_rmap.h | 6 ++ fs/xfs/libxfs/xfs_rtgroup.c | 1 fs/xfs/libxfs/xfs_rtgroup.h | 3 + fs/xfs/scrub/rtrmap_repair.c | 139 ++++++++++++++++++++++++++++++++++++++++-- fs/xfs/scrub/trace.h | 36 +++++++++++ 6 files changed, 233 insertions(+), 8 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index d100e03f9560f..43108a195004c 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -920,8 +920,7 @@ xfs_rmap_update_hook( .oinfo = *oinfo, /* struct copy */ }; - if (pag) - xfs_hooks_call(&pag->pag_rmap_update_hooks, op, &p); + xfs_hooks_call(&pag->pag_rmap_update_hooks, op, &p); } } @@ -946,6 +945,50 @@ xfs_rmap_hook_del( # define xfs_rmap_update_hook(t, p, o, s, b, u, oi) do { } while (0) #endif /* CONFIG_XFS_LIVE_HOOKS */ +# if defined(CONFIG_XFS_LIVE_HOOKS) && defined(CONFIG_XFS_RT) +static inline void +xfs_rtrmap_update_hook( + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + enum xfs_rmap_intent_type op, + xfs_rgblock_t startblock, + xfs_extlen_t blockcount, + bool unwritten, + const struct xfs_owner_info *oinfo) +{ + if (xfs_hooks_switched_on(&xfs_rmap_hooks_switch)) { + struct xfs_rmap_update_params p = { + .startblock = startblock, + .blockcount = blockcount, + .unwritten = unwritten, + .oinfo = *oinfo, /* struct copy */ + }; + + xfs_hooks_call(&rtg->rtg_rmap_update_hooks, op, &p); + } +} + +/* Call the specified function during a rt reverse mapping update. */ +int +xfs_rtrmap_hook_add( + struct xfs_rtgroup *rtg, + struct xfs_rmap_hook *hook) +{ + return xfs_hooks_add(&rtg->rtg_rmap_update_hooks, &hook->update_hook); +} + +/* Stop calling the specified function during a rt reverse mapping update. 
*/ +void +xfs_rtrmap_hook_del( + struct xfs_rtgroup *rtg, + struct xfs_rmap_hook *hook) +{ + xfs_hooks_del(&rtg->rtg_rmap_update_hooks, &hook->update_hook); +} +#else +# define xfs_rtrmap_update_hook(t, r, o, s, b, u, oi) do { } while (0) +#endif /* CONFIG_XFS_LIVE_HOOKS && CONFIG_XFS_RT */ + /* * Remove a reference to an extent in the rmap btree. */ @@ -2702,6 +2745,7 @@ xfs_rtrmap_finish_one( xfs_rgnumber_t rgno; xfs_rgblock_t bno; bool unwritten; + int error; trace_xfs_rmap_deferred(mp, ri); @@ -2727,8 +2771,14 @@ xfs_rtrmap_finish_one( unwritten = ri->ri_bmap.br_state == XFS_EXT_UNWRITTEN; bno = xfs_rtb_to_rgbno(mp, ri->ri_bmap.br_startblock, &rgno); - return __xfs_rmap_finish_intent(rcur, ri->ri_type, bno, + error = __xfs_rmap_finish_intent(rcur, ri->ri_type, bno, ri->ri_bmap.br_blockcount, &oinfo, unwritten); + if (error) + return error; + + xfs_rtrmap_update_hook(tp, ri->ri_rtg, ri->ri_type, bno, + ri->ri_bmap.br_blockcount, unwritten, &oinfo); + return 0; } /* diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h index 3719fc4cbc26b..9e19e657eeff6 100644 --- a/fs/xfs/libxfs/xfs_rmap.h +++ b/fs/xfs/libxfs/xfs_rmap.h @@ -275,6 +275,12 @@ void xfs_rmap_hook_enable(void); int xfs_rmap_hook_add(struct xfs_perag *pag, struct xfs_rmap_hook *hook); void xfs_rmap_hook_del(struct xfs_perag *pag, struct xfs_rmap_hook *hook); + +# ifdef CONFIG_XFS_RT +int xfs_rtrmap_hook_add(struct xfs_rtgroup *rtg, struct xfs_rmap_hook *hook); +void xfs_rtrmap_hook_del(struct xfs_rtgroup *rtg, struct xfs_rmap_hook *hook); +# endif /* CONFIG_XFS_RT */ + #endif #endif /* __XFS_RMAP_H__ */ diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index 7b031f4917349..8bc97c9aa4c9c 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -163,6 +163,7 @@ xfs_initialize_rtgroups( spin_lock_init(&rtg->rtg_state_lock); init_waitqueue_head(&rtg->rtg_active_wq); xfs_defer_drain_init(&rtg->rtg_intents_drain); + 
xfs_hooks_init(&rtg->rtg_rmap_update_hooks); #endif /* __KERNEL__ */ /* Active ref owned by mount indicates rtgroup is online. */ diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 9487c2e00478b..3522527e553b8 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -48,6 +48,9 @@ struct xfs_rtgroup { * inconsistencies. */ struct xfs_defer_drain rtg_intents_drain; + + /* Hook to feed rt rmapbt updates to an active online repair. */ + struct xfs_hooks rtg_rmap_update_hooks; #endif /* __KERNEL__ */ }; diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c index 00e606dfc6842..42df1cf45ae0b 100644 --- a/fs/xfs/scrub/rtrmap_repair.c +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -71,6 +71,9 @@ struct xrep_rtrmap { /* new rtrmapbt information */ struct xrep_newbt new_btree; + /* lock for the xfbtree and xfile */ + struct mutex lock; + /* rmap records generated from primary metadata */ struct xfbtree *rtrmap_btree; @@ -79,6 +82,9 @@ struct xrep_rtrmap { /* bitmap of old rtrmapbt blocks */ struct xfsb_bitmap old_rtrmapbt_blocks; + /* Hooks into rtrmap update code. */ + struct xfs_rmap_hook hooks; + /* inode scan cursor */ struct xchk_iscan iscan; @@ -98,6 +104,8 @@ xrep_setup_rtrmapbt( char *descr; int error; + xchk_fsgates_enable(sc, XCHK_FSGATES_RMAP); + descr = xchk_xfile_rtgroup_descr(sc, "reverse mapping records"); error = xrep_setup_buftarg(sc, descr); kfree(descr); @@ -155,12 +163,16 @@ xrep_rtrmap_stash( if (xchk_should_terminate(sc, &error)) return error; + if (xchk_iscan_aborted(&rr->iscan)) + return -EFSCORRUPTED; + trace_xrep_rtrmap_found(sc->mp, &rmap); /* Add entry to in-memory btree. 
*/ + mutex_lock(&rr->lock); error = xfbtree_head_read_buf(rr->rtrmap_btree, sc->tp, &mhead_bp); if (error) - return error; + goto out_abort; mcur = xfs_rtrmapbt_mem_cursor(sc->sr.rtg, sc->tp, mhead_bp, rr->rtrmap_btree); @@ -169,10 +181,18 @@ xrep_rtrmap_stash( if (error) goto out_cancel; - return xfbtree_trans_commit(rr->rtrmap_btree, sc->tp); + error = xfbtree_trans_commit(rr->rtrmap_btree, sc->tp); + if (error) + goto out_abort; + + mutex_unlock(&rr->lock); + return 0; out_cancel: xfbtree_trans_cancel(rr->rtrmap_btree, sc->tp); +out_abort: + xchk_iscan_abort(&rr->iscan); + mutex_unlock(&rr->lock); return error; } @@ -509,6 +529,13 @@ xrep_rtrmap_find_rmaps( if (error) return error; + /* + * If a hook failed to update the in-memory btree, we lack the data to + * continue the repair. + */ + if (xchk_iscan_aborted(&rr->iscan)) + return -EFSCORRUPTED; + /* Scan for old rtrmap blocks. */ for_each_perag(sc->mp, agno, pag) { error = xrep_rtrmap_scan_ag(rr, pag); @@ -739,6 +766,89 @@ xrep_rtrmap_remove_old_tree( return xrep_reset_imeta_reservation(rr->sc); } +static inline bool +xrep_rtrmapbt_want_live_update( + struct xchk_iscan *iscan, + const struct xfs_owner_info *oi) +{ + if (xchk_iscan_aborted(iscan)) + return false; + + /* + * We scanned the CoW staging extents before we started the iscan, so + * we need all the updates. + */ + if (XFS_RMAP_NON_INODE_OWNER(oi->oi_owner)) + return true; + + /* Ignore updates to files that the scanner hasn't visited yet. */ + return xchk_iscan_want_live_update(iscan, oi->oi_owner); +} + +/* + * Apply a rtrmapbt update from the regular filesystem into our shadow btree. + * We're running from the thread that owns the rtrmap ILOCK and is generating + * the update, so we must be careful about which parts of the struct + * xrep_rtrmap that we change. 
+ */ +static int +xrep_rtrmapbt_live_update( + struct notifier_block *nb, + unsigned long action, + void *data) +{ + struct xfs_rmap_update_params *p = data; + struct xrep_rtrmap *rr; + struct xfs_mount *mp; + struct xfs_btree_cur *mcur; + struct xfs_buf *mhead_bp; + struct xfs_trans *tp; + void *txcookie; + int error; + + rr = container_of(nb, struct xrep_rtrmap, hooks.update_hook.nb); + mp = rr->sc->mp; + + if (!xrep_rtrmapbt_want_live_update(&rr->iscan, &p->oinfo)) + goto out_unlock; + + trace_xrep_rtrmap_live_update(mp, rr->sc->sr.rtg->rtg_rgno, action, p); + + error = xrep_trans_alloc_hook_dummy(mp, &txcookie, &tp); + if (error) + goto out_abort; + + mutex_lock(&rr->lock); + error = xfbtree_head_read_buf(rr->rtrmap_btree, tp, &mhead_bp); + if (error) + goto out_cancel; + + mcur = xfs_rtrmapbt_mem_cursor(rr->sc->sr.rtg, tp, mhead_bp, + rr->rtrmap_btree); + error = __xfs_rmap_finish_intent(mcur, action, p->startblock, + p->blockcount, &p->oinfo, p->unwritten); + xfs_btree_del_cursor(mcur, error); + if (error) + goto out_cancel; + + error = xfbtree_trans_commit(rr->rtrmap_btree, tp); + if (error) + goto out_cancel; + + xrep_trans_cancel_hook_dummy(&txcookie, tp); + mutex_unlock(&rr->lock); + return NOTIFY_DONE; + +out_cancel: + xfbtree_trans_cancel(rr->rtrmap_btree, tp); + xrep_trans_cancel_hook_dummy(&txcookie, tp); + mutex_unlock(&rr->lock); +out_abort: + xchk_iscan_abort(&rr->iscan); +out_unlock: + return NOTIFY_DONE; +} + /* Set up the filesystem scan components. */ STATIC int xrep_rtrmap_setup_scan( @@ -747,6 +857,7 @@ xrep_rtrmap_setup_scan( struct xfs_scrub *sc = rr->sc; int error; + mutex_init(&rr->lock); xfsb_bitmap_init(&rr->old_rtrmapbt_blocks); /* Set up some storage */ @@ -757,10 +868,26 @@ xrep_rtrmap_setup_scan( /* Retry iget every tenth of a second for up to 30 seconds.
*/ xchk_iscan_start(sc, 30000, 100, &rr->iscan); + + /* + * Hook into live rtrmap operations so that we can update our in-memory + * btree to reflect live changes on the filesystem. Since we drop the + * rtrmap ILOCK to scan all the inodes, we need this piece to avoid + * installing a stale btree. + */ + ASSERT(sc->flags & XCHK_FSGATES_RMAP); + xfs_hook_setup(&rr->hooks.update_hook, xrep_rtrmapbt_live_update); + error = xfs_rtrmap_hook_add(sc->sr.rtg, &rr->hooks); + if (error) + goto out_iscan; return 0; +out_iscan: + xchk_iscan_teardown(&rr->iscan); + xfbtree_destroy(rr->rtrmap_btree); out_bitmap: xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); + mutex_destroy(&rr->lock); return error; } @@ -769,9 +896,14 @@ STATIC void xrep_rtrmap_teardown( struct xrep_rtrmap *rr) { + struct xfs_scrub *sc = rr->sc; + + xchk_iscan_abort(&rr->iscan); + xfs_rtrmap_hook_del(sc->sr.rtg, &rr->hooks); xchk_iscan_teardown(&rr->iscan); xfbtree_destroy(rr->rtrmap_btree); xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); + mutex_destroy(&rr->lock); } /* Repair the realtime rmap btree. */ @@ -782,9 +914,6 @@ xrep_rtrmapbt( struct xrep_rtrmap *rr = sc->buf; int error; - /* Functionality is not yet complete. */ - return xrep_notsupported(sc); - /* Make sure any problems with the fork are fixed. 
*/ error = xrep_metadata_inode_forks(sc); if (error) diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 95fdb82660dc3..65e0872792e1f 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -3956,6 +3956,42 @@ TRACE_EVENT(xrep_rtrmap_found, __entry->offset, __entry->flags) ); + +TRACE_EVENT(xrep_rtrmap_live_update, + TP_PROTO(struct xfs_mount *mp, xfs_rgnumber_t rgno, unsigned int op, + const struct xfs_rmap_update_params *p), + TP_ARGS(mp, rgno, op, p), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_rgnumber_t, rgno) + __field(unsigned int, op) + __field(xfs_rgblock_t, rgbno) + __field(xfs_extlen_t, len) + __field(uint64_t, owner) + __field(uint64_t, offset) + __field(unsigned int, flags) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rgno = rgno; + __entry->op = op; + __entry->rgbno = p->startblock; + __entry->len = p->blockcount; + xfs_owner_info_unpack(&p->oinfo, &__entry->owner, + &__entry->offset, &__entry->flags); + if (p->unwritten) + __entry->flags |= XFS_RMAP_UNWRITTEN; + ), + TP_printk("dev %d:%d rgno 0x%x op %s rgbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->rgno, + __print_symbolic(__entry->op, XFS_RMAP_INTENT_STRINGS), + __entry->rgbno, + __entry->len, + __entry->owner, + __entry->offset, + __entry->flags) +); #endif /* CONFIG_XFS_RT */ #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */ From patchwork Sun Dec 31 21:41:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13507703 Date: Sun, 31 Dec 2023 13:41:57 -0800 Subject: [PATCH 39/39] xfs: enable realtime rmap btree From: "Darrick J. Wong" To: djwong@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <170404850530.1764998.3532838319012354252.stgit@frogsfrogsfrogs> In-Reply-To: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> References: <170404849811.1764998.10873316890301599216.stgit@frogsfrogsfrogs> From: Darrick J. Wong Signed-off-by: Darrick J.
Wong --- fs/xfs/xfs_super.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 4a3119a48ef08..fbfd2963b2e2c 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1741,12 +1741,6 @@ xfs_fs_fill_super( } } - if (xfs_has_rmapbt(mp) && mp->m_sb.sb_rblocks) { - xfs_alert(mp, - "reverse mapping btree not compatible with realtime device!"); - error = -EINVAL; - goto out_filestream_unmount; - } if (xfs_has_parent(mp)) xfs_warn(mp,