From patchwork Fri Dec 13 01:00:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13906193
Date: Thu, 12 Dec 2024 17:00:50 -0800
Subject: [PATCH 01/37] xfs: add some rtgroup inode helpers
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173405123329.1181370.17404943645784258939.stgit@frogsfrogsfrogs>
In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Create some simple helpers to reduce the amount of typing whenever we
access rtgroup inodes. Conversion was done with this spatch and some
minor reformatting:

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_BITMAP]
+ rtg_bitmap(rtg)

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_SUMMARY]
+ rtg_summary(rtg)

and the CLI command:

$ spatch --sp-file /tmp/moo.cocci --dir fs/xfs/ --use-gitgrep --in-place

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 fs/xfs/libxfs/xfs_rtbitmap.c | 2 +-
 fs/xfs/libxfs/xfs_rtgroup.c | 18 ++++++++----------
 fs/xfs/libxfs/xfs_rtgroup.h | 10 ++++++++++
 fs/xfs/scrub/rtbitmap.c | 7 +++----
 fs/xfs/scrub/rtsummary.c | 12 +++++-------
 fs/xfs/xfs_qm.c | 8 ++++----
 fs/xfs/xfs_rtalloc.c | 10 +++++-----
 7 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 4ddfb7e395b38a..770adf60dd7392 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -1055,7 +1055,7 @@ xfs_rtfree_extent(
 	xfs_rtxlen_t len) /* length of extent freed */
 {
 	struct xfs_mount *mp = tp->t_mountp;
-	struct xfs_inode *rbmip = rtg->rtg_inodes[XFS_RTGI_BITMAP];
+	struct xfs_inode *rbmip = rtg_bitmap(rtg);
 	struct xfs_rtalloc_args args = {
 		.mp = mp,
 		.tp = tp,
diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c
index 4f3bfc884aff29..a79b734e70440d 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.c
+++ b/fs/xfs/libxfs/xfs_rtgroup.c
@@ -197,10 +197,10 @@ xfs_rtgroup_lock(
 		 * Lock both realtime free space metadata inodes for a freespace
 		 * update.
 		 */
-		xfs_ilock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_EXCL);
-		xfs_ilock(rtg->rtg_inodes[XFS_RTGI_SUMMARY], XFS_ILOCK_EXCL);
+		xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
+		xfs_ilock(rtg_summary(rtg), XFS_ILOCK_EXCL);
 	} else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
-		xfs_ilock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_SHARED);
+		xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
 	}
 }

@@ -215,10 +215,10 @@ xfs_rtgroup_unlock(
 			!(rtglock_flags & XFS_RTGLOCK_BITMAP));

 	if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
-		xfs_iunlock(rtg->rtg_inodes[XFS_RTGI_SUMMARY], XFS_ILOCK_EXCL);
-		xfs_iunlock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_EXCL);
+		xfs_iunlock(rtg_summary(rtg), XFS_ILOCK_EXCL);
+		xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_EXCL);
 	} else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
-		xfs_iunlock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_SHARED);
+		xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_SHARED);
 	}
 }

@@ -236,10 +236,8 @@ xfs_rtgroup_trans_join(
 	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED));

 	if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
-		xfs_trans_ijoin(tp, rtg->rtg_inodes[XFS_RTGI_BITMAP],
-				XFS_ILOCK_EXCL);
-		xfs_trans_ijoin(tp, rtg->rtg_inodes[XFS_RTGI_SUMMARY],
-				XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, rtg_bitmap(rtg), XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, rtg_summary(rtg), XFS_ILOCK_EXCL);
 	}
 }

diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h
index 7e7e491ff06fa5..19f8d302b9aa3f 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.h
+++ b/fs/xfs/libxfs/xfs_rtgroup.h
@@ -64,6 +64,16 @@ static inline xfs_rgnumber_t rtg_rgno(const struct xfs_rtgroup *rtg)
 	return rtg->rtg_group.xg_gno;
 }

+static inline struct xfs_inode *rtg_bitmap(const struct xfs_rtgroup *rtg)
+{
+	return rtg->rtg_inodes[XFS_RTGI_BITMAP];
+}
+
+static inline struct xfs_inode *rtg_summary(const struct xfs_rtgroup *rtg)
+{
+	return rtg->rtg_inodes[XFS_RTGI_SUMMARY];
+}
+
 /* Passive rtgroup references */
 static inline struct xfs_rtgroup *
 xfs_rtgroup_get(
diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c
index 376a36fd9a9cdd..fb4970c877abd3 100644
--- a/fs/xfs/scrub/rtbitmap.c
+++ b/fs/xfs/scrub/rtbitmap.c
@@ -49,8 +49,7 @@ xchk_setup_rtbitmap(
 	if (error)
 		return error;

-	error = xchk_install_live_inode(sc,
-			sc->sr.rtg->rtg_inodes[XFS_RTGI_BITMAP]);
+	error = xchk_install_live_inode(sc, rtg_bitmap(sc->sr.rtg));
 	if (error)
 		return error;

@@ -146,7 +145,7 @@ xchk_rtbitmap(
 {
 	struct xfs_mount *mp = sc->mp;
 	struct xfs_rtgroup *rtg = sc->sr.rtg;
-	struct xfs_inode *rbmip = rtg->rtg_inodes[XFS_RTGI_BITMAP];
+	struct xfs_inode *rbmip = rtg_bitmap(rtg);
 	struct xchk_rtbitmap *rtb = sc->buf;
 	int error;

@@ -215,7 +214,7 @@ xchk_xref_is_used_rt_space(
 	xfs_extlen_t len)
 {
 	struct xfs_rtgroup *rtg = sc->sr.rtg;
-	struct xfs_inode *rbmip = rtg->rtg_inodes[XFS_RTGI_BITMAP];
+	struct xfs_inode *rbmip = rtg_bitmap(rtg);
 	xfs_rtxnum_t startext;
 	xfs_rtxnum_t endext;
 	bool is_free;
diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c
index 49fc6250bafcaa..f1af5431b38856 100644
--- a/fs/xfs/scrub/rtsummary.c
+++ b/fs/xfs/scrub/rtsummary.c
@@ -81,8 +81,7 @@ xchk_setup_rtsummary(
 	if (error)
 		return error;

-	error = xchk_install_live_inode(sc,
-			sc->sr.rtg->rtg_inodes[XFS_RTGI_SUMMARY]);
+	error = xchk_install_live_inode(sc, rtg_summary(sc->sr.rtg));
 	if (error)
 		return error;

@@ -191,8 +190,7 @@ xchk_rtsum_record_free(
 	rtlen = xfs_rtxlen_to_extlen(mp, rec->ar_extcount);

 	if (!xfs_verify_rtbext(mp, rtbno, rtlen)) {
-		xchk_ino_xref_set_corrupt(sc,
-				rtg->rtg_inodes[XFS_RTGI_BITMAP]->i_ino);
+		xchk_ino_xref_set_corrupt(sc, rtg_bitmap(rtg)->i_ino);
 		return -EFSCORRUPTED;
 	}

@@ -218,7 +216,7 @@ xchk_rtsum_compute(

 	/* If the bitmap size doesn't match the computed size, bail. */
 	if (XFS_FSB_TO_B(mp, xfs_rtbitmap_blockcount(mp)) !=
-	    rtg->rtg_inodes[XFS_RTGI_BITMAP]->i_disk_size)
+	    rtg_bitmap(rtg)->i_disk_size)
 		return -EFSCORRUPTED;

 	return xfs_rtalloc_query_all(rtg, sc->tp, xchk_rtsum_record_free, sc);
@@ -310,8 +308,8 @@ xchk_rtsummary(
 {
 	struct xfs_mount *mp = sc->mp;
 	struct xfs_rtgroup *rtg = sc->sr.rtg;
-	struct xfs_inode *rbmip = rtg->rtg_inodes[XFS_RTGI_BITMAP];
-	struct xfs_inode *rsumip = rtg->rtg_inodes[XFS_RTGI_SUMMARY];
+	struct xfs_inode *rbmip = rtg_bitmap(rtg);
+	struct xfs_inode *rsumip = rtg_summary(rtg);
 	struct xchk_rtsummary *rts = sc->buf;
 	int error;

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 3abab5fb593e37..e1ba5af6250f0b 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -230,10 +230,10 @@ xfs_qm_unmount_rt(
 	if (!rtg)
 		return;

-	if (rtg->rtg_inodes[XFS_RTGI_BITMAP])
-		xfs_qm_dqdetach(rtg->rtg_inodes[XFS_RTGI_BITMAP]);
-	if (rtg->rtg_inodes[XFS_RTGI_SUMMARY])
-		xfs_qm_dqdetach(rtg->rtg_inodes[XFS_RTGI_SUMMARY]);
+	if (rtg_bitmap(rtg))
+		xfs_qm_dqdetach(rtg_bitmap(rtg));
+	if (rtg_summary(rtg))
+		xfs_qm_dqdetach(rtg_summary(rtg));
 	xfs_rtgroup_rele(rtg);
 }

diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 5128c5ad72f5da..4cd2f32aa70a0a 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -856,8 +856,8 @@ xfs_growfs_rt_bmblock(
 	xfs_fileoff_t bmbno)
 {
 	struct xfs_mount *mp = rtg_mount(rtg);
-	struct xfs_inode *rbmip = rtg->rtg_inodes[XFS_RTGI_BITMAP];
-	struct xfs_inode *rsumip = rtg->rtg_inodes[XFS_RTGI_SUMMARY];
+	struct xfs_inode *rbmip = rtg_bitmap(rtg);
+	struct xfs_inode *rsumip = rtg_summary(rtg);
 	struct xfs_rtalloc_args args = {
 		.mp = mp,
 		.rtg = rtg,
@@ -1041,8 +1041,8 @@ xfs_growfs_rt_alloc_blocks(
 	xfs_extlen_t *nrbmblocks)
 {
 	struct xfs_mount *mp = rtg_mount(rtg);
-	struct xfs_inode *rbmip = rtg->rtg_inodes[XFS_RTGI_BITMAP];
-	struct xfs_inode *rsumip = rtg->rtg_inodes[XFS_RTGI_SUMMARY];
+	struct xfs_inode *rbmip = rtg_bitmap(rtg);
+	struct xfs_inode *rsumip = rtg_summary(rtg);
 	xfs_extlen_t orbmblocks = 0;
 	xfs_extlen_t orsumblocks = 0;
 	struct xfs_mount *nmp;
@@ -1622,7 +1622,7 @@ xfs_rtpick_extent(
 	xfs_rtxlen_t len) /* allocation length (rtextents) */
 {
 	struct xfs_mount *mp = rtg_mount(rtg);
-	struct xfs_inode *rbmip = rtg->rtg_inodes[XFS_RTGI_BITMAP];
+	struct xfs_inode *rbmip = rtg_bitmap(rtg);
 	xfs_rtxnum_t b = 0; /* result rtext */
 	int log2; /* log of sequence number */
 	uint64_t resid; /* residual after log removed */

From patchwork Fri Dec 13 01:01:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13906194
Date: Thu, 12 Dec 2024 17:01:06 -0800
Subject: [PATCH 02/37] xfs: prepare rmap btree cursor tracepoints for realtime
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173405123347.1181370.69471894031928674.stgit@frogsfrogsfrogs>
In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Rework the rmap btree cursor tracepoints in preparation to handle the
realtime rmap btree cursor. Mostly this involves renaming the field to
"gbno" and extracting the group number from the cursor.

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 fs/xfs/xfs_rmap_item.c | 4 +-
 fs/xfs/xfs_trace.h | 82 +++++++++++++++++++++++++-----------------------
 2 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 76b3c0ed3b4f63..ac2913a7335871 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -351,10 +351,10 @@ xfs_rmap_defer_add(
 {
 	struct xfs_mount *mp = tp->t_mountp;

-	trace_xfs_rmap_defer(mp, ri);
-
 	ri->ri_group = xfs_group_intent_get(mp, ri->ri_bmap.br_startblock,
 			XG_TYPE_AG);
+
+	trace_xfs_rmap_defer(mp, ri);
 	xfs_defer_add(tp, &ri->ri_list, &xfs_rmap_update_defer_type);
 }

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index cbda663fe6e817..8b7bb1f5ae3c6f 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -14,11 +14,15 @@
  * ino: filesystem inode number
  *
  * agbno: per-AG block number in fs blocks
+ * rgbno: per-rtgroup block number in fs blocks
  * startblock: physical block number for file mappings. This is either a
  * segmented fsblock for data device mappings, or a rfsblock
  * for realtime device mappings
  * fsbcount: number of blocks in an extent, in fs blocks
  *
+ * gbno: generic allocation group block number. This is an agbno for
+ * space in a per-AG or a rgbno for space in a realtime group.
+ *
  * daddr: physical block number in 512b blocks
  * bbcount: number of blocks in a physical extent, in 512b blocks
  *
@@ -2918,13 +2922,14 @@ DEFINE_DEFER_PENDING_ITEM_EVENT(xfs_defer_finish_item);
 /* rmap tracepoints */
 DECLARE_EVENT_CLASS(xfs_rmap_class,
 	TP_PROTO(struct xfs_btree_cur *cur,
-		 xfs_agblock_t agbno, xfs_extlen_t len, bool unwritten,
+		 xfs_agblock_t gbno, xfs_extlen_t len, bool unwritten,
 		 const struct xfs_owner_info *oinfo),
-	TP_ARGS(cur, agbno, len, unwritten, oinfo),
+	TP_ARGS(cur, gbno, len, unwritten, oinfo),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
+		__field(enum xfs_group_type, type)
 		__field(xfs_agnumber_t, agno)
-		__field(xfs_agblock_t, agbno)
+		__field(xfs_agblock_t, gbno)
 		__field(xfs_extlen_t, len)
 		__field(uint64_t, owner)
 		__field(uint64_t, offset)
@@ -2932,8 +2937,9 @@ DECLARE_EVENT_CLASS(xfs_rmap_class,
 	),
 	TP_fast_assign(
 		__entry->dev = cur->bc_mp->m_super->s_dev;
+		__entry->type = cur->bc_group->xg_type;
 		__entry->agno = cur->bc_group->xg_gno;
-		__entry->agbno = agbno;
+		__entry->gbno = gbno;
 		__entry->len = len;
 		__entry->owner = oinfo->oi_owner;
 		__entry->offset = oinfo->oi_offset;
@@ -2941,10 +2947,11 @@ DECLARE_EVENT_CLASS(xfs_rmap_class,
 		if (unwritten)
 			__entry->flags |= XFS_RMAP_UNWRITTEN;
 	),
-	TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%lx",
+	TP_printk("dev %d:%d %sno 0x%x gbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%lx",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __print_symbolic(__entry->type, XG_TYPE_STRINGS),
 		  __entry->agno,
-		  __entry->agbno,
+		  __entry->gbno,
 		  __entry->len,
 		  __entry->owner,
 		  __entry->offset,
@@ -2953,9 +2960,9 @@ DECLARE_EVENT_CLASS(xfs_rmap_class,
 #define DEFINE_RMAP_EVENT(name) \
 DEFINE_EVENT(xfs_rmap_class, name, \
 	TP_PROTO(struct xfs_btree_cur *cur, \
-		 xfs_agblock_t agbno, xfs_extlen_t len, bool unwritten, \
+		 xfs_agblock_t gbno, xfs_extlen_t len, bool unwritten, \
 		 const struct xfs_owner_info *oinfo), \
-	TP_ARGS(cur, agbno, len, unwritten, oinfo))
+	TP_ARGS(cur, gbno, len, unwritten, oinfo))

 /* btree cursor error/%ip tracepoint class */
 DECLARE_EVENT_CLASS(xfs_btree_error_class,
@@ -3018,47 +3025,36 @@ TRACE_EVENT(xfs_rmap_convert_state,
 	TP_ARGS(cur, state, caller_ip),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
+		__field(enum xfs_group_type, type)
 		__field(xfs_agnumber_t, agno)
-		__field(xfs_ino_t, ino)
 		__field(int, state)
 		__field(unsigned long, caller_ip)
 	),
 	TP_fast_assign(
 		__entry->dev = cur->bc_mp->m_super->s_dev;
-		switch (cur->bc_ops->type) {
-		case XFS_BTREE_TYPE_INODE:
-			__entry->agno = 0;
-			__entry->ino = cur->bc_ino.ip->i_ino;
-			break;
-		case XFS_BTREE_TYPE_AG:
-			__entry->agno = cur->bc_group->xg_gno;
-			__entry->ino = 0;
-			break;
-		case XFS_BTREE_TYPE_MEM:
-			__entry->agno = 0;
-			__entry->ino = 0;
-			break;
-		}
+		__entry->type = cur->bc_group->xg_type;
+		__entry->agno = cur->bc_group->xg_gno;
 		__entry->state = state;
 		__entry->caller_ip = caller_ip;
 	),
-	TP_printk("dev %d:%d agno 0x%x ino 0x%llx state %d caller %pS",
+	TP_printk("dev %d:%d %sno 0x%x state %d caller %pS",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __print_symbolic(__entry->type, XG_TYPE_STRINGS),
 		  __entry->agno,
-		  __entry->ino,
 		  __entry->state,
 		  (char *)__entry->caller_ip)
 );

 DECLARE_EVENT_CLASS(xfs_rmapbt_class,
 	TP_PROTO(struct xfs_btree_cur *cur,
-		 xfs_agblock_t agbno, xfs_extlen_t len,
+		 xfs_agblock_t gbno, xfs_extlen_t len,
 		 uint64_t owner, uint64_t offset, unsigned int flags),
-	TP_ARGS(cur, agbno, len, owner, offset, flags),
+	TP_ARGS(cur, gbno, len, owner, offset, flags),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
+		__field(enum xfs_group_type, type)
 		__field(xfs_agnumber_t, agno)
-		__field(xfs_agblock_t, agbno)
+		__field(xfs_agblock_t, gbno)
 		__field(xfs_extlen_t, len)
 		__field(uint64_t, owner)
 		__field(uint64_t, offset)
@@ -3066,17 +3062,19 @@ DECLARE_EVENT_CLASS(xfs_rmapbt_class,
 	),
 	TP_fast_assign(
 		__entry->dev = cur->bc_mp->m_super->s_dev;
+		__entry->type = cur->bc_group->xg_type;
 		__entry->agno = cur->bc_group->xg_gno;
-		__entry->agbno = agbno;
+		__entry->gbno = gbno;
 		__entry->len = len;
 		__entry->owner = owner;
 		__entry->offset = offset;
 		__entry->flags = flags;
 	),
-	TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x",
+	TP_printk("dev %d:%d %sno 0x%x gbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __print_symbolic(__entry->type, XG_TYPE_STRINGS),
 		  __entry->agno,
-		  __entry->agbno,
+		  __entry->gbno,
 		  __entry->len,
 		  __entry->owner,
 		  __entry->offset,
@@ -3085,9 +3083,9 @@ DECLARE_EVENT_CLASS(xfs_rmapbt_class,
 #define DEFINE_RMAPBT_EVENT(name) \
 DEFINE_EVENT(xfs_rmapbt_class, name, \
 	TP_PROTO(struct xfs_btree_cur *cur, \
-		 xfs_agblock_t agbno, xfs_extlen_t len, \
+		 xfs_agblock_t gbno, xfs_extlen_t len, \
 		 uint64_t owner, uint64_t offset, unsigned int flags), \
-	TP_ARGS(cur, agbno, len, owner, offset, flags))
+	TP_ARGS(cur, gbno, len, owner, offset, flags))

 TRACE_DEFINE_ENUM(XFS_RMAP_MAP);
 TRACE_DEFINE_ENUM(XFS_RMAP_MAP_SHARED);
@@ -3104,8 +3102,9 @@ DECLARE_EVENT_CLASS(xfs_rmap_deferred_class,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(unsigned long long, owner)
+		__field(enum xfs_group_type, type)
 		__field(xfs_agnumber_t, agno)
-		__field(xfs_agblock_t, agbno)
+		__field(xfs_agblock_t, gbno)
 		__field(int, whichfork)
 		__field(xfs_fileoff_t, l_loff)
 		__field(xfs_filblks_t, l_len)
@@ -3114,9 +3113,11 @@ DECLARE_EVENT_CLASS(xfs_rmap_deferred_class,
 	),
 	TP_fast_assign(
 		__entry->dev = mp->m_super->s_dev;
-		__entry->agno = XFS_FSB_TO_AGNO(mp, ri->ri_bmap.br_startblock);
-		__entry->agbno = XFS_FSB_TO_AGBNO(mp,
-					ri->ri_bmap.br_startblock);
+		__entry->type = ri->ri_group->xg_type;
+		__entry->agno = ri->ri_group->xg_gno;
+		__entry->gbno = xfs_fsb_to_gbno(mp,
+						ri->ri_bmap.br_startblock,
+						ri->ri_group->xg_type);
 		__entry->owner = ri->ri_owner;
 		__entry->whichfork = ri->ri_whichfork;
 		__entry->l_loff = ri->ri_bmap.br_startoff;
@@ -3124,11 +3125,12 @@ DECLARE_EVENT_CLASS(xfs_rmap_deferred_class,
 		__entry->l_state = ri->ri_bmap.br_state;
 		__entry->op = ri->ri_type;
 	),
-	TP_printk("dev %d:%d op %s agno 0x%x agbno 0x%x owner 0x%llx %s fileoff 0x%llx fsbcount 0x%llx state %d",
+	TP_printk("dev %d:%d op %s %sno 0x%x gbno 0x%x owner 0x%llx %s fileoff 0x%llx fsbcount 0x%llx state %d",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __print_symbolic(__entry->op, XFS_RMAP_INTENT_STRINGS),
+		  __print_symbolic(__entry->type, XG_TYPE_STRINGS),
 		  __entry->agno,
-		  __entry->agbno,
+		  __entry->gbno,
 		  __entry->owner,
 		  __print_symbolic(__entry->whichfork, XFS_WHICHFORK_STRINGS),
 		  __entry->l_loff,
@@ -3993,7 +3995,7 @@ TRACE_EVENT(xfs_fsmap_mapping,
 		__entry->offset = frec->offset;
 		__entry->flags = frec->rm_flags;
 	),
-	TP_printk("dev %d:%d keydev %d:%d agno 0x%x rmapbno 0x%x start_daddr 0x%llx len_daddr 0x%llx owner 0x%llx fileoff 0x%llx flags 0x%x",
+	TP_printk("dev %d:%d keydev %d:%d agno 0x%x gbno 0x%x start_daddr 0x%llx len_daddr 0x%llx owner 0x%llx fileoff 0x%llx flags 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  MAJOR(__entry->keydev), MINOR(__entry->keydev),
 		  __entry->agno,

From patchwork Fri Dec 13 01:01:22 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13906195
Date: Thu, 12 Dec 2024 17:01:22 -0800
Subject: [PATCH 03/37] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173405123364.1181370.781600665689768961.stgit@frogsfrogsfrogs>
In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 fs/xfs/libxfs/xfs_refcount.c | 6 ++----
 fs/xfs/libxfs/xfs_rmap.c | 12 +++++-------
 fs/xfs/libxfs/xfs_rmap.h | 8 ++++----
 fs/xfs/scrub/alloc_repair.c | 5 +++--
 4 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 2dbab68b4fe69f..26d3d7956e069d 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1831,8 +1831,7 @@ xfs_refcount_alloc_cow_extent(
 	__xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len);

 	/* Add rmap entry */
-	xfs_rmap_alloc_extent(tp, XFS_FSB_TO_AGNO(mp, fsb),
-			XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW);
+	xfs_rmap_alloc_extent(tp, fsb, len, XFS_RMAP_OWN_COW);
 }

 /* Forget a CoW staging event in the refcount btree. */
@@ -1848,8 +1847,7 @@ xfs_refcount_free_cow_extent(
 		return;

 	/* Remove rmap entry */
-	xfs_rmap_free_extent(tp, XFS_FSB_TO_AGNO(mp, fsb),
-			XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW);
+	xfs_rmap_free_extent(tp, fsb, len, XFS_RMAP_OWN_COW);
 	__xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len);
 }

diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index d0df68dc313185..57dbf99ce00453 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -525,7 +525,7 @@ xfs_rmap_free_check_owner(
 	struct xfs_btree_cur *cur,
 	uint64_t ltoff,
 	struct xfs_rmap_irec *rec,
-	xfs_filblks_t len,
+	xfs_extlen_t len,
 	uint64_t owner,
 	uint64_t offset,
 	unsigned int flags)
@@ -2729,8 +2729,7 @@ xfs_rmap_convert_extent(
 void
 xfs_rmap_alloc_extent(
 	struct xfs_trans *tp,
-	xfs_agnumber_t agno,
-	xfs_agblock_t bno,
+	xfs_fsblock_t fsbno,
 	xfs_extlen_t len,
 	uint64_t owner)
 {
@@ -2739,7 +2738,7 @@ xfs_rmap_alloc_extent(
 	if (!xfs_rmap_update_is_needed(tp->t_mountp, XFS_DATA_FORK))
 		return;

-	bmap.br_startblock = XFS_AGB_TO_FSB(tp->t_mountp, agno, bno);
+	bmap.br_startblock = fsbno;
 	bmap.br_blockcount = len;
 	bmap.br_startoff = 0;
 	bmap.br_state = XFS_EXT_NORM;
@@ -2751,8 +2750,7 @@ xfs_rmap_alloc_extent(
 void
 xfs_rmap_free_extent(
 	struct xfs_trans *tp,
-	xfs_agnumber_t agno,
-	xfs_agblock_t bno,
+	xfs_fsblock_t fsbno,
 	xfs_extlen_t len,
 	uint64_t owner)
 {
@@ -2761,7 +2759,7 @@ xfs_rmap_free_extent(
 	if (!xfs_rmap_update_is_needed(tp->t_mountp, XFS_DATA_FORK))
 		return;

-	bmap.br_startblock = XFS_AGB_TO_FSB(tp->t_mountp, agno, bno);
+	bmap.br_startblock = fsbno;
 	bmap.br_blockcount = len;
 	bmap.br_startoff = 0;
 	bmap.br_state = XFS_EXT_NORM;
diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h
index 96b4321d831007..8e2657af038e9e 100644
--- a/fs/xfs/libxfs/xfs_rmap.h
+++ b/fs/xfs/libxfs/xfs_rmap.h
@@ -184,10 +184,10 @@ void xfs_rmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 void xfs_rmap_convert_extent(struct xfs_mount *mp, struct xfs_trans *tp,
 		struct xfs_inode *ip, int whichfork,
 		struct xfs_bmbt_irec *imap);
-void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_agnumber_t agno,
-		xfs_agblock_t bno, xfs_extlen_t len, uint64_t owner);
-void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_agnumber_t agno,
-		xfs_agblock_t bno, xfs_extlen_t len, uint64_t owner);
+void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno,
+		xfs_extlen_t len, uint64_t owner);
+void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno,
+		xfs_extlen_t len, uint64_t owner);

 int xfs_rmap_finish_one(struct xfs_trans *tp, struct xfs_rmap_intent *ri,
 		struct xfs_btree_cur **pcur);
diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c
index 0433363a90b616..11e1e5404fc6dc 100644
--- a/fs/xfs/scrub/alloc_repair.c
+++ b/fs/xfs/scrub/alloc_repair.c
@@ -542,8 +542,9 @@ xrep_abt_dispose_one(

 	/* Add a deferred rmap for each extent we used. */
 	if (resv->used > 0)
-		xfs_rmap_alloc_extent(sc->tp, pag_agno(pag), resv->agbno,
-				resv->used, XFS_RMAP_OWN_AG);
+		xfs_rmap_alloc_extent(sc->tp,
+				xfs_agbno_to_fsb(pag, resv->agbno), resv->used,
+				XFS_RMAP_OWN_AG);

 	/*
 	 * For each reserved btree block we didn't use, add it to the free

From patchwork Fri Dec 13 01:01:37 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13906196
Date: Thu, 12 Dec 2024 17:01:37 -0800
Subject: [PATCH 04/37] xfs: introduce realtime rmap btree ondisk definitions
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173405123381.1181370.5283272140713380009.stgit@frogsfrogsfrogs>
In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Add the ondisk structure definitions for realtime rmap btrees. The
realtime rmap btree will be rooted from a hidden inode so it needs to
have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate rmap btree
blocks. This prepares the way for connecting the btree operations
implementation.

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 fs/xfs/Makefile | 1
 fs/xfs/libxfs/xfs_btree.c | 5 +
 fs/xfs/libxfs/xfs_format.h | 10 +
 fs/xfs/libxfs/xfs_ondisk.h | 1
 fs/xfs/libxfs/xfs_rtrmap_btree.c | 271 ++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rtrmap_btree.h | 82 +++++++++++
 fs/xfs/libxfs/xfs_sb.c | 6 +
 fs/xfs/libxfs/xfs_shared.h | 7 +
 fs/xfs/xfs_mount.c | 5 -
 fs/xfs/xfs_mount.h | 9 +
 fs/xfs/xfs_stats.c | 3
 fs/xfs/xfs_stats.h | 1
 12 files changed, 398 insertions(+), 3 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_rtrmap_btree.c
 create mode 100644 fs/xfs/libxfs/xfs_rtrmap_btree.h

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index ed9b0dabc1f11d..ff45efb2463f73 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -51,6 +51,7 @@ xfs-y += $(addprefix libxfs/, \
 				xfs_rmap_btree.o \
 				xfs_refcount.o \
 				xfs_refcount_btree.o \
+				xfs_rtrmap_btree.o \
 				xfs_sb.o \
 				xfs_symlink_remote.o \
 				xfs_trans_inode.o \
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 5ab201ef041e7d..0e271919374780 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -30,6 +30,7 @@
 #include "xfs_health.h"
 #include "xfs_buf_mem.h"
 #include "xfs_btree_mem.h"
+#include "xfs_rtrmap_btree.h"

 /*
  * Btree magic numbers.
@@ -5525,6 +5526,9 @@ xfs_btree_init_cur_caches(void)
 	if (error)
 		goto err;
 	error = xfs_refcountbt_init_cur_cache();
+	if (error)
+		goto err;
+	error = xfs_rtrmapbt_init_cur_cache();
 	if (error)
 		goto err;

@@ -5543,6 +5547,7 @@ xfs_btree_destroy_cur_caches(void)
 	xfs_bmbt_destroy_cur_cache();
 	xfs_rmapbt_destroy_cur_cache();
 	xfs_refcountbt_destroy_cur_cache();
+	xfs_rtrmapbt_destroy_cur_cache();
 }

 /* Move the btree cursor before the first record. */
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 4d47a3e723aa13..469fc7afa591b4 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1725,6 +1725,16 @@ typedef __be32 xfs_rmap_ptr_t;
 	 XFS_FIBT_BLOCK(mp) + 1 : \
 	 XFS_IBT_BLOCK(mp) + 1)

+/*
+ * Realtime Reverse mapping btree format definitions
+ *
+ * This is a btree for reverse mapping records for realtime volumes
+ */
+#define XFS_RTRMAP_CRC_MAGIC 0x4d415052 /* 'MAPR' */
+
+/* inode-based btree pointer type */
+typedef __be64 xfs_rtrmap_ptr_t;
+
 /*
  * Reference Count Btree format definitions
  *
diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h
index ad0dedf00f1898..2c50877a1a2f0b 100644
--- a/fs/xfs/libxfs/xfs_ondisk.h
+++ b/fs/xfs/libxfs/xfs_ondisk.h
@@ -83,6 +83,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(union xfs_rtword_raw, 4);
 	XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_raw, 4);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo, 48);
+	XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t, 8);

 	/*
 	 * m68k has problems with struct xfs_attr_leaf_name_remote, but we pad
diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c
new file mode 100644
index 00000000000000..d3e4c52dcaa9d0
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2018-2024 Oracle. All Rights Reserved.
+ * Author: Darrick J.
Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_alloc.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_trace.h" +#include "xfs_cksum.h" +#include "xfs_error.h" +#include "xfs_extent_busy.h" +#include "xfs_rtgroup.h" + +static struct kmem_cache *xfs_rtrmapbt_cur_cache; + +/* + * Realtime Reverse Map btree. + * + * This is a btree used to track the owner(s) of a given extent in the realtime + * device. See the comments in xfs_rmap_btree.c for more information. + * + * This tree is basically the same as the regular rmap btree except that it + * is rooted in an inode and does not live in free space. + */ + +static struct xfs_btree_cur * +xfs_rtrmapbt_dup_cursor( + struct xfs_btree_cur *cur) +{ + return xfs_rtrmapbt_init_cursor(cur->bc_tp, to_rtg(cur->bc_group)); +} + +static xfs_failaddr_t +xfs_rtrmapbt_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_target->bt_mount; + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + xfs_failaddr_t fa; + int level; + + if (!xfs_verify_magic(bp, block->bb_magic)) + return __this_address; + + if (!xfs_has_rmapbt(mp)) + return __this_address; + fa = xfs_btree_fsblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN); + if (fa) + return fa; + level = be16_to_cpu(block->bb_level); + if (level > mp->m_rtrmap_maxlevels) + return __this_address; + + return xfs_btree_fsblock_verify(bp, mp->m_rtrmap_mxr[level != 0]); +} + +static void +xfs_rtrmapbt_read_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + if (!xfs_btree_fsblock_verify_crc(bp)) + xfs_verifier_error(bp, -EFSBADCRC, __this_address); + else { + fa = xfs_rtrmapbt_verify(bp); + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + } + + if (bp->b_error) 
+ trace_xfs_btree_corrupt(bp, _RET_IP_); +} + +static void +xfs_rtrmapbt_write_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + fa = xfs_rtrmapbt_verify(bp); + if (fa) { + trace_xfs_btree_corrupt(bp, _RET_IP_); + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + return; + } + xfs_btree_fsblock_calc_crc(bp); + +} + +const struct xfs_buf_ops xfs_rtrmapbt_buf_ops = { + .name = "xfs_rtrmapbt", + .magic = { 0, cpu_to_be32(XFS_RTRMAP_CRC_MAGIC) }, + .verify_read = xfs_rtrmapbt_read_verify, + .verify_write = xfs_rtrmapbt_write_verify, + .verify_struct = xfs_rtrmapbt_verify, +}; + +const struct xfs_btree_ops xfs_rtrmapbt_ops = { + .name = "rtrmap", + .type = XFS_BTREE_TYPE_INODE, + .geom_flags = XFS_BTGEO_OVERLAPPING | + XFS_BTGEO_IROOT_RECORDS, + + .rec_len = sizeof(struct xfs_rmap_rec), + /* Overlapping btree; 2 keys per pointer. */ + .key_len = 2 * sizeof(struct xfs_rmap_key), + .ptr_len = XFS_BTREE_LONG_PTR_LEN, + + .lru_refs = XFS_RMAP_BTREE_REF, + .statoff = XFS_STATS_CALC_INDEX(xs_rtrmap_2), + + .dup_cursor = xfs_rtrmapbt_dup_cursor, + .buf_ops = &xfs_rtrmapbt_buf_ops, +}; + +/* Allocate a new rt rmap btree cursor. */ +struct xfs_btree_cur * +xfs_rtrmapbt_init_cursor( + struct xfs_trans *tp, + struct xfs_rtgroup *rtg) +{ + struct xfs_inode *ip = NULL; + struct xfs_mount *mp = rtg_mount(rtg); + struct xfs_btree_cur *cur; + + return NULL; /* XXX */ + + xfs_assert_ilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL); + + cur = xfs_btree_alloc_cursor(mp, tp, &xfs_rtrmapbt_ops, + mp->m_rtrmap_maxlevels, xfs_rtrmapbt_cur_cache); + + cur->bc_ino.ip = ip; + cur->bc_group = xfs_group_hold(rtg_group(rtg)); + cur->bc_ino.whichfork = XFS_DATA_FORK; + cur->bc_nlevels = be16_to_cpu(ip->i_df.if_broot->bb_level) + 1; + cur->bc_ino.forksize = xfs_inode_fork_size(ip, XFS_DATA_FORK); + + return cur; +} + +/* + * Install a new rt reverse mapping btree root. Caller is responsible for + * invalidating and freeing the old btree blocks. 
+ */ +void +xfs_rtrmapbt_commit_staged_btree( + struct xfs_btree_cur *cur, + struct xfs_trans *tp) +{ + struct xbtree_ifakeroot *ifake = cur->bc_ino.ifake; + struct xfs_ifork *ifp; + int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; + + ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + + /* + * Free any resources hanging off the real fork, then shallow-copy the + * staging fork's contents into the real fork to transfer everything + * we just built. + */ + ifp = xfs_ifork_ptr(cur->bc_ino.ip, XFS_DATA_FORK); + xfs_idestroy_fork(ifp); + memcpy(ifp, ifake->if_fork, sizeof(struct xfs_ifork)); + + cur->bc_ino.ip->i_projid = cur->bc_group->xg_gno; + xfs_trans_log_inode(tp, cur->bc_ino.ip, flags); + xfs_btree_commit_ifakeroot(cur, tp, XFS_DATA_FORK); +} + +/* Calculate number of records in a rt reverse mapping btree block. */ +static inline unsigned int +xfs_rtrmapbt_block_maxrecs( + unsigned int blocklen, + bool leaf) +{ + if (leaf) + return blocklen / sizeof(struct xfs_rmap_rec); + return blocklen / + (2 * sizeof(struct xfs_rmap_key) + sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Calculate number of records in an rt reverse mapping btree block. + */ +unsigned int +xfs_rtrmapbt_maxrecs( + struct xfs_mount *mp, + unsigned int blocklen, + bool leaf) +{ + blocklen -= XFS_RTRMAP_BLOCK_LEN; + return xfs_rtrmapbt_block_maxrecs(blocklen, leaf); +} + +/* Compute the max possible height for realtime reverse mapping btrees. */ +unsigned int +xfs_rtrmapbt_maxlevels_ondisk(void) +{ + unsigned int minrecs[2]; + unsigned int blocklen; + + blocklen = XFS_MIN_CRC_BLOCKSIZE - XFS_BTREE_LBLOCK_CRC_LEN; + + minrecs[0] = xfs_rtrmapbt_block_maxrecs(blocklen, true) / 2; + minrecs[1] = xfs_rtrmapbt_block_maxrecs(blocklen, false) / 2; + + /* We need at most one record for every block in an rt group. 
*/ + return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS); +} + +int __init +xfs_rtrmapbt_init_cur_cache(void) +{ + xfs_rtrmapbt_cur_cache = kmem_cache_create("xfs_rtrmapbt_cur", + xfs_btree_cur_sizeof(xfs_rtrmapbt_maxlevels_ondisk()), + 0, 0, NULL); + + if (!xfs_rtrmapbt_cur_cache) + return -ENOMEM; + return 0; +} + +void +xfs_rtrmapbt_destroy_cur_cache(void) +{ + kmem_cache_destroy(xfs_rtrmapbt_cur_cache); + xfs_rtrmapbt_cur_cache = NULL; +} + +/* Compute the maximum height of an rt reverse mapping btree. */ +void +xfs_rtrmapbt_compute_maxlevels( + struct xfs_mount *mp) +{ + unsigned int d_maxlevels, r_maxlevels; + + if (!xfs_has_rtrmapbt(mp)) { + mp->m_rtrmap_maxlevels = 0; + return; + } + + /* + * The realtime rmapbt lives on the data device, which means that its + * maximum height is constrained by the size of the data device and + * the height required to store one rmap record for each block in an + * rt group. + */ + d_maxlevels = xfs_btree_space_to_height(mp->m_rtrmap_mnr, + mp->m_sb.sb_dblocks); + r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrmap_mnr, + mp->m_groups[XG_TYPE_RTG].blocks); + + /* Add one level to handle the inode root level. */ + mp->m_rtrmap_maxlevels = min(d_maxlevels, r_maxlevels) + 1; +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h new file mode 100644 index 00000000000000..63aabae2e09db1 --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -0,0 +1,82 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (c) 2018-2024 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#ifndef __XFS_RTRMAP_BTREE_H__ +#define __XFS_RTRMAP_BTREE_H__ + +struct xfs_buf; +struct xfs_btree_cur; +struct xfs_mount; +struct xbtree_ifakeroot; +struct xfs_rtgroup; + +/* rmaps only exist on crc enabled filesystems */ +#define XFS_RTRMAP_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN + +struct xfs_btree_cur *xfs_rtrmapbt_init_cursor(struct xfs_trans *tp, + struct xfs_rtgroup *rtg); +struct xfs_btree_cur *xfs_rtrmapbt_stage_cursor(struct xfs_mount *mp, + struct xfs_rtgroup *rtg, struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake); +void xfs_rtrmapbt_commit_staged_btree(struct xfs_btree_cur *cur, + struct xfs_trans *tp); +unsigned int xfs_rtrmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, + bool leaf); +void xfs_rtrmapbt_compute_maxlevels(struct xfs_mount *mp); + +/* + * Addresses of records, keys, and pointers within an incore rtrmapbt block. + * + * (note that some of these may appear unused, but they are used in userspace) + */ +static inline struct xfs_rmap_rec * +xfs_rtrmap_rec_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_rec *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + (index - 1) * sizeof(struct xfs_rmap_rec)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_key_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_high_key_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + sizeof(struct xfs_rmap_key) + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_ptr_addr( + struct xfs_btree_block *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrmap_ptr_t *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + maxrecs * 2 * sizeof(struct xfs_rmap_key) + + 
(index - 1) * sizeof(xfs_rtrmap_ptr_t)); +} + +unsigned int xfs_rtrmapbt_maxlevels_ondisk(void); + +int __init xfs_rtrmapbt_init_cur_cache(void); +void xfs_rtrmapbt_destroy_cur_cache(void); + +#endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 3b5623611eba02..83fb14b4074c8d 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -28,6 +28,7 @@ #include "xfs_rtbitmap.h" #include "xfs_exchrange.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" /* * Physical superblock buffer manipulations. Shared with libxfs in userspace. @@ -1215,6 +1216,11 @@ xfs_sb_mount_common( mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2; mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2; + mp->m_rtrmap_mxr[0] = xfs_rtrmapbt_maxrecs(mp, sbp->sb_blocksize, true); + mp->m_rtrmap_mxr[1] = xfs_rtrmapbt_maxrecs(mp, sbp->sb_blocksize, false); + mp->m_rtrmap_mnr[0] = mp->m_rtrmap_mxr[0] / 2; + mp->m_rtrmap_mnr[1] = mp->m_rtrmap_mxr[1] / 2; + mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, true); mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, false); mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2; diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index e7efdb9ceaf382..da23dac22c3f08 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -42,6 +42,7 @@ extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops; extern const struct xfs_buf_ops xfs_rtsummary_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; extern const struct xfs_buf_ops xfs_rtsb_buf_ops; +extern const struct xfs_buf_ops xfs_rtrmapbt_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops; extern const struct xfs_buf_ops xfs_symlink_buf_ops; @@ -55,6 +56,7 @@ extern const struct xfs_btree_ops xfs_bmbt_ops; extern const struct xfs_btree_ops xfs_refcountbt_ops; extern const struct xfs_btree_ops xfs_rmapbt_ops; extern const struct xfs_btree_ops 
xfs_rmapbt_mem_ops; +extern const struct xfs_btree_ops xfs_rtrmapbt_ops; static inline bool xfs_btree_is_bno(const struct xfs_btree_ops *ops) { @@ -100,6 +102,11 @@ static inline bool xfs_btree_is_mem_rmap(const struct xfs_btree_ops *ops) # define xfs_btree_is_mem_rmap(...) (false) #endif +static inline bool xfs_btree_is_rtrmap(const struct xfs_btree_ops *ops) +{ + return ops == &xfs_rtrmapbt_ops; +} + /* log size calculation functions */ int xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes); int xfs_log_calc_minimum_size(struct xfs_mount *); diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 97137126b16f5a..7b7d21b50d5409 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -37,6 +37,7 @@ #include "xfs_rtbitmap.h" #include "xfs_metafile.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" #include "scrub/stats.h" static DEFINE_MUTEX(xfs_uuid_table_mutex); @@ -655,8 +656,7 @@ static inline void xfs_rtbtree_compute_maxlevels( struct xfs_mount *mp) { - /* This will be filled in later. 
*/ - mp->m_rtbtree_maxlevels = 0; + mp->m_rtbtree_maxlevels = mp->m_rtrmap_maxlevels; } /* @@ -727,6 +727,7 @@ xfs_mountfs( xfs_bmap_compute_maxlevels(mp, XFS_ATTR_FORK); xfs_mount_setup_inode_geom(mp); xfs_rmapbt_compute_maxlevels(mp); + xfs_rtrmapbt_compute_maxlevels(mp); xfs_refcountbt_compute_maxlevels(mp); xfs_agbtree_compute_maxlevels(mp); diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index ddb9d19a3a3d53..1bc95fb170db61 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -158,11 +158,14 @@ typedef struct xfs_mount { uint m_bmap_dmnr[2]; /* min bmap btree records */ uint m_rmap_mxr[2]; /* max rmap btree records */ uint m_rmap_mnr[2]; /* min rmap btree records */ + uint m_rtrmap_mxr[2]; /* max rtrmap btree records */ + uint m_rtrmap_mnr[2]; /* min rtrmap btree records */ uint m_refc_mxr[2]; /* max refc btree records */ uint m_refc_mnr[2]; /* min refc btree records */ uint m_alloc_maxlevels; /* max alloc btree levels */ uint m_bm_maxlevels[2]; /* max bmap btree levels */ uint m_rmap_maxlevels; /* max rmap btree levels */ + uint m_rtrmap_maxlevels; /* max rtrmap btree level */ uint m_refc_maxlevels; /* max refcount btree level */ unsigned int m_agbtree_maxlevels; /* max level of all AG btrees */ unsigned int m_rtbtree_maxlevels; /* max level of all rt btrees */ @@ -399,6 +402,12 @@ static inline bool xfs_has_rtsb(struct xfs_mount *mp) return xfs_has_rtgroups(mp) && xfs_has_realtime(mp); } +static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp) +{ + return xfs_has_rtgroups(mp) && xfs_has_realtime(mp) && + xfs_has_rmapbt(mp); +} + /* * Some features are always on for v5 file systems, allow the compiler to * eliminiate dead code when building without v4 support. 
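The per-block capacity math in xfs_rtrmapbt_block_maxrecs() and the pointer placement in xfs_rtrmap_ptr_addr() from this patch can be sketched as a standalone userspace program. This is only an illustration, not kernel code: the sketch_* names are hypothetical, and the byte sizes are assumptions (72 for the long-format CRC block header, 24 for an rmap record, 20 for an rmap key, 8 for the __be64 pointer that this patch checks with XFS_CHECK_STRUCT_SIZE).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ondisk sizes in bytes; illustrative constants, not kernel headers. */
#define SKETCH_LBLOCK_CRC_LEN	72u	/* long-format btree block header with CRC */
#define SKETCH_RMAP_REC_LEN	24u	/* startblock + blockcount + owner + offset */
#define SKETCH_RMAP_KEY_LEN	20u	/* startblock + owner + offset, packed */
#define SKETCH_RTRMAP_PTR_LEN	8u	/* 64-bit block pointer */

/* Leaf records, or node key-pair/pointer entries, that fit in blocklen bytes. */
static unsigned int
sketch_block_maxrecs(unsigned int blocklen, bool leaf)
{
	if (leaf)
		return blocklen / SKETCH_RMAP_REC_LEN;
	/* Overlapping btree: every pointer carries a low key and a high key. */
	return blocklen / (2 * SKETCH_RMAP_KEY_LEN + SKETCH_RTRMAP_PTR_LEN);
}

/* Same calculation for a whole block, after subtracting the block header. */
static unsigned int
sketch_maxrecs(unsigned int blocksize, bool leaf)
{
	assert(blocksize > SKETCH_LBLOCK_CRC_LEN);
	return sketch_block_maxrecs(blocksize - SKETCH_LBLOCK_CRC_LEN, leaf);
}

/* Byte offset of the index'th (1-based) pointer within a node block. */
static unsigned int
sketch_ptr_offset(unsigned int index, unsigned int maxrecs)
{
	/* Pointers start after the header and the full array of key pairs. */
	return SKETCH_LBLOCK_CRC_LEN +
	       maxrecs * 2 * SKETCH_RMAP_KEY_LEN +
	       (index - 1) * SKETCH_RTRMAP_PTR_LEN;
}
```

Under these assumed sizes, a 4096-byte block leaves 4024 bytes of payload, so a leaf holds 4024 / 24 = 167 records and a node holds 4024 / 48 = 83 key-pair/pointer entries. The pointer array then begins at byte 72 + 83 * 40 = 3392, and the last pointer still ends inside the block.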
diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c index ffb52725c2a8e8..f94fb70b524ffb 100644 --- a/fs/xfs/xfs_stats.c +++ b/fs/xfs/xfs_stats.c @@ -52,7 +52,8 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf) { "rmapbt", xfsstats_offset(xs_refcbt_2) }, { "refcntbt", xfsstats_offset(xs_rmap_mem_2) }, { "rmapbt_mem", xfsstats_offset(xs_rcbag_2) }, - { "rcbagbt", xfsstats_offset(xs_qm_dqreclaims)}, + { "rcbagbt", xfsstats_offset(xs_rtrmap_2) }, + { "rtrmapbt", xfsstats_offset(xs_qm_dqreclaims)}, /* we print both series of quota information together */ { "qm", xfsstats_offset(xs_xstrat_bytes)}, }; diff --git a/fs/xfs/xfs_stats.h b/fs/xfs/xfs_stats.h index a61fb56ed2e66c..05dc69c6d94906 100644 --- a/fs/xfs/xfs_stats.h +++ b/fs/xfs/xfs_stats.h @@ -127,6 +127,7 @@ struct __xfsstats { uint32_t xs_refcbt_2[__XBTS_MAX]; uint32_t xs_rmap_mem_2[__XBTS_MAX]; uint32_t xs_rcbag_2[__XBTS_MAX]; + uint32_t xs_rtrmap_2[__XBTS_MAX]; uint32_t xs_qm_dqreclaims; uint32_t xs_qm_dqreclaim_misses; uint32_t xs_qm_dquot_dups; From patchwork Fri Dec 13 01:01:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906197
Date: Thu, 12 Dec 2024 17:01:53 -0800 Subject: [PATCH 05/37] xfs: realtime rmap btree transaction reservations From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123399.1181370.1254278860717277218.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> From: Darrick J. Wong Make sure that there's enough log reservation to handle mapping and unmapping realtime extents. We have to reserve enough space to handle a split in the rtrmapbt to add the record and a second split in the regular rmapbt to record the rtrmapbt split. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_exchmaps.c | 4 +++- fs/xfs/libxfs/xfs_trans_resv.c | 12 ++++++++++-- fs/xfs/libxfs/xfs_trans_space.h | 13 +++++++++++++ 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_exchmaps.c b/fs/xfs/libxfs/xfs_exchmaps.c index 2021396651de27..3f1d6a98c11819 100644 --- a/fs/xfs/libxfs/xfs_exchmaps.c +++ b/fs/xfs/libxfs/xfs_exchmaps.c @@ -662,7 +662,9 @@ xfs_exchmaps_rmapbt_blocks( if (!xfs_has_rmapbt(mp)) return 0; if (XFS_IS_REALTIME_INODE(req->ip1)) - return 0; + return howmany_64(req->nr_exchanges, + XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)) * + XFS_RTRMAPADD_SPACE_RES(mp); return howmany_64(req->nr_exchanges, XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)) * diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index bab402340b5da8..f3392eb2d7f41f 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -213,7 +213,9 @@ xfs_calc_inode_chunk_res( * Per-extent log 
reservation for the btree changes involved in freeing or * allocating a realtime extent. We have to be able to log as many rtbitmap * blocks as needed to mark inuse XFS_BMBT_MAX_EXTLEN blocks' worth of realtime - * extents, as well as the realtime summary block. + * extents, as well as the realtime summary block (t1). Realtime rmap btree + * operations happen in a second transaction, so factor in a couple of rtrmapbt + * splits (t2). */ static unsigned int xfs_rtalloc_block_count( @@ -222,10 +224,16 @@ xfs_rtalloc_block_count( { unsigned int rtbmp_blocks; xfs_rtxlen_t rtxlen; + unsigned int t1, t2 = 0; rtxlen = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN); rtbmp_blocks = xfs_rtbitmap_blockcount_len(mp, rtxlen); - return (rtbmp_blocks + 1) * num_ops; + t1 = (rtbmp_blocks + 1) * num_ops; + + if (xfs_has_rmapbt(mp)) + t2 = num_ops * (2 * mp->m_rtrmap_maxlevels - 1); + + return max(t1, t2); } /* diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h index 1155ff2d37e29f..d89b570aafcc64 100644 --- a/fs/xfs/libxfs/xfs_trans_space.h +++ b/fs/xfs/libxfs/xfs_trans_space.h @@ -14,6 +14,19 @@ #define XFS_MAX_CONTIG_BMAPS_PER_BLOCK(mp) \ (((mp)->m_bmap_dmxr[0]) - ((mp)->m_bmap_dmnr[0])) +/* Worst case number of realtime rmaps that can be held in a block. */ +#define XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp) \ + (((mp)->m_rtrmap_mxr[0]) - ((mp)->m_rtrmap_mnr[0])) + +/* Adding one realtime rmap could split every level to the top of the tree. */ +#define XFS_RTRMAPADD_SPACE_RES(mp) ((mp)->m_rtrmap_maxlevels) + +/* Blocks we might need to add "b" realtime rmaps to a tree. */ +#define XFS_NRTRMAPADD_SPACE_RES(mp, b) \ + ((((b) + XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp) - 1) / \ + XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)) * \ + XFS_RTRMAPADD_SPACE_RES(mp)) + /* Worst case number of rmaps that can be held in a block. 
*/ #define XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp) \ (((mp)->m_rmap_mxr[0]) - ((mp)->m_rmap_mnr[0])) From patchwork Fri Dec 13 01:02:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13906198
Date: Thu, 12 Dec 2024 17:02:08 -0800 Subject: [PATCH 06/37] xfs: add realtime rmap btree operations From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123416.1181370.6765657366135349371.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> From: Darrick J. Wong Implement the generic btree operations needed to manipulate rtrmap btree blocks. This is different from the regular rmapbt in that we allocate space from the filesystem at large, and are neither constrained to the free space nor any particular AG. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_btree.c | 68 ++++++++++ fs/xfs/libxfs/xfs_btree.h | 6 + fs/xfs/libxfs/xfs_rtrmap_btree.c | 271 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 345 insertions(+) diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 0e271919374780..36ab06f8a3bc99 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -31,6 +31,10 @@ #include "xfs_buf_mem.h" #include "xfs_btree_mem.h" #include "xfs_rtrmap_btree.h" +#include "xfs_bmap.h" +#include "xfs_rmap.h" +#include "xfs_quota.h" +#include "xfs_metafile.h" /* * Btree magic numbers. @@ -5576,3 +5580,67 @@ xfs_btree_goto_left_edge( return 0; } + +/* Allocate a block for an inode-rooted metadata btree. 
*/ +int +xfs_btree_alloc_metafile_block( + struct xfs_btree_cur *cur, + const union xfs_btree_ptr *start, + union xfs_btree_ptr *new, + int *stat) +{ + struct xfs_alloc_arg args = { + .mp = cur->bc_mp, + .tp = cur->bc_tp, + .resv = XFS_AG_RESV_METAFILE, + .minlen = 1, + .maxlen = 1, + .prod = 1, + }; + struct xfs_inode *ip = cur->bc_ino.ip; + int error; + + ASSERT(xfs_is_metadir_inode(ip)); + + xfs_rmap_ino_bmbt_owner(&args.oinfo, ip->i_ino, cur->bc_ino.whichfork); + error = xfs_alloc_vextent_start_ag(&args, + XFS_INO_TO_FSB(cur->bc_mp, ip->i_ino)); + if (error) + return error; + if (args.fsbno == NULLFSBLOCK) { + *stat = 0; + return 0; + } + ASSERT(args.len == 1); + + xfs_metafile_resv_alloc_space(ip, &args); + + new->l = cpu_to_be64(args.fsbno); + *stat = 1; + return 0; +} + +/* Free a block from an inode-rooted metadata btree. */ +int +xfs_btree_free_metafile_block( + struct xfs_btree_cur *cur, + struct xfs_buf *bp) +{ + struct xfs_owner_info oinfo; + struct xfs_mount *mp = cur->bc_mp; + struct xfs_inode *ip = cur->bc_ino.ip; + struct xfs_trans *tp = cur->bc_tp; + xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); + int error; + + ASSERT(xfs_is_metadir_inode(ip)); + + xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork); + error = xfs_free_extent_later(tp, fsbno, 1, &oinfo, XFS_AG_RESV_METAFILE, + 0); + if (error) + return error; + + xfs_metafile_resv_free_space(ip, tp, 1); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index 3b8c2ccad90847..ee82dc777d6d5b 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -703,4 +703,10 @@ xfs_btree_at_iroot( level == cur->bc_nlevels - 1; } +int xfs_btree_alloc_metafile_block(struct xfs_btree_cur *cur, + const union xfs_btree_ptr *start, union xfs_btree_ptr *newp, + int *stat); +int xfs_btree_free_metafile_block(struct xfs_btree_cur *cur, + struct xfs_buf *bp); + #endif /* __XFS_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c 
b/fs/xfs/libxfs/xfs_rtrmap_btree.c index d3e4c52dcaa9d0..99d828bb5fe7c3 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -18,12 +18,14 @@ #include "xfs_alloc.h" #include "xfs_btree.h" #include "xfs_btree_staging.h" +#include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" #include "xfs_trace.h" #include "xfs_cksum.h" #include "xfs_error.h" #include "xfs_extent_busy.h" #include "xfs_rtgroup.h" +#include "xfs_bmap.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -44,6 +46,182 @@ xfs_rtrmapbt_dup_cursor( return xfs_rtrmapbt_init_cursor(cur->bc_tp, to_rtg(cur->bc_group)); } +STATIC int +xfs_rtrmapbt_get_minrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrmapbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0) / 2; + } + + return cur->bc_mp->m_rtrmap_mnr[level != 0]; +} + +STATIC int +xfs_rtrmapbt_get_maxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrmapbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0); + } + + return cur->bc_mp->m_rtrmap_mxr[level != 0]; +} + +/* + * Convert the ondisk record's offset field into the ondisk key's offset field. + * Fork and bmbt are significant parts of the rmap record key, but written + * status is merely a record attribute. 
+ */ +static inline __be64 ondisk_rec_offset_to_key(const union xfs_btree_rec *rec) +{ + return rec->rmap.rm_offset & ~cpu_to_be64(XFS_RMAP_OFF_UNWRITTEN); +} + +STATIC void +xfs_rtrmapbt_init_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + key->rmap.rm_startblock = rec->rmap.rm_startblock; + key->rmap.rm_owner = rec->rmap.rm_owner; + key->rmap.rm_offset = ondisk_rec_offset_to_key(rec); +} + +STATIC void +xfs_rtrmapbt_init_high_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + uint64_t off; + int adj; + + adj = be32_to_cpu(rec->rmap.rm_blockcount) - 1; + + key->rmap.rm_startblock = rec->rmap.rm_startblock; + be32_add_cpu(&key->rmap.rm_startblock, adj); + key->rmap.rm_owner = rec->rmap.rm_owner; + key->rmap.rm_offset = ondisk_rec_offset_to_key(rec); + if (XFS_RMAP_NON_INODE_OWNER(be64_to_cpu(rec->rmap.rm_owner)) || + XFS_RMAP_IS_BMBT_BLOCK(be64_to_cpu(rec->rmap.rm_offset))) + return; + off = be64_to_cpu(key->rmap.rm_offset); + off = (XFS_RMAP_OFF(off) + adj) | (off & ~XFS_RMAP_OFF_MASK); + key->rmap.rm_offset = cpu_to_be64(off); +} + +STATIC void +xfs_rtrmapbt_init_rec_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_rec *rec) +{ + rec->rmap.rm_startblock = cpu_to_be32(cur->bc_rec.r.rm_startblock); + rec->rmap.rm_blockcount = cpu_to_be32(cur->bc_rec.r.rm_blockcount); + rec->rmap.rm_owner = cpu_to_be64(cur->bc_rec.r.rm_owner); + rec->rmap.rm_offset = cpu_to_be64( + xfs_rmap_irec_offset_pack(&cur->bc_rec.r)); +} + +STATIC void +xfs_rtrmapbt_init_ptr_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr) +{ + ptr->l = 0; +} + +/* + * Mask the appropriate parts of the ondisk key field for a key comparison. + * Fork and bmbt are significant parts of the rmap record key, but written + * status is merely a record attribute. 
+ */
+static inline uint64_t offset_keymask(uint64_t offset)
+{
+	return offset & ~XFS_RMAP_OFF_UNWRITTEN;
+}
+
+STATIC int64_t
+xfs_rtrmapbt_key_diff(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key)
+{
+	struct xfs_rmap_irec		*rec = &cur->bc_rec.r;
+	const struct xfs_rmap_key	*kp = &key->rmap;
+	__u64				x, y;
+	int64_t				d;
+
+	d = (int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock;
+	if (d)
+		return d;
+
+	x = be64_to_cpu(kp->rm_owner);
+	y = rec->rm_owner;
+	if (x > y)
+		return 1;
+	else if (y > x)
+		return -1;
+
+	x = offset_keymask(be64_to_cpu(kp->rm_offset));
+	y = offset_keymask(xfs_rmap_irec_offset_pack(rec));
+	if (x > y)
+		return 1;
+	else if (y > x)
+		return -1;
+	return 0;
+}
+
+STATIC int64_t
+xfs_rtrmapbt_diff_two_keys(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*k1,
+	const union xfs_btree_key	*k2,
+	const union xfs_btree_key	*mask)
+{
+	const struct xfs_rmap_key	*kp1 = &k1->rmap;
+	const struct xfs_rmap_key	*kp2 = &k2->rmap;
+	int64_t				d;
+	__u64				x, y;
+
+	/* Doesn't make sense to mask off the physical space part */
+	ASSERT(!mask || mask->rmap.rm_startblock);
+
+	d = (int64_t)be32_to_cpu(kp1->rm_startblock) -
+		     be32_to_cpu(kp2->rm_startblock);
+	if (d)
+		return d;
+
+	if (!mask || mask->rmap.rm_owner) {
+		x = be64_to_cpu(kp1->rm_owner);
+		y = be64_to_cpu(kp2->rm_owner);
+		if (x > y)
+			return 1;
+		else if (y > x)
+			return -1;
+	}
+
+	if (!mask || mask->rmap.rm_offset) {
+		/* Doesn't make sense to allow offset but not owner */
+		ASSERT(!mask || mask->rmap.rm_owner);
+
+		x = offset_keymask(be64_to_cpu(kp1->rm_offset));
+		y = offset_keymask(be64_to_cpu(kp2->rm_offset));
+		if (x > y)
+			return 1;
+		else if (y > x)
+			return -1;
+	}
+
+	return 0;
+}
+
 static xfs_failaddr_t
 xfs_rtrmapbt_verify(
 	struct xfs_buf		*bp)
@@ -110,6 +288,86 @@ const struct xfs_buf_ops xfs_rtrmapbt_buf_ops = {
 	.verify_struct		= xfs_rtrmapbt_verify,
 };
 
+STATIC int
+xfs_rtrmapbt_keys_inorder(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*k1,
+	const union xfs_btree_key	*k2)
+{
+	uint32_t			x;
+	uint32_t			y;
+	uint64_t			a;
+	uint64_t			b;
+
+	x = be32_to_cpu(k1->rmap.rm_startblock);
+	y = be32_to_cpu(k2->rmap.rm_startblock);
+	if (x < y)
+		return 1;
+	else if (x > y)
+		return 0;
+	a = be64_to_cpu(k1->rmap.rm_owner);
+	b = be64_to_cpu(k2->rmap.rm_owner);
+	if (a < b)
+		return 1;
+	else if (a > b)
+		return 0;
+	a = offset_keymask(be64_to_cpu(k1->rmap.rm_offset));
+	b = offset_keymask(be64_to_cpu(k2->rmap.rm_offset));
+	if (a <= b)
+		return 1;
+	return 0;
+}
+
+STATIC int
+xfs_rtrmapbt_recs_inorder(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_rec	*r1,
+	const union xfs_btree_rec	*r2)
+{
+	uint32_t			x;
+	uint32_t			y;
+	uint64_t			a;
+	uint64_t			b;
+
+	x = be32_to_cpu(r1->rmap.rm_startblock);
+	y = be32_to_cpu(r2->rmap.rm_startblock);
+	if (x < y)
+		return 1;
+	else if (x > y)
+		return 0;
+	a = be64_to_cpu(r1->rmap.rm_owner);
+	b = be64_to_cpu(r2->rmap.rm_owner);
+	if (a < b)
+		return 1;
+	else if (a > b)
+		return 0;
+	a = offset_keymask(be64_to_cpu(r1->rmap.rm_offset));
+	b = offset_keymask(be64_to_cpu(r2->rmap.rm_offset));
+	if (a <= b)
+		return 1;
+	return 0;
+}
+
+STATIC enum xbtree_key_contig
+xfs_rtrmapbt_keys_contiguous(
+	struct xfs_btree_cur		*cur,
+	const union xfs_btree_key	*key1,
+	const union xfs_btree_key	*key2,
+	const union xfs_btree_key	*mask)
+{
+	ASSERT(!mask || mask->rmap.rm_startblock);
+
+	/*
+	 * We only support checking contiguity of the physical space component.
+	 * If any callers ever need more specificity than that, they'll have to
+	 * implement it here.
+	 */
+	ASSERT(!mask || (!mask->rmap.rm_owner && !mask->rmap.rm_offset));
+
+	return xbtree_key_contig(be32_to_cpu(key1->rmap.rm_startblock),
+				 be32_to_cpu(key2->rmap.rm_startblock));
+}
+
 const struct xfs_btree_ops xfs_rtrmapbt_ops = {
 	.name			= "rtrmap",
 	.type			= XFS_BTREE_TYPE_INODE,
@@ -125,7 +383,20 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = {
 	.statoff		= XFS_STATS_CALC_INDEX(xs_rtrmap_2),
 
 	.dup_cursor		= xfs_rtrmapbt_dup_cursor,
+	.alloc_block		= xfs_btree_alloc_metafile_block,
+	.free_block		= xfs_btree_free_metafile_block,
+	.get_minrecs		= xfs_rtrmapbt_get_minrecs,
+	.get_maxrecs		= xfs_rtrmapbt_get_maxrecs,
+	.init_key_from_rec	= xfs_rtrmapbt_init_key_from_rec,
+	.init_high_key_from_rec	= xfs_rtrmapbt_init_high_key_from_rec,
+	.init_rec_from_cur	= xfs_rtrmapbt_init_rec_from_cur,
+	.init_ptr_from_cur	= xfs_rtrmapbt_init_ptr_from_cur,
+	.key_diff		= xfs_rtrmapbt_key_diff,
 	.buf_ops		= &xfs_rtrmapbt_buf_ops,
+	.diff_two_keys		= xfs_rtrmapbt_diff_two_keys,
+	.keys_inorder		= xfs_rtrmapbt_keys_inorder,
+	.recs_inorder		= xfs_rtrmapbt_recs_inorder,
+	.keys_contiguous	= xfs_rtrmapbt_keys_contiguous,
 };
 
 /* Allocate a new rt rmap btree cursor. */

From patchwork Fri Dec 13 01:02:24 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13906199
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7FA64184
	for ; Fri, 13 Dec 2024 01:02:25 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
	arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1734051745; cv=none;
	b=ZEuGWk6LBnr2jBaPqcTxeVUwHRjMXqja+KRyo5oayybM2VIn+cGFys/PO7szvPyjhMbOXuN4Qm0zJk5ZiRfR/HYWDwoB26DeiaKGo7so8Ce6HgZ0I3weWsIRdIQYYZIPY6ML65PLuWT2tqf9GxUvfb99iLvqQQbRQsfjPtsKlf4=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1734051745; c=relaxed/simple;
	bh=MrPwh5YXl0mg5u2sB24zCXUqXlWx2DCpOS2mF+uH4Sg=;
	h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
	b=SgR4TQKfCYRLeP5c/+7sWjwZC6uz92h1Cb3PIHpvI7xhWpyuahWtnJiuTdgRBod00wVE6/M/Tt0kLvhtA1aX3qvIuu6KN5KfRtvwVC7vyrhkh6SAAKACJuHfk8eF0QPDv0TpcgyR9ITCgUvAPLYD7iUqIL6+2UEU4NQevZ5nzFc=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=H3vYcAUk;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="H3vYcAUk"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 099AFC4CECE;
	Fri, 13 Dec 2024 01:02:25 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1734051745;
	bh=MrPwh5YXl0mg5u2sB24zCXUqXlWx2DCpOS2mF+uH4Sg=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=H3vYcAUkLhF66O5s4hDB2tWWKeSEYBYguv+Ixf4ZN2FKtTX3y/uZagt8m2bFmX/lD
	 XLT/qYGe62qdiuMO4OTT1tYXnsIkhWdzXirAzgsuJogUXnzL9fvzqWwqA/q0Fe0dTn
	 nfOilgypv5K30GTPA9wzHmfHw8kmyDl1DnbqOAnD+WWtUQaNK90OOS2SDpnx4TB787
	 kSfPVy8h5QuJHW069tza1C3jAKtED1khF6BX0/uAJgYk6SX0Jpq6oD0047KRZs1Goq
	 g/wZ2bro1sjMKxsmLcE4q+y8EcURbLaeikhkPNqlXEk6ASKl6XYNBQ7KzFaxJCTIjx
	 Mc8o3gP5CQ5LQ==
Date: Thu, 12 Dec 2024 17:02:24 -0800
Subject: [PATCH 07/37] xfs: prepare rmap functions to deal with rtrmapbt
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173405123434.1181370.2967234809192965405.stgit@frogsfrogsfrogs>
In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions. Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high level
code.

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 fs/xfs/libxfs/xfs_rmap.c    |   63 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rmap.h    |    3 ++
 fs/xfs/libxfs/xfs_rtgroup.h |   26 ++++++++++++++++++
 3 files changed, 92 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 57dbf99ce00453..da1b004837d3ad 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -25,6 +25,7 @@
 #include "xfs_ag.h"
 #include "xfs_health.h"
 #include "xfs_rmap_item.h"
+#include "xfs_rtgroup.h"
 
 struct kmem_cache	*xfs_rmap_intent_cache;
 
@@ -264,11 +265,69 @@ xfs_rmap_check_irec(
 	return NULL;
 }
 
+static xfs_failaddr_t
+xfs_rtrmap_check_meta_irec(
+	struct xfs_rtgroup		*rtg,
+	const struct xfs_rmap_irec	*irec)
+{
+	struct xfs_mount		*mp = rtg_mount(rtg);
+
+	if (irec->rm_offset != 0)
+		return __this_address;
+	if (irec->rm_flags & XFS_RMAP_UNWRITTEN)
+		return __this_address;
+
+	switch (irec->rm_owner) {
+	case XFS_RMAP_OWN_FS:
+		if (irec->rm_startblock != 0)
+			return __this_address;
+		if (irec->rm_blockcount != mp->m_sb.sb_rextsize)
+			return __this_address;
+		return NULL;
+	default:
+		return __this_address;
+	}
+
+	return NULL;
+}
+
+static xfs_failaddr_t
+xfs_rtrmap_check_inode_irec(
+	struct xfs_rtgroup		*rtg,
+	const struct xfs_rmap_irec	*irec)
+{
+	struct xfs_mount		*mp = rtg_mount(rtg);
+
+	if (!xfs_verify_ino(mp, irec->rm_owner))
+		return __this_address;
+	if (!xfs_verify_rgbext(rtg, irec->rm_startblock, irec->rm_blockcount))
+		return __this_address;
+	if (!xfs_verify_fileext(mp, irec->rm_offset, irec->rm_blockcount))
+		return __this_address;
+	return NULL;
+}
+
+xfs_failaddr_t
+xfs_rtrmap_check_irec(
+	struct xfs_rtgroup		*rtg,
+	const struct xfs_rmap_irec	*irec)
+{
+	if (irec->rm_blockcount == 0)
+		return __this_address;
+	if (irec->rm_flags & (XFS_RMAP_BMBT_BLOCK | XFS_RMAP_ATTR_FORK))
+		return __this_address;
+	if (XFS_RMAP_NON_INODE_OWNER(irec->rm_owner))
+		return xfs_rtrmap_check_meta_irec(rtg, irec);
+	return xfs_rtrmap_check_inode_irec(rtg, irec);
+}
+
 static inline xfs_failaddr_t
 xfs_rmap_check_btrec(
 	struct xfs_btree_cur		*cur,
 	const struct xfs_rmap_irec	*irec)
 {
+	if (xfs_btree_is_rtrmap(cur->bc_ops))
+		return xfs_rtrmap_check_irec(to_rtg(cur->bc_group), irec);
 	return xfs_rmap_check_irec(to_perag(cur->bc_group), irec);
 }
 
@@ -283,6 +342,10 @@ xfs_rmap_complain_bad_rec(
 	if (xfs_btree_is_mem_rmap(cur->bc_ops))
 		xfs_warn(mp,
  "In-Memory Reverse Mapping BTree record corruption detected at %pS!", fa);
+	else if (xfs_btree_is_rtrmap(cur->bc_ops))
+		xfs_warn(mp,
+ "RT Reverse Mapping BTree record corruption in rtgroup %u detected at %pS!",
+				cur->bc_group->xg_gno, fa);
 	else
 		xfs_warn(mp,
  "Reverse Mapping BTree record corruption in AG %d detected at %pS!",
diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h
index 8e2657af038e9e..1b19f54b65047f 100644
--- a/fs/xfs/libxfs/xfs_rmap.h
+++ b/fs/xfs/libxfs/xfs_rmap.h
@@ -7,6 +7,7 @@
 #define __XFS_RMAP_H__
 
 struct xfs_perag;
+struct xfs_rtgroup;
 
 static inline void
 xfs_rmap_ino_bmbt_owner(
@@ -206,6 +207,8 @@ xfs_failaddr_t xfs_rmap_btrec_to_irec(const union xfs_btree_rec *rec,
 		struct xfs_rmap_irec *irec);
 xfs_failaddr_t xfs_rmap_check_irec(struct xfs_perag *pag,
 		const struct xfs_rmap_irec *irec);
+xfs_failaddr_t xfs_rtrmap_check_irec(struct xfs_rtgroup *rtg,
+		const struct xfs_rmap_irec *irec);
 
 int xfs_rmap_has_records(struct xfs_btree_cur *cur, xfs_agblock_t bno,
 		xfs_extlen_t len, enum xbtree_recpacking *outcome);
diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h
index 19f8d302b9aa3f..dc3ce660a01307 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.h
+++ b/fs/xfs/libxfs/xfs_rtgroup.h
@@ -132,6 +132,32 @@ xfs_rtgroup_next(
 	return xfs_rtgroup_next_range(mp, rtg, 0, mp->m_sb.sb_rgcount - 1);
 }
 
+static inline bool
+xfs_verify_rgbno(
+	struct xfs_rtgroup	*rtg,
+	xfs_rgblock_t		rgbno)
+{
+	ASSERT(xfs_has_rtgroups(rtg_mount(rtg)));
+
+	return xfs_verify_gbno(rtg_group(rtg), rgbno);
+}
+
+/*
+ * Check that [@rgbno,@len] is a valid extent range in @rtg.
+ *
+ * Must only be used for RTG-enabled file systems.
+ */
+static inline bool
+xfs_verify_rgbext(
+	struct xfs_rtgroup	*rtg,
+	xfs_rgblock_t		rgbno,
+	xfs_extlen_t		len)
+{
+	ASSERT(xfs_has_rtgroups(rtg_mount(rtg)));
+
+	return xfs_verify_gbext(rtg_group(rtg), rgbno, len);
+}
+
 static inline xfs_rtblock_t
 xfs_rgbno_to_rtb(
 	struct xfs_rtgroup	*rtg,

From patchwork Fri Dec 13 01:02:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13906200
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id D409317BA1
	for ; Fri, 13 Dec 2024 01:02:40 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
	arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1734051760; cv=none;
	b=Ll8gn0HSS5qiuru/vhEwkRwWEg5m1yDRE9Ob52aXLqax70F1ax3mz1zUF6lzPXjp52pop2K4w6185i3XnNxnO5+2uUffBBGKCa60Hz+1FzEBsNOsilf1sLkKRAYKAFa9DemiL8ILC1XIJWpTe75lZokY2XpqKe7VClVh9zAarHA=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1734051760; c=relaxed/simple;
	bh=aB/V927Wy9Q/K38oAIIEzG+x5GLZK8pKirIgZwQ20Yg=;
	h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
	b=DXmDBeHSP9f9mMPxmKkbulphUphR6fjvWXDtY1raSJvsifK/S08TeKfu1Y5SwV78DRwjsk9e25rG2roifxmf6mEPj9AWdDMjjapfQX7w5GkTCJuF8+SiCuWo4iHv0fJ7N9CsXduslAS5Ezn2aBY3vwb8386pRXT3eZlfV8Wy6VI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=LGASlexP;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="LGASlexP"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id A6D65C4CECE;
	Fri, 13 Dec 2024 01:02:40 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1734051760;
	bh=aB/V927Wy9Q/K38oAIIEzG+x5GLZK8pKirIgZwQ20Yg=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=LGASlexPysQKPTsz9vf67coieNN5n2x9qvHPwLDLPNSyxjWLj3VJ1V6PocY7fdwV+
	 9GClrtYddBd/XsbNNSIQL9t49xb10WP7x7hGMMbzuVdJpJ9mg06hPkQdV7MFqfIn5/
	 x6Y9sIF4jNFJC5g5pH5Vrjuzvuqg6XFRx1dif37DyY7FDZQ4TyQV8gdl8kPEQJglxk
	 Dw74rEfLzaPu1wxX1zvdxVfpJjt7fQPBemyzWc3wLG5J44AMaKa7pxtyBiTn2ysI3J
	 czJ+H/0dWv+8toEi6H9lHjfDc4NKmdlpoJlbV3U/iH5s2MbYWnBzB68rpQFXSb2quB
	 AseosvStvCRDQ==
Date: Thu, 12 Dec 2024 17:02:40 -0800
Subject: [PATCH 08/37] xfs: add a realtime flag to the rmap update log redo items
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: hch@lst.de, linux-xfs@vger.kernel.org
Message-ID: <173405123451.1181370.1057797621602382553.stgit@frogsfrogsfrogs>
In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Extend the rmap update (RUI) log items to handle realtime volumes by
adding a new log intent item type.

Signed-off-by: "Darrick J. Wong"
Reviewed-by: Christoph Hellwig
---
 fs/xfs/libxfs/xfs_defer.h       |    1 
 fs/xfs/libxfs/xfs_log_format.h  |    6 +
 fs/xfs/libxfs/xfs_log_recover.h |    2 
 fs/xfs/libxfs/xfs_refcount.c    |    4 -
 fs/xfs/libxfs/xfs_rmap.c        |   17 ++-
 fs/xfs/libxfs/xfs_rmap.h        |    5 +
 fs/xfs/scrub/alloc_repair.c     |    2 
 fs/xfs/xfs_log_recover.c        |    2 
 fs/xfs/xfs_rmap_item.c          |  197 +++++++++++++++++++++++++++++++++++++--
 9 files changed, 213 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index ec51b8465e61cb..1e2477eaa5a844 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -69,6 +69,7 @@ struct xfs_defer_op_type {
 extern const struct xfs_defer_op_type xfs_bmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
 extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
+extern const struct xfs_defer_op_type xfs_rtrmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
 extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
 extern const struct xfs_defer_op_type xfs_rtextent_free_defer_type;
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 15dec19b6c32ad..a7e0e479454d3d 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -250,6 +250,8 @@ typedef struct xfs_trans_header {
 #define	XFS_LI_XMD		0x1249	/* mapping exchange done */
 #define	XFS_LI_EFI_RT		0x124a	/* realtime extent free intent */
 #define	XFS_LI_EFD_RT		0x124b	/* realtime extent free done */
+#define	XFS_LI_RUI_RT		0x124c	/* realtime rmap update intent */
+#define	XFS_LI_RUD_RT		0x124d	/* realtime rmap update done */
 
 #define XFS_LI_TYPE_DESC \
 	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
@@ -271,7 +273,9 @@ typedef struct xfs_trans_header {
 	{ XFS_LI_XMI,		"XFS_LI_XMI" }, \
 	{ XFS_LI_XMD,		"XFS_LI_XMD" }, \
 	{ XFS_LI_EFI_RT,	"XFS_LI_EFI_RT" }, \
-	{ XFS_LI_EFD_RT,	"XFS_LI_EFD_RT" }
+	{ XFS_LI_EFD_RT,	"XFS_LI_EFD_RT" }, \
+	{ XFS_LI_RUI_RT,	"XFS_LI_RUI_RT" }, \
+	{ XFS_LI_RUD_RT,	"XFS_LI_RUD_RT" }
 
 /*
  * Inode Log Item Format definitions.
diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 5397a8ff004df8..abc705aff26dfe 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -79,6 +79,8 @@ extern const struct xlog_recover_item_ops xlog_xmi_item_ops;
 extern const struct xlog_recover_item_ops xlog_xmd_item_ops;
 extern const struct xlog_recover_item_ops xlog_rtefi_item_ops;
 extern const struct xlog_recover_item_ops xlog_rtefd_item_ops;
+extern const struct xlog_recover_item_ops xlog_rtrui_item_ops;
+extern const struct xlog_recover_item_ops xlog_rtrud_item_ops;
 
 /*
  * Macros, structures, prototypes for internal log manager use.
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 26d3d7956e069d..bbb86dc9a25c7f 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1831,7 +1831,7 @@ xfs_refcount_alloc_cow_extent(
 	__xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len);
 
 	/* Add rmap entry */
-	xfs_rmap_alloc_extent(tp, fsb, len, XFS_RMAP_OWN_COW);
+	xfs_rmap_alloc_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW);
 }
 
 /* Forget a CoW staging event in the refcount btree. */
@@ -1847,7 +1847,7 @@ xfs_refcount_free_cow_extent(
 		return;
 
 	/* Remove rmap entry */
-	xfs_rmap_free_extent(tp, fsb, len, XFS_RMAP_OWN_COW);
+	xfs_rmap_free_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW);
 	__xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len);
 }
 
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index da1b004837d3ad..8d3cea90c7cd04 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -2710,6 +2710,7 @@ __xfs_rmap_add(
 	struct xfs_trans		*tp,
 	enum xfs_rmap_intent_type	type,
 	uint64_t			owner,
+	bool				isrt,
 	int				whichfork,
 	struct xfs_bmbt_irec		*bmap)
 {
@@ -2721,6 +2722,7 @@ __xfs_rmap_add(
 	ri->ri_owner = owner;
 	ri->ri_whichfork = whichfork;
 	ri->ri_bmap = *bmap;
+	ri->ri_realtime = isrt;
 
 	xfs_rmap_defer_add(tp, ri);
 }
@@ -2734,6 +2736,7 @@ xfs_rmap_map_extent(
 	struct xfs_bmbt_irec	*PREV)
 {
 	enum xfs_rmap_intent_type type = XFS_RMAP_MAP;
+	bool			isrt = xfs_ifork_is_realtime(ip, whichfork);
 
 	if (!xfs_rmap_update_is_needed(tp->t_mountp, whichfork))
 		return;
@@ -2741,7 +2744,7 @@ xfs_rmap_map_extent(
 	if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip))
 		type = XFS_RMAP_MAP_SHARED;
 
-	__xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV);
+	__xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV);
 }
 
 /* Unmap an extent out of a file. */
@@ -2753,6 +2756,7 @@ xfs_rmap_unmap_extent(
 	struct xfs_bmbt_irec	*PREV)
 {
 	enum xfs_rmap_intent_type type = XFS_RMAP_UNMAP;
+	bool			isrt = xfs_ifork_is_realtime(ip, whichfork);
 
 	if (!xfs_rmap_update_is_needed(tp->t_mountp, whichfork))
 		return;
@@ -2760,7 +2764,7 @@ xfs_rmap_unmap_extent(
 	if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip))
 		type = XFS_RMAP_UNMAP_SHARED;
 
-	__xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV);
+	__xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV);
 }
 
 /*
@@ -2778,6 +2782,7 @@ xfs_rmap_convert_extent(
 	struct xfs_bmbt_irec	*PREV)
 {
 	enum xfs_rmap_intent_type type = XFS_RMAP_CONVERT;
+	bool			isrt = xfs_ifork_is_realtime(ip, whichfork);
 
 	if (!xfs_rmap_update_is_needed(mp, whichfork))
 		return;
@@ -2785,13 +2790,14 @@ xfs_rmap_convert_extent(
 	if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip))
 		type = XFS_RMAP_CONVERT_SHARED;
 
-	__xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV);
+	__xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV);
 }
 
 /* Schedule the creation of an rmap for non-file data. */
 void
 xfs_rmap_alloc_extent(
 	struct xfs_trans	*tp,
+	bool			isrt,
 	xfs_fsblock_t		fsbno,
 	xfs_extlen_t		len,
 	uint64_t		owner)
@@ -2806,13 +2812,14 @@ xfs_rmap_alloc_extent(
 	bmap.br_startoff = 0;
 	bmap.br_state = XFS_EXT_NORM;
 
-	__xfs_rmap_add(tp, XFS_RMAP_ALLOC, owner, XFS_DATA_FORK, &bmap);
+	__xfs_rmap_add(tp, XFS_RMAP_ALLOC, owner, isrt, XFS_DATA_FORK, &bmap);
 }
 
 /* Schedule the deletion of an rmap for non-file data. */
 void
 xfs_rmap_free_extent(
 	struct xfs_trans	*tp,
+	bool			isrt,
 	xfs_fsblock_t		fsbno,
 	xfs_extlen_t		len,
 	uint64_t		owner)
@@ -2827,7 +2834,7 @@ xfs_rmap_free_extent(
 	bmap.br_startoff = 0;
 	bmap.br_state = XFS_EXT_NORM;
 
-	__xfs_rmap_add(tp, XFS_RMAP_FREE, owner, XFS_DATA_FORK, &bmap);
+	__xfs_rmap_add(tp, XFS_RMAP_FREE, owner, isrt, XFS_DATA_FORK, &bmap);
 }
 
 /* Compare rmap records. Returns -1 if a < b, 1 if a > b, and 0 if equal. */
diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h
index 1b19f54b65047f..5f39f6e53cd19a 100644
--- a/fs/xfs/libxfs/xfs_rmap.h
+++ b/fs/xfs/libxfs/xfs_rmap.h
@@ -175,6 +175,7 @@ struct xfs_rmap_intent {
 	uint64_t				ri_owner;
 	struct xfs_bmbt_irec			ri_bmap;
 	struct xfs_group			*ri_group;
+	bool					ri_realtime;
 };
 
 /* functions for updating the rmapbt based on bmbt map/unmap operations */
@@ -185,9 +186,9 @@ void xfs_rmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 void xfs_rmap_convert_extent(struct xfs_mount *mp, struct xfs_trans *tp,
 		struct xfs_inode *ip, int whichfork,
 		struct xfs_bmbt_irec *imap);
-void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno,
+void xfs_rmap_alloc_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsbno,
 		xfs_extlen_t len, uint64_t owner);
-void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno,
+void xfs_rmap_free_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsbno,
 		xfs_extlen_t len, uint64_t owner);
 
 int xfs_rmap_finish_one(struct xfs_trans *tp, struct xfs_rmap_intent *ri,
diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c
index 11e1e5404fc6dc..bed6a09aa79112 100644
--- a/fs/xfs/scrub/alloc_repair.c
+++ b/fs/xfs/scrub/alloc_repair.c
@@ -542,7 +542,7 @@ xrep_abt_dispose_one(
 
 	/* Add a deferred rmap for each extent we used. */
 	if (resv->used > 0)
-		xfs_rmap_alloc_extent(sc->tp,
+		xfs_rmap_alloc_extent(sc->tp, false,
 				xfs_agbno_to_fsb(pag, resv->agbno), resv->used,
 				XFS_RMAP_OWN_AG);
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 0af3d477197b24..5c95c97519c767 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1820,6 +1820,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
 	&xlog_xmd_item_ops,
 	&xlog_rtefi_item_ops,
 	&xlog_rtefd_item_ops,
+	&xlog_rtrui_item_ops,
+	&xlog_rtrud_item_ops,
 };
 
 static const struct xlog_recover_item_ops *
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index ac2913a7335871..e8caa600a95cae 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -23,6 +23,7 @@
 #include "xfs_ag.h"
 #include "xfs_btree.h"
 #include "xfs_trace.h"
+#include "xfs_rtgroup.h"
 
 struct kmem_cache	*xfs_rui_cache;
 struct kmem_cache	*xfs_rud_cache;
@@ -94,7 +95,9 @@ xfs_rui_item_format(
 	ASSERT(atomic_read(&ruip->rui_next_extent) ==
 			ruip->rui_format.rui_nextents);
 
-	ruip->rui_format.rui_type = XFS_LI_RUI;
+	ASSERT(lip->li_type == XFS_LI_RUI || lip->li_type == XFS_LI_RUI_RT);
+
+	ruip->rui_format.rui_type = lip->li_type;
 	ruip->rui_format.rui_size = 1;
 
 	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_RUI_FORMAT, &ruip->rui_format,
@@ -137,12 +140,15 @@ xfs_rui_item_release(
 STATIC struct xfs_rui_log_item *
 xfs_rui_init(
 	struct xfs_mount		*mp,
+	unsigned short			item_type,
 	uint				nextents)
 
 {
 	struct xfs_rui_log_item		*ruip;
 
 	ASSERT(nextents > 0);
+	ASSERT(item_type == XFS_LI_RUI || item_type == XFS_LI_RUI_RT);
+
 	if (nextents > XFS_RUI_MAX_FAST_EXTENTS)
 		ruip = kzalloc(xfs_rui_log_item_sizeof(nextents),
 				GFP_KERNEL | __GFP_NOFAIL);
@@ -150,7 +156,7 @@ xfs_rui_init(
 		ruip = kmem_cache_zalloc(xfs_rui_cache,
 					 GFP_KERNEL | __GFP_NOFAIL);
 
-	xfs_log_item_init(mp, &ruip->rui_item, XFS_LI_RUI, &xfs_rui_item_ops);
+	xfs_log_item_init(mp, &ruip->rui_item, item_type, &xfs_rui_item_ops);
 	ruip->rui_format.rui_nextents = nextents;
 	ruip->rui_format.rui_id = (uintptr_t)(void *)ruip;
 	atomic_set(&ruip->rui_next_extent, 0);
@@ -189,7 +195,9 @@ xfs_rud_item_format(
 	struct xfs_rud_log_item		*rudp = RUD_ITEM(lip);
 	struct xfs_log_iovec		*vecp = NULL;
 
-	rudp->rud_format.rud_type = XFS_LI_RUD;
+	ASSERT(lip->li_type == XFS_LI_RUD || lip->li_type == XFS_LI_RUD_RT);
+
+	rudp->rud_format.rud_type = lip->li_type;
 	rudp->rud_format.rud_size = 1;
 
 	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_RUD_FORMAT, &rudp->rud_format,
@@ -233,6 +241,14 @@ static inline struct xfs_rmap_intent *ri_entry(const struct list_head *e)
 	return list_entry(e, struct xfs_rmap_intent, ri_list);
 }
 
+static inline bool
+xfs_rui_item_isrt(const struct xfs_log_item *lip)
+{
+	ASSERT(lip->li_type == XFS_LI_RUI || lip->li_type == XFS_LI_RUI_RT);
+
+	return lip->li_type == XFS_LI_RUI_RT;
+}
+
 /* Sort rmap intents by AG. */
 static int
 xfs_rmap_update_diff_items(
@@ -305,18 +321,20 @@ xfs_rmap_update_log_item(
 }
 
 static struct xfs_log_item *
-xfs_rmap_update_create_intent(
+__xfs_rmap_update_create_intent(
 	struct xfs_trans		*tp,
 	struct list_head		*items,
 	unsigned int			count,
-	bool				sort)
+	bool				sort,
+	unsigned short			item_type)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
-	struct xfs_rui_log_item		*ruip = xfs_rui_init(mp, count);
+	struct xfs_rui_log_item		*ruip;
 	struct xfs_rmap_intent		*ri;
 
 	ASSERT(count > 0);
 
+	ruip = xfs_rui_init(mp, item_type, count);
 	if (sort)
 		list_sort(mp, items, xfs_rmap_update_diff_items);
 	list_for_each_entry(ri, items, ri_list)
@@ -324,6 +342,23 @@ xfs_rmap_update_create_intent(
 	return &ruip->rui_item;
 }
 
+static struct xfs_log_item *
+xfs_rmap_update_create_intent(
+	struct xfs_trans		*tp,
+	struct list_head		*items,
+	unsigned int			count,
+	bool				sort)
+{
+	return __xfs_rmap_update_create_intent(tp, items, count, sort,
+			XFS_LI_RUI);
+}
+
+static inline unsigned short
+xfs_rud_type_from_rui(const struct xfs_rui_log_item *ruip)
+{
+	return xfs_rui_item_isrt(&ruip->rui_item) ? XFS_LI_RUD_RT : XFS_LI_RUD;
+}
+
 /* Get an RUD so we can process all the deferred rmap updates. */
 static struct xfs_log_item *
 xfs_rmap_update_create_done(
@@ -335,8 +370,8 @@ xfs_rmap_update_create_done(
 	struct xfs_rud_log_item		*rudp;
 
 	rudp = kmem_cache_zalloc(xfs_rud_cache, GFP_KERNEL | __GFP_NOFAIL);
-	xfs_log_item_init(tp->t_mountp, &rudp->rud_item, XFS_LI_RUD,
-			  &xfs_rud_item_ops);
+	xfs_log_item_init(tp->t_mountp, &rudp->rud_item,
+			xfs_rud_type_from_rui(ruip), &xfs_rud_item_ops);
 	rudp->rud_ruip = ruip;
 	rudp->rud_format.rud_rui_id = ruip->rui_format.rui_id;
 
@@ -351,11 +386,20 @@ xfs_rmap_defer_add(
 {
 	struct xfs_mount	*mp = tp->t_mountp;
 
+	/*
+	 * Deferred rmap updates for the realtime and data sections must use
+	 * separate transactions to finish deferred work because updates to
+	 * realtime metadata files can lock AGFs to allocate btree blocks and
+	 * we don't want that mixing with the AGF locks taken to finish data
+	 * section updates.
+	 */
 	ri->ri_group = xfs_group_intent_get(mp, ri->ri_bmap.br_startblock,
-			XG_TYPE_AG);
+			ri->ri_realtime ? XG_TYPE_RTG : XG_TYPE_AG);
 
 	trace_xfs_rmap_defer(mp, ri);
-	xfs_defer_add(tp, &ri->ri_list, &xfs_rmap_update_defer_type);
+	xfs_defer_add(tp, &ri->ri_list, ri->ri_realtime ?
+			&xfs_rtrmap_update_defer_type :
+			&xfs_rmap_update_defer_type);
 }
 
 /* Cancel a deferred rmap update. */
@@ -566,10 +610,13 @@ xfs_rmap_relog_intent(
 	struct xfs_map_extent		*map;
 	unsigned int			count;
 
+	ASSERT(intent->li_type == XFS_LI_RUI ||
+	       intent->li_type == XFS_LI_RUI_RT);
+
 	count = RUI_ITEM(intent)->rui_format.rui_nextents;
 	map = RUI_ITEM(intent)->rui_format.rui_extents;
 
-	ruip = xfs_rui_init(tp->t_mountp, count);
+	ruip = xfs_rui_init(tp->t_mountp, intent->li_type, count);
 	memcpy(ruip->rui_format.rui_extents, map, count * sizeof(*map));
 	atomic_set(&ruip->rui_next_extent, count);
 
@@ -589,6 +636,47 @@ const struct xfs_defer_op_type xfs_rmap_update_defer_type = {
 	.relog_intent	= xfs_rmap_relog_intent,
 };
 
+#ifdef CONFIG_XFS_RT
+static struct xfs_log_item *
+xfs_rtrmap_update_create_intent(
+	struct xfs_trans		*tp,
+	struct list_head		*items,
+	unsigned int			count,
+	bool				sort)
+{
+	return __xfs_rmap_update_create_intent(tp, items, count, sort,
+			XFS_LI_RUI_RT);
+}
+
+/* Clean up after calling xfs_rmap_finish_one. */
+STATIC void
+xfs_rtrmap_finish_one_cleanup(
+	struct xfs_trans	*tp,
+	struct xfs_btree_cur	*rcur,
+	int			error)
+{
+	if (rcur)
+		xfs_btree_del_cursor(rcur, error);
+}
+
+const struct xfs_defer_op_type xfs_rtrmap_update_defer_type = {
+	.name		= "rtrmap",
+	.max_items	= XFS_RUI_MAX_FAST_EXTENTS,
+	.create_intent	= xfs_rtrmap_update_create_intent,
+	.abort_intent	= xfs_rmap_update_abort_intent,
+	.create_done	= xfs_rmap_update_create_done,
+	.finish_item	= xfs_rmap_update_finish_item,
+	.finish_cleanup = xfs_rtrmap_finish_one_cleanup,
+	.cancel_item	= xfs_rmap_update_cancel_item,
+	.recover_work	= xfs_rmap_recover_work,
+	.relog_intent	= xfs_rmap_relog_intent,
+};
+#else
+const struct xfs_defer_op_type xfs_rtrmap_update_defer_type = {
+	.name		= "rtrmap",
+};
+#endif
+
 STATIC bool
 xfs_rui_item_match(
 	struct xfs_log_item	*lip,
@@ -654,7 +742,7 @@ xlog_recover_rui_commit_pass2(
 		return -EFSCORRUPTED;
 	}
 
-	ruip = xfs_rui_init(mp, rui_formatp->rui_nextents);
+	ruip = xfs_rui_init(mp, ITEM_TYPE(item), rui_formatp->rui_nextents);
 	xfs_rui_copy_format(&ruip->rui_format, rui_formatp);
 	atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents);
 
@@ -668,6 +756,61 @@ const struct xlog_recover_item_ops xlog_rui_item_ops = {
 	.commit_pass2		= xlog_recover_rui_commit_pass2,
 };
 
+#ifdef CONFIG_XFS_RT
+STATIC int
+xlog_recover_rtrui_commit_pass2(
+	struct xlog			*log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			lsn)
+{
+	struct xfs_mount		*mp = log->l_mp;
+	struct xfs_rui_log_item		*ruip;
+	struct xfs_rui_log_format	*rui_formatp;
+	size_t				len;
+
+	rui_formatp = item->ri_buf[0].i_addr;
+
+	if (item->ri_buf[0].i_len < xfs_rui_log_format_sizeof(0)) {
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
+		return -EFSCORRUPTED;
+	}
+
+	len = xfs_rui_log_format_sizeof(rui_formatp->rui_nextents);
+	if (item->ri_buf[0].i_len != len) {
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
+		return -EFSCORRUPTED;
+	}
+
+	ruip = xfs_rui_init(mp, ITEM_TYPE(item), rui_formatp->rui_nextents);
+	xfs_rui_copy_format(&ruip->rui_format, rui_formatp);
+	atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents);
+
+	xlog_recover_intent_item(log, &ruip->rui_item, lsn,
+			&xfs_rtrmap_update_defer_type);
+	return 0;
+}
+#else
+STATIC int
+xlog_recover_rtrui_commit_pass2(
+	struct xlog			*log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			lsn)
+{
+	XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, log->l_mp,
+			item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
+	return -EFSCORRUPTED;
+}
+#endif
+
+const struct xlog_recover_item_ops xlog_rtrui_item_ops = {
+	.item_type		= XFS_LI_RUI_RT,
+	.commit_pass2		= xlog_recover_rtrui_commit_pass2,
+};
+
 /*
  * This routine is called when an RUD format structure is found in a committed
  * transaction in the log. Its purpose is to cancel the corresponding RUI if it
@@ -699,3 +842,33 @@ const struct xlog_recover_item_ops xlog_rud_item_ops = {
 	.item_type		= XFS_LI_RUD,
 	.commit_pass2		= xlog_recover_rud_commit_pass2,
 };
+
+#ifdef CONFIG_XFS_RT
+STATIC int
+xlog_recover_rtrud_commit_pass2(
+	struct xlog			*log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			lsn)
+{
+	struct xfs_rud_log_format	*rud_formatp;
+
+	rud_formatp = item->ri_buf[0].i_addr;
+	if (item->ri_buf[0].i_len != sizeof(struct xfs_rud_log_format)) {
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, log->l_mp,
+				rud_formatp, item->ri_buf[0].i_len);
+		return -EFSCORRUPTED;
+	}
+
+	xlog_recover_release_intent(log, XFS_LI_RUI_RT,
+			rud_formatp->rud_rui_id);
+	return 0;
+}
+#else
+# define xlog_recover_rtrud_commit_pass2	xlog_recover_rtrui_commit_pass2
+#endif
+
+const struct xlog_recover_item_ops xlog_rtrud_item_ops = {
+	.item_type		= XFS_LI_RUD_RT,
+	.commit_pass2		= xlog_recover_rtrud_commit_pass2,
+};

From patchwork Fri Dec 13 01:02:55 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13906201
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E7E4184
	for ; Fri, 13 Dec 2024 01:02:56 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
	arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1734051776; cv=none;
	b=CC6DPcAaWVVDl7CG8pmLk4S/9EysixZTFlNO2ooT544NiRyWpKTaKX25bpMdbGwxQu0nWtP60Ahs31l9pg37TEdCqHfTZp2R0JDapzPUxpkxpgwbbqxwGskjH0bIlqM4Wk9/NPBKtEOMULmB6zDw93oT1OEI15JAhwosWrG0uK4=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1734051776; c=relaxed/simple;
	bh=mnuwBrcGd2UbzzzyR7SY9vev4+t/sp7PY3BFVy8LYQo=;
	h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
	b=chvniX9EbD0hglTnYev/fq7rH0wL4xejzZxfjHDdcQLXFnXlKXKuWacbQfTuuq5wRGMRLHjzP5cPmQLiMJZOoaZ/2oJ3dfR3Pd9cq2nmHGqVCr3LQKJ9lt7FeLfjP89axegt7RtYhhbi5G9/As1b1xXILzTIST+yTiBl3+jm5gc=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=VO7n9Yp6;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="VO7n9Yp6"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5BDEDC4CECE;
	Fri, 13 Dec 2024 01:02:56 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1734051776;
	bh=mnuwBrcGd2UbzzzyR7SY9vev4+t/sp7PY3BFVy8LYQo=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=VO7n9Yp6LAzi2c7YLHJvsELWIDytwazXwe85C+0mOdmo6/zThqNDzYlPOPC6ulRiT
	 jE6OnAQ9myW6jwiM0CipplPhXP/Q8ZlSnwvWQN9etQqAU0UCL9GYhDcMFdCgSiVj4l
	 UM7aMcAJvxbSnXbu/eVJ5oaEjUICH1Br3/WqnmJV0jQg2jHeLFE4g4yPQiWav63ixX
qSXEQTNYC14uaAIwZORgybeLqvxV+0g/IRzSVPc7zDLameDBwwW0F8CxjnMZTB4s7p N0SXAyhQSBSGMFrRZTLZCf4rEYNkC1eS8b3AzHdWrfYkPi2VC9uM6a4vXdbyvJjjWh d+hwwa5c6WN4g== Date: Thu, 12 Dec 2024 17:02:55 -0800 Subject: [PATCH 09/37] xfs: support recovering rmap intent items targetting realtime extents From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123469.1181370.12989413207120120911.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Now that we have rmap on the realtime device and rmap intent items that target the realtime device, log recovery has to support remapping extents on the realtime volume. Make this work. Identify rtrmapbt blocks in the log correctly so that we can validate them during log recovery. Signed-off-by: "Darrick J. 
Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_buf_item_recover.c | 4 ++++ fs/xfs/xfs_rmap_item.c | 15 ++++++++++++--- 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index 3d0c6402cb3634..4f2e4ea29e1f57 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -262,6 +262,9 @@ xlog_recover_validate_buf_type( case XFS_BMAP_MAGIC: bp->b_ops = &xfs_bmbt_buf_ops; break; + case XFS_RTRMAP_CRC_MAGIC: + bp->b_ops = &xfs_rtrmapbt_buf_ops; + break; case XFS_RMAP_CRC_MAGIC: bp->b_ops = &xfs_rmapbt_buf_ops; break; @@ -855,6 +858,7 @@ xlog_recover_get_buf_lsn( uuid = &btb->bb_u.s.bb_uuid; break; } + case XFS_RTRMAP_CRC_MAGIC: case XFS_BMAP_CRC_MAGIC: case XFS_BMAP_MAGIC: { struct xfs_btree_block *btb = blk; diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index e8caa600a95cae..89decffe76c8b5 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -459,6 +459,7 @@ xfs_rmap_update_abort_intent( static inline bool xfs_rui_validate_map( struct xfs_mount *mp, + bool isrt, struct xfs_map_extent *map) { if (!xfs_has_rmapbt(mp)) @@ -488,6 +489,9 @@ xfs_rui_validate_map( if (!xfs_verify_fileext(mp, map->me_startoff, map->me_len)) return false; + if (isrt) + return xfs_verify_rtbext(mp, map->me_startblock, map->me_len); + return xfs_verify_fsbext(mp, map->me_startblock, map->me_len); } @@ -495,6 +499,7 @@ static inline void xfs_rui_recover_work( struct xfs_mount *mp, struct xfs_defer_pending *dfp, + bool isrt, const struct xfs_map_extent *map) { struct xfs_rmap_intent *ri; @@ -539,7 +544,9 @@ xfs_rui_recover_work( ri->ri_bmap.br_blockcount = map->me_len; ri->ri_bmap.br_state = (map->me_flags & XFS_RMAP_EXTENT_UNWRITTEN) ? XFS_EXT_UNWRITTEN : XFS_EXT_NORM; - ri->ri_group = xfs_group_intent_get(mp, map->me_startblock, XG_TYPE_AG); + ri->ri_group = xfs_group_intent_get(mp, map->me_startblock, + isrt ? 
XG_TYPE_RTG : XG_TYPE_AG); + ri->ri_realtime = isrt; xfs_defer_add_item(dfp, &ri->ri_list); } @@ -558,6 +565,7 @@ xfs_rmap_recover_work( struct xfs_rui_log_item *ruip = RUI_ITEM(lip); struct xfs_trans *tp; struct xfs_mount *mp = lip->li_log->l_mp; + bool isrt = xfs_rui_item_isrt(lip); int i; int error = 0; @@ -567,7 +575,7 @@ xfs_rmap_recover_work( * just toss the RUI. */ for (i = 0; i < ruip->rui_format.rui_nextents; i++) { - if (!xfs_rui_validate_map(mp, + if (!xfs_rui_validate_map(mp, isrt, &ruip->rui_format.rui_extents[i])) { XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, &ruip->rui_format, @@ -575,7 +583,8 @@ xfs_rmap_recover_work( return -EFSCORRUPTED; } - xfs_rui_recover_work(mp, dfp, &ruip->rui_format.rui_extents[i]); + xfs_rui_recover_work(mp, dfp, isrt, + &ruip->rui_format.rui_extents[i]); } resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate); From patchwork Fri Dec 13 01:03:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906202 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 28D23184 for ; Fri, 13 Dec 2024 01:03:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051792; cv=none; b=Tw86fh0w3LtnZEjP8IIdbdJKV08aQbPhoCOoUqOZPe/EszE+1opd/xInVcE82IxhD1xpKqmft6k/un+0+pkmO+5KjmES7T99QQVdIYF440ahk6O2TsM6jFiBY8w25UEg2olWiNF1wvzc+DqcIy9t52pVbeBdlvymMPFyaSRsfHg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051792; c=relaxed/simple; bh=naIcdYq6hKI8G/wHjLHHiLjYTPVCLYcaTYmNvrp/pBE=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=o+AUQTjVbfBvsgyX4YeQ2fdkOW/8PxiGtCjuO8KchOSPNzvL+9pL7hKV1z27TukDymdm4wWgP2O2C+Xu0fGb+blOZTYlcgKmnex0ZSEYc3axP8TCEo0r2tEMnLM3q+9DlVR3Nct2+2ap6aJpVLb0fCKrSYcjYH82SfUYGF0nMMk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DPnjCyd7; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DPnjCyd7" Received: by smtp.kernel.org (Postfix) with ESMTPSA id F008EC4CECE; Fri, 13 Dec 2024 01:03:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734051792; bh=naIcdYq6hKI8G/wHjLHHiLjYTPVCLYcaTYmNvrp/pBE=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=DPnjCyd7fS9O1QreRVqNW+QMuZyURc9t/IE4bYkWMA1HHmJ6GyYSP8JPzwuiY+iVI m0QEUi/dmiUs/stRMyjtDrPY/GsQ2+KeZHA1/fkwjqiC+D5JIsM+UKIWmELY3jK9rM udHbvgghb0OgmSyvF4Fsqwuu5HO8tzkxdaPvk5OkzYWT8QcVh7P+1ViCFWRlIBWxwL 
hEjzQZDV9LjAg1gA6YEMBe/oqXqFj8kjcG/cFs5l7n65aPeptkZwZhf/GfKIdkiU9D 0RxWuW9Mfs72jp0+wTQWySlghgMIp098eae0k4dfTuX4SaehqLa6ndyQn25iAZiDnf dwLZllmgCsANQ== Date: Thu, 12 Dec 2024 17:03:11 -0800 Subject: [PATCH 10/37] xfs: pretty print metadata file types in error messages From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123485.1181370.4679130203707005497.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Create a helper function to turn a metadata file type code into a printable string, and use this to complain about lockdep problems with rtgroup inodes. We'll use this more in the next patch. Signed-off-by: "Darrick J. Wong" --- fs/xfs/libxfs/xfs_metafile.h | 17 +++++++++++++++++ fs/xfs/libxfs/xfs_rtgroup.c | 3 ++- 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_metafile.h b/fs/xfs/libxfs/xfs_metafile.h index 8d8f08a6071c23..9df8619d5fb1a9 100644 --- a/fs/xfs/libxfs/xfs_metafile.h +++ b/fs/xfs/libxfs/xfs_metafile.h @@ -6,6 +6,23 @@ #ifndef __XFS_METAFILE_H__ #define __XFS_METAFILE_H__ +static inline const char * +xfs_metafile_type_str(enum xfs_metafile_type metatype) +{ + static const struct { + enum xfs_metafile_type mtype; + const char *name; + } strings[] = { XFS_METAFILE_TYPE_STR }; + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(strings); i++) { + if (strings[i].mtype == metatype) + return strings[i].name; + } + + return NULL; +} + /* All metadata files must have these flags set. 
*/ #define XFS_METAFILE_DIFLAGS (XFS_DIFLAG_IMMUTABLE | \ XFS_DIFLAG_SYNC | \ diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index a79b734e70440d..9e5fdc0dc55cef 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -282,7 +282,8 @@ xfs_rtginode_ilock_print_fn( const struct xfs_inode *ip = container_of(m, struct xfs_inode, i_lock.dep_map); - printk(KERN_CONT " rgno=%u", ip->i_projid); + printk(KERN_CONT " rgno=%u metatype=%s", ip->i_projid, + xfs_metafile_type_str(ip->i_metatype)); } /* From patchwork Fri Dec 13 01:03:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13906203 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1EC44629 for ; Fri, 13 Dec 2024 01:03:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051808; cv=none; b=oRXNnQCv2YkF4UJrnyRI5D4IEyza4/ceadEgwTPMttphbQQZ7msGGGZCFXW8fPbBVh/sI15o2gIMneRfX67/yoVRbvio/ik2WN3npW+P6C/j6nZKL8v9FwKVp8yiYjJlN3/pTfthGte02NShVKHm/07XITyVLx1ksTq9DKI0nbE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051808; c=relaxed/simple; bh=LPHYse6XEKvsrWn9wDn+xIR6xA8GdpCAXU24qpn49H8=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Hqtc/jdWfinD8+0xLiDl6Ok6ltBrV/8rvWjl4W8XePDSpHFjmR6YgyW7cr5L3VMTyy2QFv2LSh7U2SjCzR/mZe4G8qxqrRqBeLTHPbzTLg9xPDLapspkMMSPIiXLAEdtpN3UiaskSugnoFomAVFJ4AZZtSwaU1UYoxRVTxF1X0o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=oRnUO+Af; arc=none 
smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="oRnUO+Af" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 98B11C4CECE; Fri, 13 Dec 2024 01:03:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734051807; bh=LPHYse6XEKvsrWn9wDn+xIR6xA8GdpCAXU24qpn49H8=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=oRnUO+Af8vG1bwpzVmzjrIn38kSyxa//VsSSamdXLumo2/gwinkedwPlOL9HizqZF UGDak1TyAfbf8+jMjjR+/H895n5k/2A0gCXayiyeqENmGfMqYHVjUKaJ0KaIxh61Rc uPjN0kpGHFYrwmzfIfiTvlRmgQUh//rooHT47kafhnkjYTrdKMgJx5PxfmkN7+6mxZ /WP6mwUDmA8mtD8r4scGqi/zYjEWl8K3Tn8P4HENeWHvR8QD71qbXdLmV1m5zu5pPx 2geWN9JXM67bdO9NvWBJ3p/orfy2M0yfvbDp+zNMLTLt7vSPQ0AkiYtXsdZHu94jgT QBEhXgcb0BSTw== Date: Thu, 12 Dec 2024 17:03:27 -0800 Subject: [PATCH 11/37] xfs: support file data forks containing metadata btrees From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123501.1181370.9693980966695147034.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Create a new fork format type for metadata btrees. This fork type requires that the inode is in the metadata directory tree, and only applies to the data fork. The actual type of the metadata btree itself is determined by the di_metatype field. Signed-off-by: "Darrick J. 
Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_format.h | 6 ++++-- fs/xfs/libxfs/xfs_inode_buf.c | 23 ++++++++++++++++++++--- fs/xfs/libxfs/xfs_inode_fork.c | 19 +++++++++++++++++++ fs/xfs/scrub/bmap.c | 1 + fs/xfs/scrub/bmap_repair.c | 1 + fs/xfs/scrub/inode.c | 4 ++++ fs/xfs/scrub/inode_repair.c | 36 ++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/rmap_repair.c | 31 +++++++++++++++++++++---------- fs/xfs/xfs_inode.c | 19 ++++++++++++++++++- fs/xfs/xfs_inode_item.c | 2 ++ fs/xfs/xfs_inode_item_recover.c | 38 ++++++++++++++++++++++++++++++++++---- fs/xfs/xfs_trace.h | 1 + 12 files changed, 161 insertions(+), 20 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 469fc7afa591b4..41ea4283c43cb4 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -997,7 +997,8 @@ enum xfs_dinode_fmt { XFS_DINODE_FMT_LOCAL, /* bulk data */ XFS_DINODE_FMT_EXTENTS, /* struct xfs_bmbt_rec */ XFS_DINODE_FMT_BTREE, /* struct xfs_bmdr_block */ - XFS_DINODE_FMT_UUID /* added long ago, but never used */ + XFS_DINODE_FMT_UUID, /* added long ago, but never used */ + XFS_DINODE_FMT_META_BTREE, /* metadata btree */ }; #define XFS_INODE_FORMAT_STR \ @@ -1005,7 +1006,8 @@ enum xfs_dinode_fmt { { XFS_DINODE_FMT_LOCAL, "local" }, \ { XFS_DINODE_FMT_EXTENTS, "extent" }, \ { XFS_DINODE_FMT_BTREE, "btree" }, \ - { XFS_DINODE_FMT_UUID, "uuid" } + { XFS_DINODE_FMT_UUID, "uuid" }, \ + { XFS_DINODE_FMT_META_BTREE, "meta_btree" } /* * Max values for extnum and aextnum. 
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 424861fbf1bd49..1648d72d6ed95a 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -441,6 +441,16 @@ xfs_dinode_verify_fork( if (di_nextents > max_extents) return __this_address; break; + case XFS_DINODE_FMT_META_BTREE: + if (!xfs_has_metadir(mp)) + return __this_address; + if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) + return __this_address; + switch (be16_to_cpu(dip->di_metatype)) { + default: + return __this_address; + } + break; default: return __this_address; } @@ -460,6 +470,10 @@ xfs_dinode_verify_forkoff( if (dip->di_forkoff != (roundup(sizeof(xfs_dev_t), 8) >> 3)) return __this_address; break; + case XFS_DINODE_FMT_META_BTREE: + if (!xfs_has_metadir(mp) || !xfs_has_parent(mp)) + return __this_address; + fallthrough; case XFS_DINODE_FMT_LOCAL: /* fall through ... */ case XFS_DINODE_FMT_EXTENTS: /* fall through ... */ case XFS_DINODE_FMT_BTREE: @@ -637,9 +651,6 @@ xfs_dinode_verify( if (mode && nextents + naextents > nblocks) return __this_address; - if (nextents + naextents == 0 && nblocks != 0) - return __this_address; - if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents) return __this_address; @@ -743,6 +754,12 @@ xfs_dinode_verify( return fa; } + /* metadata inodes containing btrees always have zero extent count */ + if (XFS_DFORK_FORMAT(dip, XFS_DATA_FORK) != XFS_DINODE_FMT_META_BTREE) { + if (nextents + naextents == 0 && nblocks != 0) + return __this_address; + } + return NULL; } diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 122ab362892de3..5ee733d4449b02 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -268,6 +268,12 @@ xfs_iformat_data_fork( return xfs_iformat_extents(ip, dip, XFS_DATA_FORK); case XFS_DINODE_FMT_BTREE: return xfs_iformat_btree(ip, dip, XFS_DATA_FORK); + case XFS_DINODE_FMT_META_BTREE: + switch (ip->i_metatype) { + default: + break; 
+ } + fallthrough; default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -602,6 +608,19 @@ xfs_iflush_fork( } break; + case XFS_DINODE_FMT_META_BTREE: + ASSERT(whichfork == XFS_DATA_FORK); + + if (!(iip->ili_fields & brootflag[whichfork])) + break; + + switch (ip->i_metatype) { + default: + ASSERT(0); + break; + } + break; + default: ASSERT(0); break; diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 7e00312225ed10..0d7ad692822d48 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -983,6 +983,7 @@ xchk_bmap( case XFS_DINODE_FMT_UUID: case XFS_DINODE_FMT_DEV: case XFS_DINODE_FMT_LOCAL: + case XFS_DINODE_FMT_META_BTREE: /* No mappings to check. */ if (whichfork == XFS_COW_FORK) xchk_fblock_set_corrupt(sc, whichfork, 0); diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 7c4955482641f7..141d36f1da9a71 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -731,6 +731,7 @@ xrep_bmap_check_inputs( case XFS_DINODE_FMT_DEV: case XFS_DINODE_FMT_LOCAL: case XFS_DINODE_FMT_UUID: + case XFS_DINODE_FMT_META_BTREE: return -ECANCELED; case XFS_DINODE_FMT_EXTENTS: case XFS_DINODE_FMT_BTREE: diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index 25ee66e7649d40..2e911f38deaebe 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -502,6 +502,10 @@ xchk_dinode( if (!S_ISREG(mode) && !S_ISDIR(mode)) xchk_ino_set_corrupt(sc, ino); break; + case XFS_DINODE_FMT_META_BTREE: + if (!S_ISREG(mode)) + xchk_ino_set_corrupt(sc, ino); + break; case XFS_DINODE_FMT_UUID: default: xchk_ino_set_corrupt(sc, ino); diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 5a58ddd27bd2f5..7faa27472b9129 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -888,6 +888,25 @@ xrep_dinode_bad_bmbt_fork( return false; } +/* Check a metadata-btree fork. 
*/ +STATIC bool +xrep_dinode_bad_metabt_fork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + unsigned int dfork_size, + int whichfork) +{ + if (whichfork != XFS_DATA_FORK) + return true; + + switch (be16_to_cpu(dip->di_metatype)) { + default: + return true; + } + + return false; +} + /* * Check the data fork for things that will fail the ifork verifiers or the * ifork formatters. @@ -968,6 +987,11 @@ xrep_dinode_check_dfork( XFS_DATA_FORK)) return true; break; + case XFS_DINODE_FMT_META_BTREE: + if (xrep_dinode_bad_metabt_fork(sc, dip, dfork_size, + XFS_DATA_FORK)) + return true; + break; default: return true; } @@ -1088,6 +1112,11 @@ xrep_dinode_check_afork( XFS_ATTR_FORK)) return true; break; + case XFS_DINODE_FMT_META_BTREE: + if (xrep_dinode_bad_metabt_fork(sc, dip, afork_size, + XFS_ATTR_FORK)) + return true; + break; default: return true; } @@ -1241,6 +1270,13 @@ xrep_dinode_ensure_forkoff( bmdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK); dfork_min = xfs_bmap_broot_space(sc->mp, bmdr); break; + case XFS_DINODE_FMT_META_BTREE: + switch (be16_to_cpu(dip->di_metatype)) { + default: + dfork_min = 0; + break; + } + break; default: dfork_min = 0; break; diff --git a/fs/xfs/scrub/rmap_repair.c b/fs/xfs/scrub/rmap_repair.c index a0a227d183d28d..2a0b9e3d0fbaee 100644 --- a/fs/xfs/scrub/rmap_repair.c +++ b/fs/xfs/scrub/rmap_repair.c @@ -499,6 +499,14 @@ xrep_rmap_scan_iext( return xrep_rmap_stash_accumulated(rf); } +static int +xrep_rmap_scan_meta_btree( + struct xrep_rmap_ifork *rf, + struct xfs_inode *ip) +{ + return -EFSCORRUPTED; /* XXX placeholder */ +} + /* Find all the extents from a given AG in an inode fork. 
*/ STATIC int xrep_rmap_scan_ifork( @@ -512,14 +520,14 @@ xrep_rmap_scan_ifork( .whichfork = whichfork, }; struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); + bool mappings_done; int error = 0; if (!ifp) return 0; - if (ifp->if_format == XFS_DINODE_FMT_BTREE) { - bool mappings_done; - + switch (ifp->if_format) { + case XFS_DINODE_FMT_BTREE: /* * Scan the bmap btree for data device mappings. This includes * the btree blocks themselves, even if this is a realtime @@ -528,15 +536,18 @@ xrep_rmap_scan_ifork( error = xrep_rmap_scan_bmbt(&rf, ip, &mappings_done); if (error || mappings_done) return error; - } else if (ifp->if_format != XFS_DINODE_FMT_EXTENTS) { - return 0; + fallthrough; + case XFS_DINODE_FMT_EXTENTS: + /* Scan incore extent cache if this isn't a realtime file. */ + if (xfs_ifork_is_realtime(ip, whichfork)) + return 0; + + return xrep_rmap_scan_iext(&rf, ifp); + case XFS_DINODE_FMT_META_BTREE: + return xrep_rmap_scan_meta_btree(&rf, ip); } - /* Scan incore extent cache if this isn't a realtime file. 
*/ - if (xfs_ifork_is_realtime(ip, whichfork)) - return 0; - - return xrep_rmap_scan_iext(&rf, ifp); + return 0; } /* diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index c8ad2606f928b2..c95fe1b1de4e6f 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2382,7 +2382,16 @@ xfs_iflush( __func__, ip->i_ino, be16_to_cpu(dip->di_magic), dip); goto flush_out; } - if (S_ISREG(VFS_I(ip)->i_mode)) { + if (ip->i_df.if_format == XFS_DINODE_FMT_META_BTREE) { + if (!S_ISREG(VFS_I(ip)->i_mode) || + !(ip->i_diflags2 & XFS_DIFLAG2_METADATA)) { + xfs_alert_tag(mp, XFS_PTAG_IFLUSH, + "%s: Bad %s meta btree inode %Lu, ptr "PTR_FMT, + __func__, xfs_metafile_type_str(ip->i_metatype), + ip->i_ino, ip); + goto flush_out; + } + } else if (S_ISREG(VFS_I(ip)->i_mode)) { if (XFS_TEST_ERROR( ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS && ip->i_df.if_format != XFS_DINODE_FMT_BTREE, @@ -2422,6 +2431,14 @@ xfs_iflush( goto flush_out; } + if (xfs_inode_has_attr_fork(ip) && + ip->i_af.if_format == XFS_DINODE_FMT_META_BTREE) { + xfs_alert_tag(mp, XFS_PTAG_IFLUSH, + "%s: meta btree in inode %Lu attr fork, ptr "PTR_FMT, + __func__, ip->i_ino, ip); + goto flush_out; + } + /* * Inode item log recovery for v2 inodes are dependent on the flushiter * count for correct sequencing. 
We bump the flush iteration count so diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index 912f0b1bc3cb70..a174f64b8bb250 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -242,6 +242,7 @@ xfs_inode_item_data_fork_size( } break; case XFS_DINODE_FMT_BTREE: + case XFS_DINODE_FMT_META_BTREE: if ((iip->ili_fields & XFS_ILOG_DBROOT) && ip->i_df.if_broot_bytes > 0) { *nbytes += ip->i_df.if_broot_bytes; @@ -362,6 +363,7 @@ xfs_inode_item_format_data_fork( } break; case XFS_DINODE_FMT_BTREE: + case XFS_DINODE_FMT_META_BTREE: iip->ili_fields &= ~(XFS_ILOG_DDATA | XFS_ILOG_DEXT | XFS_ILOG_DEV); diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index e70d2611456bc9..5bb057ba76ead4 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -266,6 +266,35 @@ xlog_dinode_verify_extent_counts( return 0; } +static inline int +xlog_recover_inode_dbroot( + struct xfs_mount *mp, + void *src, + unsigned int len, + struct xfs_dinode *dip) +{ + void *dfork = XFS_DFORK_DPTR(dip); + unsigned int dsize = XFS_DFORK_DSIZE(dip, mp); + + switch (dip->di_format) { + case XFS_DINODE_FMT_BTREE: + xfs_bmbt_to_bmdr(mp, src, len, dfork, dsize); + break; + case XFS_DINODE_FMT_META_BTREE: + switch (be16_to_cpu(dip->di_metatype)) { + default: + ASSERT(0); + return -EFSCORRUPTED; + } + break; + default: + ASSERT(0); + return -EFSCORRUPTED; + } + + return 0; +} + STATIC int xlog_recover_inode_commit_pass2( struct xlog *log, @@ -394,7 +423,8 @@ xlog_recover_inode_commit_pass2( if (unlikely(S_ISREG(ldip->di_mode))) { if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) && - (ldip->di_format != XFS_DINODE_FMT_BTREE)) { + (ldip->di_format != XFS_DINODE_FMT_BTREE) && + (ldip->di_format != XFS_DINODE_FMT_META_BTREE)) { XFS_CORRUPTION_ERROR( "Bad log dinode data fork format for regular file", XFS_ERRLEVEL_LOW, mp, ldip, sizeof(*ldip)); @@ -475,9 +505,9 @@ xlog_recover_inode_commit_pass2( break; case XFS_ILOG_DBROOT: - 
xfs_bmbt_to_bmdr(mp, (struct xfs_btree_block *)src, len, - (struct xfs_bmdr_block *)XFS_DFORK_DPTR(dip), - XFS_DFORK_DSIZE(dip, mp)); + error = xlog_recover_inode_dbroot(mp, src, len, dip); + if (error) + goto out_release; break; default: diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 8b7bb1f5ae3c6f..a098935163b7c2 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2299,6 +2299,7 @@ TRACE_DEFINE_ENUM(XFS_DINODE_FMT_LOCAL); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_EXTENTS); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_BTREE); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_UUID); +TRACE_DEFINE_ENUM(XFS_DINODE_FMT_META_BTREE); DECLARE_EVENT_CLASS(xfs_swap_extent_class, TP_PROTO(struct xfs_inode *ip, int which), From patchwork Fri Dec 13 01:03:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13906204 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B59ED7485 for ; Fri, 13 Dec 2024 01:03:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051823; cv=none; b=fv1b+43+abH5lGVyh0/q1Bwa3YjuPyVJviS/IRft6mvbcP/iLC7C1b/9zWi18jvcloyLNOHVIkaHBLNzJuZKZ61QCzVsjdb8RLRl2dUIa/Akv5VCb1FYvSHtp8Dn18A46+54IOBYY7hYv9DFhspCOGdchsVaHsIRUy4DdtnwbzA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051823; c=relaxed/simple; bh=ya2t2uOUQhFxQBXWUw9ckzVZDx23uLxYJGvyG1tPWvM=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=bCrN/07Pspi5kyhZv03V73aSJM/307M4iqzEUTd0g9/PrQ7jqnsNmikKzCLvIWNchGbH32qxcyQquyar0iwQDlc8Mlhr22Ds1Uvw1qn/kykptd4DWGXnGMddKRklgTiWiBxaTPZ/wQln8KpOt9bC3Fwsb3WY3DP3idHWDKO8mRc= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=dHXjv8Qy; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="dHXjv8Qy" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3C9A4C4CECE; Fri, 13 Dec 2024 01:03:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734051823; bh=ya2t2uOUQhFxQBXWUw9ckzVZDx23uLxYJGvyG1tPWvM=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=dHXjv8QyT2nTSU/MzGxeJ+mXyhAQPF6udBrWNdHl7g2vaLn3M5p0UUrNw3/Sgw8dB PRebVEMcSk59j+4Iu37RAIx3S2rXR7nvdTomYSy0+nDo61/blQWiAyIwgtVzQXdanu sTwJNMr8O0j3ELE2K4XOyCegCgDvUKf0rXMlIISNk1rDLa5ci2wtsIZAov4v7QjZFf hQ/khPQn8ksULxDsMzHof7bWdSZVNtGLD66+69CO1LAHY5zWg243UQvCOx3xPJKIRm eMTidQxQVW2JqyYoCaFvLZ18gnw04GCdRxUazdKyfX3NoGy/hNDMH65uEJnqYL4A81 Vf8yT/7MAR+/g== Date: Thu, 12 Dec 2024 17:03:42 -0800 Subject: [PATCH 12/37] xfs: add realtime reverse map inode to metadata directory From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123520.1181370.12338691375422114269.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Add a metadir path to select the realtime rmap btree inode and load it at mount time. The rtrmapbt inode will have a unique extent format code, which means that we also have to update the inode validation and flush routines to look for it. Signed-off-by: "Darrick J. 
Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_format.h | 4 +++- fs/xfs/libxfs/xfs_inode_buf.c | 9 +++++++++ fs/xfs/libxfs/xfs_inode_fork.c | 6 ++++++ fs/xfs/libxfs/xfs_rtgroup.c | 20 ++++++++++++++++++-- fs/xfs/libxfs/xfs_rtgroup.h | 8 ++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.c | 6 +++--- 6 files changed, 47 insertions(+), 6 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 41ea4283c43cb4..f32c9fda5a195f 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -857,6 +857,7 @@ enum xfs_metafile_type { XFS_METAFILE_PRJQUOTA, /* project quota */ XFS_METAFILE_RTBITMAP, /* rt bitmap */ XFS_METAFILE_RTSUMMARY, /* rt summary */ + XFS_METAFILE_RTRMAP, /* rt rmap */ XFS_METAFILE_MAX } __packed; @@ -868,7 +869,8 @@ enum xfs_metafile_type { { XFS_METAFILE_GRPQUOTA, "grpquota" }, \ { XFS_METAFILE_PRJQUOTA, "prjquota" }, \ { XFS_METAFILE_RTBITMAP, "rtbitmap" }, \ - { XFS_METAFILE_RTSUMMARY, "rtsummary" } + { XFS_METAFILE_RTSUMMARY, "rtsummary" }, \ + { XFS_METAFILE_RTRMAP, "rtrmap" } /* * On-disk inode structure. diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 1648d72d6ed95a..17cb91b89fcaa1 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -447,6 +447,15 @@ xfs_dinode_verify_fork( if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) return __this_address; switch (be16_to_cpu(dip->di_metatype)) { + case XFS_METAFILE_RTRMAP: + /* + * growfs must create the rtrmap inodes before adding a + * realtime volume to the filesystem, so we cannot use + * the rtrmapbt predicate here. 
+ */ + if (!xfs_has_rmapbt(mp)) + return __this_address; + break; default: return __this_address; } diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 5ee733d4449b02..a8662185f8c22a 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -270,6 +270,9 @@ xfs_iformat_data_fork( return xfs_iformat_btree(ip, dip, XFS_DATA_FORK); case XFS_DINODE_FMT_META_BTREE: switch (ip->i_metatype) { + case XFS_METAFILE_RTRMAP: + ASSERT(0); /* to be implemented later */ + return -EFSCORRUPTED; default: break; } @@ -615,6 +618,9 @@ xfs_iflush_fork( break; switch (ip->i_metatype) { + case XFS_METAFILE_RTRMAP: + ASSERT(0); /* to be implemented later */ + break; default: ASSERT(0); break; diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index 9e5fdc0dc55cef..1b56c13b282788 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -315,6 +315,8 @@ struct xfs_rtginode_ops { unsigned int sick; /* rtgroup sickness flag */ + unsigned int fmt_mask; /* all valid data fork formats */ + /* Does the fs have this feature? */ bool (*enabled)(struct xfs_mount *mp); @@ -330,14 +332,29 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = { .name = "bitmap", .metafile_type = XFS_METAFILE_RTBITMAP, .sick = XFS_SICK_RG_BITMAP, + .fmt_mask = (1U << XFS_DINODE_FMT_EXTENTS) | + (1U << XFS_DINODE_FMT_BTREE), .create = xfs_rtbitmap_create, }, [XFS_RTGI_SUMMARY] = { .name = "summary", .metafile_type = XFS_METAFILE_RTSUMMARY, .sick = XFS_SICK_RG_SUMMARY, + .fmt_mask = (1U << XFS_DINODE_FMT_EXTENTS) | + (1U << XFS_DINODE_FMT_BTREE), .create = xfs_rtsummary_create, }, + [XFS_RTGI_RMAP] = { + .name = "rmap", + .metafile_type = XFS_METAFILE_RTRMAP, + .fmt_mask = 1U << XFS_DINODE_FMT_META_BTREE, + /* + * growfs must create the rtrmap inodes before adding a + * realtime volume to the filesystem, so we cannot use the + * rtrmapbt predicate here. 
+ */ + .enabled = xfs_has_rmapbt, + }, }; /* Return the shortname of this rtgroup inode. */ @@ -434,8 +451,7 @@ xfs_rtginode_load( return error; } - if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS && - ip->i_df.if_format != XFS_DINODE_FMT_BTREE)) { + if (XFS_IS_CORRUPT(mp, !((1U << ip->i_df.if_format) & ops->fmt_mask))) { xfs_irele(ip); xfs_rtginode_mark_sick(rtg, type); return -EFSCORRUPTED; diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index dc3ce660a01307..5b61291d26691f 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -14,6 +14,7 @@ struct xfs_trans; enum xfs_rtg_inodes { XFS_RTGI_BITMAP, /* allocation bitmap */ XFS_RTGI_SUMMARY, /* allocation summary */ + XFS_RTGI_RMAP, /* rmap btree inode */ XFS_RTGI_MAX, }; @@ -74,6 +75,11 @@ static inline struct xfs_inode *rtg_summary(const struct xfs_rtgroup *rtg) return rtg->rtg_inodes[XFS_RTGI_SUMMARY]; } +static inline struct xfs_inode *rtg_rmap(const struct xfs_rtgroup *rtg) +{ + return rtg->rtg_inodes[XFS_RTGI_RMAP]; +} + /* Passive rtgroup references */ static inline struct xfs_rtgroup * xfs_rtgroup_get( @@ -284,6 +290,8 @@ int xfs_rtginode_create(struct xfs_rtgroup *rtg, enum xfs_rtg_inodes type, bool init); void xfs_rtginode_irele(struct xfs_inode **ipp); +void xfs_rtginode_irele(struct xfs_inode **ipp); + static inline const char *xfs_rtginode_path(xfs_rgnumber_t rgno, enum xfs_rtg_inodes type) { diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 99d828bb5fe7c3..22aabf326b2ccd 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -18,6 +18,7 @@ #include "xfs_alloc.h" #include "xfs_btree.h" #include "xfs_btree_staging.h" +#include "xfs_metafile.h" #include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" #include "xfs_trace.h" @@ -405,12 +406,10 @@ xfs_rtrmapbt_init_cursor( struct xfs_trans *tp, struct xfs_rtgroup *rtg) { - struct xfs_inode *ip = NULL; + struct xfs_inode *ip = 
rtg_rmap(rtg); struct xfs_mount *mp = rtg_mount(rtg); struct xfs_btree_cur *cur; - return NULL; /* XXX */ - xfs_assert_ilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL); cur = xfs_btree_alloc_cursor(mp, tp, &xfs_rtrmapbt_ops, @@ -439,6 +438,7 @@ xfs_rtrmapbt_commit_staged_btree( int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + ASSERT(ifake->if_fork->if_format == XFS_DINODE_FMT_META_BTREE); /* * Free any resources hanging off the real fork, then shallow-copy the From patchwork Fri Dec 13 01:03:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13906206
Date: Thu, 12 Dec 2024 17:03:58 -0800 Subject: [PATCH 13/37] xfs: add metadata reservations for realtime rmap btrees From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123537.1181370.15688658549010481316.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org MIME-Version: 1.0 From: Darrick J. Wong Reserve some free blocks so that we will always have enough free blocks in the data volume to handle expansion of the realtime rmap btree. Signed-off-by: "Darrick J. 
Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_rtrmap_btree.c | 41 ++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 2 ++ fs/xfs/xfs_rtalloc.c | 23 ++++++++++++++++++++- 3 files changed, 65 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 22aabf326b2ccd..08c4014a75a42c 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -540,3 +540,44 @@ xfs_rtrmapbt_compute_maxlevels( /* Add one level to handle the inode root level. */ mp->m_rtrmap_maxlevels = min(d_maxlevels, r_maxlevels) + 1; } + +/* Calculate the rtrmap btree size for some records. */ +static unsigned long long +xfs_rtrmapbt_calc_size( + struct xfs_mount *mp, + unsigned long long len) +{ + return xfs_btree_calc_size(mp->m_rtrmap_mnr, len); +} + +/* + * Calculate the maximum rmap btree size. + */ +static unsigned long long +xfs_rtrmapbt_max_size( + struct xfs_mount *mp, + xfs_rtblock_t rtblocks) +{ + /* Bail out if we're uninitialized, which can happen in mkfs. */ + if (mp->m_rtrmap_mxr[0] == 0) + return 0; + + return xfs_rtrmapbt_calc_size(mp, rtblocks); +} + +/* + * Figure out how many blocks to reserve and how many are used by this btree. + */ +xfs_filblks_t +xfs_rtrmapbt_calc_reserves( + struct xfs_mount *mp) +{ + uint32_t blocks = mp->m_groups[XG_TYPE_RTG].blocks; + + if (!xfs_has_rtrmapbt(mp)) + return 0; + + /* 1/64th (~1.5%) of the space, and enough for 1 record per block. 
*/ + return max_t(xfs_filblks_t, blocks >> 6, + xfs_rtrmapbt_max_size(mp, blocks)); +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 63aabae2e09db1..ad5cb1078bc1a0 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -79,4 +79,6 @@ unsigned int xfs_rtrmapbt_maxlevels_ondisk(void); int __init xfs_rtrmapbt_init_cur_cache(void); void xfs_rtrmapbt_destroy_cur_cache(void); +xfs_filblks_t xfs_rtrmapbt_calc_reserves(struct xfs_mount *mp); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 4cd2f32aa70a0a..2245f9ecaa3398 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -22,6 +22,7 @@ #include "xfs_rtalloc.h" #include "xfs_sb.h" #include "xfs_rtbitmap.h" +#include "xfs_rtrmap_btree.h" #include "xfs_quota.h" #include "xfs_log_priv.h" #include "xfs_health.h" @@ -1498,6 +1499,13 @@ void xfs_rt_resv_free( struct xfs_mount *mp) { + struct xfs_rtgroup *rtg = NULL; + unsigned int i; + + while ((rtg = xfs_rtgroup_next(mp, rtg))) { + for (i = 0; i < XFS_RTGI_MAX; i++) + xfs_metafile_resv_free(rtg->rtg_inodes[i]); + } } /* Reserve space for rt metadata inodes' space expansion. */ @@ -1505,7 +1513,20 @@ int xfs_rt_resv_init( struct xfs_mount *mp) { - return 0; + struct xfs_rtgroup *rtg = NULL; + xfs_filblks_t ask; + int error = 0; + + while ((rtg = xfs_rtgroup_next(mp, rtg))) { + int err2; + + ask = xfs_rtrmapbt_calc_reserves(mp); + err2 = xfs_metafile_resv_init(rtg_rmap(rtg), ask); + if (err2 && !error) + error = err2; + } + + return error; } /* From patchwork Fri Dec 13 01:04:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906207
Date: Thu, 12 Dec 2024 17:04:14 -0800 Subject: [PATCH 14/37] xfs: wire up a new metafile type for the realtime rmap From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123554.1181370.7813302766369370222.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org MIME-Version: 1.0 From: Darrick J. Wong Plumb in the pieces we need to embed the root of the realtime rmap btree in an inode's data fork, complete with new metafile type and on-disk interpretation functions. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_format.h | 8 + fs/xfs/libxfs/xfs_inode_fork.c | 6 - fs/xfs/libxfs/xfs_ondisk.h | 1 fs/xfs/libxfs/xfs_rtrmap_btree.c | 251 ++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 112 +++++++++++++++++ fs/xfs/xfs_inode_item_recover.c | 4 + 6 files changed, 379 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index f32c9fda5a195f..fba4e59aded4a0 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1736,6 +1736,14 @@ typedef __be32 xfs_rmap_ptr_t; */ #define XFS_RTRMAP_CRC_MAGIC 0x4d415052 /* 'MAPR' */ +/* + * rtrmap root header, on-disk form only. 
+ */ +struct xfs_rtrmap_root { + __be16 bb_level; /* 0 is a leaf */ + __be16 bb_numrecs; /* current # of data records */ +}; + /* inode-based btree pointer type */ typedef __be64 xfs_rtrmap_ptr_t; diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index a8662185f8c22a..d1a04b45ac5492 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -27,6 +27,7 @@ #include "xfs_errortag.h" #include "xfs_health.h" #include "xfs_symlink_remote.h" +#include "xfs_rtrmap_btree.h" struct kmem_cache *xfs_ifork_cache; @@ -271,8 +272,7 @@ xfs_iformat_data_fork( case XFS_DINODE_FMT_META_BTREE: switch (ip->i_metatype) { case XFS_METAFILE_RTRMAP: - ASSERT(0); /* to be implemented later */ - return -EFSCORRUPTED; + return xfs_iformat_rtrmap(ip, dip); default: break; } @@ -619,7 +619,7 @@ xfs_iflush_fork( switch (ip->i_metatype) { case XFS_METAFILE_RTRMAP: - ASSERT(0); /* to be implemented later */ + xfs_iflush_rtrmap(ip, dip); break; default: ASSERT(0); diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h index 2c50877a1a2f0b..07e2f5fb3a94ae 100644 --- a/fs/xfs/libxfs/xfs_ondisk.h +++ b/fs/xfs/libxfs/xfs_ondisk.h @@ -84,6 +84,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_raw, 4); XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo, 48); XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t, 8); + XFS_CHECK_STRUCT_SIZE(struct xfs_rtrmap_root, 4); /* * m68k has problems with struct xfs_attr_leaf_name_remote, but we pad diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 08c4014a75a42c..d90189a1ef10b5 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -77,6 +77,39 @@ xfs_rtrmapbt_get_maxrecs( return cur->bc_mp->m_rtrmap_mxr[level != 0]; } +/* Calculate number of records in the ondisk realtime rmap btree inode root. 
*/ +unsigned int +xfs_rtrmapbt_droot_maxrecs( + unsigned int blocklen, + bool leaf) +{ + blocklen -= sizeof(struct xfs_rtrmap_root); + + if (leaf) + return blocklen / sizeof(struct xfs_rmap_rec); + return blocklen / (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Get the maximum records we could store in the on-disk format. + * + * For non-root nodes this is equivalent to xfs_rtrmapbt_get_maxrecs, but + * for the root node this checks the available space in the dinode fork + * so that we can resize the in-memory buffer to match it. After a + * resize to the maximum size this function returns the same value + * as xfs_rtrmapbt_get_maxrecs for the root node, too. + */ +STATIC int +xfs_rtrmapbt_get_dmaxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level != cur->bc_nlevels - 1) + return cur->bc_mp->m_rtrmap_mxr[level != 0]; + return xfs_rtrmapbt_droot_maxrecs(cur->bc_ino.forksize, level == 0); +} + /* * Convert the ondisk record's offset field into the ondisk key's offset field. 
* Fork and bmbt are significant parts of the rmap record key, but written @@ -369,6 +402,87 @@ xfs_rtrmapbt_keys_contiguous( be32_to_cpu(key2->rmap.rm_startblock)); } +static inline void +xfs_rtrmapbt_move_ptrs( + struct xfs_mount *mp, + struct xfs_btree_block *broot, + short old_size, + size_t new_size, + unsigned int numrecs) +{ + void *dptr; + void *sptr; + + sptr = xfs_rtrmap_broot_ptr_addr(mp, broot, 1, old_size); + dptr = xfs_rtrmap_broot_ptr_addr(mp, broot, 1, new_size); + memmove(dptr, sptr, numrecs * sizeof(xfs_rtrmap_ptr_t)); +} + +static struct xfs_btree_block * +xfs_rtrmapbt_broot_realloc( + struct xfs_btree_cur *cur, + unsigned int new_numrecs) +{ + struct xfs_mount *mp = cur->bc_mp; + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + struct xfs_btree_block *broot; + unsigned int new_size; + unsigned int old_size = ifp->if_broot_bytes; + const unsigned int level = cur->bc_nlevels - 1; + + new_size = xfs_rtrmap_broot_space_calc(mp, level, new_numrecs); + + /* Handle the nop case quietly. */ + if (new_size == old_size) + return ifp->if_broot; + + if (new_size > old_size) { + unsigned int old_numrecs; + + /* + * If there wasn't any memory allocated before, just allocate + * it now and get out. + */ + if (old_size == 0) + return xfs_broot_realloc(ifp, new_size); + + /* + * If there is already an existing if_broot, then we need to + * realloc it and possibly move the node block pointers because + * those are not butted up against the btree block header. + */ + old_numrecs = xfs_rtrmapbt_maxrecs(mp, old_size, level == 0); + broot = xfs_broot_realloc(ifp, new_size); + if (level > 0) + xfs_rtrmapbt_move_ptrs(mp, broot, old_size, new_size, + old_numrecs); + goto out_broot; + } + + /* + * We're reducing numrecs. If we're going all the way to zero, just + * free the block. 
+ */ + ASSERT(ifp->if_broot != NULL && old_size > 0); + if (new_size == 0) + return xfs_broot_realloc(ifp, 0); + + /* + * Shrink the btree root by possibly moving the rtrmapbt pointers, + * since they are not butted up against the btree block header. Then + * reallocate broot. + */ + if (level > 0) + xfs_rtrmapbt_move_ptrs(mp, ifp->if_broot, old_size, new_size, + new_numrecs); + broot = xfs_broot_realloc(ifp, new_size); + +out_broot: + ASSERT(xfs_rtrmap_droot_space(broot) <= + xfs_inode_fork_size(cur->bc_ino.ip, cur->bc_ino.whichfork)); + return broot; +} + const struct xfs_btree_ops xfs_rtrmapbt_ops = { .name = "rtrmap", .type = XFS_BTREE_TYPE_INODE, @@ -388,6 +502,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { .free_block = xfs_btree_free_metafile_block, .get_minrecs = xfs_rtrmapbt_get_minrecs, .get_maxrecs = xfs_rtrmapbt_get_maxrecs, + .get_dmaxrecs = xfs_rtrmapbt_get_dmaxrecs, .init_key_from_rec = xfs_rtrmapbt_init_key_from_rec, .init_high_key_from_rec = xfs_rtrmapbt_init_high_key_from_rec, .init_rec_from_cur = xfs_rtrmapbt_init_rec_from_cur, @@ -398,6 +513,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { .keys_inorder = xfs_rtrmapbt_keys_inorder, .recs_inorder = xfs_rtrmapbt_recs_inorder, .keys_contiguous = xfs_rtrmapbt_keys_contiguous, + .broot_realloc = xfs_rtrmapbt_broot_realloc, }; /* Allocate a new rt rmap btree cursor. */ @@ -581,3 +697,138 @@ xfs_rtrmapbt_calc_reserves( return max_t(xfs_filblks_t, blocks >> 6, xfs_rtrmapbt_max_size(mp, blocks)); } + +/* Convert on-disk form of btree root to in-memory form. 
*/ +STATIC void +xfs_rtrmapbt_from_disk( + struct xfs_inode *ip, + struct xfs_rtrmap_root *dblock, + unsigned int dblocklen, + struct xfs_btree_block *rblock) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_rmap_key *fkp; + __be64 *fpp; + struct xfs_rmap_key *tkp; + __be64 *tpp; + struct xfs_rmap_rec *frp; + struct xfs_rmap_rec *trp; + unsigned int rblocklen = xfs_rtrmap_broot_space(mp, dblock); + unsigned int numrecs; + unsigned int maxrecs; + + xfs_btree_init_block(mp, rblock, &xfs_rtrmapbt_ops, 0, 0, ip->i_ino); + + rblock->bb_level = dblock->bb_level; + rblock->bb_numrecs = dblock->bb_numrecs; + numrecs = be16_to_cpu(dblock->bb_numrecs); + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrmapbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrmap_droot_key_addr(dblock, 1); + tkp = xfs_rtrmap_key_addr(rblock, 1); + fpp = xfs_rtrmap_droot_ptr_addr(dblock, 1, maxrecs); + tpp = xfs_rtrmap_broot_ptr_addr(mp, rblock, 1, rblocklen); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrmap_droot_rec_addr(dblock, 1); + trp = xfs_rtrmap_rec_addr(rblock, 1); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Load a realtime reverse mapping btree root in from disk. */ +int +xfs_iformat_rtrmap( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_rtrmap_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + struct xfs_btree_block *broot; + unsigned int numrecs; + unsigned int level; + int dsize; + + /* + * growfs must create the rtrmap inodes before adding a realtime volume + * to the filesystem, so we cannot use the rtrmapbt predicate here. 
+ */ + if (!xfs_has_rmapbt(ip->i_mount)) + return -EFSCORRUPTED; + + dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK); + numrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (level > mp->m_rtrmap_maxlevels || + xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) + return -EFSCORRUPTED; + + broot = xfs_broot_alloc(xfs_ifork_ptr(ip, XFS_DATA_FORK), + xfs_rtrmap_broot_space_calc(mp, level, numrecs)); + if (broot) + xfs_rtrmapbt_from_disk(ip, dfp, dsize, broot); + return 0; +} + +/* Convert in-memory form of btree root to on-disk form. */ +void +xfs_rtrmapbt_to_disk( + struct xfs_mount *mp, + struct xfs_btree_block *rblock, + unsigned int rblocklen, + struct xfs_rtrmap_root *dblock, + unsigned int dblocklen) +{ + struct xfs_rmap_key *fkp; + __be64 *fpp; + struct xfs_rmap_key *tkp; + __be64 *tpp; + struct xfs_rmap_rec *frp; + struct xfs_rmap_rec *trp; + unsigned int numrecs; + unsigned int maxrecs; + + ASSERT(rblock->bb_magic == cpu_to_be32(XFS_RTRMAP_CRC_MAGIC)); + ASSERT(uuid_equal(&rblock->bb_u.l.bb_uuid, &mp->m_sb.sb_meta_uuid)); + ASSERT(rblock->bb_u.l.bb_blkno == cpu_to_be64(XFS_BUF_DADDR_NULL)); + ASSERT(rblock->bb_u.l.bb_leftsib == cpu_to_be64(NULLFSBLOCK)); + ASSERT(rblock->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK)); + + dblock->bb_level = rblock->bb_level; + dblock->bb_numrecs = rblock->bb_numrecs; + numrecs = be16_to_cpu(rblock->bb_numrecs); + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrmapbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrmap_key_addr(rblock, 1); + tkp = xfs_rtrmap_droot_key_addr(dblock, 1); + fpp = xfs_rtrmap_broot_ptr_addr(mp, rblock, 1, rblocklen); + tpp = xfs_rtrmap_droot_ptr_addr(dblock, 1, maxrecs); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrmap_rec_addr(rblock, 1); + trp = xfs_rtrmap_droot_rec_addr(dblock, 1); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Flush a realtime reverse mapping btree 
root out to disk. */ +void +xfs_iflush_rtrmap( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrmap_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + + ASSERT(ifp->if_broot != NULL); + ASSERT(ifp->if_broot_bytes > 0); + ASSERT(xfs_rtrmap_droot_space(ifp->if_broot) <= + xfs_inode_fork_size(ip, XFS_DATA_FORK)); + xfs_rtrmapbt_to_disk(ip->i_mount, ifp->if_broot, ifp->if_broot_bytes, + dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index ad5cb1078bc1a0..ddae34cac10f1c 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -25,6 +25,7 @@ void xfs_rtrmapbt_commit_staged_btree(struct xfs_btree_cur *cur, unsigned int xfs_rtrmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, bool leaf); void xfs_rtrmapbt_compute_maxlevels(struct xfs_mount *mp); +unsigned int xfs_rtrmapbt_droot_maxrecs(unsigned int blocklen, bool leaf); /* * Addresses of records, keys, and pointers within an incore rtrmapbt block. @@ -81,4 +82,115 @@ void xfs_rtrmapbt_destroy_cur_cache(void); xfs_filblks_t xfs_rtrmapbt_calc_reserves(struct xfs_mount *mp); +/* Addresses of key, pointers, and records within an ondisk rtrmapbt block. 
*/ + +static inline struct xfs_rmap_rec * +xfs_rtrmap_droot_rec_addr( + struct xfs_rtrmap_root *block, + unsigned int index) +{ + return (struct xfs_rmap_rec *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_rmap_rec)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_droot_key_addr( + struct xfs_rtrmap_root *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)(block + 1) + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_droot_ptr_addr( + struct xfs_rtrmap_root *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrmap_ptr_t *) + ((char *)(block + 1) + + maxrecs * 2 * sizeof(struct xfs_rmap_key) + + (index - 1) * sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Address of pointers within the incore btree root. + * + * These are to be used when we know the size of the block and + * we don't have a cursor. + */ +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_broot_ptr_addr( + struct xfs_mount *mp, + struct xfs_btree_block *bb, + unsigned int index, + unsigned int block_size) +{ + return xfs_rtrmap_ptr_addr(bb, index, + xfs_rtrmapbt_maxrecs(mp, block_size, false)); +} + +/* + * Compute the space required for the incore btree root containing the given + * number of records. + */ +static inline size_t +xfs_rtrmap_broot_space_calc( + struct xfs_mount *mp, + unsigned int level, + unsigned int nrecs) +{ + size_t sz = XFS_RTRMAP_BLOCK_LEN; + + if (level > 0) + return sz + nrecs * (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); + return sz + nrecs * sizeof(struct xfs_rmap_rec); +} + +/* + * Compute the space required for the incore btree root given the ondisk + * btree root block. + */ +static inline size_t +xfs_rtrmap_broot_space(struct xfs_mount *mp, struct xfs_rtrmap_root *bb) +{ + return xfs_rtrmap_broot_space_calc(mp, be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +/* Compute the space required for the ondisk root block. 
*/ +static inline size_t +xfs_rtrmap_droot_space_calc( + unsigned int level, + unsigned int nrecs) +{ + size_t sz = sizeof(struct xfs_rtrmap_root); + + if (level > 0) + return sz + nrecs * (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); + return sz + nrecs * sizeof(struct xfs_rmap_rec); +} + +/* + * Compute the space required for the ondisk root block given an incore root + * block. + */ +static inline size_t +xfs_rtrmap_droot_space(struct xfs_btree_block *bb) +{ + return xfs_rtrmap_droot_space_calc(be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +int xfs_iformat_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); +void xfs_rtrmapbt_to_disk(struct xfs_mount *mp, struct xfs_btree_block *rblock, + unsigned int rblocklen, struct xfs_rtrmap_root *dblock, + unsigned int dblocklen); +void xfs_iflush_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index 5bb057ba76ead4..daaa4098f4d5a6 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -22,6 +22,7 @@ #include "xfs_log_recover.h" #include "xfs_icache.h" #include "xfs_bmap_btree.h" +#include "xfs_rtrmap_btree.h" STATIC void xlog_recover_inode_ra_pass2( @@ -282,6 +283,9 @@ xlog_recover_inode_dbroot( break; case XFS_DINODE_FMT_META_BTREE: switch (be16_to_cpu(dip->di_metatype)) { + case XFS_METAFILE_RTRMAP: + xfs_rtrmapbt_to_disk(mp, src, len, dfork, dsize); + return 0; default: ASSERT(0); return -EFSCORRUPTED; From patchwork Fri Dec 13 01:04:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906208
Date: Thu, 12 Dec 2024 17:04:29 -0800 Subject: [PATCH 15/37] xfs: wire up rmap map and unmap to the realtime rmapbt From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123571.1181370.10004699515258970987.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org MIME-Version: 1.0 From: Darrick J. Wong Connect the map and unmap reverse-mapping operations to the realtime rmapbt via the deferred operation callbacks. This enables us to perform rmap operations against the correct btree. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_rmap.c | 78 +++++++++++++++++++++++++++++-------------- fs/xfs/libxfs/xfs_rtgroup.c | 9 +++++ fs/xfs/libxfs/xfs_rtgroup.h | 5 ++- 3 files changed, 66 insertions(+), 26 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 8d3cea90c7cd04..2f0688a57991cc 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -26,6 +26,7 @@ #include "xfs_health.h" #include "xfs_rmap_item.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" struct kmem_cache *xfs_rmap_intent_cache; @@ -2619,6 +2620,47 @@ __xfs_rmap_finish_intent( } } +static int +xfs_rmap_finish_init_cursor( + struct xfs_trans *tp, + struct xfs_rmap_intent *ri, + struct xfs_btree_cur **pcur) +{ + struct xfs_perag *pag = to_perag(ri->ri_group); + struct xfs_buf *agbp = NULL; + int error; + + /* + * Refresh the freelist before we start changing the rmapbt, because a + * shape change could cause us to allocate blocks. 
+ */ + error = xfs_free_extent_fix_freelist(tp, pag, &agbp); + if (error) { + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGFL); + return error; + } + if (XFS_IS_CORRUPT(tp->t_mountp, !agbp)) { + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGFL); + return -EFSCORRUPTED; + } + *pcur = xfs_rmapbt_init_cursor(tp->t_mountp, tp, agbp, pag); + return 0; +} + +static int +xfs_rtrmap_finish_init_cursor( + struct xfs_trans *tp, + struct xfs_rmap_intent *ri, + struct xfs_btree_cur **pcur) +{ + struct xfs_rtgroup *rtg = to_rtg(ri->ri_group); + + xfs_rtgroup_lock(rtg, XFS_RTGLOCK_RMAP); + xfs_rtgroup_trans_join(tp, rtg, XFS_RTGLOCK_RMAP); + *pcur = xfs_rtrmapbt_init_cursor(tp, rtg); + return 0; +} + /* * Process one of the deferred rmap operations. We pass back the * btree cursor to maintain our lock on the rmapbt between calls. @@ -2634,8 +2676,6 @@ xfs_rmap_finish_one( { struct xfs_owner_info oinfo; struct xfs_mount *mp = tp->t_mountp; - struct xfs_btree_cur *rcur = *pcur; - struct xfs_buf *agbp = NULL; xfs_agblock_t bno; bool unwritten; int error = 0; @@ -2649,38 +2689,26 @@ xfs_rmap_finish_one( * If we haven't gotten a cursor or the cursor AG doesn't match * the startblock, get one now. */ - if (rcur != NULL && rcur->bc_group != ri->ri_group) { - xfs_btree_del_cursor(rcur, 0); - rcur = NULL; + if (*pcur != NULL && (*pcur)->bc_group != ri->ri_group) { + xfs_btree_del_cursor(*pcur, 0); *pcur = NULL; } - if (rcur == NULL) { - struct xfs_perag *pag = to_perag(ri->ri_group); - - /* - * Refresh the freelist before we start changing the - * rmapbt, because a shape change could cause us to - * allocate blocks. 
- */ - error = xfs_free_extent_fix_freelist(tp, pag, &agbp); - if (error) { - xfs_ag_mark_sick(pag, XFS_SICK_AG_AGFL); + if (*pcur == NULL) { + if (ri->ri_group->xg_type == XG_TYPE_RTG) + error = xfs_rtrmap_finish_init_cursor(tp, ri, pcur); + else + error = xfs_rmap_finish_init_cursor(tp, ri, pcur); + if (error) return error; - } - if (XFS_IS_CORRUPT(tp->t_mountp, !agbp)) { - xfs_ag_mark_sick(pag, XFS_SICK_AG_AGFL); - return -EFSCORRUPTED; - } - - *pcur = rcur = xfs_rmapbt_init_cursor(mp, tp, agbp, pag); } xfs_rmap_ino_owner(&oinfo, ri->ri_owner, ri->ri_whichfork, ri->ri_bmap.br_startoff); unwritten = ri->ri_bmap.br_state == XFS_EXT_UNWRITTEN; - bno = XFS_FSB_TO_AGBNO(rcur->bc_mp, ri->ri_bmap.br_startblock); - error = __xfs_rmap_finish_intent(rcur, ri->ri_type, bno, + bno = xfs_fsb_to_gbno(mp, ri->ri_bmap.br_startblock, + ri->ri_group->xg_type); + error = __xfs_rmap_finish_intent(*pcur, ri->ri_type, bno, ri->ri_bmap.br_blockcount, &oinfo, unwritten); if (error) return error; diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index 1b56c13b282788..af1716ec0691a4 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -202,6 +202,9 @@ xfs_rtgroup_lock( } else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) { xfs_ilock(rtg_bitmap(rtg), XFS_ILOCK_SHARED); } + + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg)) + xfs_ilock(rtg_rmap(rtg), XFS_ILOCK_EXCL); } /* Unlock metadata inodes associated with this rt group. 
*/ @@ -214,6 +217,9 @@ xfs_rtgroup_unlock( ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) || !(rtglock_flags & XFS_RTGLOCK_BITMAP)); + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg)) + xfs_iunlock(rtg_rmap(rtg), XFS_ILOCK_EXCL); + if (rtglock_flags & XFS_RTGLOCK_BITMAP) { xfs_iunlock(rtg_summary(rtg), XFS_ILOCK_EXCL); xfs_iunlock(rtg_bitmap(rtg), XFS_ILOCK_EXCL); @@ -239,6 +245,9 @@ xfs_rtgroup_trans_join( xfs_trans_ijoin(tp, rtg_bitmap(rtg), XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, rtg_summary(rtg), XFS_ILOCK_EXCL); } + + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg_rmap(rtg)) + xfs_trans_ijoin(tp, rtg_rmap(rtg), XFS_ILOCK_EXCL); } /* Retrieve rt group geometry. */ diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 5b61291d26691f..733da7417c9cd7 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -265,9 +265,12 @@ int xfs_update_last_rtgroup_size(struct xfs_mount *mp, #define XFS_RTGLOCK_BITMAP (1U << 0) /* Lock the rt bitmap inode in shared mode */ #define XFS_RTGLOCK_BITMAP_SHARED (1U << 1) +/* Lock the rt rmap inode in exclusive mode */ +#define XFS_RTGLOCK_RMAP (1U << 2) #define XFS_RTGLOCK_ALL_FLAGS (XFS_RTGLOCK_BITMAP | \ - XFS_RTGLOCK_BITMAP_SHARED) + XFS_RTGLOCK_BITMAP_SHARED | \ + XFS_RTGLOCK_RMAP) void xfs_rtgroup_lock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags); void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags); From patchwork Fri Dec 13 01:04:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906209 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3171317BA1 for ; Fri, 13 Dec 2024 01:04:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051886; cv=none; b=EMvm1CA5zgSe+Jkz+zKXetR5YAoW0NS7vbxJu5W39NshJWTHPAB/c+YGSeSzBJhoeekRLLRXFsEaztGMJ4nNQrd2ID3yS+UhgS6GH/Z7aqe52esuDpOe0EjpROE+WOK+72vmyLHUY7QqkCclHHzMiRpMhTnsuWh87bRSM3hCwoc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051886; c=relaxed/simple; bh=SF70s06Y8VArL+TkwKyj7RzaVDX7Iv2fjS0RGiBtY3g=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=WJjdVKtX/U67ViylIVSU7GerpJlrxORwbb7Uhr3TaTlgzJCsyURsTGt0df9Lew2nHz9uPqKGwp3Bx9N+1Byn6uFkRrdrP2FpNk+s4lXLUbiFEdBeEXyLEh4+TJuKuIbZaSZmsTOipyNe8dWi1lJp8JbStDBRhVakgCiv/m6uuJ0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=r6nnwdnU; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="r6nnwdnU" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BD8B0C4CECE; Fri, 13 Dec 2024 01:04:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734051885; bh=SF70s06Y8VArL+TkwKyj7RzaVDX7Iv2fjS0RGiBtY3g=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=r6nnwdnUj19GF739EjcH/Y0MhBC0I/LHTJ5BDNvyhARxcciLKEfk6p5LYAZxIqj+L iJGk65JZXK+fXxLGOjeXezjsK98HvcxObX9tkrOLF0uPyUhPNwiO+vRcsLfzri/6Xu jF+Qu56tG1tVf3qm7hlXKprIjfYyQ5hClaJCHhWVnPlv9w7BJw9vrl665uZRNtkmnX 
n/wQxoQ9S5YK0xaYjBMFNOtnTloeA21bxYayIyeH0XI2maoki2LPi4li9oyAdlIdKd 873asP4vDXSMsgXgcXxZWXRvPHs6NmPA8vFrobtKUdYX4syo6vaL/D5gHpSj9utJ+s 4xkxpKyGEoVDg== Date: Thu, 12 Dec 2024 17:04:45 -0800 Subject: [PATCH 16/37] xfs: create routine to allocate and initialize a realtime rmap btree inode From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123589.1181370.6617549529420695155.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Create a library routine to allocate and initialize an empty realtime rmapbt inode. We'll use this for mkfs and repair. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_rtgroup.c | 2 + fs/xfs/libxfs/xfs_rtrmap_btree.c | 54 ++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 5 ++++ fs/xfs/xfs_rtalloc.c | 12 +++++++- 4 files changed, 71 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index af1716ec0691a4..5f31b6e65d5d17 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -33,6 +33,7 @@ #include "xfs_rtbitmap.h" #include "xfs_metafile.h" #include "xfs_metadir.h" +#include "xfs_rtrmap_btree.h" /* Find the first usable fsblock in this rtgroup. */ static inline uint32_t @@ -363,6 +364,7 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = { * rtrmapbt predicate here. 
*/ .enabled = xfs_has_rmapbt, + .create = xfs_rtrmapbt_create, }, }; diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index d90189a1ef10b5..7654661f4f5823 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -832,3 +832,57 @@ xfs_iflush_rtrmap( xfs_rtrmapbt_to_disk(ip->i_mount, ifp->if_broot, ifp->if_broot_bytes, dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); } + +/* + * Create a realtime rmap btree inode. + */ +int +xfs_rtrmapbt_create( + struct xfs_rtgroup *rtg, + struct xfs_inode *ip, + struct xfs_trans *tp, + bool init) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_mount *mp = ip->i_mount; + struct xfs_btree_block *broot; + + ifp->if_format = XFS_DINODE_FMT_META_BTREE; + ASSERT(ifp->if_broot_bytes == 0); + ASSERT(ifp->if_bytes == 0); + + /* Initialize the empty incore btree root. */ + broot = xfs_broot_realloc(ifp, xfs_rtrmap_broot_space_calc(mp, 0, 0)); + if (broot) + xfs_btree_init_block(mp, broot, &xfs_rtrmapbt_ops, 0, 0, + ip->i_ino); + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE | XFS_ILOG_DBROOT); + + return 0; +} + +/* + * Initialize an rmap for a realtime superblock using the potentially updated + * rt geometry in the provided @mp. 
+ */ +int +xfs_rtrmapbt_init_rtsb( + struct xfs_mount *mp, + struct xfs_rtgroup *rtg, + struct xfs_trans *tp) +{ + struct xfs_rmap_irec rmap = { + .rm_blockcount = mp->m_sb.sb_rextsize, + .rm_owner = XFS_RMAP_OWN_FS, + }; + struct xfs_btree_cur *cur; + int error; + + ASSERT(xfs_has_rtsb(mp)); + ASSERT(rtg_rgno(rtg) == 0); + + cur = xfs_rtrmapbt_init_cursor(tp, rtg); + error = xfs_rmap_map_raw(cur, &rmap); + xfs_btree_del_cursor(cur, error); + return error; +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index ddae34cac10f1c..db313492b17eed 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -193,4 +193,9 @@ void xfs_rtrmapbt_to_disk(struct xfs_mount *mp, struct xfs_btree_block *rblock, unsigned int dblocklen); void xfs_iflush_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); +int xfs_rtrmapbt_create(struct xfs_rtgroup *rtg, struct xfs_inode *ip, + struct xfs_trans *tp, bool init); +int xfs_rtrmapbt_init_rtsb(struct xfs_mount *mp, struct xfs_rtgroup *rtg, + struct xfs_trans *tp); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 2245f9ecaa3398..c7efd926413981 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -846,6 +846,13 @@ xfs_growfs_rt_init_rtsb( mp->m_rtsb_bp = rtsb_bp; error = xfs_bwrite(rtsb_bp); xfs_buf_unlock(rtsb_bp); + if (error) + return error; + + /* Initialize the rtrmap to reflect the rtsb. */ + if (rtg_rmap(args->rtg) != NULL) + error = xfs_rtrmapbt_init_rtsb(nargs->mp, args->rtg, args->tp); + return error; } @@ -894,8 +901,9 @@ xfs_growfs_rt_bmblock( goto out_free; nargs.tp = args.tp; - xfs_rtgroup_lock(args.rtg, XFS_RTGLOCK_BITMAP); - xfs_rtgroup_trans_join(args.tp, args.rtg, XFS_RTGLOCK_BITMAP); + xfs_rtgroup_lock(args.rtg, XFS_RTGLOCK_BITMAP | XFS_RTGLOCK_RMAP); + xfs_rtgroup_trans_join(args.tp, args.rtg, + XFS_RTGLOCK_BITMAP | XFS_RTGLOCK_RMAP); /* * Update the bitmap inode's size ondisk and incore. 
We need to update From patchwork Fri Dec 13 01:05:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13906210 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8E00D17BA9 for ; Fri, 13 Dec 2024 01:05:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051901; cv=none; b=cmPa5Ce88XmDlZ5oab6ywdbksO/RMbHhXpGen9yY7SR9XQ1BqHZfvcie8XWW3dfUYF3oXgVmTgMJLlsKKIQVr4uQuyrOxMuTBgqE5a2OWN5rYUxOPKnJv73+Cz2vXSjleQZ8fJHKwJbLh7QIPTxWzqNZeQTFr+Un4foOx0NuVnE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051901; c=relaxed/simple; bh=tHF5w+JH9uAYgiE/U956Yj4mGJ45thHXlfTNZjb8Po0=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=bIj903VgzwurBJuVuc/N/LThI7mlLokGay2J24CBGJL4WeiQj2EVN6oMgaHaNj2DxZDhiHzbyXwnRcfYyEWjuUV1j2/MKCKVtT+M3k+GIMpcA/uqHsQIB2eaZgJxliKR9HqRU6aBFZhZCiBg6IcxlzMok5oV8Yax0rqjnGB8zkQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=eCu/AsnS; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="eCu/AsnS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6A33FC4CED4; Fri, 13 Dec 2024 01:05:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734051901; bh=tHF5w+JH9uAYgiE/U956Yj4mGJ45thHXlfTNZjb8Po0=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=eCu/AsnSy6u9Os4+bvd0MgKiZ2DXIQ0zI2tNJx1bohgTn0js07biQpr99h1Dkao0i 
e5UzVAKN6MFQGWggzLzOai+nUyzB38mempLAT5KL9Nr2ioZPC+TPQTxJV/D3zc7KUg 0F6TyrUwPZ8UqNj9d5u/+e+NWlk+V4+vqifgbP32WAZhEcQ1toz1LD2ixZSLqd8xQf F+X5zL49JA7oYIatRyrO8qbRuHrtJQ5cQckhpHM+s5sZnSTHYRFgG+JdlRDAJwoVVn ynC9meMkO7SrCqXmDb2rkDjep4VJIXc/LUD8JRoi9uYMabJllX0eNmAoe3bcpf7jcD EM9VXly+Wg8Ow== Date: Thu, 12 Dec 2024 17:05:00 -0800 Subject: [PATCH 17/37] xfs: wire up getfsmap to the realtime reverse mapping btree From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123606.1181370.13332852313724906133.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Connect the getfsmap ioctl to the realtime rmapbt. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_fsmap.c | 174 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 173 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 3290dd8524a69a..3e3ef16f65a335 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -26,6 +26,7 @@ #include "xfs_rtbitmap.h" #include "xfs_ag.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" /* Convert an xfs_fsmap to an fsmap. 
*/ static void @@ -832,6 +833,174 @@ xfs_getfsmap_rtdev_rtbitmap( return error; } + +/* Transform a realtime rmapbt record into a fsmap */ +STATIC int +xfs_getfsmap_rtdev_rmapbt_helper( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xfs_fsmap_irec frec = { + .owner = rec->rm_owner, + .offset = rec->rm_offset, + .rm_flags = rec->rm_flags, + .rec_key = rec->rm_startblock, + }; + struct xfs_getfsmap_info *info = priv; + + return xfs_getfsmap_group_helper(info, cur->bc_tp, cur->bc_group, + rec->rm_startblock, rec->rm_blockcount, &frec); +} + +/* Actually query the rtrmap btree. */ +STATIC int +xfs_getfsmap_rtdev_rmapbt_query( + struct xfs_trans *tp, + struct xfs_getfsmap_info *info, + struct xfs_btree_cur **curpp) +{ + struct xfs_rtgroup *rtg = to_rtg(info->group); + + /* Query the rtrmapbt */ + xfs_rtgroup_lock(rtg, XFS_RTGLOCK_RMAP); + *curpp = xfs_rtrmapbt_init_cursor(tp, rtg); + return xfs_rmap_query_range(*curpp, &info->low, &info->high, + xfs_getfsmap_rtdev_rmapbt_helper, info); +} + +/* Execute a getfsmap query against the realtime device rmapbt. */ +STATIC int +xfs_getfsmap_rtdev_rmapbt( + struct xfs_trans *tp, + const struct xfs_fsmap *keys, + struct xfs_getfsmap_info *info) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_rtgroup *rtg = NULL; + struct xfs_btree_cur *bt_cur = NULL; + xfs_rtblock_t start_rtb; + xfs_rtblock_t end_rtb; + xfs_rgnumber_t start_rg, end_rg; + uint64_t eofs; + int error = 0; + + eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks); + if (keys[0].fmr_physical >= eofs) + return 0; + start_rtb = xfs_daddr_to_rtb(mp, keys[0].fmr_physical); + end_rtb = xfs_daddr_to_rtb(mp, min(eofs - 1, keys[1].fmr_physical)); + + info->missing_owner = XFS_FMR_OWN_FREE; + + /* + * Convert the fsmap low/high keys to rtgroup based keys. Initialize + * low to the fsmap low key and max out the high key to the end + * of the rtgroup. 
+ */ + info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); + error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); + if (error) + return error; + info->low.rm_blockcount = XFS_BB_TO_FSBT(mp, keys[0].fmr_length); + xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); + + /* Adjust the low key if we are continuing from where we left off. */ + if (info->low.rm_blockcount == 0) { + /* No previous record from which to continue */ + } else if (rmap_not_shareable(mp, &info->low)) { + /* Last record seen was an unshareable extent */ + info->low.rm_owner = 0; + info->low.rm_offset = 0; + + start_rtb += info->low.rm_blockcount; + if (xfs_rtb_to_daddr(mp, start_rtb) >= eofs) + return 0; + } else { + /* Last record seen was a shareable file data extent */ + info->low.rm_offset += info->low.rm_blockcount; + } + info->low.rm_startblock = xfs_rtb_to_rgbno(mp, start_rtb); + + info->high.rm_startblock = -1U; + info->high.rm_owner = ULLONG_MAX; + info->high.rm_offset = ULLONG_MAX; + info->high.rm_blockcount = 0; + info->high.rm_flags = XFS_RMAP_KEY_FLAGS | XFS_RMAP_REC_FLAGS; + + start_rg = xfs_rtb_to_rgno(mp, start_rtb); + end_rg = xfs_rtb_to_rgno(mp, end_rtb); + + while ((rtg = xfs_rtgroup_next_range(mp, rtg, start_rg, end_rg))) { + /* + * Set the rtgroup high key from the fsmap high key if this + * is the last rtgroup that we're querying. 
+ */ + info->group = rtg_group(rtg); + if (rtg_rgno(rtg) == end_rg) { + info->high.rm_startblock = + xfs_rtb_to_rgbno(mp, end_rtb); + info->high.rm_offset = + XFS_BB_TO_FSBT(mp, keys[1].fmr_offset); + error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]); + if (error) + break; + xfs_getfsmap_set_irec_flags(&info->high, &keys[1]); + } + + if (bt_cur) { + xfs_rtgroup_unlock(to_rtg(bt_cur->bc_group), + XFS_RTGLOCK_RMAP); + xfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR); + bt_cur = NULL; + } + + trace_xfs_fsmap_low_group_key(mp, info->dev, rtg_rgno(rtg), + &info->low); + trace_xfs_fsmap_high_group_key(mp, info->dev, rtg_rgno(rtg), + &info->high); + + error = xfs_getfsmap_rtdev_rmapbt_query(tp, info, &bt_cur); + if (error) + break; + + /* + * Set the rtgroup low key to the start of the rtgroup prior to + * moving on to the next rtgroup. + */ + if (rtg_rgno(rtg) == start_rg) + memset(&info->low, 0, sizeof(info->low)); + + /* + * If this is the last rtgroup, report any gap at the end of it + * before we drop the reference to the perag when the loop + * terminates. + */ + if (rtg_rgno(rtg) == end_rg) { + info->last = true; + error = xfs_getfsmap_rtdev_rmapbt_helper(bt_cur, + &info->high, info); + if (error) + break; + } + info->group = NULL; + } + + if (bt_cur) { + xfs_rtgroup_unlock(to_rtg(bt_cur->bc_group), + XFS_RTGLOCK_RMAP); + xfs_btree_del_cursor(bt_cur, error < 0 ? XFS_BTREE_ERROR : + XFS_BTREE_NOERROR); + } + + /* loop termination case */ + if (rtg) { + info->group = NULL; + xfs_rtgroup_rele(rtg); + } + + return error; +} #endif /* CONFIG_XFS_RT */ /* Do we recognize the device? 
*/ @@ -971,7 +1140,10 @@ xfs_getfsmap( if (mp->m_rtdev_targp) { handlers[2].nr_sectors = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks); handlers[2].dev = new_encode_dev(mp->m_rtdev_targp->bt_dev); - handlers[2].fn = xfs_getfsmap_rtdev_rtbitmap; + if (use_rmap) + handlers[2].fn = xfs_getfsmap_rtdev_rmapbt; + else + handlers[2].fn = xfs_getfsmap_rtdev_rtbitmap; } #endif /* CONFIG_XFS_RT */ From patchwork Fri Dec 13 01:05:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13906211 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 35E062B9BF for ; Fri, 13 Dec 2024 01:05:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051917; cv=none; b=T5x8h+whNZkAr2udI2AEi5IndfJXY/p6iI2VuYK2MJiBBwqlO9z+kVPoO5PMBbxxJE5a+x5ELMM2rn2nfWa7+YU8grWinqgTWgWFwGH4GxSP/iDrcXunKfexjlLASCIbyOYpJFznP4jkpK1ME/upWtPsHeiJlZbXeUo+TR2kyxw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051917; c=relaxed/simple; bh=4/HfTthM7p9aXllprwGiMmn4K9FIrk1aaeUUm68PTfY=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=XkCNJBgVMIx45uxlGqntmphnSIvotaiLEaMby0km/y//kDz8leogX0mLvwc4M1VfR7nYbrOLuEm7yRU+RMNLjh+Eo1WDB/pO6ni3iK851VdAlkUkphr0CrHONjjkXiQGFDtjpHRA60300m1V3UIkAytKPv6V/HVk9Av5cSDGRag= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NECE25Cx; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NECE25Cx" Received: by 
smtp.kernel.org (Postfix) with ESMTPSA id 022ABC4CECE; Fri, 13 Dec 2024 01:05:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734051917; bh=4/HfTthM7p9aXllprwGiMmn4K9FIrk1aaeUUm68PTfY=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=NECE25Cx8ZVjp9avx9mgBCrfy/TnK5u2SyyykahSlc4NQybnKHFNFm+kOS6kV2JXw lLW3YS98u1D5s6QcskwnfQ7RPx/xZaVy7zVrGddnF6G4lh2zN+7dFXFnDkpP0PvCJE Rp8KOnXknv22oJdzOEqibpjAqg7TvkFwXFjuRG4S3BapG9AkmopXUOdyoNFwaWYDJK E17O7xQj2DjMQgfFFPa5Ln1Evi3PZEiysNYYWzG0HVhCu71BJPUWiXxq55pWAmQsh8 XB6iljI56TsKbY5Ski8authoyi//PQecPDU2+jlo6/FeaXwe+s+ctWN3903ObAH2wr KomIXF45DWRNA== Date: Thu, 12 Dec 2024 17:05:16 -0800 Subject: [PATCH 18/37] xfs: check that the rtrmapbt maxlevels doesn't increase when growing fs From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123623.1181370.1362773049349118652.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong The size of filesystem transaction reservations depends on the maximum height (maxlevels) of the realtime btrees. Since we don't want a grow operation to increase the reservation size enough that we'll fail the minimum log size checks on the next mount, constrain growfs operations if they would cause an increase in those maxlevels. Signed-off-by: "Darrick J. 
Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_fsops.c | 11 +++++++++++ fs/xfs/xfs_rtalloc.c | 25 ++++++++++++++++++------- fs/xfs/xfs_rtalloc.h | 10 ++++++++++ fs/xfs/xfs_trace.h | 21 +++++++++++++++++++++ 4 files changed, 60 insertions(+), 7 deletions(-) diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index e1145107d8cbd1..9df5a09c0acd3b 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -22,6 +22,7 @@ #include "xfs_ag_resv.h" #include "xfs_trace.h" #include "xfs_rtalloc.h" +#include "xfs_rtrmap_btree.h" /* * Write new AG headers to disk. Non-transactional, but need to be @@ -114,6 +115,12 @@ xfs_growfs_data_private( xfs_buf_relse(bp); } + /* Make sure the new fs size won't cause problems with the log. */ + error = xfs_growfs_check_rtgeom(mp, nb, mp->m_sb.sb_rblocks, + mp->m_sb.sb_rextsize); + if (error) + return error; + nb_div = nb; nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks); if (nb_mod && nb_mod >= XFS_MIN_AG_BLOCKS) @@ -221,7 +228,11 @@ xfs_growfs_data_private( error = xfs_fs_reserve_ag_blocks(mp); if (error == -ENOSPC) error = 0; + + /* Compute new maxlevels for rt btrees. */ + xfs_rtrmapbt_compute_maxlevels(mp); } + return error; out_trans_cancel: diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index c7efd926413981..3c1bce5a4855f2 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -989,9 +989,11 @@ xfs_growfs_rt_bmblock( goto out_free; /* - * Ensure the mount RT feature flag is now set. + * Ensure the mount RT feature flag is now set, and compute new + * maxlevels for rt btrees. 
*/ mp->m_features |= XFS_FEAT_REALTIME; + xfs_rtrmapbt_compute_maxlevels(mp); kfree(nmp); return 0; @@ -1159,29 +1161,37 @@ xfs_growfs_rtg( return error; } -static int +int xfs_growfs_check_rtgeom( const struct xfs_mount *mp, + xfs_rfsblock_t dblocks, xfs_rfsblock_t rblocks, xfs_extlen_t rextsize) { + xfs_extlen_t min_logfsbs; struct xfs_mount *nmp; - int error = 0; nmp = xfs_growfs_rt_alloc_fake_mount(mp, rblocks, rextsize); if (!nmp) return -ENOMEM; + nmp->m_sb.sb_dblocks = dblocks; + + xfs_rtrmapbt_compute_maxlevels(nmp); + xfs_trans_resv_calc(nmp, M_RES(nmp)); /* * New summary size can't be more than half the size of the log. This * prevents us from getting a log overflow, since we'll log basically * the whole summary file at once. */ - if (nmp->m_rsumblocks > (mp->m_sb.sb_logblocks >> 1)) - error = -EINVAL; + min_logfsbs = min_t(xfs_extlen_t, xfs_log_calc_minimum_size(nmp), + nmp->m_rsumblocks * 2); kfree(nmp); - return error; + + if (min_logfsbs > mp->m_sb.sb_logblocks) + return -EINVAL; + return 0; } /* @@ -1300,7 +1310,8 @@ xfs_growfs_rt( goto out_unlock; /* Make sure the new fs size won't cause problems with the log. 
*/ - error = xfs_growfs_check_rtgeom(mp, in->newblocks, in->extsize); + error = xfs_growfs_check_rtgeom(mp, mp->m_sb.sb_dblocks, in->newblocks, + in->extsize); if (error) goto out_unlock; diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index d87523e6a55006..9044f7226ab6fc 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -46,6 +46,8 @@ xfs_growfs_rt( xfs_growfs_rt_t *in); /* user supplied growfs struct */ int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp); +int xfs_growfs_check_rtgeom(const struct xfs_mount *mp, xfs_rfsblock_t dblocks, + xfs_rfsblock_t rblocks, xfs_agblock_t rextsize); #else # define xfs_growfs_rt(mp,in) (-ENOSYS) # define xfs_rtalloc_reinit_frextents(m) (0) @@ -65,6 +67,14 @@ xfs_rtmount_init( # define xfs_rtunmount_inodes(m) # define xfs_rt_resv_free(mp) ((void)0) # define xfs_rt_resv_init(mp) (0) + +static inline int +xfs_growfs_check_rtgeom(const struct xfs_mount *mp, + xfs_rfsblock_t dblocks, xfs_rfsblock_t rblocks, + xfs_extlen_t rextsize) +{ + return 0; +} #endif /* CONFIG_XFS_RT */ #endif /* __XFS_RTALLOC_H__ */ diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index a098935163b7c2..84cdc145e2d96a 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -5622,6 +5622,27 @@ DEFINE_METAFILE_RESV_EVENT(xfs_metafile_resv_free_space); DEFINE_METAFILE_RESV_EVENT(xfs_metafile_resv_critical); DEFINE_INODE_ERROR_EVENT(xfs_metafile_resv_init_error); +#ifdef CONFIG_XFS_RT +TRACE_EVENT(xfs_growfs_check_rtgeom, + TP_PROTO(const struct xfs_mount *mp, unsigned int min_logfsbs), + TP_ARGS(mp, min_logfsbs), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned int, logblocks) + __field(unsigned int, min_logfsbs) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->logblocks = mp->m_sb.sb_logblocks; + __entry->min_logfsbs = min_logfsbs; + ), + TP_printk("dev %d:%d logblocks %u min_logfsbs %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->logblocks, + __entry->min_logfsbs) +); +#endif /* 
CONFIG_XFS_RT */ + #endif /* _TRACE_XFS_H */ #undef TRACE_INCLUDE_PATH From patchwork Fri Dec 13 01:05:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13906212 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D4BE353BE for ; Fri, 13 Dec 2024 01:05:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051932; cv=none; b=fhoI8EKE/KaaCQKwSVoC1QjjARlaXuJlNFxVVijFQcCzMwAV0xYoc0ezU/nqOpgxwOQdciip9cnt76drXiRoCKVqQLwIzyFFrsudyYJZrAa4efo7rIYbx69V47qCqouxiQEMna6RANceJ4rfwFDs3Q/WlWjaES+NTgF4rvR5NKA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051932; c=relaxed/simple; bh=ej7SpNnQCxrpTsx4Z7nfWCq7PENLh92BrP7YKc2ylzM=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=KcE84T+dnFduPAChmm0TNSi19EppZHtq92c26/ENLI9pB2plOSMmbM9T+UX50jW1BAG35dHx4q6HXJ4RWDLEw0L375UJr0dxpznATB8KQofBCCImPbEVvjamm03SUrU/Nz1gU9/6i9QQ11qMt/Oa5SHBtLezCewoKkgze/HZ99s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=vDfe+e0S; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="vDfe+e0S" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A9643C4CECE; Fri, 13 Dec 2024 01:05:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734051932; bh=ej7SpNnQCxrpTsx4Z7nfWCq7PENLh92BrP7YKc2ylzM=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; 
b=vDfe+e0SYGen7VzH0kkpaFn/hGhncw3C27u9zasswl05Th9hGNjE72DIlNt+hx3wV NRq5sG+EKT0qLc01wfdLErzYdZS3bq6uRzAHnjSly/3OIG3QaXTMekj8u3q18aItq+ n4BntixyYMrRtx02GlUSk7xMugehSk/Wqyi/reJArVA4DMChto6I6zLvRumklKRJX8 Gz184rMnhYRKkIdIBldJhvwQpO6OV84nu8Dx9hMb2s+EKEAod3MjuCo38ZKaGOhLHb Zu9MfdOaC3fOT6YbLSMpXV/cKkg6GFkKbb3HJaLWvB2wr74iPUY9vLrxXyfJTH52sU /7G9EUAUFGvHA== Date: Thu, 12 Dec 2024 17:05:32 -0800 Subject: [PATCH 19/37] xfs: report realtime rmap btree corruption errors to the health system From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123640.1181370.2788362731455898690.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Whenever we encounter corrupt realtime rmap btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: "Darrick J. 
Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_btree.h | 2 +- fs/xfs/libxfs/xfs_fs.h | 1 + fs/xfs/libxfs/xfs_health.h | 4 +++- fs/xfs/libxfs/xfs_rtgroup.c | 1 + fs/xfs/libxfs/xfs_rtrmap_btree.c | 10 ++++++++-- fs/xfs/xfs_health.c | 1 + 6 files changed, 15 insertions(+), 4 deletions(-) diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index ee82dc777d6d5b..dbc047b2fb2cf5 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -135,7 +135,7 @@ struct xfs_btree_ops { /* offset of btree stats array */ unsigned int statoff; - /* sick mask for health reporting (only for XFS_BTREE_TYPE_AG) */ + /* sick mask for health reporting (not for bmap btrees) */ unsigned int sick_mask; /* cursor operations */ diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 41ce4d3d650ec7..7cca458ff81245 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -993,6 +993,7 @@ struct xfs_rtgroup_geometry { #define XFS_RTGROUP_GEOM_SICK_SUPER (1U << 0) /* superblock */ #define XFS_RTGROUP_GEOM_SICK_BITMAP (1U << 1) /* rtbitmap */ #define XFS_RTGROUP_GEOM_SICK_SUMMARY (1U << 2) /* rtsummary */ +#define XFS_RTGROUP_GEOM_SICK_RMAPBT (1U << 3) /* reverse mappings */ /* * ioctl commands that are used by Linux filesystems diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h index d34986ac18c3fa..5c8a0aff6ba6e9 100644 --- a/fs/xfs/libxfs/xfs_health.h +++ b/fs/xfs/libxfs/xfs_health.h @@ -70,6 +70,7 @@ struct xfs_rtgroup; #define XFS_SICK_RG_SUPER (1 << 0) /* rt group superblock */ #define XFS_SICK_RG_BITMAP (1 << 1) /* rt group bitmap */ #define XFS_SICK_RG_SUMMARY (1 << 2) /* rt groups summary */ +#define XFS_SICK_RG_RMAPBT (1 << 3) /* reverse mappings */ /* Observable health issues for AG metadata. 
*/ #define XFS_SICK_AG_SB (1 << 0) /* superblock */ @@ -115,7 +116,8 @@ struct xfs_rtgroup; #define XFS_SICK_RG_PRIMARY (XFS_SICK_RG_SUPER | \ XFS_SICK_RG_BITMAP | \ - XFS_SICK_RG_SUMMARY) + XFS_SICK_RG_SUMMARY | \ + XFS_SICK_RG_RMAPBT) #define XFS_SICK_AG_PRIMARY (XFS_SICK_AG_SB | \ XFS_SICK_AG_AGF | \ diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index 5f31b6e65d5d17..b7ed2d27d54553 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -357,6 +357,7 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = { [XFS_RTGI_RMAP] = { .name = "rmap", .metafile_type = XFS_METAFILE_RTRMAP, + .sick = XFS_SICK_RG_RMAPBT, .fmt_mask = 1U << XFS_DINODE_FMT_META_BTREE, /* * growfs must create the rtrmap inodes before adding a diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 7654661f4f5823..0a78dee01b1b2e 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -27,6 +27,7 @@ #include "xfs_extent_busy.h" #include "xfs_rtgroup.h" #include "xfs_bmap.h" +#include "xfs_health.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -496,6 +497,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { .lru_refs = XFS_RMAP_BTREE_REF, .statoff = XFS_STATS_CALC_INDEX(xs_rtrmap_2), + .sick_mask = XFS_SICK_RG_RMAPBT, .dup_cursor = xfs_rtrmapbt_dup_cursor, .alloc_block = xfs_btree_alloc_metafile_block, @@ -755,16 +757,20 @@ xfs_iformat_rtrmap( * growfs must create the rtrmap inodes before adding a realtime volume * to the filesystem, so we cannot use the rtrmapbt predicate here. 
*/ - if (!xfs_has_rmapbt(ip->i_mount)) + if (!xfs_has_rmapbt(ip->i_mount)) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK); numrecs = be16_to_cpu(dfp->bb_numrecs); level = be16_to_cpu(dfp->bb_level); if (level > mp->m_rtrmap_maxlevels || - xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) + xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } broot = xfs_broot_alloc(xfs_ifork_ptr(ip, XFS_DATA_FORK), xfs_rtrmap_broot_space_calc(mp, level, numrecs)); diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c index c7c2e656199862..d438c3c001c829 100644 --- a/fs/xfs/xfs_health.c +++ b/fs/xfs/xfs_health.c @@ -447,6 +447,7 @@ static const struct ioctl_sick_map rtgroup_map[] = { { XFS_SICK_RG_SUPER, XFS_RTGROUP_GEOM_SICK_SUPER }, { XFS_SICK_RG_BITMAP, XFS_RTGROUP_GEOM_SICK_BITMAP }, { XFS_SICK_RG_SUMMARY, XFS_RTGROUP_GEOM_SICK_SUMMARY }, + { XFS_SICK_RG_RMAPBT, XFS_RTGROUP_GEOM_SICK_RMAPBT }, }; /* Fill out rtgroup geometry health info. */ From patchwork Fri Dec 13 01:05:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906213 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6C5F853BE for ; Fri, 13 Dec 2024 01:05:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051948; cv=none; b=KDtLzuHyGJPXywifmvFfkOoatfy1k9+PhTtXBK3ja4q4gWExHx2K7VaeXSE9oOmLklNMmIJ9BTY+W0aEuLtmwHWLTmUVr78WJ/1pSRmZRIYdoNsV8aq7SLsA+frSoQZAZYjqPEE3Mtu3NzSVyamOOlQ0OQsFAcJMUvJWdNJ1g0M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051948; c=relaxed/simple; bh=X/IaH8m+PjsJQI8kG8btIsJEIbYPTmpN6gOnc7y5L1U=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=dWotL9rCj6WK/pltZQWeHwyO+FpnA4qOiQQ3zcyfLyKnaA80MxoyP9NxY14RCWGBQakb7v3IGvFtXTGhmCFpAR8+F8y2ECnusYAv/Ua3eadS+9pzv30ksPlknOfDOT0kaE5YYajbZgDpDcRBAyMDEq2kF5GMKFY37WfDblQI7Tk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QR9HZQi8; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QR9HZQi8" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 422E5C4CECE; Fri, 13 Dec 2024 01:05:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734051948; bh=X/IaH8m+PjsJQI8kG8btIsJEIbYPTmpN6gOnc7y5L1U=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=QR9HZQi8JoiUQYcGstSyDemi2XT03c74RVbzYs6h6QWj7yG+ZbXztro9QeuNKdf2f rn42R8ZOmCY7bYDOzMw4gPlpq5u83oiRyN/zlUM67CsLEMx0J/wQ2bNeVhfOlbsCiL FzKycSPq5WcfS7diEVx6ghT1bL9nL8ZZVH/yKxDqqPiNiwzUgjDPY7ZaxFtzm27ZKD 
kh4gld20VBo8dUwxuraMZZ3irIozG7R584TGu6lBifcre8JNwHqUg5J7Vy8ioYL7ki s3530DJuxYvcG58qqBbR0AvGLkkdVxodWSYrvjUfbPxNeR42+pDfcggvtTE2hNTSWp oC7kNWNIKs3bw== Date: Thu, 12 Dec 2024 17:05:47 -0800 Subject: [PATCH 20/37] xfs: allow queued realtime intents to drain before scrubbing From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123657.1181370.9809445949360319107.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong When a writer thread executes a chain of log intent items for the realtime volume, the ILOCKs taken during each step are for each rt metadata file, not the entire rt volume itself. Although scrub takes all rt metadata ILOCKs, this isn't sufficient to guard against scrub checking the rt volume while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding the realtime volume. When there's a collision, cross-referencing between data structures (e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic. Fix this by adding to the mount structure the same drain that we use to protect scrub against concurrent AG updates, but this time for the realtime volume. Signed-off-by: "Darrick J. 
Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/scrub/bmap.c | 7 ++++ fs/xfs/scrub/common.c | 72 ++++++++++++++++++++++++++++++++++++++++++++-- fs/xfs/scrub/common.h | 5 ++- fs/xfs/scrub/rgsuper.c | 4 ++- fs/xfs/scrub/rtbitmap.c | 8 ++++- fs/xfs/scrub/rtsummary.c | 5 +++ fs/xfs/scrub/scrub.c | 2 + fs/xfs/xfs_drain.c | 20 ++++++------- fs/xfs/xfs_drain.h | 7 +++- 9 files changed, 108 insertions(+), 22 deletions(-) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 0d7ad692822d48..dd99366643f832 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -324,10 +324,15 @@ xchk_bmap_rt_iextent_xref( irec->br_startoff, &error)) return; - xchk_rtgroup_lock(&info->sc->sr, XCHK_RTGLOCK_ALL); + error = xchk_rtgroup_lock(info->sc, &info->sc->sr, XCHK_RTGLOCK_ALL); + if (!xchk_fblock_process_error(info->sc, info->whichfork, + irec->br_startoff, &error)) + goto out_free; + xchk_xref_is_used_rt_space(info->sc, irec->br_startblock, irec->br_blockcount); +out_free: xchk_rtgroup_free(info->sc, &info->sc->sr); } diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 5cbd94b56582a4..613fb54e723ede 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -719,13 +719,79 @@ xchk_rtgroup_init( return 0; } -void +/* Lock all the rt group metadata inode ILOCKs and wait for intents. */ +int xchk_rtgroup_lock( + struct xfs_scrub *sc, struct xchk_rt *sr, unsigned int rtglock_flags) { - xfs_rtgroup_lock(sr->rtg, rtglock_flags); + int error = 0; + + ASSERT(sr->rtg != NULL); + + /* + * If we're /only/ locking the rtbitmap in shared mode, then we're + * obviously not trying to compare records in two metadata inodes. + * There's no need to drain intents here because the caller (most + * likely the rgsuper scanner) doesn't need that level of consistency. 
+ */ + if (rtglock_flags == XFS_RTGLOCK_BITMAP_SHARED) { + xfs_rtgroup_lock(sr->rtg, rtglock_flags); + sr->rtlock_flags = rtglock_flags; + return 0; + } + + do { + if (xchk_should_terminate(sc, &error)) + return error; + + xfs_rtgroup_lock(sr->rtg, rtglock_flags); + + /* + * If we've grabbed a non-metadata file for scrubbing, we + * assume that holding its ILOCK will suffice to coordinate + * with any rt intent chains involving this inode. + */ + if (sc->ip && !xfs_is_internal_inode(sc->ip)) + break; + + /* + * Decide if the rt group is quiet enough for all metadata to + * be consistent with each other. Regular file IO doesn't get + * to lock all the rt inodes at the same time, which means that + * there could be other threads in the middle of processing a + * chain of deferred ops. + * + * We just locked all the metadata inodes for this rt group; + * now take a look to see if there are any intents in progress. + * If there are, drop the rt group inode locks and wait for the + * intents to drain. Since we hold the rt group inode locks + * for the duration of the scrub, this is the only time we have + * to sample the intents counter; any threads increasing it + * after this point can't possibly be in the middle of a chain + * of rt metadata updates. + * + * Obviously, this should be slanted against scrub and in favor + * of runtime threads. 
+ */ + if (!xfs_group_intent_busy(rtg_group(sr->rtg))) + break; + + xfs_rtgroup_unlock(sr->rtg, rtglock_flags); + + if (!(sc->flags & XCHK_FSGATES_DRAIN)) + return -ECHRNG; + error = xfs_group_intent_drain(rtg_group(sr->rtg)); + if (error) { + if (error == -ERESTARTSYS) + error = -EINTR; + return error; + } + } while (1); + sr->rtlock_flags = rtglock_flags; + return 0; } /* @@ -1379,7 +1445,7 @@ xchk_fsgates_enable( trace_xchk_fsgates_enable(sc, scrub_fsgates); if (scrub_fsgates & XCHK_FSGATES_DRAIN) - xfs_drain_wait_enable(); + xfs_defer_drain_wait_enable(); if (scrub_fsgates & XCHK_FSGATES_QUOTA) xfs_dqtrx_hook_enable(); diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 9ff3cafd867962..e734572a8dd6ec 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -141,12 +141,13 @@ xchk_rtgroup_init_existing( return error == -ENOENT ? -EFSCORRUPTED : error; } -void xchk_rtgroup_lock(struct xchk_rt *sr, unsigned int rtglock_flags); +int xchk_rtgroup_lock(struct xfs_scrub *sc, struct xchk_rt *sr, + unsigned int rtglock_flags); void xchk_rtgroup_free(struct xfs_scrub *sc, struct xchk_rt *sr); #else # define xchk_rtgroup_init(sc, rgno, sr) (-EFSCORRUPTED) # define xchk_rtgroup_init_existing(sc, rgno, sr) (-EFSCORRUPTED) -# define xchk_rtgroup_lock(sc, lockflags) do { } while (0) +# define xchk_rtgroup_lock(sc, sr, lockflags) (-EFSCORRUPTED) # define xchk_rtgroup_free(sc, sr) do { } while (0) #endif /* CONFIG_XFS_RT */ diff --git a/fs/xfs/scrub/rgsuper.c b/fs/xfs/scrub/rgsuper.c index 463b3573bb761b..e062c7d12565cd 100644 --- a/fs/xfs/scrub/rgsuper.c +++ b/fs/xfs/scrub/rgsuper.c @@ -61,7 +61,9 @@ xchk_rgsuperblock( if (!xchk_xref_process_error(sc, 0, 0, &error)) return error; - xchk_rtgroup_lock(&sc->sr, XFS_RTGLOCK_BITMAP_SHARED); + error = xchk_rtgroup_lock(sc, &sc->sr, XFS_RTGLOCK_BITMAP_SHARED); + if (error) + return error; /* * Since we already validated the rt superblock at mount time, we don't diff --git a/fs/xfs/scrub/rtbitmap.c 
b/fs/xfs/scrub/rtbitmap.c index fb4970c877abd3..819026ea2d741f 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -30,6 +30,9 @@ xchk_setup_rtbitmap( struct xchk_rtbitmap *rtb; int error; + if (xchk_need_intent_drain(sc)) + xchk_fsgates_enable(sc, XCHK_FSGATES_DRAIN); + rtb = kzalloc(sizeof(struct xchk_rtbitmap), XCHK_GFP_FLAGS); if (!rtb) return -ENOMEM; @@ -57,12 +60,15 @@ xchk_setup_rtbitmap( if (error) return error; + error = xchk_rtgroup_lock(sc, &sc->sr, XCHK_RTGLOCK_ALL); + if (error) + return error; + /* * Now that we've locked the rtbitmap, we can't race with growfsrt * trying to expand the bitmap or change the size of the rt volume. * Hence it is safe to compute and check the geometry values. */ - xchk_rtgroup_lock(&sc->sr, XFS_RTGLOCK_BITMAP); if (mp->m_sb.sb_rblocks) { rtb->rextents = xfs_blen_to_rtbxlen(mp, mp->m_sb.sb_rblocks); rtb->rextslog = xfs_compute_rextslog(rtb->rextents); diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c index f1af5431b38856..4ac679c1bd29cd 100644 --- a/fs/xfs/scrub/rtsummary.c +++ b/fs/xfs/scrub/rtsummary.c @@ -89,6 +89,10 @@ xchk_setup_rtsummary( if (error) return error; + error = xchk_rtgroup_lock(sc, &sc->sr, XFS_RTGLOCK_BITMAP); + if (error) + return error; + /* * Now that we've locked the rtbitmap and rtsummary, we can't race with * growfsrt trying to expand the summary or change the size of the rt @@ -99,7 +103,6 @@ xchk_setup_rtsummary( * exclusively here. If we ever start caring about running concurrent * fsmap with scrub this could be changed. 
*/ - xchk_rtgroup_lock(&sc->sr, XFS_RTGLOCK_BITMAP); if (mp->m_sb.sb_rblocks) { rts->rextents = xfs_blen_to_rtbxlen(mp, mp->m_sb.sb_rblocks); rts->rbmblocks = xfs_rtbitmap_blockcount(mp); diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 950f5a58dcd967..652d347cee9929 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -164,7 +164,7 @@ xchk_fsgates_disable( trace_xchk_fsgates_disable(sc, sc->flags & XCHK_FSGATES_ALL); if (sc->flags & XCHK_FSGATES_DRAIN) - xfs_drain_wait_disable(); + xfs_defer_drain_wait_disable(); if (sc->flags & XCHK_FSGATES_QUOTA) xfs_dqtrx_hook_disable(); diff --git a/fs/xfs/xfs_drain.c b/fs/xfs/xfs_drain.c index 5ede81fadbd8ca..fa5f31931efdb5 100644 --- a/fs/xfs/xfs_drain.c +++ b/fs/xfs/xfs_drain.c @@ -13,28 +13,28 @@ #include "xfs_trace.h" /* - * Use a static key here to reduce the overhead of xfs_drain_rele. If the - * compiler supports jump labels, the static branch will be replaced by a nop - * sled when there are no xfs_drain_wait callers. Online fsck is currently - * the only caller, so this is a reasonable tradeoff. + * Use a static key here to reduce the overhead of xfs_defer_drain_rele. If + * the compiler supports jump labels, the static branch will be replaced by a + * nop sled when there are no xfs_defer_drain_wait callers. Online fsck is + * currently the only caller, so this is a reasonable tradeoff. * * Note: Patching the kernel code requires taking the cpu hotplug lock. Other * parts of the kernel allocate memory with that lock held, which means that * XFS callers cannot hold any locks that might be used by memory reclaim or * writeback when calling the static_branch_{inc,dec} functions. 
*/ -static DEFINE_STATIC_KEY_FALSE(xfs_drain_waiter_gate); +static DEFINE_STATIC_KEY_FALSE(xfs_defer_drain_waiter_gate); void -xfs_drain_wait_disable(void) +xfs_defer_drain_wait_disable(void) { - static_branch_dec(&xfs_drain_waiter_gate); + static_branch_dec(&xfs_defer_drain_waiter_gate); } void -xfs_drain_wait_enable(void) +xfs_defer_drain_wait_enable(void) { - static_branch_inc(&xfs_drain_waiter_gate); + static_branch_inc(&xfs_defer_drain_waiter_gate); } void @@ -71,7 +71,7 @@ static inline bool has_waiters(struct wait_queue_head *wq_head) static inline void xfs_defer_drain_rele(struct xfs_defer_drain *dr) { if (atomic_dec_and_test(&dr->dr_count) && - static_branch_unlikely(&xfs_drain_waiter_gate) && + static_branch_unlikely(&xfs_defer_drain_waiter_gate) && has_waiters(&dr->dr_waiters)) wake_up(&dr->dr_waiters); } diff --git a/fs/xfs/xfs_drain.h b/fs/xfs/xfs_drain.h index efcf88df9a5e70..4d446dbf65e519 100644 --- a/fs/xfs/xfs_drain.h +++ b/fs/xfs/xfs_drain.h @@ -26,8 +26,8 @@ struct xfs_defer_drain { void xfs_defer_drain_init(struct xfs_defer_drain *dr); void xfs_defer_drain_free(struct xfs_defer_drain *dr); -void xfs_drain_wait_disable(void); -void xfs_drain_wait_enable(void); +void xfs_defer_drain_wait_disable(void); +void xfs_defer_drain_wait_enable(void); /* * Deferred Work Intent Drains @@ -61,6 +61,9 @@ void xfs_drain_wait_enable(void); * All functions that create work items must increment the intent counter as * soon as the item is added to the transaction and cannot drop the counter * until the item is finished or cancelled. + * + * The same principles apply to realtime groups because the rt metadata inode + * ILOCKs are not held across transaction rolls. */ struct xfs_group *xfs_group_intent_get(struct xfs_mount *mp, xfs_fsblock_t fsbno, enum xfs_group_type type); From patchwork Fri Dec 13 01:06:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906214 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0796A17BA1 for ; Fri, 13 Dec 2024 01:06:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051964; cv=none; b=T97IDFbGLhuBxy6L9MwASayCRV7qUSAX31xBmEj+bsaew6/o8cKIhXuYMZdYO4p0J7PeSCnjEwEkKauX3L/jBipA66rBig8j5B4z2LZPUwYaN/gk8InyOv/3XffaqeexSHgm+k+0/Evi0gbw6d3LJoGyABzPMDfS/0xLmXi6dmk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734051964; c=relaxed/simple; bh=WvbX/zre8tTHsRscoOst/TqChXT9zowUsTojYR40fyY=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=efi4loVdK9gV12pF3JpfM5XACWVGaivbREbUWuuiIlrChCkyBZjoXahpIa5JXbbJGzmlrzQ0ke1oIwsUHclekmvqirBNmHRckNaNk6k6Ft20S/N6lH6pquZ90pHKYL1L7Wx5uJJI28dylFwel3ZCy/3wQYJbhqA14PDC3DkhTs4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=grJtWOsN; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="grJtWOsN" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D36C7C4CECE; Fri, 13 Dec 2024 01:06:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734051963; bh=WvbX/zre8tTHsRscoOst/TqChXT9zowUsTojYR40fyY=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=grJtWOsNl9lPHcm9CrADLHZR2ZMLDhgYM+ml064cX2+Ex08x79iVzSYkEGIwFbFOJ wVCtxjnsmx1XAVUdaV30DHkunYknJueAzIVQeJ2+G9pcWGoh7KLuBiPntUPu4FcZq3 8MIgGECIVFSYzH2D0NkN0SCI5YrmKxKeaWE9u17W135oTb9eS1csk2ihtrQo79ESp5 
3XVrh2lyDBq+3DpS2C7Y3n/Kt34R7oSG3BHWwJ4+fAJhuctD5976ghO4nj7t83F0Bj Hb1U7m/rJjQr0Y/uuGQFAUk/oe8Wav61RsZmFCA9ZHT2iZfHNI7vgXdVGYXVpLLwo9 v+rs+SS7Cbp1A== Date: Thu, 12 Dec 2024 17:06:03 -0800 Subject: [PATCH 21/37] xfs: scrub the realtime rmapbt From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123675.1181370.3767884091836095098.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Check the realtime reverse mapping btree against the rtbitmap, and modify the rtbitmap scrub to check against the rtrmapbt. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_fs.h | 3 - fs/xfs/scrub/common.c | 86 ++++++++++++++++++++ fs/xfs/scrub/common.h | 10 ++ fs/xfs/scrub/health.c | 1 fs/xfs/scrub/inode.c | 6 - fs/xfs/scrub/inode_repair.c | 7 +- fs/xfs/scrub/repair.c | 1 fs/xfs/scrub/rtrmap.c | 184 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.c | 9 ++ fs/xfs/scrub/scrub.h | 5 + fs/xfs/scrub/stats.c | 1 fs/xfs/scrub/trace.h | 4 + 13 files changed, 307 insertions(+), 11 deletions(-) create mode 100644 fs/xfs/scrub/rtrmap.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index ff45efb2463f73..136a465e00d2b1 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -194,6 +194,7 @@ xfs-$(CONFIG_XFS_ONLINE_SCRUB_STATS) += scrub/stats.o xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \ rgsuper.o \ rtbitmap.o \ + rtrmap.o \ rtsummary.o \ ) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 7cca458ff81245..34fcbcd0bcd5e3 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -737,9 +737,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_TYPE_DIRTREE 28 /* directory tree structure */ 
#define XFS_SCRUB_TYPE_METAPATH 29 /* metadata directory tree paths */ #define XFS_SCRUB_TYPE_RGSUPER 30 /* realtime superblock */ +#define XFS_SCRUB_TYPE_RTRMAPBT 31 /* rtgroup reverse mapping btree */ /* Number of scrub subcommands. */ -#define XFS_SCRUB_TYPE_NR 31 +#define XFS_SCRUB_TYPE_NR 32 /* * This special type code only applies to the vectored scrub implementation. diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 613fb54e723ede..ca43dd4f52b2d6 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -35,6 +35,8 @@ #include "xfs_exchmaps.h" #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_bmap_util.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -791,9 +793,28 @@ xchk_rtgroup_lock( } while (1); sr->rtlock_flags = rtglock_flags; + + if (xfs_has_rtrmapbt(sc->mp) && (rtglock_flags & XFS_RTGLOCK_RMAP)) + sr->rmap_cur = xfs_rtrmapbt_init_cursor(sc->tp, sr->rtg); + return 0; } +/* + * Free all the btree cursors and other incore data relating to the realtime + * group. This has to be done /before/ committing (or cancelling) the scrub + * transaction. + */ +void +xchk_rtgroup_btcur_free( + struct xchk_rt *sr) +{ + if (sr->rmap_cur) + xfs_btree_del_cursor(sr->rmap_cur, XFS_BTREE_ERROR); + + sr->rmap_cur = NULL; +} + /* * Unlock the realtime group. This must be done /after/ committing (or * cancelling) the scrub transaction. @@ -878,6 +899,14 @@ xchk_setup_fs( return xchk_trans_alloc(sc, resblks); } +/* Set us up with a transaction and an empty context to repair rt metadata. */ +int +xchk_setup_rt( + struct xfs_scrub *sc) +{ + return xchk_trans_alloc(sc, 0); +} + /* Set us up with AG headers and btree cursors. 
*/ int xchk_setup_ag_btree( @@ -1639,3 +1668,60 @@ xchk_inode_rootdir_inum(const struct xfs_inode *ip) return mp->m_metadirip->i_ino; return mp->m_rootip->i_ino; } + +static int +xchk_meta_btree_count_blocks( + struct xfs_scrub *sc, + xfs_extnum_t *nextents, + xfs_filblks_t *count) +{ + struct xfs_btree_cur *cur; + int error; + + if (!sc->sr.rtg) { + ASSERT(0); + return -EFSCORRUPTED; + } + + switch (sc->ip->i_metatype) { + case XFS_METAFILE_RTRMAP: + cur = xfs_rtrmapbt_init_cursor(sc->tp, sc->sr.rtg); + break; + default: + ASSERT(0); + return -EFSCORRUPTED; + } + + error = xfs_btree_count_blocks(cur, count); + xfs_btree_del_cursor(cur, error); + if (!error) { + *nextents = 0; + (*count)--; /* don't count the btree iroot */ + } + return error; +} + +/* Count the blocks used by a file, even if it's a metadata inode. */ +int +xchk_inode_count_blocks( + struct xfs_scrub *sc, + int whichfork, + xfs_extnum_t *nextents, + xfs_filblks_t *count) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(sc->ip, whichfork); + + if (!ifp) { + *nextents = 0; + *count = 0; + return 0; + } + + if (ifp->if_format == XFS_DINODE_FMT_META_BTREE) { + ASSERT(whichfork == XFS_DATA_FORK); + return xchk_meta_btree_count_blocks(sc, nextents, count); + } + + return xfs_bmap_count_blocks(sc->tp, sc->ip, whichfork, nextents, + count); +} diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index e734572a8dd6ec..1576467f724431 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -63,6 +63,7 @@ static inline int xchk_setup_nothing(struct xfs_scrub *sc) /* Setup functions */ int xchk_setup_agheader(struct xfs_scrub *sc); int xchk_setup_fs(struct xfs_scrub *sc); +int xchk_setup_rt(struct xfs_scrub *sc); int xchk_setup_ag_allocbt(struct xfs_scrub *sc); int xchk_setup_ag_iallocbt(struct xfs_scrub *sc); int xchk_setup_ag_rmapbt(struct xfs_scrub *sc); @@ -80,10 +81,12 @@ int xchk_setup_metapath(struct xfs_scrub *sc); int xchk_setup_rtbitmap(struct xfs_scrub *sc); int xchk_setup_rtsummary(struct 
xfs_scrub *sc); int xchk_setup_rgsuperblock(struct xfs_scrub *sc); +int xchk_setup_rtrmapbt(struct xfs_scrub *sc); #else # define xchk_setup_rtbitmap xchk_setup_nothing # define xchk_setup_rtsummary xchk_setup_nothing # define xchk_setup_rgsuperblock xchk_setup_nothing +# define xchk_setup_rtrmapbt xchk_setup_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_ino_dqattach(struct xfs_scrub *sc); @@ -125,7 +128,8 @@ xchk_ag_init_existing( #ifdef CONFIG_XFS_RT /* All the locks we need to check an rtgroup. */ -#define XCHK_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP) +#define XCHK_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP | \ + XFS_RTGLOCK_RMAP) int xchk_rtgroup_init(struct xfs_scrub *sc, xfs_rgnumber_t rgno, struct xchk_rt *sr); @@ -143,11 +147,13 @@ xchk_rtgroup_init_existing( int xchk_rtgroup_lock(struct xfs_scrub *sc, struct xchk_rt *sr, unsigned int rtglock_flags); +void xchk_rtgroup_btcur_free(struct xchk_rt *sr); void xchk_rtgroup_free(struct xfs_scrub *sc, struct xchk_rt *sr); #else # define xchk_rtgroup_init(sc, rgno, sr) (-EFSCORRUPTED) # define xchk_rtgroup_init_existing(sc, rgno, sr) (-EFSCORRUPTED) # define xchk_rtgroup_lock(sc, sr, lockflags) (-EFSCORRUPTED) +# define xchk_rtgroup_btcur_free(sr) do { } while (0) # define xchk_rtgroup_free(sc, sr) do { } while (0) #endif /* CONFIG_XFS_RT */ @@ -275,6 +281,8 @@ void xchk_fsgates_enable(struct xfs_scrub *sc, unsigned int scrub_fshooks); int xchk_inode_is_allocated(struct xfs_scrub *sc, xfs_agino_t agino, bool *inuse); +int xchk_inode_count_blocks(struct xfs_scrub *sc, int whichfork, + xfs_extnum_t *nextents, xfs_filblks_t *count); bool xchk_inode_is_dirtree_root(const struct xfs_inode *ip); bool xchk_inode_is_sb_rooted(const struct xfs_inode *ip); diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c index ccc6ca5934ca6a..bcc4244e3b55db 100644 --- a/fs/xfs/scrub/health.c +++ b/fs/xfs/scrub/health.c @@ -114,6 +114,7 @@ static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = { [XFS_SCRUB_TYPE_DIRTREE] = { 
XHG_INO, XFS_SICK_INO_DIRTREE }, [XFS_SCRUB_TYPE_METAPATH] = { XHG_FS, XFS_SICK_FS_METAPATH }, [XFS_SCRUB_TYPE_RGSUPER] = { XHG_RTGROUP, XFS_SICK_RG_SUPER }, + [XFS_SCRUB_TYPE_RTRMAPBT] = { XHG_RTGROUP, XFS_SICK_RG_RMAPBT }, }; /* Return the health status mask for this scrub type. */ diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index 2e911f38deaebe..8e702121dc8699 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -690,15 +690,13 @@ xchk_inode_xref_bmap( return; /* Walk all the extents to check nextents/naextents/nblocks. */ - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_DATA_FORK, - &nextents, &count); + error = xchk_inode_count_blocks(sc, XFS_DATA_FORK, &nextents, &count); if (!xchk_should_check_xref(sc, &error, NULL)) return; if (nextents < xfs_dfork_data_extents(dip)) xchk_ino_xref_set_corrupt(sc, sc->ip->i_ino); - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK, - &nextents, &acount); + error = xchk_inode_count_blocks(sc, XFS_ATTR_FORK, &nextents, &acount); if (!xchk_should_check_xref(sc, &error, NULL)) return; if (nextents != xfs_dfork_attr_extents(dip)) diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 7faa27472b9129..a94f9df0ca78f6 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -1536,8 +1536,7 @@ xrep_inode_blockcounts( trace_xrep_inode_blockcounts(sc); /* Set data fork counters from the data fork mappings. */ - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_DATA_FORK, - &nextents, &count); + error = xchk_inode_count_blocks(sc, XFS_DATA_FORK, &nextents, &count); if (error) return error; if (xfs_is_reflink_inode(sc->ip)) { @@ -1561,8 +1560,8 @@ xrep_inode_blockcounts( /* Set attr fork counters from the attr fork mappings. 
*/ ifp = xfs_ifork_ptr(sc->ip, XFS_ATTR_FORK); if (ifp) { - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK, - &nextents, &acount); + error = xchk_inode_count_blocks(sc, XFS_ATTR_FORK, &nextents, + &acount); if (error) return error; if (count >= sc->mp->m_sb.sb_dblocks) diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 91c8bc055a4fd7..e788e3032f8e33 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -62,6 +62,7 @@ xrep_attempt( trace_xrep_attempt(XFS_I(file_inode(sc->file)), sc->sm, error); xchk_ag_btcur_free(&sc->sa); + xchk_rtgroup_btcur_free(&sc->sr); /* Repair whatever's broken. */ ASSERT(sc->ops->repair); diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c new file mode 100644 index 00000000000000..7b5f932bcd947f --- /dev/null +++ b/fs/xfs/scrub/rtrmap.c @@ -0,0 +1,184 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2018-2024 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_inode.h" +#include "xfs_rtalloc.h" +#include "xfs_rtgroup.h" +#include "xfs_metafile.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" + +/* Set us up with the realtime metadata locked. 
*/ +int +xchk_setup_rtrmapbt( + struct xfs_scrub *sc) +{ + int error; + + if (xchk_need_intent_drain(sc)) + xchk_fsgates_enable(sc, XCHK_FSGATES_DRAIN); + + error = xchk_rtgroup_init(sc, sc->sm->sm_agno, &sc->sr); + if (error) + return error; + + error = xchk_setup_rt(sc); + if (error) + return error; + + error = xchk_install_live_inode(sc, rtg_rmap(sc->sr.rtg)); + if (error) + return error; + + return xchk_rtgroup_lock(sc, &sc->sr, XCHK_RTGLOCK_ALL); +} + +/* Realtime reverse mapping. */ + +struct xchk_rtrmap { + /* + * The furthest-reaching of the rmapbt records that we've already + * processed. This enables us to detect overlapping records for space + * allocations that cannot be shared. + */ + struct xfs_rmap_irec overlap_rec; + + /* + * The previous rmapbt record, so that we can check for two records + * that could be one. + */ + struct xfs_rmap_irec prev_rec; +}; + +/* Flag failures for records that overlap but cannot. */ +STATIC void +xchk_rtrmapbt_check_overlapping( + struct xchk_btree *bs, + struct xchk_rtrmap *cr, + const struct xfs_rmap_irec *irec) +{ + xfs_rtblock_t pnext, inext; + + if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + /* No previous record? */ + if (cr->overlap_rec.rm_blockcount == 0) + goto set_prev; + + /* Do overlap_rec and irec overlap? */ + pnext = cr->overlap_rec.rm_startblock + cr->overlap_rec.rm_blockcount; + if (pnext <= irec->rm_startblock) + goto set_prev; + + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + + /* Save whichever rmap record extends furthest. */ + inext = irec->rm_startblock + irec->rm_blockcount; + if (pnext > inext) + return; + +set_prev: + memcpy(&cr->overlap_rec, irec, sizeof(struct xfs_rmap_irec)); +} + +/* Decide if two reverse-mapping records can be merged. */ +static inline bool +xchk_rtrmap_mergeable( + struct xchk_rtrmap *cr, + const struct xfs_rmap_irec *r2) +{ + const struct xfs_rmap_irec *r1 = &cr->prev_rec; + + /* Ignore if prev_rec is not yet initialized. 
*/ + if (cr->prev_rec.rm_blockcount == 0) + return false; + + if (r1->rm_owner != r2->rm_owner) + return false; + if (r1->rm_startblock + r1->rm_blockcount != r2->rm_startblock) + return false; + if ((unsigned long long)r1->rm_blockcount + r2->rm_blockcount > + XFS_RMAP_LEN_MAX) + return false; + if (r1->rm_flags != r2->rm_flags) + return false; + return r1->rm_offset + r1->rm_blockcount == r2->rm_offset; +} + +/* Flag failures for records that could be merged. */ +STATIC void +xchk_rtrmapbt_check_mergeable( + struct xchk_btree *bs, + struct xchk_rtrmap *cr, + const struct xfs_rmap_irec *irec) +{ + if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + if (xchk_rtrmap_mergeable(cr, irec)) + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + + memcpy(&cr->prev_rec, irec, sizeof(struct xfs_rmap_irec)); +} + +/* Scrub a realtime rmapbt record. */ +STATIC int +xchk_rtrmapbt_rec( + struct xchk_btree *bs, + const union xfs_btree_rec *rec) +{ + struct xchk_rtrmap *cr = bs->private; + struct xfs_rmap_irec irec; + + if (xfs_rmap_btrec_to_irec(rec, &irec) != NULL || + xfs_rtrmap_check_irec(to_rtg(bs->cur->bc_group), &irec) != NULL) { + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + return 0; + } + + if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return 0; + + xchk_rtrmapbt_check_mergeable(bs, cr, &irec); + xchk_rtrmapbt_check_overlapping(bs, cr, &irec); + return 0; +} + +/* Scrub the realtime rmap btree. 
*/ +int +xchk_rtrmapbt( + struct xfs_scrub *sc) +{ + struct xfs_inode *ip = rtg_rmap(sc->sr.rtg); + struct xfs_owner_info oinfo; + struct xchk_rtrmap cr = { }; + int error; + + error = xchk_metadata_inode_forks(sc); + if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) + return error; + + xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, XFS_DATA_FORK); + return xchk_btree(sc, sc->sr.rmap_cur, xchk_rtrmapbt_rec, &oinfo, &cr); +} diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 652d347cee9929..09983899c34164 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -218,6 +218,8 @@ xchk_teardown( int error) { xchk_ag_free(sc, &sc->sa); + xchk_rtgroup_btcur_free(&sc->sr); + if (sc->tp) { if (error == 0 && (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)) error = xfs_trans_commit(sc->tp); @@ -458,6 +460,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .has = xfs_has_rtsb, .repair = xrep_rgsuperblock, }, + [XFS_SCRUB_TYPE_RTRMAPBT] = { /* realtime group rmapbt */ + .type = ST_RTGROUP, + .setup = xchk_setup_rtrmapbt, + .scrub = xchk_rtrmapbt, + .has = xfs_has_rtrmapbt, + .repair = xrep_notsupported, + }, }; static int diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 5dbbe93cb49bfa..0ad5122af486e1 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -126,6 +126,9 @@ struct xchk_rt { /* XFS_RTGLOCK_* lock state if locked */ unsigned int rtlock_flags; + + /* rtgroup btrees */ + struct xfs_btree_cur *rmap_cur; }; struct xfs_scrub { @@ -280,10 +283,12 @@ int xchk_metapath(struct xfs_scrub *sc); int xchk_rtbitmap(struct xfs_scrub *sc); int xchk_rtsummary(struct xfs_scrub *sc); int xchk_rgsuperblock(struct xfs_scrub *sc); +int xchk_rtrmapbt(struct xfs_scrub *sc); #else # define xchk_rtbitmap xchk_nothing # define xchk_rtsummary xchk_nothing # define xchk_rgsuperblock xchk_nothing +# define xchk_rtrmapbt xchk_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_quota(struct xfs_scrub *sc); diff --git a/fs/xfs/scrub/stats.c 
b/fs/xfs/scrub/stats.c index a476c7b2ab7597..eb6bb170c902b3 100644 --- a/fs/xfs/scrub/stats.c +++ b/fs/xfs/scrub/stats.c @@ -82,6 +82,7 @@ static const char *name_map[XFS_SCRUB_TYPE_NR] = { [XFS_SCRUB_TYPE_DIRTREE] = "dirtree", [XFS_SCRUB_TYPE_METAPATH] = "metapath", [XFS_SCRUB_TYPE_RGSUPER] = "rgsuper", + [XFS_SCRUB_TYPE_RTRMAPBT] = "rtrmapbt", }; /* Format the scrub stats into a text buffer, similar to pcp style. */ diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index d2ae7e93acb08e..5afc440f22f56c 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -72,6 +72,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_DIRTREE); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_BARRIER); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_METAPATH); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGSUPER); +TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RTRMAPBT); #define XFS_SCRUB_TYPE_STRINGS \ { XFS_SCRUB_TYPE_PROBE, "probe" }, \ @@ -105,7 +106,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGSUPER); { XFS_SCRUB_TYPE_DIRTREE, "dirtree" }, \ { XFS_SCRUB_TYPE_BARRIER, "barrier" }, \ { XFS_SCRUB_TYPE_METAPATH, "metapath" }, \ - { XFS_SCRUB_TYPE_RGSUPER, "rgsuper" } + { XFS_SCRUB_TYPE_RGSUPER, "rgsuper" }, \ + { XFS_SCRUB_TYPE_RTRMAPBT, "rtrmapbt" } #define XFS_SCRUB_FLAG_STRINGS \ { XFS_SCRUB_IFLAG_REPAIR, "repair" }, \ From patchwork Fri Dec 13 01:06:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906215
Date: Thu, 12 Dec 2024 17:06:19 -0800 Subject: [PATCH 22/37] xfs: cross-reference realtime bitmap to realtime rmapbt scrubber From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123693.1181370.6852059995633084822.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong When we're checking the realtime rmap btree entries, cross-reference those entries with the realtime bitmap too. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/scrub/rtrmap.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index 7b5f932bcd947f..515c2a9b02cdae 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -142,6 +142,20 @@ xchk_rtrmapbt_check_mergeable( memcpy(&cr->prev_rec, irec, sizeof(struct xfs_rmap_irec)); } +/* Cross-reference with other metadata. */ +STATIC void +xchk_rtrmapbt_xref( + struct xfs_scrub *sc, + struct xfs_rmap_irec *irec) +{ + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + xchk_xref_is_used_rt_space(sc, + xfs_rgbno_to_rtb(sc->sr.rtg, irec->rm_startblock), + irec->rm_blockcount); +} + /* Scrub a realtime rmapbt record. */ STATIC int xchk_rtrmapbt_rec( @@ -162,6 +176,7 @@ xchk_rtrmapbt_rec( xchk_rtrmapbt_check_mergeable(bs, cr, &irec); xchk_rtrmapbt_check_overlapping(bs, cr, &irec); + xchk_rtrmapbt_xref(bs->sc, &irec); return 0; } From patchwork Fri Dec 13 01:06:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J.
Wong" X-Patchwork-Id: 13906216
Date: Thu, 12 Dec 2024 17:06:34 -0800 Subject: [PATCH 23/37] xfs: cross-reference the realtime rmapbt From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123710.1181370.7451351485334242968.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Teach the data fork and realtime bitmap scrubbers to cross-reference information with the realtime rmap btree. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/scrub/bmap.c | 52 +++++++++++++++++++++++++++----------- fs/xfs/scrub/rgsuper.c | 2 + fs/xfs/scrub/rtbitmap.c | 55 +++++++++++++++++++++++++++++++++++++--- fs/xfs/scrub/rtbitmap.h | 5 ++++ fs/xfs/scrub/rtrmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.h | 9 +++++++ 6 files changed, 169 insertions(+), 19 deletions(-) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index dd99366643f832..b7f9f3b3d81a3a 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -143,15 +143,22 @@ static inline bool xchk_bmap_get_rmap( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec, - xfs_agblock_t agbno, + xfs_agblock_t bno, uint64_t owner, struct xfs_rmap_irec *rmap) { + struct xfs_btree_cur **curp = &info->sc->sa.rmap_cur; xfs_fileoff_t offset; unsigned int rflags = 0; int has_rmap; int error; + if (xfs_ifork_is_realtime(info->sc->ip, info->whichfork)) + curp = &info->sc->sr.rmap_cur; + + if (*curp == NULL) + return false; + if (info->whichfork == XFS_ATTR_FORK) rflags |= XFS_RMAP_ATTR_FORK; if (irec->br_state == XFS_EXT_UNWRITTEN) @@ -172,13 +179,13
@@ xchk_bmap_get_rmap( * range rmap lookup to make sure we get the correct owner/offset. */ if (info->is_shared) { - error = xfs_rmap_lookup_le_range(info->sc->sa.rmap_cur, agbno, - owner, offset, rflags, rmap, &has_rmap); + error = xfs_rmap_lookup_le_range(*curp, bno, owner, offset, + rflags, rmap, &has_rmap); } else { - error = xfs_rmap_lookup_le(info->sc->sa.rmap_cur, agbno, - owner, offset, rflags, rmap, &has_rmap); + error = xfs_rmap_lookup_le(*curp, bno, owner, offset, + rflags, rmap, &has_rmap); } - if (!xchk_should_check_xref(info->sc, &error, &info->sc->sa.rmap_cur)) + if (!xchk_should_check_xref(info->sc, &error, curp)) return false; if (!has_rmap) @@ -192,29 +199,29 @@ STATIC void xchk_bmap_xref_rmap( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec, - xfs_agblock_t agbno) + xfs_agblock_t bno) { struct xfs_rmap_irec rmap; unsigned long long rmap_end; uint64_t owner = info->sc->ip->i_ino; - if (!info->sc->sa.rmap_cur || xchk_skip_xref(info->sc->sm)) + if (xchk_skip_xref(info->sc->sm)) return; /* Find the rmap record for this irec. */ - if (!xchk_bmap_get_rmap(info, irec, agbno, owner, &rmap)) + if (!xchk_bmap_get_rmap(info, irec, bno, owner, &rmap)) return; /* * The rmap must be an exact match for this incore file mapping record, * which may have arisen from multiple ondisk records. 
*/ - if (rmap.rm_startblock != agbno) + if (rmap.rm_startblock != bno) xchk_fblock_xref_set_corrupt(info->sc, info->whichfork, irec->br_startoff); rmap_end = (unsigned long long)rmap.rm_startblock + rmap.rm_blockcount; - if (rmap_end != agbno + irec->br_blockcount) + if (rmap_end != bno + irec->br_blockcount) xchk_fblock_xref_set_corrupt(info->sc, info->whichfork, irec->br_startoff); @@ -259,7 +266,7 @@ STATIC void xchk_bmap_xref_rmap_cow( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec, - xfs_agblock_t agbno) + xfs_agblock_t bno) { struct xfs_rmap_irec rmap; unsigned long long rmap_end; @@ -269,7 +276,7 @@ xchk_bmap_xref_rmap_cow( return; /* Find the rmap record for this irec. */ - if (!xchk_bmap_get_rmap(info, irec, agbno, owner, &rmap)) + if (!xchk_bmap_get_rmap(info, irec, bno, owner, &rmap)) return; /* @@ -277,12 +284,12 @@ xchk_bmap_xref_rmap_cow( * can start before and end after the physical space allocated to this * mapping. There are no offsets to check. */ - if (rmap.rm_startblock > agbno) + if (rmap.rm_startblock > bno) xchk_fblock_xref_set_corrupt(info->sc, info->whichfork, irec->br_startoff); rmap_end = (unsigned long long)rmap.rm_startblock + rmap.rm_blockcount; - if (rmap_end < agbno + irec->br_blockcount) + if (rmap_end < bno + irec->br_blockcount) xchk_fblock_xref_set_corrupt(info->sc, info->whichfork, irec->br_startoff); @@ -315,6 +322,8 @@ xchk_bmap_rt_iextent_xref( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec) { + struct xfs_owner_info oinfo; + xfs_rgblock_t rgbno; int error; error = xchk_rtgroup_init_existing(info->sc, @@ -332,6 +341,19 @@ xchk_bmap_rt_iextent_xref( xchk_xref_is_used_rt_space(info->sc, irec->br_startblock, irec->br_blockcount); + if (!xfs_has_rtrmapbt(info->sc->mp)) + goto out_cur; + + rgbno = xfs_rtb_to_rgbno(info->sc->mp, irec->br_startblock); + xchk_bmap_xref_rmap(info, irec, rgbno); + + xfs_rmap_ino_owner(&oinfo, info->sc->ip->i_ino, info->whichfork, + irec->br_startoff); + 
xchk_xref_is_only_rt_owned_by(info->sc, rgbno, + irec->br_blockcount, &oinfo); + +out_cur: + xchk_rtgroup_btcur_free(&info->sc->sr); out_free: xchk_rtgroup_free(info->sc, &info->sc->sr); } diff --git a/fs/xfs/scrub/rgsuper.c b/fs/xfs/scrub/rgsuper.c index e062c7d12565cd..d189732d0e24fb 100644 --- a/fs/xfs/scrub/rgsuper.c +++ b/fs/xfs/scrub/rgsuper.c @@ -13,6 +13,7 @@ #include "xfs_log_format.h" #include "xfs_trans.h" #include "xfs_sb.h" +#include "xfs_rmap.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/repair.h" @@ -34,6 +35,7 @@ xchk_rgsuperblock_xref( return; xchk_xref_is_used_rt_space(sc, xfs_rgbno_to_rtb(sc->sr.rtg, 0), 1); + xchk_xref_is_only_rt_owned_by(sc, 0, 1, &XFS_RMAP_OINFO_FS); } int diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 819026ea2d741f..675f4fdd1e675f 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -9,17 +9,22 @@ #include "xfs_format.h" #include "xfs_trans_resv.h" #include "xfs_mount.h" +#include "xfs_btree.h" #include "xfs_log_format.h" #include "xfs_trans.h" #include "xfs_rtbitmap.h" #include "xfs_inode.h" #include "xfs_bmap.h" #include "xfs_bit.h" +#include "xfs_rtgroup.h" #include "xfs_sb.h" +#include "xfs_rmap.h" +#include "xfs_rtrmap_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/repair.h" #include "scrub/rtbitmap.h" +#include "scrub/btree.h" /* Set us up with the realtime metadata locked. */ int @@ -37,6 +42,7 @@ xchk_setup_rtbitmap( if (!rtb) return -ENOMEM; sc->buf = rtb; + rtb->sc = sc; error = xchk_rtgroup_init(sc, sc->sm->sm_agno, &sc->sr); if (error) @@ -78,7 +84,30 @@ xchk_setup_rtbitmap( return 0; } -/* Realtime bitmap. */ +/* Per-rtgroup bitmap contents. */ + +/* Cross-reference rtbitmap entries with other metadata. 
*/ +STATIC void +xchk_rtbitmap_xref( + struct xchk_rtbitmap *rtb, + xfs_rtblock_t startblock, + xfs_rtblock_t blockcount) +{ + struct xfs_scrub *sc = rtb->sc; + xfs_rgblock_t rgbno = xfs_rtb_to_rgbno(sc->mp, startblock); + + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + if (!sc->sr.rmap_cur) + return; + + xchk_xref_has_no_rt_owner(sc, rgbno, blockcount); + + if (rtb->next_free_rgbno < rgbno) + xchk_xref_has_rt_owner(sc, rtb->next_free_rgbno, + rgbno - rtb->next_free_rgbno); + rtb->next_free_rgbno = rgbno + blockcount; +} /* Scrub a free extent record from the realtime bitmap. */ STATIC int @@ -88,7 +117,8 @@ xchk_rtbitmap_rec( const struct xfs_rtalloc_rec *rec, void *priv) { - struct xfs_scrub *sc = priv; + struct xchk_rtbitmap *rtb = priv; + struct xfs_scrub *sc = rtb->sc; xfs_rtblock_t startblock; xfs_filblks_t blockcount; @@ -97,6 +127,12 @@ xchk_rtbitmap_rec( if (!xfs_verify_rtbext(rtg_mount(rtg), startblock, blockcount)) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0); + + xchk_rtbitmap_xref(rtb, startblock, blockcount); + + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return -ECANCELED; + return 0; } @@ -144,7 +180,7 @@ xchk_rtbitmap_check_extents( return error; } -/* Scrub the realtime bitmap. */ +/* Scrub this group's realtime bitmap. */ int xchk_rtbitmap( struct xfs_scrub *sc) @@ -153,6 +189,7 @@ xchk_rtbitmap( struct xfs_rtgroup *rtg = sc->sr.rtg; struct xfs_inode *rbmip = rtg_bitmap(rtg); struct xchk_rtbitmap *rtb = sc->buf; + xfs_rgblock_t last_rgbno; int error; /* Is sb_rextents correct? 
*/ @@ -205,10 +242,20 @@ xchk_rtbitmap( if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) return error; - error = xfs_rtalloc_query_all(rtg, sc->tp, xchk_rtbitmap_rec, sc); + rtb->next_free_rgbno = 0; + error = xfs_rtalloc_query_all(rtg, sc->tp, xchk_rtbitmap_rec, rtb); if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error)) return error; + /* + * Check that there are rmappings for all rt extents between the end of + * the last free extent we saw and the last possible extent in the rt + * group. + */ + last_rgbno = rtg->rtg_extents * mp->m_sb.sb_rextsize - 1; + if (rtb->next_free_rgbno < last_rgbno) + xchk_xref_has_rt_owner(sc, rtb->next_free_rgbno, + last_rgbno - rtb->next_free_rgbno); return 0; } diff --git a/fs/xfs/scrub/rtbitmap.h b/fs/xfs/scrub/rtbitmap.h index 85304ff019e1dc..dd5b394d9697d2 100644 --- a/fs/xfs/scrub/rtbitmap.h +++ b/fs/xfs/scrub/rtbitmap.h @@ -7,10 +7,15 @@ #define __XFS_SCRUB_RTBITMAP_H__ struct xchk_rtbitmap { + struct xfs_scrub *sc; + uint64_t rextents; uint64_t rbmblocks; unsigned int rextslog; unsigned int resblks; + + /* The next free rt group block number that we expect to see. 
*/ + xfs_rgblock_t next_free_rgbno; }; #ifdef CONFIG_XFS_ONLINE_REPAIR diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index 515c2a9b02cdae..764fa296792234 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -197,3 +197,68 @@ xchk_rtrmapbt( xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, XFS_DATA_FORK); return xchk_btree(sc, sc->sr.rmap_cur, xchk_rtrmapbt_rec, &oinfo, &cr); } + +/* xref check that the extent has no realtime reverse mapping at all */ +void +xchk_xref_has_no_rt_owner( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + enum xbtree_recpacking outcome; + int error; + + if (!sc->sr.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_rmap_has_records(sc->sr.rmap_cur, bno, len, &outcome); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + return; + if (outcome != XBTREE_RECPACKING_EMPTY) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); +} + +/* xref check that the extent is completely mapped */ +void +xchk_xref_has_rt_owner( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + enum xbtree_recpacking outcome; + int error; + + if (!sc->sr.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_rmap_has_records(sc->sr.rmap_cur, bno, len, &outcome); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + return; + if (outcome != XBTREE_RECPACKING_FULL) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); +} + +/* xref check that the extent is only owned by a given owner */ +void +xchk_xref_is_only_rt_owned_by( + struct xfs_scrub *sc, + xfs_agblock_t bno, + xfs_extlen_t len, + const struct xfs_owner_info *oinfo) +{ + struct xfs_rmap_matches res; + int error; + + if (!sc->sr.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_rmap_count_owners(sc->sr.rmap_cur, bno, len, oinfo, &res); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + return; + if (res.matches != 1) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); + if 
(res.bad_non_owner_matches) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); + if (res.non_owner_matches) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); +} diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 0ad5122af486e1..cba4e89a3a627b 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -322,8 +322,17 @@ void xchk_xref_is_not_cow_staging(struct xfs_scrub *sc, xfs_agblock_t bno, #ifdef CONFIG_XFS_RT void xchk_xref_is_used_rt_space(struct xfs_scrub *sc, xfs_rtblock_t rtbno, xfs_extlen_t len); +void xchk_xref_has_no_rt_owner(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len); +void xchk_xref_has_rt_owner(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len); +void xchk_xref_is_only_rt_owned_by(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len, const struct xfs_owner_info *oinfo); #else # define xchk_xref_is_used_rt_space(sc, rtbno, len) do { } while (0) +# define xchk_xref_has_no_rt_owner(sc, rtbno, len) do { } while (0) +# define xchk_xref_has_rt_owner(sc, rtbno, len) do { } while (0) +# define xchk_xref_is_only_rt_owned_by(sc, bno, len, oinfo) do { } while (0) #endif #endif /* __XFS_SCRUB_SCRUB_H__ */ From patchwork Fri Dec 13 01:06:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906217
Date: Thu, 12 Dec 2024 17:06:50 -0800 Subject: [PATCH 24/37] xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123727.1181370.595086889127827740.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Teach the bmbt scrubber how to perform a comprehensive check that the rmapbt does not contain /any/ mappings that are not described by bmbt records when it's dealing with a realtime file. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/scrub/bmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index b7f9f3b3d81a3a..f6077b0cba8a14 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -21,6 +21,8 @@ #include "xfs_rmap_btree.h" #include "xfs_rtgroup.h" #include "xfs_health.h" +#include "xfs_rtalloc.h" +#include "xfs_rtrmap_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/btree.h" @@ -641,8 +643,7 @@ xchk_bmap_check_rmap( xchk_fblock_set_corrupt(sc, sbcri->whichfork, check_rec.rm_offset); if (irec.br_startblock != - xfs_agbno_to_fsb(to_perag(cur->bc_group), - check_rec.rm_startblock)) + xfs_gbno_to_fsb(cur->bc_group, check_rec.rm_startblock)) xchk_fblock_set_corrupt(sc, sbcri->whichfork, check_rec.rm_offset); if (irec.br_blockcount > check_rec.rm_blockcount) @@ -696,6 +697,30 @@ xchk_bmap_check_ag_rmaps( return error; } +/* Make sure each rt rmap has a corresponding 
bmbt entry. */ +STATIC int +xchk_bmap_check_rt_rmaps( + struct xfs_scrub *sc, + struct xfs_rtgroup *rtg) +{ + struct xchk_bmap_check_rmap_info sbcri; + struct xfs_btree_cur *cur; + int error; + + xfs_rtgroup_lock(rtg, XFS_RTGLOCK_RMAP); + cur = xfs_rtrmapbt_init_cursor(sc->tp, rtg); + + sbcri.sc = sc; + sbcri.whichfork = XFS_DATA_FORK; + error = xfs_rmap_query_all(cur, xchk_bmap_check_rmap, &sbcri); + if (error == -ECANCELED) + error = 0; + + xfs_btree_del_cursor(cur, error); + xfs_rtgroup_unlock(rtg, XFS_RTGLOCK_RMAP); + return error; +} + /* * Decide if we want to scan the reverse mappings to determine if the attr * fork /really/ has zero space mappings. @@ -750,10 +775,6 @@ xchk_bmap_check_empty_datafork( { struct xfs_ifork *ifp = &ip->i_df; - /* Don't support realtime rmap checks yet. */ - if (XFS_IS_REALTIME_INODE(ip)) - return false; - /* * If the dinode repair found a bad data fork, it will reset the fork * to extents format with zero records and wait for the this scrubber @@ -804,6 +825,21 @@ xchk_bmap_check_rmaps( struct xfs_perag *pag = NULL; int error; + if (xfs_ifork_is_realtime(sc->ip, whichfork)) { + struct xfs_rtgroup *rtg = NULL; + + while ((rtg = xfs_rtgroup_next(sc->mp, rtg))) { + error = xchk_bmap_check_rt_rmaps(sc, rtg); + if (error || + (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) { + xfs_rtgroup_rele(rtg); + return error; + } + } + + return 0; + } + while ((pag = xfs_perag_next(sc->mp, pag))) { error = xchk_bmap_check_ag_rmaps(sc, whichfork, pag); if (error || From patchwork Fri Dec 13 01:07:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906218
Date: Thu, 12 Dec 2024 17:07:05 -0800 Subject: [PATCH 25/37] xfs: scrub the metadir path of rt rmap btree files From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123744.1181370.14108227241545412092.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata directory tree path to the rmap btree file for each rt group. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_fs.h | 3 ++- fs/xfs/scrub/metapath.c | 3 +++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 34fcbcd0bcd5e3..d42d3a5617e314 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -830,9 +830,10 @@ struct xfs_scrub_vec_head { #define XFS_SCRUB_METAPATH_USRQUOTA (5) /* user quota */ #define XFS_SCRUB_METAPATH_GRPQUOTA (6) /* group quota */ #define XFS_SCRUB_METAPATH_PRJQUOTA (7) /* project quota */ +#define XFS_SCRUB_METAPATH_RTRMAPBT (8) /* realtime reverse mapping */ /* Number of metapath sm_ino values */ -#define XFS_SCRUB_METAPATH_NR (8) +#define XFS_SCRUB_METAPATH_NR (9) /* * ioctl limits diff --git a/fs/xfs/scrub/metapath.c b/fs/xfs/scrub/metapath.c index c678cba1ffc3f7..74d71373e7edf1 100644 --- a/fs/xfs/scrub/metapath.c +++ b/fs/xfs/scrub/metapath.c @@ -21,6 +21,7 @@ #include "xfs_trans_space.h" #include "xfs_attr.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ 
-246,6 +247,8 @@ xchk_setup_metapath( return xchk_setup_metapath_dqinode(sc, XFS_DQTYPE_GROUP); case XFS_SCRUB_METAPATH_PRJQUOTA: return xchk_setup_metapath_dqinode(sc, XFS_DQTYPE_PROJ); + case XFS_SCRUB_METAPATH_RTRMAPBT: + return xchk_setup_metapath_rtginode(sc, XFS_RTGI_RMAP); default: return -ENOENT; } From patchwork Fri Dec 13 01:07:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13906219
Date: Thu, 12 Dec 2024 17:07:21 -0800 Subject: [PATCH 26/37] xfs: walk the rt reverse mapping tree when rebuilding rmap From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123761.1181370.4118391090133986822.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong When we're rebuilding the data device rmap, if we encounter an "rmap" format fork, we have to walk the (realtime) rmap btree inode to build the appropriate mappings. Signed-off-by: "Darrick J. 
Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/scrub/rmap_repair.c | 53 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 52 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/rmap_repair.c b/fs/xfs/scrub/rmap_repair.c index 2a0b9e3d0fbaee..91c17feb49768b 100644 --- a/fs/xfs/scrub/rmap_repair.c +++ b/fs/xfs/scrub/rmap_repair.c @@ -31,6 +31,8 @@ #include "xfs_refcount.h" #include "xfs_refcount_btree.h" #include "xfs_ag.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_rtgroup.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -504,7 +506,56 @@ xrep_rmap_scan_meta_btree( struct xrep_rmap_ifork *rf, struct xfs_inode *ip) { - return -EFSCORRUPTED; /* XXX placeholder */ + struct xfs_scrub *sc = rf->rr->sc; + struct xfs_rtgroup *rtg = NULL; + struct xfs_btree_cur *cur = NULL; + enum xfs_rtg_inodes type; + int error; + + if (rf->whichfork != XFS_DATA_FORK) + return -EFSCORRUPTED; + + switch (ip->i_metatype) { + case XFS_METAFILE_RTRMAP: + type = XFS_RTGI_RMAP; + break; + default: + ASSERT(0); + return -EFSCORRUPTED; + } + + while ((rtg = xfs_rtgroup_next(sc->mp, rtg))) { + if (ip == rtg->rtg_inodes[type]) + goto found; + } + + /* + * We should never find an rt metadata btree inode that isn't + * associated with an rtgroup yet has ondisk blocks allocated to it. + */ + if (ip->i_nblocks) { + ASSERT(0); + return -EFSCORRUPTED; + } + + return 0; + +found: + switch (ip->i_metatype) { + case XFS_METAFILE_RTRMAP: + cur = xfs_rtrmapbt_init_cursor(sc->tp, rtg); + break; + default: + ASSERT(0); + error = -EFSCORRUPTED; + goto out_rtg; + } + + error = xrep_rmap_scan_iroot_btree(rf, cur); + xfs_btree_del_cursor(cur, error); +out_rtg: + xfs_rtgroup_rele(rtg); + return error; } /* Find all the extents from a given AG in an inode fork. */ From patchwork Fri Dec 13 01:07:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906220 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 24B04629 for ; Fri, 13 Dec 2024 01:07:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052059; cv=none; b=TACkl7JEFrhYWqMbO5LcUM1JSR0jAanRKC/lWV9P4up/BOHNYrT+bZIpJeK15ynjVFq1Ta6fNTKEh8muSjrztTj/Faifw3c9+u4URV/WDD9ETj/YGHWSZ1CYDdZgK8os5uhFErrlxWxECi0R/PYzmWfIkOLmrZEjgxat7RZ8GD4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052059; c=relaxed/simple; bh=UtHcsaXKLCGmZm3n79ZVhqHOzJfs3mtJeKYih20f4jw=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=sRdsGfCRdelTIrE5c1Brn/dgKfXLDxxkn5quYAW0/PHiNVPlPnweG73qUedMKuTsRhZvUZIH9l9ZNMWOoEgKjNGXnoHwGVSnyqI8wwCwF1p8+kaArbWoCSHO6o9EpO4JtpMl7zwZySjq81hMbscz10MPazWjNJn5KJCK1HDtYx8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=gfBQ33Pw; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="gfBQ33Pw" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B4E27C4CED3; Fri, 13 Dec 2024 01:07:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734052058; bh=UtHcsaXKLCGmZm3n79ZVhqHOzJfs3mtJeKYih20f4jw=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=gfBQ33Pw6SD73UHPHk4HItyBAUu3YXiVKlG2mJkH3eTnRjl/4wLPJ4nv16rT22oXD 9I2HEAOPyP5lH8MXwUlg4XL5P4OBoGA6V2a5JqCXUo51FovM+DQlSeyDWzdqXITCmU Mq6OChmXIvqhXBF5wfBxlXUI847DWheoxK7+vnUJ15f+cTfFgCu5EIpi0WITsQ1cyu 
VtqLYDbn3qWW3LTEc6JfnjHhSVhmGSK0XthhsnUh1WdugWuoDbdMZ4266NQaSkD+Dd lEBW+Uk+ZZdYNzdo5awpDiwmMqWbk3q435PzACnLWIp5Qk3WQ5Ppvl1J7h10jINxWS p/M4BNlSeeySQ== Date: Thu, 12 Dec 2024 17:07:37 -0800 Subject: [PATCH 27/37] xfs: online repair of realtime file bmaps From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123778.1181370.13816707119197050202.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Repair the block mappings of realtime files. Signed-off-by: "Darrick J. Wong" --- fs/xfs/scrub/bmap_repair.c | 128 +++++++++++++++++++++++++++++++++++++++++++- fs/xfs/scrub/repair.c | 46 ++++++++++++++++ fs/xfs/scrub/repair.h | 2 + 3 files changed, 172 insertions(+), 4 deletions(-) diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 141d36f1da9a71..fd64bdf4e13887 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -25,11 +25,13 @@ #include "xfs_bmap_btree.h" #include "xfs_rmap.h" #include "xfs_rmap_btree.h" +#include "xfs_rtrmap_btree.h" #include "xfs_refcount.h" #include "xfs_quota.h" #include "xfs_ialloc.h" #include "xfs_ag.h" #include "xfs_reflink.h" +#include "xfs_rtgroup.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -359,6 +361,112 @@ xrep_bmap_scan_ag( return error; } +#ifdef CONFIG_XFS_RT +/* Check for any obvious errors or conflicts in the file mapping. 
*/ +STATIC int +xrep_bmap_check_rtfork_rmap( + struct xfs_scrub *sc, + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec) +{ + /* xattr extents are never stored on realtime devices */ + if (rec->rm_flags & XFS_RMAP_ATTR_FORK) + return -EFSCORRUPTED; + + /* bmbt blocks are never stored on realtime devices */ + if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) + return -EFSCORRUPTED; + + /* Data extents for non-rt files are never stored on the rt device. */ + if (!XFS_IS_REALTIME_INODE(sc->ip)) + return -EFSCORRUPTED; + + /* Check the file offsets and physical extents. */ + if (!xfs_verify_fileext(sc->mp, rec->rm_offset, rec->rm_blockcount)) + return -EFSCORRUPTED; + + /* Check that this is within the rtgroup. */ + if (!xfs_verify_rgbext(to_rtg(cur->bc_group), rec->rm_startblock, + rec->rm_blockcount)) + return -EFSCORRUPTED; + + /* Make sure this isn't free space. */ + return xrep_require_rtext_inuse(sc, rec->rm_startblock, + rec->rm_blockcount); +} + +/* Record realtime extents that belong to this inode's fork. */ +STATIC int +xrep_bmap_walk_rtrmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_bmap *rb = priv; + int error = 0; + + if (xchk_should_terminate(rb->sc, &error)) + return error; + + /* Skip extents which are not owned by this inode and fork. */ + if (rec->rm_owner != rb->sc->ip->i_ino) + return 0; + + error = xrep_bmap_check_rtfork_rmap(rb->sc, cur, rec); + if (error) + return error; + + /* + * Record all blocks allocated to this file even if the extent isn't + * for the fork we're rebuilding so that we can reset di_nblocks later. + */ + rb->nblocks += rec->rm_blockcount; + + /* If this rmap isn't for the fork we want, we're done. 
*/ + if (rb->whichfork == XFS_DATA_FORK && + (rec->rm_flags & XFS_RMAP_ATTR_FORK)) + return 0; + if (rb->whichfork == XFS_ATTR_FORK && + !(rec->rm_flags & XFS_RMAP_ATTR_FORK)) + return 0; + + return xrep_bmap_from_rmap(rb, rec->rm_offset, + xfs_rgbno_to_rtb(to_rtg(cur->bc_group), + rec->rm_startblock), + rec->rm_blockcount, + rec->rm_flags & XFS_RMAP_UNWRITTEN); +} + +/* Scan the realtime reverse mappings to build the new extent map. */ +STATIC int +xrep_bmap_scan_rtgroup( + struct xrep_bmap *rb, + struct xfs_rtgroup *rtg) +{ + struct xfs_scrub *sc = rb->sc; + int error; + + if (!xfs_has_rtrmapbt(sc->mp)) + return 0; + + error = xrep_rtgroup_init(sc, rtg, &sc->sr, + XFS_RTGLOCK_RMAP | XFS_RTGLOCK_BITMAP_SHARED); + if (error) + return error; + + error = xfs_rmap_query_all(sc->sr.rmap_cur, xrep_bmap_walk_rtrmap, rb); + xchk_rtgroup_btcur_free(&sc->sr); + xchk_rtgroup_free(sc, &sc->sr); + return error; +} +#else +static inline int +xrep_bmap_scan_rtgroup(struct xrep_bmap *rb, struct xfs_rtgroup *rtg) +{ + return -EFSCORRUPTED; +} +#endif + /* Find the delalloc extents from the old incore extent tree. */ STATIC int xrep_bmap_find_delalloc( @@ -410,6 +518,22 @@ xrep_bmap_find_mappings( struct xfs_perag *pag = NULL; int error = 0; + /* + * Iterate the rtrmaps for extents. Metadata files never have content + * on the realtime device, so there's no need to scan them. + */ + if (!xfs_is_metadir_inode(sc->ip)) { + struct xfs_rtgroup *rtg = NULL; + + while ((rtg = xfs_rtgroup_next(sc->mp, rtg))) { + error = xrep_bmap_scan_rtgroup(rb, rtg); + if (error) { + xfs_rtgroup_rele(rtg); + return error; + } + } + } + /* Iterate the rmaps for extents. */ while ((pag = xfs_perag_next(sc->mp, pag))) { error = xrep_bmap_scan_ag(rb, pag); @@ -754,10 +878,6 @@ xrep_bmap_check_inputs( return -EINVAL; } - /* Don't know how to rebuild realtime data forks. 
*/ - if (XFS_IS_REALTIME_INODE(sc->ip)) - return -EOPNOTSUPP; - return 0; } diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index e788e3032f8e33..18946dd46fa745 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -37,6 +37,9 @@ #include "xfs_da_btree.h" #include "xfs_attr.h" #include "xfs_dir2.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -955,6 +958,22 @@ xrep_ag_init( } #ifdef CONFIG_XFS_RT +/* Initialize all the btree cursors for a RT repair. */ +static void +xrep_rtgroup_btcur_init( + struct xfs_scrub *sc, + struct xchk_rt *sr) +{ + struct xfs_mount *mp = sc->mp; + + ASSERT(sr->rtg != NULL); + + if (sc->sm->sm_type != XFS_SCRUB_TYPE_RTRMAPBT && + (sr->rtlock_flags & XFS_RTGLOCK_RMAP) && + xfs_has_rtrmapbt(mp)) + sr->rmap_cur = xfs_rtrmapbt_init_cursor(sc->tp, sr->rtg); +} + /* * Given a reference to a rtgroup structure, lock rtgroup btree inodes and * create btree cursors. Must only be called to repair a regular rt file. @@ -973,6 +992,33 @@ xrep_rtgroup_init( /* Grab our own passive reference from the caller's ref. */ sr->rtg = xfs_rtgroup_hold(rtg); + xrep_rtgroup_btcur_init(sc, sr); + return 0; +} + +/* Ensure that all rt blocks in the given range are not marked free. 
*/ +int +xrep_require_rtext_inuse( + struct xfs_scrub *sc, + xfs_rgblock_t rgbno, + xfs_filblks_t len) +{ + struct xfs_mount *mp = sc->mp; + xfs_rtxnum_t startrtx; + xfs_rtxnum_t endrtx; + bool is_free = false; + int error; + + startrtx = xfs_rgbno_to_rtx(mp, rgbno); + endrtx = xfs_rgbno_to_rtx(mp, rgbno + len - 1); + + error = xfs_rtalloc_extent_is_free(sc->sr.rtg, sc->tp, startrtx, + endrtx - startrtx + 1, &is_free); + if (error) + return error; + if (is_free) + return -EFSCORRUPTED; + return 0; } #endif /* CONFIG_XFS_RT */ diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index b649da1a93eb8c..584135042d9aa9 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -110,6 +110,8 @@ int xrep_ag_init(struct xfs_scrub *sc, struct xfs_perag *pag, #ifdef CONFIG_XFS_RT int xrep_rtgroup_init(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, struct xchk_rt *sr, unsigned int rtglock_flags); +int xrep_require_rtext_inuse(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_filblks_t len); #else # define xrep_rtgroup_init(sc, rtg, sr, lockflags) (-ENOSYS) #endif /* CONFIG_XFS_RT */ From patchwork Fri Dec 13 01:07:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906221 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C0D087485 for ; Fri, 13 Dec 2024 01:07:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052074; cv=none; b=AX+P99B2Z3sgmbSPHCMwpinIA1i2xARJ91S6YGQCpkRqpuVRaF1P9DRRZkoI6Nbg1mhVXE4wtBr3z1lODG5cm8aEDDxO+vWntlCvI6+lq7p0lxTyzRQBxsjl+++W0vidum/ADGo4dRkyZ6S8TekaFRuIW3xppl4BTo+XkaWjNks= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052074; c=relaxed/simple; bh=HEPXTZQQLHNyjKw6dkmYpyvLRXBwiZfkie4oTOaJ2AQ=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ijFvE/jJEmOEBFLELji5/4raOazueRUHnfLDUax04hu62XB6M3BTtb7NSdvNSO4jRGqXnywr453NK66gU8k38bnhaFRA+1poTi3BMTxaOpX3YW7BLlC4kRqYX0+gJGcuqFbK7OkJzfyW9xwlJs5nxKRtSAYuhghVqrCnrDXiwVg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=s/k5link; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="s/k5link" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5183DC4CED3; Fri, 13 Dec 2024 01:07:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734052074; bh=HEPXTZQQLHNyjKw6dkmYpyvLRXBwiZfkie4oTOaJ2AQ=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=s/k5link5wK+AohRJdUYm44MOxndRFxCuqLatZbdIVoMP/sqLn4Vgap76vfEbp83s +5nNMBRwEaP9pET25GD4+Wja0Q3euPeRiq5WyblKfHcx3WqsYeFYvakaJKE5LaF0f0 s6xO+nkjlmsPdWHjyeKGB4dP9d7G+LEIl22wAxKCCAzuJwb5kqx1IE+BpqXgPKljF+ 
PdwLIXbhtfz5fFtn9+0XAyP+wcnt2V7jvF39I4nIsyVVXGwyydUZoE7o5oJpMH2N2m IQDLOYtd02BpUCjvsKdiT6dBzBtzDYTnb4oWZhQt1q4kbuiYQzhTykFG+QlTIS5G1y iDBlsKbpTjf/w== Date: Thu, 12 Dec 2024 17:07:53 -0800 Subject: [PATCH 28/37] xfs: repair inodes that have realtime extents From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123795.1181370.12023276963649602828.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Plumb into the inode core repair code the ability to search for extents on realtime devices. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/scrub/inode_repair.c | 58 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 57 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index a94f9df0ca78f6..816e81330ffc99 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -38,6 +38,8 @@ #include "xfs_log_priv.h" #include "xfs_health.h" #include "xfs_symlink_remote.h" +#include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -773,17 +775,71 @@ xrep_dinode_count_ag_rmaps( return error; } +/* Count extents and blocks for an inode given an rt rmap. */ +STATIC int +xrep_dinode_walk_rtrmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_inode *ri = priv; + int error = 0; + + if (xchk_should_terminate(ri->sc, &error)) + return error; + + /* We only care about this inode. 
*/ + if (rec->rm_owner != ri->sc->sm->sm_ino) + return 0; + + if (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK)) + return -EFSCORRUPTED; + + ri->rt_blocks += rec->rm_blockcount; + ri->rt_extents++; + return 0; +} + +/* Count extents and blocks for an inode from all realtime rmap data. */ +STATIC int +xrep_dinode_count_rtgroup_rmaps( + struct xrep_inode *ri, + struct xfs_rtgroup *rtg) +{ + struct xfs_scrub *sc = ri->sc; + int error; + + error = xrep_rtgroup_init(sc, rtg, &sc->sr, XFS_RTGLOCK_RMAP); + if (error) + return error; + + error = xfs_rmap_query_all(sc->sr.rmap_cur, xrep_dinode_walk_rtrmap, + ri); + xchk_rtgroup_btcur_free(&sc->sr); + xchk_rtgroup_free(sc, &sc->sr); + return error; +} + /* Count extents and blocks for a given inode from all rmap data. */ STATIC int xrep_dinode_count_rmaps( struct xrep_inode *ri) { struct xfs_perag *pag = NULL; + struct xfs_rtgroup *rtg = NULL; int error; - if (!xfs_has_rmapbt(ri->sc->mp) || xfs_has_realtime(ri->sc->mp)) + if (!xfs_has_rmapbt(ri->sc->mp)) return -EOPNOTSUPP; + while ((rtg = xfs_rtgroup_next(ri->sc->mp, rtg))) { + error = xrep_dinode_count_rtgroup_rmaps(ri, rtg); + if (error) { + xfs_rtgroup_rele(rtg); + return error; + } + } + while ((pag = xfs_perag_next(ri->sc->mp, pag))) { error = xrep_dinode_count_ag_rmaps(ri, pag); if (error) { From patchwork Fri Dec 13 01:08:09 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906222 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9FF4C1F95E for ; Fri, 13 Dec 2024 01:08:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052090; cv=none; b=W7WZpYEpSZazUZOtv3ToRaZtNz0ledCRmhFNhgrDMUPo22VMQCJheRldpUKpJhHPtDBb4CsWHJoHsL9CEiZW3UUxpN+3H0VX8ieo/tgMyAesNGqzzNRjE9XQHlT+ay1AhRiaJwWg7fApnVFs4u4C/jFCVSKnG/20XUUdk7ctdec= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052090; c=relaxed/simple; bh=opOSw4IhEP/teePD0uASdn6mtV9l3jRGqKVc55qXqUg=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=EmnMdaCBVacDCge4oo5L8VRNAyLfi9PJndEIr6J87LJ6g6588CqSNfKJZR1IGi4YjLbIe0huHLB6MQlxka5ofuqvbK2tCSmwmcC6ZKd2cpjmtkWmxq5uIGQC0JqHOb1orzZrpwS82Hzo+G90LEX191pWfKfcvSX6+L0MhUyxKL0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Q1tSovRo; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Q1tSovRo" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 37010C4CECE; Fri, 13 Dec 2024 01:08:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734052090; bh=opOSw4IhEP/teePD0uASdn6mtV9l3jRGqKVc55qXqUg=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Q1tSovRo7TcqBRlmquZ7mccwazSfWeSGouRP5y9LNkxbv0Pzcz85+WaR4J1M9ISOG mWr/yOGVwf2pC6dAUgmB8eA5wVtVVl//7v6ZiVULMo8NB+Vplwr7L9vZnbimKLinyr 1p8+FkNbWg7ajud6v7J7AX3xnpokejItfT7742cG8MbZ3LcFJcV+EJypF2HHJMET9p 
J+PrrJ46JiqFMZjBfyOX71Xb0RoftbnhzRrr6c5sufgHhppcKRiOA381A/e9bfIhe9 8Pgcg0XsQs5ffoNw7mbkrVzQnQnZucEyfPc14+0u4G1R7XIRYlXK0wVYsdV3GOLPj+ tvguPiuacHUDw== Date: Thu, 12 Dec 2024 17:08:09 -0800 Subject: [PATCH 29/37] xfs: repair rmap btree inodes From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123812.1181370.1385680244042469564.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Teach the inode repair code how to deal with realtime rmap btree inodes that won't load properly. This is most likely moot since the filesystem generally won't mount without the rtrmapbt inodes being usable, but we'll add this for completeness. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/scrub/inode_repair.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 816e81330ffc99..d7e3f033b16073 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -944,6 +944,34 @@ xrep_dinode_bad_bmbt_fork( return false; } +/* Return true if this rmap-format ifork looks like garbage. 
*/ +STATIC bool +xrep_dinode_bad_rtrmapbt_fork( + struct xfs_scrub *sc, + struct xfs_dinode *dip, + unsigned int dfork_size) +{ + struct xfs_rtrmap_root *dfp; + unsigned int nrecs; + unsigned int level; + + if (dfork_size < sizeof(struct xfs_rtrmap_root)) + return true; + + dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + nrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (level > sc->mp->m_rtrmap_maxlevels) + return true; + if (xfs_rtrmap_droot_space_calc(level, nrecs) > dfork_size) + return true; + if (level > 0 && nrecs == 0) + return true; + + return false; +} + /* Check a metadata-btree fork. */ STATIC bool xrep_dinode_bad_metabt_fork( @@ -956,6 +984,8 @@ xrep_dinode_bad_metabt_fork( return true; switch (be16_to_cpu(dip->di_metatype)) { + case XFS_METAFILE_RTRMAP: + return xrep_dinode_bad_rtrmapbt_fork(sc, dip, dfork_size); default: return true; } @@ -1220,6 +1250,7 @@ xrep_dinode_ensure_forkoff( uint16_t mode) { struct xfs_bmdr_block *bmdr; + struct xfs_rtrmap_root *rmdr; struct xfs_scrub *sc = ri->sc; xfs_extnum_t attr_extents, data_extents; size_t bmdr_minsz = xfs_bmdr_space_calc(1); @@ -1328,6 +1359,10 @@ xrep_dinode_ensure_forkoff( break; case XFS_DINODE_FMT_META_BTREE: switch (be16_to_cpu(dip->di_metatype)) { + case XFS_METAFILE_RTRMAP: + rmdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + dfork_min = xfs_rtrmap_broot_space(sc->mp, rmdr); + break; default: dfork_min = 0; break; From patchwork Fri Dec 13 01:08:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906223 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7179422071 for ; Fri, 13 Dec 2024 01:08:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052106; cv=none; b=q8xAYUO+etfak/EyKZp5nDKuf51iJSny5mF3uT9giCnX/eO106cVsamQ9bcfIi+piyNgK53yjS9fYHe8AOEi51QU1XgEkv3tlK4BWUMMtWQukToq/ycHgdsWb+Q3R1P5VJP4pi06WnKXToGgzFZh8VWgn7anctv16uAKcDQsDF8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052106; c=relaxed/simple; bh=HcSm51cvSlQqpUISAtKTDgaDGqDPswLCYL1aFfw+eIg=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=fZl8BTMWdX/GcqMMgkZ3ecWkbJ0rrDkxoMDQIthUpQadwU5OqvLpRljaf5T/KNjKnsc5gFwRz3+b/hXyuFeZ0B1ycBdPVu+9OTIR5owa2kxDaYjHR+st0eMYu+9CDxHfZoG/p2WR75G8QYUwGxXpJMDLbjXR+Upj04ycNRQn4c4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=t6W8Z6Uy; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="t6W8Z6Uy" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D298AC4CECE; Fri, 13 Dec 2024 01:08:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734052106; bh=HcSm51cvSlQqpUISAtKTDgaDGqDPswLCYL1aFfw+eIg=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=t6W8Z6Uyad3snXe78Q/mC6I0/mLTqGcc0vaElP08zb08desKqUgfNdSUVjYHBpDgB j0AZ2MiwO0cm5cqCkUQChrCHyrHq6PZ5OSzlGLUnXcaRQ7aTjPkBHZa4PhFH8wfEtm rOwqDF87qnuLopbxahcpJekZmFWgj9MmMtK5uJoCjXLlyNn1q72xvjLLxhcLO4Jw7F 
VO+OylIP1CP8D02rNF6M9LJft1YB75beXu0yMLPeEqShomJt6P5PeBejqhd0wZ9zGU YFu3OCepW42rC4OJNhHKIA+SOS6REdL7V3tZSCY59WZ1ptrKO79FRjU/dKS7HrMgdc SYL/qRlh0kG5w== Date: Thu, 12 Dec 2024 17:08:25 -0800 Subject: [PATCH 30/37] xfs: online repair of realtime bitmaps for a realtime group From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123829.1181370.623591226923572341.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong For a given rt group, regenerate the bitmap contents from the group's realtime rmap btree. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_rtbitmap.h | 9 + fs/xfs/scrub/common.h | 6 + fs/xfs/scrub/repair.c | 2 fs/xfs/scrub/repair.h | 1 fs/xfs/scrub/rtbitmap.c | 5 fs/xfs/scrub/rtbitmap.h | 50 +++++ fs/xfs/scrub/rtbitmap_repair.c | 429 ++++++++++++++++++++++++++++++++++++++- fs/xfs/scrub/rtsummary_repair.c | 3 fs/xfs/scrub/tempexch.h | 2 fs/xfs/scrub/tempfile.c | 20 +- fs/xfs/scrub/trace.c | 1 fs/xfs/scrub/trace.h | 150 ++++++++++++++ 12 files changed, 659 insertions(+), 19 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 16563a44bd138a..22e5d9cd95f47c 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -135,6 +135,15 @@ xfs_rtb_to_rtx( return div_u64(rtbno, mp->m_sb.sb_rextsize); } +/* Return the offset of a rtgroup block number within an rt extent. */ +static inline xfs_extlen_t +xfs_rgbno_to_rtxoff( + struct xfs_mount *mp, + xfs_rgblock_t rgbno) +{ + return rgbno % mp->m_sb.sb_rextsize; +} + /* Return the offset of an rt block number within an rt extent. 
*/ static inline xfs_extlen_t xfs_rtb_to_rtxoff( diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 1576467f724431..e5891609af2740 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -264,6 +264,12 @@ int xchk_metadata_inode_forks(struct xfs_scrub *sc); (sc)->mp->m_super->s_id, \ (sc)->ip ? (sc)->ip->i_ino : (sc)->sm->sm_ino, \ ##__VA_ARGS__) +#define xchk_xfile_rtgroup_descr(sc, fmt, ...) \ + kasprintf(XCHK_GFP_FLAGS, "XFS (%s): rtgroup 0x%x " fmt, \ + (sc)->mp->m_super->s_id, \ + (sc)->sr.rtg ? \ + rtg_rgno((sc)->sr.rtg) : (sc)->sm->sm_agno, \ + ##__VA_ARGS__) /* * Setting up a hook to wait for intents to drain is costly -- we have to take diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 18946dd46fa745..82fe01d78cb08d 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -959,7 +959,7 @@ xrep_ag_init( #ifdef CONFIG_XFS_RT /* Initialize all the btree cursors for a RT repair. */ -static void +void xrep_rtgroup_btcur_init( struct xfs_scrub *sc, struct xchk_rt *sr) diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 584135042d9aa9..7f493752ea78e6 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -110,6 +110,7 @@ int xrep_ag_init(struct xfs_scrub *sc, struct xfs_perag *pag, #ifdef CONFIG_XFS_RT int xrep_rtgroup_init(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, struct xchk_rt *sr, unsigned int rtglock_flags); +void xrep_rtgroup_btcur_init(struct xfs_scrub *sc, struct xchk_rt *sr); int xrep_require_rtext_inuse(struct xfs_scrub *sc, xfs_rgblock_t rgbno, xfs_filblks_t len); #else diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 675f4fdd1e675f..28c90a31f4c32b 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -20,9 +20,11 @@ #include "xfs_sb.h" #include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" +#include "xfs_exchmaps.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/repair.h" +#include "scrub/tempexch.h" #include
"scrub/rtbitmap.h" #include "scrub/btree.h" @@ -38,7 +40,8 @@ xchk_setup_rtbitmap( if (xchk_need_intent_drain(sc)) xchk_fsgates_enable(sc, XCHK_FSGATES_DRAIN); - rtb = kzalloc(sizeof(struct xchk_rtbitmap), XCHK_GFP_FLAGS); + rtb = kzalloc(struct_size(rtb, words, xchk_rtbitmap_wordcnt(sc)), + XCHK_GFP_FLAGS); if (!rtb) return -ENOMEM; sc->buf = rtb; diff --git a/fs/xfs/scrub/rtbitmap.h b/fs/xfs/scrub/rtbitmap.h index dd5b394d9697d2..fe52b877253d35 100644 --- a/fs/xfs/scrub/rtbitmap.h +++ b/fs/xfs/scrub/rtbitmap.h @@ -6,6 +6,20 @@ #ifndef __XFS_SCRUB_RTBITMAP_H__ #define __XFS_SCRUB_RTBITMAP_H__ +/* + * We use an xfile to construct new bitmap blocks for the portion of the + * rtbitmap file that we're replacing. Whereas the ondisk bitmap must be + * accessed through the buffer cache, the xfile bitmap supports direct + * word-level accesses. Therefore, we create a small abstraction for linear + * access. + */ +typedef unsigned long long xrep_wordoff_t; +typedef unsigned int xrep_wordcnt_t; + +/* Mask to round an rtx down to the nearest bitmap word. */ +#define XREP_RTBMP_WORDMASK ((1ULL << XFS_NBWORDLOG) - 1) + + struct xchk_rtbitmap { struct xfs_scrub *sc; @@ -16,12 +30,48 @@ struct xchk_rtbitmap { /* The next free rt group block number that we expect to see. */ xfs_rgblock_t next_free_rgbno; + +#ifdef CONFIG_XFS_ONLINE_REPAIR + /* stuff for staging a new bitmap */ + struct xfs_rtalloc_args args; + struct xrep_tempexch tempexch; +#endif + + /* The next rtgroup block we expect to see during our rtrmapbt walk. */ + xfs_rgblock_t next_rgbno; + + /* rtgroup lock flags */ + unsigned int rtglock_flags; + + /* rtword position of xfile as we write buffers to disk. */ + xrep_wordoff_t prep_wordoff; + + /* In-Memory rtbitmap for repair. */ + union xfs_rtword_raw words[]; }; #ifdef CONFIG_XFS_ONLINE_REPAIR int xrep_setup_rtbitmap(struct xfs_scrub *sc, struct xchk_rtbitmap *rtb); + +/* + * How big should the words[] buffer be? 
+ * + * For repairs, we want a full fsblock worth of space so that we can memcpy a + * buffer full of 1s into the xfile bitmap. The xfile bitmap doesn't have + * rtbitmap block headers, so we don't use blockwsize. Scrub doesn't use the + * words buffer at all. + */ +static inline unsigned int +xchk_rtbitmap_wordcnt( + struct xfs_scrub *sc) +{ + if (xchk_could_repair(sc)) + return sc->mp->m_sb.sb_blocksize >> XFS_WORDLOG; + return 0; +} #else # define xrep_setup_rtbitmap(sc, rtb) (0) +# define xchk_rtbitmap_wordcnt(sc) (0) #endif /* CONFIG_XFS_ONLINE_REPAIR */ #endif /* __XFS_SCRUB_RTBITMAP_H__ */ diff --git a/fs/xfs/scrub/rtbitmap_repair.c b/fs/xfs/scrub/rtbitmap_repair.c index 0fef98e9f83409..c6e33834c5ae98 100644 --- a/fs/xfs/scrub/rtbitmap_repair.c +++ b/fs/xfs/scrub/rtbitmap_repair.c @@ -12,32 +12,65 @@ #include "xfs_btree.h" #include "xfs_log_format.h" #include "xfs_trans.h" +#include "xfs_rtalloc.h" #include "xfs_inode.h" #include "xfs_bit.h" #include "xfs_bmap.h" #include "xfs_bmap_btree.h" +#include "xfs_rmap.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_exchmaps.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" +#include "xfs_extent_busy.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" #include "scrub/repair.h" #include "scrub/xfile.h" +#include "scrub/tempfile.h" +#include "scrub/tempexch.h" +#include "scrub/reap.h" #include "scrub/rtbitmap.h" -/* Set up to repair the realtime bitmap file metadata. */ +/* rt bitmap content repairs */ + +/* Set up to repair the realtime bitmap for this group. */ int xrep_setup_rtbitmap( struct xfs_scrub *sc, struct xchk_rtbitmap *rtb) { struct xfs_mount *mp = sc->mp; - unsigned long long blocks = 0; + char *descr; + unsigned long long blocks = mp->m_sb.sb_rbmblocks; + int error; + + error = xrep_tempfile_create(sc, S_IFREG); + if (error) + return error; + + /* Create an xfile to hold our reconstructed bitmap. 
*/ + descr = xchk_xfile_rtgroup_descr(sc, "bitmap file"); + error = xfile_create(descr, blocks * mp->m_sb.sb_blocksize, &sc->xfile); + kfree(descr); + if (error) + return error; /* - * Reserve enough blocks to write out a completely new bmbt for a - * maximally fragmented bitmap file. We do not hold the rtbitmap - * ILOCK yet, so this is entirely speculative. + * Reserve enough blocks to write out a completely new bitmap file, + * plus twice as many blocks as we would need if we can only allocate + * one block per data fork mapping. This should cover the + * preallocation of the temporary file and exchanging the extent + * mappings. + * + * We cannot use xfs_exchmaps_estimate because we have not yet + * constructed the replacement bitmap and therefore do not know how + * many extents it will use. By the time we do, we will have a dirty + * transaction (which we cannot drop because we cannot drop the + * rtbitmap ILOCK) and cannot ask for more reservation. */ - blocks = xfs_bmbt_calc_size(mp, mp->m_sb.sb_rbmblocks); + blocks += xfs_bmbt_calc_size(mp, blocks) * 2; if (blocks > UINT_MAX) return -EOPNOTSUPP; @@ -45,6 +78,304 @@ xrep_setup_rtbitmap( return 0; } +static inline xrep_wordoff_t +rtx_to_wordoff( + struct xfs_mount *mp, + xfs_rtxnum_t rtx) +{ + return rtx >> XFS_NBWORDLOG; +} + +static inline xrep_wordcnt_t +rtxlen_to_wordcnt( + xfs_rtxlen_t rtxlen) +{ + return rtxlen >> XFS_NBWORDLOG; +} + +/* Helper functions to record rtwords in an xfile. 
*/ + +static inline int +xfbmp_load( + struct xchk_rtbitmap *rtb, + xrep_wordoff_t wordoff, + xfs_rtword_t *word) +{ + union xfs_rtword_raw urk; + int error; + + ASSERT(xfs_has_rtgroups(rtb->sc->mp)); + + error = xfile_load(rtb->sc->xfile, &urk, + sizeof(union xfs_rtword_raw), + wordoff << XFS_WORDLOG); + if (error) + return error; + + *word = be32_to_cpu(urk.rtg); + return 0; +} + +static inline int +xfbmp_store( + struct xchk_rtbitmap *rtb, + xrep_wordoff_t wordoff, + const xfs_rtword_t word) +{ + union xfs_rtword_raw urk; + + ASSERT(xfs_has_rtgroups(rtb->sc->mp)); + + urk.rtg = cpu_to_be32(word); + return xfile_store(rtb->sc->xfile, &urk, + sizeof(union xfs_rtword_raw), + wordoff << XFS_WORDLOG); +} + +static inline int +xfbmp_copyin( + struct xchk_rtbitmap *rtb, + xrep_wordoff_t wordoff, + const union xfs_rtword_raw *word, + xrep_wordcnt_t nr_words) +{ + return xfile_store(rtb->sc->xfile, word, nr_words << XFS_WORDLOG, + wordoff << XFS_WORDLOG); +} + +static inline int +xfbmp_copyout( + struct xchk_rtbitmap *rtb, + xrep_wordoff_t wordoff, + union xfs_rtword_raw *word, + xrep_wordcnt_t nr_words) +{ + return xfile_load(rtb->sc->xfile, word, nr_words << XFS_WORDLOG, + wordoff << XFS_WORDLOG); +} + +/* Perform a logical OR operation on an rtword in the incore bitmap. */ +static int +xrep_rtbitmap_or( + struct xchk_rtbitmap *rtb, + xrep_wordoff_t wordoff, + xfs_rtword_t mask) +{ + xfs_rtword_t word; + int error; + + error = xfbmp_load(rtb, wordoff, &word); + if (error) + return error; + + trace_xrep_rtbitmap_or(rtb->sc->mp, wordoff, mask, word); + + return xfbmp_store(rtb, wordoff, word | mask); +} + +/* + * Mark as free every rt extent between the next rt block we expected to see + * in the rtrmap records and the given rt block. 
+ */ +STATIC int +xrep_rtbitmap_mark_free( + struct xchk_rtbitmap *rtb, + xfs_rgblock_t rgbno) +{ + struct xfs_mount *mp = rtb->sc->mp; + struct xfs_rtgroup *rtg = rtb->sc->sr.rtg; + xfs_rtxnum_t startrtx; + xfs_rtxnum_t nextrtx; + xrep_wordoff_t wordoff, nextwordoff; + unsigned int bit; + unsigned int bufwsize; + xfs_extlen_t mod; + xfs_rtword_t mask; + int error; + + if (!xfs_verify_rgbext(rtg, rtb->next_rgbno, rgbno - rtb->next_rgbno)) + return -EFSCORRUPTED; + + /* + * Convert rt blocks to rt extents. The block range we find must be + * aligned to an rtextent boundary on both ends. + */ + startrtx = xfs_rgbno_to_rtx(mp, rtb->next_rgbno); + mod = xfs_rgbno_to_rtxoff(mp, rtb->next_rgbno); + if (mod) + return -EFSCORRUPTED; + + nextrtx = xfs_rgbno_to_rtx(mp, rgbno - 1) + 1; + mod = xfs_rgbno_to_rtxoff(mp, rgbno - 1); + if (mod != mp->m_sb.sb_rextsize - 1) + return -EFSCORRUPTED; + + trace_xrep_rtbitmap_record_free(mp, startrtx, nextrtx - 1); + + /* Set bits as needed to round startrtx up to the nearest word. */ + bit = startrtx & XREP_RTBMP_WORDMASK; + if (bit) { + xfs_rtblock_t len = nextrtx - startrtx; + unsigned int lastbit; + + lastbit = min(bit + len, XFS_NBWORD); + mask = (((xfs_rtword_t)1 << (lastbit - bit)) - 1) << bit; + + error = xrep_rtbitmap_or(rtb, rtx_to_wordoff(mp, startrtx), + mask); + if (error || lastbit - bit == len) + return error; + startrtx += XFS_NBWORD - bit; + } + + /* Set bits as needed to round nextrtx down to the nearest word. */ + bit = nextrtx & XREP_RTBMP_WORDMASK; + if (bit) { + mask = ((xfs_rtword_t)1 << bit) - 1; + + error = xrep_rtbitmap_or(rtb, rtx_to_wordoff(mp, nextrtx), + mask); + if (error || startrtx + bit == nextrtx) + return error; + nextrtx -= bit; + } + + trace_xrep_rtbitmap_record_free_bulk(mp, startrtx, nextrtx - 1); + + /* Set all the words in between, up to a whole fs block at once.
*/ + wordoff = rtx_to_wordoff(mp, startrtx); + nextwordoff = rtx_to_wordoff(mp, nextrtx); + bufwsize = mp->m_sb.sb_blocksize >> XFS_WORDLOG; + + while (wordoff < nextwordoff) { + xrep_wordoff_t rem; + xrep_wordcnt_t wordcnt; + + wordcnt = min_t(xrep_wordcnt_t, nextwordoff - wordoff, + bufwsize); + + /* + * Try to keep us aligned to the rtwords buffer to reduce the + * number of xfile writes. + */ + rem = wordoff & (bufwsize - 1); + if (rem) + wordcnt = min_t(xrep_wordcnt_t, wordcnt, + bufwsize - rem); + + error = xfbmp_copyin(rtb, wordoff, rtb->words, wordcnt); + if (error) + return error; + + wordoff += wordcnt; + } + + return 0; +} + +/* Set free space in the rtbitmap based on rtrmapbt records. */ +STATIC int +xrep_rtbitmap_walk_rtrmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xchk_rtbitmap *rtb = priv; + int error = 0; + + if (xchk_should_terminate(rtb->sc, &error)) + return error; + + if (rtb->next_rgbno < rec->rm_startblock) { + error = xrep_rtbitmap_mark_free(rtb, rec->rm_startblock); + if (error) + return error; + } + + rtb->next_rgbno = max(rtb->next_rgbno, + rec->rm_startblock + rec->rm_blockcount); + return 0; +} + +/* + * Walk the rtrmapbt to find all the gaps between records, and mark the gaps + * in the realtime bitmap that we're computing. + */ +STATIC int +xrep_rtbitmap_find_freespace( + struct xchk_rtbitmap *rtb) +{ + struct xfs_scrub *sc = rtb->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_rtgroup *rtg = sc->sr.rtg; + uint64_t blockcount; + int error; + + /* Prepare a buffer of ones so that we can accelerate bulk setting. */ + memset(rtb->words, 0xFF, mp->m_sb.sb_blocksize); + + xrep_rtgroup_btcur_init(sc, &sc->sr); + error = xfs_rmap_query_all(sc->sr.rmap_cur, xrep_rtbitmap_walk_rtrmap, + rtb); + if (error) + goto out; + + /* + * Mark as free every possible rt extent from the last one we saw to + * the end of the rt group. 
+ */ + blockcount = rtg->rtg_extents * mp->m_sb.sb_rextsize; + if (rtb->next_rgbno < blockcount) { + error = xrep_rtbitmap_mark_free(rtb, blockcount); + if (error) + goto out; + } + +out: + xchk_rtgroup_btcur_free(&sc->sr); + return error; +} + +static int +xrep_rtbitmap_prep_buf( + struct xfs_scrub *sc, + struct xfs_buf *bp, + void *data) +{ + struct xchk_rtbitmap *rtb = data; + struct xfs_mount *mp = sc->mp; + union xfs_rtword_raw *ondisk; + int error; + + rtb->args.mp = sc->mp; + rtb->args.tp = sc->tp; + rtb->args.rbmbp = bp; + ondisk = xfs_rbmblock_wordptr(&rtb->args, 0); + rtb->args.rbmbp = NULL; + + error = xfbmp_copyout(rtb, rtb->prep_wordoff, ondisk, + mp->m_blockwsize); + if (error) + return error; + + if (xfs_has_rtgroups(sc->mp)) { + struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; + + hdr->rt_magic = cpu_to_be32(XFS_RTBITMAP_MAGIC); + hdr->rt_owner = cpu_to_be64(sc->ip->i_ino); + hdr->rt_blkno = cpu_to_be64(xfs_buf_daddr(bp)); + hdr->rt_lsn = 0; + uuid_copy(&hdr->rt_uuid, &sc->mp->m_sb.sb_meta_uuid); + bp->b_ops = &xfs_rtbitmap_buf_ops; + } else { + bp->b_ops = &xfs_rtbuf_ops; + } + + rtb->prep_wordoff += mp->m_blockwsize; + xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_RTBITMAP_BUF); + return 0; +} + /* * Make sure that the given range of the data fork of the realtime file is * mapped to written blocks. The caller must ensure that the inode is joined @@ -160,9 +491,18 @@ xrep_rtbitmap( { struct xchk_rtbitmap *rtb = sc->buf; struct xfs_mount *mp = sc->mp; + struct xfs_group *xg = rtg_group(sc->sr.rtg); unsigned long long blocks = 0; + unsigned int busy_gen; int error; + /* We require the realtime rmapbt to rebuild anything. */ + if (!xfs_has_rtrmapbt(sc->mp)) + return -EOPNOTSUPP; + /* We require atomic file exchange range to rebuild anything. */ + if (!xfs_has_exchange_range(sc->mp)) + return -EOPNOTSUPP; + /* Impossibly large rtbitmap means we can't touch the filesystem. 
*/ if (rtb->rbmblocks > U32_MAX) return 0; @@ -195,6 +535,79 @@ xrep_rtbitmap( if (error) return error; - /* Fix inconsistent bitmap geometry */ - return xrep_rtbitmap_geometry(sc, rtb); + /* + * Fix inconsistent bitmap geometry. This function returns with a + * clean scrub transaction. + */ + error = xrep_rtbitmap_geometry(sc, rtb); + if (error) + return error; + + /* + * Make sure the busy extent list is clear because we can't put extents + * on there twice. + */ + if (!xfs_extent_busy_list_empty(xg, &busy_gen)) { + error = xfs_extent_busy_flush(sc->tp, xg, busy_gen, 0); + if (error) + return error; + } + + /* + * Generate the new rtbitmap data. We don't need the rtbmp information + * once this call is finished. + */ + error = xrep_rtbitmap_find_freespace(rtb); + if (error) + return error; + + /* + * Try to take ILOCK_EXCL of the temporary file. We had better be the + * only ones holding onto this inode, but we can't block while holding + * the rtbitmap file's ILOCK_EXCL. + */ + while (!xrep_tempfile_ilock_nowait(sc)) { + if (xchk_should_terminate(sc, &error)) + return error; + delay(1); + } + + /* + * Make sure we have space allocated for the part of the bitmap + * file that corresponds to this group. We already joined sc->ip. + */ + xfs_trans_ijoin(sc->tp, sc->tempip, 0); + error = xrep_tempfile_prealloc(sc, 0, rtb->rbmblocks); + if (error) + return error; + + /* Last chance to abort before we start committing fixes. */ + if (xchk_should_terminate(sc, &error)) + return error; + + /* Copy the bitmap file that we generated. */ + error = xrep_tempfile_copyin(sc, 0, rtb->rbmblocks, + xrep_rtbitmap_prep_buf, rtb); + if (error) + return error; + error = xrep_tempfile_set_isize(sc, + XFS_FSB_TO_B(sc->mp, sc->mp->m_sb.sb_rbmblocks)); + if (error) + return error; + + /* + * Now exchange the data fork contents. We're done with the temporary + * buffer, so we can reuse it for the tempfile exchmaps information. 
+ */ + error = xrep_tempexch_trans_reserve(sc, XFS_DATA_FORK, 0, + rtb->rbmblocks, &rtb->tempexch); + if (error) + return error; + + error = xrep_tempexch_contents(sc, &rtb->tempexch); + if (error) + return error; + + /* Free the old rtbitmap blocks if they're not in use. */ + return xrep_reap_ifork(sc, sc->tempip, XFS_DATA_FORK); } diff --git a/fs/xfs/scrub/rtsummary_repair.c b/fs/xfs/scrub/rtsummary_repair.c index 8198ea84ad70e5..d593977d70df21 100644 --- a/fs/xfs/scrub/rtsummary_repair.c +++ b/fs/xfs/scrub/rtsummary_repair.c @@ -165,7 +165,8 @@ xrep_rtsummary( * Now exchange the contents. Nothing in repair uses the temporary * buffer, so we can reuse it for the tempfile exchrange information. */ - error = xrep_tempexch_trans_reserve(sc, XFS_DATA_FORK, &rts->tempexch); + error = xrep_tempexch_trans_reserve(sc, XFS_DATA_FORK, 0, + rts->rsumblocks, &rts->tempexch); if (error) return error; diff --git a/fs/xfs/scrub/tempexch.h b/fs/xfs/scrub/tempexch.h index 995ba187c5aa62..eccda720c2ca40 100644 --- a/fs/xfs/scrub/tempexch.h +++ b/fs/xfs/scrub/tempexch.h @@ -12,7 +12,7 @@ struct xrep_tempexch { }; int xrep_tempexch_trans_reserve(struct xfs_scrub *sc, int whichfork, - struct xrep_tempexch *ti); + xfs_fileoff_t off, xfs_filblks_t len, struct xrep_tempexch *ti); int xrep_tempexch_trans_alloc(struct xfs_scrub *sc, int whichfork, struct xrep_tempexch *ti); diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c index 4ebb5f8459e8f3..cf99e0ca51b008 100644 --- a/fs/xfs/scrub/tempfile.c +++ b/fs/xfs/scrub/tempfile.c @@ -606,6 +606,8 @@ STATIC int xrep_tempexch_prep_request( struct xfs_scrub *sc, int whichfork, + xfs_fileoff_t off, + xfs_filblks_t len, struct xrep_tempexch *tx) { struct xfs_exchmaps_req *req = &tx->req; @@ -629,18 +631,19 @@ xrep_tempexch_prep_request( /* Exchange all mappings in both forks. 
*/ req->ip1 = sc->tempip; req->ip2 = sc->ip; - req->startoff1 = 0; - req->startoff2 = 0; + req->startoff1 = off; + req->startoff2 = off; switch (whichfork) { case XFS_ATTR_FORK: req->flags |= XFS_EXCHMAPS_ATTR_FORK; break; case XFS_DATA_FORK: - /* Always exchange sizes when exchanging data fork mappings. */ - req->flags |= XFS_EXCHMAPS_SET_SIZES; + /* Exchange sizes when exchanging all data fork mappings. */ + if (off == 0 && len == XFS_MAX_FILEOFF) + req->flags |= XFS_EXCHMAPS_SET_SIZES; break; } - req->blockcount = XFS_MAX_FILEOFF; + req->blockcount = len; return 0; } @@ -796,6 +799,8 @@ int xrep_tempexch_trans_reserve( struct xfs_scrub *sc, int whichfork, + xfs_fileoff_t off, + xfs_filblks_t len, struct xrep_tempexch *tx) { int error; @@ -804,7 +809,7 @@ xrep_tempexch_trans_reserve( xfs_assert_ilocked(sc->ip, XFS_ILOCK_EXCL); xfs_assert_ilocked(sc->tempip, XFS_ILOCK_EXCL); - error = xrep_tempexch_prep_request(sc, whichfork, tx); + error = xrep_tempexch_prep_request(sc, whichfork, off, len, tx); if (error) return error; @@ -842,7 +847,8 @@ xrep_tempexch_trans_alloc( ASSERT(sc->tp == NULL); ASSERT(xfs_has_exchange_range(sc->mp)); - error = xrep_tempexch_prep_request(sc, whichfork, tx); + error = xrep_tempexch_prep_request(sc, whichfork, 0, XFS_MAX_FILEOFF, + tx); if (error) return error; diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c index 98f923ae664d0e..2450e214103fed 100644 --- a/fs/xfs/scrub/trace.c +++ b/fs/xfs/scrub/trace.c @@ -21,6 +21,7 @@ #include "xfs_rmap.h" #include "xfs_parent.h" #include "xfs_metafile.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/xfile.h" #include "scrub/xfarray.h" diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 5afc440f22f56c..3b661e4443453c 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -17,6 +17,7 @@ #include "xfs_bit.h" #include "xfs_quota_defs.h" +struct xfs_rtgroup; struct xfs_scrub; struct xfile; struct xfarray; @@ -3607,6 +3608,155 @@ 
DEFINE_XCHK_METAPATH_EVENT(xrep_metapath_try_unlink); DEFINE_XCHK_METAPATH_EVENT(xrep_metapath_unlink); DEFINE_XCHK_METAPATH_EVENT(xrep_metapath_link); +#ifdef CONFIG_XFS_RT +DECLARE_EVENT_CLASS(xrep_rtbitmap_class, + TP_PROTO(struct xfs_mount *mp, xfs_rtxnum_t start, xfs_rtxnum_t end), + TP_ARGS(mp, start, end), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rtxnum_t, start) + __field(xfs_rtxnum_t, end) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->start = start; + __entry->end = end; + ), + TP_printk("dev %d:%d rtdev %d:%d startrtx 0x%llx endrtx 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->start, + __entry->end) +); +#define DEFINE_REPAIR_RGBITMAP_EVENT(name) \ +DEFINE_EVENT(xrep_rtbitmap_class, name, \ + TP_PROTO(struct xfs_mount *mp, xfs_rtxnum_t start, \ + xfs_rtxnum_t end), \ + TP_ARGS(mp, start, end)) +DEFINE_REPAIR_RGBITMAP_EVENT(xrep_rtbitmap_record_free); +DEFINE_REPAIR_RGBITMAP_EVENT(xrep_rtbitmap_record_free_bulk); + +TRACE_EVENT(xrep_rtbitmap_or, + TP_PROTO(struct xfs_mount *mp, unsigned long long wordoff, + xfs_rtword_t mask, xfs_rtword_t word), + TP_ARGS(mp, wordoff, mask, word), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(unsigned long long, wordoff) + __field(unsigned int, mask) + __field(unsigned int, word) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->wordoff = wordoff; + __entry->mask = mask; + __entry->word = word; + ), + TP_printk("dev %d:%d rtdev %d:%d wordoff 0x%llx mask 0x%x word 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->wordoff, + __entry->mask, + __entry->word) +); + +TRACE_EVENT(xrep_rtbitmap_load, + TP_PROTO(struct xfs_rtgroup *rtg, xfs_fileoff_t rbmoff, + xfs_rtxnum_t rtx, xfs_rtxnum_t len), + 
TP_ARGS(rtg, rbmoff, rtx, len), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgnumber_t, rgno) + __field(xfs_fileoff_t, rbmoff) + __field(xfs_rtxnum_t, rtx) + __field(xfs_rtxnum_t, len) + ), + TP_fast_assign( + __entry->dev = rtg_mount(rtg)->m_super->s_dev; + __entry->rtdev = rtg_mount(rtg)->m_rtdev_targp->bt_dev; + __entry->rgno = rtg_rgno(rtg); + __entry->rbmoff = rbmoff; + __entry->rtx = rtx; + __entry->len = len; + ), + TP_printk("dev %d:%d rtdev %d:%d rgno 0x%x rbmoff 0x%llx rtx 0x%llx rtxcount 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgno, + __entry->rbmoff, + __entry->rtx, + __entry->len) +); + +TRACE_EVENT(xrep_rtbitmap_load_words, + TP_PROTO(struct xfs_mount *mp, xfs_fileoff_t rbmoff, + unsigned long long wordoff, unsigned int wordcnt), + TP_ARGS(mp, rbmoff, wordoff, wordcnt), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_fileoff_t, rbmoff) + __field(unsigned long long, wordoff) + __field(unsigned int, wordcnt) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->rbmoff = rbmoff; + __entry->wordoff = wordoff; + __entry->wordcnt = wordcnt; + ), + TP_printk("dev %d:%d rtdev %d:%d rbmoff 0x%llx wordoff 0x%llx wordcnt 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rbmoff, + __entry->wordoff, + __entry->wordcnt) +); + +TRACE_EVENT(xrep_rtbitmap_load_word, + TP_PROTO(struct xfs_mount *mp, unsigned long long wordoff, + unsigned int bit, xfs_rtword_t ondisk_word, + xfs_rtword_t xfile_word, xfs_rtword_t word_mask), + TP_ARGS(mp, wordoff, bit, ondisk_word, xfile_word, word_mask), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(unsigned long long, wordoff) + __field(unsigned int, bit) + __field(xfs_rtword_t, ondisk_word) + __field(xfs_rtword_t, xfile_word) + 
__field(xfs_rtword_t, word_mask) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->wordoff = wordoff; + __entry->bit = bit; + __entry->ondisk_word = ondisk_word; + __entry->xfile_word = xfile_word; + __entry->word_mask = word_mask; + ), + TP_printk("dev %d:%d rtdev %d:%d wordoff 0x%llx bit %u ondisk 0x%x(0x%x) inmem 0x%x(0x%x) result 0x%x mask 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->wordoff, + __entry->bit, + __entry->ondisk_word, + __entry->ondisk_word & __entry->word_mask, + __entry->xfile_word, + __entry->xfile_word & ~__entry->word_mask, + (__entry->xfile_word & ~__entry->word_mask) | + (__entry->ondisk_word & __entry->word_mask), + __entry->word_mask) +); +#endif /* CONFIG_XFS_RT */ + #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */ #endif /* _TRACE_XFS_SCRUB_TRACE_H */ From patchwork Fri Dec 13 01:08:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906224 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B6EBC2629D for ; Fri, 13 Dec 2024 01:08:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052121; cv=none; b=eTrfrxuGrDOCZBxVSe6FRfncDZ+5GGEjlup8YKHrSo3FF58sQA8MXGHVyv8O/2WZOOwLLBAVe+4ugx1UTgiHYwRIYHpugKAuvY7+V/oqe02jzYW4Nc1+wxAfHmZP8urdPtORJubDRHozaMXwZ8MJ+3dxDfkY7cfclUqI+I0oM7I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052121; c=relaxed/simple; bh=0XR9nzjOOnXEdk0J6r9drqwcJPLcxFbSP6mfMnCugWU=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=RCQoc9a2jSfwIPFyc/JteXdVHHy/TTz7WrRdHYXycDCR6oShFRdZ+3hfSET0fdxGmEHvb0yuvUgsYgABc/RDWLhHW52Qe1nSGYV9BZyUo7b0X3kyYgFySomXsxzypEFuSgg2Pbk+31CJS+csFPPPQn5BL7BC1lcbbf8JPE4wi4Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=vRTEq2Mf; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="vRTEq2Mf" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 930B7C4CECE; Fri, 13 Dec 2024 01:08:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734052121; bh=0XR9nzjOOnXEdk0J6r9drqwcJPLcxFbSP6mfMnCugWU=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=vRTEq2MfUmuneSvspbdDgWSdxybJ8RRngtjM0iJZ8M+U7uVWvjS9NQmAdokUGyHp9 UODB8l/0VFrmvdfILgyd7n20fyGT75cpFucaOc2zVcBYq1y8thwc2rlxFNT2HPoJCS Ej3+b+s30kopQovdNCRGBCcPd4a5I33B2iJvIh0poFgiIgcbKwhZhyYpkuJ9lnTHI6 
JludF5xy0xtiJNnUGDVtRQ1/JCtKSujaNcTF//9EPGjIMqmbdo5XdMfLy2ixZ5vGfu 7mf2mX0uGiXmF4UophUIido5h+B21CbMNQLk8BLm/hpB99rSlKkle+UgUsw42F3Ius jkEiyYUys5MNA== Date: Thu, 12 Dec 2024 17:08:41 -0800 Subject: [PATCH 31/37] xfs: support repairing metadata btrees rooted in metadir inodes From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123847.1181370.11971021393841190421.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Adapt the repair code so that we can stage a new btree in the data fork area of a metadir inode and reap the old blocks. We already have nearly all of the infrastructure; the only parts that were missing were the metadata inode reservation handling. Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/scrub/newbt.c | 42 ++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/newbt.h | 1 + fs/xfs/scrub/reap.c | 41 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/reap.h | 2 ++ 4 files changed, 86 insertions(+) diff --git a/fs/xfs/scrub/newbt.c b/fs/xfs/scrub/newbt.c index 70af27d987342f..ac38f584309029 100644 --- a/fs/xfs/scrub/newbt.c +++ b/fs/xfs/scrub/newbt.c @@ -19,6 +19,8 @@ #include "xfs_rmap.h" #include "xfs_ag.h" #include "xfs_defer.h" +#include "xfs_metafile.h" +#include "xfs_quota.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -120,6 +122,43 @@ xrep_newbt_init_inode( return 0; } +/* + * Initialize accounting resources for staging a new metadata inode btree. + * If the metadata file has a space reservation, the caller must adjust that + * reservation when committing the new ondisk btree. 
+ */ +int +xrep_newbt_init_metadir_inode( + struct xrep_newbt *xnr, + struct xfs_scrub *sc) +{ + struct xfs_owner_info oinfo; + struct xfs_ifork *ifp; + + ASSERT(xfs_is_metadir_inode(sc->ip)); + + xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK); + + ifp = kmem_cache_zalloc(xfs_ifork_cache, XCHK_GFP_FLAGS); + if (!ifp) + return -ENOMEM; + + /* + * Allocate new metadir btree blocks with XFS_AG_RESV_NONE because the + * inode metadata space reservations can only account allocated space + * to the i_nblocks. We do not want to change the inode core fields + * until we're ready to commit the new tree, so we allocate the blocks + * as if they were regular file blocks. This exposes us to a higher + * risk of the repair being cancelled due to ENOSPC. + */ + xrep_newbt_init_ag(xnr, sc, &oinfo, + XFS_INO_TO_FSB(sc->mp, sc->ip->i_ino), + XFS_AG_RESV_NONE); + xnr->ifake.if_fork = ifp; + xnr->ifake.if_fork_size = xfs_inode_fork_size(sc->ip, XFS_DATA_FORK); + return 0; +} + /* * Initialize accounting resources for staging a new btree. Callers are * expected to add their own reservations (and clean them up) manually. 
@@ -224,6 +263,7 @@ xrep_newbt_alloc_ag_blocks( int error = 0; ASSERT(sc->sa.pag != NULL); + ASSERT(xnr->resv != XFS_AG_RESV_METAFILE); while (nr_blocks > 0) { struct xfs_alloc_arg args = { @@ -297,6 +337,8 @@ xrep_newbt_alloc_file_blocks( struct xfs_mount *mp = sc->mp; int error = 0; + ASSERT(xnr->resv != XFS_AG_RESV_METAFILE); + while (nr_blocks > 0) { struct xfs_alloc_arg args = { .tp = sc->tp, diff --git a/fs/xfs/scrub/newbt.h b/fs/xfs/scrub/newbt.h index 3d804d31af24a8..5ce785599287be 100644 --- a/fs/xfs/scrub/newbt.h +++ b/fs/xfs/scrub/newbt.h @@ -63,6 +63,7 @@ void xrep_newbt_init_ag(struct xrep_newbt *xnr, struct xfs_scrub *sc, enum xfs_ag_resv_type resv); int xrep_newbt_init_inode(struct xrep_newbt *xnr, struct xfs_scrub *sc, int whichfork, const struct xfs_owner_info *oinfo); +int xrep_newbt_init_metadir_inode(struct xrep_newbt *xnr, struct xfs_scrub *sc); int xrep_newbt_alloc_blocks(struct xrep_newbt *xnr, uint64_t nr_blocks); int xrep_newbt_add_extent(struct xrep_newbt *xnr, struct xfs_perag *pag, xfs_agblock_t agbno, xfs_extlen_t len); diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c index 08230952053b7d..4d7f1b82dc559d 100644 --- a/fs/xfs/scrub/reap.c +++ b/fs/xfs/scrub/reap.c @@ -33,6 +33,7 @@ #include "xfs_attr.h" #include "xfs_attr_remote.h" #include "xfs_defer.h" +#include "xfs_metafile.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -390,6 +391,8 @@ xreap_agextent_iter( xfs_fsblock_t fsbno; int error = 0; + ASSERT(rs->resv != XFS_AG_RESV_METAFILE); + fsbno = xfs_agbno_to_fsb(sc->sa.pag, agbno); /* @@ -675,6 +678,44 @@ xrep_reap_fsblocks( return 0; } +/* + * Dispose of every block of an old metadata btree that used to be rooted in a + * metadata directory file. 
+ */ +int +xrep_reap_metadir_fsblocks( + struct xfs_scrub *sc, + struct xfsb_bitmap *bitmap) +{ + /* + * Reap old metadir btree blocks with XFS_AG_RESV_NONE because the old + * blocks are no longer mapped by the inode, and inode metadata space + * reservations can only account freed space to the i_nblocks. + */ + struct xfs_owner_info oinfo; + struct xreap_state rs = { + .sc = sc, + .oinfo = &oinfo, + .resv = XFS_AG_RESV_NONE, + }; + int error; + + ASSERT(xfs_has_rmapbt(sc->mp)); + ASSERT(sc->ip != NULL); + ASSERT(xfs_is_metadir_inode(sc->ip)); + + xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK); + + error = xfsb_bitmap_walk(bitmap, xreap_fsmeta_extent, &rs); + if (error) + return error; + + if (xreap_dirty(&rs)) + return xrep_defer_finish(sc); + + return 0; +} + /* * Metadata files are not supposed to share blocks with anything else. * If blocks are shared, we remove the reverse mapping (thus reducing the diff --git a/fs/xfs/scrub/reap.h b/fs/xfs/scrub/reap.h index 3f2f1775e29db4..70e5e6bbb8d38d 100644 --- a/fs/xfs/scrub/reap.h +++ b/fs/xfs/scrub/reap.h @@ -14,6 +14,8 @@ int xrep_reap_agblocks(struct xfs_scrub *sc, struct xagb_bitmap *bitmap, int xrep_reap_fsblocks(struct xfs_scrub *sc, struct xfsb_bitmap *bitmap, const struct xfs_owner_info *oinfo); int xrep_reap_ifork(struct xfs_scrub *sc, struct xfs_inode *ip, int whichfork); +int xrep_reap_metadir_fsblocks(struct xfs_scrub *sc, + struct xfsb_bitmap *bitmap); /* Buffer cache scan context. */ struct xrep_bufscan { From patchwork Fri Dec 13 01:08:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906225 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 62E6929415 for ; Fri, 13 Dec 2024 01:08:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052137; cv=none; b=ExOJEgI3slV5VE+IA5H7YUhz1qiqED58Aym8HDflgI4+pnk3MwyqdbdNvdc4RX/bESR+Cju6KOh5PjhWftbM0lY8foHvFMZfUk6Fve6zj09DxoC0kbd4jZDwB0lIB1xP+wKm9fk4KnRqzxN7UUT1tBk8qLH47d/bX1EDpF3neEQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052137; c=relaxed/simple; bh=BsK4+nGF3LiBb8ELpezgHfe7a+k+OCVrVCcwOLvJNxw=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=JDfk/XlRsFC8RyWDPoN93FpM2RxSAgp4JvXC5ODvzZ7F9AFAyx43EydZB5N+S9TwmSxq8ZVACBG+XKcUMSu5lVanBPPiBwnk9L1on76UGXuLn+gJl0F7BpkzoKj8FfOTncaz+iKdRR+7co/7aYlgg73Z/Z8RX9iheeW+MEys8Ng= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=tg0sdidN; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="tg0sdidN" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 320E5C4CECE; Fri, 13 Dec 2024 01:08:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734052137; bh=BsK4+nGF3LiBb8ELpezgHfe7a+k+OCVrVCcwOLvJNxw=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=tg0sdidNUszkEIgPs9z0unI2xV3NyzRJmDt1wumgHq1W3wnGXs7xpYwlNluobGjvI /1JBzokva1yvl5wsq9fRifFu9ksj+HzsgAdJMJZsBPHLtzs+xWKg1XdJ/TKW9vLi1U V0oe955bq1BhaRXRoM0pGNXNd/chxOd9RBO9BbceAjkKtoNWtLLNB5l4lbnnW+PEl9 
QZJtnMx9bi6hd19p8Q4Lji4cT6eD+T+uMZ4ywGFz9IOAB1NOBaclj1o5Mo0XcBj6c8 PfIku8FNZiWSVhJF6wdEr7tBzaWqPYfv4NEuE3Zb8QZtk7xl/VCQRtjq9ADfnj7FZ4 rtgaLiqtu5xnw== Date: Thu, 12 Dec 2024 17:08:56 -0800 Subject: [PATCH 32/37] xfs: online repair of the realtime rmap btree From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123864.1181370.13663462267519047567.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Repair the realtime rmap btree while mounted. Signed-off-by: "Darrick J. Wong" --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_btree_staging.c | 1 fs/xfs/libxfs/xfs_rtrmap_btree.c | 2 fs/xfs/libxfs/xfs_rtrmap_btree.h | 3 fs/xfs/scrub/common.c | 7 fs/xfs/scrub/common.h | 2 fs/xfs/scrub/repair.c | 142 +++++++ fs/xfs/scrub/repair.h | 14 + fs/xfs/scrub/rtrmap.c | 7 fs/xfs/scrub/rtrmap_repair.c | 733 +++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.c | 2 fs/xfs/scrub/trace.h | 57 +++ 12 files changed, 967 insertions(+), 4 deletions(-) create mode 100644 fs/xfs/scrub/rtrmap_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 136a465e00d2b1..338e10f81b7b71 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -234,6 +234,7 @@ xfs-y += $(addprefix scrub/, \ xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \ rtbitmap_repair.o \ + rtrmap_repair.o \ rtsummary_repair.o \ ) diff --git a/fs/xfs/libxfs/xfs_btree_staging.c b/fs/xfs/libxfs/xfs_btree_staging.c index 58c146b5c9d479..5ed84f9cc877ef 100644 --- a/fs/xfs/libxfs/xfs_btree_staging.c +++ b/fs/xfs/libxfs/xfs_btree_staging.c @@ -134,6 +134,7 @@ xfs_btree_stage_ifakeroot( cur->bc_ino.ifake = ifake; cur->bc_nlevels = ifake->if_levels; cur->bc_ino.forksize = ifake->if_fork_size; + cur->bc_ino.whichfork = 
XFS_STAGING_FORK; cur->bc_flags |= XFS_BTREE_STAGING; } diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 0a78dee01b1b2e..571a9e1b956099 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -660,7 +660,7 @@ xfs_rtrmapbt_compute_maxlevels( } /* Calculate the rtrmap btree size for some records. */ -static unsigned long long +unsigned long long xfs_rtrmapbt_calc_size( struct xfs_mount *mp, unsigned long long len) diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index db313492b17eed..6e3dab8c44f7c2 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -198,4 +198,7 @@ int xfs_rtrmapbt_create(struct xfs_rtgroup *rtg, struct xfs_inode *ip, int xfs_rtrmapbt_init_rtsb(struct xfs_mount *mp, struct xfs_rtgroup *rtg, struct xfs_trans *tp); +unsigned long long xfs_rtrmapbt_calc_size(struct xfs_mount *mp, + unsigned long long len); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index ca43dd4f52b2d6..ab2509ef3bc10c 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -819,7 +819,7 @@ xchk_rtgroup_btcur_free( * Unlock the realtime group. This must be done /after/ committing (or * cancelling) the scrub transaction. */ -static void +void xchk_rtgroup_unlock( struct xchk_rt *sr) { @@ -904,7 +904,10 @@ int xchk_setup_rt( struct xfs_scrub *sc) { - return xchk_trans_alloc(sc, 0); + uint resblks; + + resblks = xrep_calc_rtgroup_resblks(sc); + return xchk_trans_alloc(sc, resblks); } /* Set us up with AG headers and btree cursors. 
*/ diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index e5891609af2740..50ac6cca18fe45 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -147,12 +147,14 @@ xchk_rtgroup_init_existing( int xchk_rtgroup_lock(struct xfs_scrub *sc, struct xchk_rt *sr, unsigned int rtglock_flags); +void xchk_rtgroup_unlock(struct xchk_rt *sr); void xchk_rtgroup_btcur_free(struct xchk_rt *sr); void xchk_rtgroup_free(struct xfs_scrub *sc, struct xchk_rt *sr); #else # define xchk_rtgroup_init(sc, rgno, sr) (-EFSCORRUPTED) # define xchk_rtgroup_init_existing(sc, rgno, sr) (-EFSCORRUPTED) # define xchk_rtgroup_lock(sc, sr, lockflags) (-EFSCORRUPTED) +# define xchk_rtgroup_unlock(sr) do { } while (0) # define xchk_rtgroup_btcur_free(sr) do { } while (0) # define xchk_rtgroup_free(sc, sr) do { } while (0) #endif /* CONFIG_XFS_RT */ diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 82fe01d78cb08d..d58843017391be 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -40,6 +40,8 @@ #include "xfs_rtrmap_btree.h" #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" +#include "xfs_rtalloc.h" +#include "xfs_metafile.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -382,6 +384,39 @@ xrep_calc_ag_resblks( return max(max(bnobt_sz, inobt_sz), max(rmapbt_sz, refcbt_sz)); } +#ifdef CONFIG_XFS_RT +/* + * Figure out how many blocks to reserve for a rtgroup repair. We calculate + * the worst case estimate for the number of blocks we'd need to rebuild one of + * any type of per-rtgroup btree. 
+ */ +xfs_extlen_t +xrep_calc_rtgroup_resblks( + struct xfs_scrub *sc) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_scrub_metadata *sm = sc->sm; + struct xfs_rtgroup *rtg; + xfs_extlen_t usedlen; + xfs_extlen_t rmapbt_sz = 0; + + if (!(sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)) + return 0; + + rtg = xfs_rtgroup_get(mp, sm->sm_agno); + usedlen = rtg->rtg_extents * mp->m_sb.sb_rextsize; + xfs_rtgroup_put(rtg); + + if (xfs_has_rmapbt(mp)) + rmapbt_sz = xfs_rtrmapbt_calc_size(mp, usedlen); + + trace_xrep_calc_rtgroup_resblks_btsize(mp, sm->sm_agno, usedlen, + rmapbt_sz); + + return rmapbt_sz; +} +#endif /* CONFIG_XFS_RT */ + /* * Reconstructing per-AG Btrees * @@ -1284,3 +1319,110 @@ xrep_buf_verify_struct( return fa == NULL; } + +/* Check the sanity of a rmap record for a metadata btree inode. */ +int +xrep_check_ino_btree_mapping( + struct xfs_scrub *sc, + const struct xfs_rmap_irec *rec) +{ + enum xbtree_recpacking outcome; + int error; + + /* + * Metadata btree inodes never have extended attributes, and all blocks + * should have the bmbt block flag set. + */ + if ((rec->rm_flags & XFS_RMAP_ATTR_FORK) || + !(rec->rm_flags & XFS_RMAP_BMBT_BLOCK)) + return -EFSCORRUPTED; + + /* Make sure the block is within the AG. */ + if (!xfs_verify_agbext(sc->sa.pag, rec->rm_startblock, + rec->rm_blockcount)) + return -EFSCORRUPTED; + + /* Make sure this isn't free space. */ + error = xfs_alloc_has_records(sc->sa.bno_cur, rec->rm_startblock, + rec->rm_blockcount, &outcome); + if (error) + return error; + if (outcome != XBTREE_RECPACKING_EMPTY) + return -EFSCORRUPTED; + + return 0; +} + +/* + * Reset the block count of the inode being repaired, and adjust the dquot + * block usage to match. The inode must not have an xattr fork. 
+ */ +void +xrep_inode_set_nblocks( + struct xfs_scrub *sc, + int64_t new_blocks) +{ + int64_t delta; + + delta = new_blocks - sc->ip->i_nblocks; + sc->ip->i_nblocks = new_blocks; + + xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE); + if (delta != 0) + xfs_trans_mod_dquot_byino(sc->tp, sc->ip, XFS_TRANS_DQ_BCOUNT, + delta); +} + +/* Reset the block reservation for a metadata inode. */ +int +xrep_reset_metafile_resv( + struct xfs_scrub *sc) +{ + struct xfs_inode *ip = sc->ip; + int64_t delta; + int error; + + delta = ip->i_nblocks + ip->i_delayed_blks - ip->i_meta_resv_asked; + if (delta == 0) + return 0; + + /* + * Too many blocks have been reserved, transfer some from the incore + * reservation back to the filesystem. + */ + if (delta > 0) { + int64_t give_back; + + give_back = min_t(uint64_t, delta, ip->i_delayed_blks); + if (give_back > 0) { + xfs_mod_delalloc(ip, 0, -give_back); + xfs_add_fdblocks(ip->i_mount, give_back); + ip->i_delayed_blks -= give_back; + } + + return 0; + } + + /* + * Not enough reservation; try to take some blocks from the filesystem + * to the metadata inode. @delta is negative here, so invert the sign. 
+ */ + delta = -delta; + error = xfs_dec_fdblocks(sc->mp, delta, true); + while (error == -ENOSPC) { + delta--; + if (delta == 0) { + xfs_warn(sc->mp, +"Insufficient free space to reset space reservation for inode 0x%llx after repair.", + ip->i_ino); + return 0; + } + error = xfs_dec_fdblocks(sc->mp, delta, true); + } + if (error) + return error; + + xfs_mod_delalloc(ip, 0, delta); + ip->i_delayed_blks += delta; + return 0; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 7f493752ea78e6..ac5962732d269d 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -97,6 +97,7 @@ int xrep_setup_parent(struct xfs_scrub *sc); int xrep_setup_nlinks(struct xfs_scrub *sc); int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *resblks); int xrep_setup_dirtree(struct xfs_scrub *sc); +int xrep_setup_rtrmapbt(struct xfs_scrub *sc); /* Repair setup functions */ int xrep_setup_ag_allocbt(struct xfs_scrub *sc); @@ -113,10 +114,15 @@ int xrep_rtgroup_init(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, void xrep_rtgroup_btcur_init(struct xfs_scrub *sc, struct xchk_rt *sr); int xrep_require_rtext_inuse(struct xfs_scrub *sc, xfs_rgblock_t rgbno, xfs_filblks_t len); +xfs_extlen_t xrep_calc_rtgroup_resblks(struct xfs_scrub *sc); #else # define xrep_rtgroup_init(sc, rtg, sr, lockflags) (-ENOSYS) +# define xrep_calc_rtgroup_resblks(sc) (0) #endif /* CONFIG_XFS_RT */ +int xrep_check_ino_btree_mapping(struct xfs_scrub *sc, + const struct xfs_rmap_irec *rec); + /* Metadata revalidators */ int xrep_revalidate_allocbt(struct xfs_scrub *sc); @@ -150,10 +156,12 @@ int xrep_metapath(struct xfs_scrub *sc); int xrep_rtbitmap(struct xfs_scrub *sc); int xrep_rtsummary(struct xfs_scrub *sc); int xrep_rgsuperblock(struct xfs_scrub *sc); +int xrep_rtrmapbt(struct xfs_scrub *sc); #else # define xrep_rtbitmap xrep_notsupported # define xrep_rtsummary xrep_notsupported # define xrep_rgsuperblock xrep_notsupported +# define xrep_rtrmapbt xrep_notsupported #endif /* 
CONFIG_XFS_RT */ #ifdef CONFIG_XFS_QUOTA @@ -172,6 +180,8 @@ int xrep_trans_alloc_hook_dummy(struct xfs_mount *mp, void **cookiep, void xrep_trans_cancel_hook_dummy(void **cookiep, struct xfs_trans *tp); bool xrep_buf_verify_struct(struct xfs_buf *bp, const struct xfs_buf_ops *ops); +void xrep_inode_set_nblocks(struct xfs_scrub *sc, int64_t new_blocks); +int xrep_reset_metafile_resv(struct xfs_scrub *sc); #else @@ -195,6 +205,8 @@ xrep_calc_ag_resblks( return 0; } +#define xrep_calc_rtgroup_resblks xrep_calc_ag_resblks + static inline int xrep_reset_perag_resv( struct xfs_scrub *sc) @@ -222,6 +234,7 @@ xrep_setup_nothing( #define xrep_setup_nlinks xrep_setup_nothing #define xrep_setup_dirtree xrep_setup_nothing #define xrep_setup_metapath xrep_setup_nothing +#define xrep_setup_rtrmapbt xrep_setup_nothing #define xrep_setup_inode(sc, imap) ((void)0) @@ -259,6 +272,7 @@ static inline int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *x) #define xrep_dirtree xrep_notsupported #define xrep_metapath xrep_notsupported #define xrep_rgsuperblock xrep_notsupported +#define xrep_rtrmapbt xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index 764fa296792234..300a1e85b3d625 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -27,6 +27,7 @@ #include "scrub/common.h" #include "scrub/btree.h" #include "scrub/trace.h" +#include "scrub/repair.h" /* Set us up with the realtime metadata locked. 
*/ int @@ -38,6 +39,12 @@ xchk_setup_rtrmapbt( if (xchk_need_intent_drain(sc)) xchk_fsgates_enable(sc, XCHK_FSGATES_DRAIN); + if (xchk_could_repair(sc)) { + error = xrep_setup_rtrmapbt(sc); + if (error) + return error; + } + error = xchk_rtgroup_init(sc, sc->sm->sm_agno, &sc->sr); if (error) return error; diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c new file mode 100644 index 00000000000000..60e317725dea86 --- /dev/null +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -0,0 +1,733 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2020-2024 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_alloc.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_inode.h" +#include "xfs_icache.h" +#include "xfs_bmap.h" +#include "xfs_bmap_btree.h" +#include "xfs_quota.h" +#include "xfs_rtalloc.h" +#include "xfs_ag.h" +#include "xfs_rtgroup.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" +#include "scrub/fsb_bitmap.h" +#include "scrub/xfile.h" +#include "scrub/xfarray.h" +#include "scrub/iscan.h" +#include "scrub/newbt.h" +#include "scrub/reap.h" + +/* + * Realtime Reverse Mapping Btree Repair + * ===================================== + * + * This isn't quite as difficult as repairing the rmap btree on the data + * device, since we only store the data fork extents of realtime files on the + * realtime device. 
We still have to freeze the filesystem and stop the + * background threads like we do for the rmap repair, but we only have to scan + * realtime inodes. + * + * Collecting entries for the new realtime rmap btree is easy -- all we have + * to do is generate rtrmap entries from the data fork mappings of all realtime + * files in the filesystem. We then scan the rmap btrees of the data device + * looking for extents belonging to the old btree and note them in a bitmap. + * + * To rebuild the realtime rmap btree, we bulk-load the collected mappings into + * a new btree cursor and atomically swap that into the realtime inode. Then + * we can free the blocks from the old btree. + * + * We use the 'xrep_rtrmap' prefix for all the rmap functions. + */ + +/* + * Packed rmap record. The UNWRITTEN flags are hidden in the upper bits of + * offset, just like the on-disk record. + */ +struct xrep_rtrmap_extent { + xfs_rgblock_t startblock; + xfs_extlen_t blockcount; + uint64_t owner; + uint64_t offset; +} __packed; + +/* Context for collecting rmaps */ +struct xrep_rtrmap { + /* new rtrmapbt information */ + struct xrep_newbt new_btree; + + /* rmap records generated from primary metadata */ + struct xfarray *rtrmap_records; + + struct xfs_scrub *sc; + + /* bitmap of old rtrmapbt blocks */ + struct xfsb_bitmap old_rtrmapbt_blocks; + + /* inode scan cursor */ + struct xchk_iscan iscan; + + /* get_records()'s position in the rtrmap record array. */ + xfarray_idx_t array_cur; +}; + +/* Set us up to repair rt reverse mapping btrees. */ +int +xrep_setup_rtrmapbt( + struct xfs_scrub *sc) +{ + struct xrep_rtrmap *rr; + + rr = kzalloc(sizeof(struct xrep_rtrmap), XCHK_GFP_FLAGS); + if (!rr) + return -ENOMEM; + + rr->sc = sc; + sc->buf = rr; + return 0; +} + +/* Make sure there's nothing funny about this mapping.
*/ +STATIC int +xrep_rtrmap_check_mapping( + struct xfs_scrub *sc, + const struct xfs_rmap_irec *rec) +{ + if (xfs_rtrmap_check_irec(sc->sr.rtg, rec) != NULL) + return -EFSCORRUPTED; + + /* Make sure this isn't free space. */ + return xrep_require_rtext_inuse(sc, rec->rm_startblock, + rec->rm_blockcount); +} + +/* Store a reverse-mapping record. */ +static inline int +xrep_rtrmap_stash( + struct xrep_rtrmap *rr, + xfs_rgblock_t startblock, + xfs_extlen_t blockcount, + uint64_t owner, + uint64_t offset, + unsigned int flags) +{ + struct xrep_rtrmap_extent rre = { + .startblock = startblock, + .blockcount = blockcount, + .owner = owner, + }; + struct xfs_rmap_irec rmap = { + .rm_startblock = startblock, + .rm_blockcount = blockcount, + .rm_owner = owner, + .rm_offset = offset, + .rm_flags = flags, + }; + struct xfs_scrub *sc = rr->sc; + int error = 0; + + if (xchk_should_terminate(sc, &error)) + return error; + + trace_xrep_rtrmap_found(sc->mp, &rmap); + + rre.offset = xfs_rmap_irec_offset_pack(&rmap); + return xfarray_append(rr->rtrmap_records, &rre); +} + +/* Finding all file and bmbt extents. */ + +/* Context for accumulating rmaps for an inode fork. */ +struct xrep_rtrmap_ifork { + /* + * Accumulate rmap data here to turn multiple adjacent bmaps into a + * single rmap. + */ + struct xfs_rmap_irec accum; + + struct xrep_rtrmap *rr; +}; + +/* Stash an rmap that we accumulated while walking an inode fork. */ +STATIC int +xrep_rtrmap_stash_accumulated( + struct xrep_rtrmap_ifork *rf) +{ + if (rf->accum.rm_blockcount == 0) + return 0; + + return xrep_rtrmap_stash(rf->rr, rf->accum.rm_startblock, + rf->accum.rm_blockcount, rf->accum.rm_owner, + rf->accum.rm_offset, rf->accum.rm_flags); +} + +/* Accumulate a bmbt record. 
*/ +STATIC int +xrep_rtrmap_visit_bmbt( + struct xfs_btree_cur *cur, + struct xfs_bmbt_irec *rec, + void *priv) +{ + struct xrep_rtrmap_ifork *rf = priv; + struct xfs_rmap_irec *accum = &rf->accum; + struct xfs_mount *mp = rf->rr->sc->mp; + xfs_rgblock_t rgbno; + unsigned int rmap_flags = 0; + int error; + + if (xfs_rtb_to_rgno(mp, rec->br_startblock) != + rtg_rgno(rf->rr->sc->sr.rtg)) + return 0; + + if (rec->br_state == XFS_EXT_UNWRITTEN) + rmap_flags |= XFS_RMAP_UNWRITTEN; + + /* If this bmap is adjacent to the previous one, just add it. */ + rgbno = xfs_rtb_to_rgbno(mp, rec->br_startblock); + if (accum->rm_blockcount > 0 && + rec->br_startoff == accum->rm_offset + accum->rm_blockcount && + rgbno == accum->rm_startblock + accum->rm_blockcount && + rmap_flags == accum->rm_flags) { + accum->rm_blockcount += rec->br_blockcount; + return 0; + } + + /* Otherwise stash the old rmap and start accumulating a new one. */ + error = xrep_rtrmap_stash_accumulated(rf); + if (error) + return error; + + accum->rm_startblock = rgbno; + accum->rm_blockcount = rec->br_blockcount; + accum->rm_offset = rec->br_startoff; + accum->rm_flags = rmap_flags; + return 0; +} + +/* + * Iterate the block mapping btree to collect rmap records for anything in this + * fork that maps to the rt volume. Sets @mappings_done to true if we've + * scanned the block mappings in this fork. + */ +STATIC int +xrep_rtrmap_scan_bmbt( + struct xrep_rtrmap_ifork *rf, + struct xfs_inode *ip, + bool *mappings_done) +{ + struct xrep_rtrmap *rr = rf->rr; + struct xfs_btree_cur *cur; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + int error = 0; + + *mappings_done = false; + + /* + * If the incore extent cache is already loaded, we'll just use the + * incore extent scanner to record mappings. Don't bother walking the + * ondisk extent tree. + */ + if (!xfs_need_iread_extents(ifp)) + return 0; + + /* Accumulate all the mappings in the bmap btree. 
*/ + cur = xfs_bmbt_init_cursor(rr->sc->mp, rr->sc->tp, ip, XFS_DATA_FORK); + error = xfs_bmap_query_all(cur, xrep_rtrmap_visit_bmbt, rf); + xfs_btree_del_cursor(cur, error); + if (error) + return error; + + /* Stash any remaining accumulated rmaps and exit. */ + *mappings_done = true; + return xrep_rtrmap_stash_accumulated(rf); +} + +/* + * Iterate the in-core extent cache to collect rmap records for anything in + * this fork that maps to the realtime group. + */ +STATIC int +xrep_rtrmap_scan_iext( + struct xrep_rtrmap_ifork *rf, + struct xfs_ifork *ifp) +{ + struct xfs_bmbt_irec rec; + struct xfs_iext_cursor icur; + int error; + + for_each_xfs_iext(ifp, &icur, &rec) { + if (isnullstartblock(rec.br_startblock)) + continue; + error = xrep_rtrmap_visit_bmbt(NULL, &rec, rf); + if (error) + return error; + } + + return xrep_rtrmap_stash_accumulated(rf); +} + +/* Find all the extents on the realtime device mapped by an inode fork. */ +STATIC int +xrep_rtrmap_scan_dfork( + struct xrep_rtrmap *rr, + struct xfs_inode *ip) +{ + struct xrep_rtrmap_ifork rf = { + .accum = { .rm_owner = ip->i_ino, }, + .rr = rr, + }; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + int error = 0; + + if (ifp->if_format == XFS_DINODE_FMT_BTREE) { + bool mappings_done; + + /* + * Scan the bmbt for mappings. If the incore extent tree is + * loaded, we want to scan the cached mappings since that's + * faster when the extent counts are very high. + */ + error = xrep_rtrmap_scan_bmbt(&rf, ip, &mappings_done); + if (error || mappings_done) + return error; + } else if (ifp->if_format != XFS_DINODE_FMT_EXTENTS) { + /* realtime data forks should only be extents or btree */ + return -EFSCORRUPTED; + } + + /* Scan incore extent cache. */ + return xrep_rtrmap_scan_iext(&rf, ifp); +} + +/* Record reverse mappings for a file. */ +STATIC int +xrep_rtrmap_scan_inode( + struct xrep_rtrmap *rr, + struct xfs_inode *ip) +{ + unsigned int lock_mode; + int error = 0; + + /* Skip the rt rmap btree inode.
*/ + if (rr->sc->ip == ip) + return 0; + + lock_mode = xfs_ilock_data_map_shared(ip); + + /* Check the data fork if it's on the realtime device. */ + if (XFS_IS_REALTIME_INODE(ip)) { + error = xrep_rtrmap_scan_dfork(rr, ip); + if (error) + goto out_unlock; + } + + xchk_iscan_mark_visited(&rr->iscan, ip); +out_unlock: + xfs_iunlock(ip, lock_mode); + return error; +} + +/* Record extents that belong to the realtime rmap inode. */ +STATIC int +xrep_rtrmap_walk_rmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_rtrmap *rr = priv; + int error = 0; + + if (xchk_should_terminate(rr->sc, &error)) + return error; + + /* Skip extents which are not owned by this inode and fork. */ + if (rec->rm_owner != rr->sc->ip->i_ino) + return 0; + + error = xrep_check_ino_btree_mapping(rr->sc, rec); + if (error) + return error; + + return xfsb_bitmap_set(&rr->old_rtrmapbt_blocks, + xfs_gbno_to_fsb(cur->bc_group, rec->rm_startblock), + rec->rm_blockcount); +} + +/* Scan one AG for reverse mappings for the realtime rmap btree. */ +STATIC int +xrep_rtrmap_scan_ag( + struct xrep_rtrmap *rr, + struct xfs_perag *pag) +{ + struct xfs_scrub *sc = rr->sc; + int error; + + error = xrep_ag_init(sc, pag, &sc->sa); + if (error) + return error; + + error = xfs_rmap_query_all(sc->sa.rmap_cur, xrep_rtrmap_walk_rmap, rr); + xchk_ag_free(sc, &sc->sa); + return error; +} + +/* Generate all the reverse-mappings for the realtime device. */ +STATIC int +xrep_rtrmap_find_rmaps( + struct xrep_rtrmap *rr) +{ + struct xfs_scrub *sc = rr->sc; + struct xfs_perag *pag = NULL; + struct xfs_inode *ip; + int error; + + /* Generate rmaps for the realtime superblock */ + if (xfs_has_rtsb(sc->mp) && rtg_rgno(rr->sc->sr.rtg) == 0) { + error = xrep_rtrmap_stash(rr, 0, sc->mp->m_sb.sb_rextsize, + XFS_RMAP_OWN_FS, 0, 0); + if (error) + return error; + } + + /* + * Set up for a potentially lengthy filesystem scan by reducing our + * transaction resource usage for the duration. 
Specifically: + * + * Unlock the realtime metadata inodes and cancel the transaction to + * release the log grant space while we scan the filesystem. + * + * Create a new empty transaction to eliminate the possibility of the + * inode scan deadlocking on cyclical metadata. + * + * We pass the empty transaction to the file scanning function to avoid + * repeatedly cycling empty transactions. This can be done even though + * we take the IOLOCK to quiesce the file because empty transactions + * do not take sb_internal. + */ + xchk_trans_cancel(sc); + xchk_rtgroup_unlock(&sc->sr); + error = xchk_trans_alloc_empty(sc); + if (error) + return error; + + while ((error = xchk_iscan_iter(&rr->iscan, &ip)) == 1) { + error = xrep_rtrmap_scan_inode(rr, ip); + xchk_irele(sc, ip); + if (error) + break; + + if (xchk_should_terminate(sc, &error)) + break; + } + xchk_iscan_iter_finish(&rr->iscan); + if (error) + return error; + + /* + * Switch out for a real transaction and lock the RT metadata in + * preparation for building a new tree. + */ + xchk_trans_cancel(sc); + error = xchk_setup_rt(sc); + if (error) + return error; + error = xchk_rtgroup_lock(sc, &sc->sr, XCHK_RTGLOCK_ALL); + if (error) + return error; + + /* Scan for old rtrmap blocks. */ + while ((pag = xfs_perag_next(sc->mp, pag))) { + error = xrep_rtrmap_scan_ag(rr, pag); + if (error) { + xfs_perag_rele(pag); + return error; + } + } + + return 0; +} + +/* Building the new rtrmap btree. */ + +/* Retrieve rtrmapbt data for bulk load. 
*/ +STATIC int +xrep_rtrmap_get_records( + struct xfs_btree_cur *cur, + unsigned int idx, + struct xfs_btree_block *block, + unsigned int nr_wanted, + void *priv) +{ + struct xrep_rtrmap_extent rec; + struct xfs_rmap_irec *irec = &cur->bc_rec.r; + struct xrep_rtrmap *rr = priv; + union xfs_btree_rec *block_rec; + unsigned int loaded; + int error; + + for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { + error = xfarray_load_next(rr->rtrmap_records, &rr->array_cur, + &rec); + if (error) + return error; + + irec->rm_startblock = rec.startblock; + irec->rm_blockcount = rec.blockcount; + irec->rm_owner = rec.owner; + + if (xfs_rmap_irec_offset_unpack(rec.offset, irec) != NULL) + return -EFSCORRUPTED; + + error = xrep_rtrmap_check_mapping(rr->sc, irec); + if (error) + return error; + + block_rec = xfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + } + + return loaded; +} + +/* Feed one of the new btree blocks to the bulk loader. */ +STATIC int +xrep_rtrmap_claim_block( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr, + void *priv) +{ + struct xrep_rtrmap *rr = priv; + + return xrep_newbt_claim_block(cur, &rr->new_btree, ptr); +} + +/* Figure out how much space we need to create the incore btree root block. */ +STATIC size_t +xrep_rtrmap_iroot_size( + struct xfs_btree_cur *cur, + unsigned int level, + unsigned int nr_this_level, + void *priv) +{ + return xfs_rtrmap_broot_space_calc(cur->bc_mp, level, nr_this_level); +} + +/* + * Use the collected rmap information to stage a new rmap btree. If this is + * successful we'll return with the new btree root information logged to the + * repair transaction but not yet committed. This implements section (III) + * above. 
+ */ +STATIC int +xrep_rtrmap_build_new_tree( + struct xrep_rtrmap *rr) +{ + struct xfs_scrub *sc = rr->sc; + struct xfs_rtgroup *rtg = sc->sr.rtg; + struct xfs_btree_cur *rmap_cur; + uint64_t nr_records; + int error; + + /* + * Prepare to construct the new btree by reserving disk space for the + * new btree and setting up all the accounting information we'll need + * to root the new btree while it's under construction and before we + * attach it to the realtime rmapbt inode. + */ + error = xrep_newbt_init_metadir_inode(&rr->new_btree, sc); + if (error) + return error; + + rr->new_btree.bload.get_records = xrep_rtrmap_get_records; + rr->new_btree.bload.claim_block = xrep_rtrmap_claim_block; + rr->new_btree.bload.iroot_size = xrep_rtrmap_iroot_size; + + rmap_cur = xfs_rtrmapbt_init_cursor(NULL, rtg); + xfs_btree_stage_ifakeroot(rmap_cur, &rr->new_btree.ifake); + + nr_records = xfarray_length(rr->rtrmap_records); + + /* Compute how many blocks we'll need for the rmaps collected. */ + error = xfs_btree_bload_compute_geometry(rmap_cur, + &rr->new_btree.bload, nr_records); + if (error) + goto err_cur; + + /* Last chance to abort before we start committing fixes. */ + if (xchk_should_terminate(sc, &error)) + goto err_cur; + + /* + * Guess how many blocks we're going to need to rebuild an entire + * rtrmapbt from the number of extents we found, and pump up our + * transaction to have sufficient block reservation. We're allowed + * to exceed quota to repair inconsistent metadata, though this is + * unlikely. + */ + error = xfs_trans_reserve_more_inode(sc->tp, rtg_rmap(rtg), + rr->new_btree.bload.nr_blocks, 0, true); + if (error) + goto err_cur; + + /* Reserve the space we'll need for the new btree. */ + error = xrep_newbt_alloc_blocks(&rr->new_btree, + rr->new_btree.bload.nr_blocks); + if (error) + goto err_cur; + + /* Add all observed rmap records. 
*/ + rr->new_btree.ifake.if_fork->if_format = XFS_DINODE_FMT_META_BTREE; + rr->array_cur = XFARRAY_CURSOR_INIT; + error = xfs_btree_bload(rmap_cur, &rr->new_btree.bload, rr); + if (error) + goto err_cur; + + /* + * Install the new rtrmap btree in the inode. After this point the old + * btree is no longer accessible, the new tree is live, and we can + * delete the cursor. + */ + xfs_rtrmapbt_commit_staged_btree(rmap_cur, sc->tp); + xrep_inode_set_nblocks(rr->sc, rr->new_btree.ifake.if_blocks); + xfs_btree_del_cursor(rmap_cur, 0); + + /* Dispose of any unused blocks and the accounting information. */ + error = xrep_newbt_commit(&rr->new_btree); + if (error) + return error; + + return xrep_roll_trans(sc); + +err_cur: + xfs_btree_del_cursor(rmap_cur, error); + xrep_newbt_cancel(&rr->new_btree); + return error; +} + +/* Reaping the old btree. */ + +/* Reap the old rtrmapbt blocks. */ +STATIC int +xrep_rtrmap_remove_old_tree( + struct xrep_rtrmap *rr) +{ + int error; + + /* + * Free all the extents that were allocated to the former rtrmapbt and + * aren't cross-linked with something else. + */ + error = xrep_reap_metadir_fsblocks(rr->sc, &rr->old_rtrmapbt_blocks); + if (error) + return error; + + /* + * Ensure the proper reservation for the rtrmap inode so that we don't + * fail to expand the new btree. + */ + return xrep_reset_metafile_resv(rr->sc); +} + +/* Set up the filesystem scan components. */ +STATIC int +xrep_rtrmap_setup_scan( + struct xrep_rtrmap *rr) +{ + struct xfs_scrub *sc = rr->sc; + char *descr; + int error; + + xfsb_bitmap_init(&rr->old_rtrmapbt_blocks); + + /* Set up some storage */ + descr = xchk_xfile_rtgroup_descr(sc, "reverse mapping records"); + error = xfarray_create(descr, 0, sizeof(struct xrep_rtrmap_extent), + &rr->rtrmap_records); + kfree(descr); + if (error) + goto out_bitmap; + + /* Retry iget every tenth of a second for up to 30 seconds. 
*/ + xchk_iscan_start(sc, 30000, 100, &rr->iscan); + return 0; + +out_bitmap: + xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); + return error; +} + +/* Tear down scan components. */ +STATIC void +xrep_rtrmap_teardown( + struct xrep_rtrmap *rr) +{ + xchk_iscan_teardown(&rr->iscan); + xfarray_destroy(rr->rtrmap_records); + xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); +} + +/* Repair the realtime rmap btree. */ +int +xrep_rtrmapbt( + struct xfs_scrub *sc) +{ + struct xrep_rtrmap *rr = sc->buf; + int error; + + /* Functionality is not yet complete. */ + return xrep_notsupported(sc); + + /* Make sure any problems with the fork are fixed. */ + error = xrep_metadata_inode_forks(sc); + if (error) + return error; + + error = xrep_rtrmap_setup_scan(rr); + if (error) + return error; + + /* Collect rmaps for realtime files. */ + error = xrep_rtrmap_find_rmaps(rr); + if (error) + goto out_records; + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + /* Rebuild the rtrmap information. */ + error = xrep_rtrmap_build_new_tree(rr); + if (error) + goto out_records; + + /* Kill the old tree. 
*/ + error = xrep_rtrmap_remove_old_tree(rr); + if (error) + goto out_records; + +out_records: + xrep_rtrmap_teardown(rr); + return error; +} diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 09983899c34164..16da054b2eb0dc 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -465,7 +465,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .setup = xchk_setup_rtrmapbt, .scrub = xchk_rtrmapbt, .has = xfs_has_rtrmapbt, - .repair = xrep_notsupported, + .repair = xrep_rtrmapbt, }, }; diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 3b661e4443453c..3f2a8695ef5cb5 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -2285,6 +2285,32 @@ TRACE_EVENT(xrep_calc_ag_resblks_btsize, __entry->rmapbt_sz, __entry->refcbt_sz) ) + +#ifdef CONFIG_XFS_RT +TRACE_EVENT(xrep_calc_rtgroup_resblks_btsize, + TP_PROTO(struct xfs_mount *mp, xfs_rgnumber_t rgno, + xfs_rgblock_t usedlen, xfs_rgblock_t rmapbt_sz), + TP_ARGS(mp, rgno, usedlen, rmapbt_sz), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_rgnumber_t, rgno) + __field(xfs_rgblock_t, usedlen) + __field(xfs_rgblock_t, rmapbt_sz) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rgno = rgno; + __entry->usedlen = usedlen; + __entry->rmapbt_sz = rmapbt_sz; + ), + TP_printk("dev %d:%d rgno 0x%x usedlen %u rmapbt %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->rgno, + __entry->usedlen, + __entry->rmapbt_sz) +); +#endif /* CONFIG_XFS_RT */ + TRACE_EVENT(xrep_reset_counters, TP_PROTO(struct xfs_mount *mp, struct xchk_fscounters *fsc), TP_ARGS(mp, fsc), @@ -3755,6 +3781,37 @@ TRACE_EVENT(xrep_rtbitmap_load_word, (__entry->ondisk_word & __entry->word_mask), __entry->word_mask) ); + +TRACE_EVENT(xrep_rtrmap_found, + TP_PROTO(struct xfs_mount *mp, const struct xfs_rmap_irec *rec), + TP_ARGS(mp, rec), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgblock_t, rgbno) + __field(xfs_extlen_t, len) + __field(uint64_t, owner) + 
__field(uint64_t, offset) + __field(unsigned int, flags) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->rgbno = rec->rm_startblock; + __entry->len = rec->rm_blockcount; + __entry->owner = rec->rm_owner; + __entry->offset = rec->rm_offset; + __entry->flags = rec->rm_flags; + ), + TP_printk("dev %d:%d rtdev %d:%d rgbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgbno, + __entry->len, + __entry->owner, + __entry->offset, + __entry->flags) +); #endif /* CONFIG_XFS_RT */ #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */ From patchwork Fri Dec 13 01:09:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13906226 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 57CB72AD02 for ; Fri, 13 Dec 2024 01:09:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052153; cv=none; b=j7J5FRvekJL6hOg2NzrO4p0lrSN8kNhMp9LeKvIQpu3ARpBSStlzwIynasQkiIiU29uckEff7+8Drz9S20ND1UtvKd/66txjA65yC6T39Fj9Rd93LsZPAnloVxt5xmuuKvMYWcF0jW3QugEv9QzxrVNdBjKX69JHLGNUEn55z0s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052153; c=relaxed/simple; bh=gm5L/M+qs7wi97HDtC0rn0lZYYM0+GbWyW0A9IvAljI=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Ty2buMrRFAA7b8H4DDearyfox3yP0R+rmI7OKkn7xTsQTlFGTQEmmbNfXAeqrVCUNmeG5A4vIxd8gLPWYu/wJPyWOclcRcs1QeZzQhG/zIsTEBjoMIo8G0mmK5HeS8ick1XotelcQEFFCgSYIf7MDifiQaaR0q3EJAJt6R6KRsw= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Kn5+EMiZ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Kn5+EMiZ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D4E79C4CECE; Fri, 13 Dec 2024 01:09:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734052152; bh=gm5L/M+qs7wi97HDtC0rn0lZYYM0+GbWyW0A9IvAljI=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=Kn5+EMiZFE5SXb7TWf8KgmtWoG6g5x3i2Wo4XSHN1LxKoo6CL3WQ+pAluOHNuGVDC sTABsorUKzn1q4awui+xVNaKOC2llWgOnOK2mkYdt+NDVN3TNqjwrAjktpcTQn9bdX M+bSb447y2NJVLJIu3ZX2ZqiCbVsnmQxmXNXJeiGisO2iDV1K+jeV8ECafseldgezW UlwHC9hG4l0yF/tp01oKy27QfV8mT/lO65nj5gY6eqrVPV8D9GKYtO5j99nIYkwGdt N2G7PyjasWrE9ZhsFNLGJOYQlUOJO7Wj12tTeHNXNlk3OcGdcFnrDlfgOGTXEbkQSS p+Pa2HwI9Yl8A== Date: Thu, 12 Dec 2024 17:09:12 -0800 Subject: [PATCH 33/37] xfs: create a shadow rmap btree during realtime rmap repair From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123883.1181370.13660475990300912157.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Create an in-memory btree of rmap records instead of an array. This enables us to do live record collection instead of freezing the fs. Signed-off-by: "Darrick J. 
Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_btree_mem.c | 1 fs/xfs/libxfs/xfs_rmap.c | 3 + fs/xfs/libxfs/xfs_rtrmap_btree.c | 117 ++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 6 ++ fs/xfs/libxfs/xfs_shared.h | 7 ++ fs/xfs/scrub/rtrmap_repair.c | 137 ++++++++++++++++++++++++++------------ fs/xfs/xfs_stats.c | 3 + fs/xfs/xfs_stats.h | 1 8 files changed, 228 insertions(+), 47 deletions(-) diff --git a/fs/xfs/libxfs/xfs_btree_mem.c b/fs/xfs/libxfs/xfs_btree_mem.c index df3d613675a15a..f2f7b4305413e9 100644 --- a/fs/xfs/libxfs/xfs_btree_mem.c +++ b/fs/xfs/libxfs/xfs_btree_mem.c @@ -18,6 +18,7 @@ #include "xfs_ag.h" #include "xfs_buf_item.h" #include "xfs_trace.h" +#include "xfs_rtgroup.h" /* Set the root of an in-memory btree. */ void diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 2f0688a57991cc..f8415fd96cc2aa 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -327,7 +327,8 @@ xfs_rmap_check_btrec( struct xfs_btree_cur *cur, const struct xfs_rmap_irec *irec) { - if (xfs_btree_is_rtrmap(cur->bc_ops)) + if (xfs_btree_is_rtrmap(cur->bc_ops) || + xfs_btree_is_mem_rtrmap(cur->bc_ops)) return xfs_rtrmap_check_irec(to_rtg(cur->bc_group), irec); return xfs_rmap_check_irec(to_perag(cur->bc_group), irec); } diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 571a9e1b956099..3cb8f126b9ce16 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -28,6 +28,8 @@ #include "xfs_rtgroup.h" #include "xfs_bmap.h" #include "xfs_health.h" +#include "xfs_buf_mem.h" +#include "xfs_btree_mem.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -542,6 +544,121 @@ xfs_rtrmapbt_init_cursor( return cur; } +#ifdef CONFIG_XFS_BTREE_IN_MEM +/* + * Validate an in-memory realtime rmap btree block. Callers are allowed to + * generate an in-memory btree even if the ondisk feature is not enabled. 
+ */ +static xfs_failaddr_t +xfs_rtrmapbt_mem_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_mount; + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + xfs_failaddr_t fa; + unsigned int level; + unsigned int maxrecs; + + if (!xfs_verify_magic(bp, block->bb_magic)) + return __this_address; + + fa = xfs_btree_fsblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN); + if (fa) + return fa; + + level = be16_to_cpu(block->bb_level); + if (xfs_has_rmapbt(mp)) { + if (level >= mp->m_rtrmap_maxlevels) + return __this_address; + } else { + if (level >= xfs_rtrmapbt_maxlevels_ondisk()) + return __this_address; + } + + maxrecs = xfs_rtrmapbt_maxrecs(mp, XFBNO_BLOCKSIZE, level == 0); + return xfs_btree_memblock_verify(bp, maxrecs); +} + +static void +xfs_rtrmapbt_mem_rw_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa = xfs_rtrmapbt_mem_verify(bp); + + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); +} + +/* skip crc checks on in-memory btrees to save time */ +static const struct xfs_buf_ops xfs_rtrmapbt_mem_buf_ops = { + .name = "xfs_rtrmapbt_mem", + .magic = { 0, cpu_to_be32(XFS_RTRMAP_CRC_MAGIC) }, + .verify_read = xfs_rtrmapbt_mem_rw_verify, + .verify_write = xfs_rtrmapbt_mem_rw_verify, + .verify_struct = xfs_rtrmapbt_mem_verify, +}; + +const struct xfs_btree_ops xfs_rtrmapbt_mem_ops = { + .type = XFS_BTREE_TYPE_MEM, + .geom_flags = XFS_BTGEO_OVERLAPPING, + + .rec_len = sizeof(struct xfs_rmap_rec), + /* Overlapping btree; 2 keys per pointer. 
*/ + .key_len = 2 * sizeof(struct xfs_rmap_key), + .ptr_len = XFS_BTREE_LONG_PTR_LEN, + + .lru_refs = XFS_RMAP_BTREE_REF, + .statoff = XFS_STATS_CALC_INDEX(xs_rtrmap_mem_2), + + .dup_cursor = xfbtree_dup_cursor, + .set_root = xfbtree_set_root, + .alloc_block = xfbtree_alloc_block, + .free_block = xfbtree_free_block, + .get_minrecs = xfbtree_get_minrecs, + .get_maxrecs = xfbtree_get_maxrecs, + .init_key_from_rec = xfs_rtrmapbt_init_key_from_rec, + .init_high_key_from_rec = xfs_rtrmapbt_init_high_key_from_rec, + .init_rec_from_cur = xfs_rtrmapbt_init_rec_from_cur, + .init_ptr_from_cur = xfbtree_init_ptr_from_cur, + .key_diff = xfs_rtrmapbt_key_diff, + .buf_ops = &xfs_rtrmapbt_mem_buf_ops, + .diff_two_keys = xfs_rtrmapbt_diff_two_keys, + .keys_inorder = xfs_rtrmapbt_keys_inorder, + .recs_inorder = xfs_rtrmapbt_recs_inorder, + .keys_contiguous = xfs_rtrmapbt_keys_contiguous, +}; + +/* Create a cursor for an in-memory btree. */ +struct xfs_btree_cur * +xfs_rtrmapbt_mem_cursor( + struct xfs_rtgroup *rtg, + struct xfs_trans *tp, + struct xfbtree *xfbt) +{ + struct xfs_mount *mp = rtg_mount(rtg); + struct xfs_btree_cur *cur; + + cur = xfs_btree_alloc_cursor(mp, tp, &xfs_rtrmapbt_mem_ops, + mp->m_rtrmap_maxlevels, xfs_rtrmapbt_cur_cache); + cur->bc_mem.xfbtree = xfbt; + cur->bc_nlevels = xfbt->nlevels; + cur->bc_group = xfs_group_hold(rtg_group(rtg)); + return cur; +} + +/* Create an in-memory realtime rmap btree. */ +int +xfs_rtrmapbt_mem_init( + struct xfs_mount *mp, + struct xfbtree *xfbt, + struct xfs_buftarg *btp, + xfs_rgnumber_t rgno) +{ + xfbt->owner = rgno; + return xfbtree_init(mp, xfbt, btp, &xfs_rtrmapbt_mem_ops); +} +#endif /* CONFIG_XFS_BTREE_IN_MEM */ + /* * Install a new rt reverse mapping btree root. Caller is responsible for * invalidating and freeing the old btree blocks. 
diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 6e3dab8c44f7c2..6a2d432b55ad78 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -11,6 +11,7 @@ struct xfs_btree_cur; struct xfs_mount; struct xbtree_ifakeroot; struct xfs_rtgroup; +struct xfbtree; /* rmaps only exist on crc enabled filesystems */ #define XFS_RTRMAP_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN @@ -201,4 +202,9 @@ int xfs_rtrmapbt_init_rtsb(struct xfs_mount *mp, struct xfs_rtgroup *rtg, unsigned long long xfs_rtrmapbt_calc_size(struct xfs_mount *mp, unsigned long long len); +struct xfs_btree_cur *xfs_rtrmapbt_mem_cursor(struct xfs_rtgroup *rtg, + struct xfs_trans *tp, struct xfbtree *xfbtree); +int xfs_rtrmapbt_mem_init(struct xfs_mount *mp, struct xfbtree *xfbtree, + struct xfs_buftarg *btp, xfs_rgnumber_t rgno); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index da23dac22c3f08..960716c387cc2b 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -57,6 +57,7 @@ extern const struct xfs_btree_ops xfs_refcountbt_ops; extern const struct xfs_btree_ops xfs_rmapbt_ops; extern const struct xfs_btree_ops xfs_rmapbt_mem_ops; extern const struct xfs_btree_ops xfs_rtrmapbt_ops; +extern const struct xfs_btree_ops xfs_rtrmapbt_mem_ops; static inline bool xfs_btree_is_bno(const struct xfs_btree_ops *ops) { @@ -98,8 +99,14 @@ static inline bool xfs_btree_is_mem_rmap(const struct xfs_btree_ops *ops) { return ops == &xfs_rmapbt_mem_ops; } + +static inline bool xfs_btree_is_mem_rtrmap(const struct xfs_btree_ops *ops) +{ + return ops == &xfs_rtrmapbt_mem_ops; +} #else # define xfs_btree_is_mem_rmap(...) (false) +# define xfs_btree_is_mem_rtrmap(...) 
(false) #endif static inline bool xfs_btree_is_rtrmap(const struct xfs_btree_ops *ops) diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c index 60e317725dea86..b376bcc8d1d2ed 100644 --- a/fs/xfs/scrub/rtrmap_repair.c +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -12,6 +12,8 @@ #include "xfs_defer.h" #include "xfs_btree.h" #include "xfs_btree_staging.h" +#include "xfs_buf_mem.h" +#include "xfs_btree_mem.h" #include "xfs_bit.h" #include "xfs_log_format.h" #include "xfs_trans.h" @@ -64,24 +66,13 @@ * We use the 'xrep_rtrmap' prefix for all the rmap functions. */ -/* - * Packed rmap record. The UNWRITTEN flags are hidden in the upper bits of - * offset, just like the on-disk record. - */ -struct xrep_rtrmap_extent { - xfs_rgblock_t startblock; - xfs_extlen_t blockcount; - uint64_t owner; - uint64_t offset; -} __packed; - /* Context for collecting rmaps */ struct xrep_rtrmap { /* new rtrmapbt information */ struct xrep_newbt new_btree; /* rmap records generated from primary metadata */ - struct xfarray *rtrmap_records; + struct xfbtree rtrmap_btree; struct xfs_scrub *sc; @@ -91,8 +82,11 @@ struct xrep_rtrmap { /* inode scan cursor */ struct xchk_iscan iscan; - /* get_records()'s position in the free space record array. */ - xfarray_idx_t array_cur; + /* in-memory btree cursor for the ->get_blocks walk */ + struct xfs_btree_cur *mcur; + + /* Number of records we're staging in the new btree. */ + uint64_t nr_records; }; /* Set us up to repair rt reverse mapping btrees. 
*/ @@ -101,6 +95,14 @@ xrep_setup_rtrmapbt( struct xfs_scrub *sc) { struct xrep_rtrmap *rr; + char *descr; + int error; + + descr = xchk_xfile_rtgroup_descr(sc, "reverse mapping records"); + error = xrep_setup_xfbtree(sc, descr); + kfree(descr); + if (error) + return error; rr = kzalloc(sizeof(struct xrep_rtrmap), XCHK_GFP_FLAGS); if (!rr) @@ -135,11 +137,6 @@ xrep_rtrmap_stash( uint64_t offset, unsigned int flags) { - struct xrep_rtrmap_extent rre = { - .startblock = startblock, - .blockcount = blockcount, - .owner = owner, - }; struct xfs_rmap_irec rmap = { .rm_startblock = startblock, .rm_blockcount = blockcount, @@ -148,6 +145,7 @@ xrep_rtrmap_stash( .rm_flags = flags, }; struct xfs_scrub *sc = rr->sc; + struct xfs_btree_cur *mcur; int error = 0; if (xchk_should_terminate(sc, &error)) @@ -155,8 +153,18 @@ xrep_rtrmap_stash( trace_xrep_rtrmap_found(sc->mp, &rmap); - rre.offset = xfs_rmap_irec_offset_pack(&rmap); - return xfarray_append(rr->rtrmap_records, &rre); + /* Add entry to in-memory btree. */ + mcur = xfs_rtrmapbt_mem_cursor(sc->sr.rtg, sc->tp, &rr->rtrmap_btree); + error = xfs_rmap_map_raw(mcur, &rmap); + xfs_btree_del_cursor(mcur, error); + if (error) + goto out_cancel; + + return xfbtree_trans_commit(&rr->rtrmap_btree, sc->tp); + +out_cancel: + xfbtree_trans_cancel(&rr->rtrmap_btree, sc->tp); + return error; } /* Finding all file and bmbt extents. */ @@ -395,6 +403,24 @@ xrep_rtrmap_scan_ag( return error; } +/* Count and check all collected records. */ +STATIC int +xrep_rtrmap_check_record( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_rtrmap *rr = priv; + int error; + + error = xrep_rtrmap_check_mapping(rr->sc, rec); + if (error) + return error; + + rr->nr_records++; + return 0; +} + /* Generate all the reverse-mappings for the realtime device. 
*/ STATIC int xrep_rtrmap_find_rmaps( @@ -403,6 +429,7 @@ xrep_rtrmap_find_rmaps( struct xfs_scrub *sc = rr->sc; struct xfs_perag *pag = NULL; struct xfs_inode *ip; + struct xfs_btree_cur *mcur; int error; /* Generate rmaps for the realtime superblock */ @@ -468,7 +495,19 @@ xrep_rtrmap_find_rmaps( } } - return 0; + /* + * Now that we have everything locked again, we need to count the + * number of rmap records stashed in the btree. This should reflect + * all actively-owned rt files in the filesystem. At the same time, + * check all our records before we start building a new btree, which + * requires the rtbitmap lock. + */ + mcur = xfs_rtrmapbt_mem_cursor(rr->sc->sr.rtg, NULL, &rr->rtrmap_btree); + rr->nr_records = 0; + error = xfs_rmap_query_all(mcur, xrep_rtrmap_check_record, rr); + xfs_btree_del_cursor(mcur, error); + + return error; } /* Building the new rtrmap btree. */ @@ -482,29 +521,25 @@ xrep_rtrmap_get_records( unsigned int nr_wanted, void *priv) { - struct xrep_rtrmap_extent rec; - struct xfs_rmap_irec *irec = &cur->bc_rec.r; struct xrep_rtrmap *rr = priv; union xfs_btree_rec *block_rec; unsigned int loaded; int error; for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { - error = xfarray_load_next(rr->rtrmap_records, &rr->array_cur, - &rec); + int stat = 0; + + error = xfs_btree_increment(rr->mcur, 0, &stat); if (error) return error; - - irec->rm_startblock = rec.startblock; - irec->rm_blockcount = rec.blockcount; - irec->rm_owner = rec.owner; - - if (xfs_rmap_irec_offset_unpack(rec.offset, irec) != NULL) + if (!stat) return -EFSCORRUPTED; - error = xrep_rtrmap_check_mapping(rr->sc, irec); + error = xfs_rmap_get_rec(rr->mcur, &cur->bc_rec.r, &stat); if (error) return error; + if (!stat) + return -EFSCORRUPTED; block_rec = xfs_btree_rec_addr(cur, idx, block); cur->bc_ops->init_rec_from_cur(cur, block_rec); @@ -549,7 +584,6 @@ xrep_rtrmap_build_new_tree( struct xfs_scrub *sc = rr->sc; struct xfs_rtgroup *rtg = sc->sr.rtg; struct xfs_btree_cur 
*rmap_cur; - uint64_t nr_records; int error; /* @@ -569,11 +603,9 @@ xrep_rtrmap_build_new_tree( rmap_cur = xfs_rtrmapbt_init_cursor(NULL, rtg); xfs_btree_stage_ifakeroot(rmap_cur, &rr->new_btree.ifake); - nr_records = xfarray_length(rr->rtrmap_records); - /* Compute how many blocks we'll need for the rmaps collected. */ error = xfs_btree_bload_compute_geometry(rmap_cur, - &rr->new_btree.bload, nr_records); + &rr->new_btree.bload, rr->nr_records); if (error) goto err_cur; @@ -599,12 +631,20 @@ xrep_rtrmap_build_new_tree( if (error) goto err_cur; + /* + * Create a cursor to the in-memory btree so that we can bulk load the + * new btree. + */ + rr->mcur = xfs_rtrmapbt_mem_cursor(sc->sr.rtg, NULL, &rr->rtrmap_btree); + error = xfs_btree_goto_left_edge(rr->mcur); + if (error) + goto err_mcur; + /* Add all observed rmap records. */ rr->new_btree.ifake.if_fork->if_format = XFS_DINODE_FMT_META_BTREE; - rr->array_cur = XFARRAY_CURSOR_INIT; error = xfs_btree_bload(rmap_cur, &rr->new_btree.bload, rr); if (error) - goto err_cur; + goto err_mcur; /* * Install the new rtrmap btree in the inode. After this point the old @@ -614,6 +654,14 @@ xrep_rtrmap_build_new_tree( xfs_rtrmapbt_commit_staged_btree(rmap_cur, sc->tp); xrep_inode_set_nblocks(rr->sc, rr->new_btree.ifake.if_blocks); xfs_btree_del_cursor(rmap_cur, 0); + xfs_btree_del_cursor(rr->mcur, 0); + rr->mcur = NULL; + + /* + * Now that we've written the new btree to disk, we don't need to keep + * updating the in-memory btree. Abort the scan to stop live updates. + */ + xchk_iscan_abort(&rr->iscan); /* Dispose of any unused blocks and the accounting information. 
*/ error = xrep_newbt_commit(&rr->new_btree); @@ -622,6 +670,8 @@ xrep_rtrmap_build_new_tree( return xrep_roll_trans(sc); +err_mcur: + xfs_btree_del_cursor(rr->mcur, error); err_cur: xfs_btree_del_cursor(rmap_cur, error); xrep_newbt_cancel(&rr->new_btree); @@ -658,16 +708,13 @@ xrep_rtrmap_setup_scan( struct xrep_rtrmap *rr) { struct xfs_scrub *sc = rr->sc; - char *descr; int error; xfsb_bitmap_init(&rr->old_rtrmapbt_blocks); /* Set up some storage */ - descr = xchk_xfile_rtgroup_descr(sc, "reverse mapping records"); - error = xfarray_create(descr, 0, sizeof(struct xrep_rtrmap_extent), - &rr->rtrmap_records); - kfree(descr); + error = xfs_rtrmapbt_mem_init(sc->mp, &rr->rtrmap_btree, sc->xmbtp, + rtg_rgno(sc->sr.rtg)); if (error) goto out_bitmap; @@ -686,7 +733,7 @@ xrep_rtrmap_teardown( struct xrep_rtrmap *rr) { xchk_iscan_teardown(&rr->iscan); - xfarray_destroy(rr->rtrmap_records); + xfbtree_destroy(&rr->rtrmap_btree); xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); } diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c index f94fb70b524ffb..b7f2988bc03bb7 100644 --- a/fs/xfs/xfs_stats.c +++ b/fs/xfs/xfs_stats.c @@ -53,7 +53,8 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf) { "refcntbt", xfsstats_offset(xs_rmap_mem_2) }, { "rmapbt_mem", xfsstats_offset(xs_rcbag_2) }, { "rcbagbt", xfsstats_offset(xs_rtrmap_2) }, - { "rtrmapbt", xfsstats_offset(xs_qm_dqreclaims)}, + { "rtrmapbt", xfsstats_offset(xs_rtrmap_mem_2)}, + { "rtrmapbt_mem", xfsstats_offset(xs_qm_dqreclaims)}, /* we print both series of quota information together */ { "qm", xfsstats_offset(xs_xstrat_bytes)}, }; diff --git a/fs/xfs/xfs_stats.h b/fs/xfs/xfs_stats.h index 05dc69c6d94906..9c47de5dff2dd6 100644 --- a/fs/xfs/xfs_stats.h +++ b/fs/xfs/xfs_stats.h @@ -128,6 +128,7 @@ struct __xfsstats { uint32_t xs_rmap_mem_2[__XBTS_MAX]; uint32_t xs_rcbag_2[__XBTS_MAX]; uint32_t xs_rtrmap_2[__XBTS_MAX]; + uint32_t xs_rtrmap_mem_2[__XBTS_MAX]; uint32_t xs_qm_dqreclaims; uint32_t 
xs_qm_dqreclaim_misses; uint32_t xs_qm_dquot_dups; From patchwork Fri Dec 13 01:09:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13906227
Date: Thu, 12 Dec 2024 17:09:28 -0800 Subject: [PATCH 34/37] xfs: hook live realtime rmap operations during a repair operation From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123900.1181370.14934453235625748262.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Hook the regular realtime rmap code when an rtrmapbt repair operation is running so that we can unlock the AGF buffer to scan the filesystem and keep the in-memory btree up to date during the scan. Signed-off-by: "Darrick J. 
Wong" Reviewed-by: Christoph Hellwig --- fs/xfs/scrub/rmap_repair.c | 2 - fs/xfs/scrub/rtrmap_repair.c | 131 +++++++++++++++++++++++++++++++++++++++++- fs/xfs/scrub/trace.h | 17 ++++- 3 files changed, 140 insertions(+), 10 deletions(-) diff --git a/fs/xfs/scrub/rmap_repair.c b/fs/xfs/scrub/rmap_repair.c index 91c17feb49768b..c2c7b76cc25ab8 100644 --- a/fs/xfs/scrub/rmap_repair.c +++ b/fs/xfs/scrub/rmap_repair.c @@ -1614,7 +1614,7 @@ xrep_rmapbt_live_update( if (!xrep_rmapbt_want_live_update(&rr->iscan, &p->oinfo)) goto out_unlock; - trace_xrep_rmap_live_update(rr->sc->sa.pag, action, p); + trace_xrep_rmap_live_update(pag_group(rr->sc->sa.pag), action, p); error = xrep_trans_alloc_hook_dummy(mp, &txcookie, &tp); if (error) diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c index b376bcc8d1d2ed..49de8bc2dd17f5 100644 --- a/fs/xfs/scrub/rtrmap_repair.c +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -71,6 +71,9 @@ struct xrep_rtrmap { /* new rtrmapbt information */ struct xrep_newbt new_btree; + /* lock for the xfbtree and xfile */ + struct mutex lock; + /* rmap records generated from primary metadata */ struct xfbtree rtrmap_btree; @@ -79,6 +82,9 @@ struct xrep_rtrmap { /* bitmap of old rtrmapbt blocks */ struct xfsb_bitmap old_rtrmapbt_blocks; + /* Hooks into rtrmap update code. */ + struct xfs_rmap_hook rhook; + /* inode scan cursor */ struct xchk_iscan iscan; @@ -98,6 +104,8 @@ xrep_setup_rtrmapbt( char *descr; int error; + xchk_fsgates_enable(sc, XCHK_FSGATES_RMAP); + descr = xchk_xfile_rtgroup_descr(sc, "reverse mapping records"); error = xrep_setup_xfbtree(sc, descr); kfree(descr); @@ -151,19 +159,31 @@ xrep_rtrmap_stash( if (xchk_should_terminate(sc, &error)) return error; + if (xchk_iscan_aborted(&rr->iscan)) + return -EFSCORRUPTED; + trace_xrep_rtrmap_found(sc->mp, &rmap); /* Add entry to in-memory btree. 
*/ + mutex_lock(&rr->lock); mcur = xfs_rtrmapbt_mem_cursor(sc->sr.rtg, sc->tp, &rr->rtrmap_btree); error = xfs_rmap_map_raw(mcur, &rmap); xfs_btree_del_cursor(mcur, error); if (error) goto out_cancel; - return xfbtree_trans_commit(&rr->rtrmap_btree, sc->tp); + error = xfbtree_trans_commit(&rr->rtrmap_btree, sc->tp); + if (error) + goto out_abort; + + mutex_unlock(&rr->lock); + return 0; out_cancel: xfbtree_trans_cancel(&rr->rtrmap_btree, sc->tp); +out_abort: + xchk_iscan_abort(&rr->iscan); + mutex_unlock(&rr->lock); return error; } @@ -486,6 +506,13 @@ xrep_rtrmap_find_rmaps( if (error) return error; + /* + * If a hook failed to update the in-memory btree, we lack the data to + * continue the repair. + */ + if (xchk_iscan_aborted(&rr->iscan)) + return -EFSCORRUPTED; + /* Scan for old rtrmap blocks. */ while ((pag = xfs_perag_next(sc->mp, pag))) { error = xrep_rtrmap_scan_ag(rr, pag); @@ -702,6 +729,83 @@ xrep_rtrmap_remove_old_tree( return xrep_reset_metafile_resv(rr->sc); } +static inline bool +xrep_rtrmapbt_want_live_update( + struct xchk_iscan *iscan, + const struct xfs_owner_info *oi) +{ + if (xchk_iscan_aborted(iscan)) + return false; + + /* + * We scanned the CoW staging extents before we started the iscan, so + * we need all the updates. + */ + if (XFS_RMAP_NON_INODE_OWNER(oi->oi_owner)) + return true; + + /* Ignore updates to files that the scanner hasn't visited yet. */ + return xchk_iscan_want_live_update(iscan, oi->oi_owner); +} + +/* + * Apply a rtrmapbt update from the regular filesystem into our shadow btree. + * We're running from the thread that owns the rtrmap ILOCK and is generating + * the update, so we must be careful about which parts of the struct + * xrep_rtrmap that we change. 
+ */ +static int +xrep_rtrmapbt_live_update( + struct notifier_block *nb, + unsigned long action, + void *data) +{ + struct xfs_rmap_update_params *p = data; + struct xrep_rtrmap *rr; + struct xfs_mount *mp; + struct xfs_btree_cur *mcur; + struct xfs_trans *tp; + void *txcookie; + int error; + + rr = container_of(nb, struct xrep_rtrmap, rhook.rmap_hook.nb); + mp = rr->sc->mp; + + if (!xrep_rtrmapbt_want_live_update(&rr->iscan, &p->oinfo)) + goto out_unlock; + + trace_xrep_rmap_live_update(rtg_group(rr->sc->sr.rtg), action, p); + + error = xrep_trans_alloc_hook_dummy(mp, &txcookie, &tp); + if (error) + goto out_abort; + + mutex_lock(&rr->lock); + mcur = xfs_rtrmapbt_mem_cursor(rr->sc->sr.rtg, tp, &rr->rtrmap_btree); + error = __xfs_rmap_finish_intent(mcur, action, p->startblock, + p->blockcount, &p->oinfo, p->unwritten); + xfs_btree_del_cursor(mcur, error); + if (error) + goto out_cancel; + + error = xfbtree_trans_commit(&rr->rtrmap_btree, tp); + if (error) + goto out_cancel; + + xrep_trans_cancel_hook_dummy(&txcookie, tp); + mutex_unlock(&rr->lock); + return NOTIFY_DONE; + +out_cancel: + xfbtree_trans_cancel(&rr->rtrmap_btree, tp); + xrep_trans_cancel_hook_dummy(&txcookie, tp); +out_abort: + xchk_iscan_abort(&rr->iscan); + mutex_unlock(&rr->lock); +out_unlock: + return NOTIFY_DONE; +} + /* Set up the filesystem scan components. */ STATIC int xrep_rtrmap_setup_scan( @@ -710,6 +814,7 @@ xrep_rtrmap_setup_scan( struct xfs_scrub *sc = rr->sc; int error; + mutex_init(&rr->lock); xfsb_bitmap_init(&rr->old_rtrmapbt_blocks); /* Set up some storage */ @@ -720,10 +825,26 @@ xrep_rtrmap_setup_scan( /* Retry iget every tenth of a second for up to 30 seconds. */ xchk_iscan_start(sc, 30000, 100, &rr->iscan); + + /* + * Hook into live rtrmap operations so that we can update our in-memory + * btree to reflect live changes on the filesystem. Since we drop the + * rtrmap ILOCK to scan all the inodes, we need this piece to avoid + * installing a stale btree. 
+ */ + ASSERT(sc->flags & XCHK_FSGATES_RMAP); + xfs_rmap_hook_setup(&rr->rhook, xrep_rtrmapbt_live_update); + error = xfs_rmap_hook_add(rtg_group(sc->sr.rtg), &rr->rhook); + if (error) + goto out_iscan; return 0; +out_iscan: + xchk_iscan_teardown(&rr->iscan); + xfbtree_destroy(&rr->rtrmap_btree); out_bitmap: xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); + mutex_destroy(&rr->lock); return error; } @@ -732,9 +853,14 @@ STATIC void xrep_rtrmap_teardown( struct xrep_rtrmap *rr) { + struct xfs_scrub *sc = rr->sc; + + xchk_iscan_abort(&rr->iscan); + xfs_rmap_hook_del(rtg_group(sc->sr.rtg), &rr->rhook); xchk_iscan_teardown(&rr->iscan); xfbtree_destroy(&rr->rtrmap_btree); xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); + mutex_destroy(&rr->lock); } /* Repair the realtime rmap btree. */ @@ -745,9 +871,6 @@ xrep_rtrmapbt( struct xrep_rtrmap *rr = sc->buf; int error; - /* Functionality is not yet complete. */ - return xrep_notsupported(sc); - /* Make sure any problems with the fork are fixed. */ error = xrep_metadata_inode_forks(sc); if (error) diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 3f2a8695ef5cb5..fb86b746bc174a 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -41,6 +41,9 @@ struct xchk_dirtree_outcomes; TRACE_DEFINE_ENUM(XFS_REFC_DOMAIN_SHARED); TRACE_DEFINE_ENUM(XFS_REFC_DOMAIN_COW); +TRACE_DEFINE_ENUM(XG_TYPE_AG); +TRACE_DEFINE_ENUM(XG_TYPE_RTG); + TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_PROBE); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_SB); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_AGF); @@ -2709,11 +2712,12 @@ DEFINE_SCRUB_NLINKS_DIFF_EVENT(xrep_nlinks_update_inode); DEFINE_SCRUB_NLINKS_DIFF_EVENT(xrep_nlinks_unfixable_inode); TRACE_EVENT(xrep_rmap_live_update, - TP_PROTO(const struct xfs_perag *pag, unsigned int op, + TP_PROTO(const struct xfs_group *xg, unsigned int op, const struct xfs_rmap_update_params *p), - TP_ARGS(pag, op, p), + TP_ARGS(xg, op, p), TP_STRUCT__entry( __field(dev_t, dev) + __field(enum xfs_group_type, type) __field(xfs_agnumber_t, 
agno) __field(unsigned int, op) __field(xfs_agblock_t, agbno) @@ -2723,8 +2727,9 @@ TRACE_EVENT(xrep_rmap_live_update, __field(unsigned int, flags) ), TP_fast_assign( - __entry->dev = pag_mount(pag)->m_super->s_dev; - __entry->agno = pag_agno(pag); + __entry->dev = xg->xg_mount->m_super->s_dev; + __entry->type = xg->xg_type; + __entry->agno = xg->xg_gno; __entry->op = op; __entry->agbno = p->startblock; __entry->len = p->blockcount; @@ -2733,10 +2738,12 @@ TRACE_EVENT(xrep_rmap_live_update, if (p->unwritten) __entry->flags |= XFS_RMAP_UNWRITTEN; ), - TP_printk("dev %d:%d agno 0x%x op %d agbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x", + TP_printk("dev %d:%d %sno 0x%x op %d %sbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x", MAJOR(__entry->dev), MINOR(__entry->dev), + __print_symbolic(__entry->type, XG_TYPE_STRINGS), __entry->agno, __entry->op, + __print_symbolic(__entry->type, XG_TYPE_STRINGS), __entry->agbno, __entry->len, __entry->owner, From patchwork Fri Dec 13 01:09:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906228
Date: Thu, 12 Dec 2024 17:09:43 -0800 Subject: [PATCH 35/37] xfs: clean up device translation in xfs_dax_notify_failure From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123917.1181370.5005272756259746108.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Move all the dax_dev -> buftarg and range translation code to a separate function so that xfs_dax_notify_failure will be more straightforward. Also make a proper header file for the dax holder ops. Signed-off-by: "Darrick J. Wong" --- fs/xfs/xfs_buf.c | 1 fs/xfs/xfs_notify_failure.c | 115 ++++++++++++++++++++++++++++++------------- fs/xfs/xfs_notify_failure.h | 11 ++++ fs/xfs/xfs_super.h | 1 4 files changed, 91 insertions(+), 37 deletions(-) create mode 100644 fs/xfs/xfs_notify_failure.h diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index aa63b8efd78228..6f313fbf766910 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -22,6 +22,7 @@ #include "xfs_error.h" #include "xfs_ag.h" #include "xfs_buf_mem.h" +#include "xfs_notify_failure.h" struct kmem_cache *xfs_buf_cache; diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c index fa50e5308292d3..da07d0efc5a2a0 100644 --- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -19,11 +19,18 @@ #include "xfs_rtalloc.h" #include "xfs_trans.h" #include "xfs_ag.h" +#include "xfs_notify_failure.h" #include #include #include +enum xfs_failed_device { + XFS_FAILED_DATADEV, + XFS_FAILED_LOGDEV, + XFS_FAILED_RTDEV, +}; + struct xfs_failure_info { xfs_agblock_t startblock; 
xfs_extlen_t blockcount; @@ -256,54 +263,38 @@ xfs_dax_notify_ddev_failure( } static int -xfs_dax_notify_failure( +xfs_dax_translate_range( + struct xfs_mount *mp, struct dax_device *dax_dev, u64 offset, u64 len, - int mf_flags) + enum xfs_failed_device *fdev, + xfs_daddr_t *daddr, + uint64_t *bbcount) { - struct xfs_mount *mp = dax_holder(dax_dev); + struct xfs_buftarg *btp; u64 ddev_start; u64 ddev_end; - if (!(mp->m_super->s_flags & SB_BORN)) { - xfs_warn(mp, "filesystem is not ready for notify_failure()!"); - return -EIO; - } - if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_daxdev == dax_dev) { - xfs_debug(mp, - "notify_failure() not supported on realtime device!"); - return -EOPNOTSUPP; + *fdev = XFS_FAILED_RTDEV; + btp = mp->m_rtdev_targp; + } else if (mp->m_logdev_targp != mp->m_ddev_targp && + mp->m_logdev_targp->bt_daxdev == dax_dev) { + *fdev = XFS_FAILED_LOGDEV; + btp = mp->m_logdev_targp; + } else { + *fdev = XFS_FAILED_DATADEV; + btp = mp->m_ddev_targp; } - if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev && - mp->m_logdev_targp != mp->m_ddev_targp) { - /* - * In the pre-remove case the failure notification is attempting - * to trigger a force unmount. The expectation is that the - * device is still present, but its removal is in progress and - * can not be cancelled, proceed with accessing the log device. - */ - if (mf_flags & MF_MEM_PRE_REMOVE) - return 0; - xfs_err(mp, "ondisk log corrupt, shutting down fs!"); - xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK); - return -EFSCORRUPTED; - } - - if (!xfs_has_rmapbt(mp)) { - xfs_debug(mp, "notify_failure() needs rmapbt enabled!"); - return -EOPNOTSUPP; - } - - ddev_start = mp->m_ddev_targp->bt_dax_part_off; - ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1; + ddev_start = btp->bt_dax_part_off; + ddev_end = ddev_start + bdev_nr_bytes(btp->bt_bdev) - 1; /* Notify failure on the whole device. 
*/ if (offset == 0 && len == U64_MAX) { offset = ddev_start; - len = bdev_nr_bytes(mp->m_ddev_targp->bt_bdev); + len = bdev_nr_bytes(btp->bt_bdev); } /* Ignore the range out of filesystem area */ @@ -322,8 +313,60 @@ xfs_dax_notify_failure( if (offset + len - 1 > ddev_end) len = ddev_end - offset + 1; - return xfs_dax_notify_ddev_failure(mp, BTOBB(offset), BTOBB(len), - mf_flags); + *daddr = BTOBB(offset); + *bbcount = BTOBB(len); + return 0; +} + +static int +xfs_dax_notify_failure( + struct dax_device *dax_dev, + u64 offset, + u64 len, + int mf_flags) +{ + struct xfs_mount *mp = dax_holder(dax_dev); + enum xfs_failed_device fdev; + xfs_daddr_t daddr; + uint64_t bbcount; + int error; + + if (!(mp->m_super->s_flags & SB_BORN)) { + xfs_warn(mp, "filesystem is not ready for notify_failure()!"); + return -EIO; + } + + error = xfs_dax_translate_range(mp, dax_dev, offset, len, &fdev, + &daddr, &bbcount); + if (error) + return error; + + if (fdev == XFS_FAILED_RTDEV) { + xfs_debug(mp, + "notify_failure() not supported on realtime device!"); + return -EOPNOTSUPP; + } + + if (fdev == XFS_FAILED_LOGDEV) { + /* + * In the pre-remove case the failure notification is attempting + * to trigger a force unmount. The expectation is that the + * device is still present, but its removal is in progress and + * can not be cancelled, proceed with accessing the log device. 
+ */ + if (mf_flags & MF_MEM_PRE_REMOVE) + return 0; + xfs_err(mp, "ondisk log corrupt, shutting down fs!"); + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK); + return -EFSCORRUPTED; + } + + if (!xfs_has_rmapbt(mp)) { + xfs_debug(mp, "notify_failure() needs rmapbt enabled!"); + return -EOPNOTSUPP; + } + + return xfs_dax_notify_ddev_failure(mp, daddr, bbcount, mf_flags); } const struct dax_holder_operations xfs_dax_holder_operations = { diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h new file mode 100644 index 00000000000000..8d08ec29dd2949 --- /dev/null +++ b/fs/xfs/xfs_notify_failure.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2024 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __XFS_NOTIFY_FAILURE_H__ +#define __XFS_NOTIFY_FAILURE_H__ + +extern const struct dax_holder_operations xfs_dax_holder_operations; + +#endif /* __XFS_NOTIFY_FAILURE_H__ */ diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h index 302e6e5d6c7e20..c0e85c1e42f27d 100644 --- a/fs/xfs/xfs_super.h +++ b/fs/xfs/xfs_super.h @@ -92,7 +92,6 @@ extern xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *, extern const struct export_operations xfs_export_operations; extern const struct quotactl_ops xfs_quotactl_operations; -extern const struct dax_holder_operations xfs_dax_holder_operations; extern void xfs_reinit_percpu_counters(struct xfs_mount *mp); From patchwork Fri Dec 13 01:09:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906229 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE201DDBE for ; Fri, 13 Dec 2024 01:09:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052200; cv=none; b=TRMY3LpYnKXHnDwm2/6UAl74LD5DtLLthY4+JxbfI/X3b09zOaM7tUJVkYBcnyhLK4EYRjcGNVcmEXVPcWu5PxmTYWBfVwjXnp3fgB3hDDoNDdl7cqsDN3naciDWRp5+g8nBsmGVAhC42Ovrw+f3wmSa3tmYKMmT5MwjvRT4PYA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052200; c=relaxed/simple; bh=hdPatOOx+QN4Bn3MzKM+luLCc3AZdNPZ9NZOLueRpf0=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=f3s8uwdREpAOWPiUfbB/94HI1KMmi+Xb6Q/LyIKmWU6bVXFle7EwsL+onoAMs4EF0pO9LHJxwM7FOLBorojJjuDRTxEOOliMURjl3j/BtcK+XWSg0FREVLujmF96OPL50iG+k6255f2nsFo4W54giAX5VLjvYEE+TrO0cWp43OI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=EcC6JGJD; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="EcC6JGJD" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C3547C4CECE; Fri, 13 Dec 2024 01:09:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734052199; bh=hdPatOOx+QN4Bn3MzKM+luLCc3AZdNPZ9NZOLueRpf0=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=EcC6JGJDi3Hq7s7o3EWV5aevqZXClZloZvtGXVQZBAn3S5M7O8CAl08OrrljtzQm3 6QL0s8rcINrY4wJdmtZndzLz8JrCNw+ftBJTzCT0v1ke9VbobnNrQ7lNMrNM/XW9TK omuTN2QAe31BPu7mFwFpNv2Xt7tdAIX/3IRLluX0pr2KWpFstgDC8FI6up2u9d794a 
JcatDBFjc3ZtDpGRM7XuhklmGbOrTYLqzRumTJKj3zmSZ6ZYCX6DlXinf5JvUe6DRh xdB6mZPvWqCrYgwZC7lOZj/qwtQSrrpJmAiGIKJWETEUTDL56yLUmEAKgZJH64YWDt 3GRALlbPVbjTQ== Date: Thu, 12 Dec 2024 17:09:59 -0800 Subject: [PATCH 36/37] xfs: react to fsdax failure notifications on the rt device From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123935.1181370.7404101961471776856.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Now that we have reverse mapping for the realtime device, use the information to kill processes that have mappings to bad pmem. Signed-off-by: "Darrick J. Wong" --- fs/xfs/xfs_notify_failure.c | 114 +++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 108 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c index da07d0efc5a2a0..96d39e475d5a86 100644 --- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -20,6 +20,8 @@ #include "xfs_trans.h" #include "xfs_ag.h" #include "xfs_notify_failure.h" +#include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" #include #include @@ -262,6 +264,109 @@ xfs_dax_notify_ddev_failure( return error; } +#ifdef CONFIG_XFS_RT +static int +xfs_dax_notify_rtdev_failure( + struct xfs_mount *mp, + xfs_daddr_t daddr, + xfs_daddr_t bblen, + int mf_flags) +{ + struct xfs_failure_info notify = { .mf_flags = mf_flags }; + struct xfs_trans *tp = NULL; + struct xfs_btree_cur *cur = NULL; + int error = 0; + bool kernel_frozen = false; + xfs_rtblock_t rtbno = xfs_daddr_to_rtb(mp, daddr); + xfs_rtblock_t end_rtbno = xfs_daddr_to_rtb(mp, + daddr + bblen - 1); + xfs_rgnumber_t rgno = xfs_rtb_to_rgno(mp, rtbno); + xfs_rgnumber_t end_rgno = xfs_rtb_to_rgno(mp, 
end_rtbno); + xfs_rgblock_t start_rgbno = xfs_rtb_to_rgbno(mp, rtbno); + + if (mf_flags & MF_MEM_PRE_REMOVE) { + xfs_info(mp, "Device is about to be removed!"); + /* + * Freeze fs to prevent new mappings from being created. + * - Keep going on if others already hold the kernel frozen. + * - Keep going on if other errors too because this device is + * starting to fail. + * - If kernel frozen state is held successfully here, thaw it + * here as well at the end. + */ + kernel_frozen = xfs_dax_notify_failure_freeze(mp) == 0; + } + + error = xfs_trans_alloc_empty(mp, &tp); + if (error) + goto out; + + for (; rgno <= end_rgno; rgno++) { + struct xfs_rmap_irec ri_low = { + .rm_startblock = start_rgbno, + }; + struct xfs_rmap_irec ri_high; + struct xfs_rtgroup *rtg; + xfs_rgblock_t range_rgend; + + rtg = xfs_rtgroup_get(mp, rgno); + if (!rtg) + break; + + xfs_rtgroup_lock(rtg, XFS_RTGLOCK_RMAP); + cur = xfs_rtrmapbt_init_cursor(tp, rtg); + + /* + * Set the rmap range from ri_low to ri_high, which represents + * a [start, end] where we are looking for the files or metadata. + */ + memset(&ri_high, 0xFF, sizeof(ri_high)); + if (rgno == end_rgno) + ri_high.rm_startblock = xfs_rtb_to_rgbno(mp, end_rtbno); + + range_rgend = min(rtg->rtg_group.xg_block_count - 1, + ri_high.rm_startblock); + notify.startblock = ri_low.rm_startblock; + notify.blockcount = range_rgend + 1 - ri_low.rm_startblock; + + error = xfs_rmap_query_range(cur, &ri_low, &ri_high, + xfs_dax_failure_fn, &notify); + xfs_btree_del_cursor(cur, error); + xfs_rtgroup_unlock(rtg, XFS_RTGLOCK_RMAP); + xfs_rtgroup_put(rtg); + if (error) + break; + + start_rgbno = 0; + } + + xfs_trans_cancel(tp); + + /* + * Shutdown fs from a force umount in pre-remove case which won't fail, + * so errors can be ignored. Otherwise, shutdown the filesystem with + * CORRUPT flag if error occurred or notify.want_shutdown was set during + * RMAP querying.
+ */ + if (mf_flags & MF_MEM_PRE_REMOVE) + xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT); + else if (error || notify.want_shutdown) { + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK); + if (!error) + error = -EFSCORRUPTED; + } + +out: + /* Thaw the fs if it has been frozen before. */ + if (mf_flags & MF_MEM_PRE_REMOVE) + xfs_dax_notify_failure_thaw(mp, kernel_frozen); + + return error; +} +#else +# define xfs_dax_notify_rtdev_failure(...) (-ENOSYS) +#endif + static int xfs_dax_translate_range( struct xfs_mount *mp, @@ -341,12 +446,6 @@ xfs_dax_notify_failure( if (error) return error; - if (fdev == XFS_FAILED_RTDEV) { - xfs_debug(mp, - "notify_failure() not supported on realtime device!"); - return -EOPNOTSUPP; - } - if (fdev == XFS_FAILED_LOGDEV) { /* * In the pre-remove case the failure notification is attempting @@ -366,6 +465,9 @@ xfs_dax_notify_failure( return -EOPNOTSUPP; } + if (fdev == XFS_FAILED_RTDEV) + return xfs_dax_notify_rtdev_failure(mp, daddr, bbcount, + mf_flags); return xfs_dax_notify_ddev_failure(mp, daddr, bbcount, mf_flags); } From patchwork Fri Dec 13 01:10:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13906230 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D1BC017BA1 for ; Fri, 13 Dec 2024 01:10:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052215; cv=none; b=QU2LcTo6VQ6iGtcb6n7suRZW7iI13LoWawH7iie2W2ZvxE882tkdwC9XQ5W+7SjcBaJLlOCm+A5J3lSfI2z0to5XJlqBqkr7GhkAPnFHAlM1+TZ/Sfj947ru7fyOil3nZwbgfdjRy9x4QcvYJR0TmQyJkVvxpKGKnIPnOD06rw4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734052215; c=relaxed/simple; bh=lcd94iGFG/RIGBiHplSu0ztAyKSUF0xlBN8YAaGQ/ws=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=BZffmNRsOnhocdubuPL5xwYtlksboUqlWKvrrUjeCih4mbfxXIVkcuhW19b9t5j8OxCBZwubBdlUN/so/FVmeGfj2CY3+SCNmRMymtKPu88RUGUVJeso5+jVSPTAIRVrt+SKJ7Wri/swSzldIIKePU/JpCJX+kcDm+wbI6Gsz58= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=i4Kf/puN; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="i4Kf/puN" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6131DC4CECE; Fri, 13 Dec 2024 01:10:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734052215; bh=lcd94iGFG/RIGBiHplSu0ztAyKSUF0xlBN8YAaGQ/ws=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=i4Kf/puNdGY6ZILBKN9fmSuxJ+tTzw34m/b9leAvoy678qhRJnuaDjeBJWKk2Nt7n BbDHzV7wOcMdqp6W7EQ9BOYegWy00eNVGC3K1wK/jBhl9iecdm9zrR9utbmoesL1E5 zOfpBFo68Nf8qud5DWXYCwsJ9v1AObslGl8ceC7kiKi4we5d6cqWFHhhxsobH6k8Kz 
PxmcHlMfa6hXJB6vvzzAoa1gugD0KitN97GhACiz/ky3754qSdxokvbMg4sWjFEHoa zF0FlXdvUlUkPSZ/xkM9U/gfzRGV5XjpOIaD+tmJshF+xQ4KlHp9wa1T6pK9RtoJ2N 03dLsAGBDj8+w== Date: Thu, 12 Dec 2024 17:10:14 -0800 Subject: [PATCH 37/37] xfs: enable realtime rmap btree From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <173405123951.1181370.218717289433436256.stgit@frogsfrogsfrogs> In-Reply-To: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> References: <173405123212.1181370.1936576505332113490.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Permit mounting filesystems with realtime rmap btrees. Signed-off-by: "Darrick J. Wong" --- fs/xfs/xfs_rtalloc.c | 12 ++++++++---- fs/xfs/xfs_super.c | 6 ------ 2 files changed, 8 insertions(+), 10 deletions(-) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 3c1bce5a4855f2..a69967f9d88ead 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1282,11 +1282,15 @@ xfs_growfs_rt( XFS_FSB_TO_B(mp, in->extsize) < XFS_MIN_RTEXTSIZE) goto out_unlock; - /* Unsupported realtime features. */ + /* Check for features supported only on rtgroups filesystems. 
*/ error = -EOPNOTSUPP; - if (xfs_has_quota(mp) && !xfs_has_rtgroups(mp)) - goto out_unlock; - if (xfs_has_rmapbt(mp) || xfs_has_reflink(mp)) + if (!xfs_has_rtgroups(mp)) { + if (xfs_has_rmapbt(mp)) + goto out_unlock; + if (xfs_has_quota(mp)) + goto out_unlock; + } + if (xfs_has_reflink(mp)) goto out_unlock; error = xfs_sb_validate_fsb_count(&mp->m_sb, in->newblocks); diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 394fdf3bb53531..ecd5a9f444d862 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1767,12 +1767,6 @@ xfs_fs_fill_super( } } - if (xfs_has_rmapbt(mp) && mp->m_sb.sb_rblocks) { - xfs_alert(mp, - "reverse mapping btree not compatible with realtime device!"); - error = -EINVAL; - goto out_filestream_unmount; - } if (xfs_has_exchange_range(mp)) xfs_warn_experimental(mp, XFS_EXPERIMENTAL_EXCHRANGE);
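
Not part of the patch: the reworked feature check in xfs_growfs_rt() above can be modeled in userspace as a small decision function. This is only a sketch under stated assumptions. struct fs_features and growfs_rt_feature_check() are hypothetical names invented for illustration, and the boolean fields stand in for the xfs_has_rtgroups(), xfs_has_rmapbt(), xfs_has_quota() and xfs_has_reflink() predicates used in the hunk. It shows that rmapbt and quota are rejected only on filesystems without rtgroups, while reflink is still rejected unconditionally at this point in the series.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the feature predicates consulted by
 * xfs_growfs_rt(); not a kernel structure.
 */
struct fs_features {
	bool rtgroups;	/* models xfs_has_rtgroups(mp) */
	bool rmapbt;	/* models xfs_has_rmapbt(mp) */
	bool quota;	/* models xfs_has_quota(mp) */
	bool reflink;	/* models xfs_has_reflink(mp) */
};

/*
 * Mirror the post-patch check: returns 0 when growing the realtime
 * volume is permitted, -EOPNOTSUPP otherwise.
 */
int growfs_rt_feature_check(const struct fs_features *f)
{
	/* rmapbt and quota need rtgroups to work on the rt volume. */
	if (!f->rtgroups) {
		if (f->rmapbt)
			return -EOPNOTSUPP;
		if (f->quota)
			return -EOPNOTSUPP;
	}

	/* reflink is rejected regardless of rtgroups in this patch. */
	if (f->reflink)
		return -EOPNOTSUPP;

	return 0;
}
```

Under this model, a filesystem with both rtgroups and rmapbt passes the check, whereas the pre-patch condition (xfs_has_rmapbt(mp) || xfs_has_reflink(mp)) would have refused it.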