diff mbox series

[1/2] xfs: fix rmap key comparison functions

Message ID 166473481263.1084112.1077820948503334734.stgit@magnolia (mailing list archive)
State New, archived
Headers show
Series xfs: enhance btree key checking | expand

Commit Message

Darrick J. Wong Oct. 2, 2022, 6:20 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Keys for extent interval records in the reverse mapping btree are
supposed to be computed as follows:

(physical block, owner, fork, is_btree, offset)

This provides users the ability to look up a reverse mapping from a file
block mapping record -- start with the physical block; then if there are
multiple records for the same block, move on to the owner; then the
inode fork type; and so on to the file offset.

However, the key comparison functions incorrectly remove the fork/bmbt
information that's encoded in the on-disk offset.  This means that
lookup comparisons are only done with:

(physical block, owner, offset)

This means that queries can return incorrect results.  On consistent
filesystems this isn't an issue because bmbt blocks and blocks mapped to
an attr fork cannot be shared, but this prevents us from detecting
incorrect fork and bmbt flag bits in the rmap btree.

A previous version of this patch forgot to keep the (un)written state
flag masked during the comparison and caused a major regression in
5.9.x since unwritten extent conversion can update an rmap record
without requiring key updates.

Note that blocks cannot go directly from data fork to attr fork without
being deallocated and reallocated, nor can they be added to or removed
from a bmbt without a free/alloc cycle, so this should not cause any
regressions.

Found by fuzzing keys[1].attrfork = ones on xfs/371.

Fixes: 4b8ed67794fe ("xfs: add rmap btree operations")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rmap_btree.c |   25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

Comments

Dave Chinner Nov. 1, 2022, 11:40 p.m. UTC | #1
On Sun, Oct 02, 2022 at 11:20:12AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Keys for extent interval records in the reverse mapping btree are
> supposed to be computed as follows:
> 
> (physical block, owner, fork, is_btree, offset)
> 
> This provides users the ability to look up a reverse mapping from a file
> block mapping record -- start with the physical block; then if there are
> multiple records for the same block, move on to the owner; then the
> inode fork type; and so on to the file offset.
> 
> However, the key comparison functions incorrectly remove the fork/bmbt
> information that's encoded in the on-disk offset.  This means that
> lookup comparisons are only done with:
> 
> (physical block, owner, offset)
> 
> This means that queries can return incorrect results.  On consistent
> filesystems this isn't an issue because bmbt blocks and blocks mapped to
> an attr fork cannot be shared, but this prevents us from detecting
> incorrect fork and bmbt flag bits in the rmap btree.
> 
> A previous version of this patch forgot to keep the (un)written state
> flag masked during the comparison and caused a major regression in
> 5.9.x since unwritten extent conversion can update an rmap record
> without requiring key updates.
> 
> Note that blocks cannot go directly from data fork to attr fork without
> being deallocated and reallocated, nor can they be added to or removed
> from a bmbt without a free/alloc cycle, so this should not cause any
> regressions.
> 
> Found by fuzzing keys[1].attrfork = ones on xfs/371.
> 
> Fixes: 4b8ed67794fe ("xfs: add rmap btree operations")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_rmap_btree.c |   25 +++++++++++++++++--------
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
> index 7f83f62e51e0..e2e1f68cedf5 100644
> --- a/fs/xfs/libxfs/xfs_rmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_rmap_btree.c
> @@ -219,6 +219,15 @@ xfs_rmapbt_init_ptr_from_cur(
>  	ptr->s = agf->agf_roots[cur->bc_btnum];
>  }
>  
> +/*
> + * Fork and bmbt are significant parts of the rmap record key, but written
> + * status is merely a record attribute.
> + */
> +static inline uint64_t offset_keymask(uint64_t offset)
> +{
> +	return offset & ~XFS_RMAP_OFF_UNWRITTEN;
> +}

Ok. but doesn't that mean xfs_rmapbt_init_key_from_rec() and
xfs_rmapbt_init_high_key_from_rec() should be masking out the
XFS_RMAP_OFF_UNWRITTEN bit as well?

-Dave.
Darrick J. Wong Nov. 2, 2022, 11:53 p.m. UTC | #2
On Wed, Nov 02, 2022 at 10:40:22AM +1100, Dave Chinner wrote:
> On Sun, Oct 02, 2022 at 11:20:12AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Keys for extent interval records in the reverse mapping btree are
> > supposed to be computed as follows:
> > 
> > (physical block, owner, fork, is_btree, offset)
> > 
> > This provides users the ability to look up a reverse mapping from a file
> > block mapping record -- start with the physical block; then if there are
> > multiple records for the same block, move on to the owner; then the
> > inode fork type; and so on to the file offset.
> > 
> > However, the key comparison functions incorrectly remove the fork/bmbt
> > information that's encoded in the on-disk offset.  This means that
> > lookup comparisons are only done with:
> > 
> > (physical block, owner, offset)
> > 
> > This means that queries can return incorrect results.  On consistent
> > filesystems this isn't an issue because bmbt blocks and blocks mapped to
> > an attr fork cannot be shared, but this prevents us from detecting
> > incorrect fork and bmbt flag bits in the rmap btree.
> > 
> > A previous version of this patch forgot to keep the (un)written state
> > flag masked during the comparison and caused a major regression in
> > 5.9.x since unwritten extent conversion can update an rmap record
> > without requiring key updates.
> > 
> > Note that blocks cannot go directly from data fork to attr fork without
> > being deallocated and reallocated, nor can they be added to or removed
> > from a bmbt without a free/alloc cycle, so this should not cause any
> > regressions.
> > 
> > Found by fuzzing keys[1].attrfork = ones on xfs/371.
> > 
> > Fixes: 4b8ed67794fe ("xfs: add rmap btree operations")
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_rmap_btree.c |   25 +++++++++++++++++--------
> >  1 file changed, 17 insertions(+), 8 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
> > index 7f83f62e51e0..e2e1f68cedf5 100644
> > --- a/fs/xfs/libxfs/xfs_rmap_btree.c
> > +++ b/fs/xfs/libxfs/xfs_rmap_btree.c
> > @@ -219,6 +219,15 @@ xfs_rmapbt_init_ptr_from_cur(
> >  	ptr->s = agf->agf_roots[cur->bc_btnum];
> >  }
> >  
> > +/*
> > + * Fork and bmbt are significant parts of the rmap record key, but written
> > + * status is merely a record attribute.
> > + */
> > +static inline uint64_t offset_keymask(uint64_t offset)
> > +{
> > +	return offset & ~XFS_RMAP_OFF_UNWRITTEN;
> > +}
> 
> Ok. but doesn't that mean xfs_rmapbt_init_key_from_rec() and
> xfs_rmapbt_init_high_key_from_rec() should be masking out the
> XFS_RMAP_OFF_UNWRITTEN bit as well?

It ought to, but it might be too late for that because
_init_*key_from_rec have been letting the unwritten bit slip into the
rmap key structures since 4.8.  Somewhere out there is a filesystem with
rmapbt node blocks containing struct xfs_rmap_key's with that unwritten
bit set.  The best we can do is ignore it in the key comparison
function.

Let me think about this overnight though, because once we stop paying
attention to the unwritten bit for key comparisons, it might not matter
what's in the ondisk node blocks.

--D

> -Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 7f83f62e51e0..e2e1f68cedf5 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -219,6 +219,15 @@  xfs_rmapbt_init_ptr_from_cur(
 	ptr->s = agf->agf_roots[cur->bc_btnum];
 }
 
+/*
+ * Fork and bmbt are significant parts of the rmap record key, but written
+ * status is merely a record attribute.
+ */
+static inline uint64_t offset_keymask(uint64_t offset)
+{
+	return offset & ~XFS_RMAP_OFF_UNWRITTEN;
+}
+
 STATIC int64_t
 xfs_rmapbt_key_diff(
 	struct xfs_btree_cur		*cur,
@@ -240,8 +249,8 @@  xfs_rmapbt_key_diff(
 	else if (y > x)
 		return -1;
 
-	x = XFS_RMAP_OFF(be64_to_cpu(kp->rm_offset));
-	y = rec->rm_offset;
+	x = offset_keymask(be64_to_cpu(kp->rm_offset));
+	y = offset_keymask(xfs_rmap_irec_offset_pack(rec));
 	if (x > y)
 		return 1;
 	else if (y > x)
@@ -272,8 +281,8 @@  xfs_rmapbt_diff_two_keys(
 	else if (y > x)
 		return -1;
 
-	x = XFS_RMAP_OFF(be64_to_cpu(kp1->rm_offset));
-	y = XFS_RMAP_OFF(be64_to_cpu(kp2->rm_offset));
+	x = offset_keymask(be64_to_cpu(kp1->rm_offset));
+	y = offset_keymask(be64_to_cpu(kp2->rm_offset));
 	if (x > y)
 		return 1;
 	else if (y > x)
@@ -387,8 +396,8 @@  xfs_rmapbt_keys_inorder(
 		return 1;
 	else if (a > b)
 		return 0;
-	a = XFS_RMAP_OFF(be64_to_cpu(k1->rmap.rm_offset));
-	b = XFS_RMAP_OFF(be64_to_cpu(k2->rmap.rm_offset));
+	a = offset_keymask(be64_to_cpu(k1->rmap.rm_offset));
+	b = offset_keymask(be64_to_cpu(k2->rmap.rm_offset));
 	if (a <= b)
 		return 1;
 	return 0;
@@ -417,8 +426,8 @@  xfs_rmapbt_recs_inorder(
 		return 1;
 	else if (a > b)
 		return 0;
-	a = XFS_RMAP_OFF(be64_to_cpu(r1->rmap.rm_offset));
-	b = XFS_RMAP_OFF(be64_to_cpu(r2->rmap.rm_offset));
+	a = offset_keymask(be64_to_cpu(r1->rmap.rm_offset));
+	b = offset_keymask(be64_to_cpu(r2->rmap.rm_offset));
 	if (a <= b)
 		return 1;
 	return 0;