[v2] xfs_repair: handling a block with bad crc, bad uuid, and bad magic number needs fixing

Message ID 20250321142848.676719-2-bodonnel@redhat.com (mailing list archive)
State New
Series [v2] xfs_repair: handling a block with bad crc, bad uuid, and bad magic number needs fixing

Commit Message

Bill O'Donnell March 21, 2025, 2:28 p.m. UTC
From: Bill O'Donnell <bodonnel@redhat.com>

In certain cases, if a block is so messed up that crc, uuid and magic
number are all bad, we need to not only detect in phase3 but fix it
properly in phase6. In the current code, the mechanism doesn't work
in that it only pays attention to one of the parameters.

Note: in this case, the inode link count (nlink) drops to 1, but
re-running xfs_repair fixes it back to 2. This is a side effect that
should probably be handled in update_inode_nlinks() with a separate patch.
Regardless, running xfs_repair twice fixes the issue. Also, this patch
fixes the issue with v5, but not v4 xfs.

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>

v2: remove superfluous wantmagic logic

---
 repair/phase6.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Comments

Darrick J. Wong March 21, 2025, 3:27 p.m. UTC | #1
On Fri, Mar 21, 2025 at 09:28:49AM -0500, bodonnel@redhat.com wrote:
> From: Bill O'Donnell <bodonnel@redhat.com>
> 
> In certain cases, if a block is so messed up that crc, uuid and magic
> number are all bad, we need to not only detect in phase3 but fix it
> properly in phase6. In the current code, the mechanism doesn't work
> in that it only pays attention to one of the parameters.
> 
> Note: in this case, the nlink inode link count drops to 1, but
> re-running xfs_repair fixes it back to 2. This is a side effect that
> should probably be handled in update_inode_nlinks() with separate patch.
> Regardless, running xfs_repair twice fixes the issue. Also, this patch
> fixes the issue with v5, but not v4 xfs.
> 
> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>

That makes sense.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

Bonus question: does longform_dir2_check_leaf need a similar correction
for:

	if (leafhdr.magic == XFS_DIR3_LEAF1_MAGIC) {
		error = check_da3_header(mp, bp, ip->i_ino);
		if (error) {
			libxfs_buf_relse(bp);
			return error;
		}
	}

--D

Bill O'Donnell March 21, 2025, 8:36 p.m. UTC | #2
On Fri, Mar 21, 2025 at 08:27:25AM -0700, Darrick J. Wong wrote:
> On Fri, Mar 21, 2025 at 09:28:49AM -0500, bodonnel@redhat.com wrote:
> > From: Bill O'Donnell <bodonnel@redhat.com>
> > 
> > In certain cases, if a block is so messed up that crc, uuid and magic
> > number are all bad, we need to not only detect in phase3 but fix it
> > properly in phase6. In the current code, the mechanism doesn't work
> > in that it only pays attention to one of the parameters.
> > 
> > Note: in this case, the nlink inode link count drops to 1, but
> > re-running xfs_repair fixes it back to 2. This is a side effect that
> > should probably be handled in update_inode_nlinks() with separate patch.
> > Regardless, running xfs_repair twice fixes the issue. Also, this patch
> > fixes the issue with v5, but not v4 xfs.
> > 
> > Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
> 
> That makes sense.
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> 
> Bonus question: does longform_dir2_check_leaf need a similar correction
> for:
> 
> 	if (leafhdr.magic == XFS_DIR3_LEAF1_MAGIC) {
> 		error = check_da3_header(mp, bp, ip->i_ino);
> 		if (error) {
> 			libxfs_buf_relse(bp);
> 			return error;
> 		}
> 	}
> --D
> 

I believe so, yes. Basing the v4/v5 decisions on an assumed correct
magic number is not so good. I'll fix it in a new version or separate
patch if preferred.

Thanks-
Bill


Darrick J. Wong March 21, 2025, 8:39 p.m. UTC | #3
On Fri, Mar 21, 2025 at 03:36:39PM -0500, Bill O'Donnell wrote:
> On Fri, Mar 21, 2025 at 08:27:25AM -0700, Darrick J. Wong wrote:
> > On Fri, Mar 21, 2025 at 09:28:49AM -0500, bodonnel@redhat.com wrote:
> > > From: Bill O'Donnell <bodonnel@redhat.com>
> > > 
> > > In certain cases, if a block is so messed up that crc, uuid and magic
> > > number are all bad, we need to not only detect in phase3 but fix it
> > > properly in phase6. In the current code, the mechanism doesn't work
> > > in that it only pays attention to one of the parameters.
> > > 
> > > Note: in this case, the nlink inode link count drops to 1, but
> > > re-running xfs_repair fixes it back to 2. This is a side effect that
> > > should probably be handled in update_inode_nlinks() with separate patch.
> > > Regardless, running xfs_repair twice fixes the issue. Also, this patch
> > > fixes the issue with v5, but not v4 xfs.
> > > 
> > > Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
> > 
> > That makes sense.
> > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> > 
> > Bonus question: does longform_dir2_check_leaf need a similar correction
> > for:
> > 
> > 	if (leafhdr.magic == XFS_DIR3_LEAF1_MAGIC) {
> > 		error = check_da3_header(mp, bp, ip->i_ino);
> > 		if (error) {
> > 			libxfs_buf_relse(bp);
> > 			return error;
> > 		}
> > 	}
> > --D
> > 
> 
> I believe so, yes. Basing the v4/v5 decisions on an assumed correct
> magic number is not so good. I'll fix it in a new version or separate
> patch if preferred.

It's up to you, but since this fix has already earned its review, how
about a separate patch? :)

--D

Eric Sandeen March 21, 2025, 8:49 p.m. UTC | #4
On 3/21/25 9:28 AM, bodonnel@redhat.com wrote:
> From: Bill O'Donnell <bodonnel@redhat.com>
> 
> In certain cases, if a block is so messed up that crc, uuid and magic
> number are all bad, we need to not only detect in phase3 but fix it
> properly in phase6. In the current code, the mechanism doesn't work
> in that it only pays attention to one of the parameters.
> 
> Note: in this case, the nlink inode link count drops to 1, but
> re-running xfs_repair fixes it back to 2. This is a side effect that
> should probably be handled in update_inode_nlinks() with separate patch.
> Regardless, running xfs_repair twice fixes the issue. Also, this patch
> fixes the issue with v5, but not v4 xfs.

Nitpick: IIRC V4 filesystems do not have UUIDs in metadata blocks,
so I think this problem is unique to corrupted V5 filesystems.

-Eric
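
To illustrate the point about V4 metadata, here is a simplified model of
the two directory block header prefixes. The v5 field order is modelled on
struct xfs_dir3_blk_hdr; the struct names, the enum, and checks_available()
are hypothetical and exist only for this sketch, and plain host-endian
integers stand in for the big-endian on-disk types.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified v4 directory data header: only a magic number identifies it. */
struct v4_dir_hdr_model {
	uint32_t magic;
	/* the bestfree[] table follows in the real structure */
};

/* Simplified v5 header prefix (modelled on struct xfs_dir3_blk_hdr). */
struct v5_dir_hdr_model {
	uint32_t magic;    /* magic number */
	uint32_t crc;      /* CRC of the block */
	uint64_t blkno;    /* first block of the buffer */
	uint64_t lsn;      /* sequence number of last write */
	uint8_t  uuid[16]; /* filesystem the block belongs to */
	uint64_t owner;    /* inode that owns the block */
};

/* Hypothetical bitmask of self-describing fields that can be verified. */
enum hdr_check {
	CHECK_MAGIC = 1,
	CHECK_CRC   = 2,
	CHECK_UUID  = 4,
	CHECK_OWNER = 8,
};

/* Return which header fields exist to be checked for a given format. */
static unsigned int checks_available(int has_crc)
{
	if (!has_crc)
		return CHECK_MAGIC; /* v4: no crc, uuid or owner in the header */
	return CHECK_MAGIC | CHECK_CRC | CHECK_UUID | CHECK_OWNER;
}
```

In this model a v4 header has no uuid field at all, so a "bad crc, bad
uuid and bad magic" block can only occur on a v5 filesystem.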
Bill O'Donnell March 21, 2025, 9:57 p.m. UTC | #5
On Fri, Mar 21, 2025 at 03:49:59PM -0500, Eric Sandeen wrote:
> On 3/21/25 9:28 AM, bodonnel@redhat.com wrote:
> > From: Bill O'Donnell <bodonnel@redhat.com>
> > 
> > In certain cases, if a block is so messed up that crc, uuid and magic
> > number are all bad, we need to not only detect in phase3 but fix it
> > properly in phase6. In the current code, the mechanism doesn't work
> > in that it only pays attention to one of the parameters.
> > 
> > Note: in this case, the nlink inode link count drops to 1, but
> > re-running xfs_repair fixes it back to 2. This is a side effect that
> > should probably be handled in update_inode_nlinks() with separate patch.
> > Regardless, running xfs_repair twice fixes the issue. Also, this patch
> > fixes the issue with v5, but not v4 xfs.
> 
> Nitpick: IIRC V4 filesystems do not have UUIDs in metadata blocks,
> so I think this problem is unique to corrupted V5 filesystems.

Right. I'll send a patch version 3, just to clarify the message.

Thanks!
-Bill


Bill O'Donnell March 21, 2025, 11:57 p.m. UTC | #6
On Fri, Mar 21, 2025 at 01:39:14PM -0700, Darrick J. Wong wrote:
> On Fri, Mar 21, 2025 at 03:36:39PM -0500, Bill O'Donnell wrote:
> > On Fri, Mar 21, 2025 at 08:27:25AM -0700, Darrick J. Wong wrote:
> > > On Fri, Mar 21, 2025 at 09:28:49AM -0500, bodonnel@redhat.com wrote:
> > > > From: Bill O'Donnell <bodonnel@redhat.com>
> > > > 
> > > > In certain cases, if a block is so messed up that crc, uuid and magic
> > > > number are all bad, we need to not only detect in phase3 but fix it
> > > > properly in phase6. In the current code, the mechanism doesn't work
> > > > in that it only pays attention to one of the parameters.
> > > > 
> > > > Note: in this case, the nlink inode link count drops to 1, but
> > > > re-running xfs_repair fixes it back to 2. This is a side effect that
> > > > should probably be handled in update_inode_nlinks() with separate patch.
> > > > Regardless, running xfs_repair twice fixes the issue. Also, this patch
> > > > fixes the issue with v5, but not v4 xfs.
> > > > 
> > > > Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
> > > 
> > > That makes sense.
> > > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> > > 
> > > Bonus question: does longform_dir2_check_leaf need a similar correction
> > > for:
> > > 
> > > 	if (leafhdr.magic == XFS_DIR3_LEAF1_MAGIC) {
> > > 		error = check_da3_header(mp, bp, ip->i_ino);
> > > 		if (error) {
> > > 			libxfs_buf_relse(bp);
> > > 			return error;
> > > 		}
> > > 	}
> > > --D
> > > 
> > 
> > I believe so, yes. Basing the v4/v5 decisions on an assumed correct
> > magic number is not so good. I'll fix it in a new version or separate
> > patch if preferred.
> 
> It's up to you, but since this fix has already earned its review, how
> about a separate patch? :)

That's what I'll do. Thanks again for the review :)
-Bill


Patch

diff --git a/repair/phase6.c b/repair/phase6.c
index 4064a84b2450..9cffbb1f4510 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -2364,7 +2364,6 @@  longform_dir2_entry_check(
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		const struct xfs_buf_ops *ops;
 		int			 error;
-		struct xfs_dir2_data_hdr *d;
 
 		next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1;
 		if (bmap_next_offset(ip, &next_da_bno)) {
@@ -2404,9 +2403,7 @@  longform_dir2_entry_check(
 		}
 
 		/* check v5 metadata */
-		d = bp->b_addr;
-		if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
-		    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
+		if (xfs_has_crc(mp)) {
 			error = check_dir3_header(mp, bp, ino);
 			if (error) {
 				fixit++;
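
As an illustration of why the gate in the hunk above was changed, the
following stand-alone sketch contrasts the two conditions. It is not
xfs_repair code: struct mount_model, old_gate() and new_gate() are
hypothetical stand-ins for struct xfs_mount, the removed magic comparison,
and xfs_has_crc(mp). Only the two magic values are taken from the XFS
on-disk format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Magic values from the XFS on-disk format. */
#define XFS_DIR3_BLOCK_MAGIC 0x58444233u /* "XDB3" */
#define XFS_DIR3_DATA_MAGIC  0x58444433u /* "XDD3" */

/* Hypothetical stand-in for struct xfs_mount: just the feature bit. */
struct mount_model {
	bool has_crc; /* true on v5 filesystems */
};

/*
 * Old gate: decide whether to run the v5 header check by looking at the
 * magic number read from the block, which may itself be corrupt.
 */
static bool old_gate(uint32_t magic_on_disk)
{
	return magic_on_disk == XFS_DIR3_BLOCK_MAGIC ||
	       magic_on_disk == XFS_DIR3_DATA_MAGIC;
}

/*
 * New gate: decide from the filesystem feature set alone, so a corrupt
 * magic number cannot cause the header check to be skipped.
 */
static bool new_gate(const struct mount_model *mp)
{
	return mp->has_crc;
}
```

For a v5 filesystem whose block magic has been overwritten with garbage,
old_gate() returns false and the header check is skipped, while new_gate()
still returns true and the check runs.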