Message ID | 20250321142848.676719-2-bodonnel@redhat.com (mailing list archive) |
---|---|
State | New |
Series | [v2] xfs_repair: handling a block with bad crc, bad uuid, and bad magic number needs fixing |
On Fri, Mar 21, 2025 at 09:28:49AM -0500, bodonnel@redhat.com wrote:
> From: Bill O'Donnell <bodonnel@redhat.com>
>
> In certain cases, if a block is so messed up that crc, uuid and magic
> number are all bad, we need to not only detect in phase3 but fix it
> properly in phase6. In the current code, the mechanism doesn't work
> in that it only pays attention to one of the parameters.
>
> Note: in this case, the nlink inode link count drops to 1, but
> re-running xfs_repair fixes it back to 2. This is a side effect that
> should probably be handled in update_inode_nlinks() with separate patch.
> Regardless, running xfs_repair twice fixes the issue. Also, this patch
> fixes the issue with v5, but not v4 xfs.
>
> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>

That makes sense.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

Bonus question: does longform_dir2_check_leaf need a similar correction
for:

	if (leafhdr.magic == XFS_DIR3_LEAF1_MAGIC) {
		error = check_da3_header(mp, bp, ip->i_ino);
		if (error) {
			libxfs_buf_relse(bp);
			return error;
		}
	}

--D

>
> v2: remove superfluous wantmagic logic
>
> ---
>  repair/phase6.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/repair/phase6.c b/repair/phase6.c
> index 4064a84b2450..9cffbb1f4510 100644
> --- a/repair/phase6.c
> +++ b/repair/phase6.c
> @@ -2364,7 +2364,6 @@ longform_dir2_entry_check(
>  	     da_bno = (xfs_dablk_t)next_da_bno) {
>  		const struct xfs_buf_ops *ops;
>  		int error;
> -		struct xfs_dir2_data_hdr *d;
>
>  		next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1;
>  		if (bmap_next_offset(ip, &next_da_bno)) {
> @@ -2404,9 +2403,7 @@ longform_dir2_entry_check(
>  		}
>
>  		/* check v5 metadata */
> -		d = bp->b_addr;
> -		if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
> -		    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
> +		if (xfs_has_crc(mp)) {
>  			error = check_dir3_header(mp, bp, ino);
>  			if (error) {
>  				fixit++;
> --
> 2.48.1
>
>
On Fri, Mar 21, 2025 at 08:27:25AM -0700, Darrick J. Wong wrote:
> On Fri, Mar 21, 2025 at 09:28:49AM -0500, bodonnel@redhat.com wrote:
> > From: Bill O'Donnell <bodonnel@redhat.com>
> >
> > In certain cases, if a block is so messed up that crc, uuid and magic
> > number are all bad, we need to not only detect in phase3 but fix it
> > properly in phase6. In the current code, the mechanism doesn't work
> > in that it only pays attention to one of the parameters.
> >
> > Note: in this case, the nlink inode link count drops to 1, but
> > re-running xfs_repair fixes it back to 2. This is a side effect that
> > should probably be handled in update_inode_nlinks() with separate patch.
> > Regardless, running xfs_repair twice fixes the issue. Also, this patch
> > fixes the issue with v5, but not v4 xfs.
> >
> > Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
>
> That makes sense.
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
>
> Bonus question: does longform_dir2_check_leaf need a similar correction
> for:
>
> 	if (leafhdr.magic == XFS_DIR3_LEAF1_MAGIC) {
> 		error = check_da3_header(mp, bp, ip->i_ino);
> 		if (error) {
> 			libxfs_buf_relse(bp);
> 			return error;
> 		}
> 	}
> --D
>

I believe so, yes. Basing the v4/v5 decisions on an assumed correct
magic number is not so good. I'll fix it in a new version or separate
patch if preferred.

Thanks-
Bill

> >
> > v2: remove superfluous wantmagic logic
> >
> > ---
> >  repair/phase6.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/repair/phase6.c b/repair/phase6.c
> > index 4064a84b2450..9cffbb1f4510 100644
> > --- a/repair/phase6.c
> > +++ b/repair/phase6.c
> > @@ -2364,7 +2364,6 @@ longform_dir2_entry_check(
> >  	     da_bno = (xfs_dablk_t)next_da_bno) {
> >  		const struct xfs_buf_ops *ops;
> >  		int error;
> > -		struct xfs_dir2_data_hdr *d;
> >
> >  		next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1;
> >  		if (bmap_next_offset(ip, &next_da_bno)) {
> > @@ -2404,9 +2403,7 @@ longform_dir2_entry_check(
> >  		}
> >
> >  		/* check v5 metadata */
> > -		d = bp->b_addr;
> > -		if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
> > -		    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
> > +		if (xfs_has_crc(mp)) {
> >  			error = check_dir3_header(mp, bp, ino);
> >  			if (error) {
> >  				fixit++;
> > --
> > 2.48.1
> >
> >
>
On Fri, Mar 21, 2025 at 03:36:39PM -0500, Bill O'Donnell wrote:
> On Fri, Mar 21, 2025 at 08:27:25AM -0700, Darrick J. Wong wrote:
> > On Fri, Mar 21, 2025 at 09:28:49AM -0500, bodonnel@redhat.com wrote:
> > > From: Bill O'Donnell <bodonnel@redhat.com>
> > >
> > > In certain cases, if a block is so messed up that crc, uuid and magic
> > > number are all bad, we need to not only detect in phase3 but fix it
> > > properly in phase6. In the current code, the mechanism doesn't work
> > > in that it only pays attention to one of the parameters.
> > >
> > > Note: in this case, the nlink inode link count drops to 1, but
> > > re-running xfs_repair fixes it back to 2. This is a side effect that
> > > should probably be handled in update_inode_nlinks() with separate patch.
> > > Regardless, running xfs_repair twice fixes the issue. Also, this patch
> > > fixes the issue with v5, but not v4 xfs.
> > >
> > > Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
> >
> > That makes sense.
> > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> >
> > Bonus question: does longform_dir2_check_leaf need a similar correction
> > for:
> >
> > 	if (leafhdr.magic == XFS_DIR3_LEAF1_MAGIC) {
> > 		error = check_da3_header(mp, bp, ip->i_ino);
> > 		if (error) {
> > 			libxfs_buf_relse(bp);
> > 			return error;
> > 		}
> > 	}
> > --D
> >
>
> I believe so, yes. Basing the v4/v5 decisions on an assumed correct
> magic number is not so good. I'll fix it in a new version or separate
> patch if preferred.

It's up to you, but since this fix has already earned its review, how
about a separate patch? :)

--D

> Thanks-
> Bill
>
> > >
> > > v2: remove superfluous wantmagic logic
> > >
> > > ---
> > >  repair/phase6.c | 5 +----
> > >  1 file changed, 1 insertion(+), 4 deletions(-)
> > >
> > > diff --git a/repair/phase6.c b/repair/phase6.c
> > > index 4064a84b2450..9cffbb1f4510 100644
> > > --- a/repair/phase6.c
> > > +++ b/repair/phase6.c
> > > @@ -2364,7 +2364,6 @@ longform_dir2_entry_check(
> > >  	     da_bno = (xfs_dablk_t)next_da_bno) {
> > >  		const struct xfs_buf_ops *ops;
> > >  		int error;
> > > -		struct xfs_dir2_data_hdr *d;
> > >
> > >  		next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1;
> > >  		if (bmap_next_offset(ip, &next_da_bno)) {
> > > @@ -2404,9 +2403,7 @@ longform_dir2_entry_check(
> > >  		}
> > >
> > >  		/* check v5 metadata */
> > > -		d = bp->b_addr;
> > > -		if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
> > > -		    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
> > > +		if (xfs_has_crc(mp)) {
> > >  			error = check_dir3_header(mp, bp, ino);
> > >  			if (error) {
> > >  				fixit++;
> > > --
> > > 2.48.1
> > >
> > >
> >
>
On 3/21/25 9:28 AM, bodonnel@redhat.com wrote:
> From: Bill O'Donnell <bodonnel@redhat.com>
>
> In certain cases, if a block is so messed up that crc, uuid and magic
> number are all bad, we need to not only detect in phase3 but fix it
> properly in phase6. In the current code, the mechanism doesn't work
> in that it only pays attention to one of the parameters.
>
> Note: in this case, the nlink inode link count drops to 1, but
> re-running xfs_repair fixes it back to 2. This is a side effect that
> should probably be handled in update_inode_nlinks() with separate patch.
> Regardless, running xfs_repair twice fixes the issue. Also, this patch
> fixes the issue with v5, but not v4 xfs.

Nitpick: IIRC V4 filesystems do not have UUIDs in metadata blocks,
so I think this problem is unique to corrupted V5 filesystems.

-Eric
On Fri, Mar 21, 2025 at 03:49:59PM -0500, Eric Sandeen wrote:
> On 3/21/25 9:28 AM, bodonnel@redhat.com wrote:
> > From: Bill O'Donnell <bodonnel@redhat.com>
> >
> > In certain cases, if a block is so messed up that crc, uuid and magic
> > number are all bad, we need to not only detect in phase3 but fix it
> > properly in phase6. In the current code, the mechanism doesn't work
> > in that it only pays attention to one of the parameters.
> >
> > Note: in this case, the nlink inode link count drops to 1, but
> > re-running xfs_repair fixes it back to 2. This is a side effect that
> > should probably be handled in update_inode_nlinks() with separate patch.
> > Regardless, running xfs_repair twice fixes the issue. Also, this patch
> > fixes the issue with v5, but not v4 xfs.
>
> Nitpick: IIRC V4 filesystems do not have UUIDs in metadata blocks,
> so I think this problem is unique to corrupted V5 filesystems.

Right. I'll send a patch version 3, just to clarify the message.

Thanks!
-Bill

>
> -Eric
>
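To illustrate Eric's point about where the UUID lives, here is a minimal sketch of the two header layouts. The `sketch_*` names are hypothetical stand-ins, not the real xfsprogs types, and the fields are plain host-order integers, whereas the on-disk fields are big-endian. The v5 field order follows the dir3 block header as I understand it from xfs_da_format.h; treat the comparison helper as an assumption about what a UUID check looks like, not as the code in check_dir3_header().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* v4 directory data block: only a magic number identifies the block. */
struct sketch_dir2_hdr {
	uint32_t magic;
};

/* v5 directory block header: self-describing metadata, 48 bytes. */
struct sketch_dir3_blk_hdr {
	uint32_t magic;    /* block type */
	uint32_t crc;      /* checksum of the block */
	uint64_t blkno;    /* where the block believes it lives */
	uint64_t lsn;      /* last write sequence number */
	uint8_t  uuid[16]; /* filesystem this block belongs to */
	uint64_t owner;    /* inode that owns the block */
};

/*
 * Hypothetical helper: true when the UUID stamped in a v5 header
 * matches the filesystem's UUID. There is no v4 equivalent because
 * the v4 header has no uuid field to compare.
 */
static bool sketch_uuid_matches(const struct sketch_dir3_blk_hdr *hdr,
				const uint8_t fs_uuid[16])
{
	assert(hdr != NULL);
	return memcmp(hdr->uuid, fs_uuid, sizeof(hdr->uuid)) == 0;
}
```

Because a v4 block carries neither a CRC nor a UUID, the "bad crc, bad uuid, and bad magic" combination described in the patch can only arise on a v5 filesystem.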
On Fri, Mar 21, 2025 at 01:39:14PM -0700, Darrick J. Wong wrote:
> On Fri, Mar 21, 2025 at 03:36:39PM -0500, Bill O'Donnell wrote:
> > On Fri, Mar 21, 2025 at 08:27:25AM -0700, Darrick J. Wong wrote:
> > > On Fri, Mar 21, 2025 at 09:28:49AM -0500, bodonnel@redhat.com wrote:
> > > > From: Bill O'Donnell <bodonnel@redhat.com>
> > > >
> > > > In certain cases, if a block is so messed up that crc, uuid and magic
> > > > number are all bad, we need to not only detect in phase3 but fix it
> > > > properly in phase6. In the current code, the mechanism doesn't work
> > > > in that it only pays attention to one of the parameters.
> > > >
> > > > Note: in this case, the nlink inode link count drops to 1, but
> > > > re-running xfs_repair fixes it back to 2. This is a side effect that
> > > > should probably be handled in update_inode_nlinks() with separate patch.
> > > > Regardless, running xfs_repair twice fixes the issue. Also, this patch
> > > > fixes the issue with v5, but not v4 xfs.
> > > >
> > > > Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
> > >
> > > That makes sense.
> > > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> > >
> > > Bonus question: does longform_dir2_check_leaf need a similar correction
> > > for:
> > >
> > > 	if (leafhdr.magic == XFS_DIR3_LEAF1_MAGIC) {
> > > 		error = check_da3_header(mp, bp, ip->i_ino);
> > > 		if (error) {
> > > 			libxfs_buf_relse(bp);
> > > 			return error;
> > > 		}
> > > 	}
> > > --D
> > >
> >
> > I believe so, yes. Basing the v4/v5 decisions on an assumed correct
> > magic number is not so good. I'll fix it in a new version or separate
> > patch if preferred.
>
> It's up to you, but since this fix has already earned its review, how
> about a separate patch? :)

That's what I'll do.

Thanks again for the review :)
-Bill

>
> --D
>
> > Thanks-
> > Bill
> >
> > > >
> > > > v2: remove superfluous wantmagic logic
> > > >
> > > > ---
> > > >  repair/phase6.c | 5 +----
> > > >  1 file changed, 1 insertion(+), 4 deletions(-)
> > > >
> > > > diff --git a/repair/phase6.c b/repair/phase6.c
> > > > index 4064a84b2450..9cffbb1f4510 100644
> > > > --- a/repair/phase6.c
> > > > +++ b/repair/phase6.c
> > > > @@ -2364,7 +2364,6 @@ longform_dir2_entry_check(
> > > >  	     da_bno = (xfs_dablk_t)next_da_bno) {
> > > >  		const struct xfs_buf_ops *ops;
> > > >  		int error;
> > > > -		struct xfs_dir2_data_hdr *d;
> > > >
> > > >  		next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1;
> > > >  		if (bmap_next_offset(ip, &next_da_bno)) {
> > > > @@ -2404,9 +2403,7 @@ longform_dir2_entry_check(
> > > >  		}
> > > >
> > > >  		/* check v5 metadata */
> > > > -		d = bp->b_addr;
> > > > -		if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
> > > > -		    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
> > > > +		if (xfs_has_crc(mp)) {
> > > >  			error = check_dir3_header(mp, bp, ino);
> > > >  			if (error) {
> > > >  				fixit++;
> > > > --
> > > > 2.48.1
> > > >
> > > >
> > >
> >
>
diff --git a/repair/phase6.c b/repair/phase6.c
index 4064a84b2450..9cffbb1f4510 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -2364,7 +2364,6 @@ longform_dir2_entry_check(
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		const struct xfs_buf_ops *ops;
 		int error;
-		struct xfs_dir2_data_hdr *d;

 		next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1;
 		if (bmap_next_offset(ip, &next_da_bno)) {
@@ -2404,9 +2403,7 @@ longform_dir2_entry_check(
 		}

 		/* check v5 metadata */
-		d = bp->b_addr;
-		if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
-		    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
+		if (xfs_has_crc(mp)) {
 			error = check_dir3_header(mp, bp, ino);
 			if (error) {
 				fixit++;
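
The behavioural change in the diff above can be sketched in isolation. This is a simplified model, not the xfs_repair code: `sketch_mount`, `sketch_block` and both functions are hypothetical stand-ins for the mount structure, the buffer contents and the two `if` conditions, and the magic is assumed to be already converted to CPU byte order. The two magic values are the v5 directory block constants as I understand them from xfs_da_format.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* v5 directory block magic numbers ("XDB3" and "XDD3"). */
#define XFS_DIR3_BLOCK_MAGIC	0x58444233u
#define XFS_DIR3_DATA_MAGIC	0x58444433u

/* Hypothetical stand-in for the mount: only the CRC feature bit. */
struct sketch_mount {
	bool has_crc;
};

/* Hypothetical stand-in for a directory block read from disk. */
struct sketch_block {
	uint32_t magic;
};

/*
 * Old condition: run the v5 header check only if the on-disk magic
 * already looks like a v5 directory block. A block whose magic is
 * corrupt never reaches the check.
 */
static bool old_runs_v5_check(const struct sketch_mount *mp,
			      const struct sketch_block *blk)
{
	(void)mp;
	assert(blk != NULL);
	return blk->magic == XFS_DIR3_BLOCK_MAGIC ||
	       blk->magic == XFS_DIR3_DATA_MAGIC;
}

/*
 * New condition: decide from the filesystem feature bit, which does
 * not depend on the contents of the block being examined.
 */
static bool new_runs_v5_check(const struct sketch_mount *mp,
			      const struct sketch_block *blk)
{
	(void)blk;
	assert(mp != NULL);
	return mp->has_crc;
}
```

With a v5 mount and a block whose magic has been overwritten, `old_runs_v5_check()` returns false and the header is never validated, while `new_runs_v5_check()` returns true, so the header check runs and the block can be flagged for repair. On a v4 mount the new condition is false regardless of block contents, which matches the note that the patch addresses v5 only.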