Message ID | 20250321220532.691118-4-bodonnel@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v3] xfs_repair: handling a block with bad crc, bad uuid, and bad magic number needs fixing | expand |
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
On 3/21/25 5:05 PM, bodonnel@redhat.com wrote: > From: Bill O'Donnell <bodonnel@redhat.com> > > In certain cases, if a block is so messed up that crc, uuid and magic > number are all bad, we need to not only detect in phase3 but fix it > properly in phase6. In the current code, the mechanism doesn't work > in that it only pays attention to one of the parameters. > > Note: in this case, the nlink inode link count drops to 1, but > re-running xfs_repair fixes it back to 2. This is a side effect that > should probably be handled in update_inode_nlinks() with separate patch. > Regardless, running xfs_repair twice, with this patch applied > fixes the issue. Recognize that this patch is a fix for xfs v5. The reason this fix leaves the inode nlinks in an inconsistent state is because the dir is (was) a longform directory (XFS_DIR2_FMT_BLOCK), and we go down this path with your patch in place: /* check v5 metadata */ if (xfs_has_crc(mp)) { error = check_dir3_header(mp, bp, ino); if (error) { fixit++; if (fmt == XFS_DIR2_FMT_BLOCK) { <==== true goto out_fix; <=== goto } libxfs_buf_relse(bp); bp = NULL; continue; } } longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot, irec, ino_offset, bp, hashtab, &freetab, da_bno, fmt == XFS_DIR2_FMT_BLOCK); ... out_fix: if (!no_modify && (fixit || dotdot_update)) { longform_dir2_rebuild(mp, ino, ip, irec, ino_offset, hashtab); longform_dir2_rebuild tries to rebuild the directory from the entries found via longform_dir2_entry_check_data() and placed in hashtab, but because we never called longform_dir2_entry_check_data(), hashtab is empty. This is why all entries in the problematic dir end up in lost+found. That also means that longform_dir2_rebuild completes without adding any entries at all, and so the directory is now shortform. Because shortform directories have no explicit "." entry, I think it would need an extra ref added at this point. But I wonder - why not call longform_dir2_entry_check_data() before we check the header? That way it /will/ populate hashtab with any found entries in the block, and when the header is found to be corrupt, it will rebuild it with all entries intact, and leave nothing in lost+found. Thoughts? -Eric
On 3/23/25 10:51 AM, Eric Sandeen wrote: ... > But I wonder - why not call longform_dir2_entry_check_data() before we check > the header? That way it /will/ populate hashtab with any found entries in the > block, and when the header is found to be corrupt, it will rebuild it with all > entries intact, and leave nothing in lost+found. Or alternatively, since checking the header first makes intuitive sense, perhaps change the logic so that we still check the header first, set fixit++, and if we are in XFS_DIR2_FMT_BLOCK, allow longform_dir2_entry_check_data() to be called, and then if errors were found in the header, goto out_fix; at that point, with a populated hashtab. -Eric > > Thoughts? > -Eric > > > >
On Sun, Mar 23, 2025 at 10:51:52AM -0500, Eric Sandeen wrote: > On 3/21/25 5:05 PM, bodonnel@redhat.com wrote: > > From: Bill O'Donnell <bodonnel@redhat.com> > > > > In certain cases, if a block is so messed up that crc, uuid and magic > > number are all bad, we need to not only detect in phase3 but fix it > > properly in phase6. In the current code, the mechanism doesn't work > > in that it only pays attention to one of the parameters. > > > > Note: in this case, the nlink inode link count drops to 1, but > > re-running xfs_repair fixes it back to 2. This is a side effect that > > should probably be handled in update_inode_nlinks() with separate patch. > > Regardless, running xfs_repair twice, with this patch applied > > fixes the issue. Recognize that this patch is a fix for xfs v5. > > The reason this fix leaves the inode nlinks in an inconsistent state > is because the dir is (was) a longform directory (XFS_DIR2_FMT_BLOCK), > and we go down this path with your patch in place: > > /* check v5 metadata */ > if (xfs_has_crc(mp)) { > error = check_dir3_header(mp, bp, ino); > if (error) { > fixit++; > if (fmt == XFS_DIR2_FMT_BLOCK) { <==== true > goto out_fix; <=== goto > } > > libxfs_buf_relse(bp); > bp = NULL; > continue; > } > } > > longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot, > irec, ino_offset, bp, hashtab, > &freetab, da_bno, fmt == XFS_DIR2_FMT_BLOCK); > ... > > out_fix: > > if (!no_modify && (fixit || dotdot_update)) { > longform_dir2_rebuild(mp, ino, ip, irec, ino_offset, hashtab); > > > longform_dir2_rebuild tries to rebuild the directory from the entries found > via longform_dir2_entry_check_data() and placed in hashtab, but because we never > called longform_dir2_entry_check_data(), hashtab is empty. This is why all > entries in the problematic dir end up in lost+found. > > That also means that longform_dir2_rebuild completes without adding any entries > at all, and so the directory is now shortform. Because shortform directories > have no explicit "." entry, I think it would need an extra ref added at this > point. > > But I wonder - why not call longform_dir2_entry_check_data() before we check > the header? That way it /will/ populate hashtab with any found entries in the > block, and when the header is found to be corrupt, it will rebuild it with all > entries intact, and leave nothing in lost+found. Yes, this works as you describe. Thanks- Bill
diff --git a/repair/phase6.c b/repair/phase6.c index 4064a84b2450..9cffbb1f4510 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -2364,7 +2364,6 @@ longform_dir2_entry_check( da_bno = (xfs_dablk_t)next_da_bno) { const struct xfs_buf_ops *ops; int error; - struct xfs_dir2_data_hdr *d; next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1; if (bmap_next_offset(ip, &next_da_bno)) { @@ -2404,9 +2403,7 @@ longform_dir2_entry_check( } /* check v5 metadata */ - d = bp->b_addr; - if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC || - be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) { + if (xfs_has_crc(mp)) { error = check_dir3_header(mp, bp, ino); if (error) { fixit++;