Message ID | 20250320202518.644182-1-bodonnel@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | xfs_repair: handling a block with bad crc, bad uuid, and bad magic number needs fixing | expand |
For anyone interested in evaluating this problem, here is a cleanroom'd reproducer metadump image that demonstrates it. For a single-block extent format directory ... xfs_db> inode 131 xfs_db> p ... 0:[0,15,1,0] ... xfs_db> fsblock 15 xfs_db> type dir3 we corrupt the magic, the uuid, and the crc of this dir3 block: xfs_db> write -c bhdr.hdr.uuid 0xdeadbeef Allowing write of corrupted data and bad CRC bhdr.hdr.uuid = 20000000-0000-0000-0000-0000deadbeef xfs_db> write -c bhdr.hdr.magic 0xfeedface Allowing write of corrupted data and bad CRC bhdr.hdr.magic = 0xfeedface xfs_db> quit and then xfs_repair fails to fix things up enough to pass the verifier when it tries to write out the buffer after "fixing" it: # xfs_repair foo.img Phase 1 - find and verify superblock... Phase 2 - using internal log - zero log... - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan and clear agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 Metadata CRC error detected at 0x5556997d5ca0, xfs_dir3_block block 0x78/0x1000 bad directory block magic # 0xfeedface in block 0 for directory inode 131 - agno = 1 - agno = 2 - agno = 3 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 0 - agno = 2 - agno = 1 - agno = 3 bad directory block magic # 0xfeedface in block 0 for directory inode 131 Phase 5 - rebuild AG headers and trees... - reset superblock... Phase 6 - check inode connectivity... - resetting contents of realtime bitmap and summary inodes - traversing filesystem ... bad directory block magic # 0xfeedface for directory inode 131 block 0: fixing magic # to 0x58444233 - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify and correct link counts... Metadata corruption detected at 0x5556997d5990, xfs_dir3_block block 0x78/0x1000 libxfs_bwrite: write verifier failed on xfs_dir3_block bno 0x78/0x8 xfs_repair: Releasing dirty buffer to free list! xfs_repair: Refusing to write a corrupt buffer to the data device! xfs_repair: Lost a write to the data device! fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair. This is because this sequence in longform_dir2_entry_check(): /* check v5 metadata */ d = bp->b_addr; if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC || be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) { error = check_dir3_header(mp, bp, ino); if (error) { fixit++; if (fmt == XFS_DIR2_FMT_BLOCK) goto out_fix; libxfs_buf_relse(bp); bp = NULL; continue; } } longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot, irec, ino_offset, bp, hashtab, &freetab, da_bno, fmt == XFS_DIR2_FMT_BLOCK); would have fixed the UUID had check_dir3_header found the error, but the magic was wrong so that never ran and fixit was never set. longform_dir2_entry_check_data then fixes the magic and the crc, but does not fix the UUID, so the verifier check fails on writeout. When all 3 items are bad, I'm not exactly sure what we should do to get the UUID fixed up here (or if it just should have been junked at that point) -Eric
On 3/20/25 3:25 PM, bodonnel@redhat.com wrote: > From: Bill O'Donnell <bodonnel@redhat.com> > > In certain cases, if a block is so messed up that crc, uuid and magic number are all > bad, we need to not only detect in phase3 but fix it properly in phase6. In the > current code, the mechanism doesn't work in that it only pays attention to one of the > parameters. > > Note: in this case, the nlink inode link count drops to 1, but re-running xfs_repair > fixes it back to 2. This is a side effect that should probably be handled in > update_inode_nlinks() with separate patch. Regardless, running xfs_repair twice > fixes the issue. Also, this patch fixes the issue with v5, but not v4 xfs. > > Signed-off-by: Bill O'Donnell <bodonnel@redhat.com> > --- > repair/phase6.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/repair/phase6.c b/repair/phase6.c > index 4064a84b2450..677505d45357 100644 > --- a/repair/phase6.c > +++ b/repair/phase6.c > @@ -2336,6 +2336,7 @@ longform_dir2_entry_check( > int fixit = 0; > struct xfs_da_args args; > int error; > + int wantmagic; > > *need_dot = 1; > freetab = malloc(FREETAB_SIZE(ip->i_disk_size / mp->m_dir_geo->blksize)); > @@ -2364,7 +2365,6 @@ longform_dir2_entry_check( > da_bno = (xfs_dablk_t)next_da_bno) { > const struct xfs_buf_ops *ops; > int error; > - struct xfs_dir2_data_hdr *d; > > next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1; > if (bmap_next_offset(ip, &next_da_bno)) { > @@ -2404,9 +2404,11 @@ longform_dir2_entry_check( > } > > /* check v5 metadata */ > - d = bp->b_addr; > - if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC || > - be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) { > + if (xfs_has_crc(mp)) > + wantmagic = XFS_DIR3_BLOCK_MAGIC; > + else > + wantmagic = XFS_DIR2_BLOCK_MAGIC; > + if (wantmagic == XFS_DIR3_BLOCK_MAGIC) { So I guess the prior 5 lines are equivalent to: /* check v5 metadata */ if (xfs_has_crc(mp)) { ... and that will force it to check the header, below. And then I think we hit the goto out_fix; line, and it moves the contents of this directory to lost+found (at least on my custom repro image.) Curious to see what others think is the right path through all this. -Eric > error = check_dir3_header(mp, bp, ino); > if (error) { > fixit++;
On Thu, Mar 20, 2025 at 05:21:22PM -0500, Eric Sandeen wrote: > For anyone interested in evaluating this problem, here is a cleanroom'd > reproducer metadump image that demonstrates it. Which makes me wonder again if we want these kinds of images stored in xfstests somehow (or figure out a way to create them by fuzzing)
On Thu, Mar 20, 2025 at 06:28:19PM -0500, Eric Sandeen wrote: > > - be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) { > > + if (xfs_has_crc(mp)) > > + wantmagic = XFS_DIR3_BLOCK_MAGIC; > > + else > > + wantmagic = XFS_DIR2_BLOCK_MAGIC; > > + if (wantmagic == XFS_DIR3_BLOCK_MAGIC) { > > So I guess the prior 5 lines are equivalent to: > > /* check v5 metadata */ > if (xfs_has_crc(mp)) { > ... > > and that will force it to check the header, below. And then I think we hit > the goto out_fix; line, and it moves the contents of this directory to > lost+found (at least on my custom repro image.) > > Curious to see what others think is the right path through all this. I'll need some time to actually understand the code, but replacing the wantmagic logic with just doign the xfs_has_crc check looks right and muche easier to follow.
diff --git a/repair/phase6.c b/repair/phase6.c index 4064a84b2450..677505d45357 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -2336,6 +2336,7 @@ longform_dir2_entry_check( int fixit = 0; struct xfs_da_args args; int error; + int wantmagic; *need_dot = 1; freetab = malloc(FREETAB_SIZE(ip->i_disk_size / mp->m_dir_geo->blksize)); @@ -2364,7 +2365,6 @@ longform_dir2_entry_check( da_bno = (xfs_dablk_t)next_da_bno) { const struct xfs_buf_ops *ops; int error; - struct xfs_dir2_data_hdr *d; next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1; if (bmap_next_offset(ip, &next_da_bno)) { @@ -2404,9 +2404,11 @@ longform_dir2_entry_check( } /* check v5 metadata */ - d = bp->b_addr; - if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC || - be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) { + if (xfs_has_crc(mp)) + wantmagic = XFS_DIR3_BLOCK_MAGIC; + else + wantmagic = XFS_DIR2_BLOCK_MAGIC; + if (wantmagic == XFS_DIR3_BLOCK_MAGIC) { error = check_dir3_header(mp, bp, ino); if (error) { fixit++;