diff mbox series

xfs_repair: handling a block with bad crc, bad uuid, and bad magic number needs fixing

Message ID 20250320202518.644182-1-bodonnel@redhat.com (mailing list archive)
State New
Headers show
Series xfs_repair: handling a block with bad crc, bad uuid, and bad magic number needs fixing | expand

Commit Message

Bill O'Donnell March 20, 2025, 8:25 p.m. UTC
From: Bill O'Donnell <bodonnel@redhat.com>

In certain cases, if a block is so messed up that crc, uuid and magic number are all
bad, we need to not only detect in phase3 but fix it properly in phase6. In the
current code, the mechanism doesn't work in that it only pays attention to one of the
parameters.

Note: in this case, the nlink inode link count drops to 1, but re-running xfs_repair
fixes it back to 2. This is a side effect that should probably be handled in
update_inode_nlinks() with separate patch. Regardless, running xfs_repair twice
fixes the issue. Also, this patch fixes the issue with v5, but not v4 xfs.

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
---
 repair/phase6.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Comments

Eric Sandeen March 20, 2025, 10:21 p.m. UTC | #1
For anyone interested in evaluating this problem, here is a cleanroom'd
reproducer metadump image that demonstrates it.

For a single-block extent format directory ...

xfs_db> inode 131
xfs_db> p
...
0:[0,15,1,0]
...
xfs_db> fsblock 15
xfs_db> type dir3

we corrupt the magic, the uuid, and the crc of this dir3 block:

xfs_db> write -c bhdr.hdr.uuid 0xdeadbeef
Allowing write of corrupted data and bad CRC
bhdr.hdr.uuid = 20000000-0000-0000-0000-0000deadbeef
xfs_db> write -c bhdr.hdr.magic 0xfeedface
Allowing write of corrupted data and bad CRC
bhdr.hdr.magic = 0xfeedface
xfs_db> quit

and then xfs_repair fails to fix things up enough to pass the verifier
when it tries to write out the buffer after "fixing" it:

# xfs_repair foo.img
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
Metadata CRC error detected at 0x5556997d5ca0, xfs_dir3_block block 0x78/0x1000
bad directory block magic # 0xfeedface in block 0 for directory inode 131
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
bad directory block magic # 0xfeedface in block 0 for directory inode 131
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
bad directory block magic # 0xfeedface for directory inode 131 block 0: fixing magic # to 0x58444233
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Metadata corruption detected at 0x5556997d5990, xfs_dir3_block block 0x78/0x1000
libxfs_bwrite: write verifier failed on xfs_dir3_block bno 0x78/0x8
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.

This is because this sequence in longform_dir2_entry_check():

                /* check v5 metadata */
                d = bp->b_addr;
                if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
                    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
                        error = check_dir3_header(mp, bp, ino);
                        if (error) {
                                fixit++;
                                if (fmt == XFS_DIR2_FMT_BLOCK)
                                        goto out_fix;

                                libxfs_buf_relse(bp);
                                bp = NULL;
                                continue;
                        }
                }

                longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot,
                                irec, ino_offset, bp, hashtab,
                                &freetab, da_bno, fmt == XFS_DIR2_FMT_BLOCK);

would have fixed the UUID had check_dir3_header found the error, but the magic
was wrong so that never ran and fixit was never set.

longform_dir2_entry_check_data then fixes the magic and the crc, but does not
fix the UUID, so the verifier check fails on writeout.

When all 3 items are bad, I'm not exactly sure what we should do to get the
UUID fixed up here (or if it just should have been junked at that point)

-Eric
Eric Sandeen March 20, 2025, 11:28 p.m. UTC | #2
On 3/20/25 3:25 PM, bodonnel@redhat.com wrote:
> From: Bill O'Donnell <bodonnel@redhat.com>
> 
> In certain cases, if a block is so messed up that crc, uuid and magic number are all
> bad, we need to not only detect in phase3 but fix it properly in phase6. In the
> current code, the mechanism doesn't work in that it only pays attention to one of the
> parameters.
> 
> Note: in this case, the nlink inode link count drops to 1, but re-running xfs_repair
> fixes it back to 2. This is a side effect that should probably be handled in
> update_inode_nlinks() with separate patch. Regardless, running xfs_repair twice
> fixes the issue. Also, this patch fixes the issue with v5, but not v4 xfs.
> 
> Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
> ---
>  repair/phase6.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/repair/phase6.c b/repair/phase6.c
> index 4064a84b2450..677505d45357 100644
> --- a/repair/phase6.c
> +++ b/repair/phase6.c
> @@ -2336,6 +2336,7 @@ longform_dir2_entry_check(
>  	int			fixit = 0;
>  	struct xfs_da_args	args;
>  	int			error;
> +	int			wantmagic;
>  
>  	*need_dot = 1;
>  	freetab = malloc(FREETAB_SIZE(ip->i_disk_size / mp->m_dir_geo->blksize));
> @@ -2364,7 +2365,6 @@ longform_dir2_entry_check(
>  	     da_bno = (xfs_dablk_t)next_da_bno) {
>  		const struct xfs_buf_ops *ops;
>  		int			 error;
> -		struct xfs_dir2_data_hdr *d;
>  
>  		next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1;
>  		if (bmap_next_offset(ip, &next_da_bno)) {
> @@ -2404,9 +2404,11 @@ longform_dir2_entry_check(
>  		}
>  
>  		/* check v5 metadata */
> -		d = bp->b_addr;
> -		if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
> -		    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
> +		if (xfs_has_crc(mp))
> +			wantmagic = XFS_DIR3_BLOCK_MAGIC;
> +		else
> +			wantmagic = XFS_DIR2_BLOCK_MAGIC;
> +		if (wantmagic == XFS_DIR3_BLOCK_MAGIC) {

So I guess the prior 5 lines are equivalent to:

		/* check v5 metadata */
		if (xfs_has_crc(mp)) {
...

and that will force it to check the header, below. And then I think we hit
the goto out_fix; line, and it moves the contents of this directory to
lost+found (at least on my custom repro image.)

Curious to see what others think is the right path through all this.

-Eric

>  			error = check_dir3_header(mp, bp, ino);
>  			if (error) {
>  				fixit++;
Christoph Hellwig March 21, 2025, 7:32 a.m. UTC | #3
On Thu, Mar 20, 2025 at 05:21:22PM -0500, Eric Sandeen wrote:
> For anyone interested in evaluating this problem, here is a cleanroom'd
> reproducer metadump image that demonstrates it.

Which makes me wonder again if we want these kinds of images stored
in xfstests somehow (or figure out a way to create them by fuzzing)
Christoph Hellwig March 21, 2025, 7:35 a.m. UTC | #4
On Thu, Mar 20, 2025 at 06:28:19PM -0500, Eric Sandeen wrote:
> > -		    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
> > +		if (xfs_has_crc(mp))
> > +			wantmagic = XFS_DIR3_BLOCK_MAGIC;
> > +		else
> > +			wantmagic = XFS_DIR2_BLOCK_MAGIC;
> > +		if (wantmagic == XFS_DIR3_BLOCK_MAGIC) {
> 
> So I guess the prior 5 lines are equivalent to:
> 
> 		/* check v5 metadata */
> 		if (xfs_has_crc(mp)) {
> ...
> 
> and that will force it to check the header, below. And then I think we hit
> the goto out_fix; line, and it moves the contents of this directory to
> lost+found (at least on my custom repro image.)
> 
> Curious to see what others think is the right path through all this.

I'll need some time to actually understand the code, but replacing the
wantmagic logic with just doign the xfs_has_crc check looks right and
muche easier to follow.
diff mbox series

Patch

diff --git a/repair/phase6.c b/repair/phase6.c
index 4064a84b2450..677505d45357 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -2336,6 +2336,7 @@  longform_dir2_entry_check(
 	int			fixit = 0;
 	struct xfs_da_args	args;
 	int			error;
+	int			wantmagic;
 
 	*need_dot = 1;
 	freetab = malloc(FREETAB_SIZE(ip->i_disk_size / mp->m_dir_geo->blksize));
@@ -2364,7 +2365,6 @@  longform_dir2_entry_check(
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		const struct xfs_buf_ops *ops;
 		int			 error;
-		struct xfs_dir2_data_hdr *d;
 
 		next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1;
 		if (bmap_next_offset(ip, &next_da_bno)) {
@@ -2404,9 +2404,11 @@  longform_dir2_entry_check(
 		}
 
 		/* check v5 metadata */
-		d = bp->b_addr;
-		if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
-		    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
+		if (xfs_has_crc(mp))
+			wantmagic = XFS_DIR3_BLOCK_MAGIC;
+		else
+			wantmagic = XFS_DIR2_BLOCK_MAGIC;
+		if (wantmagic == XFS_DIR3_BLOCK_MAGIC) {
 			error = check_dir3_header(mp, bp, ino);
 			if (error) {
 				fixit++;