diff mbox series

[v3] xfs_repair: handling a block with bad crc, bad uuid, and bad magic number needs fixing

Message ID 20250321220532.691118-4-bodonnel@redhat.com (mailing list archive)
State New
Headers show
Series [v3] xfs_repair: handling a block with bad crc, bad uuid, and bad magic number needs fixing | expand

Commit Message

Bill O'Donnell March 21, 2025, 10:05 p.m. UTC
From: Bill O'Donnell <bodonnel@redhat.com>

In certain cases, if a block is so messed up that crc, uuid and magic
number are all bad, we need to not only detect in phase3 but fix it
properly in phase6. In the current code, the mechanism doesn't work
in that it only pays attention to one of the parameters.

Note: in this case, the nlink inode link count drops to 1, but
re-running xfs_repair fixes it back to 2. This is a side effect that
should probably be handled in update_inode_nlinks() with separate patch.
Regardless, running xfs_repair twice, with this patch applied
fixes the issue. Recognize that this patch is a fix for xfs v5.

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

v2: remove superfluous needmagic logic
v3: clarify the description

---
 repair/phase6.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Comments

Christoph Hellwig March 23, 2025, 6:31 a.m. UTC | #1
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Eric Sandeen March 23, 2025, 3:51 p.m. UTC | #2
On 3/21/25 5:05 PM, bodonnel@redhat.com wrote:
> From: Bill O'Donnell <bodonnel@redhat.com>
> 
> In certain cases, if a block is so messed up that crc, uuid and magic
> number are all bad, we need to not only detect in phase3 but fix it
> properly in phase6. In the current code, the mechanism doesn't work
> in that it only pays attention to one of the parameters.
> 
> Note: in this case, the nlink inode link count drops to 1, but
> re-running xfs_repair fixes it back to 2. This is a side effect that
> should probably be handled in update_inode_nlinks() with separate patch.
> Regardless, running xfs_repair twice, with this patch applied
> fixes the issue. Recognize that this patch is a fix for xfs v5.

The reason this fix leaves the inode nlinks in an inconsistent state
is because the dir is (was) a longform directory (XFS_DIR2_FMT_BLOCK),
and we go down this path with your patch in place:

                /* check v5 metadata */
                if (xfs_has_crc(mp)) {
                        error = check_dir3_header(mp, bp, ino);
                        if (error) {
                                fixit++;
                                if (fmt == XFS_DIR2_FMT_BLOCK) { <==== true
                                        goto out_fix;	<=== goto
                                }

                                libxfs_buf_relse(bp);
                                bp = NULL;
                                continue;
                        }
                }

                longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot,
                                irec, ino_offset, bp, hashtab,
                                &freetab, da_bno, fmt == XFS_DIR2_FMT_BLOCK);
...

out_fix:

        if (!no_modify && (fixit || dotdot_update)) {
                longform_dir2_rebuild(mp, ino, ip, irec, ino_offset, hashtab);


longform_dir2_rebuild tries to rebuild the directory from the entries found
via longform_dir2_entry_check_data() and placed in hashtab, but because we never
called longform_dir2_entry_check_data(), hashtab is empty. This is why all
entries in the problematic dir end up in lost+found.

That also means that longform_dir2_rebuild completes without adding any entries
at all, and so the directory is now shortform. Because shortform directories 
have no explicit "." entry, I think it would need an extra ref added at this
point.

But I wonder - why not call longform_dir2_entry_check_data() before we check
the header? That way it /will/ populate hashtab with any found entries in the
block, and when the header is found to be corrupt, it will rebuild it with all
entries intact, and leave nothing in lost+found.


Thoughts?
-Eric
Eric Sandeen March 23, 2025, 4:10 p.m. UTC | #3
On 3/23/25 10:51 AM, Eric Sandeen wrote:

...

> But I wonder - why not call longform_dir2_entry_check_data() before we check
> the header? That way it /will/ populate hashtab with any found entries in the
> block, and when the header is found to be corrupt, it will rebuild it with all
> entries intact, and leave nothing in lost+found.

Or alternatively, since checking the header first makes intuitive sense, perhaps
change the logic so that we still check the header first, set fixit++, and if we
are in XFS_DIR2_FMT_BLOCK, allow longform_dir2_entry_check_data() to be called,
and then if errors were found in the header, goto out_fix; at that point, with
a populated hashtab.

-Eric
 
> 
> Thoughts?
> -Eric
> 
> 
> 
>
Bill O'Donnell March 24, 2025, 4:56 p.m. UTC | #4
On Sun, Mar 23, 2025 at 10:51:52AM -0500, Eric Sandeen wrote:
> On 3/21/25 5:05 PM, bodonnel@redhat.com wrote:
> > From: Bill O'Donnell <bodonnel@redhat.com>
> > 
> > In certain cases, if a block is so messed up that crc, uuid and magic
> > number are all bad, we need to not only detect in phase3 but fix it
> > properly in phase6. In the current code, the mechanism doesn't work
> > in that it only pays attention to one of the parameters.
> > 
> > Note: in this case, the nlink inode link count drops to 1, but
> > re-running xfs_repair fixes it back to 2. This is a side effect that
> > should probably be handled in update_inode_nlinks() with separate patch.
> > Regardless, running xfs_repair twice, with this patch applied
> > fixes the issue. Recognize that this patch is a fix for xfs v5.
> 
> The reason this fix leaves the inode nlinks in an inconsistent state
> is because the dir is (was) a longform directory (XFS_DIR2_FMT_BLOCK),
> and we go down this path with your patch in place:
> 
>                 /* check v5 metadata */
>                 if (xfs_has_crc(mp)) {
>                         error = check_dir3_header(mp, bp, ino);
>                         if (error) {
>                                 fixit++;
>                                 if (fmt == XFS_DIR2_FMT_BLOCK) { <==== true
>                                         goto out_fix;	<=== goto
>                                 }
> 
>                                 libxfs_buf_relse(bp);
>                                 bp = NULL;
>                                 continue;
>                         }
>                 }
> 
>                 longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot,
>                                 irec, ino_offset, bp, hashtab,
>                                 &freetab, da_bno, fmt == XFS_DIR2_FMT_BLOCK);
> ...
> 
> out_fix:
> 
>         if (!no_modify && (fixit || dotdot_update)) {
>                 longform_dir2_rebuild(mp, ino, ip, irec, ino_offset, hashtab);
> 
> 
> longform_dir2_rebuild tries to rebuild the directory from the entries found
> via longform_dir2_entry_check_data() and placed in hashtab, but because we never
> called longform_dir2_entry_check_data(), hashtab is empty. This is why all
> entries in the problematic dir end up in lost+found.
> 
> That also means that longform_dir2_rebuild completes without adding any entries
> at all, and so the directory is now shortform. Because shortform directories 
> have no explicit "." entry, I think it would need an extra ref added at this
> point.
> 
> But I wonder - why not call longform_dir2_entry_check_data() before we check
> the header? That way it /will/ populate hashtab with any found entries in the
> block, and when the header is found to be corrupt, it will rebuild it with all
> entries intact, and leave nothing in lost+found.

Yes, this works as you describe.
Thanks-
Bill
diff mbox series

Patch

diff --git a/repair/phase6.c b/repair/phase6.c
index 4064a84b2450..9cffbb1f4510 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -2364,7 +2364,6 @@  longform_dir2_entry_check(
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		const struct xfs_buf_ops *ops;
 		int			 error;
-		struct xfs_dir2_data_hdr *d;
 
 		next_da_bno = da_bno + mp->m_dir_geo->fsbcount - 1;
 		if (bmap_next_offset(ip, &next_da_bno)) {
@@ -2404,9 +2403,7 @@  longform_dir2_entry_check(
 		}
 
 		/* check v5 metadata */
-		d = bp->b_addr;
-		if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
-		    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
+		if (xfs_has_crc(mp)) {
 			error = check_dir3_header(mp, bp, ino);
 			if (error) {
 				fixit++;