[01/16] xfs_repair: fix missing dir buffer corruption checks
diff mbox series

Message ID 158904179840.982941.17275782452712518850.stgit@magnolia
State Superseded
Headers show
Series
  • xfs_repair: catch things that xfs_check misses
Related show

Commit Message

Darrick J. Wong May 9, 2020, 4:29 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

Both callers of da_read_buf do not adequately check for verifier
failures in the (salvage mode) buffer that gets returned.  This leads to
repair sometimes failing to complain about corrupt directories when run
with -n, which leads to an incorrect return value of 0 (all clean).

Found by running xfs/496 against lhdr.stale = middlebit.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/da_util.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Christoph Hellwig May 9, 2020, 5:08 p.m. UTC | #1
On Sat, May 09, 2020 at 09:29:58AM -0700, Darrick J. Wong wrote:
> @@ -140,11 +140,24 @@ _("can't read %s block %u for inode %" PRIu64 "\n"),
>  		if (whichfork == XFS_DATA_FORK &&
>  		    (nodehdr.magic == XFS_DIR2_LEAFN_MAGIC ||
>  		    nodehdr.magic == XFS_DIR3_LEAFN_MAGIC)) {
> +			int bad = 0;
> +
>  			if (i != -1) {
>  				do_warn(
>  _("found non-root LEAFN node in inode %" PRIu64 " bno = %u\n"),
>  					da_cursor->ino, bno);
> +				bad++;
>  			}
> +
> +			/* corrupt leafn node; rebuild the dir. */
> +			if (!bad &&
> +			    (bp->b_error == -EFSBADCRC ||
> +			     bp->b_error == -EFSCORRUPTED)) {
> +				do_warn(
> +_("corrupt %s LEAFN block %u for inode %" PRIu64 "\n"),
> +					FORKNAME(whichfork), bno, da_cursor->ino);
> +			}
> +

So this doesn't really change any return value, but just the error
message.  But looking at this code I wonder why we don't check
b_error first thing after reading the buffer, as checking the magic
for a corrupt buffer seems a little pointless.
Darrick J. Wong May 11, 2020, 4:44 p.m. UTC | #2
On Sat, May 09, 2020 at 10:08:50AM -0700, Christoph Hellwig wrote:
> On Sat, May 09, 2020 at 09:29:58AM -0700, Darrick J. Wong wrote:
> > @@ -140,11 +140,24 @@ _("can't read %s block %u for inode %" PRIu64 "\n"),
> >  		if (whichfork == XFS_DATA_FORK &&
> >  		    (nodehdr.magic == XFS_DIR2_LEAFN_MAGIC ||
> >  		    nodehdr.magic == XFS_DIR3_LEAFN_MAGIC)) {
> > +			int bad = 0;
> > +
> >  			if (i != -1) {
> >  				do_warn(
> >  _("found non-root LEAFN node in inode %" PRIu64 " bno = %u\n"),
> >  					da_cursor->ino, bno);
> > +				bad++;
> >  			}
> > +
> > +			/* corrupt leafn node; rebuild the dir. */
> > +			if (!bad &&
> > +			    (bp->b_error == -EFSBADCRC ||
> > +			     bp->b_error == -EFSCORRUPTED)) {
> > +				do_warn(
> > +_("corrupt %s LEAFN block %u for inode %" PRIu64 "\n"),
> > +					FORKNAME(whichfork), bno, da_cursor->ino);
> > +			}
> > +
> 
> So this doesn't really change any return value, but just the error
> message.  But looking at this code I wonder why we don't check
> b_error first thing after reading the buffer, as checking the magic
> for a corrupt buffer seems a little pointless.

<shrug> In the first hunk I was merely following what we did for DA_NODE
blocks (check magic, then check for corruption errors) but I guess I
could just pull that up in the file.  I'll have a look and see what
happens if I do that.

--D
Darrick J. Wong May 11, 2020, 5:36 p.m. UTC | #3
On Mon, May 11, 2020 at 09:44:21AM -0700, Darrick J. Wong wrote:
> On Sat, May 09, 2020 at 10:08:50AM -0700, Christoph Hellwig wrote:
> > On Sat, May 09, 2020 at 09:29:58AM -0700, Darrick J. Wong wrote:
> > > @@ -140,11 +140,24 @@ _("can't read %s block %u for inode %" PRIu64 "\n"),
> > >  		if (whichfork == XFS_DATA_FORK &&
> > >  		    (nodehdr.magic == XFS_DIR2_LEAFN_MAGIC ||
> > >  		    nodehdr.magic == XFS_DIR3_LEAFN_MAGIC)) {
> > > +			int bad = 0;
> > > +
> > >  			if (i != -1) {
> > >  				do_warn(
> > >  _("found non-root LEAFN node in inode %" PRIu64 " bno = %u\n"),
> > >  					da_cursor->ino, bno);
> > > +				bad++;
> > >  			}
> > > +
> > > +			/* corrupt leafn node; rebuild the dir. */
> > > +			if (!bad &&
> > > +			    (bp->b_error == -EFSBADCRC ||
> > > +			     bp->b_error == -EFSCORRUPTED)) {
> > > +				do_warn(
> > > +_("corrupt %s LEAFN block %u for inode %" PRIu64 "\n"),
> > > +					FORKNAME(whichfork), bno, da_cursor->ino);
> > > +			}
> > > +
> > 
> > So this doesn't really change any return value, but just the error
> > message.  But looking at this code I wonder why we don't check
> > b_error first thing after reading the buffer, as checking the magic
> > for a corrupt buffer seems a little pointless.
> 
> <shrug> In the first hunk I was merely following what we did for DA_NODE
> blocks (check magic, then check for corruption errors) but I guess I
> could just pull that up in the file.  I'll have a look and see what
> happens if I do that.

...and on re-examination, the other da_read_buf callers in repair/dir2.c
ought to look for EFSCORRUPTED (they already look for EFSBADCRC) and
complain about that.  Seeing as the directory salvage is done separately
in phase6.c, I guess there's no harm in jumping out early in phases 3/4.

--D

> --D

Patch
diff mbox series

diff --git a/repair/da_util.c b/repair/da_util.c
index 5061880f..92c04337 100644
--- a/repair/da_util.c
+++ b/repair/da_util.c
@@ -140,11 +140,24 @@  _("can't read %s block %u for inode %" PRIu64 "\n"),
 		if (whichfork == XFS_DATA_FORK &&
 		    (nodehdr.magic == XFS_DIR2_LEAFN_MAGIC ||
 		    nodehdr.magic == XFS_DIR3_LEAFN_MAGIC)) {
+			int bad = 0;
+
 			if (i != -1) {
 				do_warn(
 _("found non-root LEAFN node in inode %" PRIu64 " bno = %u\n"),
 					da_cursor->ino, bno);
+				bad++;
 			}
+
+			/* corrupt leafn node; rebuild the dir. */
+			if (!bad &&
+			    (bp->b_error == -EFSBADCRC ||
+			     bp->b_error == -EFSCORRUPTED)) {
+				do_warn(
+_("corrupt %s LEAFN block %u for inode %" PRIu64 "\n"),
+					FORKNAME(whichfork), bno, da_cursor->ino);
+			}
+
 			*rbno = 0;
 			libxfs_buf_relse(bp);
 			return 1;
@@ -599,6 +612,14 @@  _("bad level %d in %s block %u for inode %" PRIu64 "\n"),
 				dabno, cursor->ino);
 			bad++;
 		}
+		if (!bad &&
+		    (bp->b_error == -EFSCORRUPTED ||
+		     bp->b_error == -EFSBADCRC)) {
+			do_warn(
+_("corruption error reading %s block %u for inode %" PRIu64 "\n"),
+				FORKNAME(whichfork), dabno, cursor->ino);
+			bad++;
+		}
 		if (bad) {
 #ifdef XR_DIR_TRACE
 			fprintf(stderr, "verify_da_path returns 1 (bad) #4\n");