[01/12] xfs: bmap debugging should never panic the system
diff mbox

Message ID 152537075974.16676.4678069099698404803.stgit@magnolia
State Accepted
Headers show

Commit Message

Darrick J. Wong May 3, 2018, 6:05 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

Don't panic() the system if the bmap records are garbage, just call
ASSERT which gives us the same backtrace but enables developers to
control if the system goes down or not.  This makes debugging with
generic/388 much easier because it won't reboot the machine midway
through a run just because btree_read_bufl returns EIO when the fs has
already shut down.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Christoph Hellwig May 7, 2018, 3:06 p.m. UTC | #1
On Thu, May 03, 2018 at 11:05:59AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Don't panic() the system if the bmap records are garbage, just call
> ASSERT which gives us the same backtrace but enables developers to
> control if the system goes down or not.  This makes debugging with
> generic/388 much easier because it won't reboot the machine midway
> through a run just because btree_read_bufl returns EIO when the fs has
> already shut down.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Foster May 10, 2018, 1:53 p.m. UTC | #2
On Thu, May 03, 2018 at 11:05:59AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Don't panic() the system if the bmap records are garbage, just call
> ASSERT which gives us the same backtrace but enables developers to
> control if the system goes down or not.  This makes debugging with
> generic/388 much easier because it won't reboot the machine midway
> through a run just because btree_read_bufl returns EIO when the fs has
> already shut down.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 6a7c2f03ea11..393b4e2c63ad 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -312,8 +312,10 @@ xfs_check_block(
>  				xfs_warn(mp, "%s: thispa(%d) == pp(%d) %Ld",
>  					__func__, j, i,
>  					(unsigned long long)be64_to_cpu(*thispa));
> -				panic("%s: ptrs are equal in node\n",
> +				xfs_err(mp, "%s: ptrs are equal in node\n",
>  					__func__);
> +				ASSERT(XFS_FORCED_SHUTDOWN(mp));
> +				xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);

Why the assert if we're just going to shutdown? At minimum, it looks a
bit funny to assert that the fs is shutdown just prior to issuing a
shutdown. Can we just shutdown and let the error level dictate whether
we want a backtrace?

Brian

>  			}
>  		}
>  	}
> @@ -483,7 +485,9 @@ xfs_bmap_check_leaf_extents(
>  error_norelse:
>  	xfs_warn(mp, "%s: BAD after btree leaves for %d extents",
>  		__func__, i);
> -	panic("%s: CORRUPTED BTREE OR SOMETHING", __func__);
> +	xfs_err(mp, "%s: CORRUPTED BTREE OR SOMETHING", __func__);
> +	ASSERT(XFS_FORCED_SHUTDOWN(mp));
> +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>  	return;
>  }
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong May 10, 2018, 3:45 p.m. UTC | #3
On Thu, May 10, 2018 at 09:53:19AM -0400, Brian Foster wrote:
> On Thu, May 03, 2018 at 11:05:59AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Don't panic() the system if the bmap records are garbage, just call
> > ASSERT which gives us the same backtrace but enables developers to
> > control if the system goes down or not.  This makes debugging with
> > generic/388 much easier because it won't reboot the machine midway
> > through a run just because btree_read_bufl returns EIO when the fs has
> > already shut down.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_bmap.c |    8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 6a7c2f03ea11..393b4e2c63ad 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -312,8 +312,10 @@ xfs_check_block(
> >  				xfs_warn(mp, "%s: thispa(%d) == pp(%d) %Ld",
> >  					__func__, j, i,
> >  					(unsigned long long)be64_to_cpu(*thispa));
> > -				panic("%s: ptrs are equal in node\n",
> > +				xfs_err(mp, "%s: ptrs are equal in node\n",
> >  					__func__);
> > +				ASSERT(XFS_FORCED_SHUTDOWN(mp));
> > +				xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> 
> Why the assert if we're just going to shutdown? At minimum, it looks a
> bit funny to assert that the fs is shutdown just prior to issuing a
> shutdown. Can we just shutdown and let the error level dictate whether
> we want a backtrace?

Ok.

--D

> Brian
> 
> >  			}
> >  		}
> >  	}
> > @@ -483,7 +485,9 @@ xfs_bmap_check_leaf_extents(
> >  error_norelse:
> >  	xfs_warn(mp, "%s: BAD after btree leaves for %d extents",
> >  		__func__, i);
> > -	panic("%s: CORRUPTED BTREE OR SOMETHING", __func__);
> > +	xfs_err(mp, "%s: CORRUPTED BTREE OR SOMETHING", __func__);
> > +	ASSERT(XFS_FORCED_SHUTDOWN(mp));
> > +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> >  	return;
> >  }
> >  
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6a7c2f03ea11..393b4e2c63ad 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -312,8 +312,10 @@  xfs_check_block(
 				xfs_warn(mp, "%s: thispa(%d) == pp(%d) %Ld",
 					__func__, j, i,
 					(unsigned long long)be64_to_cpu(*thispa));
-				panic("%s: ptrs are equal in node\n",
+				xfs_err(mp, "%s: ptrs are equal in node\n",
 					__func__);
+				ASSERT(XFS_FORCED_SHUTDOWN(mp));
+				xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 			}
 		}
 	}
@@ -483,7 +485,9 @@  xfs_bmap_check_leaf_extents(
 error_norelse:
 	xfs_warn(mp, "%s: BAD after btree leaves for %d extents",
 		__func__, i);
-	panic("%s: CORRUPTED BTREE OR SOMETHING", __func__);
+	xfs_err(mp, "%s: CORRUPTED BTREE OR SOMETHING", __func__);
+	ASSERT(XFS_FORCED_SHUTDOWN(mp));
+	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 	return;
 }