diff mbox series

[5/9] xfs: make forced shutdown processing atomic

Message ID 20210630063813.1751007-6-david@fromorbit.com (mailing list archive)
State Superseded
Headers show
Series xfs: shutdown is a racy mess | expand

Commit Message

Dave Chinner June 30, 2021, 6:38 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

The running of a forced shutdown is a bit of a mess. It does racy
checks for XFS_MOUNT_SHUTDOWN in xfs_do_force_shutdown(), then
does more racy checks in xfs_log_force_unmount() before finally
setting XFS_MOUNT_SHUTDOWN and XLOG_IO_ERROR under the
log->icloglock.

Move the checking and setting of XFS_MOUNT_SHUTDOWN into
xfs_do_force_shutdown() so we only process a shutdown once and once
only. Serialise this with the mp->m_sb_lock spinlock so that the
state change is atomic and won't race. Move all the mount specific
shutdown state changes from xfs_log_force_unmount() to
xfs_do_force_shutdown() so they are done atomically with setting
XFS_MOUNT_SHUTDOWN.

Then get rid of the racy xlog_is_shutdown() check from
xlog_force_shutdown(), and gate the log shutdown on the
test_and_set_bit(XLOG_IO_ERROR) test under the icloglock. This
means that the log is shutdown once and once only, and code that
needs to prevent races with shutdown can do so by holding the
icloglock and checking the return value of xlog_is_shutdown().

This results in a predicable shutdown execution process - we set the
shutdown flags once and process the shutdown once rather than the
current "as many concurrent shutdowns as can race to the flag
setting" situation we have now.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c |  64 ++++++++++++++---------------
 fs/xfs/xfs_log.c   | 100 ++++++++++++++++++++-------------------------
 fs/xfs/xfs_log.h   |   2 +-
 3 files changed, 77 insertions(+), 89 deletions(-)

Comments

Christoph Hellwig July 2, 2021, 8:24 a.m. UTC | #1
> +	spin_lock(&mp->m_sb_lock);
> +	if (XFS_FORCED_SHUTDOWN(mp)) {
> +		spin_unlock(&mp->m_sb_lock);
>  		return;
>  	}
> +	mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
> +	if (mp->m_sb_bp)
> +		mp->m_sb_bp->b_flags |= XBF_DONE;
> +	spin_unlock(&mp->m_sb_lock);

Any particular reason for picking m_sb_lock which so far doesn't
seem to be related to mp->m_flags at all? (On which we probably
have a few other races, most notably remount).

> +	if (xfs_error_level >= XFS_ERRLEVEL_HIGH)
> +		xfs_stack_trace();

This seems new and unrelated?

> +

Spurious empty line at the end of the function.
Dave Chinner July 5, 2021, 1:28 a.m. UTC | #2
On Fri, Jul 02, 2021 at 09:24:27AM +0100, Christoph Hellwig wrote:
> > +	spin_lock(&mp->m_sb_lock);
> > +	if (XFS_FORCED_SHUTDOWN(mp)) {
> > +		spin_unlock(&mp->m_sb_lock);
> >  		return;
> >  	}
> > +	mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
> > +	if (mp->m_sb_bp)
> > +		mp->m_sb_bp->b_flags |= XBF_DONE;
> > +	spin_unlock(&mp->m_sb_lock);
> 
> Any particular reason for picking m_sb_lock which so far doesn't
> seem to be related to mp->m_flags at all? (On which we probably
> have a few other races, most notably remount).

I was just reusing an existing lock rather than having to add yet
another global scope spinlock just for the shutdown. I can add
another lock but...

... as you point out, m_flags is a mess when it comes to races. I've
got a series somewhere that addresses this... Yeah:

https://lore.kernel.org/linux-xfs/20180820044851.414-1-david@fromorbit.com/

And that does similar things bit making state changes atomic as this
patchset does, in which case the lock around this shutdown state
change goes away...

I need to rebase that series and get it moving again, because we
really do need to split m_flags up into features and atomic
operational state, too. In the mean time, I considered using the
m_sb_lock largely harmless...

> > +	if (xfs_error_level >= XFS_ERRLEVEL_HIGH)
> > +		xfs_stack_trace();
> 
> This seems new and unrelated?

It's run on the majority of shutdowns already, but not all. It makes
no sense to have an error level triggered stack dump on shutdown and
not actually use it multiple shutdown vectors - that has been
problematic in the past when trying to diagnose shutdown causes in
the field. I'll add a comment to the commit description.

> > +
> 
> Spurious empty line at the end of the function.

Removed.

-Dave.
Darrick J. Wong July 9, 2021, 4:40 a.m. UTC | #3
On Wed, Jun 30, 2021 at 04:38:09PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The running of a forced shutdown is a bit of a mess. It does racy
> checks for XFS_MOUNT_SHUTDOWN in xfs_do_force_shutdown(), then
> does more racy checks in xfs_log_force_unmount() before finally
> setting XFS_MOUNT_SHUTDOWN and XLOG_IO_ERROR under the
> log->icloglock.
> 
> Move the checking and setting of XFS_MOUNT_SHUTDOWN into
> xfs_do_force_shutdown() so we only process a shutdown once and once
> only. Serialise this with the mp->m_sb_lock spinlock so that the
> state change is atomic and won't race. Move all the mount specific

Assuming you're working on cleaning /that/ up too, I'll let that go...

> shutdown state changes from xfs_log_force_unmount() to
> xfs_do_force_shutdown() so they are done atomically with setting
> XFS_MOUNT_SHUTDOWN.
> 
> Then get rid of the racy xlog_is_shutdown() check from
> xlog_force_shutdown(), and gate the log shutdown on the
> test_and_set_bit(XLOG_IO_ERROR) test under the icloglock. This
> means that the log is shutdown once and once only, and code that
> needs to prevent races with shutdown can do so by holding the
> icloglock and checking the return value of xlog_is_shutdown().
> 
> This results in a predicable shutdown execution process - we set the

s/predicable/predictable/

> shutdown flags once and process the shutdown once rather than the
> current "as many concurrent shutdowns as can race to the flag
> setting" situation we have now.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_fsops.c |  64 ++++++++++++++---------------
>  fs/xfs/xfs_log.c   | 100 ++++++++++++++++++++-------------------------
>  fs/xfs/xfs_log.h   |   2 +-
>  3 files changed, 77 insertions(+), 89 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 6ed29b158312..caca391ce1a9 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -511,6 +511,11 @@ xfs_fs_goingdown(
>   * consistent. We don't do an unmount here; just shutdown the shop, make sure
>   * that absolutely nothing persistent happens to this filesystem after this
>   * point.
> + *
> + * The shutdown state change is atomic, resulting in the first and only the
> + * first shutdown call processing the shutdown. This means we only shutdown the
> + * log once as it requires, and we don't spam the logs when multiple concurrent
> + * shutdowns race to set the shutdown flags.
>   */
>  void
>  xfs_do_force_shutdown(
> @@ -519,48 +524,41 @@ xfs_do_force_shutdown(
>  	char		*fname,
>  	int		lnnum)
>  {
> -	bool		logerror = flags & SHUTDOWN_LOG_IO_ERROR;
> -
> -	/*
> -	 * No need to duplicate efforts.
> -	 */
> -	if (XFS_FORCED_SHUTDOWN(mp) && !logerror)
> -		return;
> -
> -	/*
> -	 * This flags XFS_MOUNT_FS_SHUTDOWN, makes sure that we don't
> -	 * queue up anybody new on the log reservations, and wakes up
> -	 * everybody who's sleeping on log reservations to tell them
> -	 * the bad news.
> -	 */
> -	if (xfs_log_force_umount(mp, logerror))
> -		return;
> +	int		tag;
> +	const char	*why;
>  
> -	if (flags & SHUTDOWN_FORCE_UMOUNT) {
> -		xfs_alert(mp,
> -"User initiated shutdown (0x%x) received. Shutting down filesystem",
> -				flags);
> +	spin_lock(&mp->m_sb_lock);
> +	if (XFS_FORCED_SHUTDOWN(mp)) {
> +		spin_unlock(&mp->m_sb_lock);
>  		return;
>  	}
> +	mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
> +	if (mp->m_sb_bp)
> +		mp->m_sb_bp->b_flags |= XBF_DONE;
> +	spin_unlock(&mp->m_sb_lock);
> +
> +	if (flags & SHUTDOWN_FORCE_UMOUNT)
> +		xfs_alert(mp, "User initiated shutdown received.");
>  
> -	if (flags & SHUTDOWN_CORRUPT_INCORE) {
> -		xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_CORRUPT,
> -"Corruption of in-memory data (0x%x) detected at %pS (%s:%d).  Shutting down filesystem",
> -				flags, __return_address, fname, lnnum);
> -		if (XFS_ERRLEVEL_HIGH <= xfs_error_level)
> -			xfs_stack_trace();
> -	} else if (logerror) {
> -		xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_LOGERROR,
> -"Log I/O error (0x%x) detected at %pS (%s:%d). Shutting down filesystem",
> -				flags, __return_address, fname, lnnum);
> +	if (xlog_force_shutdown(mp->m_log, flags)) {
> +		tag = XFS_PTAG_SHUTDOWN_LOGERROR;
> +		why = "Log I/O Error";
> +	} else if (flags & SHUTDOWN_CORRUPT_INCORE) {
> +		tag = XFS_PTAG_SHUTDOWN_CORRUPT;
> +		why = "Corruption of in-memory data";
>  	} else {
> -		xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_IOERROR,
> -"I/O error (0x%x) detected at %pS (%s:%d). Shutting down filesystem",
> -				flags, __return_address, fname, lnnum);
> +		tag = XFS_PTAG_SHUTDOWN_IOERROR;
> +		why = "Metadata I/O Error";
>  	}
>  
> +	xfs_alert_tag(mp, tag,
> +"%s (0x%x) detected at %pS (%s:%d).  Shutting down filesystem.",
> +			why, flags, __return_address, fname, lnnum);
>  	xfs_alert(mp,
>  		"Please unmount the filesystem and rectify the problem(s)");
> +	if (xfs_error_level >= XFS_ERRLEVEL_HIGH)
> +		xfs_stack_trace();

Doesn't xfs_alert already drop a stack trace for xfs_error_level >=
XFS_ERRLEVEL_HIGH ?

--D

> +
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 049d6ac9cb4d..6c7cfc052135 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -3673,76 +3673,66 @@ xlog_verify_iclog(
>  #endif
>  
>  /*
> - * This is called from xfs_force_shutdown, when we're forcibly
> - * shutting down the filesystem, typically because of an IO error.
> - * Our main objectives here are to make sure that:
> - *	a. if !logerror, flush the logs to disk. Anything modified
> - *	   after this is ignored.
> - *	b. the filesystem gets marked 'SHUTDOWN' for all interested
> - *	   parties to find out, 'atomically'.
> - *	c. those who're sleeping on log reservations, pinned objects and
> - *	    other resources get woken up, and be told the bad news.
> - *	d. nothing new gets queued up after (b) and (c) are done.
> + * Perform a forced shutdown on the log. This should be called once and once
> + * only by the high level filesystem shutdown code to shut the log subsystem
> + * down cleanly.
>   *
> - * Note: for the !logerror case we need to flush the regions held in memory out
> - * to disk first. This needs to be done before the log is marked as shutdown,
> - * otherwise the iclog writes will fail.
> + * Our main objectives here are to make sure that:
> + *	a. if the shutdown was not due to a log IO error, flush the logs to
> + *	   disk. Anything modified after this is ignored.
> + *	b. the log gets atomically marked 'XLOG_IO_ERROR' for all interested
> + *	   parties to find out. Nothing new gets queued after this is done.
> + *	c. Tasks sleeping on log reservations, pinned objects and
> + *	   other resources get woken up.
>   *
> - * Return non-zero if log shutdown transition had already happened.
> + * Return true if the shutdown cause was a log IO error and we actually shut the
> + * log down.
>   */
> -int
> -xfs_log_force_umount(
> -	struct xfs_mount	*mp,
> -	int			logerror)
> +bool
> +xlog_force_shutdown(
> +	struct xlog	*log,
> +	int		shutdown_flags)
>  {
> -	struct xlog	*log;
> -	int		retval = 0;
> -
> -	log = mp->m_log;
> +	bool		log_error = (shutdown_flags & SHUTDOWN_LOG_IO_ERROR);
>  
>  	/*
> -	 * If this happens during log recovery, don't worry about
> -	 * locking; the log isn't open for business yet.
> +	 * If this happens during log recovery then we aren't using the runtime
> +	 * log mechanisms yet so there's nothing to shut down.
>  	 */
> -	if (!log || xlog_in_recovery(log)) {
> -		mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
> -		if (mp->m_sb_bp)
> -			mp->m_sb_bp->b_flags |= XBF_DONE;
> -		return 0;
> -	}
> +	if (!log || xlog_in_recovery(log))
> +		return false;
>  
> -	/*
> -	 * Somebody could've already done the hard work for us.
> -	 * No need to get locks for this.
> -	 */
> -	if (logerror && xlog_is_shutdown(log))
> -		return 1;
> +	ASSERT(!xlog_is_shutdown(log));
>  
>  	/*
>  	 * Flush all the completed transactions to disk before marking the log
> -	 * being shut down. We need to do it in this order to ensure that
> -	 * completed operations are safely on disk before we shut down, and that
> -	 * we don't have to issue any buffer IO after the shutdown flags are set
> -	 * to guarantee this.
> +	 * being shut down. We need to do this first as shutting down the log
> +	 * before the force will prevent the log force from flushing the iclogs
> +	 * to disk.
> +	 *
> +	 * Re-entry due to a log IO error shutdown during the log force is
> +	 * prevented by the atomicity of higher level shutdown code.
>  	 */
> -	if (!logerror)
> -		xfs_log_force(mp, XFS_LOG_SYNC);
> +	if (!log_error)
> +		xfs_log_force(log->l_mp, XFS_LOG_SYNC);
>  
>  	/*
> -	 * mark the filesystem and the as in a shutdown state and wake
> -	 * everybody up to tell them the bad news.
> +	 * Atomically set the shutdown state. If the shutdown state is already
> +	 * set, there someone else is performing the shutdown and so we are done
> +	 * here. This should never happen because we should only ever get called
> +	 * once by the first shutdown caller.
> +	 *
> +	 * Much of the log state machine transitions assume that shutdown state
> +	 * cannot change once they hold the log->l_icloglock. Hence we need to
> +	 * hold that lock here, even though we use the atomic test_and_set_bit()
> +	 * operation to set the shutdown state.
>  	 */
>  	spin_lock(&log->l_icloglock);
> -	mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
> -	if (mp->m_sb_bp)
> -		mp->m_sb_bp->b_flags |= XBF_DONE;
> -
> -	/*
> -	 * Mark the log and the iclogs with IO error flags to prevent any
> -	 * further log IO from being issued or completed.
> -	 */
> -	if (!test_and_set_bit(XLOG_IO_ERROR, &log->l_opstate))
> -		retval = 1;
> +	if (test_and_set_bit(XLOG_IO_ERROR, &log->l_opstate)) {
> +		spin_unlock(&log->l_icloglock);
> +		ASSERT(0);
> +		return false;
> +	}
>  	spin_unlock(&log->l_icloglock);
>  
>  	/*
> @@ -3766,7 +3756,7 @@ xfs_log_force_umount(
>  	spin_unlock(&log->l_cilp->xc_push_lock);
>  	xlog_state_do_callback(log);
>  
> -	return retval;
> +	return log_error;
>  }
>  
>  STATIC int
> diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
> index 4c5c8a7db1d9..3f680f0c9744 100644
> --- a/fs/xfs/xfs_log.h
> +++ b/fs/xfs/xfs_log.h
> @@ -125,7 +125,6 @@ int	  xfs_log_reserve(struct xfs_mount *mp,
>  			  bool		   permanent);
>  int	  xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
>  void      xfs_log_unmount(struct xfs_mount *mp);
> -int	  xfs_log_force_umount(struct xfs_mount *mp, int logerror);
>  bool	xfs_log_writable(struct xfs_mount *mp);
>  
>  struct xlog_ticket *xfs_log_ticket_get(struct xlog_ticket *ticket);
> @@ -140,5 +139,6 @@ void	xfs_log_clean(struct xfs_mount *mp);
>  bool	xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t);
>  
>  xfs_lsn_t xlog_grant_push_threshold(struct xlog *log, int need_bytes);
> +bool	  xlog_force_shutdown(struct xlog *log, int shutdown_flags);
>  
>  #endif	/* __XFS_LOG_H__ */
> -- 
> 2.31.1
>
Dave Chinner July 14, 2021, 3:15 a.m. UTC | #4
On Thu, Jul 08, 2021 at 09:40:20PM -0700, Darrick J. Wong wrote:
> On Wed, Jun 30, 2021 at 04:38:09PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > The running of a forced shutdown is a bit of a mess. It does racy
> > checks for XFS_MOUNT_SHUTDOWN in xfs_do_force_shutdown(), then
> > does more racy checks in xfs_log_force_unmount() before finally
> > setting XFS_MOUNT_SHUTDOWN and XLOG_IO_ERROR under the
> > log->icloglock.
> > 
> > Move the checking and setting of XFS_MOUNT_SHUTDOWN into
> > xfs_do_force_shutdown() so we only process a shutdown once and once
> > only. Serialise this with the mp->m_sb_lock spinlock so that the
> > state change is atomic and won't race. Move all the mount specific
> 
> Assuming you're working on cleaning /that/ up too, I'll let that go...

Yes, a forward ported patch set that does this will be posted soon.

> > +	xfs_alert_tag(mp, tag,
> > +"%s (0x%x) detected at %pS (%s:%d).  Shutting down filesystem.",
> > +			why, flags, __return_address, fname, lnnum);
> >  	xfs_alert(mp,
> >  		"Please unmount the filesystem and rectify the problem(s)");
> > +	if (xfs_error_level >= XFS_ERRLEVEL_HIGH)
> > +		xfs_stack_trace();
> 
> Doesn't xfs_alert already drop a stack trace for xfs_error_level >=
> XFS_ERRLEVEL_HIGH ?

It does? I've never seen it do that, and the existing code implies
it doesn't do this, either, and that's the logic was looking at
here:

        if (flags & SHUTDOWN_CORRUPT_INCORE) {
                xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_CORRUPT,
"Corruption of in-memory data (0x%x) detected at %pS (%s:%d).  Shutting down filesystem",
                                flags, __return_address, fname, lnnum);
                if (XFS_ERRLEVEL_HIGH <= xfs_error_level)
                        xfs_stack_trace();
	} else if (....)

Yes, xfs_alert_tag() does not trigger a stack trace at all, but
there's an unconditional xfs_alert() call after this so if that
issues stack traces then we'd get a double stack trace on all
SHUTDOWN_CORRUPT_INCORE incidents. AFAICT, that doesn't actually
happen....

This pattern is repeated in several places - look at
xfs_inode_verifier_error(), xfs_buf_verifier_error(), and
xfs_buf_corruption_error(). They all have xfs_alert() calls, then
follow it up with a specific error level check for a stack dump.

Hmmm, it looks like xfs_alert() was intended to dump stacks, but I
don't think it works:

        if (!kstrtoint(kern_level, 0, &level) &&                \
            level <= LOGLEVEL_ERR &&                            \
            xfs_error_level >= XFS_ERRLEVEL_HIGH)               \
                xfs_stack_trace();                              \

And kern_level is KERN_ALERT, which is:

#define KERN_SOH        "\001"
....
#define KERN_ALERT      KERN_SOH "1"

And:

#define LOGLEVEL_ERR            3       /* error conditions */

So what does kstrtoint() return when passed the string "\0011"? It's
not actually an integer string...

Hmmm, I think it returns -EINVAL, which means it then uses level
uninitialised, and the result is .... unpredictable it is likely
no stack trace is emitted....

Fixing this mess is out of scope for this patchset.  The changes in
this patchset don't change the existing pattern of the function of
unconditionally calling xfs_alert() and conditionally and explicitly
dumping stack traces manually. I'll add it to my ever growing list
of cleanups that need to be done...

Cheers,

Dave.
Darrick J. Wong July 14, 2021, 5:02 a.m. UTC | #5
On Wed, Jul 14, 2021 at 01:15:24PM +1000, Dave Chinner wrote:
> On Thu, Jul 08, 2021 at 09:40:20PM -0700, Darrick J. Wong wrote:
> > On Wed, Jun 30, 2021 at 04:38:09PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > The running of a forced shutdown is a bit of a mess. It does racy
> > > checks for XFS_MOUNT_SHUTDOWN in xfs_do_force_shutdown(), then
> > > does more racy checks in xfs_log_force_unmount() before finally
> > > setting XFS_MOUNT_SHUTDOWN and XLOG_IO_ERROR under the
> > > log->icloglock.
> > > 
> > > Move the checking and setting of XFS_MOUNT_SHUTDOWN into
> > > xfs_do_force_shutdown() so we only process a shutdown once and once
> > > only. Serialise this with the mp->m_sb_lock spinlock so that the
> > > state change is atomic and won't race. Move all the mount specific
> > 
> > Assuming you're working on cleaning /that/ up too, I'll let that go...
> 
> Yes, a forward ported patch set that does this will be posted soon.

Ok.

> > > +	xfs_alert_tag(mp, tag,
> > > +"%s (0x%x) detected at %pS (%s:%d).  Shutting down filesystem.",
> > > +			why, flags, __return_address, fname, lnnum);
> > >  	xfs_alert(mp,
> > >  		"Please unmount the filesystem and rectify the problem(s)");
> > > +	if (xfs_error_level >= XFS_ERRLEVEL_HIGH)
> > > +		xfs_stack_trace();
> > 
> > Doesn't xfs_alert already drop a stack trace for xfs_error_level >=
> > XFS_ERRLEVEL_HIGH ?
> 
> It does? I've never seen it do that, and the existing code implies
> it doesn't do this, either, and that's the logic was looking at
> here:
> 
>         if (flags & SHUTDOWN_CORRUPT_INCORE) {
>                 xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_CORRUPT,
> "Corruption of in-memory data (0x%x) detected at %pS (%s:%d).  Shutting down filesystem",
>                                 flags, __return_address, fname, lnnum);
>                 if (XFS_ERRLEVEL_HIGH <= xfs_error_level)
>                         xfs_stack_trace();
> 	} else if (....)
> 
> Yes, xfs_alert_tag() does not trigger a stack trace at all, but
> there's an unconditional xfs_alert() call after this so if that
> issues stack traces then we'd get a double stack trace on all
> SHUTDOWN_CORRUPT_INCORE incidents. AFAICT, that doesn't actually
> happen....
> 
> This pattern is repeated in several places - look at
> xfs_inode_verifier_error(), xfs_buf_verifier_error(), and
> xfs_buf_corruption_error(). They all have xfs_alert() calls, then
> follow it up with a specific error level check for a stack dump.
> 
> Hmmm, it looks like xfs_alert() was intended to dump stacks, but I
> don't think it works:
> 
>         if (!kstrtoint(kern_level, 0, &level) &&                \
>             level <= LOGLEVEL_ERR &&                            \
>             xfs_error_level >= XFS_ERRLEVEL_HIGH)               \
>                 xfs_stack_trace();                              \
> 
> And kern_level is KERN_ALERT, which is:
> 
> #define KERN_SOH        "\001"
> ....
> #define KERN_ALERT      KERN_SOH "1"
> 
> And:
> 
> #define LOGLEVEL_ERR            3       /* error conditions */
> 
> So what does kstrtoint() return when passed the string "\0011"? It's
> not actually an integer string...
> 
> Hmmm, I think it returns -EINVAL, which means it then uses level
> uninitialised, and the result is .... unpredictable it is likely
> no stack trace is emitted....
> 
> Fixing this mess is out of scope for this patchset.  The changes in
> this patchset don't change the existing pattern of the function of
> unconditionally calling xfs_alert() and conditionally and explicitly
> dumping stack traces manually. I'll add it to my ever growing list
> of cleanups that need to be done...

AHA!  That's why that's never seemed to work right for me.

Well, at least the good news is that we each have enough outstanding
patchsets to keep the other busy reviewing until 2028 or so. ;)

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
diff mbox series

Patch

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 6ed29b158312..caca391ce1a9 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -511,6 +511,11 @@  xfs_fs_goingdown(
  * consistent. We don't do an unmount here; just shutdown the shop, make sure
  * that absolutely nothing persistent happens to this filesystem after this
  * point.
+ *
+ * The shutdown state change is atomic, resulting in the first and only the
+ * first shutdown call processing the shutdown. This means we only shutdown the
+ * log once as it requires, and we don't spam the logs when multiple concurrent
+ * shutdowns race to set the shutdown flags.
  */
 void
 xfs_do_force_shutdown(
@@ -519,48 +524,41 @@  xfs_do_force_shutdown(
 	char		*fname,
 	int		lnnum)
 {
-	bool		logerror = flags & SHUTDOWN_LOG_IO_ERROR;
-
-	/*
-	 * No need to duplicate efforts.
-	 */
-	if (XFS_FORCED_SHUTDOWN(mp) && !logerror)
-		return;
-
-	/*
-	 * This flags XFS_MOUNT_FS_SHUTDOWN, makes sure that we don't
-	 * queue up anybody new on the log reservations, and wakes up
-	 * everybody who's sleeping on log reservations to tell them
-	 * the bad news.
-	 */
-	if (xfs_log_force_umount(mp, logerror))
-		return;
+	int		tag;
+	const char	*why;
 
-	if (flags & SHUTDOWN_FORCE_UMOUNT) {
-		xfs_alert(mp,
-"User initiated shutdown (0x%x) received. Shutting down filesystem",
-				flags);
+	spin_lock(&mp->m_sb_lock);
+	if (XFS_FORCED_SHUTDOWN(mp)) {
+		spin_unlock(&mp->m_sb_lock);
 		return;
 	}
+	mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
+	if (mp->m_sb_bp)
+		mp->m_sb_bp->b_flags |= XBF_DONE;
+	spin_unlock(&mp->m_sb_lock);
+
+	if (flags & SHUTDOWN_FORCE_UMOUNT)
+		xfs_alert(mp, "User initiated shutdown received.");
 
-	if (flags & SHUTDOWN_CORRUPT_INCORE) {
-		xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_CORRUPT,
-"Corruption of in-memory data (0x%x) detected at %pS (%s:%d).  Shutting down filesystem",
-				flags, __return_address, fname, lnnum);
-		if (XFS_ERRLEVEL_HIGH <= xfs_error_level)
-			xfs_stack_trace();
-	} else if (logerror) {
-		xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_LOGERROR,
-"Log I/O error (0x%x) detected at %pS (%s:%d). Shutting down filesystem",
-				flags, __return_address, fname, lnnum);
+	if (xlog_force_shutdown(mp->m_log, flags)) {
+		tag = XFS_PTAG_SHUTDOWN_LOGERROR;
+		why = "Log I/O Error";
+	} else if (flags & SHUTDOWN_CORRUPT_INCORE) {
+		tag = XFS_PTAG_SHUTDOWN_CORRUPT;
+		why = "Corruption of in-memory data";
 	} else {
-		xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_IOERROR,
-"I/O error (0x%x) detected at %pS (%s:%d). Shutting down filesystem",
-				flags, __return_address, fname, lnnum);
+		tag = XFS_PTAG_SHUTDOWN_IOERROR;
+		why = "Metadata I/O Error";
 	}
 
+	xfs_alert_tag(mp, tag,
+"%s (0x%x) detected at %pS (%s:%d).  Shutting down filesystem.",
+			why, flags, __return_address, fname, lnnum);
 	xfs_alert(mp,
 		"Please unmount the filesystem and rectify the problem(s)");
+	if (xfs_error_level >= XFS_ERRLEVEL_HIGH)
+		xfs_stack_trace();
+
 }
 
 /*
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 049d6ac9cb4d..6c7cfc052135 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3673,76 +3673,66 @@  xlog_verify_iclog(
 #endif
 
 /*
- * This is called from xfs_force_shutdown, when we're forcibly
- * shutting down the filesystem, typically because of an IO error.
- * Our main objectives here are to make sure that:
- *	a. if !logerror, flush the logs to disk. Anything modified
- *	   after this is ignored.
- *	b. the filesystem gets marked 'SHUTDOWN' for all interested
- *	   parties to find out, 'atomically'.
- *	c. those who're sleeping on log reservations, pinned objects and
- *	    other resources get woken up, and be told the bad news.
- *	d. nothing new gets queued up after (b) and (c) are done.
+ * Perform a forced shutdown on the log. This should be called once and once
+ * only by the high level filesystem shutdown code to shut the log subsystem
+ * down cleanly.
  *
- * Note: for the !logerror case we need to flush the regions held in memory out
- * to disk first. This needs to be done before the log is marked as shutdown,
- * otherwise the iclog writes will fail.
+ * Our main objectives here are to make sure that:
+ *	a. if the shutdown was not due to a log IO error, flush the logs to
+ *	   disk. Anything modified after this is ignored.
+ *	b. the log gets atomically marked 'XLOG_IO_ERROR' for all interested
+ *	   parties to find out. Nothing new gets queued after this is done.
+ *	c. Tasks sleeping on log reservations, pinned objects and
+ *	   other resources get woken up.
  *
- * Return non-zero if log shutdown transition had already happened.
+ * Return true if the shutdown cause was a log IO error and we actually shut the
+ * log down.
  */
-int
-xfs_log_force_umount(
-	struct xfs_mount	*mp,
-	int			logerror)
+bool
+xlog_force_shutdown(
+	struct xlog	*log,
+	int		shutdown_flags)
 {
-	struct xlog	*log;
-	int		retval = 0;
-
-	log = mp->m_log;
+	bool		log_error = (shutdown_flags & SHUTDOWN_LOG_IO_ERROR);
 
 	/*
-	 * If this happens during log recovery, don't worry about
-	 * locking; the log isn't open for business yet.
+	 * If this happens during log recovery then we aren't using the runtime
+	 * log mechanisms yet so there's nothing to shut down.
 	 */
-	if (!log || xlog_in_recovery(log)) {
-		mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
-		if (mp->m_sb_bp)
-			mp->m_sb_bp->b_flags |= XBF_DONE;
-		return 0;
-	}
+	if (!log || xlog_in_recovery(log))
+		return false;
 
-	/*
-	 * Somebody could've already done the hard work for us.
-	 * No need to get locks for this.
-	 */
-	if (logerror && xlog_is_shutdown(log))
-		return 1;
+	ASSERT(!xlog_is_shutdown(log));
 
 	/*
 	 * Flush all the completed transactions to disk before marking the log
-	 * being shut down. We need to do it in this order to ensure that
-	 * completed operations are safely on disk before we shut down, and that
-	 * we don't have to issue any buffer IO after the shutdown flags are set
-	 * to guarantee this.
+	 * being shut down. We need to do this first as shutting down the log
+	 * before the force will prevent the log force from flushing the iclogs
+	 * to disk.
+	 *
+	 * Re-entry due to a log IO error shutdown during the log force is
+	 * prevented by the atomicity of higher level shutdown code.
 	 */
-	if (!logerror)
-		xfs_log_force(mp, XFS_LOG_SYNC);
+	if (!log_error)
+		xfs_log_force(log->l_mp, XFS_LOG_SYNC);
 
 	/*
-	 * mark the filesystem and the as in a shutdown state and wake
-	 * everybody up to tell them the bad news.
+	 * Atomically set the shutdown state. If the shutdown state is already
+	 * set, there someone else is performing the shutdown and so we are done
+	 * here. This should never happen because we should only ever get called
+	 * once by the first shutdown caller.
+	 *
+	 * Much of the log state machine transitions assume that shutdown state
+	 * cannot change once they hold the log->l_icloglock. Hence we need to
+	 * hold that lock here, even though we use the atomic test_and_set_bit()
+	 * operation to set the shutdown state.
 	 */
 	spin_lock(&log->l_icloglock);
-	mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
-	if (mp->m_sb_bp)
-		mp->m_sb_bp->b_flags |= XBF_DONE;
-
-	/*
-	 * Mark the log and the iclogs with IO error flags to prevent any
-	 * further log IO from being issued or completed.
-	 */
-	if (!test_and_set_bit(XLOG_IO_ERROR, &log->l_opstate))
-		retval = 1;
+	if (test_and_set_bit(XLOG_IO_ERROR, &log->l_opstate)) {
+		spin_unlock(&log->l_icloglock);
+		ASSERT(0);
+		return false;
+	}
 	spin_unlock(&log->l_icloglock);
 
 	/*
@@ -3766,7 +3756,7 @@  xfs_log_force_umount(
 	spin_unlock(&log->l_cilp->xc_push_lock);
 	xlog_state_do_callback(log);
 
-	return retval;
+	return log_error;
 }
 
 STATIC int
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 4c5c8a7db1d9..3f680f0c9744 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -125,7 +125,6 @@  int	  xfs_log_reserve(struct xfs_mount *mp,
 			  bool		   permanent);
 int	  xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
 void      xfs_log_unmount(struct xfs_mount *mp);
-int	  xfs_log_force_umount(struct xfs_mount *mp, int logerror);
 bool	xfs_log_writable(struct xfs_mount *mp);
 
 struct xlog_ticket *xfs_log_ticket_get(struct xlog_ticket *ticket);
@@ -140,5 +139,6 @@  void	xfs_log_clean(struct xfs_mount *mp);
 bool	xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t);
 
 xfs_lsn_t xlog_grant_push_threshold(struct xlog *log, int need_bytes);
+bool	  xlog_force_shutdown(struct xlog *log, int shutdown_flags);
 
 #endif	/* __XFS_LOG_H__ */