diff mbox series

[RFC,v5,6/9] xfs: automatically relog the quotaoff start intent

Message ID 20200227134321.7238-7-bfoster@redhat.com (mailing list archive)
State Superseded, archived
Headers show
Series xfs: automatic relogging experiment | expand

Commit Message

Brian Foster Feb. 27, 2020, 1:43 p.m. UTC
The quotaoff operation has a rare but longstanding deadlock vector
in terms of how the operation is logged. A quotaoff start intent is
logged (synchronously) at the onset to ensure recovery can handle
the operation if interrupted before in-core changes are made. This
quotaoff intent pins the log tail while the quotaoff sequence scans
and purges dquots from all in-core inodes. While this operation
generally doesn't generate much log traffic on its own, it can be
time consuming. If unrelated, concurrent filesystem activity
consumes remaining log space before quotaoff is able to acquire log
reservation for the quotaoff end intent, the filesystem locks up
indefinitely.

quotaoff cannot allocate the end intent before the scan because the
latter can result in transaction allocation itself in certain
indirect cases (releasing an inode, for example). Further, rolling
the original transaction is difficult because the scanning work
occurs multiple layers down where caller context is lost and not
much information is available to determine how often to roll the
transaction.

To address this problem, enable automatic relogging of the quotaoff
start intent. This automatically relogs the intent whenever AIL
pushing finds the item at the tail of the log. When quotaoff
completes, wait for relogging to complete as the end intent expects
to be able to permanently remove the start intent from the log
subsystem. This ensures that the log tail is kept moving during a
particularly long quotaoff operation and avoids the log reservation
deadlock.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c |  3 ++-
 fs/xfs/xfs_dquot_item.c        |  7 +++++++
 fs/xfs/xfs_qm_syscalls.c       | 12 +++++++++++-
 3 files changed, 20 insertions(+), 2 deletions(-)

Comments

Allison Henderson Feb. 27, 2020, 11:19 p.m. UTC | #1
On 2/27/20 6:43 AM, Brian Foster wrote:
> The quotaoff operation has a rare but longstanding deadlock vector
> in terms of how the operation is logged. A quotaoff start intent is
> logged (synchronously) at the onset to ensure recovery can handle
> the operation if interrupted before in-core changes are made. This
> quotaoff intent pins the log tail while the quotaoff sequence scans
> and purges dquots from all in-core inodes. While this operation
> generally doesn't generate much log traffic on its own, it can be
> time consuming. If unrelated, concurrent filesystem activity
> consumes remaining log space before quotaoff is able to acquire log
> reservation for the quotaoff end intent, the filesystem locks up
> indefinitely.
> 
> quotaoff cannot allocate the end intent before the scan because the
> latter can result in transaction allocation itself in certain
> indirect cases (releasing an inode, for example). Further, rolling
> the original transaction is difficult because the scanning work
> occurs multiple layers down where caller context is lost and not
> much information is available to determine how often to roll the
> transaction.
> 
> To address this problem, enable automatic relogging of the quotaoff
> start intent. This automatically relogs the intent whenever AIL
> pushing finds the item at the tail of the log. When quotaoff
> completes, wait for relogging to complete as the end intent expects
> to be able to permanently remove the start intent from the log
> subsystem. This ensures that the log tail is kept moving during a
> particularly long quotaoff operation and avoids the log reservation
> deadlock.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>   fs/xfs/libxfs/xfs_trans_resv.c |  3 ++-
>   fs/xfs/xfs_dquot_item.c        |  7 +++++++
>   fs/xfs/xfs_qm_syscalls.c       | 12 +++++++++++-
>   3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> index 1f5c9e6e1afc..f49b20c9ca33 100644
> --- a/fs/xfs/libxfs/xfs_trans_resv.c
> +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> @@ -935,7 +935,8 @@ xfs_trans_resv_calc(
>   	resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT;
>   
>   	resp->tr_qm_quotaoff.tr_logres = xfs_calc_qm_quotaoff_reservation(mp);
> -	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT;
> +	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
> +	resp->tr_qm_quotaoff.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
What's the reason for the log count change here?  Otherwise looks ok.

Allison
>   
>   	resp->tr_qm_equotaoff.tr_logres =
>   		xfs_calc_qm_quotaoff_end_reservation();
> diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
> index d60647d7197b..ea5123678466 100644
> --- a/fs/xfs/xfs_dquot_item.c
> +++ b/fs/xfs/xfs_dquot_item.c
> @@ -297,6 +297,13 @@ xfs_qm_qoff_logitem_push(
>   	struct xfs_log_item	*lip,
>   	struct list_head	*buffer_list)
>   {
> +	struct xfs_log_item	*mlip = xfs_ail_min(lip->li_ailp);
> +
> +	if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
> +	    !test_bit(XFS_LI_RELOGGED, &lip->li_flags) &&
> +	    !XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn))
> +		return XFS_ITEM_RELOG;
> +
>   	return XFS_ITEM_LOCKED;
>   }
>   
> diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> index 1ea82764bf89..7b48d34da0f4 100644
> --- a/fs/xfs/xfs_qm_syscalls.c
> +++ b/fs/xfs/xfs_qm_syscalls.c
> @@ -18,6 +18,7 @@
>   #include "xfs_quota.h"
>   #include "xfs_qm.h"
>   #include "xfs_icache.h"
> +#include "xfs_trans_priv.h"
>   
>   STATIC int
>   xfs_qm_log_quotaoff(
> @@ -31,12 +32,14 @@ xfs_qm_log_quotaoff(
>   
>   	*qoffstartp = NULL;
>   
> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0,
> +				XFS_TRANS_RELOG, &tp);
>   	if (error)
>   		goto out;
>   
>   	qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
>   	xfs_trans_log_quotaoff_item(tp, qoffi);
> +	xfs_trans_relog_item(&qoffi->qql_item);
>   
>   	spin_lock(&mp->m_sb_lock);
>   	mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
> @@ -69,6 +72,13 @@ xfs_qm_log_quotaoff_end(
>   	int			error;
>   	struct xfs_qoff_logitem	*qoffi;
>   
> +	/*
> +	 * startqoff must be in the AIL and not the CIL when the end intent
> +	 * commits to ensure it is not readded to the AIL out of order. Wait on
> +	 * relog activity to drain to isolate startqoff to the AIL.
> +	 */
> +	xfs_trans_relog_item_cancel(&startqoff->qql_item, true);
> +
>   	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
>   	if (error)
>   		return error;
>
Darrick J. Wong Feb. 28, 2020, 1:16 a.m. UTC | #2
On Thu, Feb 27, 2020 at 08:43:18AM -0500, Brian Foster wrote:
> The quotaoff operation has a rare but longstanding deadlock vector
> in terms of how the operation is logged. A quotaoff start intent is
> logged (synchronously) at the onset to ensure recovery can handle
> the operation if interrupted before in-core changes are made. This
> quotaoff intent pins the log tail while the quotaoff sequence scans
> and purges dquots from all in-core inodes. While this operation
> generally doesn't generate much log traffic on its own, it can be
> time consuming. If unrelated, concurrent filesystem activity
> consumes remaining log space before quotaoff is able to acquire log
> reservation for the quotaoff end intent, the filesystem locks up
> indefinitely.
> 
> quotaoff cannot allocate the end intent before the scan because the
> latter can result in transaction allocation itself in certain
> indirect cases (releasing an inode, for example). Further, rolling
> the original transaction is difficult because the scanning work
> occurs multiple layers down where caller context is lost and not
> much information is available to determine how often to roll the
> transaction.
> 
> To address this problem, enable automatic relogging of the quotaoff
> start intent. This automatically relogs the intent whenever AIL
> pushing finds the item at the tail of the log. When quotaoff
> completes, wait for relogging to complete as the end intent expects
> to be able to permanently remove the start intent from the log
> subsystem. This ensures that the log tail is kept moving during a
> particularly long quotaoff operation and avoids the log reservation
> deadlock.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_trans_resv.c |  3 ++-
>  fs/xfs/xfs_dquot_item.c        |  7 +++++++
>  fs/xfs/xfs_qm_syscalls.c       | 12 +++++++++++-
>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> index 1f5c9e6e1afc..f49b20c9ca33 100644
> --- a/fs/xfs/libxfs/xfs_trans_resv.c
> +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> @@ -935,7 +935,8 @@ xfs_trans_resv_calc(
>  	resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT;
>  
>  	resp->tr_qm_quotaoff.tr_logres = xfs_calc_qm_quotaoff_reservation(mp);
> -	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT;
> +	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
> +	resp->tr_qm_quotaoff.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
>  
>  	resp->tr_qm_equotaoff.tr_logres =
>  		xfs_calc_qm_quotaoff_end_reservation();
> diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
> index d60647d7197b..ea5123678466 100644
> --- a/fs/xfs/xfs_dquot_item.c
> +++ b/fs/xfs/xfs_dquot_item.c
> @@ -297,6 +297,13 @@ xfs_qm_qoff_logitem_push(
>  	struct xfs_log_item	*lip,
>  	struct list_head	*buffer_list)
>  {
> +	struct xfs_log_item	*mlip = xfs_ail_min(lip->li_ailp);
> +
> +	if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
> +	    !test_bit(XFS_LI_RELOGGED, &lip->li_flags) &&
> +	    !XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn))
> +		return XFS_ITEM_RELOG;
> +
>  	return XFS_ITEM_LOCKED;
>  }
>  
> diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> index 1ea82764bf89..7b48d34da0f4 100644
> --- a/fs/xfs/xfs_qm_syscalls.c
> +++ b/fs/xfs/xfs_qm_syscalls.c
> @@ -18,6 +18,7 @@
>  #include "xfs_quota.h"
>  #include "xfs_qm.h"
>  #include "xfs_icache.h"
> +#include "xfs_trans_priv.h"
>  
>  STATIC int
>  xfs_qm_log_quotaoff(
> @@ -31,12 +32,14 @@ xfs_qm_log_quotaoff(
>  
>  	*qoffstartp = NULL;
>  
> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0,
> +				XFS_TRANS_RELOG, &tp);

Humm, maybe I don't understand how this works after all.  From what I
can tell from this patch, (1) the quotaoff transaction is created with
RELOG, so (2) the AIL steals some reservation from it for an eventual
relogging of the quotaoff item, and then (3) we log the quotaoff item.

Later, the AIL can decide to trigger the workqueue item to take the
ticket generated in step (2) to relog the item we logged in step (3) to
move the log tail forward, but what happens if there are further delays
and the AIL needs to relog again?  That ticket from (2) is now used up
and is gone, right?

I suppose some other RELOG transaction could wander in and generate a
new relog ticket, but as this is the only RELOG transaction that gets
created anywhere, that won't happen.  Is there some magic I missed? :)

--D

>  	if (error)
>  		goto out;
>  
>  	qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
>  	xfs_trans_log_quotaoff_item(tp, qoffi);
> +	xfs_trans_relog_item(&qoffi->qql_item);
>  
>  	spin_lock(&mp->m_sb_lock);
>  	mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
> @@ -69,6 +72,13 @@ xfs_qm_log_quotaoff_end(
>  	int			error;
>  	struct xfs_qoff_logitem	*qoffi;
>  
> +	/*
> +	 * startqoff must be in the AIL and not the CIL when the end intent
> +	 * commits to ensure it is not readded to the AIL out of order. Wait on
> +	 * relog activity to drain to isolate startqoff to the AIL.
> +	 */
> +	xfs_trans_relog_item_cancel(&startqoff->qql_item, true);
> +
>  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
>  	if (error)
>  		return error;
> -- 
> 2.21.1
>
Brian Foster Feb. 28, 2020, 2:03 p.m. UTC | #3
On Thu, Feb 27, 2020 at 04:19:42PM -0700, Allison Collins wrote:
> 
> 
> On 2/27/20 6:43 AM, Brian Foster wrote:
> > The quotaoff operation has a rare but longstanding deadlock vector
> > in terms of how the operation is logged. A quotaoff start intent is
> > logged (synchronously) at the onset to ensure recovery can handle
> > the operation if interrupted before in-core changes are made. This
> > quotaoff intent pins the log tail while the quotaoff sequence scans
> > and purges dquots from all in-core inodes. While this operation
> > generally doesn't generate much log traffic on its own, it can be
> > time consuming. If unrelated, concurrent filesystem activity
> > consumes remaining log space before quotaoff is able to acquire log
> > reservation for the quotaoff end intent, the filesystem locks up
> > indefinitely.
> > 
> > quotaoff cannot allocate the end intent before the scan because the
> > latter can result in transaction allocation itself in certain
> > indirect cases (releasing an inode, for example). Further, rolling
> > the original transaction is difficult because the scanning work
> > occurs multiple layers down where caller context is lost and not
> > much information is available to determine how often to roll the
> > transaction.
> > 
> > To address this problem, enable automatic relogging of the quotaoff
> > start intent. This automatically relogs the intent whenever AIL
> > pushing finds the item at the tail of the log. When quotaoff
> > completes, wait for relogging to complete as the end intent expects
> > to be able to permanently remove the start intent from the log
> > subsystem. This ensures that the log tail is kept moving during a
> > particularly long quotaoff operation and avoids the log reservation
> > deadlock.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >   fs/xfs/libxfs/xfs_trans_resv.c |  3 ++-
> >   fs/xfs/xfs_dquot_item.c        |  7 +++++++
> >   fs/xfs/xfs_qm_syscalls.c       | 12 +++++++++++-
> >   3 files changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> > index 1f5c9e6e1afc..f49b20c9ca33 100644
> > --- a/fs/xfs/libxfs/xfs_trans_resv.c
> > +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> > @@ -935,7 +935,8 @@ xfs_trans_resv_calc(
> >   	resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT;
> >   	resp->tr_qm_quotaoff.tr_logres = xfs_calc_qm_quotaoff_reservation(mp);
> > -	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT;
> > +	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
> > +	resp->tr_qm_quotaoff.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> What's the reason for the log count change here?  Otherwise looks ok.
> 

Permanent transactions have a separate default log count (2 instead of
1) because they are intended to be rolled (at least once). This
basically means the initial allocation will acquire enough log
reservation for the initial transaction and a subsequent roll, similar
to how all other permanant transactions are initialized in this file.
This is required for quotaoff because the transaction now uses
XFS_TRANS_RELOG, which requires a roll up front.

Brian

> Allison
> >   	resp->tr_qm_equotaoff.tr_logres =
> >   		xfs_calc_qm_quotaoff_end_reservation();
> > diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
> > index d60647d7197b..ea5123678466 100644
> > --- a/fs/xfs/xfs_dquot_item.c
> > +++ b/fs/xfs/xfs_dquot_item.c
> > @@ -297,6 +297,13 @@ xfs_qm_qoff_logitem_push(
> >   	struct xfs_log_item	*lip,
> >   	struct list_head	*buffer_list)
> >   {
> > +	struct xfs_log_item	*mlip = xfs_ail_min(lip->li_ailp);
> > +
> > +	if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
> > +	    !test_bit(XFS_LI_RELOGGED, &lip->li_flags) &&
> > +	    !XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn))
> > +		return XFS_ITEM_RELOG;
> > +
> >   	return XFS_ITEM_LOCKED;
> >   }
> > diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> > index 1ea82764bf89..7b48d34da0f4 100644
> > --- a/fs/xfs/xfs_qm_syscalls.c
> > +++ b/fs/xfs/xfs_qm_syscalls.c
> > @@ -18,6 +18,7 @@
> >   #include "xfs_quota.h"
> >   #include "xfs_qm.h"
> >   #include "xfs_icache.h"
> > +#include "xfs_trans_priv.h"
> >   STATIC int
> >   xfs_qm_log_quotaoff(
> > @@ -31,12 +32,14 @@ xfs_qm_log_quotaoff(
> >   	*qoffstartp = NULL;
> > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
> > +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0,
> > +				XFS_TRANS_RELOG, &tp);
> >   	if (error)
> >   		goto out;
> >   	qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
> >   	xfs_trans_log_quotaoff_item(tp, qoffi);
> > +	xfs_trans_relog_item(&qoffi->qql_item);
> >   	spin_lock(&mp->m_sb_lock);
> >   	mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
> > @@ -69,6 +72,13 @@ xfs_qm_log_quotaoff_end(
> >   	int			error;
> >   	struct xfs_qoff_logitem	*qoffi;
> > +	/*
> > +	 * startqoff must be in the AIL and not the CIL when the end intent
> > +	 * commits to ensure it is not readded to the AIL out of order. Wait on
> > +	 * relog activity to drain to isolate startqoff to the AIL.
> > +	 */
> > +	xfs_trans_relog_item_cancel(&startqoff->qql_item, true);
> > +
> >   	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
> >   	if (error)
> >   		return error;
> > 
>
Brian Foster Feb. 28, 2020, 2:04 p.m. UTC | #4
On Thu, Feb 27, 2020 at 05:16:40PM -0800, Darrick J. Wong wrote:
> On Thu, Feb 27, 2020 at 08:43:18AM -0500, Brian Foster wrote:
> > The quotaoff operation has a rare but longstanding deadlock vector
> > in terms of how the operation is logged. A quotaoff start intent is
> > logged (synchronously) at the onset to ensure recovery can handle
> > the operation if interrupted before in-core changes are made. This
> > quotaoff intent pins the log tail while the quotaoff sequence scans
> > and purges dquots from all in-core inodes. While this operation
> > generally doesn't generate much log traffic on its own, it can be
> > time consuming. If unrelated, concurrent filesystem activity
> > consumes remaining log space before quotaoff is able to acquire log
> > reservation for the quotaoff end intent, the filesystem locks up
> > indefinitely.
> > 
> > quotaoff cannot allocate the end intent before the scan because the
> > latter can result in transaction allocation itself in certain
> > indirect cases (releasing an inode, for example). Further, rolling
> > the original transaction is difficult because the scanning work
> > occurs multiple layers down where caller context is lost and not
> > much information is available to determine how often to roll the
> > transaction.
> > 
> > To address this problem, enable automatic relogging of the quotaoff
> > start intent. This automatically relogs the intent whenever AIL
> > pushing finds the item at the tail of the log. When quotaoff
> > completes, wait for relogging to complete as the end intent expects
> > to be able to permanently remove the start intent from the log
> > subsystem. This ensures that the log tail is kept moving during a
> > particularly long quotaoff operation and avoids the log reservation
> > deadlock.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_trans_resv.c |  3 ++-
> >  fs/xfs/xfs_dquot_item.c        |  7 +++++++
> >  fs/xfs/xfs_qm_syscalls.c       | 12 +++++++++++-
> >  3 files changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> > index 1f5c9e6e1afc..f49b20c9ca33 100644
> > --- a/fs/xfs/libxfs/xfs_trans_resv.c
> > +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> > @@ -935,7 +935,8 @@ xfs_trans_resv_calc(
> >  	resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT;
> >  
> >  	resp->tr_qm_quotaoff.tr_logres = xfs_calc_qm_quotaoff_reservation(mp);
> > -	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT;
> > +	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
> > +	resp->tr_qm_quotaoff.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> >  
> >  	resp->tr_qm_equotaoff.tr_logres =
> >  		xfs_calc_qm_quotaoff_end_reservation();
> > diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
> > index d60647d7197b..ea5123678466 100644
> > --- a/fs/xfs/xfs_dquot_item.c
> > +++ b/fs/xfs/xfs_dquot_item.c
> > @@ -297,6 +297,13 @@ xfs_qm_qoff_logitem_push(
> >  	struct xfs_log_item	*lip,
> >  	struct list_head	*buffer_list)
> >  {
> > +	struct xfs_log_item	*mlip = xfs_ail_min(lip->li_ailp);
> > +
> > +	if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
> > +	    !test_bit(XFS_LI_RELOGGED, &lip->li_flags) &&
> > +	    !XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn))
> > +		return XFS_ITEM_RELOG;
> > +
> >  	return XFS_ITEM_LOCKED;
> >  }
> >  
> > diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> > index 1ea82764bf89..7b48d34da0f4 100644
> > --- a/fs/xfs/xfs_qm_syscalls.c
> > +++ b/fs/xfs/xfs_qm_syscalls.c
> > @@ -18,6 +18,7 @@
> >  #include "xfs_quota.h"
> >  #include "xfs_qm.h"
> >  #include "xfs_icache.h"
> > +#include "xfs_trans_priv.h"
> >  
> >  STATIC int
> >  xfs_qm_log_quotaoff(
> > @@ -31,12 +32,14 @@ xfs_qm_log_quotaoff(
> >  
> >  	*qoffstartp = NULL;
> >  
> > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
> > +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0,
> > +				XFS_TRANS_RELOG, &tp);
> 
> Humm, maybe I don't understand how this works after all.  From what I
> can tell from this patch, (1) the quotaoff transaction is created with
> RELOG, so (2) the AIL steals some reservation from it for an eventual
> relogging of the quotaoff item, and then (3) we log the quotaoff item.
> 

Yep.

> Later, the AIL can decide to trigger the workqueue item to take the
> ticket generated in step (2) to relog the item we logged in step (3) to
> move the log tail forward, but what happens if there are further delays
> and the AIL needs to relog again?  That ticket from (2) is now used up
> and is gone, right?
> 
> I suppose some other RELOG transaction could wander in and generate a
> new relog ticket, but as this is the only RELOG transaction that gets
> created anywhere, that won't happen.  Is there some magic I missed? :)
> 

xfs_ail_relog() only ever rolls its transaction, even if nothing else
happens to be queued at the time, so the relog ticket constantly
regrants. Since relogs never commit, the relog ticket always has
available relog reservation so long as XFS_LI_RELOG items exist. Once
there are no more relog items or transactions, the pending reservation
is released via xfs_trans_ail_relog_put() -> xfs_log_done().

It might be more simple to reason about the reservation model if you
factor out the dynamic relog ticket bits. This is basically equivalent
to the AIL allocating a relog transaction at mount time, constantly
rolling it with relog items when they pass through, and then cancelling
the reservation at unmount time. All of the extra XFS_TRANS_RELOG and
reference counting and ticket management stuff is purely so we only have
an active relog reservation when relogging is being used.

Brian

> --D
> 
> >  	if (error)
> >  		goto out;
> >  
> >  	qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
> >  	xfs_trans_log_quotaoff_item(tp, qoffi);
> > +	xfs_trans_relog_item(&qoffi->qql_item);
> >  
> >  	spin_lock(&mp->m_sb_lock);
> >  	mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
> > @@ -69,6 +72,13 @@ xfs_qm_log_quotaoff_end(
> >  	int			error;
> >  	struct xfs_qoff_logitem	*qoffi;
> >  
> > +	/*
> > +	 * startqoff must be in the AIL and not the CIL when the end intent
> > +	 * commits to ensure it is not readded to the AIL out of order. Wait on
> > +	 * relog activity to drain to isolate startqoff to the AIL.
> > +	 */
> > +	xfs_trans_relog_item_cancel(&startqoff->qql_item, true);
> > +
> >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
> >  	if (error)
> >  		return error;
> > -- 
> > 2.21.1
> > 
>
Allison Henderson Feb. 28, 2020, 6:55 p.m. UTC | #5
On 2/28/20 7:03 AM, Brian Foster wrote:
> On Thu, Feb 27, 2020 at 04:19:42PM -0700, Allison Collins wrote:
>>
>>
>> On 2/27/20 6:43 AM, Brian Foster wrote:
>>> The quotaoff operation has a rare but longstanding deadlock vector
>>> in terms of how the operation is logged. A quotaoff start intent is
>>> logged (synchronously) at the onset to ensure recovery can handle
>>> the operation if interrupted before in-core changes are made. This
>>> quotaoff intent pins the log tail while the quotaoff sequence scans
>>> and purges dquots from all in-core inodes. While this operation
>>> generally doesn't generate much log traffic on its own, it can be
>>> time consuming. If unrelated, concurrent filesystem activity
>>> consumes remaining log space before quotaoff is able to acquire log
>>> reservation for the quotaoff end intent, the filesystem locks up
>>> indefinitely.
>>>
>>> quotaoff cannot allocate the end intent before the scan because the
>>> latter can result in transaction allocation itself in certain
>>> indirect cases (releasing an inode, for example). Further, rolling
>>> the original transaction is difficult because the scanning work
>>> occurs multiple layers down where caller context is lost and not
>>> much information is available to determine how often to roll the
>>> transaction.
>>>
>>> To address this problem, enable automatic relogging of the quotaoff
>>> start intent. This automatically relogs the intent whenever AIL
>>> pushing finds the item at the tail of the log. When quotaoff
>>> completes, wait for relogging to complete as the end intent expects
>>> to be able to permanently remove the start intent from the log
>>> subsystem. This ensures that the log tail is kept moving during a
>>> particularly long quotaoff operation and avoids the log reservation
>>> deadlock.
>>>
>>> Signed-off-by: Brian Foster <bfoster@redhat.com>
>>> ---
>>>    fs/xfs/libxfs/xfs_trans_resv.c |  3 ++-
>>>    fs/xfs/xfs_dquot_item.c        |  7 +++++++
>>>    fs/xfs/xfs_qm_syscalls.c       | 12 +++++++++++-
>>>    3 files changed, 20 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
>>> index 1f5c9e6e1afc..f49b20c9ca33 100644
>>> --- a/fs/xfs/libxfs/xfs_trans_resv.c
>>> +++ b/fs/xfs/libxfs/xfs_trans_resv.c
>>> @@ -935,7 +935,8 @@ xfs_trans_resv_calc(
>>>    	resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT;
>>>    	resp->tr_qm_quotaoff.tr_logres = xfs_calc_qm_quotaoff_reservation(mp);
>>> -	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT;
>>> +	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
>>> +	resp->tr_qm_quotaoff.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
>> What's the reason for the log count change here?  Otherwise looks ok.
>>
> 
> Permanent transactions have a separate default log count (2 instead of
> 1) because they are intended to be rolled (at least once). This
> basically means the initial allocation will acquire enough log
> reservation for the initial transaction and a subsequent roll, similar
> to how all other permanant transactions are initialized in this file.
> This is required for quotaoff because the transaction now uses
> XFS_TRANS_RELOG, which requires a roll up front.
> 
> Brian

Ok, I see.  Thank you for the explanation!
Allison

> 
>> Allison
>>>    	resp->tr_qm_equotaoff.tr_logres =
>>>    		xfs_calc_qm_quotaoff_end_reservation();
>>> diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
>>> index d60647d7197b..ea5123678466 100644
>>> --- a/fs/xfs/xfs_dquot_item.c
>>> +++ b/fs/xfs/xfs_dquot_item.c
>>> @@ -297,6 +297,13 @@ xfs_qm_qoff_logitem_push(
>>>    	struct xfs_log_item	*lip,
>>>    	struct list_head	*buffer_list)
>>>    {
>>> +	struct xfs_log_item	*mlip = xfs_ail_min(lip->li_ailp);
>>> +
>>> +	if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
>>> +	    !test_bit(XFS_LI_RELOGGED, &lip->li_flags) &&
>>> +	    !XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn))
>>> +		return XFS_ITEM_RELOG;
>>> +
>>>    	return XFS_ITEM_LOCKED;
>>>    }
>>> diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
>>> index 1ea82764bf89..7b48d34da0f4 100644
>>> --- a/fs/xfs/xfs_qm_syscalls.c
>>> +++ b/fs/xfs/xfs_qm_syscalls.c
>>> @@ -18,6 +18,7 @@
>>>    #include "xfs_quota.h"
>>>    #include "xfs_qm.h"
>>>    #include "xfs_icache.h"
>>> +#include "xfs_trans_priv.h"
>>>    STATIC int
>>>    xfs_qm_log_quotaoff(
>>> @@ -31,12 +32,14 @@ xfs_qm_log_quotaoff(
>>>    	*qoffstartp = NULL;
>>> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
>>> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0,
>>> +				XFS_TRANS_RELOG, &tp);
>>>    	if (error)
>>>    		goto out;
>>>    	qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
>>>    	xfs_trans_log_quotaoff_item(tp, qoffi);
>>> +	xfs_trans_relog_item(&qoffi->qql_item);
>>>    	spin_lock(&mp->m_sb_lock);
>>>    	mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
>>> @@ -69,6 +72,13 @@ xfs_qm_log_quotaoff_end(
>>>    	int			error;
>>>    	struct xfs_qoff_logitem	*qoffi;
>>> +	/*
>>> +	 * startqoff must be in the AIL and not the CIL when the end intent
>>> +	 * commits to ensure it is not readded to the AIL out of order. Wait on
>>> +	 * relog activity to drain to isolate startqoff to the AIL.
>>> +	 */
>>> +	xfs_trans_relog_item_cancel(&startqoff->qql_item, true);
>>> +
>>>    	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
>>>    	if (error)
>>>    		return error;
>>>
>>
>
Darrick J. Wong Feb. 29, 2020, 5:35 a.m. UTC | #6
On Fri, Feb 28, 2020 at 09:04:13AM -0500, Brian Foster wrote:
> On Thu, Feb 27, 2020 at 05:16:40PM -0800, Darrick J. Wong wrote:
> > On Thu, Feb 27, 2020 at 08:43:18AM -0500, Brian Foster wrote:
> > > The quotaoff operation has a rare but longstanding deadlock vector
> > > in terms of how the operation is logged. A quotaoff start intent is
> > > logged (synchronously) at the onset to ensure recovery can handle
> > > the operation if interrupted before in-core changes are made. This
> > > quotaoff intent pins the log tail while the quotaoff sequence scans
> > > and purges dquots from all in-core inodes. While this operation
> > > generally doesn't generate much log traffic on its own, it can be
> > > time consuming. If unrelated, concurrent filesystem activity
> > > consumes remaining log space before quotaoff is able to acquire log
> > > reservation for the quotaoff end intent, the filesystem locks up
> > > indefinitely.
> > > 
> > > quotaoff cannot allocate the end intent before the scan because the
> > > latter can result in transaction allocation itself in certain
> > > indirect cases (releasing an inode, for example). Further, rolling
> > > the original transaction is difficult because the scanning work
> > > occurs multiple layers down where caller context is lost and not
> > > much information is available to determine how often to roll the
> > > transaction.
> > > 
> > > To address this problem, enable automatic relogging of the quotaoff
> > > start intent. This automatically relogs the intent whenever AIL
> > > pushing finds the item at the tail of the log. When quotaoff
> > > completes, wait for relogging to complete as the end intent expects
> > > to be able to permanently remove the start intent from the log
> > > subsystem. This ensures that the log tail is kept moving during a
> > > particularly long quotaoff operation and avoids the log reservation
> > > deadlock.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_trans_resv.c |  3 ++-
> > >  fs/xfs/xfs_dquot_item.c        |  7 +++++++
> > >  fs/xfs/xfs_qm_syscalls.c       | 12 +++++++++++-
> > >  3 files changed, 20 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> > > index 1f5c9e6e1afc..f49b20c9ca33 100644
> > > --- a/fs/xfs/libxfs/xfs_trans_resv.c
> > > +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> > > @@ -935,7 +935,8 @@ xfs_trans_resv_calc(
> > >  	resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT;
> > >  
> > >  	resp->tr_qm_quotaoff.tr_logres = xfs_calc_qm_quotaoff_reservation(mp);
> > > -	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT;
> > > +	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
> > > +	resp->tr_qm_quotaoff.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> > >  
> > >  	resp->tr_qm_equotaoff.tr_logres =
> > >  		xfs_calc_qm_quotaoff_end_reservation();
> > > diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
> > > index d60647d7197b..ea5123678466 100644
> > > --- a/fs/xfs/xfs_dquot_item.c
> > > +++ b/fs/xfs/xfs_dquot_item.c
> > > @@ -297,6 +297,13 @@ xfs_qm_qoff_logitem_push(
> > >  	struct xfs_log_item	*lip,
> > >  	struct list_head	*buffer_list)
> > >  {
> > > +	struct xfs_log_item	*mlip = xfs_ail_min(lip->li_ailp);
> > > +
> > > +	if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
> > > +	    !test_bit(XFS_LI_RELOGGED, &lip->li_flags) &&
> > > +	    !XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn))
> > > +		return XFS_ITEM_RELOG;
> > > +
> > >  	return XFS_ITEM_LOCKED;
> > >  }
> > >  
> > > diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> > > index 1ea82764bf89..7b48d34da0f4 100644
> > > --- a/fs/xfs/xfs_qm_syscalls.c
> > > +++ b/fs/xfs/xfs_qm_syscalls.c
> > > @@ -18,6 +18,7 @@
> > >  #include "xfs_quota.h"
> > >  #include "xfs_qm.h"
> > >  #include "xfs_icache.h"
> > > +#include "xfs_trans_priv.h"
> > >  
> > >  STATIC int
> > >  xfs_qm_log_quotaoff(
> > > @@ -31,12 +32,14 @@ xfs_qm_log_quotaoff(
> > >  
> > >  	*qoffstartp = NULL;
> > >  
> > > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
> > > +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0,
> > > +				XFS_TRANS_RELOG, &tp);
> > 
> > Humm, maybe I don't understand how this works after all.  From what I
> > can tell from this patch, (1) the quotaoff transaction is created with
> > RELOG, so (2) the AIL steals some reservation from it for an eventual
> > relogging of the quotaoff item, and then (3) we log the quotaoff item.
> > 
> 
> Yep.
> 
> > Later, the AIL can decide to trigger the workqueue item to take the
> > ticket generated in step (2) to relog the item we logged in step (3) to
> > move the log tail forward, but what happens if there are further delays
> > and the AIL needs to relog again?  That ticket from (2) is now used up
> > and is gone, right?
> > 
> > I suppose some other RELOG transaction could wander in and generate a
> > new relog ticket, but as this is the only RELOG transaction that gets
> > created anywhere, that won't happen.  Is there some magic I missed? :)
> > 
> 
> xfs_ail_relog() only ever rolls its transaction, even if nothing else
> happens to be queued at the time, so the relog ticket constantly
> regrants. Since relogs never commit, the relog ticket always has
> available relog reservation so long as XFS_LI_RELOG items exist. Once
> there are no more relog items or transactions, the pending reservation
> is released via xfs_trans_ail_relog_put() -> xfs_log_done().

Aha, that's the subtlety I didn't quite catch. :)

Now that I see how this works for the simple case, I guess I'll try to
figure out on my own what would happen if we flooded the system with a
/lot/ of reloggable items.  Though I bet you've already done that, given
our earlier speculating about closing the writeback hole.

> It might be more simple to reason about the reservation model if you
> factor out the dynamic relog ticket bits. This is basically equivalent
> to the AIL allocating a relog transaction at mount time, constantly
> rolling it with relog items when they pass through, and then cancelling
> the reservation at unmount time. All of the extra XFS_TRANS_RELOG and
> reference counting and ticket management stuff is purely so we only have
> an active relog reservation when relogging is being used.

<nod>

--D

> Brian
> 
> > --D
> > 
> > >  	if (error)
> > >  		goto out;
> > >  
> > >  	qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
> > >  	xfs_trans_log_quotaoff_item(tp, qoffi);
> > > +	xfs_trans_relog_item(&qoffi->qql_item);
> > >  
> > >  	spin_lock(&mp->m_sb_lock);
> > >  	mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
> > > @@ -69,6 +72,13 @@ xfs_qm_log_quotaoff_end(
> > >  	int			error;
> > >  	struct xfs_qoff_logitem	*qoffi;
> > >  
> > > +	/*
> > > +	 * startqoff must be in the AIL and not the CIL when the end intent
> > > +	 * commits to ensure it is not readded to the AIL out of order. Wait on
> > > +	 * relog activity to drain to isolate startqoff to the AIL.
> > > +	 */
> > > +	xfs_trans_relog_item_cancel(&startqoff->qql_item, true);
> > > +
> > >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
> > >  	if (error)
> > >  		return error;
> > > -- 
> > > 2.21.1
> > > 
> > 
>
Brian Foster Feb. 29, 2020, 12:15 p.m. UTC | #7
On Fri, Feb 28, 2020 at 09:35:31PM -0800, Darrick J. Wong wrote:
> On Fri, Feb 28, 2020 at 09:04:13AM -0500, Brian Foster wrote:
> > On Thu, Feb 27, 2020 at 05:16:40PM -0800, Darrick J. Wong wrote:
> > > On Thu, Feb 27, 2020 at 08:43:18AM -0500, Brian Foster wrote:
> > > > The quotaoff operation has a rare but longstanding deadlock vector
> > > > in terms of how the operation is logged. A quotaoff start intent is
> > > > logged (synchronously) at the onset to ensure recovery can handle
> > > > the operation if interrupted before in-core changes are made. This
> > > > quotaoff intent pins the log tail while the quotaoff sequence scans
> > > > and purges dquots from all in-core inodes. While this operation
> > > > generally doesn't generate much log traffic on its own, it can be
> > > > time consuming. If unrelated, concurrent filesystem activity
> > > > consumes remaining log space before quotaoff is able to acquire log
> > > > reservation for the quotaoff end intent, the filesystem locks up
> > > > indefinitely.
> > > > 
> > > > quotaoff cannot allocate the end intent before the scan because the
> > > > latter can result in transaction allocation itself in certain
> > > > indirect cases (releasing an inode, for example). Further, rolling
> > > > the original transaction is difficult because the scanning work
> > > > occurs multiple layers down where caller context is lost and not
> > > > much information is available to determine how often to roll the
> > > > transaction.
> > > > 
> > > > To address this problem, enable automatic relogging of the quotaoff
> > > > start intent. This automatically relogs the intent whenever AIL
> > > > pushing finds the item at the tail of the log. When quotaoff
> > > > completes, wait for relogging to complete as the end intent expects
> > > > to be able to permanently remove the start intent from the log
> > > > subsystem. This ensures that the log tail is kept moving during a
> > > > particularly long quotaoff operation and avoids the log reservation
> > > > deadlock.
> > > > 
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_trans_resv.c |  3 ++-
> > > >  fs/xfs/xfs_dquot_item.c        |  7 +++++++
> > > >  fs/xfs/xfs_qm_syscalls.c       | 12 +++++++++++-
> > > >  3 files changed, 20 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> > > > index 1f5c9e6e1afc..f49b20c9ca33 100644
> > > > --- a/fs/xfs/libxfs/xfs_trans_resv.c
> > > > +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> > > > @@ -935,7 +935,8 @@ xfs_trans_resv_calc(
> > > >  	resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT;
> > > >  
> > > >  	resp->tr_qm_quotaoff.tr_logres = xfs_calc_qm_quotaoff_reservation(mp);
> > > > -	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT;
> > > > +	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
> > > > +	resp->tr_qm_quotaoff.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> > > >  
> > > >  	resp->tr_qm_equotaoff.tr_logres =
> > > >  		xfs_calc_qm_quotaoff_end_reservation();
> > > > diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
> > > > index d60647d7197b..ea5123678466 100644
> > > > --- a/fs/xfs/xfs_dquot_item.c
> > > > +++ b/fs/xfs/xfs_dquot_item.c
> > > > @@ -297,6 +297,13 @@ xfs_qm_qoff_logitem_push(
> > > >  	struct xfs_log_item	*lip,
> > > >  	struct list_head	*buffer_list)
> > > >  {
> > > > +	struct xfs_log_item	*mlip = xfs_ail_min(lip->li_ailp);
> > > > +
> > > > +	if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
> > > > +	    !test_bit(XFS_LI_RELOGGED, &lip->li_flags) &&
> > > > +	    !XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn))
> > > > +		return XFS_ITEM_RELOG;
> > > > +
> > > >  	return XFS_ITEM_LOCKED;
> > > >  }
> > > >  
> > > > diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> > > > index 1ea82764bf89..7b48d34da0f4 100644
> > > > --- a/fs/xfs/xfs_qm_syscalls.c
> > > > +++ b/fs/xfs/xfs_qm_syscalls.c
> > > > @@ -18,6 +18,7 @@
> > > >  #include "xfs_quota.h"
> > > >  #include "xfs_qm.h"
> > > >  #include "xfs_icache.h"
> > > > +#include "xfs_trans_priv.h"
> > > >  
> > > >  STATIC int
> > > >  xfs_qm_log_quotaoff(
> > > > @@ -31,12 +32,14 @@ xfs_qm_log_quotaoff(
> > > >  
> > > >  	*qoffstartp = NULL;
> > > >  
> > > > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
> > > > +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0,
> > > > +				XFS_TRANS_RELOG, &tp);
> > > 
> > > Humm, maybe I don't understand how this works after all.  From what I
> > > can tell from this patch, (1) the quotaoff transaction is created with
> > > RELOG, so (2) the AIL steals some reservation from it for an eventual
> > > relogging of the quotaoff item, and then (3) we log the quotaoff item.
> > > 
> > 
> > Yep.
> > 
> > > Later, the AIL can decide to trigger the workqueue item to take the
> > > ticket generated in step (2) to relog the item we logged in step (3) to
> > > move the log tail forward, but what happens if there are further delays
> > > and the AIL needs to relog again?  That ticket from (2) is now used up
> > > and is gone, right?
> > > 
> > > I suppose some other RELOG transaction could wander in and generate a
> > > new relog ticket, but as this is the only RELOG transaction that gets
> > > created anywhere, that won't happen.  Is there some magic I missed? :)
> > > 
> > 
> > xfs_ail_relog() only ever rolls its transaction, even if nothing else
> > happens to be queued at the time, so the relog ticket constantly
> > regrants. Since relogs never commit, the relog ticket always has
> > available relog reservation so long as XFS_LI_RELOG items exist. Once
> > there are no more relog items or transactions, the pending reservation
> > is released via xfs_trans_ail_relog_put() -> xfs_log_done().
> 
> Aha, that's the subtlety I didn't quite catch. :)
> 
> Now that I see how this works for the simple case, I guess I'll try to
> figure out on my own what would happen if we flooded the system with a
> /lot/ of reloggable items.  Though I bet you've already done that, given
> our earlier speculating about closing the writeback hole.
> 

I haven't stressed it to the breakdown max yet (from a performance
perspective). My stress testing to this point has been focused on
correctness, particularly related to the relog reservation management,
and working out any kinks. I've stressed it enough so far that I haven't
been able to reproduce log res deadlocks that are otherwise reproducible
without the regrant mechanism in place. IOW, manually replace the relog
ticket bits with a regular old xfs_trans_alloc() in xfs_ail_relog(),
enable aggressive enough buffer relogging via the errortag and throw
fsstress at it and things will likely (eventually) grind to a halt
waiting on relog ticket reservation.

Brian

> > It might be more simple to reason about the reservation model if you
> > factor out the dynamic relog ticket bits. This is basically equivalent
> > to the AIL allocating a relog transaction at mount time, constantly
> > rolling it with relog items when they pass through, and then cancelling
> > the reservation at unmount time. All of the extra XFS_TRANS_RELOG and
> > reference counting and ticket management stuff is purely so we only have
> > an active relog reservation when relogging is being used.
> 
> <nod>
> 
> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > >  	if (error)
> > > >  		goto out;
> > > >  
> > > >  	qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
> > > >  	xfs_trans_log_quotaoff_item(tp, qoffi);
> > > > +	xfs_trans_relog_item(&qoffi->qql_item);
> > > >  
> > > >  	spin_lock(&mp->m_sb_lock);
> > > >  	mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
> > > > @@ -69,6 +72,13 @@ xfs_qm_log_quotaoff_end(
> > > >  	int			error;
> > > >  	struct xfs_qoff_logitem	*qoffi;
> > > >  
> > > > +	/*
> > > > +	 * startqoff must be in the AIL and not the CIL when the end intent
> > > > +	 * commits to ensure it is not readded to the AIL out of order. Wait on
> > > > +	 * relog activity to drain to isolate startqoff to the AIL.
> > > > +	 */
> > > > +	xfs_trans_relog_item_cancel(&startqoff->qql_item, true);
> > > > +
> > > >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
> > > >  	if (error)
> > > >  		return error;
> > > > -- 
> > > > 2.21.1
> > > > 
> > > 
> > 
>
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 1f5c9e6e1afc..f49b20c9ca33 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -935,7 +935,8 @@  xfs_trans_resv_calc(
 	resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT;
 
 	resp->tr_qm_quotaoff.tr_logres = xfs_calc_qm_quotaoff_reservation(mp);
-	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_LOG_COUNT;
+	resp->tr_qm_quotaoff.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
+	resp->tr_qm_quotaoff.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
 	resp->tr_qm_equotaoff.tr_logres =
 		xfs_calc_qm_quotaoff_end_reservation();
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index d60647d7197b..ea5123678466 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -297,6 +297,13 @@  xfs_qm_qoff_logitem_push(
 	struct xfs_log_item	*lip,
 	struct list_head	*buffer_list)
 {
+	struct xfs_log_item	*mlip = xfs_ail_min(lip->li_ailp);
+
+	if (test_bit(XFS_LI_RELOG, &lip->li_flags) &&
+	    !test_bit(XFS_LI_RELOGGED, &lip->li_flags) &&
+	    !XFS_LSN_CMP(lip->li_lsn, mlip->li_lsn))
+		return XFS_ITEM_RELOG;
+
 	return XFS_ITEM_LOCKED;
 }
 
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 1ea82764bf89..7b48d34da0f4 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -18,6 +18,7 @@ 
 #include "xfs_quota.h"
 #include "xfs_qm.h"
 #include "xfs_icache.h"
+#include "xfs_trans_priv.h"
 
 STATIC int
 xfs_qm_log_quotaoff(
@@ -31,12 +32,14 @@  xfs_qm_log_quotaoff(
 
 	*qoffstartp = NULL;
 
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0,
+				XFS_TRANS_RELOG, &tp);
 	if (error)
 		goto out;
 
 	qoffi = xfs_trans_get_qoff_item(tp, NULL, flags & XFS_ALL_QUOTA_ACCT);
 	xfs_trans_log_quotaoff_item(tp, qoffi);
+	xfs_trans_relog_item(&qoffi->qql_item);
 
 	spin_lock(&mp->m_sb_lock);
 	mp->m_sb.sb_qflags = (mp->m_qflags & ~(flags)) & XFS_MOUNT_QUOTA_ALL;
@@ -69,6 +72,13 @@  xfs_qm_log_quotaoff_end(
 	int			error;
 	struct xfs_qoff_logitem	*qoffi;
 
+	/*
+	 * startqoff must be in the AIL and not the CIL when the end intent
+	 * commits to ensure it is not readded to the AIL out of order. Wait on
+	 * relog activity to drain to isolate startqoff to the AIL.
+	 */
+	xfs_trans_relog_item_cancel(&startqoff->qql_item, true);
+
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
 	if (error)
 		return error;