diff mbox series

[4/4] xfs: xfs_defer_capture should absorb remaining transaction reservation

Message ID 160125009361.174438.2579393022515355249.stgit@magnolia (mailing list archive)
State Superseded
Headers show
Series xfs: fix how we deal with new intents during recovery | expand

Commit Message

Darrick J. Wong Sept. 27, 2020, 11:41 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should record the transaction reservation type
from the old transaction so that when we continue the dfops chain, we
still use the same reservation parameters.

This avoids a potential failure vector by ensuring that we never ask for
more log reservation space than we would have asked for had the system
not gone down.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c |    5 +++++
 fs/xfs/libxfs/xfs_defer.h |    1 +
 fs/xfs/xfs_log_recover.c  |    4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)

Comments

Dave Chinner Sept. 28, 2020, 5:52 a.m. UTC | #1
On Sun, Sep 27, 2020 at 04:41:33PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When xfs_defer_capture extracts the deferred ops and transaction state
> from a transaction, it should record the transaction reservation type
> from the old transaction so that when we continue the dfops chain, we
> still use the same reservation parameters.
> 
> This avoids a potential failure vector by ensuring that we never ask for
> more log reservation space than we would have asked for had the system
> not gone down.

Nope, it does not do that.

> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c |    5 +++++
>  fs/xfs/libxfs/xfs_defer.h |    1 +
>  fs/xfs/xfs_log_recover.c  |    4 ++--
>  3 files changed, 8 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 85d70f1edc1c..c53443252389 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -577,6 +577,11 @@ xfs_defer_capture(
>  	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
>  	tp->t_blk_res = tp->t_blk_res_used;
>  
> +	/* Preserve the transaction reservation type. */
> +	dfc->dfc_tres.tr_logres = tp->t_log_res;
> +	dfc->dfc_tres.tr_logcount = tp->t_log_count;
> +	dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;

This means every child deferop takes a full tp->t_log_count
reservation, whilst in memory the child reservation would ahve been
handled by the parent via the log ticket unit count being
decremented by one. Hence child deferops -never- run with the same
maximal reservation that their parents held.

The difference is that at runtime we are rolling transaction which
regrant space from the initial reservation of (tp->t_log_count *
tp->t_log_res) made a run time. i.e. the first child deferop that
runs has a total log space grant of ((tp->t_log_count - 1)
* tp->t_log_res), the second it is "- 2", and so on right down to
when the log ticket runs out of initial reservation and so it goes
to reserving a single unit (tp->t_log_res) at a time.

Hence both the intents being recovered and all their children are
over-reserving log space by using the default log count for the
&M_RES(mp)->tr_itruncate reservation. Even if we ignore the initial
reservation being incorrect, the child reservations of the same size
as the parent are definitely incorrect. They really should be
allowed only a single unit reservation, and if the transaction rolls
to process defer ops, it needs to regrant new log space during the
commit process.

Hence I think this can only be correct as:

	dfc->dfc_tres.tr_log_count = 1;

Regardless of how many units the parent recovery reservation
obtained. (Which I also think can only be correct as 1 because we
don't know how many units of reservation space the parent had
consumed when it was logged.)

Cheers,

Dave.
Darrick J. Wong Sept. 28, 2020, 5:15 p.m. UTC | #2
On Mon, Sep 28, 2020 at 03:52:30PM +1000, Dave Chinner wrote:
> On Sun, Sep 27, 2020 at 04:41:33PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When xfs_defer_capture extracts the deferred ops and transaction state
> > from a transaction, it should record the transaction reservation type
> > from the old transaction so that when we continue the dfops chain, we
> > still use the same reservation parameters.
> > 
> > This avoids a potential failure vector by ensuring that we never ask for
> > more log reservation space than we would have asked for had the system
> > not gone down.
> 
> Nope, it does not do that.
> 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_defer.c |    5 +++++
> >  fs/xfs/libxfs/xfs_defer.h |    1 +
> >  fs/xfs/xfs_log_recover.c  |    4 ++--
> >  3 files changed, 8 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > index 85d70f1edc1c..c53443252389 100644
> > --- a/fs/xfs/libxfs/xfs_defer.c
> > +++ b/fs/xfs/libxfs/xfs_defer.c
> > @@ -577,6 +577,11 @@ xfs_defer_capture(
> >  	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
> >  	tp->t_blk_res = tp->t_blk_res_used;
> >  
> > +	/* Preserve the transaction reservation type. */
> > +	dfc->dfc_tres.tr_logres = tp->t_log_res;
> > +	dfc->dfc_tres.tr_logcount = tp->t_log_count;
> > +	dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
> 
> This means every child deferop takes a full tp->t_log_count
> reservation, whilst in memory the child reservation would ahve been
> handled by the parent via the log ticket unit count being
> decremented by one. Hence child deferops -never- run with the same
> maximal reservation that their parents held.
> 
> The difference is that at runtime we are rolling transaction which
> regrant space from the initial reservation of (tp->t_log_count *
> tp->t_log_res) made a run time. i.e. the first child deferop that
> runs has a total log space grant of ((tp->t_log_count - 1)
> * tp->t_log_res), the second it is "- 2", and so on right down to
> when the log ticket runs out of initial reservation and so it goes
> to reserving a single unit (tp->t_log_res) at a time.
> 
> Hence both the intents being recovered and all their children are
> over-reserving log space by using the default log count for the
> &M_RES(mp)->tr_itruncate reservation. Even if we ignore the initial
> reservation being incorrect, the child reservations of the same size
> as the parent are definitely incorrect. They really should be
> allowed only a single unit reservation, and if the transaction rolls
> to process defer ops, it needs to regrant new log space during the
> commit process.
> 
> Hence I think this can only be correct as:
> 
> 	dfc->dfc_tres.tr_log_count = 1;
> 
> Regardless of how many units the parent recovery reservation
> obtained. (Which I also think can only be correct as 1 because we
> don't know how many units of reservation space the parent had
> consumed when it was logged.)

Can we reach down into the transaction's xlog_ticket.t_cnt to find out
how many were left?

Hm.  Maybe that doesn't matter, seeing as recovery basically
single-steps through whatever it salvaged from the log, I think it's
better for recovery to regrant every time it rolls the transaction
because the amount of space required for a single step will (once we get
the logres part sorted out) never be larger than the sum of the logres
of all transactions that were running when we crashed.

It also occurs to me (esp. given the earlier conversation about the
transaction reservations used to recover intents) that in general we
might need to finish one item to push the tail forward to free up space,
so therefore log recovery should try to use as few resources as
possible.

Well, I'll try hardcoding logcount=1 and see what happens. :)

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 85d70f1edc1c..c53443252389 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -577,6 +577,11 @@  xfs_defer_capture(
 	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
 	tp->t_blk_res = tp->t_blk_res_used;
 
+	/* Preserve the transaction reservation type. */
+	dfc->dfc_tres.tr_logres = tp->t_log_res;
+	dfc->dfc_tres.tr_logcount = tp->t_log_count;
+	dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
+
 	return dfc;
 }
 
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 999adbb8c449..063276c063b6 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -77,6 +77,7 @@  struct xfs_defer_capture {
 	struct list_head	dfc_dfops;
 	unsigned int		dfc_tpflags;
 	unsigned int		dfc_blkres;
+	struct xfs_trans_res	dfc_tres;
 };
 
 /*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 60670ac4e5ae..0d899ab7df2e 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2487,8 +2487,8 @@  xlog_finish_defer_ops(
 	int			error = 0;
 
 	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0,
-				0, XFS_TRANS_RESERVE, &tp);
+		error = xfs_trans_alloc(mp, &dfc->dfc_tres, 0, 0,
+				XFS_TRANS_RESERVE, &tp);
 		if (error)
 			return error;