[34/45] xfs: rework per-iclog header CIL reservation


Message ID

20210305051143.182133-35-david@fromorbit.com (mailing list archive)

State

Superseded

Headers

From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 34/45] xfs: rework per-iclog header CIL reservation
Date: Fri,  5 Mar 2021 16:11:32 +1100
Message-Id: <20210305051143.182133-35-david@fromorbit.com>
In-Reply-To: <20210305051143.182133-1-david@fromorbit.com>
References: <20210305051143.182133-1-david@fromorbit.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

xfs: consolidated log and optimisation changes

Commit Message

Dave Chinner March 5, 2021, 5:11 a.m. UTC

From: Dave Chinner <dchinner@redhat.com>

For every iclog that a CIL push will use up, we need to ensure we
have space reserved for the iclog header in each iclog. It is
extremely difficult to do this accurately with a per-cpu counter
without expensive summing of the counter in every commit. However,
we know what the maximum CIL size is going to be because of the
hard space limit we have, and hence we know exactly how many iclogs
we are going to need to write out the CIL.

We are constrained by the requirement that small transactions only
have reservation space for a single iclog header built into them.
At commit time we don't know how much of the current transaction
reservation is made up of iclog header reservations as calculated by
xfs_log_calc_unit_res() when the ticket was reserved. As larger
reservations have multiple header spaces reserved, we can steal
more than one iclog header reservation at a time, but we only steal
the exact number needed for the given log vector size delta.

As a result, we don't know exactly when we are going to steal iclog
header reservations, nor do we know exactly how many we are going to
need for a given CIL.
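
For reference, the boundary-crossing check that this patch removes can be sketched in user space roughly as below. The helper name old_split_res() and its plain integer parameters are hypothetical stand-ins for the kernel structures; only the arithmetic follows the removed lines of xlog_cil_insert_items().

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the removed check: steal header space
 * only when adding len bytes moves space_used across an iclog boundary.
 * Returns the number of bytes stolen for iclog and op headers.
 */
static int
old_split_res(int space_used, int len, int iclog_size, int iclog_hsize,
	      int ophdr_size)
{
	int	iclog_space = iclog_size - iclog_hsize;
	int	split_res = 0;

	assert(iclog_space > 0);

	if (len > 0 && (space_used / iclog_space !=
				(space_used + len) / iclog_space)) {
		/* one header per iclog that len could span, rounded up */
		split_res = (len + iclog_space - 1) / iclog_space;
		/* each split also needs an op header for the split region */
		split_res *= iclog_hsize + ophdr_size;
	}
	return split_res;
}
```

With an assumed 32768-byte iclog, 512-byte iclog header and 12-byte op header, a 1000-byte commit at space_used = 32000 crosses one boundary and steals 524 bytes, while the same commit at space_used = 0 steals nothing. Because the decision depends on the running value of space_used, it cannot be made without an accurate global sum.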

To make things simple, start by calculating the worst case number of
iclog headers a full CIL push will require. Record this into an
atomic variable in the CIL. Then add a byte counter to the log
ticket that records exactly how much iclog header space has been
reserved in this ticket by xfs_log_calc_unit_res(). This tells us
exactly how much space we can steal from the ticket at transaction
commit time.
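
The worst-case count described above is a single division. A minimal sketch, assuming plain integers in place of the struct xlog fields (the function name is hypothetical; the arithmetic mirrors xlog_cil_set_iclog_hdr_count() in this patch):

```c
#include <assert.h>

/*
 * Hypothetical model of the worst-case iclog header count for a full CIL:
 * the hard space limit divided by the payload each iclog can carry.
 */
static int
sketch_max_iclog_hdrs(int blocking_space_limit, int iclog_size,
		      int iclog_hsize)
{
	int	iclog_space = iclog_size - iclog_hsize;

	assert(iclog_space > 0);
	/* integer division truncates, as in the patch */
	return blocking_space_limit / iclog_space;
}
```

For example, an assumed 1 MiB limit with 32768-byte iclogs and 512-byte headers gives 1048576 / 32256 = 32 headers. The division truncates, so a partially filled final iclog is not counted by this expression.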

Now, at transaction commit time, we can check if the CIL has a full
iclog header reservation and, if not, steal the entire reservation
the current ticket holds for iclog headers. This minimises the
number of times we need to do atomic operations in the fast path,
but still guarantees we get all the reservations we need.
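
The commit-time behaviour can be modelled in user space with C11 atomics. This is a single-threaded sketch under stated assumptions: the sketch_* types and functions are hypothetical, space_limit stands in for XLOG_CIL_BLOCKING_SPACE_LIMIT(), and the updates that the kernel performs under xc_cil_lock are done here without locking.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified stand-ins for struct xlog_ticket and struct xfs_cil. */
struct sketch_ticket {
	int		curr_res;	/* bytes still reserved in the ticket */
	int		iclog_hdrs;	/* iclog headers covered by the reservation */
};

struct sketch_cil {
	atomic_int	iclog_hdrs;	/* headers still wanted; may go negative */
	int		space_used;
	int		space_limit;	/* stands in for the blocking space limit */
	int		ctx_res;	/* bytes moved to the context ticket */
};

static void
sketch_cil_init(struct sketch_cil *cil, int max_hdrs, int space_limit)
{
	atomic_init(&cil->iclog_hdrs, max_hdrs);
	cil->space_used = 0;
	cil->space_limit = space_limit;
	cil->ctx_res = 0;
}

/*
 * Model one transaction commit. ctx_res is non-zero when this commit has
 * already donated the CIL unit reservation, which includes one header.
 * hdr_bytes is the size of one iclog header plus one op header.
 * Returns the number of bytes moved from the ticket to the CIL context.
 */
static int
sketch_commit(struct sketch_cil *cil, struct sketch_ticket *tic,
	      int len, int ctx_res, int hdr_bytes)
{
	assert(tic->iclog_hdrs >= 1);

	/* cheap relaxed read first; only do the atomic RMW when stealing */
	if (atomic_load_explicit(&cil->iclog_hdrs, memory_order_relaxed) > 0 ||
	    cil->space_used + len >= cil->space_limit) {
		if (ctx_res)
			ctx_res += hdr_bytes * (tic->iclog_hdrs - 1);
		else
			ctx_res = hdr_bytes * tic->iclog_hdrs;
		atomic_fetch_sub(&cil->iclog_hdrs, tic->iclog_hdrs);
	}

	tic->curr_res -= ctx_res + len;
	cil->ctx_res += ctx_res;
	cil->space_used += len;
	return ctx_res;
}
```

Starting from a counter of 3, two commits whose tickets each carry two headers both steal their full header reservation and leave the counter at -1; a third commit then skips the atomic subtraction entirely. The negative value only records that more was stolen than the worst case required.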

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_log_rlimit.c |  2 +-
 fs/xfs/libxfs/xfs_shared.h     |  3 +-
 fs/xfs/xfs_log.c               | 12 +++++---
 fs/xfs/xfs_log_cil.c           | 55 ++++++++++++++++++++++++++--------
 fs/xfs/xfs_log_priv.h          | 20 +++++++------
 5 files changed, 64 insertions(+), 28 deletions(-)

Comments

Darrick J. Wong March 11, 2021, 12:03 a.m. UTC | #1

On Fri, Mar 05, 2021 at 04:11:32PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> For every iclog that a CIL push will use up, we need to ensure we
> have space reserved for the iclog header in each iclog. It is
> extremely difficult to do this accurately with a per-cpu counter
> without expensive summing of the counter in every commit. However,
> we know what the maximum CIL size is going to be because of the
> hard space limit we have, and hence we know exactly how many iclogs
> we are going to need to write out the CIL.
> 
> We are constrained by the requirement that small transactions only
> have reservation space for a single iclog header built into them.
> At commit time we don't know how much of the current transaction
> reservation is made up of iclog header reservations as calculated by
> xfs_log_calc_unit_res() when the ticket was reserved. As larger
> reservations have multiple header spaces reserved, we can steal
> more than one iclog header reservation at a time, but we only steal
> the exact number needed for the given log vector size delta.
> 
> As a result, we don't know exactly when we are going to steal iclog
> header reservations, nor do we know exactly how many we are going to
> need for a given CIL.
> 
> To make things simple, start by calculating the worst case number of
> iclog headers a full CIL push will require. Record this into an
> atomic variable in the CIL. Then add a byte counter to the log
> ticket that records exactly how much iclog header space has been
> reserved in this ticket by xfs_log_calc_unit_res(). This tells us
> exactly how much space we can steal from the ticket at transaction
> commit time.
> 
> Now, at transaction commit time, we can check if the CIL has a full
> iclog header reservation and, if not, steal the entire reservation
> the current ticket holds for iclog headers. This minimises the
> number of times we need to do atomic operations in the fast path,
> but still guarantees we get all the reservations we need.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_log_rlimit.c |  2 +-
>  fs/xfs/libxfs/xfs_shared.h     |  3 +-
>  fs/xfs/xfs_log.c               | 12 +++++---
>  fs/xfs/xfs_log_cil.c           | 55 ++++++++++++++++++++++++++--------
>  fs/xfs/xfs_log_priv.h          | 20 +++++++------
>  5 files changed, 64 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c
> index 7f55eb3f3653..75390134346d 100644
> --- a/fs/xfs/libxfs/xfs_log_rlimit.c
> +++ b/fs/xfs/libxfs/xfs_log_rlimit.c
> @@ -88,7 +88,7 @@ xfs_log_calc_minimum_size(
>  
>  	xfs_log_get_max_trans_res(mp, &tres);
>  
> -	max_logres = xfs_log_calc_unit_res(mp, tres.tr_logres);
> +	max_logres = xfs_log_calc_unit_res(mp, tres.tr_logres, NULL);

This is currently the only call site of xfs_log_calc_unit_res, so if a
subsequent patch doesn't make use of that last argument it should go
away.  (I don't know yet, I haven't looked...)

>  	if (tres.tr_logcount > 1)
>  		max_logres *= tres.tr_logcount;
>  
> diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
> index 8c61a461bf7b..b4791b817fe3 100644
> --- a/fs/xfs/libxfs/xfs_shared.h
> +++ b/fs/xfs/libxfs/xfs_shared.h
> @@ -48,7 +48,8 @@ extern const struct xfs_buf_ops xfs_symlink_buf_ops;
>  extern const struct xfs_buf_ops xfs_rtbuf_ops;
>  
>  /* log size calculation functions */
> -int	xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes);
> +int	xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes,
> +				int *niclogs);
>  int	xfs_log_calc_minimum_size(struct xfs_mount *);
>  
>  struct xfs_trans_res;
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 8f4f7ae84358..46a006d41184 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -3312,7 +3312,8 @@ xfs_log_ticket_get(
>  static int
>  xlog_calc_unit_res(
>  	struct xlog		*log,
> -	int			unit_bytes)
> +	int			unit_bytes,
> +	int			*niclogs)
>  {
>  	int			iclog_space;
>  	uint			num_headers;
> @@ -3392,15 +3393,18 @@ xlog_calc_unit_res(
>  	/* roundoff padding for transaction data and one for commit record */
>  	unit_bytes += 2 * log->l_iclog_roundoff;
>  
> +	if (niclogs)
> +		*niclogs = num_headers;
>  	return unit_bytes;
>  }
>  
>  int
>  xfs_log_calc_unit_res(
>  	struct xfs_mount	*mp,
> -	int			unit_bytes)
> +	int			unit_bytes,
> +	int			*niclogs)
>  {
> -	return xlog_calc_unit_res(mp->m_log, unit_bytes);
> +	return xlog_calc_unit_res(mp->m_log, unit_bytes, niclogs);
>  }
>  
>  /*
> @@ -3418,7 +3422,7 @@ xlog_ticket_alloc(
>  
>  	tic = kmem_cache_zalloc(xfs_log_ticket_zone, GFP_NOFS | __GFP_NOFAIL);
>  
> -	unit_res = xlog_calc_unit_res(log, unit_bytes);
> +	unit_res = xlog_calc_unit_res(log, unit_bytes, &tic->t_iclog_hdrs);

Ok, so each transaction ticket now gets to know the maximum number of
iclog headers that the transaction can consume if we use every last byte
of the reservation...
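
The hunk above only shows num_headers being returned, not how it is computed. Purely as an assumption for illustration, a header count of this kind can be derived by rounding the reservation up over the per-iclog payload and growing it until the added op headers no longer spill into another iclog. The function name and parameters below are hypothetical, and the real xlog_calc_unit_res() adds other overheads that are not modelled here.

```c
#include <assert.h>

/*
 * Hypothetical sketch: how many iclog headers a reservation of unit_bytes
 * could need once one op header per split region is added to the payload.
 */
static int
sketch_count_iclog_hdrs(int unit_bytes, int iclog_size, int iclog_hsize,
			int ophdr_size)
{
	int	iclog_space = iclog_size - iclog_hsize;
	int	num_headers;

	assert(iclog_space > 0 && unit_bytes >= 0);

	/* round up: iclogs needed for the raw payload */
	num_headers = (unit_bytes + iclog_space - 1) / iclog_space;
	unit_bytes += ophdr_size * num_headers;

	/* the extra op headers may themselves push us into another iclog */
	while (num_headers == 0 ||
	       (unit_bytes + iclog_space - 1) / iclog_space > num_headers) {
		unit_bytes += ophdr_size;
		num_headers++;
	}
	return num_headers;
}
```

Under these assumptions a small reservation always gets exactly one header, which matches the constraint stated in the commit message that small transactions only carry space for a single iclog header.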

>  
>  	atomic_set(&tic->t_ref, 1);
>  	tic->t_task		= current;
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 50101336a7f4..f8fb2f59e24c 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -44,9 +44,20 @@ xlog_cil_ticket_alloc(
>  	 * transaction overhead reservation from the first transaction commit.
>  	 */
>  	tic->t_curr_res = 0;
> +	tic->t_iclog_hdrs = 0;
>  	return tic;
>  }
>  
> +static inline void
> +xlog_cil_set_iclog_hdr_count(struct xfs_cil *cil)
> +{
> +	struct xlog	*log = cil->xc_log;
> +
> +	atomic_set(&cil->xc_iclog_hdrs,
> +		   (XLOG_CIL_BLOCKING_SPACE_LIMIT(log) /
> +			(log->l_iclog_size - log->l_iclog_hsize)));
> +}
> +
>  /*
>   * Unavoidable forward declaration - xlog_cil_push_work() calls
>   * xlog_cil_ctx_alloc() itself.
> @@ -70,6 +81,7 @@ xlog_cil_ctx_switch(
>  	struct xfs_cil		*cil,
>  	struct xfs_cil_ctx	*ctx)
>  {
> +	xlog_cil_set_iclog_hdr_count(cil);

...and I guess every time the CIL gets a fresh context, we also record
the maximum number of iclog headers that we might be pushing to disk in
one go?  Which I guess happens if someone commits a lot of updates to a
filesystem, a committing thread hits the throttle threshold, and now the
CIL has to switch contexts and write the old context's transactions to
disk?

>  	set_bit(XLOG_CIL_EMPTY, &cil->xc_flags);
>  	ctx->sequence = ++cil->xc_current_sequence;
>  	ctx->cil = cil;
> @@ -92,6 +104,7 @@ xlog_cil_init_post_recovery(
>  {
>  	log->l_cilp->xc_ctx->ticket = xlog_cil_ticket_alloc(log);
>  	log->l_cilp->xc_ctx->sequence = 1;
> +	xlog_cil_set_iclog_hdr_count(log->l_cilp);
>  }
>  
>  static inline int
> @@ -419,7 +432,6 @@ xlog_cil_insert_items(
>  	struct xfs_cil_ctx	*ctx = cil->xc_ctx;
>  	struct xfs_log_item	*lip;
>  	int			len = 0;
> -	int			iclog_space;
>  	int			iovhdr_res = 0, split_res = 0, ctx_res = 0;
>  
>  	ASSERT(tp);
> @@ -442,19 +454,36 @@ xlog_cil_insert_items(
>  	    test_and_clear_bit(XLOG_CIL_EMPTY, &cil->xc_flags))
>  		ctx_res = ctx->ticket->t_unit_res;
>  
> -	spin_lock(&cil->xc_cil_lock);
> -
> -	/* do we need space for more log record headers? */
> -	iclog_space = log->l_iclog_size - log->l_iclog_hsize;
> -	if (len > 0 && (ctx->space_used / iclog_space !=
> -				(ctx->space_used + len) / iclog_space)) {
> -		split_res = (len + iclog_space - 1) / iclog_space;
> -		/* need to take into account split region headers, too */
> -		split_res *= log->l_iclog_hsize + sizeof(struct xlog_op_header);
> -		ctx->ticket->t_unit_res += split_res;
> +	/*
> +	 * Check if we need to steal iclog headers. atomic_read() is not a
> +	 * locked atomic operation, so we can check the value before we do any
> +	 * real atomic ops in the fast path. If we've already taken the CIL unit
> +	 * reservation from this commit, we've already got one iclog header
> +	 * space reserved so we have to account for that otherwise we risk
> +	 * overrunning the reservation on this ticket.
> +	 *
> +	 * If the CIL is already at the hard limit, we might need more header
> +	 * space than originally reserved. So steal more header space from every
> +	 * commit that occurs once we are over the hard limit to ensure the CIL
> +	 * push won't run out of reservation space.
> +	 *
> +	 * This can steal more than we need, but that's OK.
> +	 */
> +	if (atomic_read(&cil->xc_iclog_hdrs) > 0 ||

If we haven't stolen enough iclog header space...

> +	    ctx->space_used + len >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {

...or we've hit a throttling threshold, in which case we know we're
going to push, so we might as well take everything and (I guess?) not
give back any reservation that would encourage more commits before we're
ready?

> +		int	split_res = log->l_iclog_hsize +
> +					sizeof(struct xlog_op_header);
> +		if (ctx_res)
> +			ctx_res += split_res * (tp->t_ticket->t_iclog_hdrs - 1);
> +		else
> +			ctx_res = split_res * tp->t_ticket->t_iclog_hdrs;
> +		atomic_sub(tp->t_ticket->t_iclog_hdrs, &cil->xc_iclog_hdrs);

What happens if xc_iclog_hdrs goes negative?  Does that merely mean that
we stole more space from the transaction than we needed?  Or does it
indicate that we're trying to cram too much into a single context?

I suppose I worry about what might happen if each transaction's
committed items actually somehow eats up every byte of reservation and
that actually translates to t_iclog_hdrs iclogs being written out with a
particular context, where sum(t_iclog_hdrs) is larger than what
xlog_cil_set_iclog_hdr_count() precomputes?

--D

>  	}
> -	tp->t_ticket->t_curr_res -= split_res + ctx_res + len;
> -	ctx->ticket->t_curr_res += split_res + ctx_res;
> +
> +	spin_lock(&cil->xc_cil_lock);
> +	tp->t_ticket->t_curr_res -= ctx_res + len;
> +	ctx->ticket->t_unit_res += ctx_res;
> +	ctx->ticket->t_curr_res += ctx_res;
>  	ctx->space_used += len;
>  
>  	/*
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index b0dc3bc9de59..e72d14c76e03 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -140,15 +140,16 @@ enum xlog_iclog_state {
>  #define XLOG_TIC_LEN_MAX	15
>  
>  typedef struct xlog_ticket {
> -	struct list_head   t_queue;	 /* reserve/write queue */
> -	struct task_struct *t_task;	 /* task that owns this ticket */
> -	xlog_tid_t	   t_tid;	 /* transaction identifier	 : 4  */
> -	atomic_t	   t_ref;	 /* ticket reference count       : 4  */
> -	int		   t_curr_res;	 /* current reservation in bytes : 4  */
> -	int		   t_unit_res;	 /* unit reservation in bytes    : 4  */
> -	char		   t_ocnt;	 /* original count		 : 1  */
> -	char		   t_cnt;	 /* current count		 : 1  */
> -	char		   t_flags;	 /* properties of reservation	 : 1  */
> +	struct list_head	t_queue;	/* reserve/write queue */
> +	struct task_struct	*t_task;	/* task that owns this ticket */
> +	xlog_tid_t		t_tid;		/* transaction identifier */
> +	atomic_t		t_ref;		/* ticket reference count */
> +	int			t_curr_res;	/* current reservation */
> +	int			t_unit_res;	/* unit reservation */
> +	char			t_ocnt;		/* original count */
> +	char			t_cnt;		/* current count */
> +	char			t_flags;	/* properties of reservation */
> +	int			t_iclog_hdrs;	/* iclog hdrs in t_curr_res */
>  } xlog_ticket_t;
>  
>  /*
> @@ -249,6 +250,7 @@ struct xfs_cil_ctx {
>  struct xfs_cil {
>  	struct xlog		*xc_log;
>  	unsigned long		xc_flags;
> +	atomic_t		xc_iclog_hdrs;
>  	struct list_head	xc_cil;
>  	spinlock_t		xc_cil_lock;
>  
> -- 
> 2.28.0
>

Dave Chinner March 11, 2021, 6:03 a.m. UTC | #2

On Wed, Mar 10, 2021 at 04:03:38PM -0800, Darrick J. Wong wrote:
> On Fri, Mar 05, 2021 at 04:11:32PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > For every iclog that a CIL push will use up, we need to ensure we
> > have space reserved for the iclog header in each iclog. It is
> > extremely difficult to do this accurately with a per-cpu counter
> > without expensive summing of the counter in every commit. However,
> > we know what the maximum CIL size is going to be because of the
> > hard space limit we have, and hence we know exactly how many iclogs
> > we are going to need to write out the CIL.
> > 
> > We are constrained by the requirement that small transactions only
> > have reservation space for a single iclog header built into them.
> > At commit time we don't know how much of the current transaction
> > reservation is made up of iclog header reservations as calculated by
> > xfs_log_calc_unit_res() when the ticket was reserved. As larger
> > reservations have multiple header spaces reserved, we can steal
> > more than one iclog header reservation at a time, but we only steal
> > the exact number needed for the given log vector size delta.
> > 
> > As a result, we don't know exactly when we are going to steal iclog
> > header reservations, nor do we know exactly how many we are going to
> > need for a given CIL.
> > 
> > To make things simple, start by calculating the worst case number of
> > iclog headers a full CIL push will require. Record this into an
> > atomic variable in the CIL. Then add a byte counter to the log
> > ticket that records exactly how much iclog header space has been
> > reserved in this ticket by xfs_log_calc_unit_res(). This tells us
> > exactly how much space we can steal from the ticket at transaction
> > commit time.
> > 
> > Now, at transaction commit time, we can check if the CIL has a full
> > iclog header reservation and, if not, steal the entire reservation
> > the current ticket holds for iclog headers. This minimises the
> > number of times we need to do atomic operations in the fast path,
> > but still guarantees we get all the reservations we need.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_log_rlimit.c |  2 +-
> >  fs/xfs/libxfs/xfs_shared.h     |  3 +-
> >  fs/xfs/xfs_log.c               | 12 +++++---
> >  fs/xfs/xfs_log_cil.c           | 55 ++++++++++++++++++++++++++--------
> >  fs/xfs/xfs_log_priv.h          | 20 +++++++------
> >  5 files changed, 64 insertions(+), 28 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c
> > index 7f55eb3f3653..75390134346d 100644
> > --- a/fs/xfs/libxfs/xfs_log_rlimit.c
> > +++ b/fs/xfs/libxfs/xfs_log_rlimit.c
> > @@ -88,7 +88,7 @@ xfs_log_calc_minimum_size(
> >  
> >  	xfs_log_get_max_trans_res(mp, &tres);
> >  
> > -	max_logres = xfs_log_calc_unit_res(mp, tres.tr_logres);
> > +	max_logres = xfs_log_calc_unit_res(mp, tres.tr_logres, NULL);
> 
> This is currently the only call site of xfs_log_calc_unit_res, so if a
> subsequent patch doesn't make use of that last argument it should go
> away.  (I don't know yet, I haven't looked...)

Can't remember, I'll have to check.

> > @@ -3418,7 +3422,7 @@ xlog_ticket_alloc(
> >  
> >  	tic = kmem_cache_zalloc(xfs_log_ticket_zone, GFP_NOFS | __GFP_NOFAIL);
> >  
> > -	unit_res = xlog_calc_unit_res(log, unit_bytes);
> > +	unit_res = xlog_calc_unit_res(log, unit_bytes, &tic->t_iclog_hdrs);
> 
> Ok, so each transaction ticket now gets to know the maximum number of
> iclog headers that the transaction can consume if we use every last byte
> of the reservation...

yes.

> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 50101336a7f4..f8fb2f59e24c 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -44,9 +44,20 @@ xlog_cil_ticket_alloc(
> >  	 * transaction overhead reservation from the first transaction commit.
> >  	 */
> >  	tic->t_curr_res = 0;
> > +	tic->t_iclog_hdrs = 0;
> >  	return tic;
> >  }
> >  
> > +static inline void
> > +xlog_cil_set_iclog_hdr_count(struct xfs_cil *cil)
> > +{
> > +	struct xlog	*log = cil->xc_log;
> > +
> > +	atomic_set(&cil->xc_iclog_hdrs,
> > +		   (XLOG_CIL_BLOCKING_SPACE_LIMIT(log) /
> > +			(log->l_iclog_size - log->l_iclog_hsize)));
> > +}
> > +
> >  /*
> >   * Unavoidable forward declaration - xlog_cil_push_work() calls
> >   * xlog_cil_ctx_alloc() itself.
> > @@ -70,6 +81,7 @@ xlog_cil_ctx_switch(
> >  	struct xfs_cil		*cil,
> >  	struct xfs_cil_ctx	*ctx)
> >  {
> > +	xlog_cil_set_iclog_hdr_count(cil);
> 
> ...and I guess every time the CIL gets a fresh context, we also record
> the maximum number of iclog headers that we might be pushing to disk in
> one go?

Yes, that defines the maximum size of the iclog header reservation
the CIL checkpoint is going to need if it stays within the hard
limit.

> Which I guess happens if someone commits a lot of updates to a
> filesystem, a committing thread hits the throttle threshold, and now the
> CIL has to switch contexts and write the old context's transactions to
> disk?

Right - it reserves enough space for delays in context switches to
use all the overrun without having to do anything ... slow.

> > @@ -442,19 +454,36 @@ xlog_cil_insert_items(
> >  	    test_and_clear_bit(XLOG_CIL_EMPTY, &cil->xc_flags))
> >  		ctx_res = ctx->ticket->t_unit_res;
> >  
> > -	spin_lock(&cil->xc_cil_lock);
> > -
> > -	/* do we need space for more log record headers? */
> > -	iclog_space = log->l_iclog_size - log->l_iclog_hsize;
> > -	if (len > 0 && (ctx->space_used / iclog_space !=
> > -				(ctx->space_used + len) / iclog_space)) {
> > -		split_res = (len + iclog_space - 1) / iclog_space;
> > -		/* need to take into account split region headers, too */
> > -		split_res *= log->l_iclog_hsize + sizeof(struct xlog_op_header);
> > -		ctx->ticket->t_unit_res += split_res;
> > +	/*
> > +	 * Check if we need to steal iclog headers. atomic_read() is not a
> > +	 * locked atomic operation, so we can check the value before we do any
> > +	 * real atomic ops in the fast path. If we've already taken the CIL unit
> > +	 * reservation from this commit, we've already got one iclog header
> > +	 * space reserved so we have to account for that otherwise we risk
> > +	 * overrunning the reservation on this ticket.
> > +	 *
> > +	 * If the CIL is already at the hard limit, we might need more header
> > +	 * space than originally reserved. So steal more header space from every
> > +	 * commit that occurs once we are over the hard limit to ensure the CIL
> > +	 * push won't run out of reservation space.
> > +	 *
> > +	 * This can steal more than we need, but that's OK.
> > +	 */
> > +	if (atomic_read(&cil->xc_iclog_hdrs) > 0 ||
> 
> If we haven't stolen enough iclog header space...
> 
> > +	    ctx->space_used + len >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
> 
> ...or we've hit a throttling threshold, in which case we know we're
> going to push, so we might as well take everything and (I guess?) not
> give back any reservation that would encourage more commits before we're
> ready?

Partially. It is also a safety measure against the CIL space usage
dropping below and rising above the space limit multiple times. It
ensures that every transaction that commits while we are over the hard
limit contributes its iclog header reservation, so the CIL is
guaranteed to have enough headers reserved to write the checkpoint
when it goes over the hard limit.

> > +		int	split_res = log->l_iclog_hsize +
> > +					sizeof(struct xlog_op_header);
> > +		if (ctx_res)
> > +			ctx_res += split_res * (tp->t_ticket->t_iclog_hdrs - 1);
> > +		else
> > +			ctx_res = split_res * tp->t_ticket->t_iclog_hdrs;
> > +		atomic_sub(tp->t_ticket->t_iclog_hdrs, &cil->xc_iclog_hdrs);
> 
> What happens if xc_iclog_hdrs goes negative?  Does that merely mean that
> we stole more space from the transaction than we needed?  Or does it
> indicate that we're trying to cram too much into a single context?

Nothing happens. Yes, it merely means we stole more header space than
we needed. It indicates that we have commits throttling on the hard
limit.

> I suppose I worry about what might happen if each transaction's
> committed items actually somehow eats up every byte of reservation and
> that actually translates to t_iclog_hdrs iclogs being written out with a
> particular context, where sum(t_iclog_hdrs) is larger than what
> xlog_cil_set_iclog_hdr_count() precomputes?

If I understand what you are asking correctly, that should never
happen because the iclog header count should always span the maximum
number of iclogs that the change requires to write into the log. And the
CIL context also reserves enough headers to write the entire set of
CIL data to the iclogs, so again we should not ever get into an
overrun situation because we have maximally dirty transactions being
committed. If these sorts of overruns ever occur, we've got a unit
reservation calculation issue, not a CIL iclog header space
reservation stealing issue...

Cheers,

Dave.

diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c
index 7f55eb3f3653..75390134346d 100644
--- a/fs/xfs/libxfs/xfs_log_rlimit.c
+++ b/fs/xfs/libxfs/xfs_log_rlimit.c
@@ -88,7 +88,7 @@  xfs_log_calc_minimum_size(
 
 	xfs_log_get_max_trans_res(mp, &tres);
 
-	max_logres = xfs_log_calc_unit_res(mp, tres.tr_logres);
+	max_logres = xfs_log_calc_unit_res(mp, tres.tr_logres, NULL);
 	if (tres.tr_logcount > 1)
 		max_logres *= tres.tr_logcount;
 
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 8c61a461bf7b..b4791b817fe3 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -48,7 +48,8 @@  extern const struct xfs_buf_ops xfs_symlink_buf_ops;
 extern const struct xfs_buf_ops xfs_rtbuf_ops;
 
 /* log size calculation functions */
-int	xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes);
+int	xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes,
+				int *niclogs);
 int	xfs_log_calc_minimum_size(struct xfs_mount *);
 
 struct xfs_trans_res;
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 8f4f7ae84358..46a006d41184 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3312,7 +3312,8 @@  xfs_log_ticket_get(
 static int
 xlog_calc_unit_res(
 	struct xlog		*log,
-	int			unit_bytes)
+	int			unit_bytes,
+	int			*niclogs)
 {
 	int			iclog_space;
 	uint			num_headers;
@@ -3392,15 +3393,18 @@  xlog_calc_unit_res(
 	/* roundoff padding for transaction data and one for commit record */
 	unit_bytes += 2 * log->l_iclog_roundoff;
 
+	if (niclogs)
+		*niclogs = num_headers;
 	return unit_bytes;
 }
 
 int
 xfs_log_calc_unit_res(
 	struct xfs_mount	*mp,
-	int			unit_bytes)
+	int			unit_bytes,
+	int			*niclogs)
 {
-	return xlog_calc_unit_res(mp->m_log, unit_bytes);
+	return xlog_calc_unit_res(mp->m_log, unit_bytes, niclogs);
 }
 
 /*
@@ -3418,7 +3422,7 @@  xlog_ticket_alloc(
 
 	tic = kmem_cache_zalloc(xfs_log_ticket_zone, GFP_NOFS | __GFP_NOFAIL);
 
-	unit_res = xlog_calc_unit_res(log, unit_bytes);
+	unit_res = xlog_calc_unit_res(log, unit_bytes, &tic->t_iclog_hdrs);
 
 	atomic_set(&tic->t_ref, 1);
 	tic->t_task		= current;
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 50101336a7f4..f8fb2f59e24c 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -44,9 +44,20 @@  xlog_cil_ticket_alloc(
 	 * transaction overhead reservation from the first transaction commit.
 	 */
 	tic->t_curr_res = 0;
+	tic->t_iclog_hdrs = 0;
 	return tic;
 }
 
+static inline void
+xlog_cil_set_iclog_hdr_count(struct xfs_cil *cil)
+{
+	struct xlog	*log = cil->xc_log;
+
+	atomic_set(&cil->xc_iclog_hdrs,
+		   (XLOG_CIL_BLOCKING_SPACE_LIMIT(log) /
+			(log->l_iclog_size - log->l_iclog_hsize)));
+}
+
 /*
  * Unavoidable forward declaration - xlog_cil_push_work() calls
  * xlog_cil_ctx_alloc() itself.
@@ -70,6 +81,7 @@  xlog_cil_ctx_switch(
 	struct xfs_cil		*cil,
 	struct xfs_cil_ctx	*ctx)
 {
+	xlog_cil_set_iclog_hdr_count(cil);
 	set_bit(XLOG_CIL_EMPTY, &cil->xc_flags);
 	ctx->sequence = ++cil->xc_current_sequence;
 	ctx->cil = cil;
@@ -92,6 +104,7 @@  xlog_cil_init_post_recovery(
 {
 	log->l_cilp->xc_ctx->ticket = xlog_cil_ticket_alloc(log);
 	log->l_cilp->xc_ctx->sequence = 1;
+	xlog_cil_set_iclog_hdr_count(log->l_cilp);
 }
 
 static inline int
@@ -419,7 +432,6 @@  xlog_cil_insert_items(
 	struct xfs_cil_ctx	*ctx = cil->xc_ctx;
 	struct xfs_log_item	*lip;
 	int			len = 0;
-	int			iclog_space;
 	int			iovhdr_res = 0, split_res = 0, ctx_res = 0;
 
 	ASSERT(tp);
@@ -442,19 +454,36 @@  xlog_cil_insert_items(
 	    test_and_clear_bit(XLOG_CIL_EMPTY, &cil->xc_flags))
 		ctx_res = ctx->ticket->t_unit_res;
 
-	spin_lock(&cil->xc_cil_lock);
-
-	/* do we need space for more log record headers? */
-	iclog_space = log->l_iclog_size - log->l_iclog_hsize;
-	if (len > 0 && (ctx->space_used / iclog_space !=
-				(ctx->space_used + len) / iclog_space)) {
-		split_res = (len + iclog_space - 1) / iclog_space;
-		/* need to take into account split region headers, too */
-		split_res *= log->l_iclog_hsize + sizeof(struct xlog_op_header);
-		ctx->ticket->t_unit_res += split_res;
+	/*
+	 * Check if we need to steal iclog headers. atomic_read() is not a
+	 * locked atomic operation, so we can check the value before we do any
+	 * real atomic ops in the fast path. If we've already taken the CIL unit
+	 * reservation from this commit, we've already got one iclog header
+	 * space reserved so we have to account for that otherwise we risk
+	 * overrunning the reservation on this ticket.
+	 *
+	 * If the CIL is already at the hard limit, we might need more header
+	 * space than originally reserved. So steal more header space from every
+	 * commit that occurs once we are over the hard limit to ensure the CIL
+	 * push won't run out of reservation space.
+	 *
+	 * This can steal more than we need, but that's OK.
+	 */
+	if (atomic_read(&cil->xc_iclog_hdrs) > 0 ||
+	    ctx->space_used + len >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
+		int	split_res = log->l_iclog_hsize +
+					sizeof(struct xlog_op_header);
+		if (ctx_res)
+			ctx_res += split_res * (tp->t_ticket->t_iclog_hdrs - 1);
+		else
+			ctx_res = split_res * tp->t_ticket->t_iclog_hdrs;
+		atomic_sub(tp->t_ticket->t_iclog_hdrs, &cil->xc_iclog_hdrs);
 	}
-	tp->t_ticket->t_curr_res -= split_res + ctx_res + len;
-	ctx->ticket->t_curr_res += split_res + ctx_res;
+
+	spin_lock(&cil->xc_cil_lock);
+	tp->t_ticket->t_curr_res -= ctx_res + len;
+	ctx->ticket->t_unit_res += ctx_res;
+	ctx->ticket->t_curr_res += ctx_res;
 	ctx->space_used += len;
 
 	/*
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index b0dc3bc9de59..e72d14c76e03 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -140,15 +140,16 @@  enum xlog_iclog_state {
 #define XLOG_TIC_LEN_MAX	15
 
 typedef struct xlog_ticket {
-	struct list_head   t_queue;	 /* reserve/write queue */
-	struct task_struct *t_task;	 /* task that owns this ticket */
-	xlog_tid_t	   t_tid;	 /* transaction identifier	 : 4  */
-	atomic_t	   t_ref;	 /* ticket reference count       : 4  */
-	int		   t_curr_res;	 /* current reservation in bytes : 4  */
-	int		   t_unit_res;	 /* unit reservation in bytes    : 4  */
-	char		   t_ocnt;	 /* original count		 : 1  */
-	char		   t_cnt;	 /* current count		 : 1  */
-	char		   t_flags;	 /* properties of reservation	 : 1  */
+	struct list_head	t_queue;	/* reserve/write queue */
+	struct task_struct	*t_task;	/* task that owns this ticket */
+	xlog_tid_t		t_tid;		/* transaction identifier */
+	atomic_t		t_ref;		/* ticket reference count */
+	int			t_curr_res;	/* current reservation */
+	int			t_unit_res;	/* unit reservation */
+	char			t_ocnt;		/* original count */
+	char			t_cnt;		/* current count */
+	char			t_flags;	/* properties of reservation */
+	int			t_iclog_hdrs;	/* iclog hdrs in t_curr_res */
 } xlog_ticket_t;
 
 /*
@@ -249,6 +250,7 @@  struct xfs_cil_ctx {
 struct xfs_cil {
 	struct xlog		*xc_log;
 	unsigned long		xc_flags;
+	atomic_t		xc_iclog_hdrs;
 	struct list_head	xc_cil;
 	spinlock_t		xc_cil_lock;
