
[031/119] xfs: rmap btree requires more reserved free space

Message ID 146612647122.12839.13018596528662402580.stgit@birch.djwong.org (mailing list archive)
State New, archived

Commit Message

Darrick J. Wong June 17, 2016, 1:21 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

The rmap btree is allocated from the AGFL, which means we have to
ensure ENOSPC is reported to userspace before we run out of free
space in each AG. The last allocation in an AG can cause a full
height rmap btree split, and that means we have to reserve at least
this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
Update the various space calculation functions to handle this.

Also, because the macros are now executing conditional code and are called
quite frequently, convert them to functions that initialise variables in the
struct xfs_mount, use the new variables everywhere and document the
calculations better.

v2: If rmapbt is disabled, it is incorrect to require 1 extra AGFL block
for the rmapbt (due to the + 1); the entire clause needs to be gated
on the feature flag.

v3: Use m_rmap_maxlevels to determine min_free.

[darrick.wong@oracle.com: don't reserve blocks if !rmap]
[dchinner@redhat.com: update m_ag_max_usable after growfs]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/libxfs/xfs_alloc.c |   71 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_alloc.h |   41 +++-----------------------
 fs/xfs/libxfs/xfs_bmap.c  |    2 +
 fs/xfs/libxfs/xfs_sb.c    |    2 +
 fs/xfs/xfs_discard.c      |    2 +
 fs/xfs/xfs_fsops.c        |    5 ++-
 fs/xfs/xfs_log_recover.c  |    1 +
 fs/xfs/xfs_mount.c        |    2 +
 fs/xfs/xfs_mount.h        |    2 +
 fs/xfs/xfs_super.c        |    2 +
 10 files changed, 88 insertions(+), 42 deletions(-)
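
As a rough worked example of the set-aside arithmetic described above, with
made-up geometry (4 AGs, m_rmap_maxlevels = 9; illustrative values only, not
taken from the patch), the v3 formula works out to:

#include <stdio.h>

/*
 * Sketch of the v3 set-aside calculation with made-up example values;
 * this is not code from the patch itself.
 */
int main(void)
{
	unsigned int agcount = 4;
	unsigned int rmap_maxlevels = 9;

	/* without rmapbt: 4 global blocks plus 4 AGFL blocks per AG */
	unsigned int no_rmap = 4 + agcount * 4;

	/* with rmapbt, v3 adds agcount * (2 * maxlevels) - 1 on top */
	unsigned int with_rmap = no_rmap + (agcount * (2 * rmap_maxlevels) - 1);

	printf("set-aside without rmapbt: %u blocks\n", no_rmap);	/* 20 */
	printf("set-aside with rmapbt:    %u blocks\n", with_rmap);	/* 91 */
	return 0;
}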




Comments

Brian Foster July 8, 2016, 1:21 p.m. UTC | #1
On Thu, Jun 16, 2016 at 06:21:11PM -0700, Darrick J. Wong wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The rmap btree is allocated from the AGFL, which means we have to
> ensure ENOSPC is reported to userspace before we run out of free
> space in each AG. The last allocation in an AG can cause a full
> height rmap btree split, and that means we have to reserve at least
> this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
> Update the various space calculation functiosn to handle this.

				       functions

> 
> Also, because the macros are now executing conditional code and are called quite
> frequently, convert them to functions that initialise varaibles in the struct
> xfs_mount, use the new variables everywhere and document the calculations
> better.
> 
> v2: If rmapbt is disabled, it is incorrect to require 1 extra AGFL block
> for the rmapbt (due to the + 1); the entire clause needs to be gated
> on the feature flag.
> 
> v3: Use m_rmap_maxlevels to determine min_free.
> 
> [darrick.wong@oracle.com: don't reserve blocks if !rmap]
> [dchinner@redhat.com: update m_ag_max_usable after growfs]
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c |   71 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_alloc.h |   41 +++-----------------------
>  fs/xfs/libxfs/xfs_bmap.c  |    2 +
>  fs/xfs/libxfs/xfs_sb.c    |    2 +
>  fs/xfs/xfs_discard.c      |    2 +
>  fs/xfs/xfs_fsops.c        |    5 ++-
>  fs/xfs/xfs_log_recover.c  |    1 +
>  fs/xfs/xfs_mount.c        |    2 +
>  fs/xfs/xfs_mount.h        |    2 +
>  fs/xfs/xfs_super.c        |    2 +
>  10 files changed, 88 insertions(+), 42 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 570ca17..4c8ffd4 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -63,6 +63,72 @@ xfs_prealloc_blocks(
>  }
>  
>  /*
> + * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
> + * AGF buffer (PV 947395), we place constraints on the relationship among
> + * actual allocations for data blocks, freelist blocks, and potential file data
> + * bmap btree blocks. However, these restrictions may result in no actual space
> + * allocated for a delayed extent, for example, a data block in a certain AG is
> + * allocated but there is no additional block for the additional bmap btree
> + * block due to a split of the bmap btree of the file. The result of this may
> + * lead to an infinite loop when the file gets flushed to disk and all delayed
> + * extents need to be actually allocated. To get around this, we explicitly set
> + * aside a few blocks which will not be reserved in delayed allocation.
> + *
> + * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
> + * not using rmap btrees a potential split of file's bmap btree requires 1 fsb,
> + * so we set the number of set-aside blocks to 4 + 4*agcount when not using
> + * rmap btrees.
> + *

That's a bit wordy.

> + * When rmap btrees are active, we have to consider that using the last block
> + * in the AG can cause a full height rmap btree split and we need enough blocks
> + * on the AGFL to be able to handle this. That means we have, in addition to
> + * the above consideration, another (2 * mp->m_rmap_levels) - 1 blocks required
> + * to be available to the free list.

I'm probably missing something, but why does a full tree split require 2
blocks per-level (minus 1)? Wouldn't that involve an allocated block per
level (and possibly a new root block)?

Otherwise, the rest looks good to me.

Brian

Darrick J. Wong July 13, 2016, 4:50 p.m. UTC | #2
On Fri, Jul 08, 2016 at 09:21:55AM -0400, Brian Foster wrote:
> On Thu, Jun 16, 2016 at 06:21:11PM -0700, Darrick J. Wong wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > The rmap btree is allocated from the AGFL, which means we have to
> > ensure ENOSPC is reported to userspace before we run out of free
> > space in each AG. The last allocation in an AG can cause a full
> > height rmap btree split, and that means we have to reserve at least
> > this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
> > Update the various space calculation functiosn to handle this.
> 
> 				       functions
> 
> > 
> > Also, because the macros are now executing conditional code and are called quite
> > frequently, convert them to functions that initialise varaibles in the struct
> > xfs_mount, use the new variables everywhere and document the calculations
> > better.
> > 
> > v2: If rmapbt is disabled, it is incorrect to require 1 extra AGFL block
> > for the rmapbt (due to the + 1); the entire clause needs to be gated
> > on the feature flag.
> > 
> > v3: Use m_rmap_maxlevels to determine min_free.
> > 
> > [darrick.wong@oracle.com: don't reserve blocks if !rmap]
> > [dchinner@redhat.com: update m_ag_max_usable after growfs]
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Dave Chinner <david@fromorbit.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c |   71 +++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_alloc.h |   41 +++-----------------------
> >  fs/xfs/libxfs/xfs_bmap.c  |    2 +
> >  fs/xfs/libxfs/xfs_sb.c    |    2 +
> >  fs/xfs/xfs_discard.c      |    2 +
> >  fs/xfs/xfs_fsops.c        |    5 ++-
> >  fs/xfs/xfs_log_recover.c  |    1 +
> >  fs/xfs/xfs_mount.c        |    2 +
> >  fs/xfs/xfs_mount.h        |    2 +
> >  fs/xfs/xfs_super.c        |    2 +
> >  10 files changed, 88 insertions(+), 42 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 570ca17..4c8ffd4 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -63,6 +63,72 @@ xfs_prealloc_blocks(
> >  }
> >  
> >  /*
> > + * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
> > + * AGF buffer (PV 947395), we place constraints on the relationship among
> > + * actual allocations for data blocks, freelist blocks, and potential file data
> > + * bmap btree blocks. However, these restrictions may result in no actual space
> > + * allocated for a delayed extent, for example, a data block in a certain AG is
> > + * allocated but there is no additional block for the additional bmap btree
> > + * block due to a split of the bmap btree of the file. The result of this may
> > + * lead to an infinite loop when the file gets flushed to disk and all delayed
> > + * extents need to be actually allocated. To get around this, we explicitly set
> > + * aside a few blocks which will not be reserved in delayed allocation.
> > + *
> > + * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
> > + * not using rmap btrees a potential split of file's bmap btree requires 1 fsb,
> > + * so we set the number of set-aside blocks to 4 + 4*agcount when not using
> > + * rmap btrees.
> > + *
> 
> That's a bit wordy.

Yikes, that whole thing is a single sentence!

One thing I'm not really sure about is how "a potential split of file's bmap
btree requires 1 fsb" seems to translate to 4 in the actual formula.  I'd
have thought it would be m_bm_maxlevels or something... not just 4.

/* 
 * When rmap is disabled, we need to reserve 4 fsbs _per AG_ for the freelist
 * and 4 more to handle a potential split of the file's bmap btree.
 *
 * When rmap is enabled, we must also be able to handle two rmap btree inserts
 * to record both the file data extent and a new bmbt block.  The bmbt block
 * might not be in the same AG as the file data extent.  In the worst case
 * the bmap btree splits multiple levels and all the new blocks come from
 * different AGs, so set aside enough to handle rmap btree splits in all AGs.
 */

> > + * When rmap btrees are active, we have to consider that using the last block
> > + * in the AG can cause a full height rmap btree split and we need enough blocks
> > + * on the AGFL to be able to handle this. That means we have, in addition to
> > + * the above consideration, another (2 * mp->m_rmap_levels) - 1 blocks required
> > + * to be available to the free list.
> 
> I'm probably missing something, but why does a full tree split require 2
> blocks per-level (minus 1)? Wouldn't that involve an allocated block per
> level (and possibly a new root block)?

The whole rmap clause is wrong. :(

I think we'll be fine with agcount * m_rmap_maxlevels.
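
For a rough sense of the difference with made-up geometry (say agcount = 4 and
m_rmap_maxlevels = 9; neither value comes from a real filesystem):

#include <stdio.h>

/* Illustration only: compares the v3 expression with the simpler bound. */
int main(void)
{
	unsigned int agcount = 4, rmap_maxlevels = 9;

	/* extra rmapbt set-aside as posted in v3 */
	unsigned int as_posted = agcount * (2 * rmap_maxlevels) - 1;	/* 71 */

	/* extra rmapbt set-aside with the agcount * maxlevels bound */
	unsigned int proposed = agcount * rmap_maxlevels;		/* 36 */

	printf("as posted: %u, proposed: %u extra blocks\n",
	       as_posted, proposed);
	return 0;
}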

> Otherwise, the rest looks good to me.

Cool.

<keep going downwards>

> Brian
> 
> > + */
> > +unsigned int
> > +xfs_alloc_set_aside(
> > +	struct xfs_mount *mp)
> > +{
> > +	unsigned int	blocks;
> > +
> > +	blocks = 4 + (mp->m_sb.sb_agcount * XFS_ALLOC_AGFL_RESERVE);
> > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > +		return blocks;
> > +	return blocks + (mp->m_sb.sb_agcount * (2 * mp->m_rmap_maxlevels) - 1);
> > +}
> > +
> > +/*
> > + * When deciding how much space to allocate out of an AG, we limit the
> > + * allocation maximum size to the size the AG. However, we cannot use all the
> > + * blocks in the AG - some are permanently used by metadata. These
> > + * blocks are generally:
> > + *	- the AG superblock, AGF, AGI and AGFL
> > + *	- the AGF (bno and cnt) and AGI btree root blocks, and optionally
> > + *	  the AGI free inode and rmap btree root blocks.
> > + *	- blocks on the AGFL according to xfs_alloc_set_aside() limits
> > + *
> > + * The AG headers are sector sized, so the amount of space they take up is
> > + * dependent on filesystem geometry. The others are all single blocks.
> > + */
> > +unsigned int
> > +xfs_alloc_ag_max_usable(struct xfs_mount *mp)
> > +{
> > +	unsigned int	blocks;
> > +
> > +	blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */
> > +	blocks += XFS_ALLOC_AGFL_RESERVE;
> > +	blocks += 3;			/* AGF, AGI btree root blocks */
> > +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > +		blocks++;		/* finobt root block */
> > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > +		/* rmap root block + full tree split on full AG */
> > +		blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;

I think this could be blocks++ since we now have AG reservations.

--D

Brian Foster July 13, 2016, 6:32 p.m. UTC | #3
On Wed, Jul 13, 2016 at 09:50:08AM -0700, Darrick J. Wong wrote:
> On Fri, Jul 08, 2016 at 09:21:55AM -0400, Brian Foster wrote:
> > On Thu, Jun 16, 2016 at 06:21:11PM -0700, Darrick J. Wong wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > The rmap btree is allocated from the AGFL, which means we have to
> > > ensure ENOSPC is reported to userspace before we run out of free
> > > space in each AG. The last allocation in an AG can cause a full
> > > height rmap btree split, and that means we have to reserve at least
> > > this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
> > > Update the various space calculation functiosn to handle this.
> > 
> > 				       functions
> > 
> > > 
> > > Also, because the macros are now executing conditional code and are called quite
> > > frequently, convert them to functions that initialise varaibles in the struct
> > > xfs_mount, use the new variables everywhere and document the calculations
> > > better.
> > > 
> > > v2: If rmapbt is disabled, it is incorrect to require 1 extra AGFL block
> > > for the rmapbt (due to the + 1); the entire clause needs to be gated
> > > on the feature flag.
> > > 
> > > v3: Use m_rmap_maxlevels to determine min_free.
> > > 
> > > [darrick.wong@oracle.com: don't reserve blocks if !rmap]
> > > [dchinner@redhat.com: update m_ag_max_usable after growfs]
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > > Signed-off-by: Dave Chinner <david@fromorbit.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_alloc.c |   71 +++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/libxfs/xfs_alloc.h |   41 +++-----------------------
> > >  fs/xfs/libxfs/xfs_bmap.c  |    2 +
> > >  fs/xfs/libxfs/xfs_sb.c    |    2 +
> > >  fs/xfs/xfs_discard.c      |    2 +
> > >  fs/xfs/xfs_fsops.c        |    5 ++-
> > >  fs/xfs/xfs_log_recover.c  |    1 +
> > >  fs/xfs/xfs_mount.c        |    2 +
> > >  fs/xfs/xfs_mount.h        |    2 +
> > >  fs/xfs/xfs_super.c        |    2 +
> > >  10 files changed, 88 insertions(+), 42 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index 570ca17..4c8ffd4 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > @@ -63,6 +63,72 @@ xfs_prealloc_blocks(
> > >  }
> > >  
> > >  /*
> > > + * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
> > > + * AGF buffer (PV 947395), we place constraints on the relationship among
> > > + * actual allocations for data blocks, freelist blocks, and potential file data
> > > + * bmap btree blocks. However, these restrictions may result in no actual space
> > > + * allocated for a delayed extent, for example, a data block in a certain AG is
> > > + * allocated but there is no additional block for the additional bmap btree
> > > + * block due to a split of the bmap btree of the file. The result of this may
> > > + * lead to an infinite loop when the file gets flushed to disk and all delayed
> > > + * extents need to be actually allocated. To get around this, we explicitly set
> > > + * aside a few blocks which will not be reserved in delayed allocation.
> > > + *
> > > + * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
> > > + * not using rmap btrees a potential split of file's bmap btree requires 1 fsb,
> > > + * so we set the number of set-aside blocks to 4 + 4*agcount when not using
> > > + * rmap btrees.
> > > + *
> > 
> > That's a bit wordy.
> 
> Yikes, that whole thing is a single sentence!
> 
> One thing I'm not really sure about is how "a potential split of file's bmap
> btree requires 1 fsb" seems to translate to 4 in the actual formula.  I'd
> have thought it would be m_bm_maxlevels or something... not just 4.
> 

I'm not sure about that either, tbh.

> /* 
>  * When rmap is disabled, we need to reserve 4 fsbs _per AG_ for the freelist
>  * and 4 more to handle a potential split of the file's bmap btree.
>  *
>  * When rmap is enabled, we must also be able to handle two rmap btree inserts
>  * to record both the file data extent and a new bmbt block.  The bmbt block
>  * might not be in the same AG as the file data extent.  In the worst case
>  * the bmap btree splits multiple levels and all the new blocks come from
>  * different AGs, so set aside enough to handle rmap btree splits in all AGs.
>  */
> 

That sounds much better.

> > > + * When rmap btrees are active, we have to consider that using the last block
> > > + * in the AG can cause a full height rmap btree split and we need enough blocks
> > > + * on the AGFL to be able to handle this. That means we have, in addition to
> > > + * the above consideration, another (2 * mp->m_rmap_levels) - 1 blocks required
> > > + * to be available to the free list.
> > 
> > I'm probably missing something, but why does a full tree split require 2
> > blocks per-level (minus 1)? Wouldn't that involve an allocated block per
> > level (and possibly a new root block)?
> 
> The whole rmap clause is wrong. :(
> 
> I think we'll be fine with agcount * m_rmap_maxlevels.
> 

Ok, that certainly makes more sense.

> > Otherwise, the rest looks good to me.
> 
> Cool.
> 
> <keep going downwards>
> 
> > Brian
> > 
> > > + */
> > > +unsigned int
> > > +xfs_alloc_set_aside(
> > > +	struct xfs_mount *mp)
> > > +{
> > > +	unsigned int	blocks;
> > > +
> > > +	blocks = 4 + (mp->m_sb.sb_agcount * XFS_ALLOC_AGFL_RESERVE);
> > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > +		return blocks;
> > > +	return blocks + (mp->m_sb.sb_agcount * (2 * mp->m_rmap_maxlevels) - 1);
> > > +}
> > > +
> > > +/*
> > > + * When deciding how much space to allocate out of an AG, we limit the
> > > + * allocation maximum size to the size the AG. However, we cannot use all the
> > > + * blocks in the AG - some are permanently used by metadata. These
> > > + * blocks are generally:
> > > + *	- the AG superblock, AGF, AGI and AGFL
> > > + *	- the AGF (bno and cnt) and AGI btree root blocks, and optionally
> > > + *	  the AGI free inode and rmap btree root blocks.
> > > + *	- blocks on the AGFL according to xfs_alloc_set_aside() limits
> > > + *
> > > + * The AG headers are sector sized, so the amount of space they take up is
> > > + * dependent on filesystem geometry. The others are all single blocks.
> > > + */
> > > +unsigned int
> > > +xfs_alloc_ag_max_usable(struct xfs_mount *mp)
> > > +{
> > > +	unsigned int	blocks;
> > > +
> > > +	blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */
> > > +	blocks += XFS_ALLOC_AGFL_RESERVE;
> > > +	blocks += 3;			/* AGF, AGI btree root blocks */
> > > +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > > +		blocks++;		/* finobt root block */
> > > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > > +		/* rmap root block + full tree split on full AG */
> > > +		blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
> 
> I think this could be blocks++ since we now have AG reservations.
> 

Sounds good.

Brian

Dave Chinner July 13, 2016, 11:50 p.m. UTC | #4
On Wed, Jul 13, 2016 at 02:32:17PM -0400, Brian Foster wrote:
> On Wed, Jul 13, 2016 at 09:50:08AM -0700, Darrick J. Wong wrote:
> > On Fri, Jul 08, 2016 at 09:21:55AM -0400, Brian Foster wrote:
> > > >  /*
> > > > + * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
> > > > + * AGF buffer (PV 947395), we place constraints on the relationship among
> > > > + * actual allocations for data blocks, freelist blocks, and potential file data
> > > > + * bmap btree blocks. However, these restrictions may result in no actual space
> > > > + * allocated for a delayed extent, for example, a data block in a certain AG is
> > > > + * allocated but there is no additional block for the additional bmap btree
> > > > + * block due to a split of the bmap btree of the file. The result of this may
> > > > + * lead to an infinite loop when the file gets flushed to disk and all delayed
> > > > + * extents need to be actually allocated. To get around this, we explicitly set
> > > > + * aside a few blocks which will not be reserved in delayed allocation.
> > > > + *
> > > > + * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
> > > > + * not using rmap btrees a potential split of file's bmap btree requires 1 fsb,
> > > > + * so we set the number of set-aside blocks to 4 + 4*agcount when not using
> > > > + * rmap btrees.
> > > > + *
> > > 
> > > That's a bit wordy.
> > 
> > Yikes, that whole thing is a single sentence!
> > 
> > One thing I'm not really sure about is how "a potential split of file's bmap
> > btree requires 1 fsb" seems to translate to 4 in the actual formula.  I'd
> > have thought it would be m_bm_maxlevels or something... not just 4.
> > 
> 
> I'm not sure about that either, tbh.

So, a trip down memory lane. 

<wavy line dissolve>

Back in 2006, I fixed a bug that changed XFS_ALLOC_SET_ASIDE from
a fixed value of 8 blocks to 4 blocks + 4 AGFL blocks per AG in
commit 4be536de ("[XFS] Prevent free space
oversubscription and xfssyncd looping."). The original value of
8 was for 4 blocks for the bmbt split, and 4 blocks from the current
AG for the AGFL (commit message explains the reason this was a
problem (Yay for writing good commit messages 10 years ago!)). The
original comment text was:

- * reserved in delayed allocation. Considering the minimum number of
- * needed freelist blocks is 4 fsbs, a potential split of file's bmap
- * btree requires 1 fsb, so we set the number of set-aside blocks to 8.
-*/

So we need to go back further. We have an obvious git log search
target in the comment (PV#947395), and that points to:

commit d210a28cd851082cec9b282443f8cc0e6fc09830
Author: Yingping Lu <yingping@sgi.com>
Date:   Fri Jun 9 14:55:18 2006 +1000

    [XFS] In actual allocation of file system blocks and freeing extents, the
    transaction within each such operation may involve multiple locking of AGF
    buffer. While the freeing extent function has sorted the extents based on
    AGF number before entering into transaction, however, when the file system
    space is very limited, the allocation of space would try every AGF to get
    space allocated, this could potentially cause out-of-order locking, thus
    deadlock could happen. This fix mitigates the scarce space for allocation
    by setting aside a few blocks without reservation, and avoid deadlock by
    maintaining ascending order of AGF locking.
    
    SGI-PV: 947395
    SGI-Modid: xfs-linux-melb:xfs-kern:210801a
    
    Signed-off-by: Yingping Lu <yingping@sgi.com>
    Signed-off-by: Nathan Scott <nathans@sgi.com>

Which tells us nothing about why 1 fsb for the bmbt split was
actually reserved as 4 fsbs. IIRC, I ended up having to find and fix
the problem because Yingping left SGI soon after this fix was made,
and at the time nobody understood or could work out why that was
done. It worked, however, so we left it that way, and just fixed the
per-ag reservation problem this bug fix had.


> > /* 
> >  * When rmap is disabled, we need to reserve 4 fsbs _per AG_ for the freelist
> >  * and 4 more to handle a potential split of the file's bmap btree.

As such, I'm not sure that is any more correct than the original
comment.

Looking back on this now with 10 years more time working on XFS, my
suspicion is that a single level bmap btree split requires 1
block to be allocated, but that allocation will call
xfs_alloc_fix_freelist() to refill the freelist to the minimum
(which is 4 blocks), and so we need at least 4 blocks for the
allocation to succeed (4 blocks for the freelist fill, and if we are
at ENOSPC then the bmap btree block will be allocated from the
AGFL).
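
If that reading is right, a sketch of the arithmetic (assumed values, not
verified against the allocator) would be:

#include <stdio.h>

int main(void)
{
	unsigned int agcount = 4;		/* made-up example geometry */
	unsigned int freelist_min_fill = 4;	/* assumed xfs_alloc_fix_freelist() minimum */
	unsigned int agfl_per_ag = 4;		/* AGFL blocks reserved per AG */

	/*
	 * The single bmbt split block comes off the refilled AGFL, so the
	 * global component only needs to cover the freelist fill itself.
	 */
	unsigned int set_aside = freelist_min_fill + agcount * agfl_per_ag;

	printf("set-aside for this geometry: %u blocks\n", set_aside);	/* 20 */
	return 0;
}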

Whether the value of 4 is correct or not for this purpose is just a
guess based on the per-ag AGFL requirements, so my only comment
right now is: it's worked for 10 years, so let's not change it until
there's at least some evidence that it is wrong.

Cheers,

Dave.

Patch

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 570ca17..4c8ffd4 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -63,6 +63,72 @@  xfs_prealloc_blocks(
 }
 
 /*
+ * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
+ * AGF buffer (PV 947395), we place constraints on the relationship among
+ * actual allocations for data blocks, freelist blocks, and potential file data
+ * bmap btree blocks. However, these restrictions may result in no actual space
+ * allocated for a delayed extent, for example, a data block in a certain AG is
+ * allocated but there is no additional block for the additional bmap btree
+ * block due to a split of the bmap btree of the file. The result of this may
+ * lead to an infinite loop when the file gets flushed to disk and all delayed
+ * extents need to be actually allocated. To get around this, we explicitly set
+ * aside a few blocks which will not be reserved in delayed allocation.
+ *
+ * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
+ * not using rmap btrees a potential split of file's bmap btree requires 1 fsb,
+ * so we set the number of set-aside blocks to 4 + 4*agcount when not using
+ * rmap btrees.
+ *
+ * When rmap btrees are active, we have to consider that using the last block
+ * in the AG can cause a full height rmap btree split and we need enough blocks
+ * on the AGFL to be able to handle this. That means we have, in addition to
+ * the above consideration, another (2 * mp->m_rmap_levels) - 1 blocks required
+ * to be available to the free list.
+ */
+unsigned int
+xfs_alloc_set_aside(
+	struct xfs_mount *mp)
+{
+	unsigned int	blocks;
+
+	blocks = 4 + (mp->m_sb.sb_agcount * XFS_ALLOC_AGFL_RESERVE);
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return blocks;
+	return blocks + (mp->m_sb.sb_agcount * (2 * mp->m_rmap_maxlevels) - 1);
+}
+
+/*
+ * When deciding how much space to allocate out of an AG, we limit the
+ * allocation maximum size to the size the AG. However, we cannot use all the
+ * blocks in the AG - some are permanently used by metadata. These
+ * blocks are generally:
+ *	- the AG superblock, AGF, AGI and AGFL
+ *	- the AGF (bno and cnt) and AGI btree root blocks, and optionally
+ *	  the AGI free inode and rmap btree root blocks.
+ *	- blocks on the AGFL according to xfs_alloc_set_aside() limits
+ *
+ * The AG headers are sector sized, so the amount of space they take up is
+ * dependent on filesystem geometry. The others are all single blocks.
+ */
+unsigned int
+xfs_alloc_ag_max_usable(struct xfs_mount *mp)
+{
+	unsigned int	blocks;
+
+	blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */
+	blocks += XFS_ALLOC_AGFL_RESERVE;
+	blocks += 3;			/* AGF, AGI btree root blocks */
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		blocks++;		/* finobt root block */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		/* rmap root block + full tree split on full AG */
+		blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
+	}
+
+	return mp->m_sb.sb_agblocks - blocks;
+}
+
+/*
  * Lookup the record equal to [bno, len] in the btree given by cur.
  */
 STATIC int				/* error */
@@ -1904,6 +1970,11 @@  xfs_alloc_min_freelist(
 	/* space needed by-size freespace btree */
 	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
 				       mp->m_ag_maxlevels);
+	/* space needed reverse mapping used space btree */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		min_free += min_t(unsigned int,
+				  pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
+				  mp->m_rmap_maxlevels);
 
 	return min_free;
 }
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 0721a48..7b6c66b 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -56,42 +56,6 @@  typedef unsigned int xfs_alloctype_t;
 #define	XFS_ALLOC_FLAG_FREEING	0x00000002  /* indicate caller is freeing extents*/
 
 /*
- * In order to avoid ENOSPC-related deadlock caused by
- * out-of-order locking of AGF buffer (PV 947395), we place
- * constraints on the relationship among actual allocations for
- * data blocks, freelist blocks, and potential file data bmap
- * btree blocks. However, these restrictions may result in no
- * actual space allocated for a delayed extent, for example, a data
- * block in a certain AG is allocated but there is no additional
- * block for the additional bmap btree block due to a split of the
- * bmap btree of the file. The result of this may lead to an
- * infinite loop in xfssyncd when the file gets flushed to disk and
- * all delayed extents need to be actually allocated. To get around
- * this, we explicitly set aside a few blocks which will not be
- * reserved in delayed allocation. Considering the minimum number of
- * needed freelist blocks is 4 fsbs _per AG_, a potential split of file's bmap
- * btree requires 1 fsb, so we set the number of set-aside blocks
- * to 4 + 4*agcount.
- */
-#define XFS_ALLOC_SET_ASIDE(mp)  (4 + ((mp)->m_sb.sb_agcount * 4))
-
-/*
- * When deciding how much space to allocate out of an AG, we limit the
- * allocation maximum size to the size the AG. However, we cannot use all the
- * blocks in the AG - some are permanently used by metadata. These
- * blocks are generally:
- *	- the AG superblock, AGF, AGI and AGFL
- *	- the AGF (bno and cnt) and AGI btree root blocks
- *	- 4 blocks on the AGFL according to XFS_ALLOC_SET_ASIDE() limits
- *
- * The AG headers are sector sized, so the amount of space they take up is
- * dependent on filesystem geometry. The others are all single blocks.
- */
-#define XFS_ALLOC_AG_MAX_USABLE(mp)	\
-	((mp)->m_sb.sb_agblocks - XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)) - 7)
-
-
-/*
  * Argument structure for xfs_alloc routines.
  * This is turned into a structure to avoid having 20 arguments passed
  * down several levels of the stack.
@@ -133,6 +97,11 @@  typedef struct xfs_alloc_arg {
 #define XFS_ALLOC_INITIAL_USER_DATA	(1 << 1)/* special case start of file */
 #define XFS_ALLOC_USERDATA_ZERO		(1 << 2)/* zero extent on allocation */
 
+/* freespace limit calculations */
+#define XFS_ALLOC_AGFL_RESERVE	4
+unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
+unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp);
+
 xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
 		struct xfs_perag *pag, xfs_extlen_t need);
 unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 2c28f2a..61c0231 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3672,7 +3672,7 @@  xfs_bmap_btalloc(
 	args.fsbno = ap->blkno;
 
 	/* Trim the allocation back to the maximum an AG can fit. */
-	args.maxlen = MIN(ap->length, XFS_ALLOC_AG_MAX_USABLE(mp));
+	args.maxlen = MIN(ap->length, mp->m_ag_max_usable);
 	args.firstblock = *ap->firstblock;
 	blen = 0;
 	if (nullfb) {
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index f86226b..59c9f59 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -749,6 +749,8 @@  xfs_sb_mount_common(
 		mp->m_ialloc_min_blks = sbp->sb_spino_align;
 	else
 		mp->m_ialloc_min_blks = mp->m_ialloc_blks;
+	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
+	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
 }
 
 /*
diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index 272c3f8..4ff499a 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -179,7 +179,7 @@  xfs_ioc_trim(
 	 * matter as trimming blocks is an advisory interface.
 	 */
 	if (range.start >= XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks) ||
-	    range.minlen > XFS_FSB_TO_B(mp, XFS_ALLOC_AG_MAX_USABLE(mp)) ||
+	    range.minlen > XFS_FSB_TO_B(mp, mp->m_ag_max_usable) ||
 	    range.len < mp->m_sb.sb_blocksize)
 		return -EINVAL;
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 8a85e49..3772f6c 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -583,6 +583,7 @@  xfs_growfs_data_private(
 	} else
 		mp->m_maxicount = 0;
 	xfs_set_low_space_thresholds(mp);
+	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
 
 	/* update secondary superblocks. */
 	for (agno = 1; agno < nagcount; agno++) {
@@ -720,7 +721,7 @@  xfs_fs_counts(
 	cnt->allocino = percpu_counter_read_positive(&mp->m_icount);
 	cnt->freeino = percpu_counter_read_positive(&mp->m_ifree);
 	cnt->freedata = percpu_counter_read_positive(&mp->m_fdblocks) -
-							XFS_ALLOC_SET_ASIDE(mp);
+						mp->m_alloc_set_aside;
 
 	spin_lock(&mp->m_sb_lock);
 	cnt->freertx = mp->m_sb.sb_frextents;
@@ -793,7 +794,7 @@  retry:
 		__int64_t	free;
 
 		free = percpu_counter_sum(&mp->m_fdblocks) -
-							XFS_ALLOC_SET_ASIDE(mp);
+						mp->m_alloc_set_aside;
 		if (!free)
 			goto out; /* ENOSPC and fdblks_delta = 0 */
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 0c41bd2..b33187b 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -5027,6 +5027,7 @@  xlog_do_recover(
 		xfs_warn(mp, "Failed post-recovery per-ag init: %d", error);
 		return error;
 	}
+	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
 
 	xlog_recover_check_summary(log);
 
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 8af1c88..879f3ef 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1219,7 +1219,7 @@  xfs_mod_fdblocks(
 		batch = XFS_FDBLOCKS_BATCH;
 
 	__percpu_counter_add(&mp->m_fdblocks, delta, batch);
-	if (__percpu_counter_compare(&mp->m_fdblocks, XFS_ALLOC_SET_ASIDE(mp),
+	if (__percpu_counter_compare(&mp->m_fdblocks, mp->m_alloc_set_aside,
 				     XFS_FDBLOCKS_BATCH) >= 0) {
 		/* we had space! */
 		return 0;
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 0ed0f29..b36676c 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -123,6 +123,8 @@  typedef struct xfs_mount {
 	uint			m_in_maxlevels;	/* max inobt btree levels. */
 	uint			m_rmap_maxlevels; /* max rmap btree levels */
 	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
+	uint			m_alloc_set_aside; /* space we can't use */
+	uint			m_ag_max_usable; /* max space per AG */
 	struct radix_tree_root	m_perag_tree;	/* per-ag accounting info */
 	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
 	struct mutex		m_growlock;	/* growfs mutex */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index bf63f6d..1575849 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1076,7 +1076,7 @@  xfs_fs_statfs(
 	statp->f_blocks = sbp->sb_dblocks - lsize;
 	spin_unlock(&mp->m_sb_lock);
 
-	statp->f_bfree = fdblocks - XFS_ALLOC_SET_ASIDE(mp);
+	statp->f_bfree = fdblocks - mp->m_alloc_set_aside;
 	statp->f_bavail = statp->f_bfree;
 
 	fakeinos = statp->f_bfree << sbp->sb_inopblog;
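
For a rough sense of the per-AG overhead that xfs_alloc_ag_max_usable() above
subtracts, here is an illustrative calculation with made-up geometry
(4096-byte blocks, 512-byte sectors, finobt and rmapbt enabled,
m_ag_maxlevels = 9); none of these numbers come from the patch:

#include <stdio.h>

/* Illustration only; mirrors the structure of the function above. */
int main(void)
{
	unsigned int sectsize = 512, blocksize = 4096;
	unsigned int ag_maxlevels = 9;		/* example value */

	/* four sector-sized AG headers, rounded up to whole fs blocks */
	unsigned int blocks = (4 * sectsize + blocksize - 1) / blocksize;

	blocks += 4;				/* XFS_ALLOC_AGFL_RESERVE */
	blocks += 3;				/* AGF (bno, cnt) and AGI btree roots */
	blocks += 1;				/* finobt root */
	blocks += 1 + (2 * ag_maxlevels) - 1;	/* rmap root + full tree split */

	printf("per-AG metadata overhead: %u blocks\n", blocks);	/* 27 */
	return 0;
}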