diff mbox series

[3/6] xfs: don't include bnobt blocks when reserving free block pool

Message ID 164779462392.550479.11627083041484347485.stgit@magnolia (mailing list archive)
State Accepted
Headers show
Series xfs: fix incorrect reserve pool calculations and reporting | expand

Commit Message

Darrick J. Wong March 20, 2022, 4:43 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

xfs_reserve_blocks controls the size of the user-visible free space
reserve pool.  Given the difference between the current and requested
pool sizes, it will try to reserve free space from fdblocks.  However,
the amount requested from fdblocks is also constrained by the amount of
space that we think xfs_mod_fdblocks will give us.  We'll keep trying to
reserve space so long as xfs_mod_fdblocks returns ENOSPC.

In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand
out the "free space" used by the free space btrees, because some portion
of the free space btrees hold in reserve space for future btree
expansion.  Unfortunately, xfs_reserve_blocks' estimation of the number
of blocks that it could request from xfs_mod_fdblocks was not updated to
include m_allocbt_blks, so if space is extremely low, the caller hangs.

Fix this by creating a function to estimate the number of blocks that
can be reserved from fdblocks, which needs to exclude the set-aside and
m_allocbt_blks.

Found by running xfs/306 (which formats a single-AG 20MB filesystem)
with an fstests configuration that specifies a 1k blocksize and a
specially crafted log size that will consume 7/8 of the space (17920
blocks, specifically) in that AG.

Cc: Brian Foster <bfoster@redhat.com>
Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsops.c |    2 +-
 fs/xfs/xfs_mount.c |    2 +-
 fs/xfs/xfs_mount.h |   15 +++++++++++++++
 3 files changed, 17 insertions(+), 2 deletions(-)

Comments

Brian Foster March 21, 2022, 3:22 p.m. UTC | #1
On Sun, Mar 20, 2022 at 09:43:43AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> xfs_reserve_blocks controls the size of the user-visible free space
> reserve pool.  Given the difference between the current and requested
> pool sizes, it will try to reserve free space from fdblocks.  However,
> the amount requested from fdblocks is also constrained by the amount of
> space that we think xfs_mod_fdblocks will give us.  We'll keep trying to
> reserve space so long as xfs_mod_fdblocks returns ENOSPC.
> 
> In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand
> out the "free space" used by the free space btrees, because some portion
> of the free space btrees hold in reserve space for future btree
> expansion.  Unfortunately, xfs_reserve_blocks' estimation of the number
> of blocks that it could request from xfs_mod_fdblocks was not updated to
> include m_allocbt_blks, so if space is extremely low, the caller hangs.
> 
> Fix this by creating a function to estimate the number of blocks that
> can be reserved from fdblocks, which needs to exclude the set-aside and
> m_allocbt_blks.
> 
> Found by running xfs/306 (which formats a single-AG 20MB filesystem)
> with an fstests configuration that specifies a 1k blocksize and a
> specially crafted log size that will consume 7/8 of the space (17920
> blocks, specifically) in that AG.
> 
> Cc: Brian Foster <bfoster@redhat.com>
> Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_fsops.c |    2 +-
>  fs/xfs/xfs_mount.c |    2 +-
>  fs/xfs/xfs_mount.h |   15 +++++++++++++++
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> 
...
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 00720a02e761..da1b7056e743 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -479,6 +479,21 @@ extern void	xfs_unmountfs(xfs_mount_t *);
>   */
>  #define XFS_FDBLOCKS_BATCH	1024
>  
> +/*
> + * Estimate the amount of free space that is not available to userspace and is
> + * not explicitly reserved from the incore fdblocks:
> + *
> + * - Space reserved to ensure that we can always split a bmap btree
> + * - Free space btree blocks that are not available for allocation due to
> + *   per-AG metadata reservations
> + */

What does this mean by "due to" perag res? That sounds like a separate
thing to me. Perhaps this could just say something like:

"Estimate the amount of accounted free space that is not available to
userspace. This includes the minimum number of blocks to support a bmbt
split (calculated at mount time) and the blocks currently in-use by the
allocation btrees."

Comment nit aside, this LGTM. Thanks for the rework..

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +static inline uint64_t
> +xfs_fdblocks_unavailable(
> +	struct xfs_mount	*mp)
> +{
> +	return mp->m_alloc_set_aside + atomic64_read(&mp->m_allocbt_blks);
> +}
> +
>  extern int	xfs_mod_fdblocks(struct xfs_mount *mp, int64_t delta,
>  				 bool reserved);
>  extern int	xfs_mod_frextents(struct xfs_mount *mp, int64_t delta);
>
Darrick J. Wong March 21, 2022, 8:42 p.m. UTC | #2
On Mon, Mar 21, 2022 at 11:22:27AM -0400, Brian Foster wrote:
> On Sun, Mar 20, 2022 at 09:43:43AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > xfs_reserve_blocks controls the size of the user-visible free space
> > reserve pool.  Given the difference between the current and requested
> > pool sizes, it will try to reserve free space from fdblocks.  However,
> > the amount requested from fdblocks is also constrained by the amount of
> > space that we think xfs_mod_fdblocks will give us.  We'll keep trying to
> > reserve space so long as xfs_mod_fdblocks returns ENOSPC.
> > 
> > In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand
> > out the "free space" used by the free space btrees, because some portion
> > of the free space btrees hold in reserve space for future btree
> > expansion.  Unfortunately, xfs_reserve_blocks' estimation of the number
> > of blocks that it could request from xfs_mod_fdblocks was not updated to
> > include m_allocbt_blks, so if space is extremely low, the caller hangs.
> > 
> > Fix this by creating a function to estimate the number of blocks that
> > can be reserved from fdblocks, which needs to exclude the set-aside and
> > m_allocbt_blks.
> > 
> > Found by running xfs/306 (which formats a single-AG 20MB filesystem)
> > with an fstests configuration that specifies a 1k blocksize and a
> > specially crafted log size that will consume 7/8 of the space (17920
> > blocks, specifically) in that AG.
> > 
> > Cc: Brian Foster <bfoster@redhat.com>
> > Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation")
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/xfs_fsops.c |    2 +-
> >  fs/xfs/xfs_mount.c |    2 +-
> >  fs/xfs/xfs_mount.h |   15 +++++++++++++++
> >  3 files changed, 17 insertions(+), 2 deletions(-)
> > 
> > 
> ...
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index 00720a02e761..da1b7056e743 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -479,6 +479,21 @@ extern void	xfs_unmountfs(xfs_mount_t *);
> >   */
> >  #define XFS_FDBLOCKS_BATCH	1024
> >  
> > +/*
> > + * Estimate the amount of free space that is not available to userspace and is
> > + * not explicitly reserved from the incore fdblocks:
> > + *
> > + * - Space reserved to ensure that we can always split a bmap btree
> > + * - Free space btree blocks that are not available for allocation due to
> > + *   per-AG metadata reservations
> > + */
> 
> What does this mean by "due to" perag res? That sounds like a separate
> thing to me. Perhaps this could just say something like:
> 
> "Estimate the amount of accounted free space that is not available to
> userspace. This includes the minimum number of blocks to support a bmbt
> split (calculated at mount time) and the blocks currently in-use by the
> allocation btrees."

IMO, the last sentence should capture /why/ we subtract the blocks that
are in use by the free space btrees because we used to advertise freesp
btree blocks in f_bfree/f_bavail, and someone who is accustomed to that
behavior might read the sentence and think "Oh, that's incorrect".

But clearly "due to per-AG metadata reservations" isn't clear enough, so
I'll change it to:

"The blocks currently in use by the freespace btrees because they record
the actual blocks that will fill per-AG metadata space reservations."

Thanks for the review :)

--D

> Comment nit aside, this LGTM. Thanks for the rework..
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> > +static inline uint64_t
> > +xfs_fdblocks_unavailable(
> > +	struct xfs_mount	*mp)
> > +{
> > +	return mp->m_alloc_set_aside + atomic64_read(&mp->m_allocbt_blks);
> > +}
> > +
> >  extern int	xfs_mod_fdblocks(struct xfs_mount *mp, int64_t delta,
> >  				 bool reserved);
> >  extern int	xfs_mod_frextents(struct xfs_mount *mp, int64_t delta);
> > 
>
Dave Chinner March 23, 2022, 8:51 p.m. UTC | #3
On Sun, Mar 20, 2022 at 09:43:43AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> xfs_reserve_blocks controls the size of the user-visible free space
> reserve pool.  Given the difference between the current and requested
> pool sizes, it will try to reserve free space from fdblocks.  However,
> the amount requested from fdblocks is also constrained by the amount of
> space that we think xfs_mod_fdblocks will give us.  We'll keep trying to
> reserve space so long as xfs_mod_fdblocks returns ENOSPC.
> 
> In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand
> out the "free space" used by the free space btrees, because some portion
> of the free space btrees hold in reserve space for future btree
> expansion.  Unfortunately, xfs_reserve_blocks' estimation of the number
> of blocks that it could request from xfs_mod_fdblocks was not updated to
> include m_allocbt_blks, so if space is extremely low, the caller hangs.
> 
> Fix this by creating a function to estimate the number of blocks that
> can be reserved from fdblocks, which needs to exclude the set-aside and
> m_allocbt_blks.
> 
> Found by running xfs/306 (which formats a single-AG 20MB filesystem)
> with an fstests configuration that specifies a 1k blocksize and a
> specially crafted log size that will consume 7/8 of the space (17920
> blocks, specifically) in that AG.
> 
> Cc: Brian Foster <bfoster@redhat.com>
> Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 33e26690a8c4..710e857bb825 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -434,7 +434,7 @@  xfs_reserve_blocks(
 	error = -ENOSPC;
 	do {
 		free = percpu_counter_sum(&mp->m_fdblocks) -
-						mp->m_alloc_set_aside;
+						xfs_fdblocks_unavailable(mp);
 		if (free <= 0)
 			break;
 
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 4f8fac8175e8..c9fd5219d377 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1153,7 +1153,7 @@  xfs_mod_fdblocks(
 	 * problems (i.e. transaction abort, pagecache discards, etc.) than
 	 * slightly premature -ENOSPC.
 	 */
-	set_aside = mp->m_alloc_set_aside + atomic64_read(&mp->m_allocbt_blks);
+	set_aside = xfs_fdblocks_unavailable(mp);
 	percpu_counter_add_batch(&mp->m_fdblocks, delta, batch);
 	if (__percpu_counter_compare(&mp->m_fdblocks, set_aside,
 				     XFS_FDBLOCKS_BATCH) >= 0) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 00720a02e761..da1b7056e743 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -479,6 +479,21 @@  extern void	xfs_unmountfs(xfs_mount_t *);
  */
 #define XFS_FDBLOCKS_BATCH	1024
 
+/*
+ * Estimate the amount of free space that is not available to userspace and is
+ * not explicitly reserved from the incore fdblocks:
+ *
+ * - Space reserved to ensure that we can always split a bmap btree
+ * - Free space btree blocks that are not available for allocation due to
+ *   per-AG metadata reservations
+ */
+static inline uint64_t
+xfs_fdblocks_unavailable(
+	struct xfs_mount	*mp)
+{
+	return mp->m_alloc_set_aside + atomic64_read(&mp->m_allocbt_blks);
+}
+
 extern int	xfs_mod_fdblocks(struct xfs_mount *mp, int64_t delta,
 				 bool reserved);
 extern int	xfs_mod_frextents(struct xfs_mount *mp, int64_t delta);