
[02/42] xfs: prefer free inodes at ENOSPC over chunk allocation

Message ID 20230118224505.1964941-3-david@fromorbit.com
State Superseded, archived
Series xfs: per-ag centric allocation alogrithms

Commit Message

Dave Chinner Jan. 18, 2023, 10:44 p.m. UTC
From: Dave Chinner <dchinner@redhat.com>

When an XFS filesystem has free inodes in chunks already allocated
on disk, it will still allocate new inode chunks if the target AG
has no free inodes in it. Normally, this is a good idea as it
preserves locality of all the inodes in a given directory.

However, at ENOSPC this can lead to using the last few remaining
free filesystem blocks to allocate a new chunk when there are many,
many free inodes that could be allocated without consuming free
space. This speeds up consumption of the last few blocks, and inode
create operations then return ENOSPC while free inodes are still
available, because there are not enough blocks left in the
filesystem for directory creation reservations to proceed.

Hence when we are near ENOSPC, we should be attempting to preserve
the remaining blocks for directory block allocation rather than
using them for unnecessary inode chunk creation.

This particular behaviour is exposed by xfs/294, when it drives to
ENOSPC on empty file creation whilst there are still thousands of
free inodes available for allocation in other AGs in the filesystem.

Hence, when we are within 1% of ENOSPC, change the inode allocation
behaviour to prefer using existing free inodes over allocating new
inode chunks, even though this results in poorer locality of the data
set. It is more important for allocations to be space efficient
near ENOSPC than to have optimal locality for performance, so let's
modify the inode AG selection code to reflect that.

This allows generic/294 not only to pass with this allocator rework
patchset, but also increases the number of post-ENOSPC empty inode
allocations from ~600 to ~9080 before we hit ENOSPC on the
directory create transaction reservation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Allison Henderson Jan. 19, 2023, 7:08 p.m. UTC | #1
On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Ok, makes sense
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

Patch

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 5118dedf9267..e8068422aa21 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1737,6 +1737,7 @@  xfs_dialloc(
 	struct xfs_perag	*pag;
 	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
 	bool			ok_alloc = true;
+	bool			low_space = false;
 	int			flags;
 	xfs_ino_t		ino;
 
@@ -1767,6 +1768,20 @@  xfs_dialloc(
 		ok_alloc = false;
 	}
 
+	/*
+	 * If we are near to ENOSPC, we want to prefer allocation from AGs that
+	 * have free inodes in them rather than use up free space allocating new
+	 * inode chunks. Hence we turn off allocation for the first non-blocking
+	 * pass through the AGs if we are near ENOSPC to consume free inodes
+	 * that we can immediately allocate, but then we allow allocation on the
+	 * second pass if we fail to find an AG with free inodes in it.
+	 */
+	if (percpu_counter_read_positive(&mp->m_fdblocks) <
+			mp->m_low_space[XFS_LOWSP_1_PCNT]) {
+		ok_alloc = false;
+		low_space = true;
+	}
+
 	/*
 	 * Loop until we find an allocation group that either has free inodes
 	 * or in which we can allocate some inodes.  Iterate through the
@@ -1795,6 +1810,8 @@  xfs_dialloc(
 				break;
 			}
 			flags = 0;
+			if (low_space)
+				ok_alloc = true;
 		}
 		xfs_perag_put(pag);
 	}