[GIT,PULL] xfs: use busy extents for fstrim

Message ID ZRygqkCkbH32I+x9@dread.disaster.area (mailing list archive)
State New, archived

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs tags/xfs-fstrim-busy-tag

Message

Dave Chinner Oct. 3, 2023, 11:15 p.m. UTC
Hi Chandan,

Can you please pull the changes to fstrim behaviour from the signed
tag below? This has been rebased on 6.6-rc4 so should merge cleanly
into a current tree.

Thanks,

Dave.

----------------------------------------------------------------
The following changes since commit 8a749fd1a8720d4619c91c8b6e7528c0a355c0aa:

  Linux 6.6-rc4 (2023-10-01 14:15:13 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs tags/xfs-fstrim-busy-tag

for you to fetch changes up to e78a40b851712b422d7d4ae345f25511d47a9a38:

  xfs: abort fstrim if kernel is suspending (2023-10-04 09:25:04 +1100)

----------------------------------------------------------------
xfs: reduce AGF hold times during fstrim operations

A recent log space overflow and recovery failure was root-caused to
a long-running truncate blocking on the AGF and ending up pinning
the tail of the log. The filesystem then hung, the machine was
rebooted, and log recovery then refused to run because there wasn't
enough space in the log for the EFI transaction reservation.

The reason the long-running truncate got blocked on the AGF for so
long was that an fstrim was being run. The underlying block device
was large and very slow (10TB ceph rbd volume) and so discarding all
the free space in the AG took a really long time.

The current fstrim implementation holds the AGF across the entire
operation - both the free space scan and the issuing of all the
discards. The discards are synchronous and single depth, so if there
are millions of free spaces, we hold the AGF lock across millions of
discard operations.

It doesn't really need to be said that this is a Bad Thing.

This series reworks the fstrim discard path to use the same
mechanisms as online discard. This allows discards to be issued
asynchronously without holding the AGF locked, enabling higher
discard queue depths (much faster on fast devices) and only
requiring the AGF lock to be held whilst we are scanning free space.

To do this, we make use of busy extents - we lock the AGF, mark all
the extents we want to discard as "busy under discard" so that
nothing will be allowed to allocate them, and then drop the AGF
lock. We then issue discards on the gathered busy extents and on
discard completion remove them from the busy list.
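
The lock/mark/unlock/discard sequence above can be modelled in a few
lines of userspace Python. This is only a toy sketch, not the kernel
code: AllocationGroup, busy_lock, fstrim(), issue_discard and
batch_size are hypothetical names, extents are plain (start, length)
tuples, and the discards are issued one at a time here rather than
asynchronously.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class AllocationGroup:
    """Toy AG: a free-extent list guarded by a stand-in for the AGF lock."""
    free_extents: list  # (start_block, length) tuples
    agf_lock: threading.Lock = field(default_factory=threading.Lock)
    busy_lock: threading.Lock = field(default_factory=threading.Lock)
    busy: set = field(default_factory=set)  # extents "busy under discard"

    def allocate(self, extent):
        """Allocate a free extent; refuse it while it is marked busy."""
        with self.agf_lock, self.busy_lock:
            if extent in self.busy or extent not in self.free_extents:
                return False
            self.free_extents.remove(extent)
            return True


def fstrim(ag, issue_discard, batch_size=2):
    """Discard all free space, holding the AGF lock only while scanning."""
    next_start = 0
    discarded = []
    while True:
        # Lock the AGF, gather one batch, mark it busy, drop the lock.
        with ag.agf_lock, ag.busy_lock:
            batch = [e for e in sorted(ag.free_extents) if e[0] >= next_start]
            batch = batch[:batch_size]
            ag.busy.update(batch)
        if not batch:
            return discarded
        next_start = batch[-1][0] + batch[-1][1]
        # Issue the discards with the AGF unlocked; on completion the
        # extent leaves the busy set and becomes allocatable again.
        for extent in batch:
            issue_discard(extent)
            with ag.busy_lock:
                ag.busy.discard(extent)
            discarded.append(extent)


ag = AllocationGroup(free_extents=[(0, 8), (16, 4), (32, 64)])
seen = []


def fake_discard(extent):
    # Record whether the AGF lock is held and whether an allocation of
    # the extent under discard would succeed.
    seen.append((ag.agf_lock.locked(), ag.allocate(extent)))


trimmed = fstrim(ag, fake_discard)
```

In this model every discard runs with the AGF lock released, and an
allocation attempt on an extent that is still under discard is
refused until the discard completes.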

This results in AGF lock hold times for fstrim dropping to a few
milliseconds for each batch of free extents we scan, and so the
hours-long hold times that can currently occur on large, slow, badly
fragmented devices no longer occur.
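
As a rough back-of-envelope illustration of that difference, the
sketch below compares the longest single AGF hold in the two schemes.
All of the numbers (extent count, per-extent scan cost, discard
latency, batch size) are made-up assumptions chosen for illustration,
not measurements from the report above, and the function names are
hypothetical.

```python
def agf_hold_old(n_extents, scan_s, discard_s):
    """Old scheme: one hold covers the scan plus every synchronous discard."""
    return n_extents * (scan_s + discard_s)


def agf_hold_new(batch_size, scan_s):
    """New scheme: the longest single hold covers scanning one batch only."""
    return batch_size * scan_s


# Assumed figures: 2 million free extents, 1 microsecond to scan each,
# 5 ms per discard on a slow device, 2000 extents gathered per batch.
old_hold = agf_hold_old(2_000_000, 1e-6, 5e-3)  # seconds, roughly 2.8 hours
new_hold = agf_hold_new(2_000, 1e-6)            # seconds, roughly 2 ms
```

The discard latency drops out of the new hold time entirely, which is
why device speed no longer determines how long the AGF stays locked.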

Signed-off-by: Dave Chinner <dchinner@redhat.com>

----------------------------------------------------------------
Dave Chinner (3):
      xfs: move log discard work to xfs_discard.c
      xfs: reduce AGF hold times during fstrim operations
      xfs: abort fstrim if kernel is suspending

 fs/xfs/xfs_discard.c     | 266 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 fs/xfs/xfs_discard.h     |   6 +-
 fs/xfs/xfs_extent_busy.c |  34 ++++++++--
 fs/xfs/xfs_extent_busy.h |  24 ++++++-
 fs/xfs/xfs_log_cil.c     |  93 ++++-----------------------
 fs/xfs/xfs_log_priv.h    |   5 +-
 6 files changed, 311 insertions(+), 117 deletions(-)

Comments

Chandan Babu R Oct. 4, 2023, 5:03 a.m. UTC | #1
On Wed, Oct 04, 2023 at 10:15:54 AM +1100, Dave Chinner wrote:
> Hi Chandan,
>
> Can you please pull the changes to fstrim behaviour from the signed
> tag below? This has been rebased on 6.6-rc4 so should merge cleanly
> into a current tree.
>

Thank you. I have merged the changes and started an fstests run now. I will
push the changes to xfs-linux's for-next branch tomorrow if I do not encounter
any regressions.