mbox series

[PATCHSET,v27.0,0/7] xfs: reserve disk space for online repairs

Message ID 169577059140.3312911.17578000557997208473.stgit@frogsfrogsfrogs (mailing list archive)
Headers show
Series xfs: reserve disk space for online repairs | expand

Message

Darrick J. Wong Sept. 26, 2023, 11:29 p.m. UTC
Hi all,

Online repair fixes metadata structures by writing a new copy out to
disk and atomically committing the new structure into the filesystem.
For this to work, we need to reserve all the space we're going to need
ahead of time so that the atomic commit transaction is as small as
possible.  We also require the reserved space to be freed if the system
goes down, or if we decide not to commit the repair, or if we reserve
too much space.

To keep the atomic commit transaction as small as possible, we would
like to allocate some space and simultaneously schedule automatic
reaping of the reserved space, even on log recovery.  EFIs are the
mechanism to get us there, but we need to use them in a novel manner.
Once we allocate the space, we want to hold on to the EFI (relogging as
necessary) until we can commit or cancel the repair.  EFIs for written
committed blocks need to go away, but unwritten or uncommitted blocks
can be freed like normal.

Earlier versions of this patchset directly manipulated the log items,
but Dave thought that to be a layering violation.  For v27, I've
modified the defer ops handling code to be capable of pausing a deferred
work item.  Log intent items are created as they always have been, but
paused items are pushed onto a side list when finishing deferred work
items, and pushed back onto the transaction after that.  Log intent done
item are not created for paused work.

The second part adds a "stale" flag to the EFI so that the repair
reservation code can dispose of an EFI the normal way, but without the
space actually being freed.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-auto-reap-space-reservations

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=repair-auto-reap-space-reservations
---
 fs/xfs/Makefile                    |    1 
 fs/xfs/libxfs/xfs_ag.c             |    2 
 fs/xfs/libxfs/xfs_alloc.c          |  102 +++++++
 fs/xfs/libxfs/xfs_alloc.h          |   22 +-
 fs/xfs/libxfs/xfs_bmap.c           |    4 
 fs/xfs/libxfs/xfs_bmap_btree.c     |    2 
 fs/xfs/libxfs/xfs_btree_staging.h  |    7 
 fs/xfs/libxfs/xfs_defer.c          |  229 ++++++++++++++--
 fs/xfs/libxfs/xfs_defer.h          |   20 +
 fs/xfs/libxfs/xfs_ialloc.c         |    5 
 fs/xfs/libxfs/xfs_ialloc_btree.c   |    2 
 fs/xfs/libxfs/xfs_refcount.c       |    6 
 fs/xfs/libxfs/xfs_refcount_btree.c |    2 
 fs/xfs/scrub/agheader_repair.c     |    1 
 fs/xfs/scrub/common.c              |    1 
 fs/xfs/scrub/newbt.c               |  510 ++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/newbt.h               |   65 +++++
 fs/xfs/scrub/reap.c                |    7 
 fs/xfs/scrub/scrub.c               |    2 
 fs/xfs/scrub/trace.h               |   37 +++
 fs/xfs/xfs_extfree_item.c          |   13 +
 fs/xfs/xfs_reflink.c               |    2 
 fs/xfs/xfs_trace.h                 |   13 +
 23 files changed, 991 insertions(+), 64 deletions(-)
 create mode 100644 fs/xfs/scrub/newbt.c
 create mode 100644 fs/xfs/scrub/newbt.h

Comments

Darrick J. Wong Oct. 4, 2023, 11:32 p.m. UTC | #1
On Tue, Sep 26, 2023 at 04:29:17PM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> Online repair fixes metadata structures by writing a new copy out to
> disk and atomically committing the new structure into the filesystem.
> For this to work, we need to reserve all the space we're going to need
> ahead of time so that the atomic commit transaction is as small as
> possible.  We also require the reserved space to be freed if the system
> goes down, or if we decide not to commit the repair, or if we reserve
> too much space.

Just a heads up -- I rebased my development branch on 6.6-rc4 and
uploaded the whole mess to kernel.org.  These earlier patchsets are more
or less the same, but wanted people to be able to build the branch with
the latest (xfs/iomap) bugfixes.

In the meantime, I'll go look through Dave's allocator reorg series and
figure out how that meshes with the forcealign series that John sent.

--D

> To keep the atomic commit transaction as small as possible, we would
> like to allocate some space and simultaneously schedule automatic
> reaping of the reserved space, even on log recovery.  EFIs are the
> mechanism to get us there, but we need to use them in a novel manner.
> Once we allocate the space, we want to hold on to the EFI (relogging as
> necessary) until we can commit or cancel the repair.  EFIs for written
> committed blocks need to go away, but unwritten or uncommitted blocks
> can be freed like normal.
> 
> Earlier versions of this patchset directly manipulated the log items,
> but Dave thought that to be a layering violation.  For v27, I've
> modified the defer ops handling code to be capable of pausing a deferred
> work item.  Log intent items are created as they always have been, but
> paused items are pushed onto a side list when finishing deferred work
> items, and pushed back onto the transaction after that.  Log intent done
> item are not created for paused work.
> 
> The second part adds a "stale" flag to the EFI so that the repair
> reservation code can dispose of an EFI the normal way, but without the
> space actually being freed.
> 
> If you're going to start using this code, I strongly recommend pulling
> from my git trees, which are linked below.
> 
> This has been running on the djcloud for months with no problems.  Enjoy!
> Comments and questions are, as always, welcome.
> 
> --D
> 
> kernel git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-auto-reap-space-reservations
> 
> xfsprogs git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=repair-auto-reap-space-reservations
> ---
>  fs/xfs/Makefile                    |    1 
>  fs/xfs/libxfs/xfs_ag.c             |    2 
>  fs/xfs/libxfs/xfs_alloc.c          |  102 +++++++
>  fs/xfs/libxfs/xfs_alloc.h          |   22 +-
>  fs/xfs/libxfs/xfs_bmap.c           |    4 
>  fs/xfs/libxfs/xfs_bmap_btree.c     |    2 
>  fs/xfs/libxfs/xfs_btree_staging.h  |    7 
>  fs/xfs/libxfs/xfs_defer.c          |  229 ++++++++++++++--
>  fs/xfs/libxfs/xfs_defer.h          |   20 +
>  fs/xfs/libxfs/xfs_ialloc.c         |    5 
>  fs/xfs/libxfs/xfs_ialloc_btree.c   |    2 
>  fs/xfs/libxfs/xfs_refcount.c       |    6 
>  fs/xfs/libxfs/xfs_refcount_btree.c |    2 
>  fs/xfs/scrub/agheader_repair.c     |    1 
>  fs/xfs/scrub/common.c              |    1 
>  fs/xfs/scrub/newbt.c               |  510 ++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/newbt.h               |   65 +++++
>  fs/xfs/scrub/reap.c                |    7 
>  fs/xfs/scrub/scrub.c               |    2 
>  fs/xfs/scrub/trace.h               |   37 +++
>  fs/xfs/xfs_extfree_item.c          |   13 +
>  fs/xfs/xfs_reflink.c               |    2 
>  fs/xfs/xfs_trace.h                 |   13 +
>  23 files changed, 991 insertions(+), 64 deletions(-)
>  create mode 100644 fs/xfs/scrub/newbt.c
>  create mode 100644 fs/xfs/scrub/newbt.h
>