[GIT,PULL,1/22] xfs: design documentation for online fsck

Message ID 168127093760.417736.12181322234550374115.stg-ugh@frogsfrogsfrogs (mailing list archive)
State Deferred, archived

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git tags/online-fsck-design-6.4_2023-04-11

Message

Darrick J. Wong April 12, 2023, 3:45 a.m. UTC
Hi Dave,

Please pull this branch with changes for xfs.

As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts.  Please let me know if you
encounter any problems.

--D

The following changes since commit 09a9639e56c01c7a00d6c0ca63f4c7c41abe075d:

Linux 6.3-rc6 (2023-04-09 11:15:57 -0700)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git tags/online-fsck-design-6.4_2023-04-11

for you to fetch changes up to 03786f0afb2ed5705a0478e14fea50a7f1a44f7e:

xfs: document future directions of online fsck (2023-04-11 18:59:52 -0700)

----------------------------------------------------------------
xfs: design documentation for online fsck [v24.5]

After six years of development and a nearly two-year hiatus from
patchbombing, I think it is time to resume the process of merging the
online fsck feature into XFS.  The full patchset comprises 105 separate
patchsets that capture 470 patches across the kernel, xfsprogs, and
fstests projects.

I would like to merge this feature into upstream in time for the 2023
LTS kernel.  As of 5.15 (aka last year's LTS), we have merged all
generally useful infrastructure improvements into the regular
filesystem.  The only changes to the core filesystem that remain are the
ones that are only useful to online fsck itself.  In other words, the
vast majority of the new code in the patchsets comprising the online
fsck feature is self contained and can be turned off via Kconfig.

Many of you readers might be wondering -- why have I chosen to make one
large submission with 100+ patchsets comprising ~500 patches?  Why
didn't I merge small pieces of functionality bit by bit and revise
common code as necessary?  Well, the simple answer is that in the past
six years, the fundamental algorithms have been revised repeatedly as
I've built out the functionality.  In other words, the codebase as it is
now has the benefit that I now know every piece that's necessary to get
the job done in a reasonable manner and within the constraints laid out
by community reviews.  I believe this has reduced code churn in mainline
and freed up my time so that I can iterate faster.

As a concession to the mail servers, I'm breaking up the submission into
smaller pieces; I'm only pushing the design document and the revisions
to the existing scrub code, which is the first 20% of the patches.
Also, I'm arbitrarily restarting the version numbering by reversioning
all patchsets from version 22 to epoch 23, version 1.

The big question to everyone reading this is: How might I convince you
that there is more merit in merging the whole feature and dealing with
the consequences than continuing to maintain it out of tree?

---------

To prepare the XFS community and potential patch reviewers for the
upstream submission of the online fsck feature, I decided to write a
document capturing the broader picture behind the online repair
development effort.  The document begins by defining the problems that
online fsck aims to solve and outlining specific use cases for the
functionality.

Using that as a base, the rest of the design document presents the high
level algorithms that fulfill the goals set out at the start and the
interactions between the large pieces of the system.  Case studies round
out the design documentation by adding the details of exactly how
specific parts of the online fsck code integrate the algorithms with the
filesystem.

The goal of this effort is to help the XFS community understand how the
gigantic online repair patchset works.  The questions I submit to the
community reviewers are:

1. As you read the design doc (and later the code), do you feel that you
understand what's going on well enough to try to fix a bug if you
found one?

2. What sorts of interactions between systems (or between scrub and the
rest of the kernel) am I missing?

3. Do you feel confident enough in the implementation as it is now that
the benefits of merging the feature (as EXPERIMENTAL) outweigh any
potential disruptions to XFS at large?

4. Are there problematic interactions between subsystems that ought to
be cleared up before merging?

5. Can I just merge all of this?

I intend to commit this document to the kernel's documentation directory
when we start merging the patchset, albeit without the links to
git.kernel.org.  A much more readable version of this is posted at:
https://djwong.org/docs/xfs-online-fsck-design/

v2: add missing sections about: all the in-kernel data structures and
new apis that the scrub and repair functions use; how xattrs and
directories are checked; how space btree records are checked; and
add more details to the parts where all these bits tie together.
Proofread for verb tense inconsistencies and eliminate vague 'we'
usage.  Move all the discussion of what we can do with pageable
kernel memory into a single source file and section.  Document where
log incompat feature locks fit into the locking model.

v3: resync with 6.0, fix a few typos, begin discussion of the merging
plan for this megapatchset.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>

----------------------------------------------------------------
Darrick J. Wong (14):
      xfs: document the motivation for online fsck design
      xfs: document the general theory underlying online fsck design
      xfs: document the testing plan for online fsck
      xfs: document the user interface for online fsck
      xfs: document the filesystem metadata checking strategy
      xfs: document how online fsck deals with eventual consistency
      xfs: document pageable kernel memory
      xfs: document btree bulk loading
      xfs: document online file metadata repair code
      xfs: document full filesystem scans for online fsck
      xfs: document metadata file repair
      xfs: document directory tree repairs
      xfs: document the userspace fsck driver program
      xfs: document future directions of online fsck

Documentation/filesystems/index.rst                |    1 +
.../filesystems/xfs-online-fsck-design.rst         | 5315 ++++++++++++++++++++
.../filesystems/xfs-self-describing-metadata.rst   |    1 +
3 files changed, 5317 insertions(+)
create mode 100644 Documentation/filesystems/xfs-online-fsck-design.rst