[04/14] xfs: document the user interface for online fsck

Message ID 165989702796.2495930.11103527352292676325.stgit@magnolia (mailing list archive)
State New, archived
Series xfs: design documentation for online fsck

Commit Message

Darrick J. Wong Aug. 7, 2022, 6:30 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Start the fourth chapter of the online fsck design documentation, which
discusses the user interface and the background scrubbing service.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 .../filesystems/xfs-online-fsck-design.rst         |  114 ++++++++++++++++++++
 1 file changed, 114 insertions(+)

Comments

Dave Chinner Aug. 11, 2022, 12:20 a.m. UTC | #1
On Sun, Aug 07, 2022 at 11:30:28AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Start the fourth chapter of the online fsck design documentation, which
> discusses the user interface and the background scrubbing service.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  .../filesystems/xfs-online-fsck-design.rst         |  114 ++++++++++++++++++++
>  1 file changed, 114 insertions(+)
> 
> 
> diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst
> index d630b6bdbe4a..42e82971e036 100644
> --- a/Documentation/filesystems/xfs-online-fsck-design.rst
> +++ b/Documentation/filesystems/xfs-online-fsck-design.rst
> @@ -750,3 +750,117 @@ Proposed patchsets include `general stress testing
>  <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
>  and the `evolution of existing per-function stress testing
>  <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
> +
> +4. User Interface
> +=================
> +
> +The primary user of online fsck is the system administrator, just like offline
> +repair.
> +Online fsck presents two modes of operation to administrators:
> +a foreground CLI process for online fsck on demand, and a background service
> +that performs autonomous checking and repair.
> +
> +Checking on Demand
> +------------------
> +
> +For administrators who want the absolute freshest information about the
> +metadata in a filesystem, ``xfs_scrub`` can be run as a foreground process on
> +a command line.
> +The program checks every piece of metadata in the filesystem while the
> +administrator waits for the results to be reported, just like the existing
> +``xfs_repair`` tool.
> +Both tools share a ``-n`` option to perform a read-only scan, and a ``-v``
> +option to increase the verbosity of the information reported.
> +
> +A new feature of ``xfs_scrub`` is the ``-x`` option, which employs the error
> +correction capabilities of the hardware to check data file contents.
> +The media scan is not enabled by default because it may dramatically increase
> +program runtime and consume a lot of bandwidth on older storage hardware.

So '-x' runs a media scrub command? What does that do with software
RAID? Does that trigger parity checks of the RAID volume, or pass
through to the underlying hardware to do physical media scrub? Or
maybe both?

Rewriting the paragraph to be focussed around the functionality
being provided (i.e "media scrubbing is a new feature of xfs_scrub.
It provides .....")

> +The output of a foreground invocation is captured in the system log.

At what log level?

> +The ``xfs_scrub_all`` program walks the list of mounted filesystems and
> +initiates ``xfs_scrub`` for each of them in parallel.
> +It serializes scans for any filesystems that resolve to the same top level
> +kernel block device to prevent resource overconsumption.

Is this serialisation necessary for non-HDD devices?

> +Background Service
> +------------------
> +
> +To reduce the workload of system administrators, the ``xfs_scrub`` package
> +provides a suite of `systemd <https://systemd.io/>`_ timers and services that
> +run online fsck automatically on weekends.

Weekends change depending on where you are in the world, right? So
maybe this should be more explicit?

[....]

> +**Question**: Should the health reporting integrate with the new inotify fs
> +error notification system?

Can the new inotify fs error notification system report complex
health information structures? How much pain is involved in making
it do what we want, considering we already have a health reporting
ioctl that can be polled?

> +**Question**: Would it be helpful for sysadmins to have a daemon to listen for
> +corruption notifications and initiate a repair?

Seems like an obvious extension to the online repair capability.

Cheers,

Dave.
Darrick J. Wong Aug. 16, 2022, 2:30 a.m. UTC | #2
On Thu, Aug 11, 2022 at 10:20:12AM +1000, Dave Chinner wrote:
> On Sun, Aug 07, 2022 at 11:30:28AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Start the fourth chapter of the online fsck design documentation, which
> > discusses the user interface and the background scrubbing service.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  .../filesystems/xfs-online-fsck-design.rst         |  114 ++++++++++++++++++++
> >  1 file changed, 114 insertions(+)
> > 
> > 
> > diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst
> > index d630b6bdbe4a..42e82971e036 100644
> > --- a/Documentation/filesystems/xfs-online-fsck-design.rst
> > +++ b/Documentation/filesystems/xfs-online-fsck-design.rst
> > @@ -750,3 +750,117 @@ Proposed patchsets include `general stress testing
> >  <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
> >  and the `evolution of existing per-function stress testing
> >  <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
> > +
> > +4. User Interface
> > +=================
> > +
> > +The primary user of online fsck is the system administrator, just like offline
> > +repair.
> > +Online fsck presents two modes of operation to administrators:
> > +a foreground CLI process for online fsck on demand, and a background service
> > +that performs autonomous checking and repair.
> > +
> > +Checking on Demand
> > +------------------
> > +
> > +For administrators who want the absolute freshest information about the
> > +metadata in a filesystem, ``xfs_scrub`` can be run as a foreground process on
> > +a command line.
> > +The program checks every piece of metadata in the filesystem while the
> > +administrator waits for the results to be reported, just like the existing
> > +``xfs_repair`` tool.
> > +Both tools share a ``-n`` option to perform a read-only scan, and a ``-v``
> > +option to increase the verbosity of the information reported.
> > +
> > +A new feature of ``xfs_scrub`` is the ``-x`` option, which employs the error
> > +correction capabilities of the hardware to check data file contents.
> > +The media scan is not enabled by default because it may dramatically increase
> > +program runtime and consume a lot of bandwidth on older storage hardware.
> 
> So '-x' runs a media scrub command? What does that do with software
> RAID?

Nothing special unless the RAID controller itself does parity checking
of reads -- the kernel doesn't have any API calls (that I know of) to do
that.  I think md-raid5 will check the parity, but afaict nothing else
(raid1) does that.

> Does that trigger parity checks of the RAID volume, or pass
> through to the underlying hardware to do physical media scrub?

Chaitanya proposed a userspace API so that xfs_scrub could actually ask
the hardware to perform a media verification[1], but willy pointed out
that none of the device protocols have a means for the device to prove
that it did anything, so it stalled.

[1] https://lore.kernel.org/linux-fsdevel/20220713072019.5885-1-kch@nvidia.com/

> Or maybe both?

I wish. :)

> Rewriting the paragraph to be focussed around the functionality
> being provided (i.e "media scrubbing is a new feature of xfs_scrub.
> It provides .....")

Er.. you're doing that, or asking me to do it?

> > +The output of a foreground invocation is captured in the system log.
> 
> At what log level?

That depends on the message, but right now it only uses
LOG_{ERR,WARNING,INFO}.

Errors, corruptions, and unfixable problems are LOG_ERR.

Warnings are LOG_WARNING.

Informational notices, repairs completed, and optimizations made are
all recorded with LOG_INFO.
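
In code form, that mapping is roughly the following (an illustrative
sketch only, not xfs_scrub's actual logging routine):

/* Illustrative sketch of the priority mapping described above. */
#include <syslog.h>

enum scrub_msg_type { SM_CORRUPT, SM_WARNING, SM_INFO };

static void scrub_log(enum scrub_msg_type type, const char *msg)
{
	static const int prio[] = {
		[SM_CORRUPT] = LOG_ERR,     /* errors, corruptions, unfixables */
		[SM_WARNING] = LOG_WARNING, /* warnings */
		[SM_INFO]    = LOG_INFO,    /* repairs and optimizations made */
	};

	syslog(prio[type], "%s", msg);
}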

> > +The ``xfs_scrub_all`` program walks the list of mounted filesystems and
> > +initiates ``xfs_scrub`` for each of them in parallel.
> > +It serializes scans for any filesystems that resolve to the same top level
> > +kernel block device to prevent resource overconsumption.
> 
> Is this serialisation necessary for non-HDD devices?

That ultimately depends on the preferences of the sysadmins, but for the
initial push I'd rather err on the side of using fewer iops on a running
system.
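
FWIW the serialization decision amounts to resolving each mount down to
the bottom of its block device stack.  A hedged sketch of that idea (not
xfs_scrub_all's actual implementation) using the sysfs "slaves" links
that md and device-mapper publish:

/* Hedged sketch, not xfs_scrub_all's actual code: resolve a mountpoint
 * to a bottom-most backing device name by following sysfs "slaves"
 * links.  For brevity it follows only the first slave at each level and
 * treats a partition as the bottom. */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	char link[PATH_MAX], target[PATH_MAX], cur[NAME_MAX + 1];
	struct stat sb;
	ssize_t len;
	char *base;

	if (argc != 2 || stat(argv[1], &sb) < 0)
		return 1;

	/* st_dev of a file on a block filesystem names the backing bdev. */
	snprintf(link, sizeof(link), "/sys/dev/block/%u:%u",
			major(sb.st_dev), minor(sb.st_dev));
	len = readlink(link, target, sizeof(target) - 1);
	if (len < 0)
		return 1;
	target[len] = '\0';
	base = strrchr(target, '/');
	snprintf(cur, sizeof(cur), "%s", base ? base + 1 : target);

	for (;;) {
		char path[PATH_MAX + 32];
		struct dirent *de;
		int found = 0;
		DIR *d;

		snprintf(path, sizeof(path), "/sys/class/block/%s/slaves", cur);
		d = opendir(path);
		if (!d)
			break;		/* no slaves dir: we hit bottom */
		while ((de = readdir(d)) != NULL) {
			if (de->d_name[0] == '.')
				continue;
			snprintf(cur, sizeof(cur), "%s", de->d_name);
			found = 1;
			break;
		}
		closedir(d);
		if (!found)
			break;		/* empty slaves dir: we hit bottom */
	}

	/* Scans of two mounts that print the same name here get serialized. */
	printf("%s\n", cur);
	return 0;
}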

> > +Background Service
> > +------------------
> > +
> > +To reduce the workload of system administrators, the ``xfs_scrub`` package
> > +provides a suite of `systemd <https://systemd.io/>`_ timers and services that
> > +run online fsck automatically on weekends.
> 
> Weekends change depending on where you are in the world, right? So
> maybe this should be more explicit?

Sunday at 3:10am, whenever that is in the local time zone.
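
In systemd terms that is an OnCalendar trigger along these lines (a
sketch only; the xfs_scrub_all.timer unit shipped in xfsprogs is
authoritative):

[Unit]
Description=Periodic XFS Online Metadata Check for All Filesystems

[Timer]
OnCalendar=Sun *-*-* 03:10:00
Persistent=true

[Install]
WantedBy=timers.target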

> [....]
> 
> > +**Question**: Should the health reporting integrate with the new inotify fs
> > +error notification system?
> 
> Can the new inotify fs error notification system report complex
> health information structures?

In theory, yes, said the authors.

> How much pain is involved in making
> it do what we want, considering we already have a health reporting
> ioctl that can be polled?

I haven't tried this myself, but I think it involves defining a new type
code and message length within the inotify system.  The last time I
looked at the netlink protocol, I /think/ consuming programs read the
header, see the type code and buffer length, and then decide to use the
message or skip it.

That said, there were some size and GFP_ limits on what could be sent,
so I don't know how difficult it would be to make this part actually
work in practice.  Gabriel said it wouldn't be too difficult once I was
ready.
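
If it helps, here is a minimal sketch of a listener for that mechanism
(fanotify's FAN_FS_ERROR, merged in Linux 5.16); error handling is
abbreviated and this is not a blessed interface example:

/* Hedged sketch: listen for FAN_FS_ERROR events on a mounted fs. */
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/fanotify.h>

int main(int argc, char *argv[])
{
	struct fanotify_event_metadata buf[256], *md;
	ssize_t len;
	int fd;

	if (argc != 2)
		errx(1, "usage: %s mountpoint", argv[0]);

	/* FAN_FS_ERROR events require a FAN_REPORT_FID notification group. */
	fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_FID, O_RDONLY);
	if (fd < 0)
		err(1, "fanotify_init");

	if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
			  FAN_FS_ERROR, AT_FDCWD, argv[1]) < 0)
		err(1, "fanotify_mark");

	while ((len = read(fd, buf, sizeof(buf))) > 0) {
		for (md = buf; FAN_EVENT_OK(md, len);
		     md = FAN_EVENT_NEXT(md, len)) {
			char *p = (char *)(md + 1);
			char *end = (char *)md + md->event_len;

			/* Walk the variable-length info records. */
			while (p < end) {
				struct fanotify_event_info_header *ih =
					(struct fanotify_event_info_header *)p;

				if (ih->len == 0)
					break;
				if (ih->info_type == FAN_EVENT_INFO_TYPE_ERROR) {
					struct fanotify_event_info_error *e =
						(struct fanotify_event_info_error *)ih;

					printf("fs error: errno %d, count %u\n",
							e->error, e->error_count);
				}
				p += ih->len;
			}
		}
	}
	return 0;
}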

> > +**Question**: Would it be helpful for sysadmins to have a daemon to listen for
> > +corruption notifications and initiate a repair?
> 
> Seems like an obvious extension to the online repair capability.

...too bad there are dragons thataways.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Patch

diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst
index d630b6bdbe4a..42e82971e036 100644
--- a/Documentation/filesystems/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs-online-fsck-design.rst
@@ -750,3 +750,117 @@  Proposed patchsets include `general stress testing
 <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
 and the `evolution of existing per-function stress testing
 <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
+
+4. User Interface
+=================
+
+The primary user of online fsck is the system administrator, just like offline
+repair.
+Online fsck presents two modes of operation to administrators:
+a foreground CLI process for online fsck on demand, and a background service
+that performs autonomous checking and repair.
+
+Checking on Demand
+------------------
+
+For administrators who want the absolute freshest information about the
+metadata in a filesystem, ``xfs_scrub`` can be run as a foreground process on
+a command line.
+The program checks every piece of metadata in the filesystem while the
+administrator waits for the results to be reported, just like the existing
+``xfs_repair`` tool.
+Both tools share a ``-n`` option to perform a read-only scan, and a ``-v``
+option to increase the verbosity of the information reported.
+
+A new feature of ``xfs_scrub`` is the ``-x`` option, which employs the error
+correction capabilities of the hardware to check data file contents.
+The media scan is not enabled by default because it may dramatically increase
+program runtime and consume a lot of bandwidth on older storage hardware.
+
+The output of a foreground invocation is captured in the system log.
+
+The ``xfs_scrub_all`` program walks the list of mounted filesystems and
+initiates ``xfs_scrub`` for each of them in parallel.
+It serializes scans for any filesystems that resolve to the same top level
+kernel block device to prevent resource overconsumption.
+
+Background Service
+------------------
+
+To reduce the workload of system administrators, the ``xfs_scrub`` package
+provides a suite of `systemd <https://systemd.io/>`_ timers and services that
+run online fsck automatically on weekends.
+The background service configures scrub to run with as little privilege as
+possible, the lowest CPU and IO priority, and in a CPU-constrained
+single-threaded mode.
+It is hoped that this minimizes the amount of load generated on the system and
+avoids starving regular workloads.
+
+The output of the background service is also captured in the system log.
+If desired, reports of failures (either due to inconsistencies or mere runtime
+errors) can be emailed automatically by setting the ``EMAIL_ADDR`` environment
+variable in the following service files:
+
+* ``xfs_scrub_fail@.service``
+* ``xfs_scrub_media_fail@.service``
+* ``xfs_scrub_all_fail.service``
+
+The decision to enable the background scan is left to the system administrator.
+This can be done by enabling either of the following:
+
+* ``xfs_scrub_all.timer`` on systemd systems
+* ``xfs_scrub_all.cron`` on non-systemd systems
+
+This automatic weekly scan is configured out of the box to perform an
+additional media scan of all file data once per month.
+This is less foolproof than, say, storing file data block checksums, but much
+more performant if application software provides its own integrity checking,
+redundancy can be provided elsewhere above the filesystem, or the storage
+device's integrity guarantees are deemed sufficient.
+
+The systemd unit file definitions have been subjected to a security audit
+(as of systemd 249) to ensure that the xfs_scrub processes have as little
+access to the rest of the system as possible.
+This was performed via ``systemd-analyze security``, after which privileges
+were restricted to the minimum required; sandboxing was set up to the maximal
+extent possible with system call filtering; and access to the filesystem tree
+was restricted to the minimum needed to start the program and access the
+filesystem being scanned.
+The service definition files restrict CPU usage to 80% of one CPU core, and
+apply as low an IO and CPU scheduling priority as possible.
+This measure was taken to minimize delays in the rest of the filesystem.
+No such hardening has been performed for the cron job.
+
+Proposed patchset:
+`Enabling the xfs_scrub background service
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_.
+
+Health Reporting
+----------------
+
+XFS caches a summary of each filesystem's health status in memory.
+The information is updated whenever ``xfs_scrub`` is run, or whenever
+inconsistencies are detected in the filesystem metadata during regular
+operations.
+System administrators should use the ``health`` command of ``xfs_spaceman`` to
+retrieve this information in a human-readable format.
+If problems have been observed, the administrator can schedule a reduced
+service window to run the online repair tool to correct the problem.
+Failing that, the administrator can decide to schedule a maintenance window to
+run the traditional offline repair tool instead.
+
+**Question**: Should the health reporting integrate with the new inotify fs
+error notification system?
+
+**Question**: Would it be helpful for sysadmins to have a daemon to listen for
+corruption notifications and initiate a repair?
+
+*Answer*: These questions remain unanswered, but should be a part of the
+conversation with early adopters and potential downstream users of XFS.
+
+Proposed patchsets include
+`wiring up health reports to correction returns
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=corruption-health-reports>`_
+and
+`preservation of sickness info during memory reclaim
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=indirect-health-reporting>`_.
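
A usage sketch for the interfaces described above (a sketch only; see
the xfs_scrub(8), xfs_spaceman(8), and systemctl(1) manual pages for
exact option syntax):

# xfs_scrub -n /storage                        # read-only check, report only
# xfs_scrub -x /storage                        # also run the optional media scan
# xfs_spaceman -c 'health' /storage            # print the cached health status
# systemctl enable --now xfs_scrub_all.timer   # opt in to the weekly scan

One way to route failure reports, via a systemd drop-in (the address is
a placeholder):

# systemctl edit xfs_scrub_fail@.service
[Service]
Environment=EMAIL_ADDR=admin@example.com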