Message ID | 1473326635-30209-1-git-send-email-cmaiolino@redhat.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On 9/8/16 4:23 AM, Carlos Maiolino wrote: > Document the implementation of error handlers into sysfs. > > Changelog: > > V2: > - Add a description of the precedence order of each option, focusing on > the behavior of "fail_at_unmount" which was not well explained in V1 > > V3: > - Fix English spelling mistakes suggested by Eric Please put the patch version changelog after the "---" so it doesn't become part of the permanent commit log; it's for current patch reviewers, not for future code archaeologists. > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> > --- > Documentation/filesystems/xfs.txt | 70 +++++++++++++++++++++++++++++++++++++++ > 1 file changed, 70 insertions(+) > > diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt > index 8146e9f..8b6c861 100644 > --- a/Documentation/filesystems/xfs.txt > +++ b/Documentation/filesystems/xfs.txt > @@ -348,3 +348,73 @@ Removed Sysctls > ---- ------- > fs.xfs.xfsbufd_centisec v4.0 > fs.xfs.age_buffer_centisecs v4.0 > + > +Error handling > +============== > + > +XFS can act differently according to the type of error found > +during its operation. The implementation introduces the following > +concepts to the error handler: > + > + -failure speed: > + Defines how fast XFS should shut down when of a specific error is found when a specific error is found > + during the filesystem operation. It can shut down immediately, after a > + defined number of retries, after a set time period, or simply retry > + forever. The old "retry forever" behavior is still the default, except > + during unmount, where any IOs retrying due to errors will be cancelled > + and unmount will be allowed to proceed. > + > + -error classes: > + Specifies the subsystem/location where the error handlers, such as location of the error handlers > + metadata or memory allocation. Only metadata IO errors are handled > + at this time. > + > + -error handlers: > + Defines the behavior for a specific error. > + > +The filesystem behavior during an error can be set via sysfs files, where the > +errors are organized with the structure below. Each configuration option works > +independently, the first condition met for a specific configuration will cause > +the filesystem to shut down: > + > + /sys/fs/xfs/<dev>/error/<class>/<error>/ The above line kind of hangs there oddly, because the first thing you do below is describe a file which isn't in the above hierarchy. Maybe we should show something like: + /sys/fs/xfs/<dev>/error/fail_at_unmount + /sys/fs/xfs/<dev>/error/<class>/<error>/<configuration> to show everything that might be under it? Not sure if that's better. > + > +Each directory contains: > + > + /sys/fs/xfs/<dev>/error/ > + > + fail_at_unmount (Min: 0 Default: 1 Max: 1) > + Defines the global error behavior at unmount time. If set to the > + default value of 1, XFS will cancel any pending IO retries, shut > + down, and unmount. If set to 0, pending IO retries may prevent > + the filesystem from unmounting. > + > + <class> subdirectories > + Contains specific error handlers configuration > + (Ex: /sys/fs/xfs/<dev>/error/metadata, see below). > + > + /sys/fs/xfs/<dev>/error/<class>/ > + > + Directory containing configuration for a specific error <class>; > + currently only the "metadata" <class> is implemented. > + The contents of this directory are <class> specific, since each <class> > + might need to handle different types of errors. > + > + /sys/fs/xfs/<dev>/error/<class>/<error>/ > + > + Contains the failure speed configuration files for specific errors in > + this <class, as well as a "default" behavior. Each <error> directory <class> > + contains the following configuration files: > + > + max_retries (Min: -1 Default: -1 Max: INTMAX) > + Defines the allowed number of retries of a specific error before > + the filesystem will shut down. The default value of "-1" will > + cause XFS to retry forever for this specific error. Setting it > + to "0" will cause XFS to fail immediately when the specific > + error is found, and setting it to "N," where N is greater than 0, > + will make XFS retry "N" times before shutting down. > + > + retry_timeout_seconds (Min: 0 Default: 0 Max: INTMAX) > + Define the amount of time (in seconds) that the filesystem is > + allowed to retry its operations when the specific error is > + found. The default value of "0" will cause XFS to retry forever. The default for ENODEV is different ... tricky to document that. Good luck. ;) The maximum for retry_timeout_seconds is 86400 (1 day), not INTMAX: retry_timeout_seconds_store() { ... /* 1 day timeout maximum */ if (val < 0 || val > 86400) return -EINVAL; ... } The default of -1 vs. 0 might change with the other patch I sent, but we can fix this up if it's accepted. -Eric
On Thu, Sep 08, 2016 at 09:29:18AM -0500, Eric Sandeen wrote: > On 9/8/16 4:23 AM, Carlos Maiolino wrote: > > Document the implementation of error handlers into sysfs. > > > > Changelog: > > > > V2: > > - Add a description of the precedence order of each option, focusing on > > the behavior of "fail_at_unmount" which was not well explained in V1 > > > > V3: > > - Fix English spelling mistakes suggested by Eric > > Please put the patch version changelog after the "---" so it doesn't become > part of the permanent commit log; it's for current patch reviewers, not for > future code archaeologists. Thanks, I'll make sure to do it with next patches too > > + > > +The filesystem behavior during an error can be set via sysfs files, where the > > +errors are organized with the structure below. Each configuration option works > > +independently, the first condition met for a specific configuration will cause > > +the filesystem to shut down: > > + > > + /sys/fs/xfs/<dev>/error/<class>/<error>/ > > The above line kind of hangs there oddly, because the first thing you do below > is describe a file which isn't in the above hierarchy. > > Maybe we should show something like: > > + /sys/fs/xfs/<dev>/error/fail_at_unmount > + /sys/fs/xfs/<dev>/error/<class>/<error>/<configuration> > > to show everything that might be under it? Not sure if that's better. > > > + > > +Each directory contains: > > + > > + /sys/fs/xfs/<dev>/error/ > > + > > + fail_at_unmount (Min: 0 Default: 1 Max: 1) > > + Defines the global error behavior at unmount time. If set to the > > + default value of 1, XFS will cancel any pending IO retries, shut > > + down, and unmount. If set to 0, pending IO retries may prevent > > + the filesystem from unmounting. > > + > > + <class> subdirectories > > + Contains specific error handlers configuration > > + (Ex: /sys/fs/xfs/<dev>/error/metadata, see below). > > + > > + /sys/fs/xfs/<dev>/error/<class>/ > > + > > + Directory containing configuration for a specific error <class>; > > + currently only the "metadata" <class> is implemented. > > + The contents of this directory are <class> specific, since each <class> > > + might need to handle different types of errors. > > + > > + /sys/fs/xfs/<dev>/error/<class>/<error>/ > > + > > The default for ENODEV is different ... tricky to document that. Good luck. ;) > > The maximum for retry_timeout_seconds is 86400 (1 day), not INTMAX: > > retry_timeout_seconds_store() > { > ... > /* 1 day timeout maximum */ > if (val < 0 || val > 86400) > return -EINVAL; > ... Fixing it, thanks for catching it, copy/paste sux :) > } > > The default of -1 vs. 0 might change with the other patch I sent, but we can > fix this up if it's accepted. > Ok. Thanks for the review, I'll submit a new version in a few
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt index 8146e9f..8b6c861 100644 --- a/Documentation/filesystems/xfs.txt +++ b/Documentation/filesystems/xfs.txt @@ -348,3 +348,73 @@ Removed Sysctls ---- ------- fs.xfs.xfsbufd_centisec v4.0 fs.xfs.age_buffer_centisecs v4.0 + +Error handling +============== + +XFS can act differently according to the type of error found +during its operation. The implementation introduces the following +concepts to the error handler: + + -failure speed: + Defines how fast XFS should shut down when of a specific error is found + during the filesystem operation. It can shut down immediately, after a + defined number of retries, after a set time period, or simply retry + forever. The old "retry forever" behavior is still the default, except + during unmount, where any IOs retrying due to errors will be cancelled + and unmount will be allowed to proceed. + + -error classes: + Specifies the subsystem/location where the error handlers, such as + metadata or memory allocation. Only metadata IO errors are handled + at this time. + + -error handlers: + Defines the behavior for a specific error. + +The filesystem behavior during an error can be set via sysfs files, where the +errors are organized with the structure below. Each configuration option works +independently, the first condition met for a specific configuration will cause +the filesystem to shut down: + + /sys/fs/xfs/<dev>/error/<class>/<error>/ + +Each directory contains: + + /sys/fs/xfs/<dev>/error/ + + fail_at_unmount (Min: 0 Default: 1 Max: 1) + Defines the global error behavior at unmount time. If set to the + default value of 1, XFS will cancel any pending IO retries, shut + down, and unmount. If set to 0, pending IO retries may prevent + the filesystem from unmounting. + + <class> subdirectories + Contains specific error handlers configuration + (Ex: /sys/fs/xfs/<dev>/error/metadata, see below). + + /sys/fs/xfs/<dev>/error/<class>/ + + Directory containing configuration for a specific error <class>; + currently only the "metadata" <class> is implemented. + The contents of this directory are <class> specific, since each <class> + might need to handle different types of errors. + + /sys/fs/xfs/<dev>/error/<class>/<error>/ + + Contains the failure speed configuration files for specific errors in + this <class, as well as a "default" behavior. Each <error> directory + contains the following configuration files: + + max_retries (Min: -1 Default: -1 Max: INTMAX) + Defines the allowed number of retries of a specific error before + the filesystem will shut down. The default value of "-1" will + cause XFS to retry forever for this specific error. Setting it + to "0" will cause XFS to fail immediately when the specific + error is found, and setting it to "N," where N is greater than 0, + will make XFS retry "N" times before shutting down. + + retry_timeout_seconds (Min: 0 Default: 0 Max: INTMAX) + Define the amount of time (in seconds) that the filesystem is + allowed to retry its operations when the specific error is + found. The default value of "0" will cause XFS to retry forever.
Document the implementation of error handlers into sysfs. Changelog: V2: - Add a description of the precedence order of each option, focusing on the behavior of "fail_at_unmount" which was not well explained in V1 V3: - Fix English spelling mistakes suggested by Eric Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> --- Documentation/filesystems/xfs.txt | 70 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+)