
xfs: Document error handling behavior

Message ID 1468922657-3895-1-git-send-email-cmaiolino@redhat.com (mailing list archive)
State Superseded, archived

Commit Message

Carlos Maiolino July 19, 2016, 10:04 a.m. UTC
This is a first attempt at documenting the implementation of error handlers
in sysfs.

Reviews and comments are appreciated. Please also note that I'm not a native
English speaker, so spelling corrections are welcome too :)

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
---
 Documentation/filesystems/xfs.txt | 78 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

Comments

Eric Sandeen July 19, 2016, 9:15 p.m. UTC | #1
On 7/19/16 3:04 AM, Carlos Maiolino wrote:
> This is a first attempt at documenting the implementation of error handlers
> in sysfs.
> 
> Reviews and comments are appreciated. Please also note that I'm not a native
> English speaker, so spelling corrections are welcome too :)

Thanks for doing this! 

There seems to be a specific sysfs documentation format, see for example
Documentation/ABI/testing/sysfs-fs-ext4 

It might be better to follow that format, and refer to it after a brief
explanation of the functionality in the xfs.txt file?

-Eric
Eric Sandeen July 20, 2016, 6:18 a.m. UTC | #2
On 7/19/16 2:15 PM, Eric Sandeen wrote:
> On 7/19/16 3:04 AM, Carlos Maiolino wrote:
>> This is a first attempt at documenting the implementation of error handlers
>> in sysfs.
>>
>> Reviews and comments are appreciated. Please also note that I'm not a native
>> English speaker, so spelling corrections are welcome too :)
> 
> Thanks for doing this! 
> 
> There seems to be a specific sysfs documentation format, see for example
> Documentation/ABI/testing/sysfs-fs-ext4 
> 
> It might be better to follow that format, and refer to it after a brief
> explanation of the functionality in the xfs.txt file?

Or not; Dave doesn't like this location, so perhaps best not to take
my suggestion.  ;)

-Eric
Carlos Maiolino July 20, 2016, 9:04 a.m. UTC | #3
On Tue, Jul 19, 2016 at 11:18:01PM -0700, Eric Sandeen wrote:
> 
> 
> On 7/19/16 2:15 PM, Eric Sandeen wrote:
> > On 7/19/16 3:04 AM, Carlos Maiolino wrote:
> >> This is a first attempt at documenting the implementation of error handlers
> >> in sysfs.
> >>
> >> Reviews and comments are appreciated. Please also note that I'm not a native
> >> English speaker, so spelling corrections are welcome too :)
> > 
> > Thanks for doing this! 
> > 
> > There seems to be a specific sysfs documentation format, see for example
> > Documentation/ABI/testing/sysfs-fs-ext4 
> > 
> > It might be better to follow that format, and refer to it after a brief
> > explanation of the functionality in the xfs.txt file?
> 
> Or not; Dave doesn't like this location, so perhaps best not to take
> my suggestion.  ;)

Oh, I can see now why he doesn't like it. I had never seen that directory until
you mentioned it. Why should it be so hidden, and why should we split filesystem
information across different locations?

IMHO, anyone looking for filesystem documentation goes straight to
Documentation/filesystems. I honestly think splitting the information across
two different directories is wrong, and even if you point to the other one from
somewhere else, it is still bad; it reads like an RPG book... Start here...
now go to page X... now go to page Y... now go to page Z.

I can reformat the documentation to match the sysfs-fs-ext4 format, but I
still believe keeping it under Documentation/filesystems is the best thing to
do. To be honest, I actually think we should create an XFS directory under it
and put everything XFS-related there.

Cheers
> 
> -Eric
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
Jan Tulak July 20, 2016, 2 p.m. UTC | #4
On Wed, Jul 20, 2016 at 11:04 AM, Carlos Maiolino <cmaiolino@redhat.com> wrote:
>
> IMHO, anyone looking for filesystem documentation goes straight to
> Documentation/filesystems. I honestly think splitting the information across
> two different directories is wrong, and even if you point to the other one from
> somewhere else, it is still bad; it reads like an RPG book... Start here...
> now go to page X... now go to page Y... now go to page Z.
>

Sorry for going off-topic, but I would almost bet that I once saw a game
made out of man pages. Yet Google can't find anything, so maybe it is a
kind of deja vu, or a common experience... :(

Jan
Dave Chinner July 20, 2016, 10:25 p.m. UTC | #5
On Wed, Jul 20, 2016 at 11:04:06AM +0200, Carlos Maiolino wrote:
> On Tue, Jul 19, 2016 at 11:18:01PM -0700, Eric Sandeen wrote:
> > 
> > 
> > On 7/19/16 2:15 PM, Eric Sandeen wrote:
> > > On 7/19/16 3:04 AM, Carlos Maiolino wrote:
> > >> This is a first attempt at documenting the implementation of error handlers
> > >> in sysfs.
> > >>
> > >> Reviews and comments are appreciated. Please also note that I'm not a native
> > >> English speaker, so spelling corrections are welcome too :)
> > > 
> > > Thanks for doing this! 
> > > 
> > > There seems to be a specific sysfs documentation format, see for example
> > > Documentation/ABI/testing/sysfs-fs-ext4 
> > > 
> > > It might be better to follow that format, and refer to it after a brief
> > > explanation of the functionality in the xfs.txt file?
> > 
> > Or not; Dave doesn't like this location, so perhaps best not to take
> > my suggestion.  ;)
> 
> Oh, I can see now why he doesn't like it. I had never seen that directory until
> you mentioned it. Why should it be so hidden, and why should we split filesystem
> information across different locations?
> 
> IMHO, anyone looking for filesystem documentation goes straight to
> Documentation/filesystems. I honestly think splitting the information across
> two different directories is wrong, and even if you point to the other one from
> somewhere else, it is still bad; it reads like an RPG book... Start here...
> now go to page X... now go to page Y... now go to page Z.
> 
> I can reformat the documentation to match the sysfs-fs-ext4 format, but I
> still believe keeping it under Documentation/filesystems is the best thing to
> do. To be honest, I actually think we should create an XFS directory under it
> and put everything XFS-related there.

I'd just add it to Doc/fs/xfs.txt right now, and we can work out
restructuring details later. Especially as we really need this documentation
added to the xfs-documentation repo (along with a "how to use"
guide). It's a similar situation to the libxfs code shared between
kernel and userspace, I think...

Cheers,

Dave.
Zorro Lang July 22, 2016, 4:09 a.m. UTC | #6
On Tue, Jul 19, 2016 at 12:04:17PM +0200, Carlos Maiolino wrote:
> This is a first attempt at documenting the implementation of error handlers
> in sysfs.
> 
> Reviews and comments are appreciated. Please also note that I'm not a native
> English speaker, so spelling corrections are welcome too :)
> 
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
>  Documentation/filesystems/xfs.txt | 78 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 78 insertions(+)
> 
> diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
> index 8146e9f..1df868a 100644
> --- a/Documentation/filesystems/xfs.txt
> +++ b/Documentation/filesystems/xfs.txt
> @@ -348,3 +348,81 @@ Removed Sysctls
>    ----				-------
>    fs.xfs.xfsbufd_centisec	v4.0
>    fs.xfs.age_buffer_centisecs	v4.0
> +
> +Error handling
> +==============
> +
> +XFS can act differently according to the type of error found
> +during its operation. The implementation introduces the following
> +concepts to the error handler:
> +
> + -failure speed:
> +	Defines how fast XFS should shut down when a specific error
> +	is found during filesystem operation. It can shut down
> +	immediately, after a defined number of retries, or simply
> +	retry forever, which was the old behavior and is now the
> +	default, except at unmount time: if an error is found
> +	while unmounting, the filesystem will shut down by default
> +	(see "fail_at_unmount" below).
> +
> + -error classes:
> +	Specifies the subsystem or location whose error behavior is
> +	configured, such as metadata or memory allocation.
> +
> + -error handlers:
> +	Defines the behavior for a specific error.
> +
> +The filesystem behavior on error can be set via sysfs files, where the
> +errors are organized in the following structure:
> +
> +  /sys/fs/xfs/<dev>/error/<class>/<error>/
> +
> +Each directory contains:
> +
> + /sys/fs/xfs/<dev>/error/
> +
> +	fail_at_unmount		(Min:  0  Default:  1  Max: 1)
> +		Defines the global error behavior at unmount time. If set to
> +		"1", XFS will shut down as soon as any error is found while
> +		unmounting; if set to "0", the filesystem will keep retrying
> +		indefinitely to cleanly unmount the filesystem.

Hi Carlos,

Could you explain more about the relationship between fail_at_unmount and
max_retries (and retry_timeout_seconds)? For example, if I set fail_at_unmount=0
and EIO/max_retries=1, what is the expected behavior?

I'd like to write test cases for this error handling, based on
your document.

Thanks,
Zorro

Carlos Maiolino July 22, 2016, 8:58 a.m. UTC | #7
On Fri, Jul 22, 2016 at 12:09:55PM +0800, Zorro Lang wrote:
> On Tue, Jul 19, 2016 at 12:04:17PM +0200, Carlos Maiolino wrote:
> > This is a first attempt at documenting the implementation of error handlers
> > in sysfs.
> > 
> > Reviews and comments are appreciated. Please also note that I'm not a native
> > English speaker, so spelling corrections are welcome too :)
> > 
> > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> > ---
> >  Documentation/filesystems/xfs.txt | 78 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 78 insertions(+)
> > 
> > diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
> > index 8146e9f..1df868a 100644
> > --- a/Documentation/filesystems/xfs.txt
> > +++ b/Documentation/filesystems/xfs.txt
> > @@ -348,3 +348,81 @@ Removed Sysctls
> >    ----				-------
> >    fs.xfs.xfsbufd_centisec	v4.0
> >    fs.xfs.age_buffer_centisecs	v4.0
> > +
> > +Error handling
> > +==============
> > +
> > +XFS can act differently according to the type of error found
> > +during its operation. The implementation introduces the following
> > +concepts to the error handler:
> > +
> > + -failure speed:
> > +	Defines how fast XFS should shut down when a specific error
> > +	is found during filesystem operation. It can shut down
> > +	immediately, after a defined number of retries, or simply
> > +	retry forever, which was the old behavior and is now the
> > +	default, except at unmount time: if an error is found
> > +	while unmounting, the filesystem will shut down by default
> > +	(see "fail_at_unmount" below).
> > +
> > + -error classes:
> > +	Specifies the subsystem or location whose error behavior is
> > +	configured, such as metadata or memory allocation.
> > +
> > + -error handlers:
> > +	Defines the behavior for a specific error.
> > +
> > +The filesystem behavior on error can be set via sysfs files, where the
> > +errors are organized in the following structure:
> > +
> > +  /sys/fs/xfs/<dev>/error/<class>/<error>/
> > +
> > +Each directory contains:
> > +
> > + /sys/fs/xfs/<dev>/error/
> > +
> > +	fail_at_unmount		(Min:  0  Default:  1  Max: 1)
> > +		Defines the global error behavior at unmount time. If set to
> > +		"1", XFS will shut down as soon as any error is found while
> > +		unmounting; if set to "0", the filesystem will keep retrying
> > +		indefinitely to cleanly unmount the filesystem.
> 
> Hi Carlos,
> 
> Could you explain more about the relationship between fail_at_unmount and
> max_retries (and retry_timeout_seconds)? For example, if I set fail_at_unmount=0
> and EIO/max_retries=1, what is the expected behavior?
> 

They are independent options. If max_retries is set to 1, XFS will fail
after the first retry, as expected, even during unmount and even if
fail_at_unmount = 0.

The problem, and the reason we added fail_at_unmount, is that you can't
change any configuration after umount is issued, because the sysfs
directory for the device being unmounted is detached from sysfs. So if the
sysadmin wants XFS to retry forever on any error during normal filesystem
operation, setting fail_at_unmount still lets them unmount the filesystem
"properly" (in quotes, since if the FS hit errors the unmount might not be
clean); otherwise, the umount process might get stuck forever.


Carlos Eduardo Maiolino Aug. 8, 2016, 10:57 a.m. UTC | #8
Hi folks,

Is there any update on this? I didn't see any comments about whether I need to change
something in this patch to get the documentation applied, or perhaps I missed an e-mail?



Cheers

----- Original Message ----- 
From: "Carlos Maiolino" <cmaiolino@redhat.com> 
To: "Zorro Lang" <zlang@redhat.com> 
Cc: xfs@oss.sgi.com 
Sent: Friday, July 22, 2016 10:58:04 AM 
Subject: Re: [PATCH] xfs: Document error handling behavior 

On Fri, Jul 22, 2016 at 12:09:55PM +0800, Zorro Lang wrote: 
> On Tue, Jul 19, 2016 at 12:04:17PM +0200, Carlos Maiolino wrote: 
> > This is a first attempt at documenting the implementation of error handlers
> > in sysfs.
> > 
> > Reviews and comments are appreciated. Please also note that I'm not a native
> > English speaker, so spelling corrections are welcome too :)
> > 
> > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> 
> > ---
Dave Chinner Aug. 8, 2016, 10:40 p.m. UTC | #9
On Mon, Aug 08, 2016 at 06:57:15AM -0400, Carlos Eduardo Maiolino wrote:
> Hi folks,
> 
> Is there any update on this? I didn't see any comments about whether I need to change
> something in this patch to get the documentation applied, or perhaps I missed an e-mail?

I've been waiting for a v2.

i.e. If you have to explain how fail at unmount works (or doesn't,
in this case) during review, then that clearly needs to be added to
the documentation.

Cheers,

Dave.
Carlos Maiolino Aug. 9, 2016, 8:11 a.m. UTC | #10
On Tue, Aug 09, 2016 at 08:40:11AM +1000, Dave Chinner wrote:
> On Mon, Aug 08, 2016 at 06:57:15AM -0400, Carlos Eduardo Maiolino wrote:
> > Hi folks,
> > 
> > Is there any update on this? I didn't see any comments about whether I need to change
> > something in this patch to get the documentation applied, or perhaps I missed an e-mail?
> 
> I've been waiting for a v2.
> 
> i.e. If you have to explain how fail at unmount works (or doesn't,
> in this case) during review, then that clearly needs to be added to
> the documentation.

Well, can't argue with that, I'll rework it and send a V2 :)


> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Patch

diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 8146e9f..1df868a 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -348,3 +348,81 @@  Removed Sysctls
   ----				-------
   fs.xfs.xfsbufd_centisec	v4.0
   fs.xfs.age_buffer_centisecs	v4.0
+
+Error handling
+==============
+
+XFS can act differently according to the type of error found
+during its operation. The implementation introduces the following
+concepts to the error handler:
+
+ -failure speed:
+	Defines how fast XFS should shut down when a specific error
+	is found during filesystem operation. It can shut down
+	immediately, after a defined number of retries, or simply
+	retry forever, which was the old behavior and is now the
+	default, except at unmount time: if an error is found
+	while unmounting, the filesystem will shut down by default
+	(see "fail_at_unmount" below).
+
+ -error classes:
+	Specifies the subsystem or location whose error behavior is
+	configured, such as metadata or memory allocation.
+
+ -error handlers:
+	Defines the behavior for a specific error.
+
+The filesystem behavior on error can be set via sysfs files, where the
+errors are organized in the following structure:
+
+  /sys/fs/xfs/<dev>/error/<class>/<error>/
+
+Each directory contains:
+
+ /sys/fs/xfs/<dev>/error/
+
+	fail_at_unmount		(Min:  0  Default:  1  Max: 1)
+		Defines the global error behavior at unmount time. If set to
+		"1", XFS will shut down as soon as any error is found while
+		unmounting; if set to "0", the filesystem will keep retrying
+		indefinitely to cleanly unmount the filesystem.
+
+	<class> subdirectories
+		Contain the error handler configuration for each class
+		(e.g. /sys/fs/xfs/<dev>/error/metadata).
+
+ /sys/fs/xfs/<dev>/error/<class>/
+
+	The contents of this directory are <class>-specific, since each
+	<class> might need to handle different types of errors. Every <class>
+	directory, though, contains a "default" directory, which holds the
+	shared configuration for errors that cannot be configured individually.
+
+ /sys/fs/xfs/<dev>/error/<class>/<error>
+
+	Contains the failure speed configuration files for a specific error.
+	The "default" directory contains the same configuration options as
+	the directories for specific errors.
+
+	The available configurations for each error type are:
+
+	max_retries			(Min: -1  Default: -1  Max: INTMAX)
+		Defines how many times the filesystem is allowed to retry an
+		operation that failed with this specific error before shutting
+		down the filesystem. Setting this file to "-1" makes XFS retry
+		forever for this error; setting it to "0" makes XFS fail
+		immediately when this error is found; setting it to a value N,
+		where N is greater than 0, makes XFS retry N times before
+		shutting down.
+
+	retry_timeout_seconds		(Min:  0  Default:  0  Max: INTMAX)
+		Defines the amount of time (in seconds) that the filesystem is
+		allowed to retry its operations when this specific error is
+		found. "0" means no time limit is applied.
+
+
+	"max_retries" takes precedence over "retry_timeout_seconds":
+	"retry_timeout_seconds" is only checked if the "max_retries" limit
+	has not been reached yet or is set to retry forever ("-1"). If the
+	"max_retries" limit is reached, the filesystem will shut down whether
+	or not "retry_timeout_seconds" has been reached.
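As a rough illustration of the sysfs layout described in the patch, the following C sketch builds the path of a per-error knob and writes a value to it, which is roughly what an admin tool or a test case would do. The helper names xfs_err_knob_path() and write_knob() are hypothetical, as are the device name "sda1" and the "EIO" error directory used in the example (EIO is the one mentioned in the review); only the /sys/fs/xfs/<dev>/error/<class>/<error>/ layout and the knob names come from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reject empty path components and ones containing '/'. */
static int bad_component(const char *s)
{
    return s == NULL || *s == '\0' || strchr(s, '/') != NULL;
}

/*
 * Hypothetical helper: build
 *   /sys/fs/xfs/<dev>/error/<cls>/<err>/<knob>
 * into buf. Returns 0 on success, -1 on bad input or truncation.
 */
int xfs_err_knob_path(char *buf, size_t len, const char *dev,
                      const char *cls, const char *err, const char *knob)
{
    assert(buf != NULL && len > 0);
    if (bad_component(dev) || bad_component(cls) ||
        bad_component(err) || bad_component(knob))
        return -1;

    int n = snprintf(buf, len, "/sys/fs/xfs/%s/error/%s/%s/%s",
                     dev, cls, err, knob);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: write a decimal value to a knob file. */
int write_knob(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%d\n", value) > 0;
    /* stdio buffers the write, so an error from the kernel may only
     * show up when fclose() flushes the buffer; check it too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, xfs_err_knob_path(buf, sizeof buf, "sda1", "metadata", "EIO", "max_retries") yields "/sys/fs/xfs/sda1/error/metadata/EIO/max_retries", and write_knob(buf, 5) would then ask XFS to retry that error five times (writing to sysfs normally requires root). fail_at_unmount sits directly under error/, so it would be written with a plain path rather than this helper.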