diff mbox series

[v8,30/32] ext4: Send notifications on error

Message ID 20211019000015.1666608-31-krisman@collabora.com (mailing list archive)
State New, archived
Headers show
Series file system-wide error monitoring | expand

Commit Message

Gabriel Krisman Bertazi Oct. 19, 2021, midnight UTC
Send a FS_ERROR message via fsnotify to a userspace monitoring tool
whenever a ext4 error condition is triggered.  This follows the existing
error conditions in ext4, so it is hooked to the ext4_error* functions.

It also follows the current dmesg reporting in the format.  The
filesystem message is composed mostly by the string that would be
otherwise printed in dmesg.

A new ext4 specific record format is exposed in the uapi, such that a
monitoring tool knows what to expect when listening errors of an ext4
filesystem.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v6:
  - Report ext4_std_errors agains superblock (jan)
---
 fs/ext4/super.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Jan Kara Oct. 19, 2021, 3:44 p.m. UTC | #1
On Mon 18-10-21 21:00:13, Gabriel Krisman Bertazi wrote:
> Send a FS_ERROR message via fsnotify to a userspace monitoring tool
> whenever a ext4 error condition is triggered.  This follows the existing
> error conditions in ext4, so it is hooked to the ext4_error* functions.
> 
> It also follows the current dmesg reporting in the format.  The
> filesystem message is composed mostly by the string that would be
> otherwise printed in dmesg.
> 
> A new ext4 specific record format is exposed in the uapi, such that a
> monitoring tool knows what to expect when listening errors of an ext4
> filesystem.
> 
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> Reviewed-by: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> 
> ---
> Changes since v6:
>   - Report ext4_std_errors agains superblock (jan)
> ---
>  fs/ext4/super.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 88d5d274a868..67183e6b1920 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -46,6 +46,7 @@
>  #include <linux/part_stat.h>
>  #include <linux/kthread.h>
>  #include <linux/freezer.h>
> +#include <linux/fsnotify.h>
>  
>  #include "ext4.h"
>  #include "ext4_extents.h"	/* Needed for trace points definition */
> @@ -759,6 +760,8 @@ void __ext4_error(struct super_block *sb, const char *function,
>  		       sb->s_id, function, line, current->comm, &vaf);
>  		va_end(args);
>  	}
> +	fsnotify_sb_error(sb, NULL, error);
> +
>  	ext4_handle_error(sb, force_ro, error, 0, block, function, line);
>  }
>  
> @@ -789,6 +792,8 @@ void __ext4_error_inode(struct inode *inode, const char *function,
>  			       current->comm, &vaf);
>  		va_end(args);
>  	}
> +	fsnotify_sb_error(inode->i_sb, inode, error);
> +
>  	ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
>  			  function, line);
>  }
> @@ -827,6 +832,8 @@ void __ext4_error_file(struct file *file, const char *function,
>  			       current->comm, path, &vaf);
>  		va_end(args);
>  	}
> +	fsnotify_sb_error(inode->i_sb, inode, EFSCORRUPTED);
> +
>  	ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
>  			  function, line);
>  }
> @@ -894,6 +901,7 @@ void __ext4_std_error(struct super_block *sb, const char *function,
>  		printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
>  		       sb->s_id, function, line, errstr);
>  	}
> +	fsnotify_sb_error(sb, NULL, errno);
>  
>  	ext4_handle_error(sb, false, -errno, 0, 0, function, line);
>  }
> -- 
> 2.33.0
>
Jan Kara Oct. 19, 2021, 4:01 p.m. UTC | #2
On Tue 19-10-21 17:44:26, Jan Kara wrote:
> On Mon 18-10-21 21:00:13, Gabriel Krisman Bertazi wrote:
> > Send a FS_ERROR message via fsnotify to a userspace monitoring tool
> > whenever a ext4 error condition is triggered.  This follows the existing
> > error conditions in ext4, so it is hooked to the ext4_error* functions.
> > 
> > It also follows the current dmesg reporting in the format.  The
> > filesystem message is composed mostly by the string that would be
> > otherwise printed in dmesg.
> > 
> > A new ext4 specific record format is exposed in the uapi, such that a
> > monitoring tool knows what to expect when listening errors of an ext4
> > filesystem.
> > 
> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> > Reviewed-by: Theodore Ts'o <tytso@mit.edu>
> > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> 
> Looks good to me. Feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

Hum, I actually retract this because the code doesn't match what is written
in the documentation and I'm not 100% sure what is correct. In particular:

> > @@ -759,6 +760,8 @@ void __ext4_error(struct super_block *sb, const char *function,
> >  		       sb->s_id, function, line, current->comm, &vaf);
> >  		va_end(args);
> >  	}
> > +	fsnotify_sb_error(sb, NULL, error);
> > +

E.g. here you pass the 'error' to fsnotify. This will be just standard
'errno' number, not ext4 error code as described in the documentation. Also
note that frequently 'error' will be 0 which gets magically transformed to
EFSCORRUPTED in save_error_info() in the ext4 error handling below. So
there's clearly some more work to do...

								Honza
Gabriel Krisman Bertazi Oct. 19, 2021, 4:54 p.m. UTC | #3
Jan Kara <jack@suse.cz> writes:

> On Tue 19-10-21 17:44:26, Jan Kara wrote:
>> On Mon 18-10-21 21:00:13, Gabriel Krisman Bertazi wrote:
>> > Send a FS_ERROR message via fsnotify to a userspace monitoring tool
>> > whenever a ext4 error condition is triggered.  This follows the existing
>> > error conditions in ext4, so it is hooked to the ext4_error* functions.
>> > 
>> > It also follows the current dmesg reporting in the format.  The
>> > filesystem message is composed mostly by the string that would be
>> > otherwise printed in dmesg.
>> > 
>> > A new ext4 specific record format is exposed in the uapi, such that a
>> > monitoring tool knows what to expect when listening errors of an ext4
>> > filesystem.
>> > 
>> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
>> > Reviewed-by: Theodore Ts'o <tytso@mit.edu>
>> > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>> 
>> Looks good to me. Feel free to add:
>> 
>> Reviewed-by: Jan Kara <jack@suse.cz>
>
> Hum, I actually retract this because the code doesn't match what is written
> in the documentation and I'm not 100% sure what is correct. In particular:
>
>> > @@ -759,6 +760,8 @@ void __ext4_error(struct super_block *sb, const char *function,
>> >  		       sb->s_id, function, line, current->comm, &vaf);
>> >  		va_end(args);
>> >  	}
>> > +	fsnotify_sb_error(sb, NULL, error);
>> > +
>
> E.g. here you pass the 'error' to fsnotify. This will be just standard
> 'errno' number, not ext4 error code as described in the documentation. Also
> note that frequently 'error' will be 0 which gets magically transformed to
> EFSCORRUPTED in save_error_info() in the ext4 error handling below. So
> there's clearly some more work to do...

Nice catch.

The many 0 returns were discussed before, around v3.  You can notice one
of my LTP tests is designed to catch that.  We agreed ext4 shouldn't be
returning 0, and that we would write a patch to fix it, but I didn't
think it belonged as part of this series.

You are also right about the EXT4_ vs. errno.  the documentation is
buggy, since it was brought from the fs-specific descriptor days, which
no longer exists.  Nevertheless, I think there is a case for always
returning file system specific errors here, since they are more
descriptive.

Should we agree to follow the documentation and return FS specific
errors instead of errno, then?

Either way, I'm dropping all r-by flags here.
Theodore Ts'o Oct. 20, 2021, 3:11 a.m. UTC | #4
On Tue, Oct 19, 2021 at 01:54:59PM -0300, Gabriel Krisman Bertazi wrote:
> >
> > E.g. here you pass the 'error' to fsnotify. This will be just standard
> > 'errno' number, not ext4 error code as described in the documentation. Also
> > note that frequently 'error' will be 0 which gets magically transformed to
> > EFSCORRUPTED in save_error_info() in the ext4 error handling below. So
> > there's clearly some more work to do...
> 
> The many 0 returns were discussed before, around v3.  You can notice one
> of my LTP tests is designed to catch that.  We agreed ext4 shouldn't be
> returning 0, and that we would write a patch to fix it, but I didn't
> think it belonged as part of this series.

The fact that ext4 passes 0 into __ext4_error() to mean EFSCORRUPTED
is an internal implementation detail, and as currently implemented it
is *not* a bug.  It was just a convenience to minimize the number of
call sites that needed to be modified when we added the feature of
storing the error code to be stored in the superblock.

So I think this is something that should be addressed in this
patchset, and it's pretty simple to do so.  It's just a matter of
doing something like this:

      fsnotify_sb_error(sb, NULL, error ? error : EFSCORRUPTED);


> You are also right about the EXT4_ vs. errno.  the documentation is
> buggy, since it was brought from the fs-specific descriptor days, which
> no longer exists.  Nevertheless, I think there is a case for always
> returning file system specific errors here, since they are more
> descriptive.

So the history is that ext4 specific errors were used because we were
storing them in the superblock --- and so we need an architecture
independent way of storing the error codes.  (Errno codes are not
stable across architectures; and consider what might happen if we had
error codes written on an say, an ARM platform, and then that disk is
attached to an Alpha, S390, or Power system?)

> Should we agree to follow the documentation and return FS specific
> errors instead of errno, then?

I disagree.  We should use errno's, for a couple of reasons.  First of
all, users of fsnotify shouldn't need to know which file system to
interpret the error codes.

Secondly, the reason why ext4 has file system specific error cdoes is
because those codes are written into the superblock, and errno's are
not stable across different architectures.  So for ext4, we needed to
worry what might happen if the error code was written while the file
system was mounted on say, an ARM-64 system, and then storage device
might get attached to a S390, Alpha, or PA-RISC system.  This is not a
problem that the fsnotify API needs to worry about.

Finally, the error codes that we used for the ext4 superblock are
*not* more descriptive than errno's --- we only have 16 ext4-specific
error codes, and there are far more errno values.

Cheers,

					- Ted
diff mbox series

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 88d5d274a868..67183e6b1920 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -46,6 +46,7 @@ 
 #include <linux/part_stat.h>
 #include <linux/kthread.h>
 #include <linux/freezer.h>
+#include <linux/fsnotify.h>
 
 #include "ext4.h"
 #include "ext4_extents.h"	/* Needed for trace points definition */
@@ -759,6 +760,8 @@  void __ext4_error(struct super_block *sb, const char *function,
 		       sb->s_id, function, line, current->comm, &vaf);
 		va_end(args);
 	}
+	fsnotify_sb_error(sb, NULL, error);
+
 	ext4_handle_error(sb, force_ro, error, 0, block, function, line);
 }
 
@@ -789,6 +792,8 @@  void __ext4_error_inode(struct inode *inode, const char *function,
 			       current->comm, &vaf);
 		va_end(args);
 	}
+	fsnotify_sb_error(inode->i_sb, inode, error);
+
 	ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block,
 			  function, line);
 }
@@ -827,6 +832,8 @@  void __ext4_error_file(struct file *file, const char *function,
 			       current->comm, path, &vaf);
 		va_end(args);
 	}
+	fsnotify_sb_error(inode->i_sb, inode, EFSCORRUPTED);
+
 	ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block,
 			  function, line);
 }
@@ -894,6 +901,7 @@  void __ext4_std_error(struct super_block *sb, const char *function,
 		printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n",
 		       sb->s_id, function, line, errstr);
 	}
+	fsnotify_sb_error(sb, NULL, errno);
 
 	ext4_handle_error(sb, false, -errno, 0, 0, function, line);
 }