mbox series

[v9,00/31] file system-wide error monitoring

Message ID 20211025192746.66445-1-krisman@collabora.com (mailing list archive)
Headers show
Series file system-wide error monitoring | expand

Message

Gabriel Krisman Bertazi Oct. 25, 2021, 7:27 p.m. UTC
Hi,

This is the 9th version of this patch series.  Thank you, Amir, Jan and
Ted, for the feedback in the previous versions.

The main difference in this version is that the pool is no longer
resizeable nor limited in number of marks, even though we only
pre-allocate 32 slots.  In addition, ext4 was modified to always return
non-zero errno, and the documentation was fixed accordingly (No longer
suggests we return EXT4_ERR* values.

I also droped the Reviewed-by tags from the ext4 patch, due to the
changes above.

Please let me know what you think.

This was tested with LTP for regressions and also using the sample code
on the last patch, with a corrupted image.  I wrote a new ltp test for
this feature which is being reviewed and is available at:

  https://gitlab.collabora.com/krisman/ltp  -b fan-fs-error

In addition, I wrote a man-page that can be pulled from:

  https://gitlab.collabora.com/krisman/man-pages.git -b fan-fs-error

And is being reviewed at the list.

I also pushed this full series to:

  https://gitlab.collabora.com/krisman/linux -b fanotify-notifications-v8

Thank you

Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dave Chinner <david@fromorbit.com>
Cc: jack@suse.com
To: amir73il@gmail.com
Cc: dhowells@redhat.com
Cc: khazhy@google.com
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-ext4@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: linux-api@vger.kernel.org

Amir Goldstein (3):
  fsnotify: pass data_type to fsnotify_name()
  fsnotify: pass dentry instead of inode data
  fsnotify: clarify contract for create event hooks

Gabriel Krisman Bertazi (28):
  fsnotify: Don't insert unmergeable events in hashtable
  fanotify: Fold event size calculation to its own function
  fanotify: Split fsid check from other fid mode checks
  inotify: Don't force FS_IN_IGNORED
  fsnotify: Add helper to detect overflow_event
  fsnotify: Add wrapper around fsnotify_add_event
  fsnotify: Retrieve super block from the data field
  fsnotify: Protect fsnotify_handle_inode_event from no-inode events
  fsnotify: Pass group argument to free_event
  fanotify: Support null inode event in fanotify_dfid_inode
  fanotify: Allow file handle encoding for unhashed events
  fanotify: Encode empty file handle when no inode is provided
  fanotify: Require fid_mode for any non-fd event
  fsnotify: Support FS_ERROR event type
  fanotify: Reserve UAPI bits for FAN_FS_ERROR
  fanotify: Pre-allocate pool of error events
  fanotify: Support enqueueing of error events
  fanotify: Support merging of error events
  fanotify: Wrap object_fh inline space in a creator macro
  fanotify: Add helpers to decide whether to report FID/DFID
  fanotify: Report fid entry even for zero-length file_handle
  fanotify: WARN_ON against too large file handles
  fanotify: Report fid info for file related file system errors
  fanotify: Emit generic error info for error event
  fanotify: Allow users to request FAN_FS_ERROR events
  ext4: Send notifications on error
  samples: Add fs error monitoring example
  docs: Document the FAN_FS_ERROR event

 .../admin-guide/filesystem-monitoring.rst     |  74 +++++++++
 Documentation/admin-guide/index.rst           |   1 +
 fs/ext4/super.c                               |   8 +
 fs/nfsd/filecache.c                           |   3 +
 fs/notify/fanotify/fanotify.c                 | 117 +++++++++++--
 fs/notify/fanotify/fanotify.h                 |  54 +++++-
 fs/notify/fanotify/fanotify_user.c            | 156 +++++++++++++-----
 fs/notify/fsnotify.c                          |  10 +-
 fs/notify/group.c                             |   2 +-
 fs/notify/inotify/inotify_fsnotify.c          |   5 +-
 fs/notify/inotify/inotify_user.c              |   6 +-
 fs/notify/notification.c                      |  14 +-
 include/linux/fanotify.h                      |   9 +-
 include/linux/fsnotify.h                      |  58 +++++--
 include/linux/fsnotify_backend.h              |  96 ++++++++++-
 include/uapi/linux/fanotify.h                 |   8 +
 kernel/audit_fsnotify.c                       |   3 +-
 kernel/audit_watch.c                          |   3 +-
 samples/Kconfig                               |   9 +
 samples/Makefile                              |   1 +
 samples/fanotify/Makefile                     |   5 +
 samples/fanotify/fs-monitor.c                 | 142 ++++++++++++++++
 22 files changed, 685 insertions(+), 99 deletions(-)
 create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
 create mode 100644 samples/fanotify/Makefile
 create mode 100644 samples/fanotify/fs-monitor.c

Comments

Amir Goldstein Oct. 26, 2021, 9:12 a.m. UTC | #1
On Mon, Oct 25, 2021 at 10:27 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Hi,
>
> This is the 9th version of this patch series.  Thank you, Amir, Jan and
> Ted, for the feedback in the previous versions.
>
> The main difference in this version is that the pool is no longer
> resizeable nor limited in number of marks, even though we only
> pre-allocate 32 slots.  In addition, ext4 was modified to always return
> non-zero errno, and the documentation was fixed accordingly (No longer
> suggests we return EXT4_ERR* values.
>
> I also droped the Reviewed-by tags from the ext4 patch, due to the
> changes above.
>
> Please let me know what you think.
>

All good on my end.
I've made a couple of minor comments that
could be addressed on commit if no other issues are found.

Thanks,
Amir.
Jan Kara Oct. 27, 2021, 11:22 a.m. UTC | #2
On Tue 26-10-21 12:12:38, Amir Goldstein wrote:
> On Mon, Oct 25, 2021 at 10:27 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
> >
> > Hi,
> >
> > This is the 9th version of this patch series.  Thank you, Amir, Jan and
> > Ted, for the feedback in the previous versions.
> >
> > The main difference in this version is that the pool is no longer
> > resizeable nor limited in number of marks, even though we only
> > pre-allocate 32 slots.  In addition, ext4 was modified to always return
> > non-zero errno, and the documentation was fixed accordingly (No longer
> > suggests we return EXT4_ERR* values.
> >
> > I also droped the Reviewed-by tags from the ext4 patch, due to the
> > changes above.
> >
> > Please let me know what you think.
> >
> 
> All good on my end.
> I've made a couple of minor comments that
> could be addressed on commit if no other issues are found.

All good on my end as well. I've applied all the minor updates, tested the
result and pushed it out to fsnotify branch of my tree. WRT to your new
FS_ERROR LTP tests, I've noticed that the testcases 1 and 3 from test
fanotify20 fail for me. After a bit of debugging this seems to be a bug in
ext4 where it calls ext4_abort() with EXT4_ERR_ESHUTDOWN instead of plain
ESHUTDOWN. Not sure if you have that fixed or how come the tests passed for
you. After fixing that ext4 bug everything passes for me.

								Honza
Amir Goldstein Oct. 27, 2021, 12:36 p.m. UTC | #3
On Wed, Oct 27, 2021 at 2:22 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 26-10-21 12:12:38, Amir Goldstein wrote:
> > On Mon, Oct 25, 2021 at 10:27 PM Gabriel Krisman Bertazi
> > <krisman@collabora.com> wrote:
> > >
> > > Hi,
> > >
> > > This is the 9th version of this patch series.  Thank you, Amir, Jan and
> > > Ted, for the feedback in the previous versions.
> > >
> > > The main difference in this version is that the pool is no longer
> > > resizeable nor limited in number of marks, even though we only
> > > pre-allocate 32 slots.  In addition, ext4 was modified to always return
> > > non-zero errno, and the documentation was fixed accordingly (No longer
> > > suggests we return EXT4_ERR* values.
> > >
> > > I also droped the Reviewed-by tags from the ext4 patch, due to the
> > > changes above.
> > >
> > > Please let me know what you think.
> > >
> >
> > All good on my end.
> > I've made a couple of minor comments that
> > could be addressed on commit if no other issues are found.
>
> All good on my end as well. I've applied all the minor updates, tested the
> result and pushed it out to fsnotify branch of my tree. WRT to your new
> FS_ERROR LTP tests, I've noticed that the testcases 1 and 3 from test
> fanotify20 fail for me. After a bit of debugging this seems to be a bug in
> ext4 where it calls ext4_abort() with EXT4_ERR_ESHUTDOWN instead of plain
> ESHUTDOWN. Not sure if you have that fixed or how come the tests passed for
> you. After fixing that ext4 bug everything passes for me.
>

Gabriel mentioned that bug in the cover letter of the LTP series :-)
https://lore.kernel.org/linux-ext4/20211026173302.84000-1-krisman@collabora.com/T/#u

Thanks,
Amir.
Gabriel Krisman Bertazi Oct. 27, 2021, 1:03 p.m. UTC | #4
Amir Goldstein <amir73il@gmail.com> writes:

> On Wed, Oct 27, 2021 at 2:22 PM Jan Kara <jack@suse.cz> wrote:
>>
>> On Tue 26-10-21 12:12:38, Amir Goldstein wrote:
>> > On Mon, Oct 25, 2021 at 10:27 PM Gabriel Krisman Bertazi
>> > <krisman@collabora.com> wrote:
>> > >
>> > > Hi,
>> > >
>> > > This is the 9th version of this patch series.  Thank you, Amir, Jan and
>> > > Ted, for the feedback in the previous versions.
>> > >
>> > > The main difference in this version is that the pool is no longer
>> > > resizeable nor limited in number of marks, even though we only
>> > > pre-allocate 32 slots.  In addition, ext4 was modified to always return
>> > > non-zero errno, and the documentation was fixed accordingly (No longer
>> > > suggests we return EXT4_ERR* values.
>> > >
>> > > I also droped the Reviewed-by tags from the ext4 patch, due to the
>> > > changes above.
>> > >
>> > > Please let me know what you think.
>> > >
>> >
>> > All good on my end.
>> > I've made a couple of minor comments that
>> > could be addressed on commit if no other issues are found.
>>
>> All good on my end as well. I've applied all the minor updates, tested the
>> result and pushed it out to fsnotify branch of my tree. WRT to your new
>> FS_ERROR LTP tests, I've noticed that the testcases 1 and 3 from test
>> fanotify20 fail for me. After a bit of debugging this seems to be a bug in
>> ext4 where it calls ext4_abort() with EXT4_ERR_ESHUTDOWN instead of plain
>> ESHUTDOWN. Not sure if you have that fixed or how come the tests passed for
>> you. After fixing that ext4 bug everything passes for me.
>>
>
> Gabriel mentioned that bug in the cover letter of the LTP series :-)
> https://lore.kernel.org/linux-ext4/20211026173302.84000-1-krisman@collabora.com/T/#u

Yes :)

Also, thank you both for the extensive review and ideas during the
development of this series.  It was really appreciated!

I'm sending out the new version for tests + man pages today.
Amir Goldstein Oct. 28, 2021, 5:55 a.m. UTC | #5
> Also, thank you both for the extensive review and ideas during the
> development of this series.  It was really appreciated!
>

Thank you for your appreciated effort!
It was a wild journey through some interesting experiments, but
you survived it well ;-)

Would you be interested in pursuing FAN_WB_ERROR after a due rest
and after all the dust on FAN_FS_ERROR has settled?

FAN_WB_ERROR can use the same info record and same internal
error event struct as FAN_FS_ERROR.

A call to fsnotify_wb_error(sb, inode, err) can be placed inside
mapping_set_error() and maybe for other sporadic callers of errseq_set().

For wb error, we can consider storing a snapshot of errseq of the sb/inode
in the sb/inode mark and compute error_count from the errseq diff
instead of counting it when merging events. This will keep a more accurate
report even when error reports are dropped due to allocation failure or event
queue overflow.

I have a feeling that the Postgres project would find this
functionality useful (?).

Thanks,
Amir.
Gabriel Krisman Bertazi Oct. 29, 2021, 10:23 p.m. UTC | #6
Amir Goldstein <amir73il@gmail.com> writes:

>> Also, thank you both for the extensive review and ideas during the
>> development of this series.  It was really appreciated!
>>
>
> Thank you for your appreciated effort!
> It was a wild journey through some interesting experiments, but
> you survived it well ;-)
>
> Would you be interested in pursuing FAN_WB_ERROR after a due rest
> and after all the dust on FAN_FS_ERROR has settled?

I think it would make sense for me to continue working on it, yes.  But,
before that, I think I still have some support to add to FAN_FS_ERROR,
like a detailed, fs-specific, info record, and an error location info
record, which has a use-case in Google Cloud environments.  I have to
discuss priorities internally, but we (collabora) do have an interest in
supporting WB_ERROR too.

For the detailed error report, fanotify could have a new info record
that carries a structure sent out by the file system.  fanotify could
handle the lifetime of this object, by keeping a larger mempool, or
delegate its allocation/destruction to the filesystem.

Like I proposed in an earlier version of FAN_FS_ERROR, the format could
be as simple as:

struct fanotify_error_data_info {
   struct fanotify_event_info_header hdr;
   char data[];
}

I think xfs, at least, would be able to make good use of this record with
xfs_scrub, as the xfs maintainers mentioned.
Amir Goldstein Oct. 30, 2021, 6:53 a.m. UTC | #7
On Sat, Oct 30, 2021 at 1:24 AM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Amir Goldstein <amir73il@gmail.com> writes:
>
> >> Also, thank you both for the extensive review and ideas during the
> >> development of this series.  It was really appreciated!
> >>
> >
> > Thank you for your appreciated effort!
> > It was a wild journey through some interesting experiments, but
> > you survived it well ;-)
> >
> > Would you be interested in pursuing FAN_WB_ERROR after a due rest
> > and after all the dust on FAN_FS_ERROR has settled?
>
> I think it would make sense for me to continue working on it, yes.  But,
> before that, I think I still have some support to add to FAN_FS_ERROR,
> like a detailed, fs-specific, info record, and an error location info
> record, which has a use-case in Google Cloud environments.  I have to
> discuss priorities internally, but we (collabora) do have an interest in
> supporting WB_ERROR too.
>
> For the detailed error report, fanotify could have a new info record
> that carries a structure sent out by the file system.  fanotify could
> handle the lifetime of this object, by keeping a larger mempool, or
> delegate its allocation/destruction to the filesystem.
>

Before you try anything radical, please check the size of prospect
fs-specific data.
My hunch says that in most cases fs-specific data could fit cozy along side the
file handle within MAX_HANDLE_SZ and if this is true, then we do not need to
worry about extreme cases right now.
If there comes a time when we have a justified case of a filesystem that needs
to report much bigger fs-specific data, we can consider it then.
Until that time, we simply drop the over sized fs-specific data same as we do
if filesystem passed in a file handle larger than MAX_HANDLE_SZ.

> Like I proposed in an earlier version of FAN_FS_ERROR, the format could
> be as simple as:
>
> struct fanotify_error_data_info {
>    struct fanotify_event_info_header hdr;
>    char data[];
> }
>

We can add char data[] field to the end of struct fanotify_event_info_error.
It does not change the layout nor size of the structure and the info record
is variable size per definition anyway.

I know Jan didn't like this so much at the time and contemplated a
separate info record for filename, but eventually, fanotify_event_info_fid
also has an optional name following the unsigned char handle[].

> I think xfs, at least, would be able to make good use of this record with
> xfs_scrub, as the xfs maintainers mentioned.

I am not sure if that was the final conclusion.
xfs_scrub is proactive and should have no problem reporting its own findings,
but I have no objections to fs-specific details in FS_ERROR event.

Thanks,
Amir.