[v2,0/9] fanotify: add support for more event types

Message ID 20181115184544.30681-1-amir73il@gmail.com (mailing list archive)

Amir Goldstein Nov. 15, 2018, 6:45 p.m. UTC
Jan,

This is the final part of the patch series that adds support for
filesystem change monitoring to fanotify.

The end game is to use:
  fd = fanotify_init(FAN_CLASS_NOTIF|FAN_REPORT_FID, ...);
  rc = fanotify_mark(fd, FAN_MARK_FILESYSTEM, FAN_CREATE|FAN_DELETE...);
to monitor changes to a large-scale namespace.

This functionality was not available with the inotify API, which does
not scale well with recursive directory watches, nor with the fanotify
API, which did not support directory modification events.

This patch set depends on the fsnotify prep patches posted yesterday.
The entire work, based on your fsnotify branch, is available on my
fanotify_dentry branch [1].

I have tested this work with some preliminary LTP tests [2] and a demo
program [3]. Matthew Bobrowski has agreed to help me with writing more
tests and man pages (thanks Matthew!).

The last patch raises the question of how to handle the FAN_ONDIR flag
with the new event types and proposes a minimal viable implementation
that could be used as a base for further discussion.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fanotify_dentry
[2] https://github.com/amir73il/ltp/commits/fanotify_dentry
[3] https://github.com/amir73il/fsnotify-utils/blob/master/src/test/fanotify_demo.c

Amir Goldstein (9):
  fanotify: rename struct fanotify_{,perm_}event_info
  fanotify: define the structures to report a unique file identifier
  fanotify: classify events that hold a file identifier
  fanotify: encode file identifier for FAN_REPORT_FID
  fanotify: copy event fid info to user
  fanotify: enable FAN_REPORT_FID init flag
  fanotify: support events with data type FSNOTIFY_EVENT_DENTRY
  fanotify: add support for create/attrib/move/delete events
  fanotify: report FAN_ONDIR to listener for filename events

 fs/notify/fanotify/fanotify.c      | 199 ++++++++++++++++++++++++-----
 fs/notify/fanotify/fanotify.h      |  76 ++++++++---
 fs/notify/fanotify/fanotify_user.c | 131 ++++++++++++++++---
 fs/statfs.c                        |   3 +-
 include/linux/fanotify.h           |  31 ++++-
 include/linux/statfs.h             |   3 +
 include/uapi/linux/fanotify.h      |  45 ++++++-
 7 files changed, 408 insertions(+), 80 deletions(-)

Amir Goldstein Nov. 18, 2018, 12:09 p.m. UTC | #1
On Thu, Nov 15, 2018 at 8:45 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Jan,
>
> This is the final part of patch series to add support for filesystem
> change monitoring to fanotify.
>
> The end game is to use:
>   fd = fanotify_init(FAN_CLASS_NOTIF|FAN_REPORT_FID, ...);
>   rc = fanotify_mark(fd, FAN_MARK_FILESYSTEM, FAN_CREATE|FAN_DELETE...);
> to monitor changes to a large scale namespace.
>
> This functionality was not available with inotify API, which does not
> scale well with recursive directory watches and was not available
> with fanotify API, which did not support directory modification events.
>

And to demonstrate the power unleashed by FAN_MARK_FILESYSTEM
with FAN_REPORT_FID, I have made a prototype of a "global filesystem
monitor" based on inotify-tools:
https://github.com/amir73il/inotify-tools/commits/fanotify_fid

=== This is what legacy "recursive inotify" monitoring looks like: ===

~# /vtmp/inotifywait -m -r /vdf &
Setting up watches.  Beware: since -r was given, this may take a while!
Watches established.
root@kvm-xfstests:~# mkdir -p /vdf/a/b/c/d/e/ && touch /vdf/a/b/c/d/e/x
/vdf/ CREATE,ISDIR a
/vdf/ OPEN,ISDIR a
/vdf/ CLOSE_NOWRITE,CLOSE,ISDIR a
/vdf/ OPEN,ISDIR a
/vdf/ ACCESS,ISDIR a
/vdf/ CLOSE_NOWRITE,CLOSE,ISDIR a
/vdf/a/b/c/d/e/ CREATE x
/vdf/a/b/c/d/e/ OPEN x
/vdf/a/b/c/d/e/ ATTRIB x
/vdf/a/b/c/d/e/ CLOSE_WRITE,CLOSE x

1. The race inherent in recursive watches caused most directory create
    events to be missed.
2. Watch setup time depends on the size of the directory tree.
3. Recursive watches pin all directory inodes in the tree to the inode cache.
4. CREATE events carry the created filename information.
5. Events are generated only under the watched tree.

=== And this is what "global fanotify" monitoring looks like: ===

root@kvm-xfstests:~# /vtmp/inotifywait -m -g /vdf &
Setting up global filesystem watches.
Watches established.

root@kvm-xfstests:~# mkdir -p /vdf/a/b/c/d/e/ && touch /vdf/a/b/c/d/e/x

/vdf/ CREATE,ISDIR
Start watching /vdf/a (fid=10001f1...).
/vdf/a CLOSE_NOWRITE,OPEN,CLOSE
/vdf/a CREATE,ISDIR
/vdf/a CLOSE_NOWRITE,OPEN,CLOSE
Start watching /vdf/a/b (fid=200e4a6...).
/vdf/a/b CLOSE_NOWRITE,OPEN,CREATE,CLOSE,ISDIR
/vdf/a/b CLOSE_NOWRITE,OPEN,CLOSE
Start watching /vdf/a/b/c (fid=3000159...).
/vdf/a/b/c CLOSE_NOWRITE,OPEN,CLOSE
/vdf/a/b/c CLOSE_NOWRITE,OPEN,CLOSE
/vdf/a/b/c CREATE,ISDIR
Start watching /vdf/a/b/c/d (fid=105...).
/vdf/a/b/c/d OPEN
/vdf/a/b/c/d CLOSE_NOWRITE,OPEN,CLOSE
/vdf/a/b/c/d CLOSE_NOWRITE,CLOSE
/vdf/a/b/c/d CREATE,ISDIR
Start watching /vdf/a/b/c/d/e (fid=10001f2...).
/vdf/a/b/c/d/e CREATE
Start watching /vdf/a/b/c/d/e/x (fid=10001f3...).
/vdf/a/b/c/d/e/x ATTRIB,CLOSE_WRITE,OPEN,CLOSE
/vdf/a/b/c/d/e CLOSE_NOWRITE,OPEN,CLOSE
/vdf/a/b/c/d/e/x CLOSE_NOWRITE,OPEN,CLOSE

1. No directory create/access events are missed.
2. Setup time is O(1).
3. No directory inodes are pinned to the inode cache.
4. CREATE events carry only the directory where the file
    was created, plus the ISDIR flag if this was a mkdir.
5. Events are generated for any object in the watched
    filesystem (they can be filtered in userspace by a regex on the path).
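
The userspace filtering mentioned in the last point could be sketched with
POSIX regular expressions as follows. The function name path_matches() and
the choice of extended regex syntax are assumptions for illustration; the
prototype may filter differently.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace filter: returns true when `path` matches the
 * POSIX extended regular expression `pattern`. An invalid pattern is
 * treated as "no match".
 */
static bool path_matches(const char *pattern, const char *path)
{
    regex_t re;

    assert(pattern != NULL && path != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, no capture groups. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    bool hit = regexec(&re, path, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}
```

For example, the pattern "^/vdf/a(/|$)" keeps events for /vdf/a and
everything below it while dropping events for a sibling such as /vdf/ab.
A long-running monitor would compile the pattern once rather than once
per event.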

NOTE: "Start watching..." in a global watch means that
userspace adds an entry to a fid => path map. No kernel object
is associated with that "watch", so no kernel resource is
consumed per "watch".
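
A minimal sketch of such a userspace fid => path map, assuming the file
identifier is treated as an opaque byte string. All names and size limits
here (fid_map, FID_MAX_LEN, and so on) are hypothetical, and the fixed-size
table with linear search is chosen for brevity, not taken from the
prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define FID_MAX_LEN  64   /* assumed upper bound on identifier size */
#define PATH_MAX_LEN 256  /* assumed; a real tool would use PATH_MAX */
#define MAP_SLOTS    128  /* assumed capacity */

struct fid_entry {
    unsigned char fid[FID_MAX_LEN];
    size_t fid_len;            /* 0 marks a free slot */
    char path[PATH_MAX_LEN];
};

/* Must be zero-initialized before first use. */
struct fid_map {
    struct fid_entry slots[MAP_SLOTS];
};

static struct fid_entry *fid_map_find(struct fid_map *m,
                                      const void *fid, size_t len)
{
    for (size_t i = 0; i < MAP_SLOTS; i++) {
        struct fid_entry *e = &m->slots[i];
        if (len > 0 && e->fid_len == len && memcmp(e->fid, fid, len) == 0)
            return e;
    }
    return NULL;
}

/* "Start watching": record which path an identifier refers to.
 * Returns 0 on success, -1 when the table is full. */
static int fid_map_add(struct fid_map *m, const void *fid, size_t len,
                       const char *path)
{
    assert(len > 0 && len <= FID_MAX_LEN);

    struct fid_entry *e = fid_map_find(m, fid, len);
    for (size_t i = 0; e == NULL && i < MAP_SLOTS; i++)
        if (m->slots[i].fid_len == 0)
            e = &m->slots[i];
    if (e == NULL)
        return -1;

    memcpy(e->fid, fid, len);
    e->fid_len = len;
    snprintf(e->path, sizeof(e->path), "%s", path);  /* truncates if too long */
    return 0;
}

/* Resolve an identifier from an event back to a path, or NULL if unknown. */
static const char *fid_map_lookup(struct fid_map *m, const void *fid,
                                  size_t len)
{
    struct fid_entry *e = fid_map_find(m, fid, len);
    return e ? e->path : NULL;
}
```

Everything here lives in the monitor's own memory, which is the point of
the note above: adding an entry costs no kernel resource.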

NOTE #2: This is just a prototype. Some use cases like
monitoring several filesystems and renaming userspace
watch entries are not supported.

Thanks,
Amir.

Jan Kara Nov. 20, 2018, 10:34 a.m. UTC | #2
On Sun 18-11-18 14:09:34, Amir Goldstein wrote:
> On Thu, Nov 15, 2018 at 8:45 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Jan,
> >
> > This is the final part of patch series to add support for filesystem
> > change monitoring to fanotify.
> >
> > The end game is to use:
> >   fd = fanotify_init(FAN_CLASS_NOTIF|FAN_REPORT_FID, ...);
> >   rc = fanotify_mark(fd, FAN_MARK_FILESYSTEM, FAN_CREATE|FAN_DELETE...);
> > to monitor changes to a large scale namespace.
> >
> > This functionality was not available with inotify API, which does not
> > scale well with recursive directory watches and was not available
> > with fanotify API, which did not support directory modification events.
> >
> 
> And to demonstrate the power unleashed by FAN_MARK_FILESYSTEM
> with FAN_REPORT_FID, I have made a prototype of "global filesystem
> monitor" based on inotify-tools:
> https://github.com/amir73il/inotify-tools/commits/fanotify_fid

This is indeed impressive :).

								Honza
