Message ID | 20230918123217.932179-3-max.kellermann@ionos.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/4] inotify_user: pass directory fd to inotify_find_inode() | expand |
On Mon 18-09-23 14:32:16, Max Kellermann wrote: > This implements a missing piece in the inotify API: referring to a > file by a directory file descriptor and a path name. This can be > solved in userspace currently only by doing something similar to: > > int old = open("."); > fchdir(dfd); > inotify_add_watch(....); > fchdir(old); > > Support for LOOKUP_EMPTY is still missing. We could add another IN_* > flag for that (which would clutter the IN_* flags list further) or > add a "flags" parameter to the new system call (which would however > duplicate features already present via special IN_* flags). > > To: Jan Kara <jack@suse.cz> > Cc: Amir Goldstein <amir73il@gmail.com> > To: linux-fsdevel@vger.kernel.org > To: linux-kernel@vger.kernel.org > Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Thanks for the patches! But generally we don't add new functionality to the inotify API and rather steer users towards fanotify. In this particular case fanotify_mark(2) already has the support for dirfd + name. Is there any problem with using fanotify for you? Note that since kernel 5.13 you don't need CAP_SYS_ADMIN capability for fanotify functionality that is more-or-less equivalent to what inotify provides. Honza > --- > fs/notify/inotify/inotify_user.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c > index b6e6f6ab21f8..8a9096c5ebb1 100644 > --- a/fs/notify/inotify/inotify_user.c > +++ b/fs/notify/inotify/inotify_user.c > @@ -797,6 +797,12 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname, > return do_inotify_add_watch(fd, AT_FDCWD, pathname, mask); > } > > +SYSCALL_DEFINE4(inotify_add_watch_at, int, fd, int, dfd, const char __user *, pathname, > + u32, mask) > +{ > + return do_inotify_add_watch(fd, dfd, pathname, mask); > +} > + > SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd) > { > struct fsnotify_group *group; > -- > 2.39.2 >
On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote: > Note that since kernel 5.13 you > don't need CAP_SYS_ADMIN capability for fanotify functionality that is > more-or-less equivalent to what inotify provides. Oh, I missed that change - I remember fanotify as being inaccessible for unprivileged processes, and fanotify being designed for things like virus scanners. Indeed I should migrate my code to fanotify. If fanotify has now become the designated successor of inotify, that should be hinted in the inotify manpage, and if inotify is effectively feature-frozen, maybe that should be an extra status in the MAINTAINERS file? Max
On Mon 18-09-23 15:57:43, Max Kellermann wrote: > On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote: > > Note that since kernel 5.13 you > > don't need CAP_SYS_ADMIN capability for fanotify functionality that is > > more-or-less equivalent to what inotify provides. > > Oh, I missed that change - I remember fanotify as being inaccessible > for unprivileged processes, and fanotify being designed for things > like virus scanners. Indeed I should migrate my code to fanotify. > > If fanotify has now become the designated successor of inotify, that > should be hinted in the inotify manpage, and if inotify is effectively > feature-frozen, maybe that should be an extra status in the > MAINTAINERS file? The manpage update is a good idea. I'm not sure about the MAINTAINERS status - we do have 'Obsolete' but I'm reluctant to mark inotify as obsolete as it's perfectly fine for existing users, we fully maintain it and support it but we just don't want to extend the API anymore. Amir, what are your thoughts on this? Honza
On Mon, Sep 18, 2023 at 5:23 PM Jan Kara <jack@suse.cz> wrote: > > On Mon 18-09-23 15:57:43, Max Kellermann wrote: > > On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote: > > > Note that since kernel 5.13 you > > > don't need CAP_SYS_ADMIN capability for fanotify functionality that is > > > more-or-less equivalent to what inotify provides. > > > > Oh, I missed that change - I remember fanotify as being inaccessible > > for unprivileged processes, and fanotify being designed for things > > like virus scanners. Indeed I should migrate my code to fanotify. > > > > If fanotify has now become the designated successor of inotify, that > > should be hinted in the inotify manpage, and if inotify is effectively > > feature-frozen, maybe that should be an extra status in the > > MAINTAINERS file? > > The manpage update is a good idea. I'm not sure about the MAINTAINERS > status - we do have 'Obsolete' but I'm reluctant to mark inotify as > obsolete as it's perfectly fine for existing users, we fully maintain it > and support it but we just don't want to extend the API anymore. Amir, what > are your thoughts on this? I think that the mention of inotify vs. fanotify features in fanotify.7 man page is decent - if anyone wants to improve it I won't mind. A mention of fanotify as successor in inotify.7 man page is not a bad idea - patches welcome. As to MAINTAINERS, I think that 'Maintained' feels right. We may consider 'Odd Fixes' for inotify and certainly for 'dnotify', but that sounds a bit too harsh for the level of maintenance they get. I'd like to point out that IMO, man-page is mainly aimed for the UAPI users and MAINTAINERS is mostly aimed at bug reporters and drive-by developers who submit small fixes. When developers wish to add a feature/improvement to a subsystem, they are advised to send an RFC with their intentions to the subsystem maintainers/list to get feedback on their design before starting to implement. Otherwise, the feature could be NACKed for several reasons other than "we would rather invest in the newer API". Bottom line - I don't see a strong reason to change anything, but I also do not object to improving man page nor to switching to 'Odd Fixes' status. Thanks, Amir.
[Forked from https://lore.kernel.org/linux-fsdevel/20230918123217.932179-1-max.kellermann@ionos.com/] On Mon, Sep 18, 2023 at 6:28 PM Amir Goldstein <amir73il@gmail.com> wrote: > > On Mon, Sep 18, 2023 at 5:23 PM Jan Kara <jack@suse.cz> wrote: > > > > On Mon 18-09-23 15:57:43, Max Kellermann wrote: > > > On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote: > > > > Note that since kernel 5.13 you > > > > don't need CAP_SYS_ADMIN capability for fanotify functionality that is > > > > more-or-less equivalent to what inotify provides. > > > > > > Oh, I missed that change - I remember fanotify as being inaccessible > > > for unprivileged processes, and fanotify being designed for things > > > like virus scanners. Indeed I should migrate my code to fanotify. > > > > > > If fanotify has now become the designated successor of inotify, that > > > should be hinted in the inotify manpage, and if inotify is effectively > > > feature-frozen, maybe that should be an extra status in the > > > MAINTAINERS file? > > > > The manpage update is a good idea. I'm not sure about the MAINTAINERS > > status - we do have 'Obsolete' but I'm reluctant to mark inotify as > > obsolete as it's perfectly fine for existing users, we fully maintain it > > and support it but we just don't want to extend the API anymore. Amir, what > > are your thoughts on this? > > I think that the mention of inotify vs. fanotify features in fanotify.7 man page > is decent - if anyone wants to improve it I won't mind. > A mention of fanotify as successor in inotify.7 man page is not a bad idea - > patches welcome. > > As to MAINTAINERS, I think that 'Maintained' feels right. > We may consider 'Odd Fixes' for inotify and certainly for 'dnotify', > but that sounds a bit too harsh for the level of maintenance they get. > > I'd like to point out that IMO, man-page is mainly aimed for the UAPI > users and MAINTAINERS is mostly aimed at bug reporters and drive-by > developers who submit small fixes. > > When developers wish to add a feature/improvement to a subsystem, > they are advised to send an RFC with their intentions to the subsystem > maintainers/list to get feedback on their design before starting to implement. > Otherwise, the feature could be NACKed for several reasons other than > "we would rather invest in the newer API". > > Bottom line - I don't see a strong reason to change anything, but I also do > not object to improving man page nor to switching to 'Odd Fixes' status. > BTW, before we can really mark inotify as Obsolete and document that inotify was superseded by fanotify, there are at least two items on the TODO list [1]: 1. UNMOUNT/IGNORED events 2. Filesystems without fid support MOUNT/UNMOUNT fanotify events have already been discussed and the feature has known users. Christian has also mentioned [1] the IN_UNMOUNT use case for waiting for sb shutdown several times and I will not be surprised to see systemd starting to use inotify for that use case before too long... Regarding the second item on the TODO list, we have had this discussion before - if we are going to continue the current policy of opting-in to fanotify (i.e. tmpfs, fuse, overlayfs, kernfs), we will always have odd filesystems that only support inotify and we will need to keep supporting inotify only for the users that use inotify on those odd filesystems. OTOH, if we implement FAN_REPORT_DINO_NAME, we could have fanotify inode mark support for any filesystem, where the pinned marked inode ino is the object id. Thanks, Amir. [1] https://github.com/amir73il/fsnotify-utils/wiki/fsnotify-TODO [2] https://lore.kernel.org/linux-fsdevel/20230908-verflachen-neudefinition-4da649d673a9@brauner/
On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote:
> Is there any problem with using fanotify for you?
Turns out fanotify is unusable for me, unfortunately.
I have been using inotify to get notifications of cgroup events, but
the cgroup filesystem appears to be unsupported by fanotify: all
attempts to use fanotify_mark() on cgroup event files fail with
ENODEV. I think that comes from fanotify_test_fsid(). Filesystems
without a fsid work just fine with inotify, but fail with fanotify.
Since fanotify lacks important features, is it really a good idea to
feature-freeze inotify?
(By the way, what was not documented is that fanotify_init() can only
be used by unprivileged processes if the FAN_REPORT_FID flag was
specified. I had to read the kernel sources to figure that out - I
have no idea why this limitation exists - the code comment in the
kernel source doesn't explain it.)
On Mon, Sep 18, 2023 at 9:05 PM Amir Goldstein <amir73il@gmail.com> wrote: > > [Forked from https://lore.kernel.org/linux-fsdevel/20230918123217.932179-1-max.kellermann@ionos.com/] > > On Mon, Sep 18, 2023 at 6:28 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > On Mon, Sep 18, 2023 at 5:23 PM Jan Kara <jack@suse.cz> wrote: > > > > > > On Mon 18-09-23 15:57:43, Max Kellermann wrote: > > > > On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote: > > > > > Note that since kernel 5.13 you > > > > > don't need CAP_SYS_ADMIN capability for fanotify functionality that is > > > > > more-or-less equivalent to what inotify provides. > > > > > > > > Oh, I missed that change - I remember fanotify as being inaccessible > > > > for unprivileged processes, and fanotify being designed for things > > > > like virus scanners. Indeed I should migrate my code to fanotify. > > > > > > > > If fanotify has now become the designated successor of inotify, that > > > > should be hinted in the inotify manpage, and if inotify is effectively > > > > feature-frozen, maybe that should be an extra status in the > > > > MAINTAINERS file? > > > > > > The manpage update is a good idea. I'm not sure about the MAINTAINERS > > > status - we do have 'Obsolete' but I'm reluctant to mark inotify as > > > obsolete as it's perfectly fine for existing users, we fully maintain it > > > and support it but we just don't want to extend the API anymore. Amir, what > > > are your thoughts on this? > > > > I think that the mention of inotify vs. fanotify features in fanotify.7 man page > > is decent - if anyone wants to improve it I won't mind. > > A mention of fanotify as successor in inotify.7 man page is not a bad idea - > > patches welcome. > > > > As to MAINTAINERS, I think that 'Maintained' feels right. > > We may consider 'Odd Fixes' for inotify and certainly for 'dnotify', > > but that sounds a bit too harsh for the level of maintenance they get. > > > > I'd like to point out that IMO, man-page is mainly aimed for the UAPI > > users and MAINTAINERS is mostly aimed at bug reporters and drive-by > > developers who submit small fixes. > > > > When developers wish to add a feature/improvement to a subsystem, > > they are advised to send an RFC with their intentions to the subsystem > > maintainers/list to get feedback on their design before starting to implement. > > Otherwise, the feature could be NACKed for several reasons other than > > "we would rather invest in the newer API". > > > > Bottom line - I don't see a strong reason to change anything, but I also do > > not object to improving man page nor to switching to 'Odd Fixes' status. > > > > BTW, before we can really mark inotify as Obsolete and document that > inotify was superseded by fanotify, there are at least two items on the > TODO list [1]: > 1. UNMOUNT/IGNORED events > 2. Filesystems without fid support > > MOUNT/UNMOUNT fanotify events have already been discussed > and the feature has known users. > > Christian has also mentioned [1] the IN_UNMOUNT use case for > waiting for sb shutdown several times and I will not be surprised > to see systemd starting to use inotify for that use case before too long... > > Regarding the second item on the TODO list, we have had this discussion > before - if we are going to continue the current policy of opting-in to fanotify > (i.e. tmpfs, fuse, overlayfs, kernfs), we will always have odd filesystems that > only support inotify and we will need to keep supporting inotify only for the > users that use inotify on those odd filesystems. > > OTOH, if we implement FAN_REPORT_DINO_NAME, we could > have fanotify inode mark support for any filesystem, where the > pinned marked inode ino is the object id. > Hi Max, Not sure if you have seen my email before asking your question on the original patch review thread. I prefer to answer it here in the wider context of inotify maintenance, because it touches directly on the topic I raised above. On Mon, Sep 18, 2023 at 10:45 PM Max Kellermann <max.kellermann@ionos.com> wrote: > > On Mon, Sep 18, 2023 at 2:40 PM Jan Kara <jack@suse.cz> wrote: > > Is there any problem with using fanotify for you? > > Turns out fanotify is unusable for me, unfortunately. > I have been using inotify to get notifications of cgroup events, but > the cgroup filesystem appears to be unsupported by fanotify: all > attempts to use fanotify_mark() on cgroup event files fail with > ENODEV. I think that comes from fanotify_test_fsid(). Filesystems > without a fsid work just fine with inotify, but fail with fanotify. > This was just fixed by Ivan in commit: 0ce7c12e88cf ("kernfs: attach uuid for every kernfs and report it in fsid") > Since fanotify lacks important features, is it really a good idea to > feature-freeze inotify? As my summary above states, it is correct that fanotify does not yet fully supersedes inotify, but there is a plan to go in this direction, hence, inotify is "being phased out" it is not Obsolete nor Deprecated. However, the question to be asked is different IMO: When both inotify and fanotify do not support the use case at hand (as in your case), which is better? to fix/improve inotify or to fix/improve fanotify? For me, there should be a very strong reason to choose improving inotify over improving fanotify. With the case at hand, you can see that the patch to improve fanotify to support your use case was far simpler (in LOC at least) than your patches, not to mention, not having to add a new syscall and new documentation for an old phased out API. But there may be exceptions, for example, in 4.19, inotify gained a new feature: 4d97f7d53da7 ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") I am not sure if this patch would have been accepted nowadays, but we need to judge every case. > > (By the way, what was not documented is that fanotify_init() can only > be used by unprivileged processes if the FAN_REPORT_FID flag was fanotify_init(2): Prior to Linux 5.13, calling fanotify_init() required the CAP_SYS_ADMIN capability. Since Linux 5.13, users may call fanotify_init() without the CAP_SYS_ADMIN capability to create and initialize an fanotify group with limited functionality. The limitations imposed on an event listener created by a user without the CAP_SYS_ADMIN capability are as follows: ... • The user is required to create a group that identifies filesystem objects by file handles, for example, by providing the FAN_REPORT_FID flag. I find this documentation that was written by Matthew very good, but writing documentation is not my strong side and if you feel that any part of the documentation should be improved I highly appreciate the feedback and would appreciate man page patches even more. When we get to the point that the missing functionality gaps between inotify and fanotify have been closed, I will surely follow your advice to mention that in the inotify man page and possibly also in MAINTAINERS. > specified. I had to read the kernel sources to figure that out - I > have no idea why this limitation exists - the code comment in the > kernel source doesn't explain it.) The legacy fanotify events open and report an event->fd as a way to identify the object - that is not a safe practice for unprivileged listeners for several reasons. FAN_REPORT_FID is designed in a way to be almost a drop in replacement for inotify watch descriptors as an opaque identifier of the object, except that fsid+fhanle provide much stronger functionality than wd did. The limitation that FAN_REPORT_FID requires that fs has fsid+fhandle is a technicality. It can be solved by either providing fsid and trivial encode_fh() (*) to the filesystem in question (as was done in 6.6-rc1 for overlayfs and kernfs) or by introducing a new mode FAN_REPORT_INO which reports inode number instead of fsid+fhandle and is enough for listeners that watch directories and files on a single fs. Thanks, Amir. (*) the ability for fs to support only encode_fh() was added in kernel v6.5 96b2b072ee62 ("exportfs: allow exporting non-decodeable file handles to userspace") and a man page patch was already posted [3]. > > [1] https://github.com/amir73il/fsnotify-utils/wiki/fsnotify-TODO > [2] https://lore.kernel.org/linux-fsdevel/20230908-verflachen-neudefinition-4da649d673a9@brauner/ [3] https://lore.kernel.org/linux-fsdevel/20230903120433.2605027-1-amir73il@gmail.com/
On Tue, Sep 19, 2023 at 9:17 AM Amir Goldstein <amir73il@gmail.com> wrote: > This was just fixed by Ivan in commit: > 0ce7c12e88cf ("kernfs: attach uuid for every kernfs and report it in fsid") Indeed, nice to see this will soon be fixed. > As my summary above states, it is correct that fanotify does not > yet fully supersedes inotify, but there is a plan to go in this direction, > hence, inotify is "being phased out" it is not Obsolete nor Deprecated. I agree that if inotify is to be phased out, we should concentrate on fanotify. I'm however somewhat disappointed with the complexity of the fanotify API. I'm not entirely convinced that fanotify is a good successor for inotify, or that inotify should really be replaced. The additional features that fanotify has could have been added to inotify instead; I don't get why this needs an entirely new API. Of course, I'm late to complain, having just learned about (the unprivileged availability of) fanotify many years after it has been invented. System calls needed for one inotify event: - read() System calls needed for one fanotify event: - read() - (do some magic to look up the fsid - https://github.com/martinpitt/fatrace/blob/master/fatrace.c implements a lookup table, yet more complexity that doesn't exist with inotify) - open() to get a file descriptor for the fsid - open_by_handle_at(fsid_fd, fid.handle) - readlink("/proc/self/fd/%d") (which adds a dependency on /proc being mounted) - close(fd) - close(fsid_fd) I should mention that this workflow still needs a privileged process - yes, I can use fanotify in an unprivileged process, but open_by_handle_at() is a privileged system call - it requires CAP_DAC_READ_SEARCH. Without it, I cannot obtain information on which file was modified, can I? There is FAN_REPORT_NAME, but it gives me only the name of the directory entry; but since I'm watching a lot of files and all of them are called "memory.events.local", that's of no use. Or am I supposed to use name_to_handle_at() on all watched files to roll my own lookup? (The name_to_hamdle_at() manpage doesn't make me confident it's a reliable system call; it sounds like it needs explicit support from filesystems.) > > (By the way, what was not documented is that fanotify_init() can only > > be used by unprivileged processes if the FAN_REPORT_FID flag was [...] > I find this documentation that was written by Matthew very good, Indeed! That's my mistake, I missed this section. > FAN_REPORT_FID is designed in a way to be almost a drop in replacement > for inotify watch descriptors as an opaque identifier of the object, except that > fsid+fhanle provide much stronger functionality than wd did. How is it stronger?
> Christian has also mentioned [1] the IN_UNMOUNT use case for > waiting for sb shutdown several times and I will not be surprised > to see systemd starting to use inotify for that use case before too long... I think that having a version of IN_UMOUNT for fanotify would be great. I've said so a couple of times indeed. It is a really good feature to monitor superblock deactivation.
On Mon 18-09-23 21:05:11, Amir Goldstein wrote: > [Forked from https://lore.kernel.org/linux-fsdevel/20230918123217.932179-1-max.kellermann@ionos.com/] ... > BTW, before we can really mark inotify as Obsolete and document that > inotify was superseded by fanotify, there are at least two items on the > TODO list [1]: Yeah, as I wrote in the original thread, I don't feel like inotify should be marked as obsolete (at least for some more time) so we are on the same page here I think. > 1. UNMOUNT/IGNORED events > 2. Filesystems without fid support > > MOUNT/UNMOUNT fanotify events have already been discussed > and the feature has known users. > > Christian has also mentioned [1] the IN_UNMOUNT use case for > waiting for sb shutdown several times and I will not be surprised > to see systemd starting to use inotify for that use case before too long... Yup, both FAN_UNMOUNT and FAN_IGNORED should be easy. Unlike inotify, I'd just make these explicit events you can opt into and not something you always get. > Regarding the second item on the TODO list, we have had this discussion > before - if we are going to continue the current policy of opting-in to > fanotify (i.e. tmpfs, fuse, overlayfs, kernfs), we will always have odd > filesystems that only support inotify and we will need to keep supporting > inotify only for the users that use inotify on those odd filesystems. > > OTOH, if we implement FAN_REPORT_DINO_NAME, we could > have fanotify inode mark support for any filesystem, where the > pinned marked inode ino is the object id. Is it a real problem after your work to allow filehandles that are not necessarily usable for NFS export or open_by_handle()? As far as I remember fanotify should be now able to handle anything that provides f_fsid in its statfs(2) call. And as I'm checking filesystems not setting fsid currently are: afs, coda, nfs - networking filesystems where inotify and fanotify have dubious value anyway configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore, ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is quite limited. But some special cases could be useful. Adding fsid support is the same amount of trouble as for kernfs - a few LOC. In fact, we could perhaps add a fstype flag to indicate that this is a filesystem without persistent identification and so uuid should be autogenerated on mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid. This way we could handle all these filesystems with trivial amount of effort. freevxfs - the only real filesystem without f_fsid. Trivial to handle one way or the other. So I don't think we need new uAPI additions to finish off this TODO item. Honza
On Tue 19-09-23 11:08:21, Max Kellermann wrote: > On Tue, Sep 19, 2023 at 9:17 AM Amir Goldstein <amir73il@gmail.com> wrote: > > As my summary above states, it is correct that fanotify does not > > yet fully supersedes inotify, but there is a plan to go in this direction, > > hence, inotify is "being phased out" it is not Obsolete nor Deprecated. > > I agree that if inotify is to be phased out, we should concentrate on fanotify. > > I'm however somewhat disappointed with the complexity of the fanotify > API. I'm not entirely convinced that fanotify is a good successor for > inotify, or that inotify should really be replaced. The additional > features that fanotify has could have been added to inotify instead; I > don't get why this needs an entirely new API. Of course, I'm late to > complain, having just learned about (the unprivileged availability of) > fanotify many years after it has been invented. > > System calls needed for one inotify event: > - read() > > System calls needed for one fanotify event: > - read() > - (do some magic to look up the fsid - > https://github.com/martinpitt/fatrace/blob/master/fatrace.c implements > a lookup table, yet more complexity that doesn't exist with inotify) > - open() to get a file descriptor for the fsid > - open_by_handle_at(fsid_fd, fid.handle) > - readlink("/proc/self/fd/%d") (which adds a dependency on /proc being mounted) > - close(fd) > - close(fsid_fd) > > I should mention that this workflow still needs a privileged process - > yes, I can use fanotify in an unprivileged process, but > open_by_handle_at() is a privileged system call - it requires > CAP_DAC_READ_SEARCH. Without it, I cannot obtain information on which > file was modified, can I? > There is FAN_REPORT_NAME, but it gives me only the name of the > directory entry; but since I'm watching a lot of files and all of them > are called "memory.events.local", that's of no use. > > Or am I supposed to use name_to_handle_at() on all watched files to > roll my own lookup? (The name_to_hamdle_at() manpage doesn't make me > confident it's a reliable system call; it sounds like it needs > explicit support from filesystems.) So with inotify event, you get back 'wd' and 'name' to identify the object where the event happened. How is this (for your usecase) different from getting back 'fsid + handle' and 'name' back from fanotify? In inotify case you had to somehow track wd -> path linkage, with fanotify you need to track 'fsid + handle' -> path linkage. > > FAN_REPORT_FID is designed in a way to be almost a drop in replacement > > for inotify watch descriptors as an opaque identifier of the object, except that > > fsid+fhanle provide much stronger functionality than wd did. > > How is it stronger? For your particular usecase I don't think there's any advantage of fsid+fhandle over plain wd. But if you want to monitor multiple filesystems or if you have priviledged process that can open by handle, or a standard filesystem where handles are actually persistent, then there are benefits. Honza
On Tue, Sep 19, 2023 at 1:01 PM Jan Kara <jack@suse.cz> wrote: > > On Tue 19-09-23 11:08:21, Max Kellermann wrote: > > On Tue, Sep 19, 2023 at 9:17 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > As my summary above states, it is correct that fanotify does not > > > yet fully supersedes inotify, but there is a plan to go in this direction, > > > hence, inotify is "being phased out" it is not Obsolete nor Deprecated. > > > > I agree that if inotify is to be phased out, we should concentrate on fanotify. > > > > I'm however somewhat disappointed with the complexity of the fanotify > > API. I'm not entirely convinced that fanotify is a good successor for > > inotify, or that inotify should really be replaced. The additional > > features that fanotify has could have been added to inotify instead; I > > don't get why this needs an entirely new API. Of course, I'm late to > > complain, having just learned about (the unprivileged availability of) > > fanotify many years after it has been invented. > > > > System calls needed for one inotify event: > > - read() > > > > System calls needed for one fanotify event: > > - read() > > - (do some magic to look up the fsid - > > https://github.com/martinpitt/fatrace/blob/master/fatrace.c implements > > a lookup table, yet more complexity that doesn't exist with inotify) > > - open() to get a file descriptor for the fsid > > - open_by_handle_at(fsid_fd, fid.handle) > > - readlink("/proc/self/fd/%d") (which adds a dependency on /proc being mounted) > > - close(fd) > > - close(fsid_fd) > > > > I should mention that this workflow still needs a privileged process - > > yes, I can use fanotify in an unprivileged process, but > > open_by_handle_at() is a privileged system call - it requires > > CAP_DAC_READ_SEARCH. Without it, I cannot obtain information on which > > file was modified, can I? > > There is FAN_REPORT_NAME, but it gives me only the name of the > > directory entry; but since I'm watching a lot of files and all of them > > are called "memory.events.local", that's of no use. > > > > Or am I supposed to use name_to_handle_at() on all watched files to > > roll my own lookup? (The name_to_hamdle_at() manpage doesn't make me > > confident it's a reliable system call; it sounds like it needs > > explicit support from filesystems.) > > So with inotify event, you get back 'wd' and 'name' to identify the object > where the event happened. How is this (for your usecase) different from > getting back 'fsid + handle' and 'name' back from fanotify? In inotify case > you had to somehow track wd -> path linkage, with fanotify you need to > track 'fsid + handle' -> path linkage. > And if you want to see an implementation of a drop-in replacement of inotify/fanotify, you can take a look at: https://github.com/inotify-tools/inotify-tools/pull/134 And specifically the first commit 41b2ec4 ("Index watches by fanotify fid") to understand why fid is a drop-in replacement for wd. > > > FAN_REPORT_FID is designed in a way to be almost a drop in replacement > > > for inotify watch descriptors as an opaque identifier of the object, except that > > > fsid+fhanle provide much stronger functionality than wd did. > > > > How is it stronger? > > For your particular usecase I don't think there's any advantage of > fsid+fhandle over plain wd. But if you want to monitor multiple filesystems > or if you have priviledged process that can open by handle, or a standard > filesystem where handles are actually persistent, then there are benefits. > Those cases are demonstrated in the --filesystem functionality of the pull request above, which handles "dynamic watches" instead of having to setup watches recursively on all subdirs. Thanks, Amir.
On Tue, Sep 19, 2023 at 12:01 PM Jan Kara <jack@suse.cz> wrote: > So with inotify event, you get back 'wd' and 'name' to identify the object > where the event happened. How is this (for your usecase) different from > getting back 'fsid + handle' and 'name' back from fanotify? In inotify case > you had to somehow track wd -> path linkage, with fanotify you need to > track 'fsid + handle' -> path linkage. The wd is a simple "int" which is the return value of the system call, and it's part of "struct inotify_event". One system call for registering it, one system call fo reading it. From fanotify, I read a "struct fanotify_event_metadata", and then check variable-length follow-up structs, iterable those follow-up structs, find the one with "info_type==FAN_EVENT_INFO_TYPE_FID", now I have a "fsid" of type "__kernel_fsid_t" (a struct containing two 32-bit integers) and a "file_handle" (a variable-length opaque BLOB). What do I do with these? The answer appears to be: when I registered, I should have obtained the fsid (via statfs()) and the file_handle (via name_to_handle_at()). That's three extra system calls. One statfs(), and twice name_to_handle_at(), because the first one is needed to get the length of the buffer I need to allocate for the file_handle (and hope my filesystem supports file_handles, because apparently that's an optional feature). Just look at the name_to_handle_at() manpage for some horrors of its complexity. Imagine how much more complex the data structure for looking up the modified file is: inotify has an int as the lookup key, and fanotify has two integers plus a variable-length BLOB. > But if you want to monitor multiple filesystems I can monitor multiple filesystems with inotify. > or if you have priviledged process that can open by handle Getting an already-opened file descriptor, or just the file_handle, is certainly an interesting fanotify feature. But that could have easily been added to inotify with a new "mask" flag for the inotify_add_watch() function. > or a standard filesystem where handles are actually persistent, then there are benefits. Same here: that could have been an (optional) inotify feature, instead of making the whole complexity mandatory for everybody. Max
On Tue, Sep 19, 2023 at 12:42 PM Amir Goldstein <amir73il@gmail.com> wrote: > Those cases are demonstrated in the --filesystem functionality of the > pull request above, which handles "dynamic watches" instead of > having to setup watches recursively on all subdirs. The whole-filesystem mode is certainly the most interesting fanotify feature, and the lack of it the greatest weakness of inotify - but API-wise, it could have been implemented as a new "mask" flag in inotify_add_watch() with no extra API complexity. That still doesn't justify the huge complexity of fanotify.
On Tue, Sep 19, 2023 at 1:43 PM Max Kellermann <max.kellermann@ionos.com> wrote: > > On Tue, Sep 19, 2023 at 12:01 PM Jan Kara <jack@suse.cz> wrote: > > So with inotify event, you get back 'wd' and 'name' to identify the object > > where the event happened. How is this (for your usecase) different from > > getting back 'fsid + handle' and 'name' back from fanotify? In inotify case > > you had to somehow track wd -> path linkage, with fanotify you need to > > track 'fsid + handle' -> path linkage. > > The wd is a simple "int" which is the return value of the system call, > and it's part of "struct inotify_event". One system call for > registering it, one system call fo reading it. > > From fanotify, I read a "struct fanotify_event_metadata", and then > check variable-length follow-up structs, iterable those follow-up > structs, find the one with "info_type==FAN_EVENT_INFO_TYPE_FID", now I > have a "fsid" of type "__kernel_fsid_t" (a struct containing two > 32-bit integers) and a "file_handle" (a variable-length opaque BLOB). > What do I do with these? > > The answer appears to be: when I registered, I should have obtained > the fsid (via statfs()) and the file_handle (via name_to_handle_at()). > That's three extra system calls. One statfs(), and twice > name_to_handle_at(), because the first one is needed to get the length > of the buffer I need to allocate for the file_handle (and hope my > filesystem supports file_handles, because apparently that's an > optional feature). Just look at the name_to_handle_at() manpage for > some horrors of its complexity. > > Imagine how much more complex the data structure for looking up the > modified file is: inotify has an int as the lookup key, and fanotify > has two integers plus a variable-length BLOB. > You are not describing a technical problem. Any API complexity can be hidden from users with userspace libraries. You can use the inotify-tools lib if you prefer. I assure you that the added complexity to the API was not done to make your life harder. inotify API has several design flaws and fanotify API extensions were designed to address those flaws. > > But if you want to monitor multiple filesystems > > I can monitor multiple filesystems with inotify. > > > or if you have priviledged process that can open by handle > > Getting an already-opened file descriptor, or just the file_handle, is > certainly an interesting fanotify feature. But that could have easily > been added to inotify with a new "mask" flag for the > inotify_add_watch() function. > "could have easily been added" is not a statement that I am willing to accept. You are saying that because you do not understand the complexity involved and that is fine - you can ask. The things that you are complaining about in the API are the exact things that were needed to make the advanced features work. Beyond that, it is a matter of API consolidation - We prefer to maintain a single unified API that can cover all the use cases over maintaining several overlapping APIs. The complexity added to the API for simple use cases can be mitigated with user libraries - it is not a good reason IMO to keep maintaining an old limited API in parallel to a new improved one. Thanks, Amir.
On Tue, Sep 19, 2023 at 12:59 PM Amir Goldstein <amir73il@gmail.com> wrote: > Any API complexity can be hidden from users with userspace > libraries. You can use the inotify-tools lib if you prefer. That doesn't convince me at all, but that's a question of taste. We'll just keep using inotify (with a patched kernel, which we have anyway). > > Getting an already-opened file descriptor, or just the file_handle, is > > certainly an interesting fanotify feature. But that could have easily > > been added to inotify with a new "mask" flag for the > > inotify_add_watch() function. > > > > "could have easily been added" is not a statement that I am willing > to accept. Are you willing to take a bet? I come up with a patch for implementing this for inotify, let's say within a week, and you agree to merge it? (I'm not interested in this feature, I won't ever use it - all I wanted is dfd support for inotify_add_watch()). > The things that you are complaining about in the API are the exact > things that were needed to make the advanced features work. Not exactly - I complain that fanotify makes the complexity mandatory, the complexity is the baseline of the API. It would have been possible to design an API that is simple for 99% of all users, as simple as inotify; and only those who need the advanced features get the complexity as an option. I don't agree with your point that unnecessary complexity should be mitigated by throwing more (library) code at it. That's just adding more complexity and more overhead, the opposite of what I want. Max
On Tue, Sep 19, 2023 at 2:22 PM Max Kellermann <max.kellermann@ionos.com> wrote: > > On Tue, Sep 19, 2023 at 12:59 PM Amir Goldstein <amir73il@gmail.com> wrote: > > Any API complexity can be hidden from users with userspace > > libraries. You can use the inotify-tools lib if you prefer. > > That doesn't convince me at all, but that's a question of taste. We'll > just keep using inotify (with a patched kernel, which we have anyway). > ok. > > > Getting an already-opened file descriptor, or just the file_handle, is > > > certainly an interesting fanotify feature. But that could have easily > > > been added to inotify with a new "mask" flag for the > > > inotify_add_watch() function. > > > > > > > "could have easily been added" is not a statement that I am willing > > to accept. > > Are you willing to take a bet? I come up with a patch for implementing > this for inotify, let's say within a week, and you agree to merge it? > > (I'm not interested in this feature, I won't ever use it - all I > wanted is dfd support for inotify_add_watch()). > I am not into ego fights. I have no desire to win an argument. If you have an improvement that you want to make, you can submit it and it will be judged technically. If you want to improve inotify you can argue your case and it will be judged technically. But if you do that, I strongly advise to share the community early in the design review of the new feature/API. It can save you time. > > The things that you are complaining about in the API are the exact > > things that were needed to make the advanced features work. > > Not exactly - I complain that fanotify makes the complexity mandatory, > the complexity is the baseline of the API. It would have been possible > to design an API that is simple for 99% of all users, as simple as > inotify; and only those who need the advanced features get the > complexity as an option. > > I don't agree with your point that unnecessary complexity should be > mitigated by throwing more (library) code at it. That's just adding > more complexity and more overhead, the opposite of what I want. > Sorry, "what I want" is not a technical argument :) "what many users want" with proof could be a start of a technical argument. I agree that simplicity of the kernel UAPI vs. delegating simplicity to user libraries is a matter of taste and different subsystem maintainers have different opinions in that regard. And also, it is a bit late to discuss design preferences of an API that was merged 4 years ago. Design flaws and problems, sure, but for complexity it's a bit late. Regarding inotify improvements, as I wrote, they will each be judged technically, but the trend is towards phasing it out. Thanks, Amir.
On Tue, Sep 19, 2023 at 2:21 PM Amir Goldstein <amir73il@gmail.com> wrote: > Regarding inotify improvements, as I wrote, they will each be judged > technically, but the trend is towards phasing it out. Then please reconsider merging inotify_add_watch_at(). It is a rather trivial patch set, only exposing a user_path_at() parameter to use space, like many other new system calls did with other old-style system calls. Only the last patch, the one which adds the new system call ot all arch-specific tables, is an ugly one, but that's not a property of the new feature but a general property of how system calls are wired in Linux. My proposed system call adds real value to all those who are currently using inotify, allowing them to use inotify with a modern and safe and race-free syscall interface, eliminating the unsafe fchdir() dance to emulate it in userspace. The inotify interface is widely used and will be for a long time to come, while it is hard to find code which already uses fanotify. GitHub code search finds 438 occurences of fanotify_init() calls, 4.6k inotify_init1() calls and 6.9k inotify_init() calls. Given the added complexity of fanotify and the uselessness of most of fanotify's feature for most software (except for dfd support), it is extremely unlikely that a noticable fraction of those thousands of projects will ever migrate to fanotify. Even if inotify is considered a legacy API, it should be allowed to modernize it; and adding dfd support to system calls is really important. Max
On Tue, Sep 19, 2023 at 3:51 PM Max Kellermann <max.kellermann@ionos.com> wrote: > > On Tue, Sep 19, 2023 at 2:21 PM Amir Goldstein <amir73il@gmail.com> wrote: > > Regarding inotify improvements, as I wrote, they will each be judged > > technically, but the trend is towards phasing it out. > > Then please reconsider merging inotify_add_watch_at(). It is a rather > trivial patch set, only exposing a user_path_at() parameter to use > space, like many other new system calls did with other old-style > system calls. Only the last patch, the one which adds the new system > call ot all arch-specific tables, is an ugly one, but that's not a > property of the new feature but a general property of how system calls > are wired in Linux. > > My proposed system call adds real value to all those who are currently > using inotify, allowing them to use inotify with a modern and safe and > race-free syscall interface, eliminating the unsafe fchdir() dance to > emulate it in userspace. > > The inotify interface is widely used and will be for a long time to > come, while it is hard to find code which already uses fanotify. > GitHub code search finds 438 occurences of fanotify_init() calls, 4.6k > inotify_init1() calls and 6.9k inotify_init() calls. Given the added > complexity of fanotify and the uselessness of most of fanotify's > feature for most software (except for dfd support), it is extremely > unlikely that a noticable fraction of those thousands of projects will > ever migrate to fanotify. Even if inotify is considered a legacy API, > it should be allowed to modernize it; and adding dfd support to system > calls is really important. > Both Jan and I already gave an answer to this specific patch. The answer was no. We do not add new system calls for doing something that is already possible with existing system calls to make the life of a programmer easier - this has never been a valid argument for adding a new syscall. Thanks, Amir.
On Tue, Sep 19, 2023 at 3:01 PM Amir Goldstein <amir73il@gmail.com> wrote: > We do not add new system calls for doing something that is already > possible with existing system calls to make the life of a programmer > easier - this has never been a valid argument for adding a new syscall. - it's not possible to safely add an inotify watch; this isn't about making something easier, but about making something safe/reliable/race-free in a way that wasn't possible before - there are many precedents of new system calls just to add dfd support (fchmodat, execveat, linkat, mkdirat, ....) - there are also a few new system calls that were added to make the life of a programmer easier even though the same was already possible with existing system calls (close_range, process_madvise, pidfd_getfd, mount_setattr, ...)
On Tue, Sep 19, 2023 at 4:11 PM Max Kellermann <max.kellermann@ionos.com> wrote: > > On Tue, Sep 19, 2023 at 3:01 PM Amir Goldstein <amir73il@gmail.com> wrote: > > We do not add new system calls for doing something that is already > > possible with existing system calls to make the life of a programmer > > easier - this has never been a valid argument for adding a new syscall. > > - it's not possible to safely add an inotify watch; this isn't about > making something easier, but about making something > safe/reliable/race-free in a way that wasn't possible before Yes, I meant it is possible to get the very similar functionality in a race-free way using fanotify. If fanotify does not meet your requirements please let us know in what way and perhaps fanotify could be improved. Using "inotify and not fanotity" is not a legit technical requirement. > - there are many precedents of new system calls just to add dfd > support (fchmodat, execveat, linkat, mkdirat, ....) > - there are also a few new system calls that were added to make the > life of a programmer easier even though the same was already possible > with existing system calls (close_range, process_madvise, pidfd_getfd, > mount_setattr, ...) All those new syscalls add new functionality/security/performance. If you think they were added to make the life of the programmer easier you did not understand them. Anyway, I've said my opinion about inotify_add_watch_at(). final call is up to Jan. Thanks, Amir. P.S. you may be able to provide magic /proc/self/$fd symlinks as path argument to inotify_add_watch() after opening them with O_PATH to get what you want - I didn't try.
On Tue 19-09-23 13:21:56, Max Kellermann wrote: > On Tue, Sep 19, 2023 at 12:59 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > Getting an already-opened file descriptor, or just the file_handle, is > > > certainly an interesting fanotify feature. But that could have easily > > > been added to inotify with a new "mask" flag for the > > > inotify_add_watch() function. > > > > > > > "could have easily been added" is not a statement that I am willing > > to accept. > > Are you willing to take a bet? I come up with a patch for implementing > this for inotify, let's say within a week, and you agree to merge it? I guess no point in you wasting time for this. But if you'd try, I'll really find out it isn't so easy. Inotify event is fixed length so fsid+fhandle is completely out of the realm of "easy extension". If you wanted to return fd instead of wd, that would be doable with some kind of a flag in the mark mask, although it would be a bit inconsistent with the rest of the inotify API. > > The things that you are complaining about in the API are the exact > > things that were needed to make the advanced features work. > > Not exactly - I complain that fanotify makes the complexity mandatory, > the complexity is the baseline of the API. It would have been possible > to design an API that is simple for 99% of all users, as simple as > inotify; and only those who need the advanced features get the > complexity as an option. Well yes, fanotify could have been designed to make basic usage easier. But the design (some 15 years ago) was focusing more on filling in the functional gaps inotify had for usecases such as anti-virus monitors etc. and kind of left thinking about simple usecases for sometime later. So we have what we have. Honza
On Tue, Sep 19, 2023 at 3:22 PM Amir Goldstein <amir73il@gmail.com> wrote: > Yes, I meant it is possible to get the very similar functionality in > a race-free way using fanotify. That's not the same. We already agreed that fanotify still misses features that have been available in inotify since forever. Going fanotify requires a rewrite of large chunks of code. Rejecting trivial inotify improvements because people should be using fanotify doesn't make real-world users happy. > If fanotify does not meet your requirements please let us know > in what way and perhaps fanotify could be improved. - return a watch descriptor (like inotify) as a fixed-size lookup key - add an option so returned events contain the watch descriptor and path relative to it (like inotify), not just the directory entry name - allow unprivileged processes to use this new option instead of FAN_REPORT_FID Supporting this simplified API still makes fanotify harder to use than inotify, but retains fanotify's full power while minimizing its API churn for the 99% of users who were already happy with inotify's feature set. > > - there are many precedents of new system calls just to add dfd > > support (fchmodat, execveat, linkat, mkdirat, ....) > > - there are also a few new system calls that were added to make the > > life of a programmer easier even though the same was already possible > > with existing system calls (close_range, process_madvise, pidfd_getfd, > > mount_setattr, ...) > > All those new syscalls add new functionality/security/performance. So does inotify_add_watch_at(). On the other hand, fanotify reduces performance by adding complexity and overhead - more system calls necessary, increased lookup overhead due to variable-length keys instead of 32-bit integers. > If you think they were added to make the life of the programmer easier > you did not understand them. Oh please. Don't be so arrogant.
On Tue, Sep 19, 2023 at 3:28 PM Jan Kara <jack@suse.cz> wrote: > Inotify event is fixed length so > fsid+fhandle is completely out of the realm of "easy extension". (Not quite true, it's variable-length. But that's not relevant if we're talking about adding an optional feature.) If I were to implement this, I'd add a mask bit called, say, IN_FILE_HANDLE. If that bit is enabled on a watch, the (variable-length) inotify_event would be followed by another (variable-length) struct that looks just like fanotify_event_info_fid, containining fsid and file_handle (of course, only if the same bit is also set in inotify_event.mask, which would give the kernel a way to omit it if it's not available for a certain file). This is a backwards-compatible extension because it's opt-in. Only applications which support it will set the IN_FILE_HANDLE bit. Old applications don't set this bit and never see this trailing struct. Max
> > > - there are many precedents of new system calls just to add dfd > > > support (fchmodat, execveat, linkat, mkdirat, ....) > > > - there are also a few new system calls that were added to make the > > > life of a programmer easier even though the same was already possible > > > with existing system calls (close_range, process_madvise, pidfd_getfd, > > > mount_setattr, ...) > > > > All those new syscalls add new functionality/security/performance. > > So does inotify_add_watch_at(). > > On the other hand, fanotify reduces performance by adding complexity > and overhead - more system calls necessary, increased lookup overhead > due to variable-length keys instead of 32-bit integers. > Technical arguments of performance need to be backed up by performance numbers from real life workloads. I am not inventing this stuff as I go. This is how kernel development works. > > If you think they were added to make the life of the programmer easier > > you did not understand them. > > Oh please. Don't be so arrogant. I will try. Please try as well to accept a different POV. Thanks, Amir.
On Tue, Sep 19, 2023 at 12:48 PM Jan Kara <jack@suse.cz> wrote: > > On Mon 18-09-23 21:05:11, Amir Goldstein wrote: > > [Forked from https://lore.kernel.org/linux-fsdevel/20230918123217.932179-1-max.kellermann@ionos.com/] > ... > > BTW, before we can really mark inotify as Obsolete and document that > > inotify was superseded by fanotify, there are at least two items on the > > TODO list [1]: > > Yeah, as I wrote in the original thread, I don't feel like inotify should > be marked as obsolete (at least for some more time) so we are on the same > page here I think. > > > 1. UNMOUNT/IGNORED events > > 2. Filesystems without fid support > > > > MOUNT/UNMOUNT fanotify events have already been discussed > > and the feature has known users. > > > > Christian has also mentioned [1] the IN_UNMOUNT use case for > > waiting for sb shutdown several times and I will not be surprised > > to see systemd starting to use inotify for that use case before too long... > > Yup, both FAN_UNMOUNT and FAN_IGNORED should be easy. Unlike inotify, I'd > just make these explicit events you can opt into and not something you > always get. > > > Regarding the second item on the TODO list, we have had this discussion > > before - if we are going to continue the current policy of opting-in to > > fanotify (i.e. tmpfs, fuse, overlayfs, kernfs), we will always have odd > > filesystems that only support inotify and we will need to keep supporting > > inotify only for the users that use inotify on those odd filesystems. > > > > OTOH, if we implement FAN_REPORT_DINO_NAME, we could > > have fanotify inode mark support for any filesystem, where the > > pinned marked inode ino is the object id. > > Is it a real problem after your work to allow filehandles that are not > necessarily usable for NFS export or open_by_handle()? As far as I remember > fanotify should be now able to handle anything that provides f_fsid in its Not exactly. We still have a requirement for non-empty dentry->d_sb->s_export_op in fanotify_test_fid(), to align with the same requirement for AT_HANDLE_FID support. > statfs(2) call. And as I'm checking filesystems not setting fsid currently are: > > afs, coda, nfs - networking filesystems where inotify and fanotify have > dubious value anyway > > configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore, > ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is > quite limited. But some special cases could be useful. Adding fsid support > is the same amount of trouble as for kernfs - a few LOC. In fact, we > could perhaps add a fstype flag to indicate that this is a filesystem > without persistent identification and so uuid should be autogenerated on > mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid. > This way we could handle all these filesystems with trivial amount of > effort. This sounds good to me. I have a vague memory of suggesting the same and I think Christian had objections, but I may be remembering wrong. Possibly, the same opt-in fstype flag could allow also trivial AT_HANDLE_FID support with the default export_encode_fh()? > > freevxfs - the only real filesystem without f_fsid. Trivial to handle one > way or the other. > > So I don't think we need new uAPI additions to finish off this TODO item. Yes, I'd love that. I can try to post something if there are no objections. Thanks, Amir.
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index b6e6f6ab21f8..8a9096c5ebb1 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -797,6 +797,12 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname, return do_inotify_add_watch(fd, AT_FDCWD, pathname, mask); } +SYSCALL_DEFINE4(inotify_add_watch_at, int, fd, int, dfd, const char __user *, pathname, + u32, mask) +{ + return do_inotify_add_watch(fd, dfd, pathname, mask); +} + SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd) { struct fsnotify_group *group;
This implements a missing piece in the inotify API: referring to a file by a directory file descriptor and a path name. This can be solved in userspace currently only by doing something similar to: int old = open("."); fchdir(dfd); inotify_add_watch(....); fchdir(old); Support for LOOKUP_EMPTY is still missing. We could add another IN_* flag for that (which would clutter the IN_* flags list further) or add a "flags" parameter to the new system call (which would however duplicate features already present via special IN_* flags). To: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> To: linux-fsdevel@vger.kernel.org To: linux-kernel@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> --- fs/notify/inotify/inotify_user.c | 6 ++++++ 1 file changed, 6 insertions(+)