Message ID | 20241029231244.2834368-3-song@kernel.org (mailing list archive) |
---|---|
State | RFC |
Delegated to: | BPF |
Headers | show |
Series | Fanotify fastpath handler | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for bpf-next, async |
netdev/apply | fail | Patch does not apply to bpf-next-0 |
On Tue, Oct 29, 2024 at 04:12:41PM -0700, Song Liu wrote:
> + if (strstr(file_name->name, item->prefix) == (char *)file_name->name)
Huh? "Find the first substring (if any) equal to item->prefix and
then check if that happens to be in the very beginning"???
And you are placing that into the place where it's most likely to cause
the maximal braindamage and spread all over the tree. Wonderful ;-/
Where does that "idiom" come from, anyway? Java? Not the first time
I see that kind of garbage; typecast is an unusual twist, though...
On Wed, Oct 30, 2024 at 12:11:55AM +0000, Al Viro wrote: > On Tue, Oct 29, 2024 at 04:12:41PM -0700, Song Liu wrote: > > + if (strstr(file_name->name, item->prefix) == (char *)file_name->name) > > Huh? "Find the first substring (if any) equal to item->prefix and > then check if that happens to be in the very beginning"??? > > And you are placing that into the place where it's most likely to cause > the maximal braindamage and spread all over the tree. Wonderful ;-/ > > Where does that "idiom" come from, anyway? Java? Not the first time > I see that kind of garbage; typecast is an unusual twist, though... After some coproarchaeology: it's probably even worse than java - in javashit indexOf() predates startsWith() by quite a few years; java itself had both methods from the very beginning. So that's probably where it had been cargo-culted from... The thing is, performance requirements of some garbage script from a javascript-infested webshite are somewhat different from what you want to see anywhere near the kernel-side filesystem event handling...
> On Oct 29, 2024, at 5:11 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Tue, Oct 29, 2024 at 04:12:41PM -0700, Song Liu wrote: >> + if (strstr(file_name->name, item->prefix) == (char *)file_name->name) > > Huh? "Find the first substring (if any) equal to item->prefix and > then check if that happens to be in the very beginning"??? Replaced it with strcmp locally, assuming strncmp is not necessary. > > And you are placing that into the place where it's most likely to cause > the maximal braindamage and spread all over the tree. Wonderful ;-/ > > Where does that "idiom" come from, anyway? Java? Not the first time > I see that kind of garbage; typecast is an unusual twist, though... The strstr() was probably from lack of coffee. The cast is most likely from the compiler yelling at sleepy me. BTW: Why do we need "const unsigned char *" in qstr instead of "const char *"? I do notice git doesn't show the real history of it.. Thanks, Song
> On Oct 29, 2024, at 7:07 PM, Song Liu <songliubraving@meta.com> wrote: > > > >> On Oct 29, 2024, at 5:11 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: >> >> On Tue, Oct 29, 2024 at 04:12:41PM -0700, Song Liu wrote: >>> + if (strstr(file_name->name, item->prefix) == (char *)file_name->name) >> >> Huh? "Find the first substring (if any) equal to item->prefix and >> then check if that happens to be in the very beginning"??? > > Replaced it with strcmp locally, assuming strncmp is not necessary. Oops, we do need strncmp and more coffee..
On Tue, 2024-10-29 at 16:12 -0700, Song Liu wrote: > This fastpath handler filters out events for files with certain prefixes. > To use it: > > [root] insmod fastpath-mod.ko # This requires root. > > [user] ./fastpath-user /tmp a,b,c & # Root is not needed > [user] touch /tmp/aa # a is in the prefix list (a,b,c), no events > [user] touch /tmp/xx # x is not in the prefix list, generates events > > Accessing file xx # this is the output from fastpath_user > > Signed-off-by: Song Liu <song@kernel.org> > --- > MAINTAINERS | 1 + > samples/Kconfig | 20 ++++- > samples/Makefile | 2 +- > samples/fanotify/.gitignore | 1 + > samples/fanotify/Makefile | 5 +- > samples/fanotify/fastpath-mod.c | 138 +++++++++++++++++++++++++++++++ > samples/fanotify/fastpath-user.c | 90 ++++++++++++++++++++ > 7 files changed, 254 insertions(+), 3 deletions(-) > create mode 100644 samples/fanotify/fastpath-mod.c > create mode 100644 samples/fanotify/fastpath-user.c > > diff --git a/MAINTAINERS b/MAINTAINERS > index 7ad507f49324..8939a48b2d99 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -8658,6 +8658,7 @@ S: Maintained > F: fs/notify/fanotify/ > F: include/linux/fanotify.h > F: include/uapi/linux/fanotify.h > +F: samples/fanotify/ > > FARADAY FOTG210 USB2 DUAL-ROLE CONTROLLER > M: Linus Walleij <linus.walleij@linaro.org> > diff --git a/samples/Kconfig b/samples/Kconfig > index b288d9991d27..b0d3dff48bb0 100644 > --- a/samples/Kconfig > +++ b/samples/Kconfig > @@ -149,15 +149,33 @@ config SAMPLE_CONNECTOR > with it. > See also Documentation/driver-api/connector.rst > > +config SAMPLE_FANOTIFY > + bool "Build fanotify monitoring sample" > + depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL > + help > + When enabled, this builds samples for fanotify. > + There multiple samples for fanotify. Please see the > + following configs for more details of these > + samples. > + > config SAMPLE_FANOTIFY_ERROR > bool "Build fanotify error monitoring sample" > - depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL > + depends on SAMPLE_FANOTIFY > help > When enabled, this builds an example code that uses the > FAN_FS_ERROR fanotify mechanism to monitor filesystem > errors. > See also Documentation/admin-guide/filesystem-monitoring.rst. > > +config SAMPLE_FANOTIFY_FASTPATH > + tristate "Build fanotify fastpath sample" > + depends on SAMPLE_FANOTIFY && m > + help > + When enabled, this builds kernel module that contains a > + fanotify fastpath handler. > + The fastpath handler filters out certain filename > + prefixes for the fanotify user. > + > config SAMPLE_HIDRAW > bool "hidraw sample" > depends on CC_CAN_LINK && HEADERS_INSTALL > diff --git a/samples/Makefile b/samples/Makefile > index b85fa64390c5..108360972626 100644 > --- a/samples/Makefile > +++ b/samples/Makefile > @@ -6,7 +6,7 @@ subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs > subdir-$(CONFIG_SAMPLE_CGROUP) += cgroup > obj-$(CONFIG_SAMPLE_CONFIGFS) += configfs/ > obj-$(CONFIG_SAMPLE_CONNECTOR) += connector/ > -obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fanotify/ > +obj-$(CONFIG_SAMPLE_FANOTIFY) += fanotify/ > subdir-$(CONFIG_SAMPLE_HIDRAW) += hidraw > obj-$(CONFIG_SAMPLE_HW_BREAKPOINT) += hw_breakpoint/ > obj-$(CONFIG_SAMPLE_KDB) += kdb/ > diff --git a/samples/fanotify/.gitignore b/samples/fanotify/.gitignore > index d74593e8b2de..306e1ddec4e0 100644 > --- a/samples/fanotify/.gitignore > +++ b/samples/fanotify/.gitignore > @@ -1 +1,2 @@ > fs-monitor > +fastpath-user > diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile > index e20db1bdde3b..f5bbd7380104 100644 > --- a/samples/fanotify/Makefile > +++ b/samples/fanotify/Makefile > @@ -1,5 +1,8 @@ > # SPDX-License-Identifier: GPL-2.0-only > -userprogs-always-y += fs-monitor > +userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fs-monitor > > userccflags += -I usr/include -Wall > > +obj-$(CONFIG_SAMPLE_FANOTIFY_FASTPATH) += fastpath-mod.o > + > +userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_FASTPATH) += fastpath-user > diff --git a/samples/fanotify/fastpath-mod.c b/samples/fanotify/fastpath-mod.c > new file mode 100644 > index 000000000000..06c4b42ff114 > --- /dev/null > +++ b/samples/fanotify/fastpath-mod.c > @@ -0,0 +1,138 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +#include <linux/fsnotify.h> > +#include <linux/fanotify.h> > +#include <linux/list.h> > +#include <linux/module.h> > +#include <linux/string.h> > + > +struct prefix_item { > + const char *prefix; > + struct list_head list; > +}; > + > +struct sample_fp_data { > + /* > + * str_table contains all the prefixes to ignore. For example, > + * "prefix1\0prefix2\0prefix3" > + */ > + char *str_table; > + > + /* item->prefix points to different prefixes in the str_table. */ > + struct list_head item_list; > +}; > + > +static int sample_fp_handler(struct fsnotify_group *group, > + struct fanotify_fastpath_hook *fp_hook, > + struct fanotify_fastpath_event *fp_event) > +{ > + const struct qstr *file_name = fp_event->file_name; > + struct sample_fp_data *fp_data; > + struct prefix_item *item; > + > + if (!file_name) > + return FAN_FP_RET_SEND_TO_USERSPACE; > + fp_data = fp_hook->data; > + > + list_for_each_entry(item, &fp_data->item_list, list) { > + if (strstr(file_name->name, item->prefix) == (char *)file_name->name) > + return FAN_FP_RET_SKIP_EVENT; > + } > + > + return FAN_FP_RET_SEND_TO_USERSPACE; > +} The sample is a little underwhelming and everyone hates string parsing in the kernel ;). It'd be nice to see a more real-world use-case for this. Could this be used to implement subtree filtering? I guess you'd have to walk back up the directory tree and see whether it had a given ancestor? > + > +static int add_item(struct sample_fp_data *fp_data, const char *prev) > +{ > + struct prefix_item *item; > + > + item = kzalloc(sizeof(*item), GFP_KERNEL); > + if (!item) > + return -ENOMEM; > + item->prefix = prev; > + list_add_tail(&item->list, &fp_data->item_list); > + return 0; > +} > + > +static void free_sample_fp_data(struct sample_fp_data *fp_data) > +{ > + struct prefix_item *item, *tmp; > + > + list_for_each_entry_safe(item, tmp, &fp_data->item_list, list) { > + list_del_init(&item->list); > + kfree(item); > + } > + kfree(fp_data->str_table); > + kfree(fp_data); > +} > + > +static int sample_fp_init(struct fanotify_fastpath_hook *fp_hook, const char *args) > +{ > + struct sample_fp_data *fp_data = kzalloc(sizeof(struct sample_fp_data), GFP_KERNEL); > + char *p, *prev; > + int ret; > + > + if (!fp_data) > + return -ENOMEM; > + > + /* Make a copy of the list of prefix to ignore */ > + fp_data->str_table = kstrndup(args, FAN_FP_ARGS_MAX, GFP_KERNEL); > + if (!fp_data->str_table) { > + ret = -ENOMEM; > + goto err_out; > + } > + > + INIT_LIST_HEAD(&fp_data->item_list); > + prev = fp_data->str_table; > + p = fp_data->str_table; > + > + /* Update the list replace ',' with '\n'*/ > + while ((p = strchr(p, ',')) != NULL) { > + *p = '\0'; > + ret = add_item(fp_data, prev); > + if (ret) > + goto err_out; > + p = p + 1; > + prev = p; > + } > + > + ret = add_item(fp_data, prev); > + if (ret) > + goto err_out; > + > + fp_hook->data = fp_data; > + > + return 0; > + > +err_out: > + free_sample_fp_data(fp_data); > + return ret; > +} > + > +static void sample_fp_free(struct fanotify_fastpath_hook *fp_hook) > +{ > + free_sample_fp_data(fp_hook->data); > +} > + > +static struct fanotify_fastpath_ops fan_fp_ignore_a_ops = { > + .fp_handler = sample_fp_handler, > + .fp_init = sample_fp_init, > + .fp_free = sample_fp_free, > + .name = "ignore-prefix", > + .owner = THIS_MODULE, > +}; > + > +static int __init fanotify_fastpath_sample_init(void) > +{ > + return fanotify_fastpath_register(&fan_fp_ignore_a_ops); > +} > +static void __exit fanotify_fastpath_sample_exit(void) > +{ > + fanotify_fastpath_unregister(&fan_fp_ignore_a_ops); > +} > + > +module_init(fanotify_fastpath_sample_init); > +module_exit(fanotify_fastpath_sample_exit); > + > +MODULE_AUTHOR("Song Liu"); > +MODULE_DESCRIPTION("Example fanotify fastpath handler"); > +MODULE_LICENSE("GPL"); > diff --git a/samples/fanotify/fastpath-user.c b/samples/fanotify/fastpath-user.c > new file mode 100644 > index 000000000000..f301c4e0d21a > --- /dev/null > +++ b/samples/fanotify/fastpath-user.c > @@ -0,0 +1,90 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#define _GNU_SOURCE > +#include <err.h> > +#include <fcntl.h> > +#include <stdio.h> > +#include <string.h> > +#include <sys/fanotify.h> > +#include <unistd.h> > +#include <sys/ioctl.h> > + > +static int total_event_cnt; > + > +static void handle_notifications(char *buffer, int len) > +{ > + struct fanotify_event_metadata *event = > + (struct fanotify_event_metadata *) buffer; > + struct fanotify_event_info_header *info; > + struct fanotify_event_info_fid *fid; > + struct file_handle *handle; > + char *name; > + int off; > + > + for (; FAN_EVENT_OK(event, len); event = FAN_EVENT_NEXT(event, len)) { > + for (off = sizeof(*event) ; off < event->event_len; > + off += info->len) { > + info = (struct fanotify_event_info_header *) > + ((char *) event + off); > + switch (info->info_type) { > + case FAN_EVENT_INFO_TYPE_DFID_NAME: > + fid = (struct fanotify_event_info_fid *) info; > + handle = (struct file_handle *)&fid->handle; > + name = (char *)handle + sizeof(*handle) + handle->handle_bytes; > + > + printf("Accessing file %s\n", name); > + total_event_cnt++; > + break; > + default: > + break; > + } > + } > + } > +} > + > +int main(int argc, char **argv) > +{ > + struct fanotify_fastpath_args args = { > + .name = "ignore-prefix", > + .version = 1, > + .flags = 0, > + }; > + char buffer[BUFSIZ]; > + int fd; > + > + if (argc < 3) { > + printf("Usage\n" > + "\t %s <path to monitor> <prefix to ignore>\n", > + argv[0]); > + return 1; > + } > + > + args.init_args = (__u64)argv[2]; > + args.init_args_len = strlen(argv[2]) + 1; > + > + fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_NAME | FAN_REPORT_DIR_FID, O_RDONLY); > + if (fd < 0) > + errx(1, "fanotify_init"); > + > + if (fanotify_mark(fd, FAN_MARK_ADD, > + FAN_OPEN | FAN_ONDIR | FAN_EVENT_ON_CHILD, > + AT_FDCWD, argv[1])) { > + errx(1, "fanotify_mark"); > + } > + > + if (ioctl(fd, FAN_IOC_ADD_FP, &args)) > + errx(1, "ioctl"); > + > + while (total_event_cnt < 10) { > + int n = read(fd, buffer, BUFSIZ); > + > + if (n < 0) > + errx(1, "read"); > + > + handle_notifications(buffer, n); > + } > + > + ioctl(fd, FAN_IOC_DEL_FP); > + close(fd); > + > + return 0; > +}
Hi Jeff, Thanks for your review! I will update 1/2 based on the feedback. (Replying here to save everyone an email..) > On Oct 30, 2024, at 6:03 AM, Jeff Layton <jlayton@kernel.org> wrote: [...] >> + >> +static int sample_fp_handler(struct fsnotify_group *group, >> + struct fanotify_fastpath_hook *fp_hook, >> + struct fanotify_fastpath_event *fp_event) >> +{ >> + const struct qstr *file_name = fp_event->file_name; >> + struct sample_fp_data *fp_data; >> + struct prefix_item *item; >> + >> + if (!file_name) >> + return FAN_FP_RET_SEND_TO_USERSPACE; >> + fp_data = fp_hook->data; >> + >> + list_for_each_entry(item, &fp_data->item_list, list) { >> + if (strstr(file_name->name, item->prefix) == (char *)file_name->name) >> + return FAN_FP_RET_SKIP_EVENT; >> + } >> + >> + return FAN_FP_RET_SEND_TO_USERSPACE; >> +} > > The sample is a little underwhelming and everyone hates string parsing > in the kernel ;). It'd be nice to see a more real-world use-case for > this. > > Could this be used to implement subtree filtering? I guess you'd have > to walk back up the directory tree and see whether it had a given > ancestor? If the subtree is all in the same file system, we can attach fanotify to the whole file system, and then use some dget_parent() and follow_up() to walk up the directory tree in the fastpath handler. However, if there are other mount points in the subtree, we will need more logic to handle these mount points. @Christian, I would like to know your thoughts on this (walking up the directory tree in fanotify fastpath handler). It can be expensive for very very deep subtree. How should we pass in the subtree? I guess we can just use full path in a string as the argument. Thanks, Song
On Wed, 2024-10-30 at 20:30 +0000, Song Liu wrote: > Hi Jeff, > > Thanks for your review! > > I will update 1/2 based on the feedback. (Replying here to save everyone > an email..) > > > On Oct 30, 2024, at 6:03 AM, Jeff Layton <jlayton@kernel.org> wrote: > > [...] > > > > + > > > +static int sample_fp_handler(struct fsnotify_group *group, > > > + struct fanotify_fastpath_hook *fp_hook, > > > + struct fanotify_fastpath_event *fp_event) > > > +{ > > > + const struct qstr *file_name = fp_event->file_name; > > > + struct sample_fp_data *fp_data; > > > + struct prefix_item *item; > > > + > > > + if (!file_name) > > > + return FAN_FP_RET_SEND_TO_USERSPACE; > > > + fp_data = fp_hook->data; > > > + > > > + list_for_each_entry(item, &fp_data->item_list, list) { > > > + if (strstr(file_name->name, item->prefix) == (char *)file_name->name) > > > + return FAN_FP_RET_SKIP_EVENT; > > > + } > > > + > > > + return FAN_FP_RET_SEND_TO_USERSPACE; > > > +} > > > > The sample is a little underwhelming and everyone hates string parsing > > in the kernel ;). It'd be nice to see a more real-world use-case for > > this. > > > > Could this be used to implement subtree filtering? I guess you'd have > > to walk back up the directory tree and see whether it had a given > > ancestor? > > If the subtree is all in the same file system, we can attach fanotify to > the whole file system, and then use some dget_parent() and follow_up() > to walk up the directory tree in the fastpath handler. However, if there > are other mount points in the subtree, we will need more logic to handle > these mount points. > My 2 cents... I'd just confine it to a single vfsmount. If you want to monitor in several submounts, then you need to add new fanotify watches. Alternately, maybe there is some way to designate that an entire vfsmount is a child of a watched (or ignored) directory? > @Christian, I would like to know your thoughts on this (walking up the > directory tree in fanotify fastpath handler). It can be expensive for > very very deep subtree. > I'm not Christian, but I'll make the case for it. It's basically a bunch of pointer chasing. That's probably not "cheap", but if you can do it under RCU it might not be too awful. It might still suck with really deep paths, but this is a sample module. It's not expected that everyone will want to use it anyway. > How should we pass in the subtree? I guess we can just use full path in > a string as the argument. > I'd stay away from string parsing. How about this instead? Allow a process to open a directory fd, and then hand that fd to an fanotify ioctl that says that you want to ignore everything that has that directory as an ancestor. Or, maybe make it so that you only watch dentries that have that directory as an ancestor? I'm not sure what makes the most sense. Then, when you get a dentry-based event, you just walk the d_parent pointers back up to the root of the vfsmount. If one of them matches the dentry in your fd then you're done. If you hit the root of the vfsmount, then you're also done (and know that it's not a child of that dentry).
Hi Jeff, > On Oct 30, 2024, at 5:23 PM, Jeff Layton <jlayton@kernel.org> wrote: [...] >> If the subtree is all in the same file system, we can attach fanotify to >> the whole file system, and then use some dget_parent() and follow_up() >> to walk up the directory tree in the fastpath handler. However, if there >> are other mount points in the subtree, we will need more logic to handle >> these mount points. >> > > My 2 cents... > > I'd just confine it to a single vfsmount. If you want to monitor in > several submounts, then you need to add new fanotify watches. > > Alternately, maybe there is some way to designate that an entire > vfsmount is a child of a watched (or ignored) directory? > >> @Christian, I would like to know your thoughts on this (walking up the >> directory tree in fanotify fastpath handler). It can be expensive for >> very very deep subtree. >> > > I'm not Christian, but I'll make the case for it. It's basically a > bunch of pointer chasing. That's probably not "cheap", but if you can > do it under RCU it might not be too awful. It might still suck with > really deep paths, but this is a sample module. It's not expected that > everyone will want to use it anyway. Thanks for the suggestion! I will try to do it under RCU. > >> How should we pass in the subtree? I guess we can just use full path in >> a string as the argument. >> > > I'd stay away from string parsing. How about this instead? > > Allow a process to open a directory fd, and then hand that fd to an > fanotify ioctl that says that you want to ignore everything that has > that directory as an ancestor. Or, maybe make it so that you only watch > dentries that have that directory as an ancestor? I'm not sure what > makes the most sense. Yes, directory fd is another option. Currently, the "attach to group" function only takes a string as input. I guess it makes sense to allow taking a fd, or maybe we should allow any random format (pass in a pointer to a structure. Let me give it a try. Thanks again! Song > > Then, when you get a dentry-based event, you just walk the d_parent > pointers back up to the root of the vfsmount. If one of them matches > the dentry in your fd then you're done. If you hit the root of the > vfsmount, then you're also done (and know that it's not a child of that > dentry). > -- > Jeff Layton <jlayton@kernel.org>
On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote: > > Hi Jeff, > > > On Oct 30, 2024, at 5:23 PM, Jeff Layton <jlayton@kernel.org> wrote: > > [...] > > >> If the subtree is all in the same file system, we can attach fanotify to > >> the whole file system, and then use some dget_parent() and follow_up() > >> to walk up the directory tree in the fastpath handler. However, if there > >> are other mount points in the subtree, we will need more logic to handle > >> these mount points. > >> > > > > My 2 cents... > > > > I'd just confine it to a single vfsmount. If you want to monitor in > > several submounts, then you need to add new fanotify watches. > > Agreed. > > Alternately, maybe there is some way to designate that an entire > > vfsmount is a child of a watched (or ignored) directory? > > > >> @Christian, I would like to know your thoughts on this (walking up the > >> directory tree in fanotify fastpath handler). It can be expensive for > >> very very deep subtree. > >> > > > > I'm not Christian, but I'll make the case for it. It's basically a > > bunch of pointer chasing. That's probably not "cheap", but if you can > > do it under RCU it might not be too awful. It might still suck with > > really deep paths, but this is a sample module. It's not expected that > > everyone will want to use it anyway. > > Thanks for the suggestion! I will try to do it under RCU. > That's the cost of doing a subtree filter. Not sure how it could be avoided? > > > >> How should we pass in the subtree? I guess we can just use full path in > >> a string as the argument. > >> > > > > I'd stay away from string parsing. How about this instead? > > > > Allow a process to open a directory fd, and then hand that fd to an > > fanotify ioctl that says that you want to ignore everything that has > > that directory as an ancestor. Or, maybe make it so that you only watch > > dentries that have that directory as an ancestor? I'm not sure what > > makes the most sense. > > Yes, directory fd is another option. Currently, the "attach to group" > function only takes a string as input. I guess it makes sense to allow > taking a fd, or maybe we should allow any random format (pass in a > pointer to a structure. Let me give it a try. > > Thanks again! > > Song > > > > > Then, when you get a dentry-based event, you just walk the d_parent > > pointers back up to the root of the vfsmount. If one of them matches > > the dentry in your fd then you're done. If you hit the root of the > > vfsmount, then you're also done (and know that it's not a child of that > > dentry). Agreed. Thanks, Amir.
Hi Amir, > On Nov 6, 2024, at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote: > > On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote: >> >> Hi Jeff, >> >>> On Oct 30, 2024, at 5:23 PM, Jeff Layton <jlayton@kernel.org> wrote: >> >> [...] >> >>>> If the subtree is all in the same file system, we can attach fanotify to >>>> the whole file system, and then use some dget_parent() and follow_up() >>>> to walk up the directory tree in the fastpath handler. However, if there >>>> are other mount points in the subtree, we will need more logic to handle >>>> these mount points. >>>> >>> >>> My 2 cents... >>> >>> I'd just confine it to a single vfsmount. If you want to monitor in >>> several submounts, then you need to add new fanotify watches. >>> > > Agreed. > >>> Alternately, maybe there is some way to designate that an entire >>> vfsmount is a child of a watched (or ignored) directory? >>> >>>> @Christian, I would like to know your thoughts on this (walking up the >>>> directory tree in fanotify fastpath handler). It can be expensive for >>>> very very deep subtree. >>>> >>> >>> I'm not Christian, but I'll make the case for it. It's basically a >>> bunch of pointer chasing. That's probably not "cheap", but if you can >>> do it under RCU it might not be too awful. It might still suck with >>> really deep paths, but this is a sample module. It's not expected that >>> everyone will want to use it anyway. >> >> Thanks for the suggestion! I will try to do it under RCU. >> > > That's the cost of doing a subtree filter. > Not sure how it could be avoided? [...] >> >>> >>> Then, when you get a dentry-based event, you just walk the d_parent >>> pointers back up to the root of the vfsmount. If one of them matches >>> the dentry in your fd then you're done. If you hit the root of the >>> vfsmount, then you're also done (and know that it's not a child of that >>> dentry). Thanks for your feedback. I will update the set based on Jeff's suggestion. Song
On Wed 06-11-24 20:40:50, Amir Goldstein wrote: > On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote: > > > Alternately, maybe there is some way to designate that an entire > > > vfsmount is a child of a watched (or ignored) directory? > > > > > >> @Christian, I would like to know your thoughts on this (walking up the > > >> directory tree in fanotify fastpath handler). It can be expensive for > > >> very very deep subtree. > > >> > > > > > > I'm not Christian, but I'll make the case for it. It's basically a > > > bunch of pointer chasing. That's probably not "cheap", but if you can > > > do it under RCU it might not be too awful. It might still suck with > > > really deep paths, but this is a sample module. It's not expected that > > > everyone will want to use it anyway. > > > > Thanks for the suggestion! I will try to do it under RCU. > > > > That's the cost of doing a subtree filter. > Not sure how it could be avoided? Yes. For a real solution (not this example), we'd probably have to limit the parent walk to say 16 steps and if we can reach neither root nor our dir in that number of steps, we'll just chicken out and pass the event to userspace to deal with it. This way the kernel filter will deal with most cases anyway and we won't risk livelocking or too big performance overhead of the filter. For this example, I think using fs/dcache.c:is_subdir() will be OK for demonstration purposes. Honza
On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote: > > Hi Jeff, > > > On Oct 30, 2024, at 5:23 PM, Jeff Layton <jlayton@kernel.org> wrote: > > [...] > > >> If the subtree is all in the same file system, we can attach fanotify to > >> the whole file system, and then use some dget_parent() and follow_up() > >> to walk up the directory tree in the fastpath handler. However, if there > >> are other mount points in the subtree, we will need more logic to handle > >> these mount points. > >> > > > > My 2 cents... > > > > I'd just confine it to a single vfsmount. If you want to monitor in > > several submounts, then you need to add new fanotify watches. > > > > Alternately, maybe there is some way to designate that an entire > > vfsmount is a child of a watched (or ignored) directory? > > > >> @Christian, I would like to know your thoughts on this (walking up the > >> directory tree in fanotify fastpath handler). It can be expensive for > >> very very deep subtree. > >> > > > > I'm not Christian, but I'll make the case for it. It's basically a > > bunch of pointer chasing. That's probably not "cheap", but if you can > > do it under RCU it might not be too awful. It might still suck with > > really deep paths, but this is a sample module. It's not expected that > > everyone will want to use it anyway. > > Thanks for the suggestion! I will try to do it under RCU. > > > > >> How should we pass in the subtree? I guess we can just use full path in > >> a string as the argument. > >> > > > > I'd stay away from string parsing. How about this instead? > > > > Allow a process to open a directory fd, and then hand that fd to an > > fanotify ioctl that says that you want to ignore everything that has > > that directory as an ancestor. Or, maybe make it so that you only watch > > dentries that have that directory as an ancestor? I'm not sure what > > makes the most sense. > > Yes, directory fd is another option. Currently, the "attach to group" > function only takes a string as input. I guess it makes sense to allow > taking a fd, or maybe we should allow any random format (pass in a > pointer to a structure. Let me give it a try. > IIUC, the BFP program example uses another API to configure the filter (i.e. the inode map). IMO, passing any single argument during setup time is not scalable and any filter should have its own way to reconfigure its parameters in runtime (i.e. add/remove watched subtree). Assuming that the same module/bfp_prog serves multiple fanotify groups and each group may have a different filter config, I think that passing an integer arg to identify the config (be it fd or something else) is the most we need for this minimal API. If we need something more elaborate, we can extend the ioctl size or add a new ioctl later. Thanks, Amir.
> On Nov 7, 2024, at 3:19 AM, Amir Goldstein <amir73il@gmail.com> wrote: > > On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote: >> >> Hi Jeff, >> >>> On Oct 30, 2024, at 5:23 PM, Jeff Layton <jlayton@kernel.org> wrote: >> >> [...] >> >>>> If the subtree is all in the same file system, we can attach fanotify to >>>> the whole file system, and then use some dget_parent() and follow_up() >>>> to walk up the directory tree in the fastpath handler. However, if there >>>> are other mount points in the subtree, we will need more logic to handle >>>> these mount points. >>>> >>> >>> My 2 cents... >>> >>> I'd just confine it to a single vfsmount. If you want to monitor in >>> several submounts, then you need to add new fanotify watches. >>> >>> Alternately, maybe there is some way to designate that an entire >>> vfsmount is a child of a watched (or ignored) directory? >>> >>>> @Christian, I would like to know your thoughts on this (walking up the >>>> directory tree in fanotify fastpath handler). It can be expensive for >>>> very very deep subtree. >>>> >>> >>> I'm not Christian, but I'll make the case for it. It's basically a >>> bunch of pointer chasing. That's probably not "cheap", but if you can >>> do it under RCU it might not be too awful. It might still suck with >>> really deep paths, but this is a sample module. It's not expected that >>> everyone will want to use it anyway. >> >> Thanks for the suggestion! I will try to do it under RCU. >> >>> >>>> How should we pass in the subtree? I guess we can just use full path in >>>> a string as the argument. >>>> >>> >>> I'd stay away from string parsing. How about this instead? >>> >>> Allow a process to open a directory fd, and then hand that fd to an >>> fanotify ioctl that says that you want to ignore everything that has >>> that directory as an ancestor. Or, maybe make it so that you only watch >>> dentries that have that directory as an ancestor? I'm not sure what >>> makes the most sense. >> >> Yes, directory fd is another option. Currently, the "attach to group" >> function only takes a string as input. I guess it makes sense to allow >> taking a fd, or maybe we should allow any random format (pass in a >> pointer to a structure. Let me give it a try. >> > > IIUC, the BFP program example uses another API to configure the filter > (i.e. the inode map). With BPF, the users can configure the filter via different BPF maps. The inode map is just one example, we can also use task map to create a different filter for each task (task that generates the event). > IMO, passing any single argument during setup time is not scalable > and any filter should have its own way to reconfigure its parameters > in runtime (i.e. add/remove watched subtree). > > Assuming that the same module/bfp_prog serves multiple fanotify > groups and each group may have a different filter config, I think that > passing an integer arg to identify the config (be it fd or something else) > is the most we need for this minimal API. > If we need something more elaborate, we can extend the ioctl size > or add a new ioctl later. With my local code, which is slightly different to the RFC, I have the ioctl pass in a pointer to fanotify_fastpath_args. struct fanotify_fastpath_args { char name[FAN_FP_NAME_MAX]; __u32 version; __u32 flags; /* * user space pointer to the init args of fastpath handler, * up to init_args_len (<= FAN_FP_ARGS_MAX). */ __u64 init_args; /* size of init_args */ __u32 init_args_size; } __attribute__((__packed__)); fanotify_fastpath_args->init_args is a user pointer to a custom (per fast path) structure. Then fanotify_fastpath_args->init_args will be passed to fanotify_fastpath_ops->fp_init(). I think this is flexible enough for the "attach fast path to a group" operation. If we want to reconfigure the fast path later, we may need another API. Thanks, Song
> On Nov 7, 2024, at 2:41 AM, Jan Kara <jack@suse.cz> wrote: > > On Wed 06-11-24 20:40:50, Amir Goldstein wrote: >> On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote: >>>> Alternately, maybe there is some way to designate that an entire >>>> vfsmount is a child of a watched (or ignored) directory? >>>> >>>>> @Christian, I would like to know your thoughts on this (walking up the >>>>> directory tree in fanotify fastpath handler). It can be expensive for >>>>> very very deep subtree. >>>>> >>>> >>>> I'm not Christian, but I'll make the case for it. It's basically a >>>> bunch of pointer chasing. That's probably not "cheap", but if you can >>>> do it under RCU it might not be too awful. It might still suck with >>>> really deep paths, but this is a sample module. It's not expected that >>>> everyone will want to use it anyway. >>> >>> Thanks for the suggestion! I will try to do it under RCU. >>> >> >> That's the cost of doing a subtree filter. >> Not sure how it could be avoided? > > Yes. For a real solution (not this example), we'd probably have to limit > the parent walk to say 16 steps and if we can reach neither root nor our > dir in that number of steps, we'll just chicken out and pass the event to > userspace to deal with it. This way the kernel filter will deal with most > cases anyway and we won't risk livelocking or too big performance overhead > of the filter. > > For this example, I think using fs/dcache.c:is_subdir() will be OK for > demonstration purposes. Thanks for the pointer! It does look like a good solution for this example. Song
diff --git a/MAINTAINERS b/MAINTAINERS index 7ad507f49324..8939a48b2d99 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8658,6 +8658,7 @@ S: Maintained F: fs/notify/fanotify/ F: include/linux/fanotify.h F: include/uapi/linux/fanotify.h +F: samples/fanotify/ FARADAY FOTG210 USB2 DUAL-ROLE CONTROLLER M: Linus Walleij <linus.walleij@linaro.org> diff --git a/samples/Kconfig b/samples/Kconfig index b288d9991d27..b0d3dff48bb0 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -149,15 +149,33 @@ config SAMPLE_CONNECTOR with it. See also Documentation/driver-api/connector.rst +config SAMPLE_FANOTIFY + bool "Build fanotify monitoring sample" + depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL + help + When enabled, this builds samples for fanotify. + There multiple samples for fanotify. Please see the + following configs for more details of these + samples. + config SAMPLE_FANOTIFY_ERROR bool "Build fanotify error monitoring sample" - depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL + depends on SAMPLE_FANOTIFY help When enabled, this builds an example code that uses the FAN_FS_ERROR fanotify mechanism to monitor filesystem errors. See also Documentation/admin-guide/filesystem-monitoring.rst. +config SAMPLE_FANOTIFY_FASTPATH + tristate "Build fanotify fastpath sample" + depends on SAMPLE_FANOTIFY && m + help + When enabled, this builds kernel module that contains a + fanotify fastpath handler. + The fastpath handler filters out certain filename + prefixes for the fanotify user. + config SAMPLE_HIDRAW bool "hidraw sample" depends on CC_CAN_LINK && HEADERS_INSTALL diff --git a/samples/Makefile b/samples/Makefile index b85fa64390c5..108360972626 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -6,7 +6,7 @@ subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs subdir-$(CONFIG_SAMPLE_CGROUP) += cgroup obj-$(CONFIG_SAMPLE_CONFIGFS) += configfs/ obj-$(CONFIG_SAMPLE_CONNECTOR) += connector/ -obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fanotify/ +obj-$(CONFIG_SAMPLE_FANOTIFY) += fanotify/ subdir-$(CONFIG_SAMPLE_HIDRAW) += hidraw obj-$(CONFIG_SAMPLE_HW_BREAKPOINT) += hw_breakpoint/ obj-$(CONFIG_SAMPLE_KDB) += kdb/ diff --git a/samples/fanotify/.gitignore b/samples/fanotify/.gitignore index d74593e8b2de..306e1ddec4e0 100644 --- a/samples/fanotify/.gitignore +++ b/samples/fanotify/.gitignore @@ -1 +1,2 @@ fs-monitor +fastpath-user diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile index e20db1bdde3b..f5bbd7380104 100644 --- a/samples/fanotify/Makefile +++ b/samples/fanotify/Makefile @@ -1,5 +1,8 @@ # SPDX-License-Identifier: GPL-2.0-only -userprogs-always-y += fs-monitor +userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fs-monitor userccflags += -I usr/include -Wall +obj-$(CONFIG_SAMPLE_FANOTIFY_FASTPATH) += fastpath-mod.o + +userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_FASTPATH) += fastpath-user diff --git a/samples/fanotify/fastpath-mod.c b/samples/fanotify/fastpath-mod.c new file mode 100644 index 000000000000..06c4b42ff114 --- /dev/null +++ b/samples/fanotify/fastpath-mod.c @@ -0,0 +1,138 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include <linux/fsnotify.h> +#include <linux/fanotify.h> +#include <linux/list.h> +#include <linux/module.h> +#include <linux/string.h> + +struct prefix_item { + const char *prefix; + struct list_head list; +}; + +struct sample_fp_data { + /* + * str_table contains all the prefixes to ignore. For example, + * "prefix1\0prefix2\0prefix3" + */ + char *str_table; + + /* item->prefix points to different prefixes in the str_table. */ + struct list_head item_list; +}; + +static int sample_fp_handler(struct fsnotify_group *group, + struct fanotify_fastpath_hook *fp_hook, + struct fanotify_fastpath_event *fp_event) +{ + const struct qstr *file_name = fp_event->file_name; + struct sample_fp_data *fp_data; + struct prefix_item *item; + + if (!file_name) + return FAN_FP_RET_SEND_TO_USERSPACE; + fp_data = fp_hook->data; + + list_for_each_entry(item, &fp_data->item_list, list) { + if (strstr(file_name->name, item->prefix) == (char *)file_name->name) + return FAN_FP_RET_SKIP_EVENT; + } + + return FAN_FP_RET_SEND_TO_USERSPACE; +} + +static int add_item(struct sample_fp_data *fp_data, const char *prev) +{ + struct prefix_item *item; + + item = kzalloc(sizeof(*item), GFP_KERNEL); + if (!item) + return -ENOMEM; + item->prefix = prev; + list_add_tail(&item->list, &fp_data->item_list); + return 0; +} + +static void free_sample_fp_data(struct sample_fp_data *fp_data) +{ + struct prefix_item *item, *tmp; + + list_for_each_entry_safe(item, tmp, &fp_data->item_list, list) { + list_del_init(&item->list); + kfree(item); + } + kfree(fp_data->str_table); + kfree(fp_data); +} + +static int sample_fp_init(struct fanotify_fastpath_hook *fp_hook, const char *args) +{ + struct sample_fp_data *fp_data = kzalloc(sizeof(struct sample_fp_data), GFP_KERNEL); + char *p, *prev; + int ret; + + if (!fp_data) + return -ENOMEM; + + /* Make a copy of the list of prefix to ignore */ + fp_data->str_table = kstrndup(args, FAN_FP_ARGS_MAX, GFP_KERNEL); + if (!fp_data->str_table) { + ret = -ENOMEM; + goto err_out; + } + + INIT_LIST_HEAD(&fp_data->item_list); + prev = fp_data->str_table; + p = fp_data->str_table; + + /* Update the list replace ',' with '\n'*/ + while ((p = strchr(p, ',')) != NULL) { + *p = '\0'; + ret = add_item(fp_data, prev); + if (ret) + goto err_out; + p = p + 1; + prev = p; + } + + ret = add_item(fp_data, prev); + if (ret) + goto err_out; + + fp_hook->data = fp_data; + + return 0; + +err_out: + free_sample_fp_data(fp_data); + return ret; +} + +static void sample_fp_free(struct fanotify_fastpath_hook *fp_hook) +{ + free_sample_fp_data(fp_hook->data); +} + +static struct fanotify_fastpath_ops fan_fp_ignore_a_ops = { + .fp_handler = sample_fp_handler, + .fp_init = sample_fp_init, + .fp_free = sample_fp_free, + .name = "ignore-prefix", + .owner = THIS_MODULE, +}; + +static int __init fanotify_fastpath_sample_init(void) +{ + return fanotify_fastpath_register(&fan_fp_ignore_a_ops); +} +static void __exit fanotify_fastpath_sample_exit(void) +{ + fanotify_fastpath_unregister(&fan_fp_ignore_a_ops); +} + +module_init(fanotify_fastpath_sample_init); +module_exit(fanotify_fastpath_sample_exit); + +MODULE_AUTHOR("Song Liu"); +MODULE_DESCRIPTION("Example fanotify fastpath handler"); +MODULE_LICENSE("GPL"); diff --git a/samples/fanotify/fastpath-user.c b/samples/fanotify/fastpath-user.c new file mode 100644 index 000000000000..f301c4e0d21a --- /dev/null +++ b/samples/fanotify/fastpath-user.c @@ -0,0 +1,90 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <err.h> +#include <fcntl.h> +#include <stdio.h> +#include <string.h> +#include <sys/fanotify.h> +#include <unistd.h> +#include <sys/ioctl.h> + +static int total_event_cnt; + +static void handle_notifications(char *buffer, int len) +{ + struct fanotify_event_metadata *event = + (struct fanotify_event_metadata *) buffer; + struct fanotify_event_info_header *info; + struct fanotify_event_info_fid *fid; + struct file_handle *handle; + char *name; + int off; + + for (; FAN_EVENT_OK(event, len); event = FAN_EVENT_NEXT(event, len)) { + for (off = sizeof(*event) ; off < event->event_len; + off += info->len) { + info = (struct fanotify_event_info_header *) + ((char *) event + off); + switch (info->info_type) { + case FAN_EVENT_INFO_TYPE_DFID_NAME: + fid = (struct fanotify_event_info_fid *) info; + handle = (struct file_handle *)&fid->handle; + name = (char *)handle + sizeof(*handle) + handle->handle_bytes; + + printf("Accessing file %s\n", name); + total_event_cnt++; + break; + default: + break; + } + } + } +} + +int main(int argc, char **argv) +{ + struct fanotify_fastpath_args args = { + .name = "ignore-prefix", + .version = 1, + .flags = 0, + }; + char buffer[BUFSIZ]; + int fd; + + if (argc < 3) { + printf("Usage\n" + "\t %s <path to monitor> <prefix to ignore>\n", + argv[0]); + return 1; + } + + args.init_args = (__u64)argv[2]; + args.init_args_len = strlen(argv[2]) + 1; + + fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_NAME | FAN_REPORT_DIR_FID, O_RDONLY); + if (fd < 0) + errx(1, "fanotify_init"); + + if (fanotify_mark(fd, FAN_MARK_ADD, + FAN_OPEN | FAN_ONDIR | FAN_EVENT_ON_CHILD, + AT_FDCWD, argv[1])) { + errx(1, "fanotify_mark"); + } + + if (ioctl(fd, FAN_IOC_ADD_FP, &args)) + errx(1, "ioctl"); + + while (total_event_cnt < 10) { + int n = read(fd, buffer, BUFSIZ); + + if (n < 0) + errx(1, "read"); + + handle_notifications(buffer, n); + } + + ioctl(fd, FAN_IOC_DEL_FP); + close(fd); + + return 0; +}
This fastpath handler filters out events for files with certain prefixes. To use it: [root] insmod fastpath-mod.ko # This requires root. [user] ./fastpath-user /tmp a,b,c & # Root is not needed [user] touch /tmp/aa # a is in the prefix list (a,b,c), no events [user] touch /tmp/xx # x is not in the prefix list, generates events Accessing file xx # this is the output from fastpath_user Signed-off-by: Song Liu <song@kernel.org> --- MAINTAINERS | 1 + samples/Kconfig | 20 ++++- samples/Makefile | 2 +- samples/fanotify/.gitignore | 1 + samples/fanotify/Makefile | 5 +- samples/fanotify/fastpath-mod.c | 138 +++++++++++++++++++++++++++++++ samples/fanotify/fastpath-user.c | 90 ++++++++++++++++++++ 7 files changed, 254 insertions(+), 3 deletions(-) create mode 100644 samples/fanotify/fastpath-mod.c create mode 100644 samples/fanotify/fastpath-user.c