Message ID | 20200612093343.5669-2-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Prep work for fanotify named events | expand |
On Fri 12-06-20 12:33:24, Amir Goldstein wrote: > From: Mel Gorman <mgorman@techsingularity.net> > > The fsnotify paths are trivial to hit even when there are no watchers and > they are surprisingly expensive. For example, every successful vfs_write() > hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless > FMODE_NONOTIFY is set which is an internal flag invisible to userspace. > As it stands, fsnotify_parent is a guaranteed functional call even if there > are no watchers and fsnotify() does a substantial amount of unnecessary > work before it checks if there are any watchers. A perf profile showed > that applying mnt->mnt_fsnotify_mask in fnotify() was almost half of the > total samples taken in that function during a test. This patch rearranges > the fast paths to reduce the amount of work done when there are no > watchers. > > The test motivating this was "perf bench sched messaging --pipe". Despite > the fact the pipes are anonymous, fsnotify is still called a lot and > the overhead is noticeable even though it's completely pointless. It's > likely the overhead is negligible for real IO so this is an extreme > example. This is a comparison of hackbench using processes and pipes on > a 1-socket machine with 8 CPU threads without fanotify watchers. > > 5.7.0 5.7.0 > vanilla fastfsnotify-v1r1 > Amean 1 0.4837 ( 0.00%) 0.4630 * 4.27%* > Amean 3 1.5447 ( 0.00%) 1.4557 ( 5.76%) > Amean 5 2.6037 ( 0.00%) 2.4363 ( 6.43%) > Amean 7 3.5987 ( 0.00%) 3.4757 ( 3.42%) > Amean 12 5.8267 ( 0.00%) 5.6983 ( 2.20%) > Amean 18 8.4400 ( 0.00%) 8.1327 ( 3.64%) > Amean 24 11.0187 ( 0.00%) 10.0290 * 8.98%* > Amean 30 13.1013 ( 0.00%) 12.8510 ( 1.91%) > Amean 32 13.9190 ( 0.00%) 13.2410 ( 4.87%) > > 5.7.0 5.7.0 > vanilla fastfsnotify-v1r1 > Duration User 157.05 152.79 > Duration System 1279.98 1219.32 > Duration Elapsed 182.81 174.52 > > This is showing that the latencies are improved by roughly 2-9%. The > variability is not shown but some of these results are within the noise > as this workload heavily overloads the machine. That said, the system CPU > usage is reduced by quite a bit so it makes sense to avoid the overhead > even if it is a bit tricky to detect at times. A perf profile of just 1 > group of tasks showed that 5.14% of samples taken were in either fsnotify() > or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify, > mostly function entry and the initial check for watchers. The check for > watchers is complicated enough that inlining it may be controversial. > > [Amir] Slightly simplify with mnt_or_sb_mask => marks_mask > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > fs/notify/fsnotify.c | 27 +++++++++++++++------------ > include/linux/fsnotify.h | 10 ++++++++++ > include/linux/fsnotify_backend.h | 4 ++-- > 3 files changed, 27 insertions(+), 14 deletions(-) > > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c > index 72d332ce8e12..d59a58d10b84 100644 > --- a/fs/notify/fsnotify.c > +++ b/fs/notify/fsnotify.c > @@ -143,7 +143,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode) > } > > /* Notify this dentry's parent about a child's events. */ > -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, > +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, > int data_type) > { > struct dentry *parent; Hum, should we actually remove the DCACHE_FSNOTIFY_PARENT_WATCHED check from here when it's moved to fsnotify_parent() inline helper? > @@ -315,17 +315,11 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, > struct fsnotify_iter_info iter_info = {}; > struct super_block *sb = to_tell->i_sb; > struct mount *mnt = NULL; > - __u32 mnt_or_sb_mask = sb->s_fsnotify_mask; > int ret = 0; > - __u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS); > + __u32 test_mask, marks_mask; > > - if (path) { > + if (path) > mnt = real_mount(path->mnt); > - mnt_or_sb_mask |= mnt->mnt_fsnotify_mask; > - } > - /* An event "on child" is not intended for a mount/sb mark */ > - if (mask & FS_EVENT_ON_CHILD) > - mnt_or_sb_mask = 0; > > /* > * Optimization: srcu_read_lock() has a memory barrier which can > @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, > if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks && > (!mnt || !mnt->mnt_fsnotify_marks)) > return 0; > + > + /* An event "on child" is not intended for a mount/sb mark */ > + marks_mask = to_tell->i_fsnotify_mask; > + if (!(mask & FS_EVENT_ON_CHILD)) { > + marks_mask |= sb->s_fsnotify_mask; > + if (mnt) > + marks_mask |= mnt->mnt_fsnotify_mask; > + } > + > /* > * if this is a modify event we may need to clear the ignored masks > * otherwise return if neither the inode nor the vfsmount/sb care about > * this type of event. > */ > - if (!(mask & FS_MODIFY) && > - !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask))) > + test_mask = (mask & ALL_FSNOTIFY_EVENTS); > + if (!(mask & FS_MODIFY) && !(test_mask & marks_mask)) > return 0; Otherwise the patch looks good. One observation though: The (mask & FS_MODIFY) check means that all vfs_write() calls end up going through the "slower" path iterating all mark types and checking whether there are marks anyway. That could be relatively simply optimized using a hidden mask flag like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark needing special handling of FS_MODIFY... Not sure if we care enough at this point... Honza
On Fri, Jul 3, 2020 at 5:03 PM Jan Kara <jack@suse.cz> wrote: > > On Fri 12-06-20 12:33:24, Amir Goldstein wrote: > > From: Mel Gorman <mgorman@techsingularity.net> > > > > The fsnotify paths are trivial to hit even when there are no watchers and > > they are surprisingly expensive. For example, every successful vfs_write() > > hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless > > FMODE_NONOTIFY is set which is an internal flag invisible to userspace. > > As it stands, fsnotify_parent is a guaranteed functional call even if there > > are no watchers and fsnotify() does a substantial amount of unnecessary > > work before it checks if there are any watchers. A perf profile showed > > that applying mnt->mnt_fsnotify_mask in fnotify() was almost half of the > > total samples taken in that function during a test. This patch rearranges > > the fast paths to reduce the amount of work done when there are no > > watchers. > > > > The test motivating this was "perf bench sched messaging --pipe". Despite > > the fact the pipes are anonymous, fsnotify is still called a lot and > > the overhead is noticeable even though it's completely pointless. It's > > likely the overhead is negligible for real IO so this is an extreme > > example. This is a comparison of hackbench using processes and pipes on > > a 1-socket machine with 8 CPU threads without fanotify watchers. > > > > 5.7.0 5.7.0 > > vanilla fastfsnotify-v1r1 > > Amean 1 0.4837 ( 0.00%) 0.4630 * 4.27%* > > Amean 3 1.5447 ( 0.00%) 1.4557 ( 5.76%) > > Amean 5 2.6037 ( 0.00%) 2.4363 ( 6.43%) > > Amean 7 3.5987 ( 0.00%) 3.4757 ( 3.42%) > > Amean 12 5.8267 ( 0.00%) 5.6983 ( 2.20%) > > Amean 18 8.4400 ( 0.00%) 8.1327 ( 3.64%) > > Amean 24 11.0187 ( 0.00%) 10.0290 * 8.98%* > > Amean 30 13.1013 ( 0.00%) 12.8510 ( 1.91%) > > Amean 32 13.9190 ( 0.00%) 13.2410 ( 4.87%) > > > > 5.7.0 5.7.0 > > vanilla fastfsnotify-v1r1 > > Duration User 157.05 152.79 > > Duration System 1279.98 1219.32 > > Duration Elapsed 182.81 174.52 > > > > This is showing that the latencies are improved by roughly 2-9%. The > > variability is not shown but some of these results are within the noise > > as this workload heavily overloads the machine. That said, the system CPU > > usage is reduced by quite a bit so it makes sense to avoid the overhead > > even if it is a bit tricky to detect at times. A perf profile of just 1 > > group of tasks showed that 5.14% of samples taken were in either fsnotify() > > or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify, > > mostly function entry and the initial check for watchers. The check for > > watchers is complicated enough that inlining it may be controversial. > > > > [Amir] Slightly simplify with mnt_or_sb_mask => marks_mask > > > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > fs/notify/fsnotify.c | 27 +++++++++++++++------------ > > include/linux/fsnotify.h | 10 ++++++++++ > > include/linux/fsnotify_backend.h | 4 ++-- > > 3 files changed, 27 insertions(+), 14 deletions(-) > > > > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c > > index 72d332ce8e12..d59a58d10b84 100644 > > --- a/fs/notify/fsnotify.c > > +++ b/fs/notify/fsnotify.c > > @@ -143,7 +143,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode) > > } > > > > /* Notify this dentry's parent about a child's events. */ > > -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, > > +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, > > int data_type) > > { > > struct dentry *parent; > > Hum, should we actually remove the DCACHE_FSNOTIFY_PARENT_WATCHED check > from here when it's moved to fsnotify_parent() inline helper? No point. It is making a comeback on: fsnotify: send event with parent/name info to sb/mount/non-dir marks > > > @@ -315,17 +315,11 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, > > struct fsnotify_iter_info iter_info = {}; > > struct super_block *sb = to_tell->i_sb; > > struct mount *mnt = NULL; > > - __u32 mnt_or_sb_mask = sb->s_fsnotify_mask; > > int ret = 0; > > - __u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS); > > + __u32 test_mask, marks_mask; > > > > - if (path) { > > + if (path) > > mnt = real_mount(path->mnt); > > - mnt_or_sb_mask |= mnt->mnt_fsnotify_mask; > > - } > > - /* An event "on child" is not intended for a mount/sb mark */ > > - if (mask & FS_EVENT_ON_CHILD) > > - mnt_or_sb_mask = 0; > > > > /* > > * Optimization: srcu_read_lock() has a memory barrier which can > > @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, > > if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks && > > (!mnt || !mnt->mnt_fsnotify_marks)) > > return 0; > > + > > + /* An event "on child" is not intended for a mount/sb mark */ > > + marks_mask = to_tell->i_fsnotify_mask; > > + if (!(mask & FS_EVENT_ON_CHILD)) { > > + marks_mask |= sb->s_fsnotify_mask; > > + if (mnt) > > + marks_mask |= mnt->mnt_fsnotify_mask; > > + } > > + > > /* > > * if this is a modify event we may need to clear the ignored masks > > * otherwise return if neither the inode nor the vfsmount/sb care about > > * this type of event. > > */ > > - if (!(mask & FS_MODIFY) && > > - !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask))) > > + test_mask = (mask & ALL_FSNOTIFY_EVENTS); > > + if (!(mask & FS_MODIFY) && !(test_mask & marks_mask)) > > return 0; > > Otherwise the patch looks good. One observation though: The (mask & > FS_MODIFY) check means that all vfs_write() calls end up going through the > "slower" path iterating all mark types and checking whether there are marks > anyway. That could be relatively simply optimized using a hidden mask flag > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark > needing special handling of FS_MODIFY... Not sure if we care enough at this > point... Yeh that sounds low hanging. Actually, I Don't think we need to define a flag for that. __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed. I will take a look at that as part of FS_PRE_MODIFY work. But in general, we should fight the urge to optimize theoretic performance issues... Thanks, Amir.
On Sat 04-07-20 12:30:10, Amir Goldstein wrote: > On Fri, Jul 3, 2020 at 5:03 PM Jan Kara <jack@suse.cz> wrote: > > > /* Notify this dentry's parent about a child's events. */ > > > -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, > > > +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, > > > int data_type) > > > { > > > struct dentry *parent; > > > > Hum, should we actually remove the DCACHE_FSNOTIFY_PARENT_WATCHED check > > from here when it's moved to fsnotify_parent() inline helper? > > No point. > It is making a comeback on: > fsnotify: send event with parent/name info to sb/mount/non-dir marks Right, I've noticed that later as well. > > > @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, > > > if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks && > > > (!mnt || !mnt->mnt_fsnotify_marks)) > > > return 0; > > > + > > > + /* An event "on child" is not intended for a mount/sb mark */ > > > + marks_mask = to_tell->i_fsnotify_mask; > > > + if (!(mask & FS_EVENT_ON_CHILD)) { > > > + marks_mask |= sb->s_fsnotify_mask; > > > + if (mnt) > > > + marks_mask |= mnt->mnt_fsnotify_mask; > > > + } > > > + > > > /* > > > * if this is a modify event we may need to clear the ignored masks > > > * otherwise return if neither the inode nor the vfsmount/sb care about > > > * this type of event. > > > */ > > > - if (!(mask & FS_MODIFY) && > > > - !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask))) > > > + test_mask = (mask & ALL_FSNOTIFY_EVENTS); > > > + if (!(mask & FS_MODIFY) && !(test_mask & marks_mask)) > > > return 0; > > > > Otherwise the patch looks good. One observation though: The (mask & > > FS_MODIFY) check means that all vfs_write() calls end up going through the > > "slower" path iterating all mark types and checking whether there are marks > > anyway. That could be relatively simply optimized using a hidden mask flag > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark > > needing special handling of FS_MODIFY... Not sure if we care enough at this > > point... > > Yeh that sounds low hanging. > Actually, I Don't think we need to define a flag for that. > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed. Yes, that would be even more elegant. > I will take a look at that as part of FS_PRE_MODIFY work. > But in general, we should fight the urge to optimize theoretic > performance issues... Agreed. I just suspect this may bring measurable benefit for hackbench pipe or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a separate idea and without some numbers confirming my suspicion I don't think the complication is worth it so I don't want you to burn time on this unless you're really interested :). Honza
> > > Otherwise the patch looks good. One observation though: The (mask & > > > FS_MODIFY) check means that all vfs_write() calls end up going through the > > > "slower" path iterating all mark types and checking whether there are marks > > > anyway. That could be relatively simply optimized using a hidden mask flag > > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark > > > needing special handling of FS_MODIFY... Not sure if we care enough at this > > > point... > > > > Yeh that sounds low hanging. > > Actually, I Don't think we need to define a flag for that. > > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed. > > Yes, that would be even more elegant. > > > I will take a look at that as part of FS_PRE_MODIFY work. > > But in general, we should fight the urge to optimize theoretic > > performance issues... > > Agreed. I just suspect this may bring measurable benefit for hackbench pipe > or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a > separate idea and without some numbers confirming my suspicion I don't > think the complication is worth it so I don't want you to burn time on this > unless you're really interested :). > You know me ;-) FS_MODIFY optimization pushed to fsnotify_pre_modify branch. Only tested that LTP tests pass. Note that this is only expected to improve performance in case there *are* marks, but not marks with ignore mask, because there is an earlier optimization in fsnotify() for the no marks case. Sorry for bombarding you with more patches (I should let you finish with fanotify_prep and fanotify_name_fid), but if you get a chance and can take a quick look at these 2 patches on fsnotify_pre_modify branch: 1. fsnotify: replace igrab() with ihold() on attach connector 2. fsnotify: allow adding an inode mark without pinning inode They are very small and simple, but I am afraid I may be missing something. Why did we use igrab() there in the first place? Is there a reason or is it relic from old code? As for the second patch, I won't get into why I need the evictable inode marks right now, but I was wondering if there was some inherent reason that I am missing why that cannot be done and inodes *have* to be pinned if you attach a mark to them (besides functionality of course)? Thanks, Amir.
On Thu, Jul 9, 2020 at 8:56 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > Otherwise the patch looks good. One observation though: The (mask & > > > > FS_MODIFY) check means that all vfs_write() calls end up going through the > > > > "slower" path iterating all mark types and checking whether there are marks > > > > anyway. That could be relatively simply optimized using a hidden mask flag > > > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark > > > > needing special handling of FS_MODIFY... Not sure if we care enough at this > > > > point... > > > > > > Yeh that sounds low hanging. > > > Actually, I Don't think we need to define a flag for that. > > > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed. > > > > Yes, that would be even more elegant. > > > > > I will take a look at that as part of FS_PRE_MODIFY work. > > > But in general, we should fight the urge to optimize theoretic > > > performance issues... > > > > Agreed. I just suspect this may bring measurable benefit for hackbench pipe > > or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a > > separate idea and without some numbers confirming my suspicion I don't > > think the complication is worth it so I don't want you to burn time on this > > unless you're really interested :). > > > > You know me ;-) > FS_MODIFY optimization pushed to fsnotify_pre_modify branch. > Only tested that LTP tests pass. > > Note that this is only expected to improve performance in case there *are* > marks, but not marks with ignore mask, because there is an earlier > optimization in fsnotify() for the no marks case. > Hi Mel, After following up on Jan's suggestion above, I realized there is another low hanging optimization we can make. As you may remember, one of the solutions we considered was to exclude special or internal sb's from notifications based on some SB flag, but making assumptions about which sb are expected to provide notifications turned out to be a risky game. We can however, keep a counter on sb to *know* there are no watches on any object in this sb, so the test: if (!sb->s_fsnotify_marks && (!mnt || !mnt->mnt_fsnotify_marks) && (!inode || !inode->i_fsnotify_marks)) return 0; Which is not so nice for inlining, can be summarized as: if (atomic_long_read(&inode->i_sb->s_fsnotify_obj_refs) == 0) return 0; Which is nicer for inlining. I am not sure if you had a concrete reason for: "fs: Do not check if there is a fsnotify watcher on pseudo inodes" or if you did it for the sport. I have made the change above for the sport and for now I do not intend to post it for review unless a good reason comes up. If you are interested or curious to queue this code to Suse perf testing, I pushed it to branch fsnotify-perf [1]. It may be interesting to see if it won back the cpu cycles lost in the revert of your patch. This branch is based on some changes already in Jan's tree and some changes in my development tree (fsnotify_pre_modify), but you already fed this development branch to perf test machine once and reported back that there was no significant degradation. I can also provide the optimization patches based on Linus' tree if needed. Thanks, Amir. [1] https://github.com/amir73il/linux/commits/fsnotify-perf
On Sun 26-07-20 18:20:26, Amir Goldstein wrote: > On Thu, Jul 9, 2020 at 8:56 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > Otherwise the patch looks good. One observation though: The (mask & > > > > > FS_MODIFY) check means that all vfs_write() calls end up going through the > > > > > "slower" path iterating all mark types and checking whether there are marks > > > > > anyway. That could be relatively simply optimized using a hidden mask flag > > > > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark > > > > > needing special handling of FS_MODIFY... Not sure if we care enough at this > > > > > point... > > > > > > > > Yeh that sounds low hanging. > > > > Actually, I Don't think we need to define a flag for that. > > > > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed. > > > > > > Yes, that would be even more elegant. > > > > > > > I will take a look at that as part of FS_PRE_MODIFY work. > > > > But in general, we should fight the urge to optimize theoretic > > > > performance issues... > > > > > > Agreed. I just suspect this may bring measurable benefit for hackbench pipe > > > or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a > > > separate idea and without some numbers confirming my suspicion I don't > > > think the complication is worth it so I don't want you to burn time on this > > > unless you're really interested :). > > > > > > > You know me ;-) > > FS_MODIFY optimization pushed to fsnotify_pre_modify branch. > > Only tested that LTP tests pass. > > > > Note that this is only expected to improve performance in case there *are* > > marks, but not marks with ignore mask, because there is an earlier > > optimization in fsnotify() for the no marks case. > > > > Hi Mel, > > After following up on Jan's suggestion above, I realized there is another > low hanging optimization we can make. > > As you may remember, one of the solutions we considered was to exclude > special or internal sb's from notifications based on some SB flag, but making > assumptions about which sb are expected to provide notifications turned out > to be a risky game. > > We can however, keep a counter on sb to *know* there are no watches > on any object in this sb, so the test: > > if (!sb->s_fsnotify_marks && > (!mnt || !mnt->mnt_fsnotify_marks) && > (!inode || !inode->i_fsnotify_marks)) > return 0; > > Which is not so nice for inlining, can be summarized as: > > if (atomic_long_read(&inode->i_sb->s_fsnotify_obj_refs) == 0) > return 0; > > Which is nicer for inlining. That's a nice idea. I was just wondering why do you account only inode references in the superblock. Because if there's only say mount watch, s_fsnotify_obj_refs will be 0 and you will wrongly skip reporting. Or am I misunderstanding something? I'd rather have counter like sb->s_fsnotify_connectors, that will account all connectors related to the superblock - i.e., connectors attached to the superblock, mounts referring to the superblock, or inodes referring to the superblock... Honza
On Mon, Jul 27, 2020 at 10:44 AM Jan Kara <jack@suse.cz> wrote: > > On Sun 26-07-20 18:20:26, Amir Goldstein wrote: > > On Thu, Jul 9, 2020 at 8:56 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > Otherwise the patch looks good. One observation though: The (mask & > > > > > > FS_MODIFY) check means that all vfs_write() calls end up going through the > > > > > > "slower" path iterating all mark types and checking whether there are marks > > > > > > anyway. That could be relatively simply optimized using a hidden mask flag > > > > > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark > > > > > > needing special handling of FS_MODIFY... Not sure if we care enough at this > > > > > > point... > > > > > > > > > > Yeh that sounds low hanging. > > > > > Actually, I Don't think we need to define a flag for that. > > > > > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed. > > > > > > > > Yes, that would be even more elegant. > > > > > > > > > I will take a look at that as part of FS_PRE_MODIFY work. > > > > > But in general, we should fight the urge to optimize theoretic > > > > > performance issues... > > > > > > > > Agreed. I just suspect this may bring measurable benefit for hackbench pipe > > > > or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a > > > > separate idea and without some numbers confirming my suspicion I don't > > > > think the complication is worth it so I don't want you to burn time on this > > > > unless you're really interested :). > > > > > > > > > > You know me ;-) > > > FS_MODIFY optimization pushed to fsnotify_pre_modify branch. > > > Only tested that LTP tests pass. > > > > > > Note that this is only expected to improve performance in case there *are* > > > marks, but not marks with ignore mask, because there is an earlier > > > optimization in fsnotify() for the no marks case. > > > > > > > Hi Mel, > > > > After following up on Jan's suggestion above, I realized there is another > > low hanging optimization we can make. > > > > As you may remember, one of the solutions we considered was to exclude > > special or internal sb's from notifications based on some SB flag, but making > > assumptions about which sb are expected to provide notifications turned out > > to be a risky game. > > > > We can however, keep a counter on sb to *know* there are no watches > > on any object in this sb, so the test: > > > > if (!sb->s_fsnotify_marks && > > (!mnt || !mnt->mnt_fsnotify_marks) && > > (!inode || !inode->i_fsnotify_marks)) > > return 0; > > > > Which is not so nice for inlining, can be summarized as: > > > > if (atomic_long_read(&inode->i_sb->s_fsnotify_obj_refs) == 0) > > return 0; > > > > Which is nicer for inlining. > > That's a nice idea. I was just wondering why do you account only inode > references in the superblock. Because if there's only say mount watch, > s_fsnotify_obj_refs will be 0 and you will wrongly skip reporting. Or am I > misunderstanding something? I'd rather have counter like > sb->s_fsnotify_connectors, that will account all connectors related to the > superblock - i.e., connectors attached to the superblock, mounts referring > to the superblock, or inodes referring to the superblock... > Yeh, that is what I did. Those two commits change the former s_fsnotify_inode_refs to s_fsnotify_obj_refs which counts objects (inodes/mounts/sb) pointed to be connectors. I agree that s_fsnotify_connectors may be a better choice of name ;-) de1255f8a64c fsnotify: count all objects with attached connectors 5e6c3af6e2df fsnotify: count s_fsnotify_inode_refs for attached connectors Thanks, Amir.
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 72d332ce8e12..d59a58d10b84 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -143,7 +143,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode) } /* Notify this dentry's parent about a child's events. */ -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type) { struct dentry *parent; @@ -174,7 +174,7 @@ int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, return ret; } -EXPORT_SYMBOL_GPL(fsnotify_parent); +EXPORT_SYMBOL_GPL(__fsnotify_parent); static int send_to_group(struct inode *to_tell, __u32 mask, const void *data, @@ -315,17 +315,11 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, struct fsnotify_iter_info iter_info = {}; struct super_block *sb = to_tell->i_sb; struct mount *mnt = NULL; - __u32 mnt_or_sb_mask = sb->s_fsnotify_mask; int ret = 0; - __u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS); + __u32 test_mask, marks_mask; - if (path) { + if (path) mnt = real_mount(path->mnt); - mnt_or_sb_mask |= mnt->mnt_fsnotify_mask; - } - /* An event "on child" is not intended for a mount/sb mark */ - if (mask & FS_EVENT_ON_CHILD) - mnt_or_sb_mask = 0; /* * Optimization: srcu_read_lock() has a memory barrier which can @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks && (!mnt || !mnt->mnt_fsnotify_marks)) return 0; + + /* An event "on child" is not intended for a mount/sb mark */ + marks_mask = to_tell->i_fsnotify_mask; + if (!(mask & FS_EVENT_ON_CHILD)) { + marks_mask |= sb->s_fsnotify_mask; + if (mnt) + marks_mask |= mnt->mnt_fsnotify_mask; + } + /* * if this is a modify event we may need to clear the ignored masks * otherwise return if neither the inode nor the vfsmount/sb care about * this type of event. */ - if (!(mask & FS_MODIFY) && - !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask))) + test_mask = (mask & ALL_FSNOTIFY_EVENTS); + if (!(mask & FS_MODIFY) && !(test_mask & marks_mask)) return 0; iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 5ab28f6c7d26..508f6bb0b06b 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -44,6 +44,16 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0); } +/* Notify this dentry's parent about a child's events. */ +static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, + const void *data, int data_type) +{ + if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) + return 0; + + return __fsnotify_parent(dentry, mask, data, data_type); +} + /* * Simple wrappers to consolidate calls fsnotify_parent()/fsnotify() when * an event is on a file/dentry. diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index f0c506405b54..1626fa7d10ff 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -379,7 +379,7 @@ struct fsnotify_mark { /* main fsnotify call to send events */ extern int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_type, const struct qstr *name, u32 cookie); -extern int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, +extern int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type); extern void __fsnotify_inode_delete(struct inode *inode); extern void __fsnotify_vfsmount_delete(struct vfsmount *mnt); @@ -541,7 +541,7 @@ static inline int fsnotify(struct inode *to_tell, __u32 mask, const void *data, return 0; } -static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, +static inline int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type) { return 0;