[01/20] fsnotify: Rearrange fast path to minimise overhead when there is no watcher
diff mbox series

Message ID 20200612093343.5669-2-amir73il@gmail.com
State New
Headers show
Series
  • Prep work for fanotify named events
Related show

Commit Message

Amir Goldstein June 12, 2020, 9:33 a.m. UTC
From: Mel Gorman <mgorman@techsingularity.net>

The fsnotify paths are trivial to hit even when there are no watchers and
they are surprisingly expensive. For example, every successful vfs_write()
hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless
FMODE_NONOTIFY is set which is an internal flag invisible to userspace.
As it stands, fsnotify_parent is a guaranteed functional call even if there
are no watchers and fsnotify() does a substantial amount of unnecessary
work before it checks if there are any watchers. A perf profile showed
that applying mnt->mnt_fsnotify_mask in fnotify() was almost half of the
total samples taken in that function during a test. This patch rearranges
the fast paths to reduce the amount of work done when there are no
watchers.

The test motivating this was "perf bench sched messaging --pipe". Despite
the fact the pipes are anonymous, fsnotify is still called a lot and
the overhead is noticeable even though it's completely pointless. It's
likely the overhead is negligible for real IO so this is an extreme
example. This is a comparison of hackbench using processes and pipes on
a 1-socket machine with 8 CPU threads without fanotify watchers.

                              5.7.0                  5.7.0
                            vanilla      fastfsnotify-v1r1
Amean     1       0.4837 (   0.00%)      0.4630 *   4.27%*
Amean     3       1.5447 (   0.00%)      1.4557 (   5.76%)
Amean     5       2.6037 (   0.00%)      2.4363 (   6.43%)
Amean     7       3.5987 (   0.00%)      3.4757 (   3.42%)
Amean     12      5.8267 (   0.00%)      5.6983 (   2.20%)
Amean     18      8.4400 (   0.00%)      8.1327 (   3.64%)
Amean     24     11.0187 (   0.00%)     10.0290 *   8.98%*
Amean     30     13.1013 (   0.00%)     12.8510 (   1.91%)
Amean     32     13.9190 (   0.00%)     13.2410 (   4.87%)

                       5.7.0       5.7.0
                     vanilla fastfsnotify-v1r1
Duration User         157.05      152.79
Duration System      1279.98     1219.32
Duration Elapsed      182.81      174.52

This is showing that the latencies are improved by roughly 2-9%. The
variability is not shown but some of these results are within the noise
as this workload heavily overloads the machine. That said, the system CPU
usage is reduced by quite a bit so it makes sense to avoid the overhead
even if it is a bit tricky to detect at times. A perf profile of just 1
group of tasks showed that 5.14% of samples taken were in either fsnotify()
or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify,
mostly function entry and the initial check for watchers.  The check for
watchers is complicated enough that inlining it may be controversial.

[Amir] Slightly simplify with mnt_or_sb_mask => marks_mask

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fsnotify.c             | 27 +++++++++++++++------------
 include/linux/fsnotify.h         | 10 ++++++++++
 include/linux/fsnotify_backend.h |  4 ++--
 3 files changed, 27 insertions(+), 14 deletions(-)

Comments

Jan Kara July 3, 2020, 2:03 p.m. UTC | #1
On Fri 12-06-20 12:33:24, Amir Goldstein wrote:
> From: Mel Gorman <mgorman@techsingularity.net>
> 
> The fsnotify paths are trivial to hit even when there are no watchers and
> they are surprisingly expensive. For example, every successful vfs_write()
> hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless
> FMODE_NONOTIFY is set which is an internal flag invisible to userspace.
> As it stands, fsnotify_parent is a guaranteed functional call even if there
> are no watchers and fsnotify() does a substantial amount of unnecessary
> work before it checks if there are any watchers. A perf profile showed
> that applying mnt->mnt_fsnotify_mask in fnotify() was almost half of the
> total samples taken in that function during a test. This patch rearranges
> the fast paths to reduce the amount of work done when there are no
> watchers.
> 
> The test motivating this was "perf bench sched messaging --pipe". Despite
> the fact the pipes are anonymous, fsnotify is still called a lot and
> the overhead is noticeable even though it's completely pointless. It's
> likely the overhead is negligible for real IO so this is an extreme
> example. This is a comparison of hackbench using processes and pipes on
> a 1-socket machine with 8 CPU threads without fanotify watchers.
> 
>                               5.7.0                  5.7.0
>                             vanilla      fastfsnotify-v1r1
> Amean     1       0.4837 (   0.00%)      0.4630 *   4.27%*
> Amean     3       1.5447 (   0.00%)      1.4557 (   5.76%)
> Amean     5       2.6037 (   0.00%)      2.4363 (   6.43%)
> Amean     7       3.5987 (   0.00%)      3.4757 (   3.42%)
> Amean     12      5.8267 (   0.00%)      5.6983 (   2.20%)
> Amean     18      8.4400 (   0.00%)      8.1327 (   3.64%)
> Amean     24     11.0187 (   0.00%)     10.0290 *   8.98%*
> Amean     30     13.1013 (   0.00%)     12.8510 (   1.91%)
> Amean     32     13.9190 (   0.00%)     13.2410 (   4.87%)
> 
>                        5.7.0       5.7.0
>                      vanilla fastfsnotify-v1r1
> Duration User         157.05      152.79
> Duration System      1279.98     1219.32
> Duration Elapsed      182.81      174.52
> 
> This is showing that the latencies are improved by roughly 2-9%. The
> variability is not shown but some of these results are within the noise
> as this workload heavily overloads the machine. That said, the system CPU
> usage is reduced by quite a bit so it makes sense to avoid the overhead
> even if it is a bit tricky to detect at times. A perf profile of just 1
> group of tasks showed that 5.14% of samples taken were in either fsnotify()
> or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify,
> mostly function entry and the initial check for watchers.  The check for
> watchers is complicated enough that inlining it may be controversial.
> 
> [Amir] Slightly simplify with mnt_or_sb_mask => marks_mask
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/notify/fsnotify.c             | 27 +++++++++++++++------------
>  include/linux/fsnotify.h         | 10 ++++++++++
>  include/linux/fsnotify_backend.h |  4 ++--
>  3 files changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 72d332ce8e12..d59a58d10b84 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -143,7 +143,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
>  }
>  
>  /* Notify this dentry's parent about a child's events. */
> -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
>  		    int data_type)
>  {
>  	struct dentry *parent;

Hum, should we actually remove the DCACHE_FSNOTIFY_PARENT_WATCHED check
from here when it's moved to fsnotify_parent() inline helper?

> @@ -315,17 +315,11 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
>  	struct fsnotify_iter_info iter_info = {};
>  	struct super_block *sb = to_tell->i_sb;
>  	struct mount *mnt = NULL;
> -	__u32 mnt_or_sb_mask = sb->s_fsnotify_mask;
>  	int ret = 0;
> -	__u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> +	__u32 test_mask, marks_mask;
>  
> -	if (path) {
> +	if (path)
>  		mnt = real_mount(path->mnt);
> -		mnt_or_sb_mask |= mnt->mnt_fsnotify_mask;
> -	}
> -	/* An event "on child" is not intended for a mount/sb mark */
> -	if (mask & FS_EVENT_ON_CHILD)
> -		mnt_or_sb_mask = 0;
>  
>  	/*
>  	 * Optimization: srcu_read_lock() has a memory barrier which can
> @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
>  	if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks &&
>  	    (!mnt || !mnt->mnt_fsnotify_marks))
>  		return 0;
> +
> +	/* An event "on child" is not intended for a mount/sb mark */
> +	marks_mask = to_tell->i_fsnotify_mask;
> +	if (!(mask & FS_EVENT_ON_CHILD)) {
> +		marks_mask |= sb->s_fsnotify_mask;
> +		if (mnt)
> +			marks_mask |= mnt->mnt_fsnotify_mask;
> +	}
> +
>  	/*
>  	 * if this is a modify event we may need to clear the ignored masks
>  	 * otherwise return if neither the inode nor the vfsmount/sb care about
>  	 * this type of event.
>  	 */
> -	if (!(mask & FS_MODIFY) &&
> -	    !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask)))
> +	test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> +	if (!(mask & FS_MODIFY) && !(test_mask & marks_mask))
>  		return 0;

Otherwise the patch looks good. One observation though: The (mask &
FS_MODIFY) check means that all vfs_write() calls end up going through the
"slower" path iterating all mark types and checking whether there are marks
anyway. That could be relatively simply optimized using a hidden mask flag
like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
needing special handling of FS_MODIFY... Not sure if we care enough at this
point...

								Honza
Amir Goldstein July 4, 2020, 9:30 a.m. UTC | #2
On Fri, Jul 3, 2020 at 5:03 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 12-06-20 12:33:24, Amir Goldstein wrote:
> > From: Mel Gorman <mgorman@techsingularity.net>
> >
> > The fsnotify paths are trivial to hit even when there are no watchers and
> > they are surprisingly expensive. For example, every successful vfs_write()
> > hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless
> > FMODE_NONOTIFY is set which is an internal flag invisible to userspace.
> > As it stands, fsnotify_parent is a guaranteed functional call even if there
> > are no watchers and fsnotify() does a substantial amount of unnecessary
> > work before it checks if there are any watchers. A perf profile showed
> > that applying mnt->mnt_fsnotify_mask in fnotify() was almost half of the
> > total samples taken in that function during a test. This patch rearranges
> > the fast paths to reduce the amount of work done when there are no
> > watchers.
> >
> > The test motivating this was "perf bench sched messaging --pipe". Despite
> > the fact the pipes are anonymous, fsnotify is still called a lot and
> > the overhead is noticeable even though it's completely pointless. It's
> > likely the overhead is negligible for real IO so this is an extreme
> > example. This is a comparison of hackbench using processes and pipes on
> > a 1-socket machine with 8 CPU threads without fanotify watchers.
> >
> >                               5.7.0                  5.7.0
> >                             vanilla      fastfsnotify-v1r1
> > Amean     1       0.4837 (   0.00%)      0.4630 *   4.27%*
> > Amean     3       1.5447 (   0.00%)      1.4557 (   5.76%)
> > Amean     5       2.6037 (   0.00%)      2.4363 (   6.43%)
> > Amean     7       3.5987 (   0.00%)      3.4757 (   3.42%)
> > Amean     12      5.8267 (   0.00%)      5.6983 (   2.20%)
> > Amean     18      8.4400 (   0.00%)      8.1327 (   3.64%)
> > Amean     24     11.0187 (   0.00%)     10.0290 *   8.98%*
> > Amean     30     13.1013 (   0.00%)     12.8510 (   1.91%)
> > Amean     32     13.9190 (   0.00%)     13.2410 (   4.87%)
> >
> >                        5.7.0       5.7.0
> >                      vanilla fastfsnotify-v1r1
> > Duration User         157.05      152.79
> > Duration System      1279.98     1219.32
> > Duration Elapsed      182.81      174.52
> >
> > This is showing that the latencies are improved by roughly 2-9%. The
> > variability is not shown but some of these results are within the noise
> > as this workload heavily overloads the machine. That said, the system CPU
> > usage is reduced by quite a bit so it makes sense to avoid the overhead
> > even if it is a bit tricky to detect at times. A perf profile of just 1
> > group of tasks showed that 5.14% of samples taken were in either fsnotify()
> > or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify,
> > mostly function entry and the initial check for watchers.  The check for
> > watchers is complicated enough that inlining it may be controversial.
> >
> > [Amir] Slightly simplify with mnt_or_sb_mask => marks_mask
> >
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/notify/fsnotify.c             | 27 +++++++++++++++------------
> >  include/linux/fsnotify.h         | 10 ++++++++++
> >  include/linux/fsnotify_backend.h |  4 ++--
> >  3 files changed, 27 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> > index 72d332ce8e12..d59a58d10b84 100644
> > --- a/fs/notify/fsnotify.c
> > +++ b/fs/notify/fsnotify.c
> > @@ -143,7 +143,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
> >  }
> >
> >  /* Notify this dentry's parent about a child's events. */
> > -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> > +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> >                   int data_type)
> >  {
> >       struct dentry *parent;
>
> Hum, should we actually remove the DCACHE_FSNOTIFY_PARENT_WATCHED check
> from here when it's moved to fsnotify_parent() inline helper?

No point.
It is making a comeback on:
 fsnotify: send event with parent/name info to sb/mount/non-dir marks

>
> > @@ -315,17 +315,11 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> >       struct fsnotify_iter_info iter_info = {};
> >       struct super_block *sb = to_tell->i_sb;
> >       struct mount *mnt = NULL;
> > -     __u32 mnt_or_sb_mask = sb->s_fsnotify_mask;
> >       int ret = 0;
> > -     __u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> > +     __u32 test_mask, marks_mask;
> >
> > -     if (path) {
> > +     if (path)
> >               mnt = real_mount(path->mnt);
> > -             mnt_or_sb_mask |= mnt->mnt_fsnotify_mask;
> > -     }
> > -     /* An event "on child" is not intended for a mount/sb mark */
> > -     if (mask & FS_EVENT_ON_CHILD)
> > -             mnt_or_sb_mask = 0;
> >
> >       /*
> >        * Optimization: srcu_read_lock() has a memory barrier which can
> > @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> >       if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks &&
> >           (!mnt || !mnt->mnt_fsnotify_marks))
> >               return 0;
> > +
> > +     /* An event "on child" is not intended for a mount/sb mark */
> > +     marks_mask = to_tell->i_fsnotify_mask;
> > +     if (!(mask & FS_EVENT_ON_CHILD)) {
> > +             marks_mask |= sb->s_fsnotify_mask;
> > +             if (mnt)
> > +                     marks_mask |= mnt->mnt_fsnotify_mask;
> > +     }
> > +
> >       /*
> >        * if this is a modify event we may need to clear the ignored masks
> >        * otherwise return if neither the inode nor the vfsmount/sb care about
> >        * this type of event.
> >        */
> > -     if (!(mask & FS_MODIFY) &&
> > -         !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask)))
> > +     test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> > +     if (!(mask & FS_MODIFY) && !(test_mask & marks_mask))
> >               return 0;
>
> Otherwise the patch looks good. One observation though: The (mask &
> FS_MODIFY) check means that all vfs_write() calls end up going through the
> "slower" path iterating all mark types and checking whether there are marks
> anyway. That could be relatively simply optimized using a hidden mask flag
> like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> needing special handling of FS_MODIFY... Not sure if we care enough at this
> point...

Yeh that sounds low hanging.
Actually, I Don't think we need to define a flag for that.
__fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.

I will take a look at that as part of FS_PRE_MODIFY work.
But in general, we should fight the urge to optimize theoretic
performance issues...

Thanks,
Amir.
Jan Kara July 6, 2020, 11:05 a.m. UTC | #3
On Sat 04-07-20 12:30:10, Amir Goldstein wrote:
> On Fri, Jul 3, 2020 at 5:03 PM Jan Kara <jack@suse.cz> wrote:
> > >  /* Notify this dentry's parent about a child's events. */
> > > -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> > > +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
> > >                   int data_type)
> > >  {
> > >       struct dentry *parent;
> >
> > Hum, should we actually remove the DCACHE_FSNOTIFY_PARENT_WATCHED check
> > from here when it's moved to fsnotify_parent() inline helper?
> 
> No point.
> It is making a comeback on:
>  fsnotify: send event with parent/name info to sb/mount/non-dir marks

Right, I've noticed that later as well.

> > > @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
> > >       if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks &&
> > >           (!mnt || !mnt->mnt_fsnotify_marks))
> > >               return 0;
> > > +
> > > +     /* An event "on child" is not intended for a mount/sb mark */
> > > +     marks_mask = to_tell->i_fsnotify_mask;
> > > +     if (!(mask & FS_EVENT_ON_CHILD)) {
> > > +             marks_mask |= sb->s_fsnotify_mask;
> > > +             if (mnt)
> > > +                     marks_mask |= mnt->mnt_fsnotify_mask;
> > > +     }
> > > +
> > >       /*
> > >        * if this is a modify event we may need to clear the ignored masks
> > >        * otherwise return if neither the inode nor the vfsmount/sb care about
> > >        * this type of event.
> > >        */
> > > -     if (!(mask & FS_MODIFY) &&
> > > -         !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask)))
> > > +     test_mask = (mask & ALL_FSNOTIFY_EVENTS);
> > > +     if (!(mask & FS_MODIFY) && !(test_mask & marks_mask))
> > >               return 0;
> >
> > Otherwise the patch looks good. One observation though: The (mask &
> > FS_MODIFY) check means that all vfs_write() calls end up going through the
> > "slower" path iterating all mark types and checking whether there are marks
> > anyway. That could be relatively simply optimized using a hidden mask flag
> > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> > needing special handling of FS_MODIFY... Not sure if we care enough at this
> > point...
> 
> Yeh that sounds low hanging.
> Actually, I Don't think we need to define a flag for that.
> __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.

Yes, that would be even more elegant.

> I will take a look at that as part of FS_PRE_MODIFY work.
> But in general, we should fight the urge to optimize theoretic
> performance issues...

Agreed. I just suspect this may bring measurable benefit for hackbench pipe
or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a
separate idea and without some numbers confirming my suspicion I don't
think the complication is worth it so I don't want you to burn time on this
unless you're really interested :).

								Honza
Amir Goldstein July 9, 2020, 5:56 p.m. UTC | #4
> > > Otherwise the patch looks good. One observation though: The (mask &
> > > FS_MODIFY) check means that all vfs_write() calls end up going through the
> > > "slower" path iterating all mark types and checking whether there are marks
> > > anyway. That could be relatively simply optimized using a hidden mask flag
> > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> > > needing special handling of FS_MODIFY... Not sure if we care enough at this
> > > point...
> >
> > Yeh that sounds low hanging.
> > Actually, I Don't think we need to define a flag for that.
> > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.
>
> Yes, that would be even more elegant.
>
> > I will take a look at that as part of FS_PRE_MODIFY work.
> > But in general, we should fight the urge to optimize theoretic
> > performance issues...
>
> Agreed. I just suspect this may bring measurable benefit for hackbench pipe
> or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a
> separate idea and without some numbers confirming my suspicion I don't
> think the complication is worth it so I don't want you to burn time on this
> unless you're really interested :).
>

You know me ;-)
FS_MODIFY optimization pushed to fsnotify_pre_modify branch.
Only tested that LTP tests pass.

Note that this is only expected to improve performance in case there *are*
marks, but not marks with ignore mask, because there is an earlier
optimization in fsnotify() for the no marks case.

Sorry for bombarding you with more patches (I should let you finish with
fanotify_prep and fanotify_name_fid), but if you get a chance and can
take a quick look at these 2 patches on fsnotify_pre_modify branch:
1. fsnotify: replace igrab() with ihold() on attach connector
2. fsnotify: allow adding an inode mark without pinning inode

They are very small and simple, but I am afraid I may be missing something.

Why did we use igrab() there in the first place? Is there a reason or is it
relic from old code?

As for the second patch, I won't get into why I need the evictable inode
marks right now, but I was wondering if there was some inherent reason
that I am missing why that cannot be done and inodes *have* to be pinned
if you attach a mark to them (besides functionality of course)?

Thanks,
Amir.
Amir Goldstein July 26, 2020, 3:20 p.m. UTC | #5
On Thu, Jul 9, 2020 at 8:56 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > > Otherwise the patch looks good. One observation though: The (mask &
> > > > FS_MODIFY) check means that all vfs_write() calls end up going through the
> > > > "slower" path iterating all mark types and checking whether there are marks
> > > > anyway. That could be relatively simply optimized using a hidden mask flag
> > > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> > > > needing special handling of FS_MODIFY... Not sure if we care enough at this
> > > > point...
> > >
> > > Yeh that sounds low hanging.
> > > Actually, I Don't think we need to define a flag for that.
> > > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.
> >
> > Yes, that would be even more elegant.
> >
> > > I will take a look at that as part of FS_PRE_MODIFY work.
> > > But in general, we should fight the urge to optimize theoretic
> > > performance issues...
> >
> > Agreed. I just suspect this may bring measurable benefit for hackbench pipe
> > or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a
> > separate idea and without some numbers confirming my suspicion I don't
> > think the complication is worth it so I don't want you to burn time on this
> > unless you're really interested :).
> >
>
> You know me ;-)
> FS_MODIFY optimization pushed to fsnotify_pre_modify branch.
> Only tested that LTP tests pass.
>
> Note that this is only expected to improve performance in case there *are*
> marks, but not marks with ignore mask, because there is an earlier
> optimization in fsnotify() for the no marks case.
>

Hi Mel,

After following up on Jan's suggestion above, I realized there is another
low hanging optimization we can make.

As you may remember, one of the solutions we considered was to exclude
special or internal sb's from notifications based on some SB flag, but making
assumptions about which sb are expected to provide notifications turned out
to be a risky game.

We can however, keep a counter on sb to *know* there are no watches
on any object in this sb, so the test:

        if (!sb->s_fsnotify_marks &&
            (!mnt || !mnt->mnt_fsnotify_marks) &&
            (!inode || !inode->i_fsnotify_marks))
                return 0;

Which is not so nice for inlining, can be summarized as:

        if (atomic_long_read(&inode->i_sb->s_fsnotify_obj_refs) == 0)
                return 0;

Which is nicer for inlining.

I am not sure if you had a concrete reason for:
"fs: Do not check if there is a fsnotify watcher on pseudo inodes"
or if you did it for the sport.

I have made the change above for the sport and for now I do not
intend to post it for review unless a good reason comes up.

If you are interested or curious to queue this code to Suse perf testing,
I pushed it to branch fsnotify-perf [1]. It may be interesting to see if it
won back the cpu cycles lost in the revert of your patch.

This branch is based on some changes already in Jan's tree and some
changes in my development tree (fsnotify_pre_modify), but you already
fed this development branch to perf test machine once and reported back
that there was no significant degradation.

I can also provide the optimization patches based on Linus' tree if needed.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fsnotify-perf
Jan Kara July 27, 2020, 7:44 a.m. UTC | #6
On Sun 26-07-20 18:20:26, Amir Goldstein wrote:
> On Thu, Jul 9, 2020 at 8:56 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > > > Otherwise the patch looks good. One observation though: The (mask &
> > > > > FS_MODIFY) check means that all vfs_write() calls end up going through the
> > > > > "slower" path iterating all mark types and checking whether there are marks
> > > > > anyway. That could be relatively simply optimized using a hidden mask flag
> > > > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> > > > > needing special handling of FS_MODIFY... Not sure if we care enough at this
> > > > > point...
> > > >
> > > > Yeh that sounds low hanging.
> > > > Actually, I Don't think we need to define a flag for that.
> > > > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.
> > >
> > > Yes, that would be even more elegant.
> > >
> > > > I will take a look at that as part of FS_PRE_MODIFY work.
> > > > But in general, we should fight the urge to optimize theoretic
> > > > performance issues...
> > >
> > > Agreed. I just suspect this may bring measurable benefit for hackbench pipe
> > > or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a
> > > separate idea and without some numbers confirming my suspicion I don't
> > > think the complication is worth it so I don't want you to burn time on this
> > > unless you're really interested :).
> > >
> >
> > You know me ;-)
> > FS_MODIFY optimization pushed to fsnotify_pre_modify branch.
> > Only tested that LTP tests pass.
> >
> > Note that this is only expected to improve performance in case there *are*
> > marks, but not marks with ignore mask, because there is an earlier
> > optimization in fsnotify() for the no marks case.
> >
> 
> Hi Mel,
> 
> After following up on Jan's suggestion above, I realized there is another
> low hanging optimization we can make.
> 
> As you may remember, one of the solutions we considered was to exclude
> special or internal sb's from notifications based on some SB flag, but making
> assumptions about which sb are expected to provide notifications turned out
> to be a risky game.
> 
> We can however, keep a counter on sb to *know* there are no watches
> on any object in this sb, so the test:
> 
>         if (!sb->s_fsnotify_marks &&
>             (!mnt || !mnt->mnt_fsnotify_marks) &&
>             (!inode || !inode->i_fsnotify_marks))
>                 return 0;
> 
> Which is not so nice for inlining, can be summarized as:
> 
>         if (atomic_long_read(&inode->i_sb->s_fsnotify_obj_refs) == 0)
>                 return 0;
> 
> Which is nicer for inlining.

That's a nice idea. I was just wondering why do you account only inode
references in the superblock. Because if there's only say mount watch,
s_fsnotify_obj_refs will be 0 and you will wrongly skip reporting. Or am I
misunderstanding something? I'd rather have counter like
sb->s_fsnotify_connectors, that will account all connectors related to the
superblock - i.e., connectors attached to the superblock, mounts referring
to the superblock, or inodes referring to the superblock...

								Honza
Amir Goldstein July 27, 2020, 10:02 a.m. UTC | #7
On Mon, Jul 27, 2020 at 10:44 AM Jan Kara <jack@suse.cz> wrote:
>
> On Sun 26-07-20 18:20:26, Amir Goldstein wrote:
> > On Thu, Jul 9, 2020 at 8:56 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > > > > Otherwise the patch looks good. One observation though: The (mask &
> > > > > > FS_MODIFY) check means that all vfs_write() calls end up going through the
> > > > > > "slower" path iterating all mark types and checking whether there are marks
> > > > > > anyway. That could be relatively simply optimized using a hidden mask flag
> > > > > > like FS_ALWAYS_RECEIVE_MODIFY which would be set when there's some mark
> > > > > > needing special handling of FS_MODIFY... Not sure if we care enough at this
> > > > > > point...
> > > > >
> > > > > Yeh that sounds low hanging.
> > > > > Actually, I Don't think we need to define a flag for that.
> > > > > __fsnotify_recalc_mask() can add FS_MODIFY to the object's mask if needed.
> > > >
> > > > Yes, that would be even more elegant.
> > > >
> > > > > I will take a look at that as part of FS_PRE_MODIFY work.
> > > > > But in general, we should fight the urge to optimize theoretic
> > > > > performance issues...
> > > >
> > > > Agreed. I just suspect this may bring measurable benefit for hackbench pipe
> > > > or tiny tmpfs writes after seeing Mel's results. But as I wrote this is a
> > > > separate idea and without some numbers confirming my suspicion I don't
> > > > think the complication is worth it so I don't want you to burn time on this
> > > > unless you're really interested :).
> > > >
> > >
> > > You know me ;-)
> > > FS_MODIFY optimization pushed to fsnotify_pre_modify branch.
> > > Only tested that LTP tests pass.
> > >
> > > Note that this is only expected to improve performance in case there *are*
> > > marks, but not marks with ignore mask, because there is an earlier
> > > optimization in fsnotify() for the no marks case.
> > >
> >
> > Hi Mel,
> >
> > After following up on Jan's suggestion above, I realized there is another
> > low hanging optimization we can make.
> >
> > As you may remember, one of the solutions we considered was to exclude
> > special or internal sb's from notifications based on some SB flag, but making
> > assumptions about which sb are expected to provide notifications turned out
> > to be a risky game.
> >
> > We can however, keep a counter on sb to *know* there are no watches
> > on any object in this sb, so the test:
> >
> >         if (!sb->s_fsnotify_marks &&
> >             (!mnt || !mnt->mnt_fsnotify_marks) &&
> >             (!inode || !inode->i_fsnotify_marks))
> >                 return 0;
> >
> > Which is not so nice for inlining, can be summarized as:
> >
> >         if (atomic_long_read(&inode->i_sb->s_fsnotify_obj_refs) == 0)
> >                 return 0;
> >
> > Which is nicer for inlining.
>
> That's a nice idea. I was just wondering why do you account only inode
> references in the superblock. Because if there's only say mount watch,
> s_fsnotify_obj_refs will be 0 and you will wrongly skip reporting. Or am I
> misunderstanding something? I'd rather have counter like
> sb->s_fsnotify_connectors, that will account all connectors related to the
> superblock - i.e., connectors attached to the superblock, mounts referring
> to the superblock, or inodes referring to the superblock...
>

Yeh, that is what I did.
Those two commits change the former s_fsnotify_inode_refs
to s_fsnotify_obj_refs which counts objects (inodes/mounts/sb) pointed to
be connectors.
I agree that s_fsnotify_connectors may be a better choice of name ;-)

de1255f8a64c fsnotify: count all objects with attached connectors
5e6c3af6e2df fsnotify: count s_fsnotify_inode_refs for attached connectors

Thanks,
Amir.

Patch
diff mbox series

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 72d332ce8e12..d59a58d10b84 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -143,7 +143,7 @@  void __fsnotify_update_child_dentry_flags(struct inode *inode)
 }
 
 /* Notify this dentry's parent about a child's events. */
-int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
+int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 		    int data_type)
 {
 	struct dentry *parent;
@@ -174,7 +174,7 @@  int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(fsnotify_parent);
+EXPORT_SYMBOL_GPL(__fsnotify_parent);
 
 static int send_to_group(struct inode *to_tell,
 			 __u32 mask, const void *data,
@@ -315,17 +315,11 @@  int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 	struct fsnotify_iter_info iter_info = {};
 	struct super_block *sb = to_tell->i_sb;
 	struct mount *mnt = NULL;
-	__u32 mnt_or_sb_mask = sb->s_fsnotify_mask;
 	int ret = 0;
-	__u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
+	__u32 test_mask, marks_mask;
 
-	if (path) {
+	if (path)
 		mnt = real_mount(path->mnt);
-		mnt_or_sb_mask |= mnt->mnt_fsnotify_mask;
-	}
-	/* An event "on child" is not intended for a mount/sb mark */
-	if (mask & FS_EVENT_ON_CHILD)
-		mnt_or_sb_mask = 0;
 
 	/*
 	 * Optimization: srcu_read_lock() has a memory barrier which can
@@ -337,13 +331,22 @@  int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 	if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks &&
 	    (!mnt || !mnt->mnt_fsnotify_marks))
 		return 0;
+
+	/* An event "on child" is not intended for a mount/sb mark */
+	marks_mask = to_tell->i_fsnotify_mask;
+	if (!(mask & FS_EVENT_ON_CHILD)) {
+		marks_mask |= sb->s_fsnotify_mask;
+		if (mnt)
+			marks_mask |= mnt->mnt_fsnotify_mask;
+	}
+
 	/*
 	 * if this is a modify event we may need to clear the ignored masks
 	 * otherwise return if neither the inode nor the vfsmount/sb care about
 	 * this type of event.
 	 */
-	if (!(mask & FS_MODIFY) &&
-	    !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask)))
+	test_mask = (mask & ALL_FSNOTIFY_EVENTS);
+	if (!(mask & FS_MODIFY) && !(test_mask & marks_mask))
 		return 0;
 
 	iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu);
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 5ab28f6c7d26..508f6bb0b06b 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -44,6 +44,16 @@  static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
 	fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0);
 }
 
+/* Notify this dentry's parent about a child's events. */
+static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
+				  const void *data, int data_type)
+{
+	if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))
+		return 0;
+
+	return __fsnotify_parent(dentry, mask, data, data_type);
+}
+
 /*
  * Simple wrappers to consolidate calls fsnotify_parent()/fsnotify() when
  * an event is on a file/dentry.
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index f0c506405b54..1626fa7d10ff 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -379,7 +379,7 @@  struct fsnotify_mark {
 /* main fsnotify call to send events */
 extern int fsnotify(struct inode *to_tell, __u32 mask, const void *data,
 		    int data_type, const struct qstr *name, u32 cookie);
-extern int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
+extern int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 			   int data_type);
 extern void __fsnotify_inode_delete(struct inode *inode);
 extern void __fsnotify_vfsmount_delete(struct vfsmount *mnt);
@@ -541,7 +541,7 @@  static inline int fsnotify(struct inode *to_tell, __u32 mask, const void *data,
 	return 0;
 }
 
-static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
+static inline int __fsnotify_parent(struct dentry *dentry, __u32 mask,
 				  const void *data, int data_type)
 {
 	return 0;