diff mbox series

fsnotify: fix unlink performance regression

Message ID 20190505091549.1934-1-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show
Series fsnotify: fix unlink performance regression | expand

Commit Message

Amir Goldstein May 5, 2019, 9:15 a.m. UTC
__fsnotify_parent() has an optimization in place to avoid unneeded
take_dentry_name_snapshot().  When fsnotify_nameremove() was changed
not to call __fsnotify_parent(), we left out the optimization.
Kernel test robot reported a 5% performance regression in concurrent
unlink() workload.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/
Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Jan,

The linked 5.1-rc1 performance regression report came with bad timing.
Not sure if Linus is planning an rc8. If not, you will probably not
see this before the 5.1 release and we shall have to queue it for 5.2
and backport to stable 5.1.

I crafted the patch so it applies cleanly both to master and Al's
for-next branch (there are some fsnotify changes in work.dcache).

Thanks,
Amir.

 include/linux/fsnotify.h | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Al Viro May 5, 2019, 1:05 p.m. UTC | #1
On Sun, May 05, 2019 at 12:15:49PM +0300, Amir Goldstein wrote:
> __fsnotify_parent() has an optimization in place to avoid unneeded
> take_dentry_name_snapshot().  When fsnotify_nameremove() was changed
> not to call __fsnotify_parent(), we left out the optimization.
> Kernel test robot reported a 5% performance regression in concurrent
> unlink() workload.
> 
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/
> Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Jan,
> 
> The linked 5.1-rc1 performance regression report came with bad timing.
> Not sure if Linus is planning an rc8. If not, you will probably not
> see this before the 5.1 release and we shall have to queue it for 5.2
> and backport to stable 5.1.
> 
> I crafted the patch so it applies cleanly both to master and Al's
> for-next branch (there are some fsnotify changes in work.dcache).

Charming...  What about rename() and matching regressions there?
Amir Goldstein May 5, 2019, 1:19 p.m. UTC | #2
On Sun, May 5, 2019 at 4:05 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, May 05, 2019 at 12:15:49PM +0300, Amir Goldstein wrote:
> > __fsnotify_parent() has an optimization in place to avoid unneeded
> > take_dentry_name_snapshot().  When fsnotify_nameremove() was changed
> > not to call __fsnotify_parent(), we left out the optimization.
> > Kernel test robot reported a 5% performance regression in concurrent
> > unlink() workload.
> >
> > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/
> > Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> > Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events")
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >
> > Jan,
> >
> > The linked 5.1-rc1 performance regression report came with bad timing.
> > Not sure if Linus is planning an rc8. If not, you will probably not
> > see this before the 5.1 release and we shall have to queue it for 5.2
> > and backport to stable 5.1.
> >
> > I crafted the patch so it applies cleanly both to master and Al's
> > for-next branch (there are some fsnotify changes in work.dcache).
>
> Charming...  What about rename() and matching regressions there?

rename() and create() do not take_dentry_name_snapshot(), because
they are called with parent inode lock held.
I have made an analysis of callers to d_delete() and found that all callers
either hold parent inode lock or name is stable for another reason:
https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/

But Jan preferred to keep take_dentry_name_snapshot() to be safe.
I think the right thing to do is assert that parent inode is locked or
no rename op in d_delete() and take the lock in ceph/ocfs2 to conform
to the standard.

If that sounds good to you, I will follow up with a patch for next and then
remove take_dentry_name_snapshot() from fsnotify_nameremove() hook.

Thanks,
Amir.
Al Viro May 5, 2019, 1:47 p.m. UTC | #3
On Sun, May 05, 2019 at 04:19:02PM +0300, Amir Goldstein wrote:

> I have made an analysis of callers to d_delete() and found that all callers
> either hold parent inode lock or name is stable for another reason:
> https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> 
> But Jan preferred to keep take_dentry_name_snapshot() to be safe.
> I think the right thing to do is assert that parent inode is locked or
> no rename op in d_delete() and take the lock in ceph/ocfs2 to conform
> to the standard.

Any messing with the locking in ceph_fill_trace() would have to come
with very detailed proof of correctness, convincingly stable wrt
future changes in ceph...
Amir Goldstein May 5, 2019, 2:18 p.m. UTC | #4
On Sun, May 5, 2019 at 4:47 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, May 05, 2019 at 04:19:02PM +0300, Amir Goldstein wrote:
>
> > I have made an analysis of callers to d_delete() and found that all callers
> > either hold parent inode lock or name is stable for another reason:
> > https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> >
> > But Jan preferred to keep take_dentry_name_snapshot() to be safe.
> > I think the right thing to do is assert that parent inode is locked or
> > no rename op in d_delete() and take the lock in ceph/ocfs2 to conform
> > to the standard.
>
> Any messing with the locking in ceph_fill_trace() would have to come
> with very detailed proof of correctness, convincingly stable wrt
> future changes in ceph...

Yeh... Is there any other way to assert that d_name is stable in d_delete()?
My original rule was:
WARN_ON_ONCE(!inode_is_locked(dir) && dir->i_op->rename &&
      !(dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE));

In the hope that declaring FS_RENAME_DOES_D_MOVE means fs knows
what it is doing...

Thanks,
Amir.
Amir Goldstein May 5, 2019, 6:26 p.m. UTC | #5
On Sun, May 5, 2019 at 7:34 PM kbuild test robot <lkp@intel.com> wrote:
>
> Hi Amir,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v5.1-rc7 next-20190503]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Amir-Goldstein/fsnotify-fix-unlink-performance-regression/20190505-233115
> config: riscv-tinyconfig (attached as .config)
> compiler: riscv64-linux-gcc (GCC) 8.1.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         GCC_VERSION=8.1.0 make.cross ARCH=riscv
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>    In file included from fs///attr.c:15:
>    include/linux/fsnotify.h: In function 'fsnotify_nameremove':
> >> include/linux/fsnotify.h:179:23: error: 'struct inode' has no member named 'i_fsnotify_mask'
>      if (!(d_inode(parent)->i_fsnotify_mask & FS_DELETE) &&
>                           ^~
> >> include/linux/fsnotify.h:180:20: error: 'struct super_block' has no member named 's_fsnotify_mask'
>          !(dentry->d_sb->s_fsnotify_mask & FS_DELETE))
>                        ^~
>

Crap! forgot these wrappers are not NOOPed without CONFIG_FSNOTIFY.
It is so annoying to fix bugs in code that should not exist.

In d_delete() at this point, dentry is either negative or inode->i_nlink
which accounts for this name should be decremented.
If d_move() was possible on this dentry, bad things would happen.

I really wish I could just drop this take_dentry_name_snapshot()
and leave the WARN_ON() I suggested instead...
For now will just send an unbroken patch.

Thanks,
Amir.
Amir Goldstein May 6, 2019, 12:43 p.m. UTC | #6
On Sun, May 5, 2019 at 4:47 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, May 05, 2019 at 04:19:02PM +0300, Amir Goldstein wrote:
>
> > I have made an analysis of callers to d_delete() and found that all callers
> > either hold parent inode lock or name is stable for another reason:
> > https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
> >
> > But Jan preferred to keep take_dentry_name_snapshot() to be safe.
> > I think the right thing to do is assert that parent inode is locked or
> > no rename op in d_delete() and take the lock in ceph/ocfs2 to conform
> > to the standard.
>
> Any messing with the locking in ceph_fill_trace() would have to come
> with very detailed proof of correctness, convincingly stable wrt
> future changes in ceph...

OK. What do you have to say about this statement?

    Because fsnotify_nameremove() is called from d_delete() with negative
    or unhashed dentry, d_move() is not expected on this dentry, so it is
    safe to use d_parent/d_name without take_dentry_name_snapshot().

I assume it is not correct, but cannot figure out why.
Under what circumstances is d_move() expected to move an unhashed
dentry and hash it?

If this is not expected then wouldn't we be better off with:
@@ -2774,9 +2774,9 @@ static void __d_move(struct dentry *dentry,
struct dentry *target,

        /* unhash both */
-       if (!d_unhashed(dentry))
+       if (!WARN_ON(d_unhashed(dentry)))
                ___d_drop(dentry);
-       if (!d_unhashed(target))
+       if (!WARN_ON(d_unhashed(target)))
                ___d_drop(target);

And then we can get rid of take_dentry_name_snapshot() in fsnotify_nameremove()
and leave an assertion instead:

+       /* Proof of stability of d_parent and d_name */
+       WARN_ON_ONCE(d_inode(dentry) && !d_unhashed(dentry));
+

My other thought is why is fsnotify_nameremove() in d_delete() and
not in vfs_unlink()/vfs_rmdir() under parent inode lock like the rest
of the fsnotify_create/fsnotify_move hooks?

In what case would we need the fsnotify event that is not coming
from vfs_unlink()/vfs_rmdir()?

Thanks,
Amir.
Al Viro May 6, 2019, 2:26 p.m. UTC | #7
On Mon, May 06, 2019 at 03:43:24PM +0300, Amir Goldstein wrote:
> OK. What do you have to say about this statement?
> 
>     Because fsnotify_nameremove() is called from d_delete() with negative
>     or unhashed dentry, d_move() is not expected on this dentry, so it is
>     safe to use d_parent/d_name without take_dentry_name_snapshot().
> 
> I assume it is not correct, but cannot figure out why.
> Under what circumstances is d_move() expected to move an unhashed
> dentry and hash it?

For starters, d_splice_alias() picking an exising alias for given directory
inode.

> My other thought is why is fsnotify_nameremove() in d_delete() and
> not in vfs_unlink()/vfs_rmdir() under parent inode lock like the rest
> of the fsnotify_create/fsnotify_move hooks?
> 
> In what case would we need the fsnotify event that is not coming
> from vfs_unlink()/vfs_rmdir()?

*snort*

You can thank those who whine about notifications on sysfs/devpts/whatnot.
Go talk to them if you wish, but don't ask me to translate what you'll get
into something coherent - I'd never been able to.
Amir Goldstein May 6, 2019, 4:22 p.m. UTC | #8
On Mon, May 6, 2019 at 5:26 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, May 06, 2019 at 03:43:24PM +0300, Amir Goldstein wrote:
> > OK. What do you have to say about this statement?
> >
> >     Because fsnotify_nameremove() is called from d_delete() with negative
> >     or unhashed dentry, d_move() is not expected on this dentry, so it is
> >     safe to use d_parent/d_name without take_dentry_name_snapshot().
> >
> > I assume it is not correct, but cannot figure out why.
> > Under what circumstances is d_move() expected to move an unhashed
> > dentry and hash it?
>
> For starters, d_splice_alias() picking an exising alias for given directory
> inode.

Ok. But seeing that we are already in d_delete() said directory is already
IS_DEADDIR, so that can be added to the assertion that proves the
stability of d_name.
Are there any other cases? weird filesystems?

>
> > My other thought is why is fsnotify_nameremove() in d_delete() and
> > not in vfs_unlink()/vfs_rmdir() under parent inode lock like the rest
> > of the fsnotify_create/fsnotify_move hooks?
> >
> > In what case would we need the fsnotify event that is not coming
> > from vfs_unlink()/vfs_rmdir()?
>
> *snort*
>
> You can thank those who whine about notifications on sysfs/devpts/whatnot.
> Go talk to them if you wish, but don't ask me to translate what you'll get
> into something coherent - I'd never been able to.

I see. So all of those fs that are interested in notifications already have
fsnotify_create()/fsnotify_move() calls in them.
There are only 5 of them: binderfs, debugfs, devpts, tracefs, sunrpc.
It would be easier and less convoluted to also add the fsnotify_nameremove()
explicit calls in those fs.
With those fs either d_name is inherently stable or (debugfs) locks on parent
are taken properly.

But d_delete() effectively provides something else. If provides "remote server
change notifications" for clustered/networking filesystems when server
invalidates the local dentry. This is not a feature that is guarantied
by fsnotify.
notifications will be delivered randomly based on existence of local entry
in dcache. Only networking fs that provide proper remote notifications
(cifs is interested in doing that) should really be calling the fsnofity hooks.

Of course, we remove the random remote notifications from clustered/network
fs, there is bound to be someone unhappy. Urgh! out of those fs, only ocfs2
calls fsnotify_create(). If it is deemed important, can add
fsnotify_nameremove()
to ocfs2 as well.

Dare we make this change and see if people shout?
It's easy to add the random remote fsnotify_nameremove() calls per request
for a filesystems that want it.

Thanks,
Amir.
diff mbox series

Patch

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 09587e2860b5..2272c8c2023c 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -175,12 +175,19 @@  static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
 		mask |= FS_ISDIR;
 
 	parent = dget_parent(dentry);
+	/* Avoid unneeded take_dentry_name_snapshot() */
+	if (!(d_inode(parent)->i_fsnotify_mask & FS_DELETE) &&
+	    !(dentry->d_sb->s_fsnotify_mask & FS_DELETE))
+		goto out_dput;
+
 	take_dentry_name_snapshot(&name, dentry);
 
 	fsnotify(d_inode(parent), mask, d_inode(dentry), FSNOTIFY_EVENT_INODE,
 		 name.name, 0);
 
 	release_dentry_name_snapshot(&name);
+
+out_dput:
 	dput(parent);
 }