mbox series

[RFC,v3,0/9] Suppress negative dentry

Message ID 20200515072047.31454-1-cgxu519@mykernel.net (mailing list archive)
Headers show
Series Suppress negative dentry | expand

Message

Chengguang Xu May 15, 2020, 7:20 a.m. UTC
This series adds a new lookup flag LOOKUP_DONTCACHE_NEGATIVE
to indicate to drop negative dentry in slow path of lookup.

In overlayfs, negative dentries in upper/lower layers are useless
after construction of overlayfs' own dentry, so in order to
effectively reclaim those dentries, specify LOOKUP_DONTCACHE_NEGATIVE
flag when doing lookup in upper/lower layers.

Patch 1 adds flag LOOKUP_DONTCACHE_NEGATIVE and related logic in vfs layer.
Patch 2 does lookup optimazation for overlayfs.
Patch 3-9 just adjusts function argument when calling
lookup_positive_unlocked() and lookup_one_len_unlocked().

v1->v2:
- Only drop negative dentry in slow path of lookup.

v2->v3:
- Drop negative dentry in vfs layer.
- Rebase on latest linus-tree(5.7.0-rc5).

Chengguang Xu (9):
  fs/dcache: Introduce a new lookup flag LOOKUP_DONTCACHE_NEGATIVE
  ovl: Suppress negative dentry in lookup
  cifs: Adjust argument for lookup_positive_unlocked()
  debugfs: Adjust argument for lookup_positive_unlocked()
  ecryptfs: Adjust argument for lookup_one_len_unlocked()
  exportfs: Adjust argument for lookup_one_len_unlocked()
  kernfs: Adjust argument for lookup_positive_unlocked()
  nfsd: Adjust argument for lookup_positive_unlocked()
  quota: Adjust argument for lookup_positive_unlocked()

 fs/cifs/cifsfs.c      |  2 +-
 fs/debugfs/inode.c    |  2 +-
 fs/ecryptfs/inode.c   |  2 +-
 fs/exportfs/expfs.c   |  2 +-
 fs/kernfs/mount.c     |  2 +-
 fs/namei.c            | 14 ++++++++++----
 fs/nfsd/nfs3xdr.c     |  2 +-
 fs/nfsd/nfs4xdr.c     |  3 ++-
 fs/overlayfs/namei.c  |  9 +++++----
 fs/quota/dquot.c      |  3 ++-
 include/linux/namei.h |  9 +++++++--
 11 files changed, 32 insertions(+), 18 deletions(-)

Comments

Chengguang Xu May 15, 2020, 8:25 a.m. UTC | #1
---- 在 星期五, 2020-05-15 15:30:27 Amir Goldstein <amir73il@gmail.com> 撰写 ----
 > On Fri, May 15, 2020 at 10:21 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
 > >
 > > This series adds a new lookup flag LOOKUP_DONTCACHE_NEGATIVE
 > > to indicate to drop negative dentry in slow path of lookup.
 > >
 > > In overlayfs, negative dentries in upper/lower layers are useless
 > > after construction of overlayfs' own dentry, so in order to
 > > effectively reclaim those dentries, specify LOOKUP_DONTCACHE_NEGATIVE
 > > flag when doing lookup in upper/lower layers.
 > >
 > > Patch 1 adds flag LOOKUP_DONTCACHE_NEGATIVE and related logic in vfs layer.
 > > Patch 2 does lookup optimazation for overlayfs.
 > > Patch 3-9 just adjusts function argument when calling
 > > lookup_positive_unlocked() and lookup_one_len_unlocked().
 > 
 > Hmm you cannot do that, build must not be broken mid series.
 > When Miklos said split he meant to patch 1 and 2.
 > Patch 1 must convert all callers to the new argument list,
 > at which point all overlayfs calls are with 0 flags.
 > 
 > Once that's done, you may add:
 > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
 > 

OK, I got it, I'll still wait for a while in case of other feedbacks.

Miklos, AI

I'm not sure this series will go into whose tree in the end, 
so I just rebased on current linus-tree, any suggestion for the code base?

Thanks,
cgxu
Miklos Szeredi May 15, 2020, 8:42 a.m. UTC | #2
On Fri, May 15, 2020 at 10:26 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
>  ---- 在 星期五, 2020-05-15 15:30:27 Amir Goldstein <amir73il@gmail.com> 撰写 ----
>  > On Fri, May 15, 2020 at 10:21 AM Chengguang Xu <cgxu519@mykernel.net> wrote:
>  > >
>  > > This series adds a new lookup flag LOOKUP_DONTCACHE_NEGATIVE
>  > > to indicate to drop negative dentry in slow path of lookup.
>  > >
>  > > In overlayfs, negative dentries in upper/lower layers are useless
>  > > after construction of overlayfs' own dentry, so in order to
>  > > effectively reclaim those dentries, specify LOOKUP_DONTCACHE_NEGATIVE
>  > > flag when doing lookup in upper/lower layers.
>  > >
>  > > Patch 1 adds flag LOOKUP_DONTCACHE_NEGATIVE and related logic in vfs layer.
>  > > Patch 2 does lookup optimazation for overlayfs.
>  > > Patch 3-9 just adjusts function argument when calling
>  > > lookup_positive_unlocked() and lookup_one_len_unlocked().
>  >
>  > Hmm you cannot do that, build must not be broken mid series.
>  > When Miklos said split he meant to patch 1 and 2.
>  > Patch 1 must convert all callers to the new argument list,
>  > at which point all overlayfs calls are with 0 flags.
>  >
>  > Once that's done, you may add:
>  > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
>  >
>
> OK, I got it, I'll still wait for a while in case of other feedbacks.
>
> Miklos, AI
>
> I'm not sure this series will go into whose tree in the end,
> so I just rebased on current linus-tree, any suggestion for the code base?

Linus' tree is a good base in this case.  I'm happier if VFS changes
go through Al's tree, but simple stuff can go through overlayfs tree
as well.

Thanks,
Miklos
Ian Kent May 18, 2020, 12:53 a.m. UTC | #3
On Fri, 2020-05-15 at 15:20 +0800, Chengguang Xu wrote:
> This series adds a new lookup flag LOOKUP_DONTCACHE_NEGATIVE
> to indicate to drop negative dentry in slow path of lookup.
> 
> In overlayfs, negative dentries in upper/lower layers are useless
> after construction of overlayfs' own dentry, so in order to
> effectively reclaim those dentries, specify LOOKUP_DONTCACHE_NEGATIVE
> flag when doing lookup in upper/lower layers.

I've looked at this a couple of times now.

I'm not at all sure of the wisdom of adding a flag to a VFS function
that allows circumventing what a file system chooses to do.

I also do really see the need for it because only hashed negative
dentrys will be retained by the VFS so, if you see a hashed negative
dentry then you can cause it to be discarded on release of the last
reference by dropping it.

So what's different here, why is adding an argument to do that drop
in the VFS itself needed instead of just doing it in overlayfs?

> 
> Patch 1 adds flag LOOKUP_DONTCACHE_NEGATIVE and related logic in vfs
> layer.
> Patch 2 does lookup optimazation for overlayfs.
> Patch 3-9 just adjusts function argument when calling
> lookup_positive_unlocked() and lookup_one_len_unlocked().
> 
> v1->v2:
> - Only drop negative dentry in slow path of lookup.
> 
> v2->v3:
> - Drop negative dentry in vfs layer.
> - Rebase on latest linus-tree(5.7.0-rc5).
> 
> Chengguang Xu (9):
>   fs/dcache: Introduce a new lookup flag LOOKUP_DONTCACHE_NEGATIVE
>   ovl: Suppress negative dentry in lookup
>   cifs: Adjust argument for lookup_positive_unlocked()
>   debugfs: Adjust argument for lookup_positive_unlocked()
>   ecryptfs: Adjust argument for lookup_one_len_unlocked()
>   exportfs: Adjust argument for lookup_one_len_unlocked()
>   kernfs: Adjust argument for lookup_positive_unlocked()
>   nfsd: Adjust argument for lookup_positive_unlocked()
>   quota: Adjust argument for lookup_positive_unlocked()
> 
>  fs/cifs/cifsfs.c      |  2 +-
>  fs/debugfs/inode.c    |  2 +-
>  fs/ecryptfs/inode.c   |  2 +-
>  fs/exportfs/expfs.c   |  2 +-
>  fs/kernfs/mount.c     |  2 +-
>  fs/namei.c            | 14 ++++++++++----
>  fs/nfsd/nfs3xdr.c     |  2 +-
>  fs/nfsd/nfs4xdr.c     |  3 ++-
>  fs/overlayfs/namei.c  |  9 +++++----
>  fs/quota/dquot.c      |  3 ++-
>  include/linux/namei.h |  9 +++++++--
>  11 files changed, 32 insertions(+), 18 deletions(-)
>
Amir Goldstein May 18, 2020, 5:27 a.m. UTC | #4
On Mon, May 18, 2020 at 3:53 AM Ian Kent <raven@themaw.net> wrote:
>
> On Fri, 2020-05-15 at 15:20 +0800, Chengguang Xu wrote:
> > This series adds a new lookup flag LOOKUP_DONTCACHE_NEGATIVE
> > to indicate to drop negative dentry in slow path of lookup.
> >
> > In overlayfs, negative dentries in upper/lower layers are useless
> > after construction of overlayfs' own dentry, so in order to
> > effectively reclaim those dentries, specify LOOKUP_DONTCACHE_NEGATIVE
> > flag when doing lookup in upper/lower layers.
>
> I've looked at this a couple of times now.
>
> I'm not at all sure of the wisdom of adding a flag to a VFS function
> that allows circumventing what a file system chooses to do.

But it is not really a conscious choice is it?
How exactly does a filesystem express its desire to cache a negative
dentry? The documentation of lookup() in vfs.rst makes it clear that
it is not up to the filesystem to make that decision.
The VFS needs to cache the negative dentry on lookup(), so
it can turn it positive on create().
Low level kernel modules that call the VFS lookup() might know
that caching the negative dentry is counter productive.

>
> I also do really see the need for it because only hashed negative
> dentrys will be retained by the VFS so, if you see a hashed negative
> dentry then you can cause it to be discarded on release of the last
> reference by dropping it.
>
> So what's different here, why is adding an argument to do that drop
> in the VFS itself needed instead of just doing it in overlayfs?

That was v1 patch. It was dealing with the possible race of
returned negative dentry becoming positive before dropping it
in an intrusive manner.

In retrospect, I think this race doesn't matter and there is no
harm in dropping a positive dentry in a race obviously caused by
accessing the underlying layer, which as documented results in
"undefined behavior".

Miklos, am I missing something?

Thanks,
Amir.
Miklos Szeredi May 18, 2020, 7:52 a.m. UTC | #5
On Mon, May 18, 2020 at 7:27 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, May 18, 2020 at 3:53 AM Ian Kent <raven@themaw.net> wrote:
> >
> > On Fri, 2020-05-15 at 15:20 +0800, Chengguang Xu wrote:
> > > This series adds a new lookup flag LOOKUP_DONTCACHE_NEGATIVE
> > > to indicate to drop negative dentry in slow path of lookup.
> > >
> > > In overlayfs, negative dentries in upper/lower layers are useless
> > > after construction of overlayfs' own dentry, so in order to
> > > effectively reclaim those dentries, specify LOOKUP_DONTCACHE_NEGATIVE
> > > flag when doing lookup in upper/lower layers.
> >
> > I've looked at this a couple of times now.
> >
> > I'm not at all sure of the wisdom of adding a flag to a VFS function
> > that allows circumventing what a file system chooses to do.
>
> But it is not really a conscious choice is it?
> How exactly does a filesystem express its desire to cache a negative
> dentry? The documentation of lookup() in vfs.rst makes it clear that
> it is not up to the filesystem to make that decision.
> The VFS needs to cache the negative dentry on lookup(), so
> it can turn it positive on create().
> Low level kernel modules that call the VFS lookup() might know
> that caching the negative dentry is counter productive.
>
> >
> > I also do really see the need for it because only hashed negative
> > dentrys will be retained by the VFS so, if you see a hashed negative
> > dentry then you can cause it to be discarded on release of the last
> > reference by dropping it.
> >
> > So what's different here, why is adding an argument to do that drop
> > in the VFS itself needed instead of just doing it in overlayfs?
>
> That was v1 patch. It was dealing with the possible race of
> returned negative dentry becoming positive before dropping it
> in an intrusive manner.
>
> In retrospect, I think this race doesn't matter and there is no
> harm in dropping a positive dentry in a race obviously caused by
> accessing the underlying layer, which as documented results in
> "undefined behavior".
>
> Miklos, am I missing something?

Dropping a positive dentry is harmful in case there's a long term
reference to the dentry (e.g. an open file) since it will look as if
the file was deleted, when in fact it wasn't.

It's possible to unhash a negative dentry in a safe way if we make
sure it cannot become positive.  One way is to grab d_lock and remove
it from the hash table only if count is one.

So yes, we could have a helper to do that instead of the lookup flag.
The disadvantage being that we'd also be dropping negatives that did
not enter the cache because of our lookup.

I don't really care, both are probably good enough for the overlayfs case.

Thanks,
Miklos
Amir Goldstein May 18, 2020, 8:51 a.m. UTC | #6
> > > I also do really see the need for it because only hashed negative
> > > dentrys will be retained by the VFS so, if you see a hashed negative
> > > dentry then you can cause it to be discarded on release of the last
> > > reference by dropping it.
> > >
> > > So what's different here, why is adding an argument to do that drop
> > > in the VFS itself needed instead of just doing it in overlayfs?
> >
> > That was v1 patch. It was dealing with the possible race of
> > returned negative dentry becoming positive before dropping it
> > in an intrusive manner.
> >
> > In retrospect, I think this race doesn't matter and there is no
> > harm in dropping a positive dentry in a race obviously caused by
> > accessing the underlying layer, which as documented results in
> > "undefined behavior".
> >
> > Miklos, am I missing something?
>
> Dropping a positive dentry is harmful in case there's a long term
> reference to the dentry (e.g. an open file) since it will look as if
> the file was deleted, when in fact it wasn't.
>

I see. My point was that the negative->positive transition cannot
happen on underlying layers without user modifying underlying
layers underneath overlay, so it is fine to be in the "undefined" behavior
zone.

> It's possible to unhash a negative dentry in a safe way if we make
> sure it cannot become positive.  One way is to grab d_lock and remove
> it from the hash table only if count is one.
>
> So yes, we could have a helper to do that instead of the lookup flag.
> The disadvantage being that we'd also be dropping negatives that did
> not enter the cache because of our lookup.
>
> I don't really care, both are probably good enough for the overlayfs case.
>

There is another point to consider.
A negative underlying fs dentry may be useless for *this* overlayfs instance,
but since lower layers can be shared among many overlayfs instances,
for example, thousands of containers all testing for existence of file /etc/FOO
on startup.

It sounds like if we want to go through with DONTCACHE_NEGATIVE, that
it should be opt-in behavior for overlayfs.

Thanks,
Amir.
Miklos Szeredi May 18, 2020, 9:17 a.m. UTC | #7
On Mon, May 18, 2020 at 10:52 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > > I also do really see the need for it because only hashed negative
> > > > dentrys will be retained by the VFS so, if you see a hashed negative
> > > > dentry then you can cause it to be discarded on release of the last
> > > > reference by dropping it.
> > > >
> > > > So what's different here, why is adding an argument to do that drop
> > > > in the VFS itself needed instead of just doing it in overlayfs?
> > >
> > > That was v1 patch. It was dealing with the possible race of
> > > returned negative dentry becoming positive before dropping it
> > > in an intrusive manner.
> > >
> > > In retrospect, I think this race doesn't matter and there is no
> > > harm in dropping a positive dentry in a race obviously caused by
> > > accessing the underlying layer, which as documented results in
> > > "undefined behavior".
> > >
> > > Miklos, am I missing something?
> >
> > Dropping a positive dentry is harmful in case there's a long term
> > reference to the dentry (e.g. an open file) since it will look as if
> > the file was deleted, when in fact it wasn't.
> >
>
> I see. My point was that the negative->positive transition cannot
> happen on underlying layers without user modifying underlying
> layers underneath overlay, so it is fine to be in the "undefined" behavior
> zone.

Right, I don't think you can actually crash a filesystem by unhashing
a positive dentry in the middle of a create op, but it would
definitely be prudent to avoid that.

>
> > It's possible to unhash a negative dentry in a safe way if we make
> > sure it cannot become positive.  One way is to grab d_lock and remove
> > it from the hash table only if count is one.
> >
> > So yes, we could have a helper to do that instead of the lookup flag.
> > The disadvantage being that we'd also be dropping negatives that did
> > not enter the cache because of our lookup.
> >
> > I don't really care, both are probably good enough for the overlayfs case.
> >
>
> There is another point to consider.
> A negative underlying fs dentry may be useless for *this* overlayfs instance,
> but since lower layers can be shared among many overlayfs instances,
> for example, thousands of containers all testing for existence of file /etc/FOO
> on startup.
>
> It sounds like if we want to go through with DONTCACHE_NEGATIVE, that
> it should be opt-in behavior for overlayfs.

Good point.

Thanks,
Miklos
Ian Kent May 18, 2020, 10:26 a.m. UTC | #8
On Mon, 2020-05-18 at 08:27 +0300, Amir Goldstein wrote:
> On Mon, May 18, 2020 at 3:53 AM Ian Kent <raven@themaw.net> wrote:
> > On Fri, 2020-05-15 at 15:20 +0800, Chengguang Xu wrote:
> > > This series adds a new lookup flag LOOKUP_DONTCACHE_NEGATIVE
> > > to indicate to drop negative dentry in slow path of lookup.
> > > 
> > > In overlayfs, negative dentries in upper/lower layers are useless
> > > after construction of overlayfs' own dentry, so in order to
> > > effectively reclaim those dentries, specify
> > > LOOKUP_DONTCACHE_NEGATIVE
> > > flag when doing lookup in upper/lower layers.
> > 
> > I've looked at this a couple of times now.
> > 
> > I'm not at all sure of the wisdom of adding a flag to a VFS
> > function
> > that allows circumventing what a file system chooses to do.
> 
> But it is not really a conscious choice is it?

I thought that too until recently when I had the need to ask the
question "how do these negative hashed dentries get into the dcache.

> How exactly does a filesystem express its desire to cache a negative
> dentry? The documentation of lookup() in vfs.rst makes it clear that
> it is not up to the filesystem to make that decision.

That's the question I had too and, somewhat embarrassingly, got
a response that started with "Are you serious ..." for Al when
I made a claim that ext4 doesn't create negative hashed dentries.

What was so bad about that claim is it's really obvious that ext4
does do this in ext4/namei.c:ext4_lookup():

...
inode = NULL;
if (bh) {
...
}
...
return d_splice_alias(inode, dentry);

and inode can be negative here.

Now d_splice_alias() is pretty complicated but, if passed a NULL
dentry it amounts to calling __d_add(dentry, NULL) which produces
a negative hashed dentry, a decision made by ext4 ->lookup() (and
I must say typical of the very few ways such dentries get into
the dcache).

Now on final dput of the walk these dentries should end up with a
reference count of zero which triggers dput() to add them to the
lru list so they can be considered as prune targets and can be
found in subsequent lookups (they are hashed).

This is how using negative hashed detries helps to avoid the
expensive alloc/free (at least) that occurs when looking up paths
that don't exist.

Negative "unhashed" dentries are discarded on final dput() so
don't really come into the picture except that dropping a negative
hashed dentry will cause it to be discarded on final dput().

But nothing is ever quite as simple as a description of how it
appears to (is meant to) work so, by all means, take what I say
with a grain of salt, ;)

> The VFS needs to cache the negative dentry on lookup(), so
> it can turn it positive on create().
> Low level kernel modules that call the VFS lookup() might know
> that caching the negative dentry is counter productive.
> 
> > I also do really see the need for it because only hashed negative
> > dentrys will be retained by the VFS so, if you see a hashed
> > negative
> > dentry then you can cause it to be discarded on release of the last
> > reference by dropping it.
> > 
> > So what's different here, why is adding an argument to do that drop
> > in the VFS itself needed instead of just doing it in overlayfs?
> 
> That was v1 patch. It was dealing with the possible race of
> returned negative dentry becoming positive before dropping it
> in an intrusive manner.

Right, very much something that overlay fs must care about, file
systems not so much.

It might be possible to assume that if the underlying file system
->lookup() call has produced a negative hashed dentry that it will
stay negative. That assumption definitely can't be made for negative
unhashed dentries, of course, since they may still be in the process
of being filled in.

But, unfortunately, I see your point, it's a risky assumption.

> 
> In retrospect, I think this race doesn't matter and there is no
> harm in dropping a positive dentry in a race obviously caused by
> accessing the underlying layer, which as documented results in
> "undefined behavior".

Umm ... I thought we were talking about negative dentries ...

Ian
Ian Kent May 18, 2020, 10:39 a.m. UTC | #9
On Mon, 2020-05-18 at 18:26 +0800, Ian Kent wrote:
> 
> Now d_splice_alias() is pretty complicated but, if passed a NULL
> dentry it amounts to calling __d_add(dentry, NULL) which produces
> a negative hashed dentry, a decision made by ext4 ->lookup() (and
> I must say typical of the very few ways such dentries get into
> the dcache).

Oh, rather that's a NULL inode not dentry.

Ian
Chengguang Xu May 19, 2020, 5:01 a.m. UTC | #10
---- 在 星期一, 2020-05-18 15:52:48 Miklos Szeredi <miklos@szeredi.hu> 撰写 
----
  > On Mon, May 18, 2020 at 7:27 AM Amir Goldstein <amir73il@gmail.com> 
wrote:
  > >
  > > On Mon, May 18, 2020 at 3:53 AM Ian Kent <raven@themaw.net> wrote:
  > > >
  > > > On Fri, 2020-05-15 at 15:20 +0800, Chengguang Xu wrote:
  > > > > This series adds a new lookup flag LOOKUP_DONTCACHE_NEGATIVE
  > > > > to indicate to drop negative dentry in slow path of lookup.
  > > > >
  > > > > In overlayfs, negative dentries in upper/lower layers are useless
  > > > > after construction of overlayfs' own dentry, so in order to
  > > > > effectively reclaim those dentries, specify 
LOOKUP_DONTCACHE_NEGATIVE
  > > > > flag when doing lookup in upper/lower layers.
  > > >
  > > > I've looked at this a couple of times now.
  > > >
  > > > I'm not at all sure of the wisdom of adding a flag to a VFS function
  > > > that allows circumventing what a file system chooses to do.
  > >
  > > But it is not really a conscious choice is it?
  > > How exactly does a filesystem express its desire to cache a negative
  > > dentry? The documentation of lookup() in vfs.rst makes it clear that
  > > it is not up to the filesystem to make that decision.
  > > The VFS needs to cache the negative dentry on lookup(), so
  > > it can turn it positive on create().
  > > Low level kernel modules that call the VFS lookup() might know
  > > that caching the negative dentry is counter productive.
  > >
  > > >
  > > > I also do really see the need for it because only hashed negative
  > > > dentrys will be retained by the VFS so, if you see a hashed negative
  > > > dentry then you can cause it to be discarded on release of the last
  > > > reference by dropping it.
  > > >
  > > > So what's different here, why is adding an argument to do that drop
  > > > in the VFS itself needed instead of just doing it in overlayfs?
  > >
  > > That was v1 patch. It was dealing with the possible race of
  > > returned negative dentry becoming positive before dropping it
  > > in an intrusive manner.
  > >
  > > In retrospect, I think this race doesn't matter and there is no
  > > harm in dropping a positive dentry in a race obviously caused by
  > > accessing the underlying layer, which as documented results in
  > > "undefined behavior".
  > >
  > > Miklos, am I missing something?
  >  > Dropping a positive dentry is harmful in case there's a long term
  > reference to the dentry (e.g. an open file) since it will look as if
  > the file was deleted, when in fact it wasn't.
  >  > It's possible to unhash a negative dentry in a safe way if we make
  > sure it cannot become positive.  One way is to grab d_lock and remove
  > it from the hash table only if count is one.
  >  > So yes, we could have a helper to do that instead of the lookup flag.
  > The disadvantage being that we'd also be dropping negatives that did
  > not enter the cache because of our lookup.
  >


If we don't consider that only drop negative dentry of our lookup,
it is possible to do like below, isn't it?



diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 723d17744758..fa339e23b0f8 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -200,7 +200,7 @@ static int ovl_lookup_single(struct dentry *base, 
struct ovl_lookup_data *d,
         int err;
         bool last_element = !post[0];

-       this = lookup_positive_unlocked(name, base, namelen);
+       this = lookup_one_len_unlocked(name, base, namelen);
         if (IS_ERR(this)) {
                 err = PTR_ERR(this);
                 this = NULL;
@@ -209,6 +209,18 @@ static int ovl_lookup_single(struct dentry *base, 
struct ovl_lookup_data *d,
                 goto out_err;
         }

+       if (d_flags_negative(this->d_flags)) {
+               inode_lock_shared(base->d_inode);
+               if (d_flags_negative(this->d_flags))
+                       d_drop(this);
+               inode_unlock_shared(base->d_inode);
+
+               dput(this);
+               this = NULL;
+               err = -ENOENT;
+               goto out;
+       }
+
         if (ovl_dentry_weird(this)) {
                 /* Don't support traversing automounts and other 
weirdness */
                 err = -EREMOTE;


Thanks,
cgxu
Miklos Szeredi May 19, 2020, 8:21 a.m. UTC | #11
On Tue, May 19, 2020 at 7:02 AM cgxu <cgxu519@mykernel.net> wrote:

> If we don't consider that only drop negative dentry of our lookup,
> it is possible to do like below, isn't it?

Yes, the code looks good, though I'd consider using d_lock on dentry
instead if i_lock on parent, something like this:

if (d_is_negative(dentry) && dentry->d_lockref.count == 1) {
    spin_lock(&dentry->d_lock);
    /* Recheck condition under lock */
    if (d_is_negative(dentry) && dentry->d_lockref.count == 1)
        __d_drop(dentry)
    spin_unlock(&dentry->d_lock);
}

But as Amir noted, we do need to take into account the case where
lower layers are shared by multiple overlays, in which case dropping
the negative dentries could result in a performance regression.
Have you looked at that case, and the effect of this patch on negative
dentry lookup performance?

Upper layer negative dentries don't have this issue, since they are
never shared, so I think it would be safe to drop them
unconditionally.

Thanks,
Miklos
Chengguang Xu May 19, 2020, 9:23 a.m. UTC | #12
On 5/19/20 4:21 PM, Miklos Szeredi wrote:
> On Tue, May 19, 2020 at 7:02 AM cgxu <cgxu519@mykernel.net> wrote:
>
>> If we don't consider that only drop negative dentry of our lookup,
>> it is possible to do like below, isn't it?
> Yes, the code looks good, though I'd consider using d_lock on dentry
> instead if i_lock on parent, something like this:
>
> if (d_is_negative(dentry) && dentry->d_lockref.count == 1) {
>      spin_lock(&dentry->d_lock);
>      /* Recheck condition under lock */
>      if (d_is_negative(dentry) && dentry->d_lockref.count == 1)
>          __d_drop(dentry)
>      spin_unlock(&dentry->d_lock);

And after this we will still treat 'dentry' as negative dentry and dput it
regardless of the second check result of d_is_negative(dentry), right?


> }
>
> But as Amir noted, we do need to take into account the case where
> lower layers are shared by multiple overlays, in which case dropping
> the negative dentries could result in a performance regression.
> Have you looked at that case, and the effect of this patch on negative
> dentry lookup performance?

The container which is affected by this feature is just take advantage
of previous another container but we could not guarantee that always
happening. I think there no way for best of both worlds, consider that
some malicious containers continuously make negative dentries by
searching non-exist files, so that page cache of clean data, clean
inodes/dentries will be freed by memory reclaim. All of those
behaviors will impact the performance of other container instances.

On the other hand, if this feature significantly affects particular 
container,
doesn't that mean the container is noisy neighbor and should be restricted
in some way?

Thanks,
cgxu
Miklos Szeredi May 20, 2020, 2:44 p.m. UTC | #13
On Tue, May 19, 2020 at 11:24 AM cgxu <cgxu519@mykernel.net> wrote:
>
> On 5/19/20 4:21 PM, Miklos Szeredi wrote:
> > On Tue, May 19, 2020 at 7:02 AM cgxu <cgxu519@mykernel.net> wrote:
> >
> >> If we don't consider that only drop negative dentry of our lookup,
> >> it is possible to do like below, isn't it?
> > Yes, the code looks good, though I'd consider using d_lock on dentry
> > instead if i_lock on parent, something like this:
> >
> > if (d_is_negative(dentry) && dentry->d_lockref.count == 1) {
> >      spin_lock(&dentry->d_lock);
> >      /* Recheck condition under lock */
> >      if (d_is_negative(dentry) && dentry->d_lockref.count == 1)
> >          __d_drop(dentry)
> >      spin_unlock(&dentry->d_lock);
>
> And after this we will still treat 'dentry' as negative dentry and dput it
> regardless of the second check result of d_is_negative(dentry), right?

I'd restructure it in the same way as lookup_positive_unlocked()...

> > }
> >
> > But as Amir noted, we do need to take into account the case where
> > lower layers are shared by multiple overlays, in which case dropping
> > the negative dentries could result in a performance regression.
> > Have you looked at that case, and the effect of this patch on negative
> > dentry lookup performance?
>
> The container which is affected by this feature is just take advantage
> of previous another container but we could not guarantee that always
> happening. I think there no way for best of both worlds, consider that
> some malicious containers continuously make negative dentries by
> searching non-exist files, so that page cache of clean data, clean
> inodes/dentries will be freed by memory reclaim. All of those
> behaviors will impact the performance of other container instances.
>
> On the other hand, if this feature significantly affects particular
> container,
> doesn't that mean the container is noisy neighbor and should be restricted
> in some way?

Not necessarily.   Negative dentries can be useful and in case of
layers shared between two containers having negative dentries cached
in the lower layer can in theory positively affect performance.   I
don't have data to back this up, nor the opposite.  You should run
some numbers for container startup times with and without this patch.

Thanks,
Milklos
Chengguang Xu May 25, 2020, 1:37 p.m. UTC | #14
在 5/20/2020 10:44 PM, Miklos Szeredi 写道:
> On Tue, May 19, 2020 at 11:24 AM cgxu <cgxu519@mykernel.net> wrote:
>> On 5/19/20 4:21 PM, Miklos Szeredi wrote:
>>> On Tue, May 19, 2020 at 7:02 AM cgxu <cgxu519@mykernel.net> wrote:
>>>
>>>> If we don't consider that only drop negative dentry of our lookup,
>>>> it is possible to do like below, isn't it?
>>> Yes, the code looks good, though I'd consider using d_lock on dentry
>>> instead if i_lock on parent, something like this:
>>>
>>> if (d_is_negative(dentry) && dentry->d_lockref.count == 1) {
>>>       spin_lock(&dentry->d_lock);
>>>       /* Recheck condition under lock */
>>>       if (d_is_negative(dentry) && dentry->d_lockref.count == 1)
>>>           __d_drop(dentry)
>>>       spin_unlock(&dentry->d_lock);
>> And after this we will still treat 'dentry' as negative dentry and dput it
>> regardless of the second check result of d_is_negative(dentry), right?
> I'd restructure it in the same way as lookup_positive_unlocked()...
>
>>> }
>>>
>>> But as Amir noted, we do need to take into account the case where
>>> lower layers are shared by multiple overlays, in which case dropping
>>> the negative dentries could result in a performance regression.
>>> Have you looked at that case, and the effect of this patch on negative
>>> dentry lookup performance?
>> The container which is affected by this feature is just take advantage
>> of previous another container but we could not guarantee that always
>> happening. I think there no way for best of both worlds, consider that
>> some malicious containers continuously make negative dentries by
>> searching non-exist files, so that page cache of clean data, clean
>> inodes/dentries will be freed by memory reclaim. All of those
>> behaviors will impact the performance of other container instances.
>>
>> On the other hand, if this feature significantly affects particular
>> container,
>> doesn't that mean the container is noisy neighbor and should be restricted
>> in some way?
> Not necessarily.   Negative dentries can be useful and in case of
> layers shared between two containers having negative dentries cached
> in the lower layer can in theory positively affect performance.   I
> don't have data to back this up, nor the opposite.  You should run
> some numbers for container startup times with and without this patch.

I did some simple tests  for it but the result seems not very steady, so 
I need to take time to do more detail tests later. Is it possible to 
apply the patch for upper layer first?

Thanks,
cgxu
Miklos Szeredi May 25, 2020, 1:50 p.m. UTC | #15
On Mon, May 25, 2020 at 3:37 PM Chengguang Xu <cgxu519@mykernel.net> wrote:
>
> 在 5/20/2020 10:44 PM, Miklos Szeredi 写道:
> > On Tue, May 19, 2020 at 11:24 AM cgxu <cgxu519@mykernel.net> wrote:
> >> On 5/19/20 4:21 PM, Miklos Szeredi wrote:
> >>> On Tue, May 19, 2020 at 7:02 AM cgxu <cgxu519@mykernel.net> wrote:
> >>>
> >>>> If we don't consider that only drop negative dentry of our lookup,
> >>>> it is possible to do like below, isn't it?
> >>> Yes, the code looks good, though I'd consider using d_lock on dentry
> >>> instead if i_lock on parent, something like this:
> >>>
> >>> if (d_is_negative(dentry) && dentry->d_lockref.count == 1) {
> >>>       spin_lock(&dentry->d_lock);
> >>>       /* Recheck condition under lock */
> >>>       if (d_is_negative(dentry) && dentry->d_lockref.count == 1)
> >>>           __d_drop(dentry)
> >>>       spin_unlock(&dentry->d_lock);
> >> And after this we will still treat 'dentry' as negative dentry and dput it
> >> regardless of the second check result of d_is_negative(dentry), right?
> > I'd restructure it in the same way as lookup_positive_unlocked()...
> >
> >>> }
> >>>
> >>> But as Amir noted, we do need to take into account the case where
> >>> lower layers are shared by multiple overlays, in which case dropping
> >>> the negative dentries could result in a performance regression.
> >>> Have you looked at that case, and the effect of this patch on negative
> >>> dentry lookup performance?
> >> The container which is affected by this feature is just take advantage
> >> of previous another container but we could not guarantee that always
> >> happening. I think there no way for best of both worlds, consider that
> >> some malicious containers continuously make negative dentries by
> >> searching non-exist files, so that page cache of clean data, clean
> >> inodes/dentries will be freed by memory reclaim. All of those
> >> behaviors will impact the performance of other container instances.
> >>
> >> On the other hand, if this feature significantly affects particular
> >> container,
> >> doesn't that mean the container is noisy neighbor and should be restricted
> >> in some way?
> > Not necessarily.   Negative dentries can be useful and in case of
> > layers shared between two containers having negative dentries cached
> > in the lower layer can in theory positively affect performance.   I
> > don't have data to back this up, nor the opposite.  You should run
> > some numbers for container startup times with and without this patch.
>
> I did some simple tests  for it but the result seems not very steady, so
> I need to take time to do more detail tests later. Is it possible to
> apply the patch for upper layer first?

Sure, that's a good start.

Thanks,
Miklos