diff mbox series

namei: clear nd->root.mnt before O_CREAT unlazy

Message ID 20220105180259.115760-1-bfoster@redhat.com (mailing list archive)
State New, archived
Headers show
Series namei: clear nd->root.mnt before O_CREAT unlazy | expand

Commit Message

Brian Foster Jan. 5, 2022, 6:02 p.m. UTC
The unlazy sequence of an rcuwalk lookup occurs a bit earlier than
normal for O_CREAT lookups (i.e. in open_last_lookups()). The create
logic here historically invoked complete_walk(), which clears the
nd->root.mnt pointer when appropriate before the unlazy.  This
changed in commit 72287417abd1 ("open_last_lookups(): don't abuse
complete_walk() when all we want is unlazy"), which refactored the
create path to invoke unlazy_walk() and not consider nd->root.mnt.

This tweak negatively impacts performance on a concurrent
open(O_CREAT) workload to multiple independent mounts beneath the
root directory. This attributes to increased spinlock contention on
the root dentry via legitimize_root(), to the point where the
spinlock becomes the primary bottleneck over the directory inode
rwsem of the individual submounts. For example, the completion rate
of a 32k thread aim7 create/close benchmark that repeatedly passes
O_CREAT to open preexisting files drops from over 700k "jobs per
minute" to 30, increasing the overall test time from a few minutes
to over an hour.

A similar, more simplified test to create a set of opener tasks
across a set of submounts can demonstrate the problem more quickly.
For example, consider sets of 100 open/close tasks each running
against 64 independent filesystem mounts (i.e. 6400 tasks total),
with each task completing 10k iterations before it exits. On an
80xcpu box running v5.16.0-rc2, this test completes in 50-55s. With
this patch applied, the same test completes in 10-15s.

This is not the most realistic workload in the world as it factors
out inode allocation in the filesystem. The contention can also be
avoided by more selective use of O_CREAT or via use of relative
pathnames. That said, this regression appears to be an unintentional
side effect of code cleanup and might be unexpected for users.
Restore original behavior prior to commit 72287417abd1 by factoring
the rcu logic from complete_walk() into a new helper and invoke that
from both places.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/namei.c | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

Comments

Ian Kent Jan. 6, 2022, 11:46 p.m. UTC | #1
On Wed, 2022-01-05 at 13:02 -0500, Brian Foster wrote:
> The unlazy sequence of an rcuwalk lookup occurs a bit earlier than
> normal for O_CREAT lookups (i.e. in open_last_lookups()). The create
> logic here historically invoked complete_walk(), which clears the
> nd->root.mnt pointer when appropriate before the unlazy.  This
> changed in commit 72287417abd1 ("open_last_lookups(): don't abuse
> complete_walk() when all we want is unlazy"), which refactored the
> create path to invoke unlazy_walk() and not consider nd->root.mnt.
> 
> This tweak negatively impacts performance on a concurrent
> open(O_CREAT) workload to multiple independent mounts beneath the
> root directory. This attributes to increased spinlock contention on
> the root dentry via legitimize_root(), to the point where the
> spinlock becomes the primary bottleneck over the directory inode
> rwsem of the individual submounts. For example, the completion rate
> of a 32k thread aim7 create/close benchmark that repeatedly passes
> O_CREAT to open preexisting files drops from over 700k "jobs per
> minute" to 30, increasing the overall test time from a few minutes
> to over an hour.
> 
> A similar, more simplified test to create a set of opener tasks
> across a set of submounts can demonstrate the problem more quickly.
> For example, consider sets of 100 open/close tasks each running
> against 64 independent filesystem mounts (i.e. 6400 tasks total),
> with each task completing 10k iterations before it exits. On an
> 80xcpu box running v5.16.0-rc2, this test completes in 50-55s. With
> this patch applied, the same test completes in 10-15s.
> 
> This is not the most realistic workload in the world as it factors
> out inode allocation in the filesystem. The contention can also be
> avoided by more selective use of O_CREAT or via use of relative
> pathnames. That said, this regression appears to be an unintentional
> side effect of code cleanup and might be unexpected for users.
> Restore original behavior prior to commit 72287417abd1 by factoring
> the rcu logic from complete_walk() into a new helper and invoke that
> from both places.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/namei.c | 37 +++++++++++++++++++++----------------
>  1 file changed, 21 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 1f9d2187c765..b32fcbc99929 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -856,6 +856,22 @@ static inline int d_revalidate(struct dentry
> *dentry, unsigned int flags)
>                 return 1;
>  }
>  
> +static inline bool complete_walk_rcu(struct nameidata *nd)
> +{
> +       if (nd->flags & LOOKUP_RCU) {
> +               /*
> +                * We don't want to zero nd->root for scoped-lookups
> or
> +                * externally-managed nd->root.
> +                */
> +               if (!(nd->state & ND_ROOT_PRESET))
> +                       if (!(nd->flags & LOOKUP_IS_SCOPED))
> +                               nd->root.mnt = NULL;
> +               nd->flags &= ~LOOKUP_CACHED;
> +               return try_to_unlazy(nd);
> +       }
> +       return true;
> +}
> +
>  /**
>   * complete_walk - successful completion of path walk
>   * @nd:  pointer nameidata
> @@ -871,18 +887,8 @@ static int complete_walk(struct nameidata *nd)
>         struct dentry *dentry = nd->path.dentry;
>         int status;
>  
> -       if (nd->flags & LOOKUP_RCU) {
> -               /*
> -                * We don't want to zero nd->root for scoped-lookups
> or
> -                * externally-managed nd->root.
> -                */
> -               if (!(nd->state & ND_ROOT_PRESET))
> -                       if (!(nd->flags & LOOKUP_IS_SCOPED))
> -                               nd->root.mnt = NULL;
> -               nd->flags &= ~LOOKUP_CACHED;
> -               if (!try_to_unlazy(nd))
> -                       return -ECHILD;
> -       }
> +       if (!complete_walk_rcu(nd))
> +               return -ECHILD;
>  
>         if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
>                 /*
> @@ -3325,10 +3331,9 @@ static const char *open_last_lookups(struct
> nameidata *nd,
>                 BUG_ON(nd->flags & LOOKUP_RCU);
>         } else {
>                 /* create side of things */
> -               if (nd->flags & LOOKUP_RCU) {
> -                       if (!try_to_unlazy(nd))
> -                               return ERR_PTR(-ECHILD);
> -               }
> +               if (!complete_walk_rcu(nd))
> +                       return ERR_PTR(-ECHILD);
> +
>                 audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
>                 /* trailing slashes? */
>                 if (unlikely(nd->last.name[nd->last.len]))

Looks good, assuming Al is ok with the re-factoring.
Reviewed-by: Ian Kent <raven@themaw.net>

Ian
Al Viro Jan. 7, 2022, 5:52 a.m. UTC | #2
On Fri, Jan 07, 2022 at 07:46:10AM +0800, Ian Kent wrote:
> On Wed, 2022-01-05 at 13:02 -0500, Brian Foster wrote:
> > The unlazy sequence of an rcuwalk lookup occurs a bit earlier than
> > normal for O_CREAT lookups (i.e. in open_last_lookups()). The create
> > logic here historically invoked complete_walk(), which clears the
> > nd->root.mnt pointer when appropriate before the unlazy.  This
> > changed in commit 72287417abd1 ("open_last_lookups(): don't abuse
> > complete_walk() when all we want is unlazy"), which refactored the
> > create path to invoke unlazy_walk() and not consider nd->root.mnt.
> > 
> > This tweak negatively impacts performance on a concurrent
> > open(O_CREAT) workload to multiple independent mounts beneath the
> > root directory. This attributes to increased spinlock contention on
> > the root dentry via legitimize_root(), to the point where the
> > spinlock becomes the primary bottleneck over the directory inode
> > rwsem of the individual submounts. For example, the completion rate
> > of a 32k thread aim7 create/close benchmark that repeatedly passes
> > O_CREAT to open preexisting files drops from over 700k "jobs per
> > minute" to 30, increasing the overall test time from a few minutes
> > to over an hour.
> > 
> > A similar, more simplified test to create a set of opener tasks
> > across a set of submounts can demonstrate the problem more quickly.
> > For example, consider sets of 100 open/close tasks each running
> > against 64 independent filesystem mounts (i.e. 6400 tasks total),
> > with each task completing 10k iterations before it exits. On an
> > 80xcpu box running v5.16.0-rc2, this test completes in 50-55s. With
> > this patch applied, the same test completes in 10-15s.
> > 
> > This is not the most realistic workload in the world as it factors
> > out inode allocation in the filesystem. The contention can also be
> > avoided by more selective use of O_CREAT or via use of relative
> > pathnames. That said, this regression appears to be an unintentional
> > side effect of code cleanup and might be unexpected for users.
> > Restore original behavior prior to commit 72287417abd1 by factoring
> > the rcu logic from complete_walk() into a new helper and invoke that
> > from both places.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/namei.c | 37 +++++++++++++++++++++----------------
> >  1 file changed, 21 insertions(+), 16 deletions(-)
> > 
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 1f9d2187c765..b32fcbc99929 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -856,6 +856,22 @@ static inline int d_revalidate(struct dentry
> > *dentry, unsigned int flags)
> >                 return 1;
> >  }
> >  
> > +static inline bool complete_walk_rcu(struct nameidata *nd)
> > +{
> > +       if (nd->flags & LOOKUP_RCU) {
> > +               /*
> > +                * We don't want to zero nd->root for scoped-lookups
> > or
> > +                * externally-managed nd->root.
> > +                */
> > +               if (!(nd->state & ND_ROOT_PRESET))
> > +                       if (!(nd->flags & LOOKUP_IS_SCOPED))
> > +                               nd->root.mnt = NULL;
> > +               nd->flags &= ~LOOKUP_CACHED;
> > +               return try_to_unlazy(nd);
> > +       }
> > +       return true;
> > +}
> > +
> >  /**
> >   * complete_walk - successful completion of path walk
> >   * @nd:  pointer nameidata
> > @@ -871,18 +887,8 @@ static int complete_walk(struct nameidata *nd)
> >         struct dentry *dentry = nd->path.dentry;
> >         int status;
> >  
> > -       if (nd->flags & LOOKUP_RCU) {
> > -               /*
> > -                * We don't want to zero nd->root for scoped-lookups
> > or
> > -                * externally-managed nd->root.
> > -                */
> > -               if (!(nd->state & ND_ROOT_PRESET))
> > -                       if (!(nd->flags & LOOKUP_IS_SCOPED))
> > -                               nd->root.mnt = NULL;
> > -               nd->flags &= ~LOOKUP_CACHED;
> > -               if (!try_to_unlazy(nd))
> > -                       return -ECHILD;
> > -       }
> > +       if (!complete_walk_rcu(nd))
> > +               return -ECHILD;
> >  
> >         if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
> >                 /*
> > @@ -3325,10 +3331,9 @@ static const char *open_last_lookups(struct
> > nameidata *nd,
> >                 BUG_ON(nd->flags & LOOKUP_RCU);
> >         } else {
> >                 /* create side of things */
> > -               if (nd->flags & LOOKUP_RCU) {
> > -                       if (!try_to_unlazy(nd))
> > -                               return ERR_PTR(-ECHILD);
> > -               }
> > +               if (!complete_walk_rcu(nd))
> > +                       return ERR_PTR(-ECHILD);
> > +
> >                 audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
> >                 /* trailing slashes? */
> >                 if (unlikely(nd->last.name[nd->last.len]))
> 
> Looks good, assuming Al is ok with the re-factoring.
> Reviewed-by: Ian Kent <raven@themaw.net>

Ummm....  Mind resending that?  I'm still digging myself from under
the huge pile of mail, and this seems to have been lost in process...
Ian Kent Jan. 7, 2022, 7:04 a.m. UTC | #3
On Fri, 2022-01-07 at 05:52 +0000, Al Viro wrote:
> On Fri, Jan 07, 2022 at 07:46:10AM +0800, Ian Kent wrote:
> > On Wed, 2022-01-05 at 13:02 -0500, Brian Foster wrote:
> > > The unlazy sequence of an rcuwalk lookup occurs a bit earlier than
> > > normal for O_CREAT lookups (i.e. in open_last_lookups()). The
> > > create
> > > logic here historically invoked complete_walk(), which clears the
> > > nd->root.mnt pointer when appropriate before the unlazy.  This
> > > changed in commit 72287417abd1 ("open_last_lookups(): don't abuse
> > > complete_walk() when all we want is unlazy"), which refactored the
> > > create path to invoke unlazy_walk() and not consider nd->root.mnt.
> > > 
> > > This tweak negatively impacts performance on a concurrent
> > > open(O_CREAT) workload to multiple independent mounts beneath the
> > > root directory. This attributes to increased spinlock contention on
> > > the root dentry via legitimize_root(), to the point where the
> > > spinlock becomes the primary bottleneck over the directory inode
> > > rwsem of the individual submounts. For example, the completion rate
> > > of a 32k thread aim7 create/close benchmark that repeatedly passes
> > > O_CREAT to open preexisting files drops from over 700k "jobs per
> > > minute" to 30, increasing the overall test time from a few minutes
> > > to over an hour.
> > > 
> > > A similar, more simplified test to create a set of opener tasks
> > > across a set of submounts can demonstrate the problem more quickly.
> > > For example, consider sets of 100 open/close tasks each running
> > > against 64 independent filesystem mounts (i.e. 6400 tasks total),
> > > with each task completing 10k iterations before it exits. On an
> > > 80xcpu box running v5.16.0-rc2, this test completes in 50-55s. With
> > > this patch applied, the same test completes in 10-15s.
> > > 
> > > This is not the most realistic workload in the world as it factors
> > > out inode allocation in the filesystem. The contention can also be
> > > avoided by more selective use of O_CREAT or via use of relative
> > > pathnames. That said, this regression appears to be an
> > > unintentional
> > > side effect of code cleanup and might be unexpected for users.
> > > Restore original behavior prior to commit 72287417abd1 by factoring
> > > the rcu logic from complete_walk() into a new helper and invoke
> > > that
> > > from both places.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > >  fs/namei.c | 37 +++++++++++++++++++++----------------
> > >  1 file changed, 21 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/namei.c b/fs/namei.c
> > > index 1f9d2187c765..b32fcbc99929 100644
> > > --- a/fs/namei.c
> > > +++ b/fs/namei.c
> > > @@ -856,6 +856,22 @@ static inline int d_revalidate(struct dentry
> > > *dentry, unsigned int flags)
> > >                 return 1;
> > >  }
> > >  
> > > +static inline bool complete_walk_rcu(struct nameidata *nd)
> > > +{
> > > +       if (nd->flags & LOOKUP_RCU) {
> > > +               /*
> > > +                * We don't want to zero nd->root for scoped-
> > > lookups
> > > or
> > > +                * externally-managed nd->root.
> > > +                */
> > > +               if (!(nd->state & ND_ROOT_PRESET))
> > > +                       if (!(nd->flags & LOOKUP_IS_SCOPED))
> > > +                               nd->root.mnt = NULL;
> > > +               nd->flags &= ~LOOKUP_CACHED;
> > > +               return try_to_unlazy(nd);
> > > +       }
> > > +       return true;
> > > +}
> > > +
> > >  /**
> > >   * complete_walk - successful completion of path walk
> > >   * @nd:  pointer nameidata
> > > @@ -871,18 +887,8 @@ static int complete_walk(struct nameidata *nd)
> > >         struct dentry *dentry = nd->path.dentry;
> > >         int status;
> > >  
> > > -       if (nd->flags & LOOKUP_RCU) {
> > > -               /*
> > > -                * We don't want to zero nd->root for scoped-
> > > lookups
> > > or
> > > -                * externally-managed nd->root.
> > > -                */
> > > -               if (!(nd->state & ND_ROOT_PRESET))
> > > -                       if (!(nd->flags & LOOKUP_IS_SCOPED))
> > > -                               nd->root.mnt = NULL;
> > > -               nd->flags &= ~LOOKUP_CACHED;
> > > -               if (!try_to_unlazy(nd))
> > > -                       return -ECHILD;
> > > -       }
> > > +       if (!complete_walk_rcu(nd))
> > > +               return -ECHILD;
> > >  
> > >         if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
> > >                 /*
> > > @@ -3325,10 +3331,9 @@ static const char *open_last_lookups(struct
> > > nameidata *nd,
> > >                 BUG_ON(nd->flags & LOOKUP_RCU);
> > >         } else {
> > >                 /* create side of things */
> > > -               if (nd->flags & LOOKUP_RCU) {
> > > -                       if (!try_to_unlazy(nd))
> > > -                               return ERR_PTR(-ECHILD);
> > > -               }
> > > +               if (!complete_walk_rcu(nd))
> > > +                       return ERR_PTR(-ECHILD);
> > > +
> > >                 audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
> > >                 /* trailing slashes? */
> > >                 if (unlikely(nd->last.name[nd->last.len]))
> > 
> > Looks good, assuming Al is ok with the re-factoring.
> > Reviewed-by: Ian Kent <raven@themaw.net>
> 
> Ummm....  Mind resending that?  I'm still digging myself from under
> the huge pile of mail, and this seems to have been lost in process...

Brain wrote and sent the patch, I'm sure he'll resent it.

The re-factor I mention is just pulling out the needed bits from
complete_walk() rather than open coding it or reverting the original
change, commit 72287417abd1 ("open_last_lookups(): don't abuse
complete_walk() when all we want is unlazy").

Ian
Al Viro Jan. 7, 2022, 7:10 a.m. UTC | #4
On Fri, Jan 07, 2022 at 05:52:27AM +0000, Al Viro wrote:

> > Looks good, assuming Al is ok with the re-factoring.
> > Reviewed-by: Ian Kent <raven@themaw.net>
> 
> Ummm....  Mind resending that?  I'm still digging myself from under
> the huge pile of mail, and this seems to have been lost in process...

Non-obvious part is that current code only does this forgetting
the root when we are certain that we won't look at it later in
pathwalk.  IOW, it's guaranteed to be the same through the entire
thing.  This patch changes that; the final component may very well
be e.g. an absolute symlink.  We won't know that until we unlazy,
so we can't make forgetting conditional upon that.

I _think_ it's not going to lead to any problems, but I'll need to
take a good look at the entire thing after I get some sleep -
I'm about to fall down right now.

Other problems here (aside of whitespace damage - was that a
cut'n'paste of some kind?  Looks like 8859-1 NBSP for each
leading space...) are
	* misleading name of the new helper - it sounds like
"non-RCU side of complete_walk()" and that's not what it does
	* LOOKUP_CACHED needs to be mentioned in commit message
(it's incompatible with O_CREAT, so it couldn't occur on that
codepath - the transformation is an equivalent one, but that
ought to be mentioned)
	* the change I mentioned above needs to be in commit
message - this one is a lot more subtle.

Anyway, I'll look into that tomorrow - too sleepy right now
to do any serious analysis.
Al Viro Jan. 7, 2022, 7:22 a.m. UTC | #5
On Fri, Jan 07, 2022 at 03:04:46PM +0800, Ian Kent wrote:

> > Ummm....  Mind resending that?  I'm still digging myself from under
> > the huge pile of mail, and this seems to have been lost in process...
> 
> Brain wrote and sent the patch, I'm sure he'll resent it.
> 
> The re-factor I mention is just pulling out the needed bits from
> complete_walk() rather than open coding it or reverting the original
> change, commit 72287417abd1 ("open_last_lookups(): don't abuse
> complete_walk() when all we want is unlazy").

Good grief...  Ian, please fix your MUA - something's crapping those
NBSP (this time - UTF8 ones) into your replies...

Anyway, the problem is that it's not an equivalent transformation
and I'm not sure at the moment that there won't be interplay with
the things added during the last couple of years.

Tomorrow...
Brian Foster Jan. 7, 2022, 5:32 p.m. UTC | #6
On Fri, Jan 07, 2022 at 07:10:59AM +0000, Al Viro wrote:
> On Fri, Jan 07, 2022 at 05:52:27AM +0000, Al Viro wrote:
> 
> > > Looks good, assuming Al is ok with the re-factoring.
> > > Reviewed-by: Ian Kent <raven@themaw.net>
> > 
> > Ummm....  Mind resending that?  I'm still digging myself from under
> > the huge pile of mail, and this seems to have been lost in process...
> 
> Non-obvious part is that current code only does this forgetting
> the root when we are certain that we won't look at it later in
> pathwalk.  IOW, it's guaranteed to be the same through the entire
> thing.  This patch changes that; the final component may very well
> be e.g. an absolute symlink.  We won't know that until we unlazy,
> so we can't make forgetting conditional upon that.
> 
> I _think_ it's not going to lead to any problems, but I'll need to
> take a good look at the entire thing after I get some sleep -
> I'm about to fall down right now.
> 

Heh, Ok. I think I understand what you're getting at, but I'd have to
stare at the code much more to grok the details. Let me know if you
think the logic and/or commit log needs to change wrt to this and I'll
give it a shot.

> Other problems here (aside of whitespace damage - was that a
> cut'n'paste of some kind?  Looks like 8859-1 NBSP for each
> leading space...) are

Hmm.. I don't see any whitespace damage, even if I pull the patch back
from the mailing list into my tree..?

> 	* misleading name of the new helper - it sounds like
> "non-RCU side of complete_walk()" and that's not what it does

The intent was the opposite, of course. :P I'm not sure how you infer
the above from _rcu(), but I'll name the helper whatever. Suggestions?

> 	* LOOKUP_CACHED needs to be mentioned in commit message
> (it's incompatible with O_CREAT, so it couldn't occur on that
> codepath - the transformation is an equivalent one, but that
> ought to be mentioned)

Ok. I can add a "for non-create path only" comment to that effect in the
helper as well if useful.

> 	* the change I mentioned above needs to be in commit
> message - this one is a lot more subtle.
> 

Ack.

> Anyway, I'll look into that tomorrow - too sleepy right now
> to do any serious analysis.
> 

Ack. Thanks for the first pass.

Brian
Al Viro Jan. 7, 2022, 5:51 p.m. UTC | #7
On Fri, Jan 07, 2022 at 12:32:17PM -0500, Brian Foster wrote:

> > Other problems here (aside of whitespace damage - was that a
> > cut'n'paste of some kind?  Looks like 8859-1 NBSP for each
> > leading space...) are
> 
> Hmm.. I don't see any whitespace damage, even if I pull the patch back
> from the mailing list into my tree..?

That had occured in Ian's reply, almost certainly.  Looks like whatever
he's using for MUA (Evolution?) is misconfigured into doing whitespace
damage - his next mail (in utf8, rather than 8859-1) had a scattering of
U+00A0 in it...  Frankly, I'd never seen a decent GUI MUA, so I've no
real experience with that thing and no suggestions on how to fix that.

> > 	* misleading name of the new helper - it sounds like
> > "non-RCU side of complete_walk()" and that's not what it does
> 
> The intent was the opposite, of course. :P I'm not sure how you infer
> the above from _rcu(), but I'll name the helper whatever. Suggestions?

s/non-// in the above (I really had been half-asleep).  What I'm
saying is that this name invites an assumption that in RCU case
complete_walk() is equivalent to it.  Which is wrong - that's
what complete_walk() does as the first step if it needs to get
out of RCU mode.
Brian Foster Jan. 7, 2022, 6:42 p.m. UTC | #8
On Fri, Jan 07, 2022 at 05:51:52PM +0000, Al Viro wrote:
> On Fri, Jan 07, 2022 at 12:32:17PM -0500, Brian Foster wrote:
> 
> > > Other problems here (aside of whitespace damage - was that a
> > > cut'n'paste of some kind?  Looks like 8859-1 NBSP for each
> > > leading space...) are
> > 
> > Hmm.. I don't see any whitespace damage, even if I pull the patch back
> > from the mailing list into my tree..?
> 
> That had occured in Ian's reply, almost certainly.  Looks like whatever
> he's using for MUA (Evolution?) is misconfigured into doing whitespace
> damage - his next mail (in utf8, rather than 8859-1) had a scattering of
> U+00A0 in it...  Frankly, I'd never seen a decent GUI MUA, so I've no
> real experience with that thing and no suggestions on how to fix that.
> 
> > > 	* misleading name of the new helper - it sounds like
> > > "non-RCU side of complete_walk()" and that's not what it does
> > 
> > The intent was the opposite, of course. :P I'm not sure how you infer
> > the above from _rcu(), but I'll name the helper whatever. Suggestions?
> 
> s/non-// in the above (I really had been half-asleep).  What I'm
> saying is that this name invites an assumption that in RCU case
> complete_walk() is equivalent to it.  Which is wrong - that's
> what complete_walk() does as the first step if it needs to get
> out of RCU mode.
> 

Ah, Ok. I see your point. I suppose we could call it something like
__complete_walk() or __complete_walk_rcu() just to assert that it's a
subcomponent of complete_walk(). Alternatively, we could leave the
complete_walk() bits alone and create a slightly duplicate helper for
the create path (minus LOOKUP_CACHED), or just open code those bits in
the create path without a helper, or perhaps create a lower level helper
along the lines of:

static inline bool nd_reset_root_and_unlazy(struct nameidata *nd)
{
        /*      
         * We don't want to zero nd->root for scoped-lookups or
         * externally-managed nd->root.
         */
        if (!(nd->state & ND_ROOT_PRESET))
                if (!(nd->flags & LOOKUP_IS_SCOPED))
                        nd->root.mnt = NULL; 
        return try_to_unlazy(nd);
}

... and then just leave the LOOKUP_RCU check and LOOKUP_CACHED handling
open coded in the callers as they are now. Hm?

Brian
Ian Kent Jan. 8, 2022, 3:26 a.m. UTC | #9
On Fri, 2022-01-07 at 17:51 +0000, Al Viro wrote:
> On Fri, Jan 07, 2022 at 12:32:17PM -0500, Brian Foster wrote:
> 
> > > Other problems here (aside of whitespace damage - was that a
> > > cut'n'paste of some kind?  Looks like 8859-1 NBSP for each
> > > leading space...) are
> > 
> > Hmm.. I don't see any whitespace damage, even if I pull the patch
> > back
> > from the mailing list into my tree..?
> 
> That had occured in Ian's reply, almost certainly.  Looks like
> whatever
> he's using for MUA (Evolution?) is misconfigured into doing
> whitespace
> damage - his next mail (in utf8, rather than 8859-1) had a scattering
> of
> U+00A0 in it...  Frankly, I'd never seen a decent GUI MUA, so I've no
> real experience with that thing and no suggestions on how to fix
> that.

Yes, your right, I've changed that.
I'll play around some more to see if I can verify things are working,
mutt should be able to tell me that ...

Ian
> 
> > >         * misleading name of the new helper - it sounds like
> > > "non-RCU side of complete_walk()" and that's not what it does
> > 
> > The intent was the opposite, of course. :P I'm not sure how you
> > infer
> > the above from _rcu(), but I'll name the helper whatever.
> > Suggestions?
> 
> s/non-// in the above (I really had been half-asleep).  What I'm
> saying is that this name invites an assumption that in RCU case
> complete_walk() is equivalent to it.  Which is wrong - that's
> what complete_walk() does as the first step if it needs to get
> out of RCU mode.
diff mbox series

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 1f9d2187c765..b32fcbc99929 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -856,6 +856,22 @@  static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
 		return 1;
 }
 
+static inline bool complete_walk_rcu(struct nameidata *nd)
+{
+	if (nd->flags & LOOKUP_RCU) {
+		/*
+		 * We don't want to zero nd->root for scoped-lookups or
+		 * externally-managed nd->root.
+		 */
+		if (!(nd->state & ND_ROOT_PRESET))
+			if (!(nd->flags & LOOKUP_IS_SCOPED))
+				nd->root.mnt = NULL;
+		nd->flags &= ~LOOKUP_CACHED;
+		return try_to_unlazy(nd);
+	}
+	return true;
+}
+
 /**
  * complete_walk - successful completion of path walk
  * @nd:  pointer nameidata
@@ -871,18 +887,8 @@  static int complete_walk(struct nameidata *nd)
 	struct dentry *dentry = nd->path.dentry;
 	int status;
 
-	if (nd->flags & LOOKUP_RCU) {
-		/*
-		 * We don't want to zero nd->root for scoped-lookups or
-		 * externally-managed nd->root.
-		 */
-		if (!(nd->state & ND_ROOT_PRESET))
-			if (!(nd->flags & LOOKUP_IS_SCOPED))
-				nd->root.mnt = NULL;
-		nd->flags &= ~LOOKUP_CACHED;
-		if (!try_to_unlazy(nd))
-			return -ECHILD;
-	}
+	if (!complete_walk_rcu(nd))
+		return -ECHILD;
 
 	if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
 		/*
@@ -3325,10 +3331,9 @@  static const char *open_last_lookups(struct nameidata *nd,
 		BUG_ON(nd->flags & LOOKUP_RCU);
 	} else {
 		/* create side of things */
-		if (nd->flags & LOOKUP_RCU) {
-			if (!try_to_unlazy(nd))
-				return ERR_PTR(-ECHILD);
-		}
+		if (!complete_walk_rcu(nd))
+			return ERR_PTR(-ECHILD);
+
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
 		/* trailing slashes? */
 		if (unlikely(nd->last.name[nd->last.len]))