diff mbox series

[5/6] kernfs: stay in rcu-walk mode if possible

Message ID 160862330474.291330.11664503360150456908.stgit@mickey.themaw.net (mailing list archive)
State New, archived
Headers show
Series kernfs: proposed locking and concurrency improvement | expand

Commit Message

Ian Kent Dec. 22, 2020, 7:48 a.m. UTC
During path walks in sysfs (kernfs) needing to take a reference to
a mount doesn't happen often since the walk won't be crossing mount
point boundaries.

Also while staying in rcu-walk mode where possible wouldn't normally
give much improvement.

But when there are many concurrent path walks and there is high d_lock
contention dget() will often need to resort to taking a spin lock to
get the reference. And that could happen each time the reference is
passed from component to component.

So, in the high contention case, it will contribute to the contention.

Therefore staying in rcu-walk mode when possible will reduce contention.

Signed-off-by: Ian Kent <raven@themaw.net>
---
 fs/kernfs/dir.c |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

Comments

Fox Chen Feb. 5, 2021, 8:23 a.m. UTC | #1
Hi Ian,

On Tue, Dec 22, 2020 at 3:48 PM Ian Kent <raven@themaw.net> wrote:
>
> During path walks in sysfs (kernfs) needing to take a reference to
> a mount doesn't happen often since the walk won't be crossing mount
> point boundaries.
>
> Also while staying in rcu-walk mode where possible wouldn't normally
> give much improvement.
>
> But when there are many concurrent path walks and there is high d_lock
> contention dget() will often need to resort to taking a spin lock to
> get the reference. And that could happen each time the reference is
> passed from component to component.
>
> So, in the high contention case, it will contribute to the contention.
>
> Therefore staying in rcu-walk mode when possible will reduce contention.
>
> Signed-off-by: Ian Kent <raven@themaw.net>
> ---
>  fs/kernfs/dir.c |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index fdeae2c6e7ba..50c5c8c886af 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -1048,8 +1048,54 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
>         struct kernfs_node *parent;
>         struct kernfs_node *kn;
>
> -       if (flags & LOOKUP_RCU)
> +       if (flags & LOOKUP_RCU) {
> +               parent = kernfs_dentry_node(dentry->d_parent);
> +
> +               /* Directory node changed, no, then don't search? */
> +               if (!kernfs_dir_changed(parent, dentry))
> +                       return 1;
> +
> +               kn = kernfs_dentry_node(dentry);
> +               if (!kn) {
> +                       /* Negative hashed dentry, tell the VFS to switch to
> +                        * ref-walk mode and call us again so that node
> +                        * existence can be checked.
> +                        */
> +                       if (!d_unhashed(dentry))
> +                               return -ECHILD;
> +
> +                       /* Negative unhashed dentry, this shouldn't happen
> +                        * because this case occurs in ref-walk mode after
> +                        * dentry allocation which is followed by a call
> +                        * to ->loopup(). But if it does happen the dentry
> +                        * is surely invalid.
> +                        */
> +                       return 0;
> +               }
> +
> +               /* Since the dentry is positive (we got the kernfs node) a
> +                * kernfs node reference was held at the time. Now if the
> +                * dentry reference count is still greater than 0 it's still
> +                * positive so take a reference to the node to perform an
> +                * active check.
> +                */
> +               if (d_count(dentry) <= 0 || !atomic_inc_not_zero(&kn->count))
> +                       return -ECHILD;
> +
> +               /* The kernfs node reference count was greater than 0, if
> +                * it's active continue in rcu-walk mode.
> +                */
> +               if (kernfs_active_read(kn)) {
We are in RCU-walk mode, kernfs_rwsem should not be held, however,
kernfs_active_read will assert the readlock is held. I believe it
should be kernfs_active(kn) here. Am I wrong??

> +                       kernfs_put(kn);
> +                       return 1;
> +               }
> +
> +               /* Otherwise, just tell the VFS to switch to ref-walk mode
> +                * and call us again so the kernfs node can be validated.
> +                */
> +               kernfs_put(kn);
>                 return -ECHILD;
> +       }
>
>         down_read(&kernfs_rwsem);
>
>
>

thanks,
fox
Ian Kent Feb. 5, 2021, 12:10 p.m. UTC | #2
On Fri, 2021-02-05 at 16:23 +0800, Fox Chen wrote:
> Hi Ian,
> 
> On Tue, Dec 22, 2020 at 3:48 PM Ian Kent <raven@themaw.net> wrote:
> > During path walks in sysfs (kernfs) needing to take a reference to
> > a mount doesn't happen often since the walk won't be crossing mount
> > point boundaries.
> > 
> > Also while staying in rcu-walk mode where possible wouldn't
> > normally
> > give much improvement.
> > 
> > But when there are many concurrent path walks and there is high
> > d_lock
> > contention dget() will often need to resort to taking a spin lock
> > to
> > get the reference. And that could happen each time the reference is
> > passed from component to component.
> > 
> > So, in the high contention case, it will contribute to the
> > contention.
> > 
> > Therefore staying in rcu-walk mode when possible will reduce
> > contention.
> > 
> > Signed-off-by: Ian Kent <raven@themaw.net>
> > ---
> >  fs/kernfs/dir.c |   48
> > +++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 47 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > index fdeae2c6e7ba..50c5c8c886af 100644
> > --- a/fs/kernfs/dir.c
> > +++ b/fs/kernfs/dir.c
> > @@ -1048,8 +1048,54 @@ static int kernfs_dop_revalidate(struct
> > dentry *dentry, unsigned int flags)
> >         struct kernfs_node *parent;
> >         struct kernfs_node *kn;
> > 
> > -       if (flags & LOOKUP_RCU)
> > +       if (flags & LOOKUP_RCU) {
> > +               parent = kernfs_dentry_node(dentry->d_parent);
> > +
> > +               /* Directory node changed, no, then don't search?
> > */
> > +               if (!kernfs_dir_changed(parent, dentry))
> > +                       return 1;
> > +
> > +               kn = kernfs_dentry_node(dentry);
> > +               if (!kn) {
> > +                       /* Negative hashed dentry, tell the VFS to
> > switch to
> > +                        * ref-walk mode and call us again so that
> > node
> > +                        * existence can be checked.
> > +                        */
> > +                       if (!d_unhashed(dentry))
> > +                               return -ECHILD;
> > +
> > +                       /* Negative unhashed dentry, this shouldn't
> > happen
> > +                        * because this case occurs in ref-walk
> > mode after
> > +                        * dentry allocation which is followed by a
> > call
> > +                        * to ->loopup(). But if it does happen the
> > dentry
> > +                        * is surely invalid.
> > +                        */
> > +                       return 0;
> > +               }
> > +
> > +               /* Since the dentry is positive (we got the kernfs
> > node) a
> > +                * kernfs node reference was held at the time. Now
> > if the
> > +                * dentry reference count is still greater than 0
> > it's still
> > +                * positive so take a reference to the node to
> > perform an
> > +                * active check.
> > +                */
> > +               if (d_count(dentry) <= 0 ||
> > !atomic_inc_not_zero(&kn->count))
> > +                       return -ECHILD;
> > +
> > +               /* The kernfs node reference count was greater than
> > 0, if
> > +                * it's active continue in rcu-walk mode.
> > +                */
> > +               if (kernfs_active_read(kn)) {
> We are in RCU-walk mode, kernfs_rwsem should not be held, however,
> kernfs_active_read will assert the readlock is held. I believe it
> should be kernfs_active(kn) here. Am I wrong??

No I think you are correct.
I'll check it and fix it.

> 
> > +                       kernfs_put(kn);
> > +                       return 1;
> > +               }
> > +
> > +               /* Otherwise, just tell the VFS to switch to ref-
> > walk mode
> > +                * and call us again so the kernfs node can be
> > validated.
> > +                */
> > +               kernfs_put(kn);
> >                 return -ECHILD;
> > +       }
> > 
> >         down_read(&kernfs_rwsem);
> > 
> > 
> > 
> 
> thanks,
> fox
diff mbox series

Patch

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index fdeae2c6e7ba..50c5c8c886af 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1048,8 +1048,54 @@  static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 	struct kernfs_node *parent;
 	struct kernfs_node *kn;
 
-	if (flags & LOOKUP_RCU)
+	if (flags & LOOKUP_RCU) {
+		parent = kernfs_dentry_node(dentry->d_parent);
+
+		/* Directory node changed, no, then don't search? */
+		if (!kernfs_dir_changed(parent, dentry))
+			return 1;
+
+		kn = kernfs_dentry_node(dentry);
+		if (!kn) {
+			/* Negative hashed dentry, tell the VFS to switch to
+			 * ref-walk mode and call us again so that node
+			 * existence can be checked.
+			 */
+			if (!d_unhashed(dentry))
+				return -ECHILD;
+
+			/* Negative unhashed dentry, this shouldn't happen
+			 * because this case occurs in ref-walk mode after
+			 * dentry allocation which is followed by a call
+			 * to ->loopup(). But if it does happen the dentry
+			 * is surely invalid.
+			 */
+			return 0;
+		}
+
+		/* Since the dentry is positive (we got the kernfs node) a
+		 * kernfs node reference was held at the time. Now if the
+		 * dentry reference count is still greater than 0 it's still
+		 * positive so take a reference to the node to perform an
+		 * active check.
+		 */
+		if (d_count(dentry) <= 0 || !atomic_inc_not_zero(&kn->count))
+			return -ECHILD;
+
+		/* The kernfs node reference count was greater than 0, if
+		 * it's active continue in rcu-walk mode.
+		 */
+		if (kernfs_active_read(kn)) {
+			kernfs_put(kn);
+			return 1;
+		}
+
+		/* Otherwise, just tell the VFS to switch to ref-walk mode
+		 * and call us again so the kernfs node can be validated.
+		 */
+		kernfs_put(kn);
 		return -ECHILD;
+	}
 
 	down_read(&kernfs_rwsem);