diff mbox series

[v3] fs: try an opportunistic lookup for O_CREAT opens too

Message ID 20240807-openfast-v3-1-040d132d2559@kernel.org (mailing list archive)
State New
Headers show
Series [v3] fs: try an opportunistic lookup for O_CREAT opens too | expand

Commit Message

Jeff Layton Aug. 7, 2024, 12:10 p.m. UTC
Today, when opening a file we'll typically do a fast lookup, but if
O_CREAT is set, the kernel always takes the exclusive inode lock. I
assume this was done with the expectation that O_CREAT means that we
always expect to do the create, but that's often not the case. Many
programs set O_CREAT even in scenarios where the file already exists.

This patch rearranges the pathwalk-for-open code to also attempt a
fast_lookup in certain O_CREAT cases. If a positive dentry is found, the
inode_lock can be avoided altogether, and if auditing isn't enabled, it
can stay in rcuwalk mode for the last step_into.

One notable exception that is hopefully temporary: if we're doing an
rcuwalk and auditing is enabled, skip the lookup_fast. Legitimizing the
dentry in that case is more expensive than taking the i_rwsem for now.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Here's a revised patch that does a fast_lookup in the O_CREAT codepath
too. The main difference here is that if a positive dentry is found and
audit_dummy_context is true, then we keep the walk lazy for the last
component, which avoids having to take any locks on the parent (just
like with non-O_CREAT opens).

Mateusz wrote a will-it-scale test that does an O_CREAT open and close in
the same directory repeatedly. Running that in 70 different processes:

    v6.10:		  754565
    v6.10+patch:	25747851

...which is roughly a 34x speedup. I also ran the unlink1 test in single
process mode to try and gauge how bad the performance impact would be in
the case where we always have to search, not find anything and do the
create:

    v6.10:		200106
    v6.10+patch:	199188

~0.4% performance hit in that test. I'm not sure that's statistically
significant, but we should keep an eye out for slowdowns in these sorts
of workloads if we decide to take this.
---
Changes in v3:
- Check for IS_ERR in lookup_fast result
- Future-proof open_last_lookups to handle case where lookup_fast_for_open
  returns a positive dentry while auditing is enabled
- Link to v2: https://lore.kernel.org/r/20240806-openfast-v2-1-42da45981811@kernel.org

Changes in v2:
- drop the lockref patch since Mateusz is working on a better approach
- add trailing_slashes helper function
- add a lookup_fast_for_open helper function
- make lookup_fast_for_open skip the lookup if auditing is enabled
- if we find a positive dentry and auditing is disabled, don't unlazy
- Link to v1: https://lore.kernel.org/r/20240802-openfast-v1-0-a1cff2a33063@kernel.org
---
 fs/namei.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 64 insertions(+), 10 deletions(-)


---
base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
change-id: 20240723-openfast-ac49a7b6ade2

Best regards,

Comments

Jan Kara Aug. 7, 2024, 1:20 p.m. UTC | #1
On Wed 07-08-24 08:10:27, Jeff Layton wrote:
> Today, when opening a file we'll typically do a fast lookup, but if
> O_CREAT is set, the kernel always takes the exclusive inode lock. I
> assume this was done with the expectation that O_CREAT means that we
> always expect to do the create, but that's often not the case. Many
> programs set O_CREAT even in scenarios where the file already exists.
> 
> This patch rearranges the pathwalk-for-open code to also attempt a
> fast_lookup in certain O_CREAT cases. If a positive dentry is found, the
> inode_lock can be avoided altogether, and if auditing isn't enabled, it
> can stay in rcuwalk mode for the last step_into.
> 
> One notable exception that is hopefully temporary: if we're doing an
> rcuwalk and auditing is enabled, skip the lookup_fast. Legitimizing the
> dentry in that case is more expensive than taking the i_rwsem for now.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

I'm not very familiar with the path lookup code but the patch looks correct
to me and the win is nice. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> Here's a revised patch that does a fast_lookup in the O_CREAT codepath
> too. The main difference here is that if a positive dentry is found and
> audit_dummy_context is true, then we keep the walk lazy for the last
> component, which avoids having to take any locks on the parent (just
> like with non-O_CREAT opens).
> 
> Mateusz wrote a will-it-scale test that does an O_CREAT open and close in
> the same directory repeatedly. Running that in 70 different processes:
> 
>     v6.10:		  754565
>     v6.10+patch:	25747851
> 
> ...which is roughly a 34x speedup. I also ran the unlink1 test in single
> process mode to try and gauge how bad the performance impact would be in
> the case where we always have to search, not find anything and do the
> create:
> 
>     v6.10:		200106
>     v6.10+patch:	199188
> 
> ~0.4% performance hit in that test. I'm not sure that's statistically
> significant, but we should keep an eye out for slowdowns in these sorts
> of workloads if we decide to take this.
> ---
> Changes in v3:
> - Check for IS_ERR in lookup_fast result
> - Future-proof open_last_lookups to handle case where lookup_fast_for_open
>   returns a positive dentry while auditing is enabled
> - Link to v2: https://lore.kernel.org/r/20240806-openfast-v2-1-42da45981811@kernel.org
> 
> Changes in v2:
> - drop the lockref patch since Mateusz is working on a better approach
> - add trailing_slashes helper function
> - add a lookup_fast_for_open helper function
> - make lookup_fast_for_open skip the lookup if auditing is enabled
> - if we find a positive dentry and auditing is disabled, don't unlazy
> - Link to v1: https://lore.kernel.org/r/20240802-openfast-v1-0-a1cff2a33063@kernel.org
> ---
>  fs/namei.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 64 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 1e05a0f3f04d..7894fafa8e71 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3518,6 +3518,49 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
>  	return ERR_PTR(error);
>  }
>  
> +static inline bool trailing_slashes(struct nameidata *nd)
> +{
> +	return (bool)nd->last.name[nd->last.len];
> +}
> +
> +static struct dentry *lookup_fast_for_open(struct nameidata *nd, int open_flag)
> +{
> +	struct dentry *dentry;
> +
> +	if (open_flag & O_CREAT) {
> +		/* Don't bother on an O_EXCL create */
> +		if (open_flag & O_EXCL)
> +			return NULL;
> +
> +		/*
> +		 * FIXME: If auditing is enabled, then we'll have to unlazy to
> +		 * use the dentry. For now, don't do this, since it shifts
> +		 * contention from parent's i_rwsem to its d_lockref spinlock.
> +		 * Reconsider this once dentry refcounting handles heavy
> +		 * contention better.
> +		 */
> +		if ((nd->flags & LOOKUP_RCU) && !audit_dummy_context())
> +			return NULL;
> +	}
> +
> +	if (trailing_slashes(nd))
> +		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
> +
> +	dentry = lookup_fast(nd);
> +	if (IS_ERR_OR_NULL(dentry))
> +		return dentry;
> +
> +	if (open_flag & O_CREAT) {
> +		/* Discard negative dentries. Need inode_lock to do the create */
> +		if (!dentry->d_inode) {
> +			if (!(nd->flags & LOOKUP_RCU))
> +				dput(dentry);
> +			dentry = NULL;
> +		}
> +	}
> +	return dentry;
> +}
> +
>  static const char *open_last_lookups(struct nameidata *nd,
>  		   struct file *file, const struct open_flags *op)
>  {
> @@ -3535,28 +3578,39 @@ static const char *open_last_lookups(struct nameidata *nd,
>  		return handle_dots(nd, nd->last_type);
>  	}
>  
> +	/* We _can_ be in RCU mode here */
> +	dentry = lookup_fast_for_open(nd, open_flag);
> +	if (IS_ERR(dentry))
> +		return ERR_CAST(dentry);
> +
>  	if (!(open_flag & O_CREAT)) {
> -		if (nd->last.name[nd->last.len])
> -			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
> -		/* we _can_ be in RCU mode here */
> -		dentry = lookup_fast(nd);
> -		if (IS_ERR(dentry))
> -			return ERR_CAST(dentry);
>  		if (likely(dentry))
>  			goto finish_lookup;
>  
>  		if (WARN_ON_ONCE(nd->flags & LOOKUP_RCU))
>  			return ERR_PTR(-ECHILD);
>  	} else {
> -		/* create side of things */
>  		if (nd->flags & LOOKUP_RCU) {
> -			if (!try_to_unlazy(nd))
> +			bool unlazied;
> +
> +			/* can stay in rcuwalk if not auditing */
> +			if (dentry && audit_dummy_context()) {
> +				if (trailing_slashes(nd))
> +					return ERR_PTR(-EISDIR);
> +				goto finish_lookup;
> +			}
> +			unlazied = dentry ? try_to_unlazy_next(nd, dentry) :
> +					    try_to_unlazy(nd);
> +			if (!unlazied)
>  				return ERR_PTR(-ECHILD);
>  		}
>  		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
> -		/* trailing slashes? */
> -		if (unlikely(nd->last.name[nd->last.len]))
> +		if (trailing_slashes(nd)) {
> +			dput(dentry);
>  			return ERR_PTR(-EISDIR);
> +		}
> +		if (dentry)
> +			goto finish_lookup;
>  	}
>  
>  	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
> 
> ---
> base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
> change-id: 20240723-openfast-ac49a7b6ade2
> 
> Best regards,
> -- 
> Jeff Layton <jlayton@kernel.org>
>
Josef Bacik Aug. 7, 2024, 2:24 p.m. UTC | #2
On Wed, Aug 07, 2024 at 08:10:27AM -0400, Jeff Layton wrote:
> Today, when opening a file we'll typically do a fast lookup, but if
> O_CREAT is set, the kernel always takes the exclusive inode lock. I
> assume this was done with the expectation that O_CREAT means that we
> always expect to do the create, but that's often not the case. Many
> programs set O_CREAT even in scenarios where the file already exists.
> 
> This patch rearranges the pathwalk-for-open code to also attempt a
> fast_lookup in certain O_CREAT cases. If a positive dentry is found, the
> inode_lock can be avoided altogether, and if auditing isn't enabled, it
> can stay in rcuwalk mode for the last step_into.
> 
> One notable exception that is hopefully temporary: if we're doing an
> rcuwalk and auditing is enabled, skip the lookup_fast. Legitimizing the
> dentry in that case is more expensive than taking the i_rwsem for now.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
Christian Brauner Aug. 7, 2024, 2:28 p.m. UTC | #3
Ah, I was too late. Jeff, see my reply to your earlier v2 and let me
know how feasible this is in your opinion.
diff mbox series

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 1e05a0f3f04d..7894fafa8e71 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3518,6 +3518,49 @@  static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	return ERR_PTR(error);
 }
 
+static inline bool trailing_slashes(struct nameidata *nd)
+{
+	return (bool)nd->last.name[nd->last.len];
+}
+
+static struct dentry *lookup_fast_for_open(struct nameidata *nd, int open_flag)
+{
+	struct dentry *dentry;
+
+	if (open_flag & O_CREAT) {
+		/* Don't bother on an O_EXCL create */
+		if (open_flag & O_EXCL)
+			return NULL;
+
+		/*
+		 * FIXME: If auditing is enabled, then we'll have to unlazy to
+		 * use the dentry. For now, don't do this, since it shifts
+		 * contention from parent's i_rwsem to its d_lockref spinlock.
+		 * Reconsider this once dentry refcounting handles heavy
+		 * contention better.
+		 */
+		if ((nd->flags & LOOKUP_RCU) && !audit_dummy_context())
+			return NULL;
+	}
+
+	if (trailing_slashes(nd))
+		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
+
+	dentry = lookup_fast(nd);
+	if (IS_ERR_OR_NULL(dentry))
+		return dentry;
+
+	if (open_flag & O_CREAT) {
+		/* Discard negative dentries. Need inode_lock to do the create */
+		if (!dentry->d_inode) {
+			if (!(nd->flags & LOOKUP_RCU))
+				dput(dentry);
+			dentry = NULL;
+		}
+	}
+	return dentry;
+}
+
 static const char *open_last_lookups(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op)
 {
@@ -3535,28 +3578,39 @@  static const char *open_last_lookups(struct nameidata *nd,
 		return handle_dots(nd, nd->last_type);
 	}
 
+	/* We _can_ be in RCU mode here */
+	dentry = lookup_fast_for_open(nd, open_flag);
+	if (IS_ERR(dentry))
+		return ERR_CAST(dentry);
+
 	if (!(open_flag & O_CREAT)) {
-		if (nd->last.name[nd->last.len])
-			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
-		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd);
-		if (IS_ERR(dentry))
-			return ERR_CAST(dentry);
 		if (likely(dentry))
 			goto finish_lookup;
 
 		if (WARN_ON_ONCE(nd->flags & LOOKUP_RCU))
 			return ERR_PTR(-ECHILD);
 	} else {
-		/* create side of things */
 		if (nd->flags & LOOKUP_RCU) {
-			if (!try_to_unlazy(nd))
+			bool unlazied;
+
+			/* can stay in rcuwalk if not auditing */
+			if (dentry && audit_dummy_context()) {
+				if (trailing_slashes(nd))
+					return ERR_PTR(-EISDIR);
+				goto finish_lookup;
+			}
+			unlazied = dentry ? try_to_unlazy_next(nd, dentry) :
+					    try_to_unlazy(nd);
+			if (!unlazied)
 				return ERR_PTR(-ECHILD);
 		}
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
-		/* trailing slashes? */
-		if (unlikely(nd->last.name[nd->last.len]))
+		if (trailing_slashes(nd)) {
+			dput(dentry);
 			return ERR_PTR(-EISDIR);
+		}
+		if (dentry)
+			goto finish_lookup;
 	}
 
 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {