diff mbox series

[v2] vfs: move dentry shrinking outside the inode lock in 'rmdir()'

Message ID 20240511200240.6354-2-torvalds@linux-foundation.org (mailing list archive)
State New
Headers show
Series [v2] vfs: move dentry shrinking outside the inode lock in 'rmdir()' | expand

Commit Message

Linus Torvalds May 11, 2024, 8:02 p.m. UTC
Yafang Shao reports that he has seen loads that generate billions of
negative dentries in a directory, which then when the directory is
removed causes excessive latencies for other users because the dentry
shrinking is done under the directory inode lock.

There seems to be no actual reason for holding the inode lock any more
by the time we get rid of the now uninteresting negative dentries, and
it's an effect of the calling convention.

Split the 'vfs_rmdir()' function into two separate phases:

 - 'vfs_rmdir_raw()' does the actual main rmdir heavy lifting

 - 'vfs_rmdir_cleanup()' needs to be run by the caller after a
   successful raw call, after the caller has dropped the inode locks.

We leave the 'vfs_rmdir()' function around, since it has multiple
callers, and only convert the main system call path to the new two-phase
model.  The other uses will be left as an exercise for the reader for
when people find they care.

[ Side note: I think the 'dget()/dput()' pair in vfs_rmdir_raw() is
  superfluous, since callers always have to have a dentry reference over
  the call anyway. That's a separate issue.    - Linus ]

Reported-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/all/20240511022729.35144-1-laoar.shao@gmail.com/
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Waiman Long <longman@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Second version - this time doing the dentry pruning even later, after
releasing the parent inode lock too. 

I did the same amount of "extensive testing" on this one as the previous
one.  IOW, little-to-none. 

 fs/namei.c | 61 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 41 insertions(+), 20 deletions(-)

Comments

Yafang Shao May 12, 2024, 3:06 a.m. UTC | #1
On Sun, May 12, 2024 at 4:03 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Yafang Shao reports that he has seen loads that generate billions of
> negative dentries in a directory, which then when the directory is
> removed causes excessive latencies for other users because the dentry
> shrinking is done under the directory inode lock.
>
> There seems to be no actual reason for holding the inode lock any more
> by the time we get rid of the now uninteresting negative dentries, and
> it's an effect of the calling convention.
>
> Split the 'vfs_rmdir()' function into two separate phases:
>
>  - 'vfs_rmdir_raw()' does the actual main rmdir heavy lifting
>
>  - 'vfs_rmdir_cleanup()' needs to be run by the caller after a
>    successful raw call, after the caller has dropped the inode locks.
>
> We leave the 'vfs_rmdir()' function around, since it has multiple
> callers, and only convert the main system call path to the new two-phase
> model.  The other uses will be left as an exercise for the reader for
> when people find they care.
>
> [ Side note: I think the 'dget()/dput()' pair in vfs_rmdir_raw() is
>   superfluous, since callers always have to have a dentry reference over
>   the call anyway. That's a separate issue.    - Linus ]
>
> Reported-by: Yafang Shao <laoar.shao@gmail.com>
> Link: https://lore.kernel.org/all/20240511022729.35144-1-laoar.shao@gmail.com/
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

This could resolve the secondary concern.
Tested-by: Yafang Shao <laoar.shao@gmail.com>

Might it be feasible to execute the vfs_rmdir_cleanup() within a
kwoker? Such an approach could potentially mitigate the initial
concern as well.

> ---
>
> Second version - this time doing the dentry pruning even later, after
> releasing the parent inode lock too.
>
> I did the same amount of "extensive testing" on this one as the previous
> one.  IOW, little-to-none.
>
>  fs/namei.c | 61 ++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 41 insertions(+), 20 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 28e62238346e..15b4ff6ed1e5 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -4176,21 +4176,7 @@ SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode)
>         return do_mkdirat(AT_FDCWD, getname(pathname), mode);
>  }
>
> -/**
> - * vfs_rmdir - remove directory
> - * @idmap:     idmap of the mount the inode was found from
> - * @dir:       inode of @dentry
> - * @dentry:    pointer to dentry of the base directory
> - *
> - * Remove a directory.
> - *
> - * If the inode has been found through an idmapped mount the idmap of
> - * the vfsmount must be passed through @idmap. This function will then take
> - * care to map the inode according to @idmap before checking permissions.
> - * On non-idmapped mounts or if permission checking is to be performed on the
> - * raw inode simply pass @nop_mnt_idmap.
> - */
> -int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
> +static int vfs_rmdir_raw(struct mnt_idmap *idmap, struct inode *dir,
>                      struct dentry *dentry)
>  {
>         int error = may_delete(idmap, dir, dentry, 1);
> @@ -4217,18 +4203,43 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
>         if (error)
>                 goto out;
>
> -       shrink_dcache_parent(dentry);
>         dentry->d_inode->i_flags |= S_DEAD;
>         dont_mount(dentry);
>         detach_mounts(dentry);
> -
>  out:
>         inode_unlock(dentry->d_inode);
>         dput(dentry);
> -       if (!error)
> -               d_delete_notify(dir, dentry);
>         return error;
>  }
> +
> +static inline void vfs_rmdir_cleanup(struct inode *dir, struct dentry *dentry)
> +{
> +       shrink_dcache_parent(dentry);
> +       d_delete_notify(dir, dentry);
> +}
> +
> +/**
> + * vfs_rmdir - remove directory
> + * @idmap:     idmap of the mount the inode was found from
> + * @dir:       inode of @dentry
> + * @dentry:    pointer to dentry of the base directory
> + *
> + * Remove a directory.
> + *
> + * If the inode has been found through an idmapped mount the idmap of
> + * the vfsmount must be passed through @idmap. This function will then take
> + * care to map the inode according to @idmap before checking permissions.
> + * On non-idmapped mounts or if permission checking is to be performed on the
> + * raw inode simply pass @nop_mnt_idmap.
> + */
> +int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
> +                    struct dentry *dentry)
> +{
> +       int retval = vfs_rmdir_raw(idmap, dir, dentry);
> +       if (!retval)
> +               vfs_rmdir_cleanup(dir, dentry);
> +       return retval;
> +}
>  EXPORT_SYMBOL(vfs_rmdir);
>
>  int do_rmdir(int dfd, struct filename *name)
> @@ -4272,7 +4283,17 @@ int do_rmdir(int dfd, struct filename *name)
>         error = security_path_rmdir(&path, dentry);
>         if (error)
>                 goto exit4;
> -       error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
> +       error = vfs_rmdir_raw(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
> +       if (error)
> +               goto exit4;
> +       inode_unlock(path.dentry->d_inode);
> +       mnt_drop_write(path.mnt);
> +       vfs_rmdir_cleanup(path.dentry->d_inode, dentry);
> +       dput(dentry);
> +       path_put(&path);
> +       putname(name);
> +       return 0;
> +
>  exit4:
>         dput(dentry);
>  exit3:
> --
> 2.44.0.330.g4d18c88175
>
Al Viro May 12, 2024, 3:30 a.m. UTC | #2
On Sun, May 12, 2024 at 11:06:07AM +0800, Yafang Shao wrote:
> This could resolve the secondary concern.
> Tested-by: Yafang Shao <laoar.shao@gmail.com>
> 
> Might it be feasible to execute the vfs_rmdir_cleanup() within a
> kwoker? Such an approach could potentially mitigate the initial
> concern as well.

	I'm honestly not sure I understood you correctly; in case I
have and you really want to make that asynchronous, the answer's "we
can't do that".  What's more, we can not even delay that past the call
of mnt_drop_write() in do_rmdir().
Yafang Shao May 12, 2024, 3:36 a.m. UTC | #3
On Sun, May 12, 2024 at 11:30 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, May 12, 2024 at 11:06:07AM +0800, Yafang Shao wrote:
> > This could resolve the secondary concern.
> > Tested-by: Yafang Shao <laoar.shao@gmail.com>
> >
> > Might it be feasible to execute the vfs_rmdir_cleanup() within a
> > kwoker? Such an approach could potentially mitigate the initial
> > concern as well.
>
>         I'm honestly not sure I understood you correctly; in case I
> have and you really want to make that asynchronous,

Exactly, that's precisely my point.

> the answer's "we
> can't do that".  What's more, we can not even delay that past the call
> of mnt_drop_write() in do_rmdir().

Thanks for your explanation.
diff mbox series

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 28e62238346e..15b4ff6ed1e5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4176,21 +4176,7 @@  SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode)
 	return do_mkdirat(AT_FDCWD, getname(pathname), mode);
 }
 
-/**
- * vfs_rmdir - remove directory
- * @idmap:	idmap of the mount the inode was found from
- * @dir:	inode of @dentry
- * @dentry:	pointer to dentry of the base directory
- *
- * Remove a directory.
- *
- * If the inode has been found through an idmapped mount the idmap of
- * the vfsmount must be passed through @idmap. This function will then take
- * care to map the inode according to @idmap before checking permissions.
- * On non-idmapped mounts or if permission checking is to be performed on the
- * raw inode simply pass @nop_mnt_idmap.
- */
-int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
+static int vfs_rmdir_raw(struct mnt_idmap *idmap, struct inode *dir,
 		     struct dentry *dentry)
 {
 	int error = may_delete(idmap, dir, dentry, 1);
@@ -4217,18 +4203,43 @@  int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
 	if (error)
 		goto out;
 
-	shrink_dcache_parent(dentry);
 	dentry->d_inode->i_flags |= S_DEAD;
 	dont_mount(dentry);
 	detach_mounts(dentry);
-
 out:
 	inode_unlock(dentry->d_inode);
 	dput(dentry);
-	if (!error)
-		d_delete_notify(dir, dentry);
 	return error;
 }
+
+static inline void vfs_rmdir_cleanup(struct inode *dir, struct dentry *dentry)
+{
+	shrink_dcache_parent(dentry);
+	d_delete_notify(dir, dentry);
+}
+
+/**
+ * vfs_rmdir - remove directory
+ * @idmap:	idmap of the mount the inode was found from
+ * @dir:	inode of @dentry
+ * @dentry:	pointer to dentry of the base directory
+ *
+ * Remove a directory.
+ *
+ * If the inode has been found through an idmapped mount the idmap of
+ * the vfsmount must be passed through @idmap. This function will then take
+ * care to map the inode according to @idmap before checking permissions.
+ * On non-idmapped mounts or if permission checking is to be performed on the
+ * raw inode simply pass @nop_mnt_idmap.
+ */
+int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir,
+		     struct dentry *dentry)
+{
+	int retval = vfs_rmdir_raw(idmap, dir, dentry);
+	if (!retval)
+		vfs_rmdir_cleanup(dir, dentry);
+	return retval;
+}
 EXPORT_SYMBOL(vfs_rmdir);
 
 int do_rmdir(int dfd, struct filename *name)
@@ -4272,7 +4283,17 @@  int do_rmdir(int dfd, struct filename *name)
 	error = security_path_rmdir(&path, dentry);
 	if (error)
 		goto exit4;
-	error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
+	error = vfs_rmdir_raw(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
+	if (error)
+		goto exit4;
+	inode_unlock(path.dentry->d_inode);
+	mnt_drop_write(path.mnt);
+	vfs_rmdir_cleanup(path.dentry->d_inode, dentry);
+	dput(dentry);
+	path_put(&path);
+	putname(name);
+	return 0;
+
 exit4:
 	dput(dentry);
 exit3: