diff mbox series

[20/20] 9p: fix ->rename_sem exclusion

Message ID 20250110024303.4157645-20-viro@zeniv.linux.org.uk (mailing list archive)
State New
Headers show
Series [01/20] make sure that DNAME_INLINE_LEN is a multiple of word size | expand

Commit Message

Al Viro Jan. 10, 2025, 2:43 a.m. UTC
9p wants to be able to build a path from given dentry to fs root and keep
it valid over a blocking operation.

->s_vfs_rename_mutex would be a natural candidate, but there are places
where we need that and where we have no way to tell if ->s_vfs_rename_mutex
is already held deeper in callchain.  Moreover, it's only held for
cross-directory renames; name changes within the same directory happen
without it.

Solution:
	* have d_move() done in ->rename() rather than in its caller
	* maintain a 9p-private rwsem (per-filesystem)
	* hold it exclusive over the relevant part of ->rename()
	* hold it shared over the places where we want the path.

That almost works.  FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under filesystem's control.  However, there's
also __d_unalias(), which isn't covered by any of that.

If ->lookup() hits a directory inode with preexisting dentry elsewhere
(due to e.g. rename done on server behind our back), d_splice_alias()
called by ->lookup() will move/rename that alias.

An approach to fixing that would be a couple of optional methods, so that
__d_unalias() would do
	if alias->d_op->d_unalias_trylock != NULL
		if (!alias->d_op->d_unalias_trylock(alias))
			fail (resulting in -ESTALE from lookup)
	__d_move(...)
	if alias->d_op->d_unalias_unlock != NULL
		alias->d_unalias_unlock(alias)
where it currently does __d_move().  9p instances would be down_write_trylock()
and up_write() of ->rename_mutex.

However, to reduce dentry_operations bloat, let's add one method instead -
->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias) and
->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias).

Another possible variant would be to hold ->rename_sem exclusive around
d_splice_alias() calls in 9p ->lookup(), but that would cause a lot of
contention on that rwsem and it's filesystem-wide, so let's not go there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/locking.rst |  2 ++
 Documentation/filesystems/vfs.rst     | 19 +++++++++++++++++++
 fs/9p/v9fs.h                          |  2 +-
 fs/9p/vfs_dentry.c                    | 13 +++++++++++++
 fs/dcache.c                           |  6 ++++++
 include/linux/dcache.h                |  1 +
 6 files changed, 42 insertions(+), 1 deletion(-)

Comments

Linus Torvalds Jan. 10, 2025, 3:11 a.m. UTC | #1
On Thu, 9 Jan 2025 at 18:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> However, to reduce dentry_operations bloat, let's add one method instead -
> ->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias) and
> ->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias).

Ugh.

So of all the patches, this is the one that I hate.

I absolutely detest interfaces with random true/false arguments, and
when it is about locking, the "detest" becomes something even darker
and more visceral.

I think it would be a lot better as separate ops, considering that

 (a) we'll probably have only one or two actual users, so it's not
like it complicates things on that side

 (b) we don't have *that* many "struct dentry_operations" structures:
I think they are all statically generated constant structures
(typically one or two per filesystem), so it's not like we're saving
memory by merging those pointers into one.

Please?

           Linus
Al Viro Jan. 10, 2025, 5:53 a.m. UTC | #2
On Thu, Jan 09, 2025 at 07:11:46PM -0800, Linus Torvalds wrote:
> On Thu, 9 Jan 2025 at 18:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > However, to reduce dentry_operations bloat, let's add one method instead -
> > ->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias) and
> > ->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias).
> 
> Ugh.
> 
> So of all the patches, this is the one that I hate.
> 
> I absolutely detest interfaces with random true/false arguments, and
> when it is about locking, the "detest" becomes something even darker
> and more visceral.
> 
> I think it would be a lot better as separate ops, considering that
> 
>  (a) we'll probably have only one or two actual users, so it's not
> like it complicates things on that side
> 
>  (b) we don't have *that* many "struct dentry_operations" structures:
> I think they are all statically generated constant structures
> (typically one or two per filesystem), so it's not like we're saving
> memory by merging those pointers into one.

ACK.

> Please?

Done and force-pushed; see below for updated variant of that commit

commit 1f28d77e868e63a07ab50e7fe161fc366b2fb23b
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sun Jan 5 21:33:17 2025 -0500

    9p: fix ->rename_sem exclusion
    
    9p wants to be able to build a path from given dentry to fs root and keep
    it valid over a blocking operation.
    
    ->s_vfs_rename_mutex would be a natural candidate, but there are places
    where we need that and where we have no way to tell if ->s_vfs_rename_mutex
    is already held deeper in callchain.  Moreover, it's only held for
    cross-directory renames; name changes within the same directory happen
    without it.
    
    Solution:
            * have d_move() done in ->rename() rather than in its caller
            * maintain a 9p-private rwsem (per-filesystem)
            * hold it exclusive over the relevant part of ->rename()
            * hold it shared over the places where we want the path.
    
    That almost works.  FS_RENAME_DOES_D_MOVE is enough to put all d_move()
    and d_exchange() calls under filesystem's control.  However, there's
    also __d_unalias(), which isn't covered by any of that.
    
    If ->lookup() hits a directory inode with preexisting dentry elsewhere
    (due to e.g. rename done on server behind our back), d_splice_alias()
    called by ->lookup() will move/rename that alias.
    
    Add a couple of optional methods, so that __d_unalias() would do
            if alias->d_op->d_unalias_trylock != NULL
                    if (!alias->d_op->d_unalias_trylock(alias))
                            fail (resulting in -ESTALE from lookup)
            __d_move(...)
            if alias->d_op->d_unalias_unlock != NULL
                    alias->d_unalias_unlock(alias)
    where it currently does __d_move().  9p instances do down_write_trylock()
    and up_write() of ->rename_mutex.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 146e7d8aa736..d20a32b77b60 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -31,6 +31,8 @@ prototypes::
 	struct vfsmount *(*d_automount)(struct path *path);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_unalias_trylock)(const struct dentry *);
+	void (*d_unalias_unlock)(const struct dentry *);
 
 locking rules:
 
@@ -50,6 +52,8 @@ d_dname:	   no		no		no		no
 d_automount:	   no		no		yes		no
 d_manage:	   no		no		yes (ref-walk)	maybe
 d_real		   no		no		yes 		no
+d_unalias_trylock  yes		no		no 		no
+d_unalias_unlock   yes		no		no 		no
 ================== ===========	========	==============	========
 
 inode_operations
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c352ebaae98..31eea688609a 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1265,6 +1265,8 @@ defined:
 		struct vfsmount *(*d_automount)(struct path *);
 		int (*d_manage)(const struct path *, bool);
 		struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+		bool (*d_unalias_trylock)(const struct dentry *);
+		void (*d_unalias_unlock)(const struct dentry *);
 	};
 
 ``d_revalidate``
@@ -1428,6 +1430,25 @@ defined:
 
 	For non-regular files, the 'dentry' argument is returned.
 
+``d_unalias_trylock``
+	if present, will be called by d_splice_alias() before moving a
+	preexisting attached alias.  Returning false prevents __d_move(),
+	making d_splice_alias() fail with -ESTALE.
+
+	Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move()
+	and d_exchange() calls from the outside of filesystem methods;
+	however, it does not guarantee that attached dentries won't
+	be renamed or moved by d_splice_alias() finding a preexisting
+	alias for a directory inode.  Normally we would not care;
+	however, something that wants to stabilize the entire path to
+	root over a blocking operation might need that.  See 9p for one
+	(and hopefully only) example.
+
+``d_unalias_unlock``
+	should be paired with ``d_unalias_trylock``; that one is called after
+	__d_move() call in __d_unalias().
+
+
 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries.  Child dentries are basically like files in a
 directory.
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 698c43dd5dc8..f28bc763847a 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -202,7 +202,7 @@ static inline struct v9fs_session_info *v9fs_inode2v9ses(struct inode *inode)
 	return inode->i_sb->s_fs_info;
 }
 
-static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry)
+static inline struct v9fs_session_info *v9fs_dentry2v9ses(const struct dentry *dentry)
 {
 	return dentry->d_sb->s_fs_info;
 }
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 872c1abe3295..5061f192eafd 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -105,14 +105,30 @@ static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
 	return __v9fs_lookup_revalidate(dentry, flags);
 }
 
+static bool v9fs_dentry_unalias_trylock(const struct dentry *dentry)
+{
+	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+	return down_write_trylock(&v9ses->rename_sem);
+}
+
+static void v9fs_dentry_unalias_unlock(const struct dentry *dentry)
+{
+	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+	up_write(&v9ses->rename_sem);
+}
+
 const struct dentry_operations v9fs_cached_dentry_operations = {
 	.d_revalidate = v9fs_lookup_revalidate,
 	.d_weak_revalidate = __v9fs_lookup_revalidate,
 	.d_delete = v9fs_cached_dentry_delete,
 	.d_release = v9fs_dentry_release,
+	.d_unalias_trylock = v9fs_dentry_unalias_trylock,
+	.d_unalias_unlock = v9fs_dentry_unalias_unlock,
 };
 
 const struct dentry_operations v9fs_dentry_operations = {
 	.d_delete = always_delete_dentry,
 	.d_release = v9fs_dentry_release,
+	.d_unalias_trylock = v9fs_dentry_unalias_trylock,
+	.d_unalias_unlock = v9fs_dentry_unalias_unlock,
 };
diff --git a/fs/dcache.c b/fs/dcache.c
index 7d42ca367522..2ac614fc8bba 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2961,7 +2961,12 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias)
 		goto out_err;
 	m2 = &alias->d_parent->d_inode->i_rwsem;
 out_unalias:
+	if (alias->d_op->d_unalias_trylock &&
+	    !alias->d_op->d_unalias_trylock(alias))
+		goto out_err;
 	__d_move(alias, dentry, false);
+	if (alias->d_op->d_unalias_unlock)
+		alias->d_op->d_unalias_unlock(alias);
 	ret = 0;
 out_err:
 	if (m2)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 4a6bdadf2f29..9a1a30857763 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -159,6 +159,8 @@ struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_unalias_trylock)(const struct dentry *);
+	void (*d_unalias_unlock)(const struct dentry *);
 } ____cacheline_aligned;
 
 /*
diff mbox series

Patch

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 146e7d8aa736..6e20282447a0 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -31,6 +31,7 @@  prototypes::
 	struct vfsmount *(*d_automount)(struct path *path);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_want_unalias)(const struct dentry *, bool);
 
 locking rules:
 
@@ -50,6 +51,7 @@  d_dname:	   no		no		no		no
 d_automount:	   no		no		yes		no
 d_manage:	   no		no		yes (ref-walk)	maybe
 d_real		   no		no		yes 		no
+d_want_unalias	   yes		no		no 		no
 ================== ===========	========	==============	========
 
 inode_operations
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c352ebaae98..07d4b4deb252 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1265,6 +1265,7 @@  defined:
 		struct vfsmount *(*d_automount)(struct path *);
 		int (*d_manage)(const struct path *, bool);
 		struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+		bool (*d_want_unalias)(const struct dentry *, bool);
 	};
 
 ``d_revalidate``
@@ -1428,6 +1429,24 @@  defined:
 
 	For non-regular files, the 'dentry' argument is returned.
 
+``d_want_unalias``
+	if present, will be called by d_splice_alias() before and after
+	moving a preexisting attached alias.  The second argument is
+	true for call before __d_move() and false for the call after.
+	Returning false on the first call prevents __d_move(), making
+	d_splice_alias() fail with -ESTALE; return value on the second
+	call is ignored.
+
+	Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move()
+	and d_exchange() calls from the outside of filesystem methods;
+	however, it does not guarantee that attached dentries won't
+	be renamed or moved by d_splice_alias() finding a preexisting
+	alias for a directory inode.  Normally we would not care;
+	however, something that wants to stabilize the entire path to
+	root over a blocking operation might need that.  See 9p for one
+	(and hopefully only) example.
+
+
 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries.  Child dentries are basically like files in a
 directory.
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 698c43dd5dc8..f28bc763847a 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -202,7 +202,7 @@  static inline struct v9fs_session_info *v9fs_inode2v9ses(struct inode *inode)
 	return inode->i_sb->s_fs_info;
 }
 
-static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry)
+static inline struct v9fs_session_info *v9fs_dentry2v9ses(const struct dentry *dentry)
 {
 	return dentry->d_sb->s_fs_info;
 }
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 872c1abe3295..b2222df318d0 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -105,14 +105,27 @@  static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name,
 	return __v9fs_lookup_revalidate(dentry, flags);
 }
 
+static bool v9fs_dentry_want_unalias(const struct dentry *dentry, bool lock)
+{
+	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
+
+	if (lock)
+		return down_write_trylock(&v9ses->rename_sem);
+
+	up_write(&v9ses->rename_sem);
+	return true;
+}
+
 const struct dentry_operations v9fs_cached_dentry_operations = {
 	.d_revalidate = v9fs_lookup_revalidate,
 	.d_weak_revalidate = __v9fs_lookup_revalidate,
 	.d_delete = v9fs_cached_dentry_delete,
 	.d_release = v9fs_dentry_release,
+	.d_want_unalias = v9fs_dentry_want_unalias,
 };
 
 const struct dentry_operations v9fs_dentry_operations = {
 	.d_delete = always_delete_dentry,
 	.d_release = v9fs_dentry_release,
+	.d_want_unalias = v9fs_dentry_want_unalias,
 };
diff --git a/fs/dcache.c b/fs/dcache.c
index 7d42ca367522..efbfbc1bc5d4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2947,6 +2947,7 @@  static int __d_unalias(struct dentry *dentry, struct dentry *alias)
 {
 	struct mutex *m1 = NULL;
 	struct rw_semaphore *m2 = NULL;
+	bool (*extra_trylock)(const struct dentry *, bool);
 	int ret = -ESTALE;
 
 	/* If alias and dentry share a parent, then no extra locks required */
@@ -2961,7 +2962,12 @@  static int __d_unalias(struct dentry *dentry, struct dentry *alias)
 		goto out_err;
 	m2 = &alias->d_parent->d_inode->i_rwsem;
 out_unalias:
+	extra_trylock = alias->d_op->d_want_unalias;
+	if (extra_trylock && !extra_trylock(alias, true))
+		goto out_err;
 	__d_move(alias, dentry, false);
+	if (extra_trylock)
+		extra_trylock(alias, false);
 	ret = 0;
 out_err:
 	if (m2)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 4a6bdadf2f29..2b33b9d04a8f 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -159,6 +159,7 @@  struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+	bool (*d_want_unalias)(const struct dentry *, bool);
 } ____cacheline_aligned;
 
 /*