Message ID | 20250110024303.4157645-20-viro@zeniv.linux.org.uk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [01/20] make sure that DNAME_INLINE_LEN is a multiple of word size | expand |
On Thu, 9 Jan 2025 at 18:45, Al Viro <viro@zeniv.linux.org.uk> wrote: > > However, to reduce dentry_operations bloat, let's add one method instead - > ->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias) and > ->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias). Ugh. So of all the patches, this is the one that I hate. I absolutely detest interfaces with random true/false arguments, and when it is about locking, the "detest" becomes something even darker and more visceral. I think it would be a lot better as separate ops, considering that (a) we'll probably have only one or two actual users, so it's not like it complicates things on that side (b) we don't have *that* many "struct dentry_operations" structures: I think they are all statically generated constant structures (typically one or two per filesystem), so it's not like we're saving memory by merging those pointers into one. Please? Linus
On Thu, Jan 09, 2025 at 07:11:46PM -0800, Linus Torvalds wrote: > On Thu, 9 Jan 2025 at 18:45, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > However, to reduce dentry_operations bloat, let's add one method instead - > > ->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias) and > > ->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias). > > Ugh. > > So of all the patches, this is the one that I hate. > > I absolutely detest interfaces with random true/false arguments, and > when it is about locking, the "detest" becomes something even darker > and more visceral. > > I think it would be a lot better as separate ops, considering that > > (a) we'll probably have only one or two actual users, so it's not > like it complicates things on that side > > (b) we don't have *that* many "struct dentry_operations" structures: > I think they are all statically generated constant structures > (typically one or two per filesystem), so it's not like we're saving > memory by merging those pointers into one. ACK. > Please? Done and force-pushed; see below for updated variant of that commit commit 1f28d77e868e63a07ab50e7fe161fc366b2fb23b Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sun Jan 5 21:33:17 2025 -0500 9p: fix ->rename_sem exclusion 9p wants to be able to build a path from given dentry to fs root and keep it valid over a blocking operation. ->s_vfs_rename_mutex would be a natural candidate, but there are places where we need that and where we have no way to tell if ->s_vfs_rename_mutex is already held deeper in callchain. Moreover, it's only held for cross-directory renames; name changes within the same directory happen without it. Solution: * have d_move() done in ->rename() rather than in its caller * maintain a 9p-private rwsem (per-filesystem) * hold it exclusive over the relevant part of ->rename() * hold it shared over the places where we want the path. That almost works. FS_RENAME_DOES_D_MOVE is enough to put all d_move() and d_exchange() calls under filesystem's control. However, there's also __d_unalias(), which isn't covered by any of that. If ->lookup() hits a directory inode with preexisting dentry elsewhere (due to e.g. rename done on server behind our back), d_splice_alias() called by ->lookup() will move/rename that alias. Add a couple of optional methods, so that __d_unalias() would do if alias->d_op->d_unalias_trylock != NULL if (!alias->d_op->d_unalias_trylock(alias)) fail (resulting in -ESTALE from lookup) __d_move(...) if alias->d_op->d_unalias_unlock != NULL alias->d_unalias_unlock(alias) where it currently does __d_move(). 9p instances do down_write_trylock() and up_write() of ->rename_mutex. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 146e7d8aa736..d20a32b77b60 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -31,6 +31,8 @@ prototypes:: struct vfsmount *(*d_automount)(struct path *path); int (*d_manage)(const struct path *, bool); struct dentry *(*d_real)(struct dentry *, enum d_real_type type); + bool (*d_unalias_trylock)(const struct dentry *); + void (*d_unalias_unlock)(const struct dentry *); locking rules: @@ -50,6 +52,8 @@ d_dname: no no no no d_automount: no no yes no d_manage: no no yes (ref-walk) maybe d_real no no yes no +d_unalias_trylock yes no no no +d_unalias_unlock yes no no no ================== =========== ======== ============== ======== inode_operations diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 7c352ebaae98..31eea688609a 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -1265,6 +1265,8 @@ defined: struct vfsmount *(*d_automount)(struct path *); int (*d_manage)(const struct path *, bool); struct dentry *(*d_real)(struct dentry *, enum d_real_type type); + bool (*d_unalias_trylock)(const struct dentry *); + void (*d_unalias_unlock)(const struct dentry *); }; ``d_revalidate`` @@ -1428,6 +1430,25 @@ defined: For non-regular files, the 'dentry' argument is returned. +``d_unalias_trylock`` + if present, will be called by d_splice_alias() before moving a + preexisting attached alias. Returning false prevents __d_move(), + making d_splice_alias() fail with -ESTALE. + + Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move() + and d_exchange() calls from the outside of filesystem methods; + however, it does not guarantee that attached dentries won't + be renamed or moved by d_splice_alias() finding a preexisting + alias for a directory inode. Normally we would not care; + however, something that wants to stabilize the entire path to + root over a blocking operation might need that. See 9p for one + (and hopefully only) example. + +``d_unalias_unlock`` + should be paired with ``d_unalias_trylock``; that one is called after + __d_move() call in __d_unalias(). + + Each dentry has a pointer to its parent dentry, as well as a hash list of child dentries. Child dentries are basically like files in a directory. diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h index 698c43dd5dc8..f28bc763847a 100644 --- a/fs/9p/v9fs.h +++ b/fs/9p/v9fs.h @@ -202,7 +202,7 @@ static inline struct v9fs_session_info *v9fs_inode2v9ses(struct inode *inode) return inode->i_sb->s_fs_info; } -static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry) +static inline struct v9fs_session_info *v9fs_dentry2v9ses(const struct dentry *dentry) { return dentry->d_sb->s_fs_info; } diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c index 872c1abe3295..5061f192eafd 100644 --- a/fs/9p/vfs_dentry.c +++ b/fs/9p/vfs_dentry.c @@ -105,14 +105,30 @@ static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name, return __v9fs_lookup_revalidate(dentry, flags); } +static bool v9fs_dentry_unalias_trylock(const struct dentry *dentry) +{ + struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry); + return down_write_trylock(&v9ses->rename_sem); +} + +static void v9fs_dentry_unalias_unlock(const struct dentry *dentry) +{ + struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry); + up_write(&v9ses->rename_sem); +} + const struct dentry_operations v9fs_cached_dentry_operations = { .d_revalidate = v9fs_lookup_revalidate, .d_weak_revalidate = __v9fs_lookup_revalidate, .d_delete = v9fs_cached_dentry_delete, .d_release = v9fs_dentry_release, + .d_unalias_trylock = v9fs_dentry_unalias_trylock, + .d_unalias_unlock = v9fs_dentry_unalias_unlock, }; const struct dentry_operations v9fs_dentry_operations = { .d_delete = always_delete_dentry, .d_release = v9fs_dentry_release, + .d_unalias_trylock = v9fs_dentry_unalias_trylock, + .d_unalias_unlock = v9fs_dentry_unalias_unlock, }; diff --git a/fs/dcache.c b/fs/dcache.c index 7d42ca367522..2ac614fc8bba 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -2961,7 +2961,12 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias) goto out_err; m2 = &alias->d_parent->d_inode->i_rwsem; out_unalias: + if (alias->d_op->d_unalias_trylock && + !alias->d_op->d_unalias_trylock(alias)) + goto out_err; __d_move(alias, dentry, false); + if (alias->d_op->d_unalias_unlock) + alias->d_op->d_unalias_unlock(alias); ret = 0; out_err: if (m2) diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 4a6bdadf2f29..9a1a30857763 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -159,6 +159,8 @@ struct dentry_operations { struct vfsmount *(*d_automount)(struct path *); int (*d_manage)(const struct path *, bool); struct dentry *(*d_real)(struct dentry *, enum d_real_type type); + bool (*d_unalias_trylock)(const struct dentry *); + void (*d_unalias_unlock)(const struct dentry *); } ____cacheline_aligned; /*
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 146e7d8aa736..6e20282447a0 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -31,6 +31,7 @@ prototypes:: struct vfsmount *(*d_automount)(struct path *path); int (*d_manage)(const struct path *, bool); struct dentry *(*d_real)(struct dentry *, enum d_real_type type); + bool (*d_want_unalias)(const struct dentry *, bool); locking rules: @@ -50,6 +51,7 @@ d_dname: no no no no d_automount: no no yes no d_manage: no no yes (ref-walk) maybe d_real no no yes no +d_want_unalias yes no no no ================== =========== ======== ============== ======== inode_operations diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 7c352ebaae98..07d4b4deb252 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -1265,6 +1265,7 @@ defined: struct vfsmount *(*d_automount)(struct path *); int (*d_manage)(const struct path *, bool); struct dentry *(*d_real)(struct dentry *, enum d_real_type type); + bool (*d_want_unalias)(const struct dentry *, bool); }; ``d_revalidate`` @@ -1428,6 +1429,24 @@ defined: For non-regular files, the 'dentry' argument is returned. +``d_want_unalias`` + if present, will be called by d_splice_alias() before and after + moving a preexisting attached alias. The second argument is + true for call before __d_move() and false for the call after. + Returning false on the first call prevents __d_move(), making + d_splice_alias() fail with -ESTALE; return value on the second + call is ignored. + + Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move() + and d_exchange() calls from the outside of filesystem methods; + however, it does not guarantee that attached dentries won't + be renamed or moved by d_splice_alias() finding a preexisting + alias for a directory inode. Normally we would not care; + however, something that wants to stabilize the entire path to + root over a blocking operation might need that. See 9p for one + (and hopefully only) example. + + Each dentry has a pointer to its parent dentry, as well as a hash list of child dentries. Child dentries are basically like files in a directory. diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h index 698c43dd5dc8..f28bc763847a 100644 --- a/fs/9p/v9fs.h +++ b/fs/9p/v9fs.h @@ -202,7 +202,7 @@ static inline struct v9fs_session_info *v9fs_inode2v9ses(struct inode *inode) return inode->i_sb->s_fs_info; } -static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry) +static inline struct v9fs_session_info *v9fs_dentry2v9ses(const struct dentry *dentry) { return dentry->d_sb->s_fs_info; } diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c index 872c1abe3295..b2222df318d0 100644 --- a/fs/9p/vfs_dentry.c +++ b/fs/9p/vfs_dentry.c @@ -105,14 +105,27 @@ static int v9fs_lookup_revalidate(struct inode *dir, const struct qstr *name, return __v9fs_lookup_revalidate(dentry, flags); } +static bool v9fs_dentry_want_unalias(const struct dentry *dentry, bool lock) +{ + struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry); + + if (lock) + return down_write_trylock(&v9ses->rename_sem); + + up_write(&v9ses->rename_sem); + return true; +} + const struct dentry_operations v9fs_cached_dentry_operations = { .d_revalidate = v9fs_lookup_revalidate, .d_weak_revalidate = __v9fs_lookup_revalidate, .d_delete = v9fs_cached_dentry_delete, .d_release = v9fs_dentry_release, + .d_want_unalias = v9fs_dentry_want_unalias, }; const struct dentry_operations v9fs_dentry_operations = { .d_delete = always_delete_dentry, .d_release = v9fs_dentry_release, + .d_want_unalias = v9fs_dentry_want_unalias, }; diff --git a/fs/dcache.c b/fs/dcache.c index 7d42ca367522..efbfbc1bc5d4 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -2947,6 +2947,7 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias) { struct mutex *m1 = NULL; struct rw_semaphore *m2 = NULL; + bool (*extra_trylock)(const struct dentry *, bool); int ret = -ESTALE; /* If alias and dentry share a parent, then no extra locks required */ @@ -2961,7 +2962,12 @@ static int __d_unalias(struct dentry *dentry, struct dentry *alias) goto out_err; m2 = &alias->d_parent->d_inode->i_rwsem; out_unalias: + extra_trylock = alias->d_op->d_want_unalias; + if (extra_trylock && !extra_trylock(alias, true)) + goto out_err; __d_move(alias, dentry, false); + if (extra_trylock) + extra_trylock(alias, false); ret = 0; out_err: if (m2) diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 4a6bdadf2f29..2b33b9d04a8f 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -159,6 +159,7 @@ struct dentry_operations { struct vfsmount *(*d_automount)(struct path *); int (*d_manage)(const struct path *, bool); struct dentry *(*d_real)(struct dentry *, enum d_real_type type); + bool (*d_want_unalias)(const struct dentry *, bool); } ____cacheline_aligned; /*
9p wants to be able to build a path from given dentry to fs root and keep it valid over a blocking operation. ->s_vfs_rename_mutex would be a natural candidate, but there are places where we need that and where we have no way to tell if ->s_vfs_rename_mutex is already held deeper in callchain. Moreover, it's only held for cross-directory renames; name changes within the same directory happen without it. Solution: * have d_move() done in ->rename() rather than in its caller * maintain a 9p-private rwsem (per-filesystem) * hold it exclusive over the relevant part of ->rename() * hold it shared over the places where we want the path. That almost works. FS_RENAME_DOES_D_MOVE is enough to put all d_move() and d_exchange() calls under filesystem's control. However, there's also __d_unalias(), which isn't covered by any of that. If ->lookup() hits a directory inode with preexisting dentry elsewhere (due to e.g. rename done on server behind our back), d_splice_alias() called by ->lookup() will move/rename that alias. An approach to fixing that would be a couple of optional methods, so that __d_unalias() would do if alias->d_op->d_unalias_trylock != NULL if (!alias->d_op->d_unalias_trylock(alias)) fail (resulting in -ESTALE from lookup) __d_move(...) if alias->d_op->d_unalias_unlock != NULL alias->d_unalias_unlock(alias) where it currently does __d_move(). 9p instances would be down_write_trylock() and up_write() of ->rename_mutex. However, to reduce dentry_operations bloat, let's add one method instead - ->d_want_unalias(alias, true) instead of ->d_unalias_trylock(alias) and ->d_want_unalias(alias, false) instead of ->d_unalias_unlock(alias). Another possible variant would be to hold ->rename_sem exclusive around d_splice_alias() calls in 9p ->lookup(), but that would cause a lot of contention on that rwsem and it's filesystem-wide, so let's not go there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- Documentation/filesystems/locking.rst | 2 ++ Documentation/filesystems/vfs.rst | 19 +++++++++++++++++++ fs/9p/v9fs.h | 2 +- fs/9p/vfs_dentry.c | 13 +++++++++++++ fs/dcache.c | 6 ++++++ include/linux/dcache.h | 1 + 6 files changed, 42 insertions(+), 1 deletion(-)