Message ID | 20230727172843.20542-4-krisman@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Support negative dentries on case-insensitive ext4 and f2fs | expand |
On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote: > From: Gabriel Krisman Bertazi <krisman@collabora.com> > > Introduce a dentry revalidation helper to be used by case-insensitive > filesystems to check if it is safe to reuse a negative dentry. > > A negative dentry is safe to be reused on a case-insensitive lookup if > it was created during a case-insensitive lookup and this is not a lookup > that will instantiate a dentry. If this is a creation lookup, we also > need to make sure the name matches sensitively the name under lookup in > order to assure the name preserving semantics. > > dentry->d_name is only checked by the case-insensitive d_revalidate hook > in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases, > d_revalidate is always called with the parent inode read-locked, and > therefore the name cannot change from under us. > > d_revalidate is only called in 4 places: lookup_dcache, __lookup_slow, > lookup_open and lookup_fast: > > - lookup_dcache always calls it with zeroed flags, with the exception > of when coming from __lookup_hash, which needs the parent locked > already, for instance in the open/creation path, which is locked in > open_last_lookups. > > - In __lookup_slow, either the parent inode is read locked by the > caller (lookup_slow), or it is called with no flags (lookup_one*). > The read lock suffices to prevent ->d_name modifications, with the > exception of one case: __d_unalias, will call __d_move to fix a > directory accessible from multiple dentries, which effectively swaps > ->d_name while holding only the shared read lock. This happens > through this flow: > > lookup_slow() //LOOKUP_CREATE > d_lookup() > ->d_lookup() > d_splice_alias() > __d_unalias() > __d_move() > > Nevertheless, this case is not a problem because negative dentries > are not allowed to be moved with __d_move. > > - lookup_open also requires the parent to be locked in the creation > case, which is done in open_last_lookups. > > - lookup_fast will indeed be called with the parent unlocked, but it > shouldn't be called with LOOKUP_CREATE. Either it is called in the > link_path_walk, where nd->flags doesn't have LOOKUP_CREATE yet or in > open_last_lookups. But, in this case, it also never has LOOKUP_CREATE, > because it is only called on the !O_CREAT case, which means op->intent > doesn't have LOOKUP_CREAT (set in build_open_flags only if O_CREAT is > set). > > Finally, for the LOOKUP_RENAME_TARGET, we are doing a rename, so the > parents inodes are also locked. > > Reviewed-by: Theodore Ts'o <tytso@mit.edu> > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> > > --- > Changes since v3: > - Add comment regarding creation (Eric) > - Reorder checks to clarify !flags meaning (Eric) > - Add commit message explanaton of the inode read lock wrt. > __d_move. (Eric) > Changes since v2: > - Add comments to all rejection cases (Eric) > - safeguard against filesystem creating dentries without LOOKUP flags > --- > fs/libfs.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 55 insertions(+) > > diff --git a/fs/libfs.c b/fs/libfs.c > index 5b851315eeed..ed04c4dcc312 100644 > --- a/fs/libfs.c > +++ b/fs/libfs.c > @@ -1462,9 +1462,64 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str) > return 0; > } > > +static inline int generic_ci_d_revalidate(struct dentry *dentry, > + const struct qstr *name, > + unsigned int flags) > +{ > + if (d_is_negative(dentry)) { > + const struct dentry *parent = READ_ONCE(dentry->d_parent); > + const struct inode *dir = READ_ONCE(parent->d_inode); > + > + if (dir && needs_casefold(dir)) { > + /* > + * Negative dentries created prior to turning the > + * directory case-insensitive cannot be trusted, since > + * they don't ensure any possible case version of the > + * filename doesn't exist. > + */ > + if (!d_is_casefold_lookup(dentry)) > + return 0; > + > + /* > + * Filesystems will call into d_revalidate without > + * setting LOOKUP_ flags even for file creation (see > + * lookup_one* variants). Reject negative dentries in > + * this case, since we can't know for sure it won't be > + * used for creation. > + */ > + if (!flags) > + return 0; > + > + /* > + * If the lookup is for creation, then a negative dentry > + * can only be reused if it's a case-sensitive match, > + * not just a case-insensitive one. This is needed to > + * make the new file be created with the name the user > + * specified, preserving case. > + */ > + if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) { > + /* > + * ->d_name won't change from under us in the > + * creation path only, since d_revalidate during > + * creation and renames is always called with > + * the parent inode locked. It isn't the case > + * for all lookup callpaths, so ->d_name must > + * not be touched outside > + * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context. > + */ > + if (dentry->d_name.len != name->len || > + memcmp(dentry->d_name.name, name->name, name->len)) > + return 0; > + } > + } > + } > + return 1; > +} > + > static const struct dentry_operations generic_ci_dentry_ops = { > .d_hash = generic_ci_d_hash, > .d_compare = generic_ci_d_compare, > + .d_revalidate_name = generic_ci_d_revalidate, > }; > #endif Wouldn't it make sense to get rid of all this indentation? const struct dentry *parent; const struct inode *dir; if (!d_is_negative(dentry)) return 1; parent = READ_ONCE(dentry->d_parent); dir = READ_ONCE(parent->d_inode); if (!dir) return 1; if (!needs_casefold(dir)) return 1; /* * Negative dentries created prior to turning the * directory case-insensitive cannot be trusted, since * they don't ensure any possible case version of the * filename doesn't exist. */ if (!d_is_casefold_lookup(dentry)) return 0; /* * Filesystems will call into d_revalidate without * setting LOOKUP_ flags even for file creation (see * lookup_one* variants). Reject negative dentries in * this case, since we can't know for sure it won't be * used for creation. */ if (!flags) return 0; /* * If the lookup is for creation, then a negative dentry * can only be reused if it's a case-sensitive match, * not just a case-insensitive one. This is needed to * make the new file be created with the name the user * specified, preserving case. */ if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) { /* * ->d_name won't change from under us in the * creation path only, since d_revalidate during * creation and renames is always called with * the parent inode locked. It isn't the case * for all lookup callpaths, so ->d_name must * not be touched outside * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context. */ if (dentry->d_name.len != name->len || memcmp(dentry->d_name.name, name->name, name->len)) return 0; } return 1;
Christian Brauner <brauner@kernel.org> writes: > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote: > > Wouldn't it make sense to get rid of all this indentation? I'm ok with making this change. I'll wait for more reviews and Eric before sending a new version with this done. Thanks!
On Fri, Jul 28, 2023 at 11:09:46AM -0400, Gabriel Krisman Bertazi wrote: > Christian Brauner <brauner@kernel.org> writes: > > > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote: > > > > > Wouldn't it make sense to get rid of all this indentation? > > I'm ok with making this change. I'll wait for more reviews and Eric > before sending a new version with this done. > > Thanks! > Well, the issue is that with patch 4, all the 'return 1;' would need to change to 'return fscrypt_d_revalidate(dentry, flags);'. A helper function could be used, though, if you prefer: static int generic_ci_d_revalidate(struct dentry *dentry, const struct qstr *name, unsigned int flags) { if (!ci_d_revalidate(dentry, name, flags)) return 0; return fscrypt_d_revalidate(dentry, flags); } - Eric
On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote: > - In __lookup_slow, either the parent inode is read locked by the > caller (lookup_slow), or it is called with no flags (lookup_one*). > The read lock suffices to prevent ->d_name modifications, with the > exception of one case: __d_unalias, will call __d_move to fix a > directory accessible from multiple dentries, which effectively swaps > ->d_name while holding only the shared read lock. This happens > through this flow: > > lookup_slow() //LOOKUP_CREATE > d_lookup() > ->d_lookup() > d_splice_alias() > __d_unalias() > __d_move() > > Nevertheless, this case is not a problem because negative dentries > are not allowed to be moved with __d_move. Isn't it possible for a negative dentry to become a positive one concurrently? - Eric
On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote: > dentry->d_name is only checked by the case-insensitive d_revalidate hook > in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases, > d_revalidate is always called with the parent inode read-locked, and > therefore the name cannot change from under us. "at least read-locked"? Or do you actually mean write-locked? > +static inline int generic_ci_d_revalidate(struct dentry *dentry, > + const struct qstr *name, > + unsigned int flags) No need for inline here. - Eric
Eric Biggers <ebiggers@kernel.org> writes: > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote: >> dentry->d_name is only checked by the case-insensitive d_revalidate hook >> in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases, >> d_revalidate is always called with the parent inode read-locked, and >> therefore the name cannot change from under us. > > "at least read-locked"? Or do you actually mean write-locked? No. I mean read-locked, as in holding the read-part of the inode lock. This is the case for lookup_slow, which is safe, despite the d_add_ci case we discussed in the previous iteration. I'll reword to say "at least read-locked and mention it is the case in lookup_slow". >> +static inline int generic_ci_d_revalidate(struct dentry *dentry, >> + const struct qstr *name, >> + unsigned int flags) > > No need for inline here. sorry, I missed the inline from your previuos review. Will fix it up for this one. > > - Eric
Eric Biggers <ebiggers@kernel.org> writes: > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote: >> - In __lookup_slow, either the parent inode is read locked by the >> caller (lookup_slow), or it is called with no flags (lookup_one*). >> The read lock suffices to prevent ->d_name modifications, with the >> exception of one case: __d_unalias, will call __d_move to fix a >> directory accessible from multiple dentries, which effectively swaps >> ->d_name while holding only the shared read lock. This happens >> through this flow: >> >> lookup_slow() //LOOKUP_CREATE >> d_lookup() >> ->d_lookup() >> d_splice_alias() >> __d_unalias() >> __d_move() >> >> Nevertheless, this case is not a problem because negative dentries >> are not allowed to be moved with __d_move. > > Isn't it possible for a negative dentry to become a positive one concurrently? Do you mean d_splice_alias racing with a dentry instantiation and __d_move being called on a negative dentry that is turning positive? It is not possible for __d_move to be called with a negative dentry for d_splice_alias, since the inode->i_lock is locked during __d_find_alias, so it can't race with __d_instantiate or d_add. Then, __d_find_alias can't find negative dentries in the first place, so we either have a positive dentry, in which case __d_move is fine with regard to d_revalidate_name, or we don't have any aliases and don't call __d_move. Can you clarify what problem you see here?
On Thu, Aug 03, 2023 at 01:37:45PM -0400, Gabriel Krisman Bertazi wrote: > Eric Biggers <ebiggers@kernel.org> writes: > > > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote: > >> - In __lookup_slow, either the parent inode is read locked by the > >> caller (lookup_slow), or it is called with no flags (lookup_one*). > >> The read lock suffices to prevent ->d_name modifications, with the > >> exception of one case: __d_unalias, will call __d_move to fix a > >> directory accessible from multiple dentries, which effectively swaps > >> ->d_name while holding only the shared read lock. This happens > >> through this flow: > >> > >> lookup_slow() //LOOKUP_CREATE > >> d_lookup() > >> ->d_lookup() > >> d_splice_alias() > >> __d_unalias() > >> __d_move() > >> > >> Nevertheless, this case is not a problem because negative dentries > >> are not allowed to be moved with __d_move. > > > > Isn't it possible for a negative dentry to become a positive one concurrently? > > Do you mean d_splice_alias racing with a dentry instantiation and > __d_move being called on a negative dentry that is turning positive? > > It is not possible for __d_move to be called with a negative dentry for > d_splice_alias, since the inode->i_lock is locked during __d_find_alias, > so it can't race with __d_instantiate or d_add. Then, __d_find_alias > can't find negative dentries in the first place, so we either have a > positive dentry, in which case __d_move is fine with regard to > d_revalidate_name, or we don't have any aliases and don't call > __d_move. > > Can you clarify what problem you see here? > I agree that negative dentries can't be moved --- I pointed this out earlier (https://lore.kernel.org/linux-fsdevel/20230720060657.GB2607@sol.localdomain). The question is whether if ->d_revalidate sees a negative dentry, when can it assume that it remains a negative dentry for the remainder of ->d_revalidate. I'm not sure there is a problem, I just don't understand your explanation. - Eric
Eric Biggers <ebiggers@kernel.org> writes: > On Thu, Aug 03, 2023 at 01:37:45PM -0400, Gabriel Krisman Bertazi wrote: >> Eric Biggers <ebiggers@kernel.org> writes: >> >> > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote: >> >> - In __lookup_slow, either the parent inode is read locked by the >> >> caller (lookup_slow), or it is called with no flags (lookup_one*). >> >> The read lock suffices to prevent ->d_name modifications, with the >> >> exception of one case: __d_unalias, will call __d_move to fix a >> >> directory accessible from multiple dentries, which effectively swaps >> >> ->d_name while holding only the shared read lock. This happens >> >> through this flow: >> >> >> >> lookup_slow() //LOOKUP_CREATE >> >> d_lookup() >> >> ->d_lookup() >> >> d_splice_alias() >> >> __d_unalias() >> >> __d_move() >> >> >> >> Nevertheless, this case is not a problem because negative dentries >> >> are not allowed to be moved with __d_move. >> > >> > Isn't it possible for a negative dentry to become a positive one concurrently? >> >> Do you mean d_splice_alias racing with a dentry instantiation and >> __d_move being called on a negative dentry that is turning positive? >> >> It is not possible for __d_move to be called with a negative dentry for >> d_splice_alias, since the inode->i_lock is locked during __d_find_alias, >> so it can't race with __d_instantiate or d_add. Then, __d_find_alias >> can't find negative dentries in the first place, so we either have a >> positive dentry, in which case __d_move is fine with regard to >> d_revalidate_name, or we don't have any aliases and don't call >> __d_move. >> >> Can you clarify what problem you see here? >> > > I agree that negative dentries can't be moved --- I pointed this out earlier > (https://lore.kernel.org/linux-fsdevel/20230720060657.GB2607@sol.localdomain). > The question is whether if ->d_revalidate sees a negative dentry, when can it > assume that it remains a negative dentry for the remainder of ->d_revalidate. > I'm not sure there is a problem, I just don't understand your > explanation. I see. Thanks for clarifying, as I had previously misunderstood your point. So, first of all, if d_revalidate itself is not a creation, it doesn't matter, because we won't touch ->d_name. We might invalidate a valid dentry, but that is ok. The problem would be limited to d_revalidate being on the creation path, where the parent (read-)lock is held. The problem would be doing the memcmp(), while the dentry is turned positive (d_instantiate), while someone else moves the name. For the dentry to be turned positive during a d_revalidate, it would then have to race with d_add or with d_instantiate. d_add shouldn't be possible since we are holding the parent inode lock (at least read-side), which will serialize file creation. From my understanding of the code, d_instantiate also can't race with d_revalidate for the same reason - is also serialized by the parent inode lock, which is acquired in filename_create. At least for all paths in ext4/f2fs. In fact, I'm failing to find a case where the lock is not taken when instantiating a dentry, but I'm unsure if this is a guarantee or just an artifact of the code. It seems to be safe in the current code, but I don't know if it is a guarantee. Can anyone comment on this?
diff --git a/fs/libfs.c b/fs/libfs.c index 5b851315eeed..ed04c4dcc312 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -1462,9 +1462,64 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str) return 0; } +static inline int generic_ci_d_revalidate(struct dentry *dentry, + const struct qstr *name, + unsigned int flags) +{ + if (d_is_negative(dentry)) { + const struct dentry *parent = READ_ONCE(dentry->d_parent); + const struct inode *dir = READ_ONCE(parent->d_inode); + + if (dir && needs_casefold(dir)) { + /* + * Negative dentries created prior to turning the + * directory case-insensitive cannot be trusted, since + * they don't ensure any possible case version of the + * filename doesn't exist. + */ + if (!d_is_casefold_lookup(dentry)) + return 0; + + /* + * Filesystems will call into d_revalidate without + * setting LOOKUP_ flags even for file creation (see + * lookup_one* variants). Reject negative dentries in + * this case, since we can't know for sure it won't be + * used for creation. + */ + if (!flags) + return 0; + + /* + * If the lookup is for creation, then a negative dentry + * can only be reused if it's a case-sensitive match, + * not just a case-insensitive one. This is needed to + * make the new file be created with the name the user + * specified, preserving case. + */ + if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) { + /* + * ->d_name won't change from under us in the + * creation path only, since d_revalidate during + * creation and renames is always called with + * the parent inode locked. It isn't the case + * for all lookup callpaths, so ->d_name must + * not be touched outside + * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context. + */ + if (dentry->d_name.len != name->len || + memcmp(dentry->d_name.name, name->name, name->len)) + return 0; + } + } + } + return 1; +} + static const struct dentry_operations generic_ci_dentry_ops = { .d_hash = generic_ci_d_hash, .d_compare = generic_ci_d_compare, + .d_revalidate_name = generic_ci_d_revalidate, }; #endif