diff mbox series

[v4] libfs: Attempt exact-match comparison first during casefolded lookup

Message ID 87ttn2sip7.fsf_-_@mailhost.krisman.be (mailing list archive)
State New, archived
Headers show
Series [v4] libfs: Attempt exact-match comparison first during casefolded lookup | expand

Commit Message

Gabriel Krisman Bertazi Jan. 24, 2024, 6:13 p.m. UTC
Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 21 Jan 2024 at 08:34, Gabriel Krisman Bertazi <krisman@suse.de> wrote:
>>
>> Are you ok with the earlier v2 of this patchset as-is, and I'll send a
>> new series proposing this change?
>
> Yeah, possibly updating just the commit log to mention how
> __d_lookup_rcu_op_compare() takes care of the data race.

Just for completeness, below the version I intend to apply to
unicode/for-next , which is the v2, plus the comments you and Eric
requested. That is, unless something else comes up.

> I do think that just doing the exact check in
> __d_lookup_rcu_op_compare() would then avoid things like the indirect
> call for that (presumably common) case, but it's not a big deal.

I will also follow up with a new patch for this shortly.

Thanks for the reviews.

-- >8 --
Subject: [PATCH v4] libfs: Attempt exact-match comparison first during
 casefolded lookup

Casefolded comparisons are (obviously) way more costly than a simple
memcmp.  Try the case-sensitive comparison first, falling-back to the
case-insensitive lookup only when needed.  This allows any exact-match
lookup to complete without having to walk the utf8 trie.

Note that, for strict mode, generic_ci_d_compare used to reject an
invalid UTF-8 string, which would now be considered valid if it
exact-matches the disk-name.  But, if that is the case, the filesystem
is corrupt.  More than that, it really doesn't matter in practice,
because the name-under-lookup will have already been rejected by
generic_ci_d_hash and we won't even get here.

The memcmp is safe under RCU because we are operating on str/len instead
of dentry->d_name directly, and the caller guarantees their consistency
between each other in __d_lookup_rcu_op_compare.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
---
 fs/libfs.c | 40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

Comments

Linus Torvalds Jan. 24, 2024, 6:42 p.m. UTC | #1
On Wed, 24 Jan 2024 at 10:13, Gabriel Krisman Bertazi <krisman@suse.de> wrote:
>
> Just for completeness, below the version I intend to apply to
> unicode/for-next , which is the v2, plus the comments you and Eric
> requested. That is, unless something else comes up.

Looks ok to me.

My one comment is actually unrelated to the new code, just because the
patch touches this code too:

>         if (len <= DNAME_INLINE_LEN - 1) {
>                 memcpy(strbuf, str, len);
>                 strbuf[len] = 0;
> -               qstr.name = strbuf;
> +               str = strbuf;
>                 /* prevent compiler from optimizing out the temporary buffer */
>                 barrier();

The reason for this whole mess is that allegedly utf8_strncasecmp() is
not safe if the buffer changes under it.

At least that's what the comment says.

But honestly, I don't see it.

I think the whole "copy to a stable buffer" code can and should just
be removed as voodoo programming.

*If* the buffer is actually changing, the name lookup code will just
retry, so whether the return value is correct or not is irrelevant.

All that matters is that the code honors the str/len constraint, and
not blow up - even if the data inside that str/len buffer might not be
stable.

I don't see how the utf8 code could possibly mess up.

That code goes back to commit

  2ce3ee931a09 ("ext4: avoid utf8_strncasecmp() with unstable name")
  fc3bb095ab02 ("f2fs: avoid utf8_strncasecmp() with unstable name")

and I think it's bogus.

Eric - the string *data* may be unsafe, but the string length passed
to the utf8 routines is not changing any more (since it was loaded
long ago).

And honestly, no amount of "the data may change" should possibly ever
cause the utf8 code to then ignore the length that was passed in.

                Linus
Eric Biggers Jan. 24, 2024, 10:44 p.m. UTC | #2
Hi Linus,

On Wed, Jan 24, 2024 at 10:42:51AM -0800, Linus Torvalds wrote:
> On Wed, 24 Jan 2024 at 10:13, Gabriel Krisman Bertazi <krisman@suse.de> wrote:
> >
> > Just for completeness, below the version I intend to apply to
> > unicode/for-next , which is the v2, plus the comments you and Eric
> > requested. That is, unless something else comes up.
> 
> Looks ok to me.
> 
> My one comment is actually unrelated to the new code, just because the
> patch touches this code too:
> 
> >         if (len <= DNAME_INLINE_LEN - 1) {
> >                 memcpy(strbuf, str, len);
> >                 strbuf[len] = 0;
> > -               qstr.name = strbuf;
> > +               str = strbuf;
> >                 /* prevent compiler from optimizing out the temporary buffer */
> >                 barrier();
> 
> The reason for this whole mess is that allegedly utf8_strncasecmp() is
> not safe if the buffer changes under it.
> 
> At least that's what the comment says.
> 
> But honestly, I don't see it.
> 
> I think the whole "copy to a stable buffer" code can and should just
> be removed as voodoo programming.
> 
> *If* the buffer is actually changing, the name lookup code will just
> retry, so whether the return value is correct or not is irrelevant.
> 
> All that matters is that the code honors the str/len constraint, and
> not blow up - even if the data inside that str/len buffer might not be
> stable.
> 
> I don't see how the utf8 code could possibly mess up.
> 
> That code goes back to commit
> 
>   2ce3ee931a09 ("ext4: avoid utf8_strncasecmp() with unstable name")
>   fc3bb095ab02 ("f2fs: avoid utf8_strncasecmp() with unstable name")
> 
> and I think it's bogus.
> 
> Eric - the string *data* may be unsafe, but the string length passed
> to the utf8 routines is not changing any more (since it was loaded
> long ago).
> 
> And honestly, no amount of "the data may change" should possibly ever
> cause the utf8 code to then ignore the length that was passed in.
> 

Since utf8_strncasecmp() has to parse the UTF-8 sequences from the strings, and
UTF-8 sequences are variable-length and may be invalid, there are cases in which
it reads bytes from the strings multiple times.  This makes it especially
vulnerable to non-benign data races, when compared to simpler functions like
memcpy() or memcmp() that more straightforwardly do one pass through the data.

The onus should be on proving the code is safe, not proving that it's unsafe.
But for a specific example of how a data race in utf8_strncasecmp() might be
able to cause a memory safety violation, see the example I gave at
https://lore.kernel.org/linux-fsdevel/20200212063440.GL870@sol.localdomain/.

- Eric
diff mbox series

Patch

diff --git a/fs/libfs.c b/fs/libfs.c
index eec6031b0155..306a0510b7dc 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1704,16 +1704,28 @@  bool is_empty_dir_inode(struct inode *inode)
 static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 				const char *str, const struct qstr *name)
 {
-	const struct dentry *parent = READ_ONCE(dentry->d_parent);
-	const struct inode *dir = READ_ONCE(parent->d_inode);
-	const struct super_block *sb = dentry->d_sb;
-	const struct unicode_map *um = sb->s_encoding;
-	struct qstr qstr = QSTR_INIT(str, len);
+	const struct dentry *parent;
+	const struct inode *dir;
 	char strbuf[DNAME_INLINE_LEN];
-	int ret;
+	struct qstr qstr;
+
+	/*
+	 * Attempt a case-sensitive match first. It is cheaper and
+	 * should cover most lookups, including all the sane
+	 * applications that expect a case-sensitive filesystem.
+	 *
+	 * This comparison is safe under RCU because the caller
+	 * guarantees the consistency between str and len. See
+	 * __d_lookup_rcu_op_compare() for details.
+	 */
+	if (len == name->len && !memcmp(str, name->name, len))
+		return 0;
 
+	parent = READ_ONCE(dentry->d_parent);
+	dir = READ_ONCE(parent->d_inode);
 	if (!dir || !IS_CASEFOLDED(dir))
-		goto fallback;
+		return 1;
+
 	/*
 	 * If the dentry name is stored in-line, then it may be concurrently
 	 * modified by a rename.  If this happens, the VFS will eventually retry
@@ -1724,20 +1736,14 @@  static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 	if (len <= DNAME_INLINE_LEN - 1) {
 		memcpy(strbuf, str, len);
 		strbuf[len] = 0;
-		qstr.name = strbuf;
+		str = strbuf;
 		/* prevent compiler from optimizing out the temporary buffer */
 		barrier();
 	}
-	ret = utf8_strncasecmp(um, name, &qstr);
-	if (ret >= 0)
-		return ret;
+	qstr.len = len;
+	qstr.name = str;
 
-	if (sb_has_strict_encoding(sb))
-		return -EINVAL;
-fallback:
-	if (len != name->len)
-		return 1;
-	return !!memcmp(str, name->name, len);
+	return utf8_strncasecmp(dentry->d_sb->s_encoding, name, &qstr);
 }
 
 /**