Message ID | 87ttn2sip7.fsf_-_@mailhost.krisman.be (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v4] libfs: Attempt exact-match comparison first during casefolded lookup | expand |
On Wed, 24 Jan 2024 at 10:13, Gabriel Krisman Bertazi <krisman@suse.de> wrote: > > Just for completeness, below the version I intend to apply to > unicode/for-next , which is the v2, plus the comments you and Eric > requested. That is, unless something else comes up. Looks ok to me. My one comment is actually unrelated to the new code, just because the patch touches this code too: > if (len <= DNAME_INLINE_LEN - 1) { > memcpy(strbuf, str, len); > strbuf[len] = 0; > - qstr.name = strbuf; > + str = strbuf; > /* prevent compiler from optimizing out the temporary buffer */ > barrier(); The reason for this whole mess is that allegedly utf8_strncasecmp() is not safe if the buffer changes under it. At least that's what the comment says. But honestly, I don't see it. I think the whole "copy to a stable buffer" code can and should just be removed as voodoo programming. *If* the buffer is actually changing, the name lookup code will just retry, so whether the return value is correct or not is irrelevant. All that matters is that the code honors the str/len constraint, and not blow up - even if the data inside that str/len buffer might not be stable. I don't see how the utf8 code could possibly mess up. That code goes back to commit 2ce3ee931a09 ("ext4: avoid utf8_strncasecmp() with unstable name") fc3bb095ab02 ("f2fs: avoid utf8_strncasecmp() with unstable name") and I think it's bogus. Eric - the string *data* may be unsafe, but the string length passed to the utf8 routines is not changing any more (since it was loaded long ago). And honestly, no amount of "the data may change" should possibly ever cause the utf8 code to then ignore the length that was passed in. Linus
Hi Linus, On Wed, Jan 24, 2024 at 10:42:51AM -0800, Linus Torvalds wrote: > On Wed, 24 Jan 2024 at 10:13, Gabriel Krisman Bertazi <krisman@suse.de> wrote: > > > > Just for completeness, below the version I intend to apply to > > unicode/for-next , which is the v2, plus the comments you and Eric > > requested. That is, unless something else comes up. > > Looks ok to me. > > My one comment is actually unrelated to the new code, just because the > patch touches this code too: > > > if (len <= DNAME_INLINE_LEN - 1) { > > memcpy(strbuf, str, len); > > strbuf[len] = 0; > > - qstr.name = strbuf; > > + str = strbuf; > > /* prevent compiler from optimizing out the temporary buffer */ > > barrier(); > > The reason for this whole mess is that allegedly utf8_strncasecmp() is > not safe if the buffer changes under it. > > At least that's what the comment says. > > But honestly, I don't see it. > > I think the whole "copy to a stable buffer" code can and should just > be removed as voodoo programming. > > *If* the buffer is actually changing, the name lookup code will just > retry, so whether the return value is correct or not is irrelevant. > > All that matters is that the code honors the str/len constraint, and > not blow up - even if the data inside that str/len buffer might not be > stable. > > I don't see how the utf8 code could possibly mess up. > > That code goes back to commit > > 2ce3ee931a09 ("ext4: avoid utf8_strncasecmp() with unstable name") > fc3bb095ab02 ("f2fs: avoid utf8_strncasecmp() with unstable name") > > and I think it's bogus. > > Eric - the string *data* may be unsafe, but the string length passed > to the utf8 routines is not changing any more (since it was loaded > long ago). > > And honestly, no amount of "the data may change" should possibly ever > cause the utf8 code to then ignore the length that was passed in. > Since utf8_strncasecmp() has to parse the UTF-8 sequences from the strings, and UTF-8 sequences are variable-length and may be invalid, there are cases in which it reads bytes from the strings multiple times. This makes it especially vulnerable to non-benign data races, when compared to simpler functions like memcpy() or memcmp() that more straightforwardly do one pass through the data. The onus should be on proving the code is safe, not proving that it's unsafe. But for a specific example of how a data race in utf8_strncasecmp() might be able to cause a memory safety violation, see the example I gave at https://lore.kernel.org/linux-fsdevel/20200212063440.GL870@sol.localdomain/. - Eric
diff --git a/fs/libfs.c b/fs/libfs.c index eec6031b0155..306a0510b7dc 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -1704,16 +1704,28 @@ bool is_empty_dir_inode(struct inode *inode) static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, const char *str, const struct qstr *name) { - const struct dentry *parent = READ_ONCE(dentry->d_parent); - const struct inode *dir = READ_ONCE(parent->d_inode); - const struct super_block *sb = dentry->d_sb; - const struct unicode_map *um = sb->s_encoding; - struct qstr qstr = QSTR_INIT(str, len); + const struct dentry *parent; + const struct inode *dir; char strbuf[DNAME_INLINE_LEN]; - int ret; + struct qstr qstr; + + /* + * Attempt a case-sensitive match first. It is cheaper and + * should cover most lookups, including all the sane + * applications that expect a case-sensitive filesystem. + * + * This comparison is safe under RCU because the caller + * guarantees the consistency between str and len. See + * __d_lookup_rcu_op_compare() for details. + */ + if (len == name->len && !memcmp(str, name->name, len)) + return 0; + parent = READ_ONCE(dentry->d_parent); + dir = READ_ONCE(parent->d_inode); if (!dir || !IS_CASEFOLDED(dir)) - goto fallback; + return 1; + /* * If the dentry name is stored in-line, then it may be concurrently * modified by a rename. If this happens, the VFS will eventually retry @@ -1724,20 +1736,14 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, if (len <= DNAME_INLINE_LEN - 1) { memcpy(strbuf, str, len); strbuf[len] = 0; - qstr.name = strbuf; + str = strbuf; /* prevent compiler from optimizing out the temporary buffer */ barrier(); } - ret = utf8_strncasecmp(um, name, &qstr); - if (ret >= 0) - return ret; + qstr.len = len; + qstr.name = str; - if (sb_has_strict_encoding(sb)) - return -EINVAL; -fallback: - if (len != name->len) - return 1; - return !!memcmp(str, name->name, len); + return utf8_strncasecmp(dentry->d_sb->s_encoding, name, &qstr); } /**