From patchwork Wed Mar 31 21:07:48 2021
From: Shreeya Patel
To: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org, chao@kernel.org, krisman@collabora.com, ebiggers@google.com, drosen@google.com, ebiggers@kernel.org, yuchao0@huawei.com
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, kernel@collabora.com, andre.almeida@collabora.com, kernel test robot
Subject: [PATCH v6 1/4] fs: unicode: Use strscpy() instead of strncpy()
Date: Thu, 1 Apr 2021 02:37:48 +0530
Message-Id: <20210331210751.281645-2-shreeya.patel@collabora.com>
In-Reply-To:
 <20210331210751.281645-1-shreeya.patel@collabora.com>
References: <20210331210751.281645-1-shreeya.patel@collabora.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

The following warning was reported by the kernel test robot:

In function 'utf8_parse_version',
    inlined from 'utf8_load' at fs/unicode/utf8mod.c:195:7:
>> fs/unicode/utf8mod.c:175:2: warning: 'strncpy' specified bound 12 equals destination size [-Wstringop-truncation]
     175 |  strncpy(version_string, version, sizeof(version_string));
         |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The -Wstringop-truncation warning highlights unintended uses of
strncpy() that drop the terminating NUL character of the source string.

Unlike strncpy(), strscpy() always NUL-terminates the destination
string, so use strscpy() instead of strncpy().

Fixes: 9d53690f0d4e5 ("unicode: implement higher level API for string handling")
Acked-by: Gabriel Krisman Bertazi
Signed-off-by: Shreeya Patel
Reported-by: kernel test robot
---
 fs/unicode/utf8-core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c
index dc25823bfed9..f9e6a2718aba 100644
--- a/fs/unicode/utf8-core.c
+++ b/fs/unicode/utf8-core.c
@@ -179,8 +179,10 @@ static int utf8_parse_version(const char *version, unsigned int *maj,
 		{1, "%d.%d.%d"},
 		{0, NULL}
 	};
+	int ret = strscpy(version_string, version, sizeof(version_string));
 
-	strncpy(version_string, version, sizeof(version_string));
+	if (ret < 0)
+		return ret;
 
 	if (match_token(version_string, token, args) != 1)
 		return -EINVAL;

From patchwork Wed Mar 31 21:07:49 2021
From: Shreeya Patel
To: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org, chao@kernel.org, krisman@collabora.com, ebiggers@google.com, drosen@google.com, ebiggers@kernel.org, yuchao0@huawei.com
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, kernel@collabora.com, andre.almeida@collabora.com
Subject: [PATCH v6 2/4] fs: unicode: Rename function names from utf8 to unicode
Date: Thu, 1 Apr 2021 02:37:49 +0530
Message-Id: <20210331210751.281645-3-shreeya.patel@collabora.com>
In-Reply-To: <20210331210751.281645-1-shreeya.patel@collabora.com>
References: <20210331210751.281645-1-shreeya.patel@collabora.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

utf8data.h_shipped contains a large database table, an auto-generated
decodification trie used by the unicode normalization functions, and
it is not necessary to carry this large table in the kernel.

The goal is to make the UTF-8 encoding loadable by converting it into
a module and adding a unicode subsystem layer between the filesystems
and the utf8 module. This layer will load the module whenever a
filesystem that needs unicode is mounted. Currently, only the UTF-8
encoding is supported, but if other encodings are supported in the
future, the layer will be responsible for loading the desired encoding
module. utf8-core will be converted into this layer file in future
patches.

Rename the functions from utf8_* to unicode_* to mark them as unicode
subsystem layer functions. This is also the first step towards
transforming utf8-core into the unicode subsystem layer file.

Signed-off-by: Shreeya Patel
---
Changes in v6
  - Improve the commit message

 fs/ext4/hash.c             |  2 +-
 fs/ext4/namei.c            | 12 ++++----
 fs/ext4/super.c            |  6 ++--
 fs/f2fs/dir.c              | 12 ++++----
 fs/f2fs/super.c            |  6 ++--
 fs/libfs.c                 |  6 ++--
 fs/unicode/utf8-core.c     | 57 +++++++++++++++++++-------------------
 fs/unicode/utf8-selftest.c |  8 +++---
 include/linux/unicode.h    | 32 ++++++++++-----------
 9 files changed, 70 insertions(+), 71 deletions(-)

diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
index a92eb79de0cc..8890a76abe86 100644
--- a/fs/ext4/hash.c
+++ b/fs/ext4/hash.c
@@ -285,7 +285,7 @@ int ext4fs_dirhash(const struct inode *dir, const char *name, int len,
 		if (!buff)
 			return -ENOMEM;
 
-		dlen = utf8_casefold(um, &qstr, buff, PATH_MAX);
+		dlen = unicode_casefold(um, &qstr, buff, PATH_MAX);
 		if (dlen < 0) {
 			kfree(buff);
 			goto opaque_seq;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 686bf982c84e..dde5ce795416 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1290,9 +1290,9 @@ int ext4_ci_compare(const struct inode *parent, const struct qstr *name,
 	int ret;
 
 	if (quick)
-		ret = utf8_strncasecmp_folded(um, name, entry);
+		ret = unicode_strncasecmp_folded(um, name, entry);
 	else
-		ret = utf8_strncasecmp(um, name, entry);
+		ret = unicode_strncasecmp(um, name, entry);
 
 	if (ret < 0) {
 		/* Handle invalid character sequence as either an error
@@ -1324,9 +1324,9 @@ void ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
 	if (!cf_name->name)
 		return;
 
-	len = utf8_casefold(dir->i_sb->s_encoding,
-			    iname, cf_name->name,
-			    EXT4_NAME_LEN);
+	len = unicode_casefold(dir->i_sb->s_encoding,
+			       iname, cf_name->name,
+			       EXT4_NAME_LEN);
 	if (len <= 0) {
 		kfree(cf_name->name);
 		cf_name->name = NULL;
@@ -2201,7 +2201,7 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
 
 #ifdef CONFIG_UNICODE
 	if (sb_has_strict_encoding(sb) && IS_CASEFOLDED(dir) &&
-	    sb->s_encoding && utf8_validate(sb->s_encoding, &dentry->d_name))
+	    sb->s_encoding && unicode_validate(sb->s_encoding, &dentry->d_name))
 		return -EINVAL;
 #endif
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ad34a37278cd..2fb845752c90 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1259,7 +1259,7 @@ static void ext4_put_super(struct super_block *sb)
 	fs_put_dax(sbi->s_daxdev);
 	fscrypt_free_dummy_policy(&sbi->s_dummy_enc_policy);
 #ifdef CONFIG_UNICODE
-	utf8_unload(sb->s_encoding);
+	unicode_unload(sb->s_encoding);
 #endif
 	kfree(sbi);
 }
@@ -4304,7 +4304,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 			goto failed_mount;
 		}
 
-		encoding = utf8_load(encoding_info->version);
+		encoding = unicode_load(encoding_info->version);
 		if (IS_ERR(encoding)) {
 			ext4_msg(sb, KERN_ERR,
 				"can't mount with superblock charset: %s-%s "
@@ -5165,7 +5165,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		crypto_free_shash(sbi->s_chksum_driver);
 
 #ifdef CONFIG_UNICODE
-	utf8_unload(sb->s_encoding);
+	unicode_unload(sb->s_encoding);
 #endif
 
 #ifdef CONFIG_QUOTA
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index e6270a867be1..f160f9dd667d 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -84,10 +84,10 @@ int f2fs_init_casefolded_name(const struct inode *dir,
 					GFP_NOFS);
 		if (!fname->cf_name.name)
 			return -ENOMEM;
-		fname->cf_name.len = utf8_casefold(sb->s_encoding,
-						   fname->usr_fname,
-						   fname->cf_name.name,
-						   F2FS_NAME_LEN);
+		fname->cf_name.len = unicode_casefold(sb->s_encoding,
+						      fname->usr_fname,
+						      fname->cf_name.name,
+						      F2FS_NAME_LEN);
 		if ((int)fname->cf_name.len <= 0) {
 			kfree(fname->cf_name.name);
 			fname->cf_name.name = NULL;
@@ -237,7 +237,7 @@ static int f2fs_match_ci_name(const struct inode *dir, const struct qstr *name,
 		entry.len = decrypted_name.len;
 	}
 
-	res = utf8_strncasecmp_folded(um, name, &entry);
+	res = unicode_strncasecmp_folded(um, name, &entry);
 	/*
 	 * In strict mode, ignore invalid names. In non-strict mode,
 	 * fall back to treating them as opaque byte sequences.
@@ -246,7 +246,7 @@ static int f2fs_match_ci_name(const struct inode *dir, const struct qstr *name,
 		res = name->len == entry.len &&
 				memcmp(name->name, entry.name, name->len) == 0;
 	} else {
-		/* utf8_strncasecmp_folded returns 0 on match */
+		/* unicode_strncasecmp_folded returns 0 on match */
 		res = (res == 0);
 	}
 out:
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 7069793752f1..b4a92e763e27 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1430,7 +1430,7 @@ static void f2fs_put_super(struct super_block *sb)
 	for (i = 0; i < NR_PAGE_TYPE; i++)
 		kvfree(sbi->write_io[i]);
 #ifdef CONFIG_UNICODE
-	utf8_unload(sb->s_encoding);
+	unicode_unload(sb->s_encoding);
 #endif
 	kfree(sbi);
 }
@@ -3560,7 +3560,7 @@ static int f2fs_setup_casefold(struct f2fs_sb_info *sbi)
 			return -EINVAL;
 		}
 
-		encoding = utf8_load(encoding_info->version);
+		encoding = unicode_load(encoding_info->version);
 		if (IS_ERR(encoding)) {
 			f2fs_err(sbi,
 				 "can't mount with superblock charset: %s-%s "
@@ -4073,7 +4073,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 		kvfree(sbi->write_io[i]);
 
 #ifdef CONFIG_UNICODE
-	utf8_unload(sb->s_encoding);
+	unicode_unload(sb->s_encoding);
 	sb->s_encoding = NULL;
 #endif
 free_options:
diff --git a/fs/libfs.c b/fs/libfs.c
index e2de5401abca..766556165bb5 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1404,7 +1404,7 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 	 * If the dentry name is stored in-line, then it may be concurrently
 	 * modified by a rename. If this happens, the VFS will eventually retry
 	 * the lookup, so it doesn't matter what ->d_compare() returns.
-	 * However, it's unsafe to call utf8_strncasecmp() with an unstable
+	 * However, it's unsafe to call unicode_strncasecmp() with an unstable
 	 * string. Therefore, we have to copy the name into a temporary buffer.
 	 */
 	if (len <= DNAME_INLINE_LEN - 1) {
@@ -1414,7 +1414,7 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
 		/* prevent compiler from optimizing out the temporary buffer */
 		barrier();
 	}
-	ret = utf8_strncasecmp(um, name, &qstr);
+	ret = unicode_strncasecmp(um, name, &qstr);
 	if (ret >= 0)
 		return ret;
 
@@ -1443,7 +1443,7 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
 	if (!dir || !needs_casefold(dir))
 		return 0;
 
-	ret = utf8_casefold_hash(um, dentry, str);
+	ret = unicode_casefold_hash(um, dentry, str);
 	if (ret < 0 && sb_has_strict_encoding(sb))
 		return -EINVAL;
 	return 0;
diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c
index f9e6a2718aba..730dbaedf593 100644
--- a/fs/unicode/utf8-core.c
+++ b/fs/unicode/utf8-core.c
@@ -10,7 +10,7 @@
 
 #include "utf8n.h"
 
-int utf8_validate(const struct unicode_map *um, const struct qstr *str)
+int unicode_validate(const struct unicode_map *um, const struct qstr *str)
 {
 	const struct utf8data *data = utf8nfdi(um->version);
 
@@ -18,10 +18,10 @@ int utf8_validate(const struct unicode_map *um, const struct qstr *str)
 		return -1;
 	return 0;
 }
-EXPORT_SYMBOL(utf8_validate);
+EXPORT_SYMBOL(unicode_validate);
 
-int utf8_strncmp(const struct unicode_map *um,
-		 const struct qstr *s1, const struct qstr *s2)
+int unicode_strncmp(const struct unicode_map *um,
+		    const struct qstr *s1, const struct qstr *s2)
 {
 	const struct utf8data *data = utf8nfdi(um->version);
 	struct utf8cursor cur1, cur2;
@@ -45,10 +45,10 @@ int utf8_strncmp(const struct unicode_map *um,
 
 	return 0;
 }
-EXPORT_SYMBOL(utf8_strncmp);
+EXPORT_SYMBOL(unicode_strncmp);
 
-int utf8_strncasecmp(const struct unicode_map *um,
-		     const struct qstr *s1, const struct qstr *s2)
+int unicode_strncasecmp(const struct unicode_map *um,
+			const struct qstr *s1, const struct qstr *s2)
 {
 	const struct utf8data *data = utf8nfdicf(um->version);
 	struct utf8cursor cur1, cur2;
@@ -72,14 +72,14 @@ int utf8_strncasecmp(const struct unicode_map *um,
 
 	return 0;
 }
-EXPORT_SYMBOL(utf8_strncasecmp);
+EXPORT_SYMBOL(unicode_strncasecmp);
 
 /* String cf is expected to be a valid UTF-8 casefolded
  * string.
  */
-int utf8_strncasecmp_folded(const struct unicode_map *um,
-			    const struct qstr *cf,
-			    const struct qstr *s1)
+int unicode_strncasecmp_folded(const struct unicode_map *um,
+			       const struct qstr *cf,
+			       const struct qstr *s1)
 {
 	const struct utf8data *data = utf8nfdicf(um->version);
 	struct utf8cursor cur1;
@@ -100,10 +100,10 @@ int utf8_strncasecmp_folded(const struct unicode_map *um,
 
 	return 0;
 }
-EXPORT_SYMBOL(utf8_strncasecmp_folded);
+EXPORT_SYMBOL(unicode_strncasecmp_folded);
 
-int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
-		  unsigned char *dest, size_t dlen)
+int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
+		     unsigned char *dest, size_t dlen)
 {
 	const struct utf8data *data = utf8nfdicf(um->version);
 	struct utf8cursor cur;
@@ -123,10 +123,10 @@ int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
 	}
 	return -EINVAL;
 }
-EXPORT_SYMBOL(utf8_casefold);
+EXPORT_SYMBOL(unicode_casefold);
 
-int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
-		       struct qstr *str)
+int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
+			  struct qstr *str)
 {
 	const struct utf8data *data = utf8nfdicf(um->version);
 	struct utf8cursor cur;
@@ -144,10 +144,10 @@ int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
 	str->hash = end_name_hash(hash);
 	return 0;
 }
-EXPORT_SYMBOL(utf8_casefold_hash);
+EXPORT_SYMBOL(unicode_casefold_hash);
 
-int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
-		   unsigned char *dest, size_t dlen)
+int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
+		      unsigned char *dest, size_t dlen)
 {
 	const struct utf8data *data = utf8nfdi(um->version);
 	struct utf8cursor cur;
@@ -167,11 +167,10 @@ int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
 	}
 	return -EINVAL;
 }
+EXPORT_SYMBOL(unicode_normalize);
 
-EXPORT_SYMBOL(utf8_normalize);
-
-static int utf8_parse_version(const char *version, unsigned int *maj,
-			      unsigned int *min, unsigned int *rev)
+static int unicode_parse_version(const char *version, unsigned int *maj,
+				 unsigned int *min, unsigned int *rev)
 {
 	substring_t args[3];
 	char version_string[12];
@@ -194,7 +193,7 @@ static int utf8_parse_version(const char *version, unsigned int *maj,
 	return 0;
 }
 
-struct unicode_map *utf8_load(const char *version)
+struct unicode_map *unicode_load(const char *version)
 {
 	struct unicode_map *um = NULL;
 	int unicode_version;
@@ -202,7 +201,7 @@ struct unicode_map *utf8_load(const char *version)
 	if (version) {
 		unsigned int maj, min, rev;
 
-		if (utf8_parse_version(version, &maj, &min, &rev) < 0)
+		if (unicode_parse_version(version, &maj, &min, &rev) < 0)
 			return ERR_PTR(-EINVAL);
 
 		if (!utf8version_is_supported(maj, min, rev))
@@ -227,12 +226,12 @@ struct unicode_map *utf8_load(const char *version)
 
 	return um;
 }
-EXPORT_SYMBOL(utf8_load);
+EXPORT_SYMBOL(unicode_load);
 
-void utf8_unload(struct unicode_map *um)
+void unicode_unload(struct unicode_map *um)
 {
 	kfree(um);
 }
-EXPORT_SYMBOL(utf8_unload);
+EXPORT_SYMBOL(unicode_unload);
 
 MODULE_LICENSE("GPL v2");
diff --git a/fs/unicode/utf8-selftest.c b/fs/unicode/utf8-selftest.c
index 6fe8af7edccb..796c1ed922ea 100644
--- a/fs/unicode/utf8-selftest.c
+++ b/fs/unicode/utf8-selftest.c
@@ -235,7 +235,7 @@ static void check_utf8_nfdicf(void)
 static void check_utf8_comparisons(void)
 {
 	int i;
-	struct unicode_map *table = utf8_load("12.1.0");
+	struct unicode_map *table = unicode_load("12.1.0");
 
 	if (IS_ERR(table)) {
 		pr_err("%s: Unable to load utf8 %d.%d.%d. Skipping.\n",
@@ -249,7 +249,7 @@ static void check_utf8_comparisons(void)
 		const struct qstr s2 = {.name = nfdi_test_data[i].dec,
 					.len = sizeof(nfdi_test_data[i].dec)};
 
-		test_f(!utf8_strncmp(table, &s1, &s2),
+		test_f(!unicode_strncmp(table, &s1, &s2),
 		       "%s %s comparison mismatch\n", s1.name, s2.name);
 	}
 
@@ -259,11 +259,11 @@ static void check_utf8_comparisons(void)
 		const struct qstr s2 = {.name = nfdicf_test_data[i].ncf,
 					.len = sizeof(nfdicf_test_data[i].ncf)};
 
-		test_f(!utf8_strncasecmp(table, &s1, &s2),
+		test_f(!unicode_strncasecmp(table, &s1, &s2),
 		       "%s %s comparison mismatch\n", s1.name, s2.name);
 	}
 
-	utf8_unload(table);
+	unicode_unload(table);
 }
 
 static void check_supported_versions(void)
diff --git a/include/linux/unicode.h b/include/linux/unicode.h
index 74484d44c755..de23f9ee720b 100644
--- a/include/linux/unicode.h
+++ b/include/linux/unicode.h
@@ -10,27 +10,27 @@ struct unicode_map {
 	int version;
 };
 
-int utf8_validate(const struct unicode_map *um, const struct qstr *str);
+int unicode_validate(const struct unicode_map *um, const struct qstr *str);
 
-int utf8_strncmp(const struct unicode_map *um,
-		 const struct qstr *s1, const struct qstr *s2);
+int unicode_strncmp(const struct unicode_map *um,
+		    const struct qstr *s1, const struct qstr *s2);
 
-int utf8_strncasecmp(const struct unicode_map *um,
-		 const struct qstr *s1, const struct qstr *s2);
-int utf8_strncasecmp_folded(const struct unicode_map *um,
-			    const struct qstr *cf,
-			    const struct qstr *s1);
+int unicode_strncasecmp(const struct unicode_map *um,
+			const struct qstr *s1, const struct qstr *s2);
+int unicode_strncasecmp_folded(const struct unicode_map *um,
+			       const struct qstr *cf,
+			       const struct qstr *s1);
 
-int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
-		   unsigned char *dest, size_t dlen);
+int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
+		      unsigned char *dest, size_t dlen);
 
-int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
-		  unsigned char *dest, size_t dlen);
+int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
+		     unsigned char *dest, size_t dlen);
 
-int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
-		       struct qstr *str);
+int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
+			  struct qstr *str);
 
-struct unicode_map *utf8_load(const char *version);
-void utf8_unload(struct unicode_map *um);
+struct unicode_map *unicode_load(const char *version);
+void unicode_unload(struct unicode_map *um);
 
 #endif /* _LINUX_UNICODE_H */

From patchwork Wed Mar 31 21:07:50 2021
From: Shreeya Patel
To: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org, chao@kernel.org, krisman@collabora.com, ebiggers@google.com, drosen@google.com, ebiggers@kernel.org, yuchao0@huawei.com
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, kernel@collabora.com, andre.almeida@collabora.com
Subject: [PATCH v6 3/4] fs: unicode: Rename utf8-core file to unicode-core
Date: Thu, 1 Apr 2021 02:37:50 +0530
Message-Id: <20210331210751.281645-4-shreeya.patel@collabora.com>
In-Reply-To: <20210331210751.281645-1-shreeya.patel@collabora.com>
References: <20210331210751.281645-1-shreeya.patel@collabora.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

utf8data.h_shipped contains a large database table, an auto-generated
decodification trie used by the unicode normalization functions, and
it is not necessary to carry this large table in the kernel.

The goal is to make the UTF-8 encoding loadable by converting it into
a module and adding a unicode subsystem layer between the filesystems
and the utf8 module. This layer will load the module whenever a
filesystem that needs unicode is mounted.
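The layering described above can be pictured with a small userspace C model. This is a minimal sketch under stated assumptions, not code from this series: encoding_ops, layer_load(), layer_validate() and validate_utf8() are hypothetical names, "loading the module" is simulated by swapping a pointer, and validate_utf8() only checks byte structure (it does not reject overlong forms or surrogates). Until an encoding is loaded, calls land on a default stub that returns -EIO, which mirrors the *_default functions in patch 4/4.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operations table that an encoding "module" provides. */
struct encoding_ops {
	int (*validate)(const char *s, size_t len);
};

/* Default stub: used while no encoding module has been loaded. */
static int validate_default(const char *s, size_t len)
{
	(void)s;
	(void)len;
	return -EIO;
}

/*
 * Stand-in for the UTF-8 module: a structural check only (a valid lead
 * byte followed by the right number of 10xxxxxx continuation bytes).
 */
static int validate_utf8(const char *s, size_t len)
{
	size_t i = 0;

	assert(s != NULL || len == 0);
	while (i < len) {
		unsigned char c = (unsigned char)s[i];
		size_t extra;

		if (c < 0x80)
			extra = 0;
		else if ((c & 0xe0) == 0xc0)
			extra = 1;
		else if ((c & 0xf0) == 0xe0)
			extra = 2;
		else if ((c & 0xf8) == 0xf0)
			extra = 3;
		else
			return -EINVAL;		/* not a valid lead byte */

		if (extra > len - i - 1)
			return -EINVAL;		/* sequence runs past the end */
		for (size_t k = 1; k <= extra; k++)
			if (((unsigned char)s[i + k] & 0xc0) != 0x80)
				return -EINVAL;	/* bad continuation byte */
		i += extra + 1;
	}
	return 0;
}

static const struct encoding_ops default_ops = { .validate = validate_default };
static const struct encoding_ops utf8_ops = { .validate = validate_utf8 };

/* The layer's view of the current encoding; starts out on the stubs. */
static const struct encoding_ops *active_ops = &default_ops;

/* Simulates loading the encoding module when a filesystem needs it. */
static int layer_load(const char *name)
{
	if (strcmp(name, "utf8") != 0)
		return -ENOENT;
	active_ops = &utf8_ops;
	return 0;
}

/* Filesystems call this; they never reference the module directly. */
static int layer_validate(const char *s, size_t len)
{
	return active_ops->validate(s, len);
}
```

Routing every call through the layer keeps filesystems unaware of whether the encoding is built in or modular. The actual series dispatches with static calls and module reference counting, which this sketch does not model.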
Currently, only the UTF-8 encoding is supported, but if other
encodings are supported in the future, the layer will be responsible
for loading the desired encoding module.

Rename utf8-core.c to unicode-core.c, since this file will be
transformed into the unicode subsystem layer file and the new name
better describes its role. The implementation that lets unicode-core
act as the layer will be added in future patches.

Signed-off-by: Shreeya Patel
---
Changes in v6
  - Improve the commit message.

 fs/unicode/Makefile                        | 2 +-
 fs/unicode/{utf8-core.c => unicode-core.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename fs/unicode/{utf8-core.c => unicode-core.c} (100%)

diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
index b88aecc86550..fbf9a629ed0d 100644
--- a/fs/unicode/Makefile
+++ b/fs/unicode/Makefile
@@ -3,7 +3,7 @@
 obj-$(CONFIG_UNICODE) += unicode.o
 obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o
 
-unicode-y := utf8-norm.o utf8-core.o
+unicode-y := utf8-norm.o unicode-core.o
 
 $(obj)/utf8-norm.o: $(obj)/utf8data.h
 
diff --git a/fs/unicode/utf8-core.c b/fs/unicode/unicode-core.c
similarity index 100%
rename from fs/unicode/utf8-core.c
rename to fs/unicode/unicode-core.c

From patchwork Wed Mar 31 21:07:51 2021
From: Shreeya Patel
To: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org, chao@kernel.org, krisman@collabora.com, ebiggers@google.com, drosen@google.com, ebiggers@kernel.org, yuchao0@huawei.com
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, kernel@collabora.com, andre.almeida@collabora.com
Subject: [PATCH v6 4/4] fs: unicode: Add utf8 module and a unicode layer
Date: Thu, 1 Apr 2021 02:37:51 +0530
Message-Id: <20210331210751.281645-5-shreeya.patel@collabora.com>
In-Reply-To: <20210331210751.281645-1-shreeya.patel@collabora.com>
References: <20210331210751.281645-1-shreeya.patel@collabora.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

utf8data.h_shipped contains a large database table, an auto-generated
decodification trie used by the unicode normalization functions. It is
not necessary to load this large table into the kernel if no
filesystem is using it, so make the UTF-8 encoding loadable by
converting it into a module.
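"Only load the table while some filesystem is using it" can be read as a reference count: the first filesystem that needs the encoding brings the table in, and it becomes droppable once the last one goes away. The sketch below models that idea in userspace C under stated assumptions. encoding_get(), encoding_put() and the counters are hypothetical and do not appear in the series. The model is single-threaded, whereas v6 of the series adds a spinlock. It also "unloads" eagerly for illustration; in the kernel, dropping the last module reference only makes the module eligible for removal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping standing in for the module loader. */
static bool table_loaded; /* is the large normalization table resident? */
static int load_count;    /* how many times it had to be (re)loaded */
static int users;         /* mounted filesystems currently using it */

/* Mount path: the first user brings the table in. */
static void encoding_get(void)
{
	if (users++ == 0) {
		table_loaded = true;
		load_count++;
	}
}

/* Unmount path: the last user lets it go again. */
static void encoding_put(void)
{
	assert(users > 0);
	if (--users == 0)
		table_loaded = false;
}
```

In this model, mounting two casefolded filesystems loads the table once, and it stays resident until both have been unmounted.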
Modify the unicode-core file so that it acts as the layer for the
unicode subsystem. It will load the UTF-8 module and access its
functions whenever a filesystem that needs unicode is mounted.
Currently, only the UTF-8 encoding is supported, but if other
encodings are supported in the future, the layer will be responsible
for loading the desired encoding module.

Also, indirect calls through function pointers are slow, so use static
calls to avoid the overhead of repeated indirect calls. Static calls
improve performance by calling the target functions directly instead
of going through a pointer.

Signed-off-by: Shreeya Patel
---
Changes in v6
  - Add a spinlock to protect utf8mod and avoid a NULL pointer
    dereference.
  - Rename the static call functions to be consistent with the kernel
    coding style.
  - Merge the unicode_load_module function into unicode_load, as a
    separate function is not really needed.
  - Use try_then_module_get instead of module_get so that the module
    is not loaded again when it is already loaded.
  - Improve the commit message.

Changes in v5
  - Rename global variables and default static call functions for
    better understanding.
  - Make only config UNICODE_UTF8 visible, and make config UNICODE
    always enabled when UNICODE_UTF8 is enabled.
  - Improve the documentation for Kconfig.
  - Improve the commit message.

Changes in v4
  - Return an error from the static calls instead of doing nothing and
    succeeding even without loading the module.
  - Remove all usage of utf8_ops and use static calls everywhere.
  - Restore the static calls to their default values when the module
    is unloaded.
  - Drop the module reference after calling the unload function.
  - Remove the spinlock, as there are no race conditions after
    removing utf8_ops.
Changes in v3
  - Add a patch which checks whether utf8 is loaded before calling
    utf8_unload() in the ext4 and f2fs filesystems.
  - Return an error if strscpy() returns a value < 0.
  - Correct the conditions to prevent a NULL pointer dereference while
    accessing functions via the utf8_ops variable.
  - Add a spinlock to avoid race conditions.
  - Use static_call() to prevent speculative execution attacks.

Changes in v2
  - Remove the duplicate file from the last patch.
  - Make the wrapper functions inline.
  - Remove msleep and use try_module_get() and module_put() to ensure
    that the module is loaded correctly and doesn't get unloaded while
    in use.
  - Resolve the warning reported by the kernel test robot.
  - Resolve all the checkpatch.pl warnings.

 fs/unicode/Kconfig        |  17 ++-
 fs/unicode/Makefile       |   5 +-
 fs/unicode/unicode-core.c | 289 ++++++++++++++------------------------
 fs/unicode/unicode-utf8.c | 264 ++++++++++++++++++++++++++++++++++
 include/linux/unicode.h   |  96 +++++++++++--
 5 files changed, 467 insertions(+), 204 deletions(-)
 create mode 100644 fs/unicode/unicode-utf8.c

diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
index 2c27b9a5cd6c..ad4b837f2eb2 100644
--- a/fs/unicode/Kconfig
+++ b/fs/unicode/Kconfig
@@ -2,13 +2,26 @@
 #
 # UTF-8 normalization
 #
+# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8
+# is enabled. This config option adds the unicode subsystem layer which loads
+# the UTF-8 module whenever any filesystem needs it.
 config UNICODE
-	bool "UTF-8 normalization and casefolding support"
+	bool
+
+# utf8data.h_shipped has a large database table which is an auto-generated
+# decodification trie for the unicode normalization functions and it is not
+# necessary to carry this large table in the kernel.
+# Enabling UNICODE_UTF8 option will allow UTF-8 encoding to be built as a
+# module and this module will be loaded by the unicode subsystem layer only
+# when any filesystem needs it.
+config UNICODE_UTF8
+	tristate "UTF-8 module"
 	help
 	  Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding
 	  support.
+	select UNICODE
 
 config UNICODE_NORMALIZATION_SELFTEST
 	tristate "Test UTF-8 normalization support"
-	depends on UNICODE
+	depends on UNICODE_UTF8
 	default n
diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
index fbf9a629ed0d..49d50083e6ee 100644
--- a/fs/unicode/Makefile
+++ b/fs/unicode/Makefile
@@ -1,11 +1,14 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_UNICODE) += unicode.o
+obj-$(CONFIG_UNICODE_UTF8) += utf8.o
 obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o
 
-unicode-y := utf8-norm.o unicode-core.o
+unicode-y := unicode-core.o
+utf8-y := unicode-utf8.o utf8-norm.o
 
 $(obj)/utf8-norm.o: $(obj)/utf8data.h
+$(obj)/unicode-utf8.o: $(obj)/utf8-norm.o
 
 # In the normal build, the checked-in utf8data.h is just shipped.
 #
diff --git a/fs/unicode/unicode-core.c b/fs/unicode/unicode-core.c
index 730dbaedf593..5fef398d8bdf 100644
--- a/fs/unicode/unicode-core.c
+++ b/fs/unicode/unicode-core.c
@@ -1,237 +1,152 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include
 #include
-#include
 #include
-#include
 #include
 #include
-#include
+#include
 
-#include "utf8n.h"
+DEFINE_SPINLOCK(utf8mod_lock);
 
-int unicode_validate(const struct unicode_map *um, const struct qstr *str)
-{
-	const struct utf8data *data = utf8nfdi(um->version);
+static struct module *utf8mod;
 
-	if (utf8nlen(data, str->name, str->len) < 0)
-		return -1;
-	return 0;
+int unicode_validate_default(const struct unicode_map *um,
+			     const struct qstr *str)
+{
+	WARN_ON(1);
+	return -EIO;
 }
-EXPORT_SYMBOL(unicode_validate);
+EXPORT_SYMBOL(unicode_validate_default);
 
-int unicode_strncmp(const struct unicode_map *um,
-		    const struct qstr *s1, const struct qstr *s2)
+int unicode_strncmp_default(const struct unicode_map *um,
+			    const struct qstr *s1,
+			    const struct qstr *s2)
 {
-	const struct utf8data *data = utf8nfdi(um->version);
-	struct utf8cursor cur1, cur2;
-	int c1, c2;
-
-	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
-		return -EINVAL;
-
-	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
-		return -EINVAL;
-
-	do {
-		c1 = utf8byte(&cur1);
-		c2 = utf8byte(&cur2);
-
-		if (c1 < 0 || c2 < 0)
-			return -EINVAL;
-		if (c1 != c2)
-			return 1;
-	} while (c1);
-
-	return 0;
+	WARN_ON(1);
+	return -EIO;
 }
-EXPORT_SYMBOL(unicode_strncmp);
+EXPORT_SYMBOL(unicode_strncmp_default);
 
-int unicode_strncasecmp(const struct unicode_map *um,
-			const struct qstr *s1, const struct qstr *s2)
+int unicode_strncasecmp_default(const struct unicode_map *um,
+				const struct qstr *s1,
+				const struct qstr *s2)
 {
-	const struct utf8data *data = utf8nfdicf(um->version);
-	struct utf8cursor cur1, cur2;
-	int c1, c2;
-
-	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
-		return -EINVAL;
-
-	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
-		return -EINVAL;
-
-	do {
-		c1 = utf8byte(&cur1);
-		c2 = utf8byte(&cur2);
-
-		if (c1 < 0 || c2 < 0)
-			return -EINVAL;
-		if (c1 != c2)
-			return 1;
-	} while (c1);
-
-	return 0;
+	WARN_ON(1);
+	return -EIO;
 }
-EXPORT_SYMBOL(unicode_strncasecmp);
-
-/* String cf is expected to be a valid UTF-8 casefolded
- * string.
- */ -int unicode_strncasecmp_folded(const struct unicode_map *um, - const struct qstr *cf, - const struct qstr *s1) +EXPORT_SYMBOL(unicode_strncasecmp_default); + +int unicode_strncasecmp_folded_default(const struct unicode_map *um, + const struct qstr *cf, + const struct qstr *s1) { - const struct utf8data *data = utf8nfdicf(um->version); - struct utf8cursor cur1; - int c1, c2; - int i = 0; - - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) - return -EINVAL; - - do { - c1 = utf8byte(&cur1); - c2 = cf->name[i++]; - if (c1 < 0) - return -EINVAL; - if (c1 != c2) - return 1; - } while (c1); - - return 0; + WARN_ON(1); + return -EIO; } -EXPORT_SYMBOL(unicode_strncasecmp_folded); +EXPORT_SYMBOL(unicode_strncasecmp_folded_default); -int unicode_casefold(const struct unicode_map *um, const struct qstr *str, - unsigned char *dest, size_t dlen) +int unicode_normalize_default(const struct unicode_map *um, + const struct qstr *str, + unsigned char *dest, size_t dlen) { - const struct utf8data *data = utf8nfdicf(um->version); - struct utf8cursor cur; - size_t nlen = 0; - - if (utf8ncursor(&cur, data, str->name, str->len) < 0) - return -EINVAL; - - for (nlen = 0; nlen < dlen; nlen++) { - int c = utf8byte(&cur); + WARN_ON(1); + return -EIO; +} +EXPORT_SYMBOL(unicode_normalize_default); - dest[nlen] = c; - if (!c) - return nlen; - if (c == -1) - break; - } - return -EINVAL; +int unicode_casefold_default(const struct unicode_map *um, + const struct qstr *str, + unsigned char *dest, size_t dlen) +{ + WARN_ON(1); + return -EIO; } -EXPORT_SYMBOL(unicode_casefold); +EXPORT_SYMBOL(unicode_casefold_default); -int unicode_casefold_hash(const struct unicode_map *um, const void *salt, - struct qstr *str) +int unicode_casefold_hash_default(const struct unicode_map *um, + const void *salt, struct qstr *str) { - const struct utf8data *data = utf8nfdicf(um->version); - struct utf8cursor cur; - int c; - unsigned long hash = init_name_hash(salt); - - if (utf8ncursor(&cur, data, str->name, 
str->len) < 0) - return -EINVAL; - - while ((c = utf8byte(&cur))) { - if (c < 0) - return -EINVAL; - hash = partial_name_hash((unsigned char)c, hash); - } - str->hash = end_name_hash(hash); - return 0; + WARN_ON(1); + return -EIO; } -EXPORT_SYMBOL(unicode_casefold_hash); +EXPORT_SYMBOL(unicode_casefold_hash_default); -int unicode_normalize(const struct unicode_map *um, const struct qstr *str, - unsigned char *dest, size_t dlen) +struct unicode_map *unicode_load_default(const char *version) { - const struct utf8data *data = utf8nfdi(um->version); - struct utf8cursor cur; - ssize_t nlen = 0; + WARN_ON(1); + return ERR_PTR(-EIO); +} +EXPORT_SYMBOL(unicode_load_default); - if (utf8ncursor(&cur, data, str->name, str->len) < 0) - return -EINVAL; +DEFINE_STATIC_CALL(unicode_validate_static_call, unicode_validate_default); +EXPORT_STATIC_CALL(unicode_validate_static_call); - for (nlen = 0; nlen < dlen; nlen++) { - int c = utf8byte(&cur); +DEFINE_STATIC_CALL(unicode_strncmp_static_call, unicode_strncmp_default); +EXPORT_STATIC_CALL(unicode_strncmp_static_call); - dest[nlen] = c; - if (!c) - return nlen; - if (c == -1) - break; - } - return -EINVAL; -} -EXPORT_SYMBOL(unicode_normalize); +DEFINE_STATIC_CALL(unicode_strncasecmp_static_call, + unicode_strncasecmp_default); +EXPORT_STATIC_CALL(unicode_strncasecmp_static_call); -static int unicode_parse_version(const char *version, unsigned int *maj, - unsigned int *min, unsigned int *rev) -{ - substring_t args[3]; - char version_string[12]; - static const struct match_token token[] = { - {1, "%d.%d.%d"}, - {0, NULL} - }; - int ret = strscpy(version_string, version, sizeof(version_string)); +DEFINE_STATIC_CALL(unicode_strncasecmp_folded_static_call, + unicode_strncasecmp_folded_default); +EXPORT_STATIC_CALL(unicode_strncasecmp_folded_static_call); - if (ret < 0) - return ret; +DEFINE_STATIC_CALL(unicode_normalize_static_call, unicode_normalize_default); +EXPORT_STATIC_CALL(unicode_normalize_static_call); - if 
(match_token(version_string, token, args) != 1) - return -EINVAL; +DEFINE_STATIC_CALL(unicode_casefold_static_call, unicode_casefold_default); +EXPORT_STATIC_CALL(unicode_casefold_static_call); - if (match_int(&args[0], maj) || match_int(&args[1], min) || - match_int(&args[2], rev)) - return -EINVAL; +DEFINE_STATIC_CALL(unicode_casefold_hash_static_call, + unicode_casefold_hash_default); +EXPORT_STATIC_CALL(unicode_casefold_hash_static_call); - return 0; -} +DEFINE_STATIC_CALL(unicode_load_static_call, unicode_load_default); +EXPORT_STATIC_CALL(unicode_load_static_call); struct unicode_map *unicode_load(const char *version) { - struct unicode_map *um = NULL; - int unicode_version; - - if (version) { - unsigned int maj, min, rev; - - if (unicode_parse_version(version, &maj, &min, &rev) < 0) - return ERR_PTR(-EINVAL); - - if (!utf8version_is_supported(maj, min, rev)) - return ERR_PTR(-EINVAL); - - unicode_version = UNICODE_AGE(maj, min, rev); - } else { - unicode_version = utf8version_latest(); - printk(KERN_WARNING"UTF-8 version not specified. 
" - "Assuming latest supported version (%d.%d.%d).", - (unicode_version >> 16) & 0xff, - (unicode_version >> 8) & 0xff, - (unicode_version & 0xff)); + try_then_request_module(utf8mod, "utf8"); + if (!utf8mod) { + pr_err("Failed to load UTF-8 module\n"); + return ERR_PTR(-ENODEV); } - um = kzalloc(sizeof(struct unicode_map), GFP_KERNEL); - if (!um) - return ERR_PTR(-ENOMEM); - - um->charset = "UTF-8"; - um->version = unicode_version; - - return um; + spin_lock(&utf8mod_lock); + if (!utf8mod || !try_module_get(utf8mod)) { + spin_unlock(&utf8mod_lock); + return ERR_PTR(-ENODEV); + } + spin_unlock(&utf8mod_lock); + return static_call(unicode_load_static_call)(version); } EXPORT_SYMBOL(unicode_load); void unicode_unload(struct unicode_map *um) { kfree(um); + + spin_lock(&utf8mod_lock); + if (utf8mod) + module_put(utf8mod); + spin_unlock(&utf8mod_lock); + } EXPORT_SYMBOL(unicode_unload); +void unicode_register(struct module *owner) +{ + utf8mod = owner; +} +EXPORT_SYMBOL(unicode_register); + +void unicode_unregister(void) +{ + spin_lock(&utf8mod_lock); + utf8mod = NULL; + spin_unlock(&utf8mod_lock); +} +EXPORT_SYMBOL(unicode_unregister); + MODULE_LICENSE("GPL v2"); diff --git a/fs/unicode/unicode-utf8.c b/fs/unicode/unicode-utf8.c new file mode 100644 index 000000000000..e0180f1c5ea8 --- /dev/null +++ b/fs/unicode/unicode-utf8.c @@ -0,0 +1,264 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "utf8n.h" + +static int utf8_validate(const struct unicode_map *um, const struct qstr *str) +{ + const struct utf8data *data = utf8nfdi(um->version); + + if (utf8nlen(data, str->name, str->len) < 0) + return -1; + return 0; +} + +static int utf8_strncmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2) +{ + const struct utf8data *data = utf8nfdi(um->version); + struct utf8cursor cur1, cur2; + int c1, c2; + + if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) + 
return -EINVAL; + + if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) + return -EINVAL; + + do { + c1 = utf8byte(&cur1); + c2 = utf8byte(&cur2); + + if (c1 < 0 || c2 < 0) + return -EINVAL; + if (c1 != c2) + return 1; + } while (c1); + + return 0; +} + +static int utf8_strncasecmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2) +{ + const struct utf8data *data = utf8nfdicf(um->version); + struct utf8cursor cur1, cur2; + int c1, c2; + + if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) + return -EINVAL; + + if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) + return -EINVAL; + + do { + c1 = utf8byte(&cur1); + c2 = utf8byte(&cur2); + + if (c1 < 0 || c2 < 0) + return -EINVAL; + if (c1 != c2) + return 1; + } while (c1); + + return 0; +} + +/* String cf is expected to be a valid UTF-8 casefolded + * string. + */ +static int utf8_strncasecmp_folded(const struct unicode_map *um, + const struct qstr *cf, + const struct qstr *s1) +{ + const struct utf8data *data = utf8nfdicf(um->version); + struct utf8cursor cur1; + int c1, c2; + int i = 0; + + if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) + return -EINVAL; + + do { + c1 = utf8byte(&cur1); + c2 = cf->name[i++]; + if (c1 < 0) + return -EINVAL; + if (c1 != c2) + return 1; + } while (c1); + + return 0; +} + +static int utf8_casefold(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) +{ + const struct utf8data *data = utf8nfdicf(um->version); + struct utf8cursor cur; + size_t nlen = 0; + + if (utf8ncursor(&cur, data, str->name, str->len) < 0) + return -EINVAL; + + for (nlen = 0; nlen < dlen; nlen++) { + int c = utf8byte(&cur); + + dest[nlen] = c; + if (!c) + return nlen; + if (c == -1) + break; + } + return -EINVAL; +} + +static int utf8_casefold_hash(const struct unicode_map *um, const void *salt, + struct qstr *str) +{ + const struct utf8data *data = utf8nfdicf(um->version); + struct utf8cursor cur; + int c; + unsigned long hash = 
init_name_hash(salt); + + if (utf8ncursor(&cur, data, str->name, str->len) < 0) + return -EINVAL; + + while ((c = utf8byte(&cur))) { + if (c < 0) + return -EINVAL; + hash = partial_name_hash((unsigned char)c, hash); + } + str->hash = end_name_hash(hash); + return 0; +} + +static int utf8_normalize(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) +{ + const struct utf8data *data = utf8nfdi(um->version); + struct utf8cursor cur; + ssize_t nlen = 0; + + if (utf8ncursor(&cur, data, str->name, str->len) < 0) + return -EINVAL; + + for (nlen = 0; nlen < dlen; nlen++) { + int c = utf8byte(&cur); + + dest[nlen] = c; + if (!c) + return nlen; + if (c == -1) + break; + } + return -EINVAL; +} + +static int utf8_parse_version(const char *version, unsigned int *maj, + unsigned int *min, unsigned int *rev) +{ + substring_t args[3]; + char version_string[12]; + static const struct match_token token[] = { + {1, "%d.%d.%d"}, + {0, NULL} + }; + + int ret = strscpy(version_string, version, sizeof(version_string)); + + if (ret < 0) + return ret; + + if (match_token(version_string, token, args) != 1) + return -EINVAL; + + if (match_int(&args[0], maj) || match_int(&args[1], min) || + match_int(&args[2], rev)) + return -EINVAL; + + return 0; +} + +static struct unicode_map *utf8_load(const char *version) +{ + struct unicode_map *um = NULL; + int unicode_version; + + if (version) { + unsigned int maj, min, rev; + + if (utf8_parse_version(version, &maj, &min, &rev) < 0) + return ERR_PTR(-EINVAL); + + if (!utf8version_is_supported(maj, min, rev)) + return ERR_PTR(-EINVAL); + + unicode_version = UNICODE_AGE(maj, min, rev); + } else { + unicode_version = utf8version_latest(); + pr_warn("UTF-8 version not specified. 
Assuming latest supported version (%d.%d.%d).\n", + (unicode_version >> 16) & 0xff, + (unicode_version >> 8) & 0xff, + (unicode_version & 0xff)); + } + + um = kzalloc(sizeof(*um), GFP_KERNEL); + if (!um) + return ERR_PTR(-ENOMEM); + + um->charset = "UTF-8"; + um->version = unicode_version; + + return um; +} + +static int __init utf8_init(void) +{ + static_call_update(unicode_validate_static_call, utf8_validate); + static_call_update(unicode_strncmp_static_call, utf8_strncmp); + static_call_update(unicode_strncasecmp_static_call, utf8_strncasecmp); + static_call_update(unicode_strncasecmp_folded_static_call, + utf8_strncasecmp_folded); + static_call_update(unicode_normalize_static_call, utf8_normalize); + static_call_update(unicode_casefold_static_call, utf8_casefold); + static_call_update(unicode_casefold_hash_static_call, + utf8_casefold_hash); + static_call_update(unicode_load_static_call, utf8_load); + + unicode_register(THIS_MODULE); + return 0; +} + +static void __exit utf8_exit(void) +{ + static_call_update(unicode_validate_static_call, + unicode_validate_default); + static_call_update(unicode_strncmp_static_call, unicode_strncmp_default); + static_call_update(unicode_strncasecmp_static_call, + unicode_strncasecmp_default); + static_call_update(unicode_strncasecmp_folded_static_call, + unicode_strncasecmp_folded_default); + static_call_update(unicode_normalize_static_call, + unicode_normalize_default); + static_call_update(unicode_casefold_static_call, + unicode_casefold_default); + static_call_update(unicode_casefold_hash_static_call, + unicode_casefold_hash_default); + static_call_update(unicode_load_static_call, unicode_load_default); + + unicode_unregister(); +} + +module_init(utf8_init); +module_exit(utf8_exit); + +MODULE_LICENSE("GPL v2"); diff --git a/include/linux/unicode.h b/include/linux/unicode.h index de23f9ee720b..0b157c6830c6 100644 --- a/include/linux/unicode.h +++ b/include/linux/unicode.h @@ -4,33 +4,101 @@ #include #include +#include + struct
unicode_map { const char *charset; int version; }; -int unicode_validate(const struct unicode_map *um, const struct qstr *str); +int unicode_validate_default(const struct unicode_map *um, + const struct qstr *str); + +int unicode_strncmp_default(const struct unicode_map *um, + const struct qstr *s1, + const struct qstr *s2); + +int unicode_strncasecmp_default(const struct unicode_map *um, + const struct qstr *s1, + const struct qstr *s2); + +int unicode_strncasecmp_folded_default(const struct unicode_map *um, + const struct qstr *cf, + const struct qstr *s1); + +int unicode_normalize_default(const struct unicode_map *um, + const struct qstr *str, + unsigned char *dest, size_t dlen); + +int unicode_casefold_default(const struct unicode_map *um, + const struct qstr *str, + unsigned char *dest, size_t dlen); + +int unicode_casefold_hash_default(const struct unicode_map *um, + const void *salt, struct qstr *str); -int unicode_strncmp(const struct unicode_map *um, - const struct qstr *s1, const struct qstr *s2); +struct unicode_map *unicode_load_default(const char *version); -int unicode_strncasecmp(const struct unicode_map *um, - const struct qstr *s1, const struct qstr *s2); -int unicode_strncasecmp_folded(const struct unicode_map *um, - const struct qstr *cf, - const struct qstr *s1); +DECLARE_STATIC_CALL(unicode_validate_static_call, unicode_validate_default); +DECLARE_STATIC_CALL(unicode_strncmp_static_call, unicode_strncmp_default); +DECLARE_STATIC_CALL(unicode_strncasecmp_static_call, + unicode_strncasecmp_default); +DECLARE_STATIC_CALL(unicode_strncasecmp_folded_static_call, + unicode_strncasecmp_folded_default); +DECLARE_STATIC_CALL(unicode_normalize_static_call, unicode_normalize_default); +DECLARE_STATIC_CALL(unicode_casefold_static_call, unicode_casefold_default); +DECLARE_STATIC_CALL(unicode_casefold_hash_static_call, + unicode_casefold_hash_default); +DECLARE_STATIC_CALL(unicode_load_static_call, unicode_load_default); -int unicode_normalize(const struct 
unicode_map *um, const struct qstr *str, - unsigned char *dest, size_t dlen); -int unicode_casefold(const struct unicode_map *um, const struct qstr *str, - unsigned char *dest, size_t dlen); +static inline int unicode_validate(const struct unicode_map *um, const struct qstr *str) +{ + return static_call(unicode_validate_static_call)(um, str); +} -int unicode_casefold_hash(const struct unicode_map *um, const void *salt, - struct qstr *str); +static inline int unicode_strncmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2) +{ + return static_call(unicode_strncmp_static_call)(um, s1, s2); +} + +static inline int unicode_strncasecmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2) +{ + return static_call(unicode_strncasecmp_static_call)(um, s1, s2); +} + +static inline int unicode_strncasecmp_folded(const struct unicode_map *um, + const struct qstr *cf, + const struct qstr *s1) +{ + return static_call(unicode_strncasecmp_folded_static_call)(um, cf, s1); +} + +static inline int unicode_normalize(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) +{ + return static_call(unicode_normalize_static_call)(um, str, dest, dlen); +} + +static inline int unicode_casefold(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) +{ + return static_call(unicode_casefold_static_call)(um, str, dest, dlen); +} + +static inline int unicode_casefold_hash(const struct unicode_map *um, const void *salt, + struct qstr *str) +{ + return static_call(unicode_casefold_hash_static_call)(um, salt, str); +} struct unicode_map *unicode_load(const char *version); void unicode_unload(struct unicode_map *um); +void unicode_register(struct module *owner); +void unicode_unregister(void); + #endif /* _LINUX_UNICODE_H */
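The dispatch scheme in this patch works like this: exported `*_default` stubs WARN and return -EIO, `utf8_init()`/`utf8_exit()` use `static_call_update()` to swap the real UTF-8 implementations in and out, and callers go through `static inline` wrappers in `unicode.h`. It can be approximated in userspace for illustration. This is a minimal sketch under stated assumptions. `static_call()` has no userspace equivalent, so the sketch uses a plain function pointer, which is exactly the indirect call that static calls patch away; only the call semantics are modelled. All names (`umap`, `validate_call`, `validate_ascii`, `backend_init`, `backend_exit`) are hypothetical, and `validate_ascii()` is a placeholder check, not real UTF-8 validation.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for struct unicode_map. */
struct umap {
	int version;
};

/* Default target, modelled on unicode_validate_default(): complain and fail. */
static int validate_default(const struct umap *um, const char *s)
{
	(void)um;
	(void)s;
	/* plays the role of WARN_ON(1) */
	fprintf(stderr, "validate called with no backend registered\n");
	return -EIO;
}

/*
 * Hypothetical backend implementation standing in for utf8_validate().
 * Placeholder check only: rejects any byte >= 0x80 (i.e. accepts ASCII).
 */
static int validate_ascii(const struct umap *um, const char *s)
{
	(void)um;
	assert(s != NULL); /* callers must pass a NUL-terminated string */
	for (; *s; s++)
		if ((unsigned char)*s >= 0x80)
			return -1;
	return 0;
}

/* Function pointer modelling unicode_validate_static_call. */
static int (*validate_call)(const struct umap *, const char *) = validate_default;

/* Wrapper modelled on the static inline unicode_validate(). */
static inline int validate(const struct umap *um, const char *s)
{
	return validate_call(um, s);
}

/* Modelled on utf8_init(): point the call at the real implementation. */
static void backend_init(void)
{
	validate_call = validate_ascii;
}

/* Modelled on utf8_exit(): restore the default before the code goes away. */
static void backend_exit(void)
{
	validate_call = validate_default;
}
```

Before `backend_init()` the wrapper returns -EIO, mirroring the default stub; after `backend_exit()` callers fall back to the default again instead of jumping into code that is no longer there.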
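The locking and reference-counting protocol around `utf8mod` can be modelled the same way. In the patch, the module registers itself on init, `unicode_load()` calls `try_module_get()` under `utf8mod_lock`, `unicode_unload()` calls `module_put()`, and unregistering clears the pointer. In the hypothetical sketch below, a pthread mutex stands in for the spinlock and `struct fake_module`, with a counter and a `live` flag, stands in for `struct module`. The `-EBUSY` check in `fake_unregister()` is an assumption standing in for the module core refusing to unload a module whose reference count is non-zero. None of these names exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module. */
struct fake_module {
	int refcnt; /* references held by loaded unicode maps */
	bool live;  /* false once the module has started unloading */
};

static pthread_mutex_t mod_lock = PTHREAD_MUTEX_INITIALIZER; /* models utf8mod_lock */
static struct fake_module *registered;                        /* models utf8mod */

/* Models try_module_get(): refuses new references once unloading has begun. */
static bool fake_module_get(struct fake_module *m)
{
	if (!m->live)
		return false;
	m->refcnt++;
	return true;
}

/* Models unicode_register(), called from the backend's init function. */
static void fake_register(struct fake_module *m)
{
	pthread_mutex_lock(&mod_lock);
	registered = m;
	pthread_mutex_unlock(&mod_lock);
}

/*
 * Models the unload path: refuse while references are held (assumed to
 * mirror rmmod failing with EBUSY), otherwise clear the pointer the way
 * unicode_unregister() does.
 */
static int fake_unregister(void)
{
	int ret = 0;

	pthread_mutex_lock(&mod_lock);
	if (registered && registered->refcnt > 0)
		ret = -EBUSY;
	else
		registered = NULL;
	pthread_mutex_unlock(&mod_lock);
	return ret;
}

/* Models unicode_load(): pin the backend before using it. */
static int fake_load(void)
{
	int ret = 0;

	pthread_mutex_lock(&mod_lock);
	if (!registered || !fake_module_get(registered))
		ret = -ENODEV;
	pthread_mutex_unlock(&mod_lock);
	return ret;
}

/* Models unicode_unload(): drop the reference taken by fake_load(). */
static void fake_unload(void)
{
	pthread_mutex_lock(&mod_lock);
	if (registered) {
		assert(registered->refcnt > 0); /* catches an unbalanced unload */
		registered->refcnt--;
	}
	pthread_mutex_unlock(&mod_lock);
}
```

The NULL check and the reference grab happen under the same lock, so a concurrent unregister cannot clear the pointer between them. That appears to be the window the spinlock in `unicode_load()` is meant to close.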