From patchwork Wed Aug 18 14:06:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12444457 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B793C432BE for ; Wed, 18 Aug 2021 14:09:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0CCC6610A1 for ; Wed, 18 Aug 2021 14:09:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238320AbhHROKG (ORCPT ); Wed, 18 Aug 2021 10:10:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45746 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237786AbhHROKE (ORCPT ); Wed, 18 Aug 2021 10:10:04 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 16A94C061764; Wed, 18 Aug 2021 07:09:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=cibSNcvbBNy3d99J0xcybasLYRaPm2YAac4xI0ywMu8=; b=dlWPccbA45UR38UUdOxp+hXJ6w k6xHpfeWjanPnLTGesnuRrC3bHSxpNX/PaH/BecYbfmTwtflh1+SKGdZcgHlwoOUWnfwS+QJEWLAJ 2UgSTJSSgmxNuzobdkYK0nSL66Q8eB8PvsYAHQfv0aMC4zwVMQ1/CuOVNrOIfkeYz2TfGj5wGB3jV T+xRI+oCQZB+H+vTJw0TYgFzT4yy6pSb4ScX+H8xP5achbWhvZia0Zom9WSkg0NOnclkRFfaaGAXT MSOQPRxTqS4K6//RXhEwyAaeum4IKnrAYqqhdz5p0InvM23dKMsNFoXwhDrvd7WdAOelONE+lX613 KEDaOecA==; Received: from [2001:4bb8:188:1b1:5a9e:9f39:5a86:b20c] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mGMEZ-003uKw-LT; Wed, 18 Aug 2021 14:08:35 +0000 From: Christoph Hellwig To: Gabriel Krisman Bertazi Cc: Shreeya Patel , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: [PATCH 01/11] ext4: simplify ext4_sb_read_encoding Date: Wed, 18 Aug 2021 16:06:41 +0200 Message-Id: <20210818140651.17181-2-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818140651.17181-1-hch@lst.de> References: <20210818140651.17181-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Return the encoding table as the return value instead of as an argument, and don't bother with the encoding flags as the caller can handle that trivially. Signed-off-by: Christoph Hellwig Reviewed-by: Gabriel Krisman Bertazi Acked-by: Theodore Ts'o --- fs/ext4/super.c | 21 +++++++-------------- 1 file changed, 7 insertions(+), 14 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index dfa09a277b56..a68be582bba5 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2021,24 +2021,17 @@ static const struct ext4_sb_encodings { {EXT4_ENC_UTF8_12_1, "utf8", "12.1.0"}, }; -static int ext4_sb_read_encoding(const struct ext4_super_block *es, - const struct ext4_sb_encodings **encoding, - __u16 *flags) +static const struct ext4_sb_encodings * +ext4_sb_read_encoding(const struct ext4_super_block *es) { __u16 magic = le16_to_cpu(es->s_encoding); int i; for (i = 0; i < ARRAY_SIZE(ext4_sb_encoding_map); i++) if (magic == ext4_sb_encoding_map[i].magic) - break; - - if (i >= ARRAY_SIZE(ext4_sb_encoding_map)) - return -EINVAL; + return &ext4_sb_encoding_map[i]; - *encoding = &ext4_sb_encoding_map[i]; - *flags = le16_to_cpu(es->s_encoding_flags); - - return 0; + return NULL; } #endif @@ -4303,10 +4296,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) if (ext4_has_feature_casefold(sb) && !sb->s_encoding) { const struct ext4_sb_encodings *encoding_info; struct unicode_map *encoding; - __u16 encoding_flags; + __u16 encoding_flags = le16_to_cpu(es->s_encoding_flags); - if (ext4_sb_read_encoding(es, &encoding_info, - &encoding_flags)) { + encoding_info = ext4_sb_read_encoding(es); + if (!encoding_info) { ext4_msg(sb, KERN_ERR, "Encoding requested by superblock is unknown"); goto failed_mount; From patchwork Wed Aug 18 14:06:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12444513 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 143B2C4320A for ; Wed, 18 Aug 2021 14:10:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E7C8E61076 for ; Wed, 18 Aug 2021 14:10:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238199AbhHROLL (ORCPT ); Wed, 18 Aug 2021 10:11:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45996 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235675AbhHROLL (ORCPT ); Wed, 18 Aug 2021 10:11:11 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9DDD0C061764; Wed, 18 Aug 2021 07:10:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=xF37HHtJ9cCyf/tD4T3D2NA1Hm2ffytUKSp7nde/n7w=; b=h+NlzP2kBp6K9ActIGzJh8bm2k kYmcKNHroU3KyqP11rWLE/BCZqKWE+OcAmdszJ0q15QKN2pKxomWbQBihuxshlMKYZc2TpzY5V9B2 qQU1e5HUaE6/y5kRj85z+QBLO/TOPFKLrazdp193mD3tFKh3NtItHTuaWQDHlIY9lOpJSKddf8SfD LbM4z2nIJ5weULEH+wyHBEnFwpy5CnASoEvLzYsA8J0xZFThSUXfaAH60IQ+bUe0/6XFV/eZFeNud tqqnpthQDt74bWvI83WdKTiw+ohwqwJqqEP0b8cutE71m+58/Mn3Umgwa04lALjYpTl5VF+sOe8W4 glrPutdA==; Received: from [2001:4bb8:188:1b1:5a9e:9f39:5a86:b20c] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mGMG0-003uPq-K8; Wed, 18 Aug 2021 14:09:45 +0000 From: Christoph Hellwig To: Gabriel Krisman Bertazi Cc: Shreeya Patel , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: [PATCH 02/11] f2fs: simplify f2fs_sb_read_encoding Date: Wed, 18 Aug 2021 16:06:42 +0200 Message-Id: <20210818140651.17181-3-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818140651.17181-1-hch@lst.de> References: <20210818140651.17181-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Return the encoding table as the return value instead of as an argument, and don't bother with the encoding flags as the caller can handle that trivially. Signed-off-by: Christoph Hellwig Reviewed-by: Gabriel Krisman Bertazi Reviewed-by: Chao Yu --- fs/f2fs/super.c | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 8fecd3050ccd..af63ae009582 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -260,24 +260,17 @@ static const struct f2fs_sb_encodings { {F2FS_ENC_UTF8_12_1, "utf8", "12.1.0"}, }; -static int f2fs_sb_read_encoding(const struct f2fs_super_block *sb, - const struct f2fs_sb_encodings **encoding, - __u16 *flags) +static const struct f2fs_sb_encodings * +f2fs_sb_read_encoding(const struct f2fs_super_block *sb) { __u16 magic = le16_to_cpu(sb->s_encoding); int i; for (i = 0; i < ARRAY_SIZE(f2fs_sb_encoding_map); i++) if (magic == f2fs_sb_encoding_map[i].magic) - break; - - if (i >= ARRAY_SIZE(f2fs_sb_encoding_map)) - return -EINVAL; + return &f2fs_sb_encoding_map[i]; - *encoding = &f2fs_sb_encoding_map[i]; - *flags = le16_to_cpu(sb->s_encoding_flags); - - return 0; + return NULL; } struct kmem_cache *f2fs_cf_name_slab; @@ -3730,13 +3723,14 @@ static int f2fs_setup_casefold(struct f2fs_sb_info *sbi) struct unicode_map *encoding; __u16 encoding_flags; - if (f2fs_sb_read_encoding(sbi->raw_super, &encoding_info, - &encoding_flags)) { + encoding_info = f2fs_sb_read_encoding(sbi->raw_super); + if (!encoding_info) { f2fs_err(sbi, "Encoding requested by superblock is unknown"); return -EINVAL; } + encoding_flags = le16_to_cpu(sbi->raw_super->s_encoding_flags); encoding = utf8_load(encoding_info->version); if (IS_ERR(encoding)) { f2fs_err(sbi, From patchwork Wed Aug 18 14:06:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12444515 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4326C4320A for ; Wed, 18 Aug 2021 14:11:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A502761076 for ; Wed, 18 Aug 2021 14:11:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238786AbhHROMW (ORCPT ); Wed, 18 Aug 2021 10:12:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46308 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238487AbhHROMV (ORCPT ); Wed, 18 Aug 2021 10:12:21 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 384C6C061764; Wed, 18 Aug 2021 07:11:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=W/CqL+1UTR7dYoG9/tA55jo9OSLvjLN7QCowW5Mp68U=; b=Ob7acBy/jQh0420cin743NdF+s UWZ1A9nQSG04dgymqMMAv+PEvVUwZSQkWsEiEZlHToY7VsbpEiPphSrmj8xK2ikXME/M/EmM1r6my pCKz4ad/Gh2bEQYXoOqR31skqwjgyfizT4+VatB5LbN++nNw9KuIJaVT2XgQsHmEfcyCWpoCtAXdY YpyvpxP1pTkR0QDadcDp/29hIqmgaQJvD3lphDl0dZPH5FsPNuoeniI10yvofWcGET5AkYcLMiesT cuxYLsBdGDfJTizYUHTUMigkijH4+m+O+4GwwYUJtb8R5MyU/JdOcycglLhCvYy25/klgsWiWNYre ar/bBM8A==; Received: from [2001:4bb8:188:1b1:5a9e:9f39:5a86:b20c] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mGMH5-003uVF-4y; Wed, 18 Aug 2021 14:11:14 +0000 From: Christoph Hellwig To: Gabriel Krisman Bertazi Cc: Shreeya Patel , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: [PATCH 03/11] unicode: remove the charset field from struct unicode_map Date: Wed, 18 Aug 2021 16:06:43 +0200 Message-Id: <20210818140651.17181-4-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818140651.17181-1-hch@lst.de> References: <20210818140651.17181-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org It is hardcoded and only used for a f2fs sysfs file where it can be hardcoded just as easily. Signed-off-by: Christoph Hellwig Reviewed-by: Gabriel Krisman Bertazi --- fs/f2fs/sysfs.c | 3 +-- fs/unicode/utf8-core.c | 3 --- include/linux/unicode.h | 1 - 3 files changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index 6642246206bd..d9ecf75e445d 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -195,8 +195,7 @@ static ssize_t encoding_show(struct f2fs_attr *a, struct super_block *sb = sbi->sb; if (f2fs_sb_has_casefold(sbi)) - return snprintf(buf, PAGE_SIZE, "%s (%d.%d.%d)\n", - sb->s_encoding->charset, + return snprintf(buf, PAGE_SIZE, "UTF-8 (%d.%d.%d)\n", (sb->s_encoding->version >> 16) & 0xff, (sb->s_encoding->version >> 8) & 0xff, sb->s_encoding->version & 0xff); diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c index dc25823bfed9..86f42a078d99 100644 --- a/fs/unicode/utf8-core.c +++ b/fs/unicode/utf8-core.c @@ -219,10 +219,7 @@ struct unicode_map *utf8_load(const char *version) um = kzalloc(sizeof(struct unicode_map), GFP_KERNEL); if (!um) return ERR_PTR(-ENOMEM); - - um->charset = "UTF-8"; um->version = unicode_version; - return um; } EXPORT_SYMBOL(utf8_load); diff --git a/include/linux/unicode.h b/include/linux/unicode.h index 74484d44c755..6a392cd9f076 100644 --- a/include/linux/unicode.h +++ b/include/linux/unicode.h @@ -6,7 +6,6 @@ #include struct unicode_map { - const char *charset; int version; }; From patchwork Wed Aug 18 14:06:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12444561 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6018FC43214 for ; Wed, 18 Aug 2021 14:16:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4E8CE61051 for ; Wed, 18 Aug 2021 14:16:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238957AbhHROQm (ORCPT ); Wed, 18 Aug 2021 10:16:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239995AbhHROOu (ORCPT ); Wed, 18 Aug 2021 10:14:50 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 160F7C061158; Wed, 18 Aug 2021 07:13:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=z4MCNHhCspLoQHSiqkbpMAHembWOXPJmB4CL3aOqh6g=; b=COcvMN76XdyCfAhgJuQ6dMLnms g2IC3e0WnV44+jc6XPYsI2NMnGfT7OMvlRZWG4Cb99F62uWbMbl1iUGWNqX0UH4RYk0nbaWDFzCq6 mOB2nUeAmCS07Vir/rd8JTwGFshFfRRHQ44JW6pWbwPgCcMzspHj/JZYZ6XFJRuI6AVyZEV6m8aAG SgeTDqVXx9vg8onhQra9ltQA8brZFPxwYyoRILGm0keb3gtJeZ/pGx0oNu+5Im7s1sSQBhURkBUZ3 EeeLqXbCKT4rzRIYDlKZXnB1D1ElKy433b2PcBXyH8WgVx6fJ+JR93nORvhtY9uUCpT3nXQ5G6L+g rQ7wDHwQ==; Received: from [2001:4bb8:188:1b1:5a9e:9f39:5a86:b20c] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mGMIE-003uZE-Pc; Wed, 18 Aug 2021 14:12:18 +0000 From: Christoph Hellwig To: Gabriel Krisman Bertazi Cc: Shreeya Patel , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: [PATCH 04/11] unicode: mark the version field in struct unicode_map unsigned Date: Wed, 18 Aug 2021 16:06:44 +0200 Message-Id: <20210818140651.17181-5-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818140651.17181-1-hch@lst.de> References: <20210818140651.17181-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org unicode version tripplets are always unsigned. Signed-off-by: Christoph Hellwig Reviewed-by: Gabriel Krisman Bertazi --- include/linux/unicode.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/unicode.h b/include/linux/unicode.h index 6a392cd9f076..0744f81c4b5f 100644 --- a/include/linux/unicode.h +++ b/include/linux/unicode.h @@ -6,7 +6,7 @@ #include struct unicode_map { - int version; + unsigned int version; }; int utf8_validate(const struct unicode_map *um, const struct qstr *str); From patchwork Wed Aug 18 14:06:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12444563 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 991D9C4338F for ; Wed, 18 Aug 2021 14:18:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 81243610C7 for ; Wed, 18 Aug 2021 14:18:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239959AbhHROTE (ORCPT ); Wed, 18 Aug 2021 10:19:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47178 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240043AbhHRORQ (ORCPT ); Wed, 18 Aug 2021 10:17:16 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6900BC06121D; Wed, 18 Aug 2021 07:14:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=H/WqWK1cJ0GtgtFLFrx2fxpNc4slPIfD/lu1pq4V1XU=; b=hgzM3x73kVdu+XyZ5Xa7LLeE5x Uu1HqlooxiYejM8VZ0nlxlbe9IzvCDoVCZyoQXQl3YZyqePo5hWvaH28IoL0RWIaW5QtWR7mLxN2f T/1P/QiJB3b/HvqlBC9DzM+G5cGbRcC6lLfmrgotwBDVmohJhCh4eHhdrPycEajtcjM2OdsY4EBJ7 z1crpfXWNP6n1gzl9A1Uj8oqcbMO0QPtgtOW5GNfbbTR8Lbye5kabEgTxzmEunykEaHgJBS7sIf5O 2BjwYXKRJ15/8hvmHzB/lUk3ZBiY59xBAO22eefywFMOmCpygal4r5YowlDM3D9tqwiDMlg2DoAJK j0hxCZJw==; Received: from [2001:4bb8:188:1b1:5a9e:9f39:5a86:b20c] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mGMJx-003ueO-Pa; Wed, 18 Aug 2021 14:13:53 +0000 From: Christoph Hellwig To: Gabriel Krisman Bertazi Cc: Shreeya Patel , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: [PATCH 05/11] unicode: pass a UNICODE_AGE() tripple to utf8_load Date: Wed, 18 Aug 2021 16:06:45 +0200 Message-Id: <20210818140651.17181-6-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818140651.17181-1-hch@lst.de> References: <20210818140651.17181-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Don't bother with pointless string parsing when the caller can just pass the version in the format that the core expects. Also remove the fallback to the latest version that none of the callers actually uses. Signed-off-by: Christoph Hellwig Reviewed-by: Gabriel Krisman Bertazi --- fs/ext4/super.c | 10 ++++---- fs/f2fs/super.c | 10 ++++---- fs/unicode/utf8-core.c | 50 ++++---------------------------------- fs/unicode/utf8-norm.c | 11 ++------- fs/unicode/utf8-selftest.c | 15 ++++++------ fs/unicode/utf8n.h | 14 ++--------- include/linux/unicode.h | 11 ++++++++- 7 files changed, 37 insertions(+), 84 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a68be582bba5..be418a30b52e 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2016,9 +2016,9 @@ static const struct mount_opts { static const struct ext4_sb_encodings { __u16 magic; char *name; - char *version; + unsigned int version; } ext4_sb_encoding_map[] = { - {EXT4_ENC_UTF8_12_1, "utf8", "12.1.0"}, + {EXT4_ENC_UTF8_12_1, "utf8", UNICODE_AGE(12, 1, 0)}, }; static const struct ext4_sb_encodings * @@ -4308,15 +4308,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) encoding = utf8_load(encoding_info->version); if (IS_ERR(encoding)) { ext4_msg(sb, KERN_ERR, - "can't mount with superblock charset: %s-%s " + "can't mount with superblock charset: %s-0x%x " "not supported by the kernel. flags: 0x%x.", encoding_info->name, encoding_info->version, encoding_flags); goto failed_mount; } ext4_msg(sb, KERN_INFO,"Using encoding defined by superblock: " - "%s-%s with flags 0x%hx", encoding_info->name, - encoding_info->version?:"\b", encoding_flags); + "%s-0x%x with flags 0x%hx", encoding_info->name, + encoding_info->version, encoding_flags); sb->s_encoding = encoding; sb->s_encoding_flags = encoding_flags; diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index af63ae009582..d107480f88a3 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -255,9 +255,9 @@ void f2fs_printk(struct f2fs_sb_info *sbi, const char *fmt, ...) static const struct f2fs_sb_encodings { __u16 magic; char *name; - char *version; + unsigned int version; } f2fs_sb_encoding_map[] = { - {F2FS_ENC_UTF8_12_1, "utf8", "12.1.0"}, + {F2FS_ENC_UTF8_12_1, "utf8", UNICODE_AGE(12, 1, 0)}, }; static const struct f2fs_sb_encodings * @@ -3734,15 +3734,15 @@ static int f2fs_setup_casefold(struct f2fs_sb_info *sbi) encoding = utf8_load(encoding_info->version); if (IS_ERR(encoding)) { f2fs_err(sbi, - "can't mount with superblock charset: %s-%s " + "can't mount with superblock charset: %s-0x%x " "not supported by the kernel. flags: 0x%x.", encoding_info->name, encoding_info->version, encoding_flags); return PTR_ERR(encoding); } f2fs_info(sbi, "Using encoding defined by superblock: " - "%s-%s with flags 0x%hx", encoding_info->name, - encoding_info->version?:"\b", encoding_flags); + "%s-0x%x with flags 0x%hx", encoding_info->name, + encoding_info->version, encoding_flags); sbi->sb->s_encoding = encoding; sbi->sb->s_encoding_flags = encoding_flags; diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c index 86f42a078d99..dca2865c3bee 100644 --- a/fs/unicode/utf8-core.c +++ b/fs/unicode/utf8-core.c @@ -167,59 +167,19 @@ int utf8_normalize(const struct unicode_map *um, const struct qstr *str, } return -EINVAL; } - EXPORT_SYMBOL(utf8_normalize); -static int utf8_parse_version(const char *version, unsigned int *maj, - unsigned int *min, unsigned int *rev) +struct unicode_map *utf8_load(unsigned int version) { - substring_t args[3]; - char version_string[12]; - static const struct match_token token[] = { - {1, "%d.%d.%d"}, - {0, NULL} - }; - - strncpy(version_string, version, sizeof(version_string)); - - if (match_token(version_string, token, args) != 1) - return -EINVAL; - - if (match_int(&args[0], maj) || match_int(&args[1], min) || - match_int(&args[2], rev)) - return -EINVAL; + struct unicode_map *um; - return 0; -} - -struct unicode_map *utf8_load(const char *version) -{ - struct unicode_map *um = NULL; - int unicode_version; - - if (version) { - unsigned int maj, min, rev; - - if (utf8_parse_version(version, &maj, &min, &rev) < 0) - return ERR_PTR(-EINVAL); - - if (!utf8version_is_supported(maj, min, rev)) - return ERR_PTR(-EINVAL); - - unicode_version = UNICODE_AGE(maj, min, rev); - } else { - unicode_version = utf8version_latest(); - printk(KERN_WARNING"UTF-8 version not specified. " - "Assuming latest supported version (%d.%d.%d).", - (unicode_version >> 16) & 0xff, - (unicode_version >> 8) & 0xff, - (unicode_version & 0xff)); - } + if (!utf8version_is_supported(version)) + return ERR_PTR(-EINVAL); um = kzalloc(sizeof(struct unicode_map), GFP_KERNEL); if (!um) return ERR_PTR(-ENOMEM); - um->version = unicode_version; + um->version = version; return um; } EXPORT_SYMBOL(utf8_load); diff --git a/fs/unicode/utf8-norm.c b/fs/unicode/utf8-norm.c index 1d2d2e5b906a..12abf89ae6ec 100644 --- a/fs/unicode/utf8-norm.c +++ b/fs/unicode/utf8-norm.c @@ -15,13 +15,12 @@ struct utf8data { #include "utf8data.h" #undef __INCLUDED_FROM_UTF8NORM_C__ -int utf8version_is_supported(u8 maj, u8 min, u8 rev) +int utf8version_is_supported(unsigned int version) { int i = ARRAY_SIZE(utf8agetab) - 1; - unsigned int sb_utf8version = UNICODE_AGE(maj, min, rev); while (i >= 0 && utf8agetab[i] != 0) { - if (sb_utf8version == utf8agetab[i]) + if (version == utf8agetab[i]) return 1; i--; } @@ -29,12 +28,6 @@ int utf8version_is_supported(u8 maj, u8 min, u8 rev) } EXPORT_SYMBOL(utf8version_is_supported); -int utf8version_latest(void) -{ - return utf8vers; -} -EXPORT_SYMBOL(utf8version_latest); - /* * UTF-8 valid ranges. * diff --git a/fs/unicode/utf8-selftest.c b/fs/unicode/utf8-selftest.c index 6fe8af7edccb..37f33890e012 100644 --- a/fs/unicode/utf8-selftest.c +++ b/fs/unicode/utf8-selftest.c @@ -235,7 +235,7 @@ static void check_utf8_nfdicf(void) static void check_utf8_comparisons(void) { int i; - struct unicode_map *table = utf8_load("12.1.0"); + struct unicode_map *table = utf8_load(UNICODE_AGE(12, 1, 0)); if (IS_ERR(table)) { pr_err("%s: Unable to load utf8 %d.%d.%d. Skipping.\n", @@ -269,18 +269,19 @@ static void check_utf8_comparisons(void) static void check_supported_versions(void) { /* Unicode 7.0.0 should be supported. */ - test(utf8version_is_supported(7, 0, 0)); + test(utf8version_is_supported(UNICODE_AGE(7, 0, 0))); /* Unicode 9.0.0 should be supported. */ - test(utf8version_is_supported(9, 0, 0)); + test(utf8version_is_supported(UNICODE_AGE(9, 0, 0))); /* Unicode 1x.0.0 (the latest version) should be supported. */ - test(utf8version_is_supported(latest_maj, latest_min, latest_rev)); + test(utf8version_is_supported( + UNICODE_AGE(latest_maj, latest_min, latest_rev))); /* Next versions don't exist. */ - test(!utf8version_is_supported(13, 0, 0)); - test(!utf8version_is_supported(0, 0, 0)); - test(!utf8version_is_supported(-1, -1, -1)); + test(!utf8version_is_supported(UNICODE_AGE(13, 0, 0))); + test(!utf8version_is_supported(UNICODE_AGE(0, 0, 0))); + test(!utf8version_is_supported(UNICODE_AGE(-1, -1, -1))); } static int __init init_test_ucd(void) diff --git a/fs/unicode/utf8n.h b/fs/unicode/utf8n.h index 0acd530c2c79..85a7bebf6927 100644 --- a/fs/unicode/utf8n.h +++ b/fs/unicode/utf8n.h @@ -11,19 +11,9 @@ #include #include #include +#include -/* Encoding a unicode version number as a single unsigned int. */ -#define UNICODE_MAJ_SHIFT (16) -#define UNICODE_MIN_SHIFT (8) - -#define UNICODE_AGE(MAJ, MIN, REV) \ - (((unsigned int)(MAJ) << UNICODE_MAJ_SHIFT) | \ - ((unsigned int)(MIN) << UNICODE_MIN_SHIFT) | \ - ((unsigned int)(REV))) - -/* Highest unicode version supported by the data tables. */ -extern int utf8version_is_supported(u8 maj, u8 min, u8 rev); -extern int utf8version_latest(void); +int utf8version_is_supported(unsigned int version); /* * Look for the correct const struct utf8data for a unicode version. diff --git a/include/linux/unicode.h b/include/linux/unicode.h index 0744f81c4b5f..a5cc44a8f90c 100644 --- a/include/linux/unicode.h +++ b/include/linux/unicode.h @@ -5,6 +5,15 @@ #include #include +/* Encoding a unicode version number as a single unsigned int. */ +#define UNICODE_MAJ_SHIFT (16) +#define UNICODE_MIN_SHIFT (8) + +#define UNICODE_AGE(MAJ, MIN, REV) \ + (((unsigned int)(MAJ) << UNICODE_MAJ_SHIFT) | \ + ((unsigned int)(MIN) << UNICODE_MIN_SHIFT) | \ + ((unsigned int)(REV))) + struct unicode_map { unsigned int version; }; @@ -29,7 +38,7 @@ int utf8_casefold(const struct unicode_map *um, const struct qstr *str, int utf8_casefold_hash(const struct unicode_map *um, const void *salt, struct qstr *str); -struct unicode_map *utf8_load(const char *version); +struct unicode_map *utf8_load(unsigned int version); void utf8_unload(struct unicode_map *um); #endif /* _LINUX_UNICODE_H */ From patchwork Wed Aug 18 14:06:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12444571 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A3715C4338F for ; Wed, 18 Aug 2021 14:22:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8A88660560 for ; Wed, 18 Aug 2021 14:22:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239412AbhHROWw (ORCPT ); Wed, 18 Aug 2021 10:22:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48444 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239496AbhHROWj (ORCPT ); Wed, 18 Aug 2021 10:22:39 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 642E0C0612E7; Wed, 18 Aug 2021 07:16:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=mg81CqRAZmZfDj14bc3WrVlEByj2jtQtmklitDTRhOU=; b=OTlOyB9WwcjLOvQvfh7xU2KCUc tiXAXSH66WT8kGH44M8eINhTDMYtmJCHyyzRr2MrVCIab9nXUGi25HOz5/J/fWaRIgT4orvVFk0aB ZQ4UsHzlhNnqydnsFLOYK2eZv3k9mQzxTWhVSRxHj9Qm2GtE1lH/YzxlQV4iFN/svLnRij9xvpwQv XZiUZaduenhMJfWN++HX0TCCsGuhlG3phL048CHZ9DNttvZXVss4lgOEvj4NRNDdbop1sP/wyZEng phL/yPMKJ1qmKZZ5mfW2xeCsX45unEKwDQIp8V4wNlVBAPqtwzZy21i+lVOEPzRwbHyRkZfipqmPL mO20Txyg==; Received: from [2001:4bb8:188:1b1:5a9e:9f39:5a86:b20c] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mGMLF-003uhx-Uo; Wed, 18 Aug 2021 14:15:26 +0000 From: Christoph Hellwig To: Gabriel Krisman Bertazi Cc: Shreeya Patel , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: [PATCH 06/11] unicode: remove the unused utf8{,n}age{min,max} functions Date: Wed, 18 Aug 2021 16:06:46 +0200 Message-Id: <20210818140651.17181-7-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818140651.17181-1-hch@lst.de> References: <20210818140651.17181-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org No actually used anywhere. Signed-off-by: Christoph Hellwig --- fs/unicode/utf8-norm.c | 113 ----------------------------------------- fs/unicode/utf8n.h | 16 ------ 2 files changed, 129 deletions(-) diff --git a/fs/unicode/utf8-norm.c b/fs/unicode/utf8-norm.c index 12abf89ae6ec..4b1b53391ce4 100644 --- a/fs/unicode/utf8-norm.c +++ b/fs/unicode/utf8-norm.c @@ -391,119 +391,6 @@ static utf8leaf_t *utf8lookup(const struct utf8data *data, return utf8nlookup(data, hangul, s, (size_t)-1); } -/* - * Maximum age of any character in s. - * Return -1 if s is not valid UTF-8 unicode. - * Return 0 if only non-assigned code points are used. - */ -int utf8agemax(const struct utf8data *data, const char *s) -{ - utf8leaf_t *leaf; - int age = 0; - int leaf_age; - unsigned char hangul[UTF8HANGULLEAF]; - - if (!data) - return -1; - - while (*s) { - leaf = utf8lookup(data, hangul, s); - if (!leaf) - return -1; - - leaf_age = utf8agetab[LEAF_GEN(leaf)]; - if (leaf_age <= data->maxage && leaf_age > age) - age = leaf_age; - s += utf8clen(s); - } - return age; -} -EXPORT_SYMBOL(utf8agemax); - -/* - * Minimum age of any character in s. - * Return -1 if s is not valid UTF-8 unicode. - * Return 0 if non-assigned code points are used. - */ -int utf8agemin(const struct utf8data *data, const char *s) -{ - utf8leaf_t *leaf; - int age; - int leaf_age; - unsigned char hangul[UTF8HANGULLEAF]; - - if (!data) - return -1; - age = data->maxage; - while (*s) { - leaf = utf8lookup(data, hangul, s); - if (!leaf) - return -1; - leaf_age = utf8agetab[LEAF_GEN(leaf)]; - if (leaf_age <= data->maxage && leaf_age < age) - age = leaf_age; - s += utf8clen(s); - } - return age; -} -EXPORT_SYMBOL(utf8agemin); - -/* - * Maximum age of any character in s, touch at most len bytes. - * Return -1 if s is not valid UTF-8 unicode. - */ -int utf8nagemax(const struct utf8data *data, const char *s, size_t len) -{ - utf8leaf_t *leaf; - int age = 0; - int leaf_age; - unsigned char hangul[UTF8HANGULLEAF]; - - if (!data) - return -1; - - while (len && *s) { - leaf = utf8nlookup(data, hangul, s, len); - if (!leaf) - return -1; - leaf_age = utf8agetab[LEAF_GEN(leaf)]; - if (leaf_age <= data->maxage && leaf_age > age) - age = leaf_age; - len -= utf8clen(s); - s += utf8clen(s); - } - return age; -} -EXPORT_SYMBOL(utf8nagemax); - -/* - * Maximum age of any character in s, touch at most len bytes. - * Return -1 if s is not valid UTF-8 unicode. - */ -int utf8nagemin(const struct utf8data *data, const char *s, size_t len) -{ - utf8leaf_t *leaf; - int leaf_age; - int age; - unsigned char hangul[UTF8HANGULLEAF]; - - if (!data) - return -1; - age = data->maxage; - while (len && *s) { - leaf = utf8nlookup(data, hangul, s, len); - if (!leaf) - return -1; - leaf_age = utf8agetab[LEAF_GEN(leaf)]; - if (leaf_age <= data->maxage && leaf_age < age) - age = leaf_age; - len -= utf8clen(s); - s += utf8clen(s); - } - return age; -} -EXPORT_SYMBOL(utf8nagemin); - /* * Length of the normalization of s. * Return -1 if s is not valid UTF-8 unicode. diff --git a/fs/unicode/utf8n.h b/fs/unicode/utf8n.h index 85a7bebf6927..e4c8a767cf7a 100644 --- a/fs/unicode/utf8n.h +++ b/fs/unicode/utf8n.h @@ -33,22 +33,6 @@ int utf8version_is_supported(unsigned int version); extern const struct utf8data *utf8nfdi(unsigned int maxage); extern const struct utf8data *utf8nfdicf(unsigned int maxage); -/* - * Determine the maximum age of any unicode character in the string. - * Returns 0 if only unassigned code points are present. - * Returns -1 if the input is not valid UTF-8. - */ -extern int utf8agemax(const struct utf8data *data, const char *s); -extern int utf8nagemax(const struct utf8data *data, const char *s, size_t len); - -/* - * Determine the minimum age of any unicode character in the string. - * Returns 0 if any unassigned code points are present. - * Returns -1 if the input is not valid UTF-8. - */ -extern int utf8agemin(const struct utf8data *data, const char *s); -extern int utf8nagemin(const struct utf8data *data, const char *s, size_t len); - /* * Determine the length of the normalized from of the string, * excluding any terminating NULL byte. From patchwork Wed Aug 18 14:06:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12444573 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54D28C4320A for ; Wed, 18 Aug 2021 14:22:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3E46F60560 for ; Wed, 18 Aug 2021 14:22:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239220AbhHROXY (ORCPT ); Wed, 18 Aug 2021 10:23:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48442 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239433AbhHROXS (ORCPT ); Wed, 18 Aug 2021 10:23:18 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 95E14C07E5E0; Wed, 18 Aug 2021 07:18:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=LK8VlrRcKR0O2tJY3isCs3pM/e17RyWbKPDimZLEThA=; b=SaqTl615icihQdyj6cH4ZebOOY CqeeoB06hXy/5ZM5skyZ78sQVNip29QpdM9j85yX/SFLsWLpUAcoPHIU/Gqt5VV7SJBuNhPZfUJ8Q g9hTb0kCU+dKAcmGLsmpywLzdu+EvYilR0Ivo5PWtGU56Uj2FFBlXzonIV0iHe2SWXyCdKOw3gpLI uP60aLYIigfzV5uKBoVLPrhGQjKRmmpKqHXiW8k9Bb6phAzpp7LsS2+kShH8tQ5ZEKNLwhc14+JP4 I8P1rMGI+rnia2acIGH7zrH2dNt41qTZuaZ0Umdzua6iBCC8Kg+NQ1cMduVzkuNyK1idOLtn1pdyN 2XLJzoIw==; Received: from [2001:4bb8:188:1b1:5a9e:9f39:5a86:b20c] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mGMN1-003un9-7F; Wed, 18 Aug 2021 14:17:03 +0000 From: Christoph Hellwig To: Gabriel Krisman Bertazi Cc: Shreeya Patel , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: [PATCH 07/11] unicode: simplify utf8len Date: Wed, 18 Aug 2021 16:06:47 +0200 Message-Id: <20210818140651.17181-8-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818140651.17181-1-hch@lst.de> References: <20210818140651.17181-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Just use the utf8nlen implementation with a (size_t)-1 len argument, similar to utf8_lookup. Also move the function to utf8-selftest.c, as it isn't used anywhere else. Signed-off-by: Christoph Hellwig --- fs/unicode/utf8-norm.c | 30 ------------------------------ fs/unicode/utf8-selftest.c | 5 +++++ fs/unicode/utf8n.h | 1 - 3 files changed, 5 insertions(+), 31 deletions(-) diff --git a/fs/unicode/utf8-norm.c b/fs/unicode/utf8-norm.c index 4b1b53391ce4..348d6e97553f 100644 --- a/fs/unicode/utf8-norm.c +++ b/fs/unicode/utf8-norm.c @@ -391,36 +391,6 @@ static utf8leaf_t *utf8lookup(const struct utf8data *data, return utf8nlookup(data, hangul, s, (size_t)-1); } -/* - * Length of the normalization of s. - * Return -1 if s is not valid UTF-8 unicode. - * - * A string of Default_Ignorable_Code_Point has length 0. - */ -ssize_t utf8len(const struct utf8data *data, const char *s) -{ - utf8leaf_t *leaf; - size_t ret = 0; - unsigned char hangul[UTF8HANGULLEAF]; - - if (!data) - return -1; - while (*s) { - leaf = utf8lookup(data, hangul, s); - if (!leaf) - return -1; - if (utf8agetab[LEAF_GEN(leaf)] > data->maxage) - ret += utf8clen(s); - else if (LEAF_CCC(leaf) == DECOMPOSE) - ret += strlen(LEAF_STR(leaf)); - else - ret += utf8clen(s); - s += utf8clen(s); - } - return ret; -} -EXPORT_SYMBOL(utf8len); - /* * Length of the normalization of s, touch at most len bytes. * Return -1 if s is not valid UTF-8 unicode. diff --git a/fs/unicode/utf8-selftest.c b/fs/unicode/utf8-selftest.c index 37f33890e012..80fb7c75acb2 100644 --- a/fs/unicode/utf8-selftest.c +++ b/fs/unicode/utf8-selftest.c @@ -160,6 +160,11 @@ static const struct { } }; +static ssize_t utf8len(const struct utf8data *data, const char *s) +{ + return utf8nlen(data, s, (size_t)-1); +} + static void check_utf8_nfdi(void) { int i; diff --git a/fs/unicode/utf8n.h b/fs/unicode/utf8n.h index e4c8a767cf7a..41182e5464df 100644 --- a/fs/unicode/utf8n.h +++ b/fs/unicode/utf8n.h @@ -39,7 +39,6 @@ extern const struct utf8data *utf8nfdicf(unsigned int maxage); * Returns 0 if only ignorable code points are present. * Returns -1 if the input is not valid UTF-8. */ -extern ssize_t utf8len(const struct utf8data *data, const char *s); extern ssize_t utf8nlen(const struct utf8data *data, const char *s, size_t len); /* Needed in struct utf8cursor below. */ From patchwork Wed Aug 18 14:06:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12444575 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2BB6C4320E for ; Wed, 18 Aug 2021 14:23:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BADD0610E6 for ; Wed, 18 Aug 2021 14:23:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238707AbhHROXf (ORCPT ); Wed, 18 Aug 2021 10:23:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48674 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239554AbhHROXY (ORCPT ); Wed, 18 Aug 2021 10:23:24 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D628FC06122F; Wed, 18 Aug 2021 07:19:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=XRJcay04c4+HGP8hA0k71ugb0TyE1umxocLGpm5yt80=; b=iVm0ncHopjP/X9nEmDSlzFnERU JKbD0lCGfy7zwAjuiKHB3s5gAxtodQVEZDAjn+Gfn/4dm2VXdzxQhP8PWZ8Zoe4ReKH21Zv6sH1bW QmEvKCjp501xWmFSz51Bi1HpWBxQScfahXr/7jACyvcd5MpImceC7PByna31Z7Vh19bZGqJvXhnt2 xKBMfyWNHJ30rXvh0R9pZfwvfBa/Mzq2K/EVeiHRYbLWyrJpz8s15qSJdsGGh45/sZb+u/bJZ+/NG nkE/R1+VDD3bafiXiyn8PuW9kcnC7Tspn9v0MszsilEv545lvAJKyijYWBr8jaArsDB3DHCMMe2P1 mcmP3XxQ==; Received: from [2001:4bb8:188:1b1:5a9e:9f39:5a86:b20c] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mGMOU-003uwG-4s; Wed, 18 Aug 2021 14:18:32 +0000 From: Christoph Hellwig To: Gabriel Krisman Bertazi Cc: Shreeya Patel , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: [PATCH 08/11] unicode: move utf8cursor to utf8-selftest.c Date: Wed, 18 Aug 2021 16:06:48 +0200 Message-Id: <20210818140651.17181-9-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818140651.17181-1-hch@lst.de> References: <20210818140651.17181-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Only used by the tests, so no need to keep it in the core. Signed-off-by: Christoph Hellwig --- fs/unicode/utf8-norm.c | 16 ---------------- fs/unicode/utf8-selftest.c | 6 ++++++ fs/unicode/utf8n.h | 2 -- 3 files changed, 6 insertions(+), 18 deletions(-) diff --git a/fs/unicode/utf8-norm.c b/fs/unicode/utf8-norm.c index 348d6e97553f..1ac90fa00070 100644 --- a/fs/unicode/utf8-norm.c +++ b/fs/unicode/utf8-norm.c @@ -456,22 +456,6 @@ int utf8ncursor(struct utf8cursor *u8c, const struct utf8data *data, } EXPORT_SYMBOL(utf8ncursor); -/* - * Set up an utf8cursor for use by utf8byte(). - * - * u8c : pointer to cursor. - * data : const struct utf8data to use for normalization. - * s : NUL-terminated string. - * - * Returns -1 on error, 0 on success. - */ -int utf8cursor(struct utf8cursor *u8c, const struct utf8data *data, - const char *s) -{ - return utf8ncursor(u8c, data, s, (unsigned int)-1); -} -EXPORT_SYMBOL(utf8cursor); - /* * Get one byte from the normalized form of the string described by u8c. * diff --git a/fs/unicode/utf8-selftest.c b/fs/unicode/utf8-selftest.c index 80fb7c75acb2..04628b50351d 100644 --- a/fs/unicode/utf8-selftest.c +++ b/fs/unicode/utf8-selftest.c @@ -165,6 +165,12 @@ static ssize_t utf8len(const struct utf8data *data, const char *s) return utf8nlen(data, s, (size_t)-1); } +static int utf8cursor(struct utf8cursor *u8c, const struct utf8data *data, + const char *s) +{ + return utf8ncursor(u8c, data, s, (unsigned int)-1); +} + static void check_utf8_nfdi(void) { int i; diff --git a/fs/unicode/utf8n.h b/fs/unicode/utf8n.h index 41182e5464df..736b6460a38c 100644 --- a/fs/unicode/utf8n.h +++ b/fs/unicode/utf8n.h @@ -65,8 +65,6 @@ struct utf8cursor { * Returns 0 on success. * Returns -1 on failure. */ -extern int utf8cursor(struct utf8cursor *u8c, const struct utf8data *data, - const char *s); extern int utf8ncursor(struct utf8cursor *u8c, const struct utf8data *data, const char *s, size_t len); From patchwork Wed Aug 18 14:06:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12444579 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 329B9C432BE for ; Wed, 18 Aug 2021 14:24:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 14466610CB for ; Wed, 18 Aug 2021 14:24:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238721AbhHROYh (ORCPT ); Wed, 18 Aug 2021 10:24:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48458 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239816AbhHROXj (ORCPT ); Wed, 18 Aug 2021 10:23:39 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CC190C0363D0; Wed, 18 Aug 2021 07:20:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=SOc3r12UbZImhOQ8XI/e2k7Id25YwiW68M0eNb04T5g=; b=aVYPf0Yv7/DrYZZGznvRz4N3Zz MzuHRHjOp3Jfr63w2xHHOj93ep5ePbrEsZKIzwZCzQIoWan+SZjKbEtRcr1ygM0dB0gttYn40uBDB pXRAqbc3b/V24aAEZntF/I97VFA6eVmknwwnARiISK75FbdK/bMSkNW/Vo4IYLtTmss6FQlBO8qFr mBM/gag7oJVWe6Doakt/njXKeU4yIrYEztXoMmtDcGk4QcrxE04zYDnRIt6Cd7+nMpm0DQWNejQ4u FRrq+OoJ1E4j6R66BY/1myMaOxlx3GslCq4F6kIG9jpejMuC2U+WOXS+xkPB5RwI+1+5GvI8HwEfH UaEVg7bA==; Received: from [2001:4bb8:188:1b1:5a9e:9f39:5a86:b20c] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mGMPY-003v2s-J9; Wed, 18 Aug 2021 14:19:46 +0000 From: Christoph Hellwig To: Gabriel Krisman Bertazi Cc: Shreeya Patel , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: [PATCH 09/11] unicode: cache the normalization tables in struct unicode_map Date: Wed, 18 Aug 2021 16:06:49 +0200 Message-Id: <20210818140651.17181-10-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818140651.17181-1-hch@lst.de> References: <20210818140651.17181-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Instead of repeatedly looking up the version add pointers to the NFD and NFD+CF tables to struct unicode_map, and pass a unicode_map plus index to the functions using the normalization tables. Signed-off-by: Christoph Hellwig --- fs/unicode/utf8-core.c | 37 +++++++++--------- fs/unicode/utf8-norm.c | 45 ++++++++++----------- fs/unicode/utf8-selftest.c | 80 ++++++++++++++++---------------------- fs/unicode/utf8n.h | 10 +++-- include/linux/unicode.h | 19 +++++++++ 5 files changed, 97 insertions(+), 94 deletions(-) diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c index dca2865c3bee..d9f713d38c0a 100644 --- a/fs/unicode/utf8-core.c +++ b/fs/unicode/utf8-core.c @@ -5,16 +5,13 @@ #include #include #include -#include #include #include "utf8n.h" int utf8_validate(const struct unicode_map *um, const struct qstr *str) { - const struct utf8data *data = utf8nfdi(um->version); - - if (utf8nlen(data, str->name, str->len) < 0) + if (utf8nlen(um, UTF8_NFDI, str->name, str->len) < 0) return -1; return 0; } @@ -23,14 +20,13 @@ EXPORT_SYMBOL(utf8_validate); int utf8_strncmp(const struct unicode_map *um, const struct qstr *s1, const struct qstr *s2) { - const struct utf8data *data = utf8nfdi(um->version); struct utf8cursor cur1, cur2; int c1, c2; - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) + if (utf8ncursor(&cur1, um, UTF8_NFDI, s1->name, s1->len) < 0) return -EINVAL; - if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) + if (utf8ncursor(&cur2, um, UTF8_NFDI, s2->name, s2->len) < 0) return -EINVAL; do { @@ -50,14 +46,13 @@ EXPORT_SYMBOL(utf8_strncmp); int utf8_strncasecmp(const struct unicode_map *um, const struct qstr *s1, const struct qstr *s2) { - const struct utf8data *data = utf8nfdicf(um->version); struct utf8cursor cur1, cur2; int c1, c2; - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) + if (utf8ncursor(&cur1, um, UTF8_NFDICF, s1->name, s1->len) < 0) return -EINVAL; - if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) + if (utf8ncursor(&cur2, um, UTF8_NFDICF, s2->name, s2->len) < 0) return -EINVAL; do { @@ -81,12 +76,11 @@ int utf8_strncasecmp_folded(const struct unicode_map *um, const struct qstr *cf, const struct qstr *s1) { - const struct utf8data *data = utf8nfdicf(um->version); struct utf8cursor cur1; int c1, c2; int i = 0; - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) + if (utf8ncursor(&cur1, um, UTF8_NFDICF, s1->name, s1->len) < 0) return -EINVAL; do { @@ -105,11 +99,10 @@ EXPORT_SYMBOL(utf8_strncasecmp_folded); int utf8_casefold(const struct unicode_map *um, const struct qstr *str, unsigned char *dest, size_t dlen) { - const struct utf8data *data = utf8nfdicf(um->version); struct utf8cursor cur; size_t nlen = 0; - if (utf8ncursor(&cur, data, str->name, str->len) < 0) + if (utf8ncursor(&cur, um, UTF8_NFDICF, str->name, str->len) < 0) return -EINVAL; for (nlen = 0; nlen < dlen; nlen++) { @@ -128,12 +121,11 @@ EXPORT_SYMBOL(utf8_casefold); int utf8_casefold_hash(const struct unicode_map *um, const void *salt, struct qstr *str) { - const struct utf8data *data = utf8nfdicf(um->version); struct utf8cursor cur; int c; unsigned long hash = init_name_hash(salt); - if (utf8ncursor(&cur, data, str->name, str->len) < 0) + if (utf8ncursor(&cur, um, UTF8_NFDICF, str->name, str->len) < 0) return -EINVAL; while ((c = utf8byte(&cur))) { @@ -149,11 +141,10 @@ EXPORT_SYMBOL(utf8_casefold_hash); int utf8_normalize(const struct unicode_map *um, const struct qstr *str, unsigned char *dest, size_t dlen) { - const struct utf8data *data = utf8nfdi(um->version); struct utf8cursor cur; ssize_t nlen = 0; - if (utf8ncursor(&cur, data, str->name, str->len) < 0) + if (utf8ncursor(&cur, um, UTF8_NFDI, str->name, str->len) < 0) return -EINVAL; for (nlen = 0; nlen < dlen; nlen++) { @@ -180,7 +171,17 @@ struct unicode_map *utf8_load(unsigned int version) if (!um) return ERR_PTR(-ENOMEM); um->version = version; + um->ntab[UTF8_NFDI] = utf8nfdi(version); + if (!um->ntab[UTF8_NFDI]) + goto out_free_um; + um->ntab[UTF8_NFDICF] = utf8nfdicf(version); + if (!um->ntab[UTF8_NFDICF]) + goto out_free_um; return um; + +out_free_um: + kfree(um); + return ERR_PTR(-EINVAL); } EXPORT_SYMBOL(utf8_load); diff --git a/fs/unicode/utf8-norm.c b/fs/unicode/utf8-norm.c index 1ac90fa00070..7c1f28ab31a8 100644 --- a/fs/unicode/utf8-norm.c +++ b/fs/unicode/utf8-norm.c @@ -309,21 +309,19 @@ utf8hangul(const char *str, unsigned char *hangul) * is well-formed and corresponds to a known unicode code point. The * shorthand for this will be "is valid UTF-8 unicode". */ -static utf8leaf_t *utf8nlookup(const struct utf8data *data, - unsigned char *hangul, const char *s, size_t len) +static utf8leaf_t *utf8nlookup(const struct unicode_map *um, + enum utf8_normalization n, unsigned char *hangul, const char *s, + size_t len) { - utf8trie_t *trie = NULL; + utf8trie_t *trie = utf8data + um->ntab[n]->offset; int offlen; int offset; int mask; int node; - if (!data) - return NULL; if (len == 0) return NULL; - trie = utf8data + data->offset; node = 1; while (node) { offlen = (*trie & OFFLEN) >> OFFLEN_SHIFT; @@ -385,29 +383,28 @@ static utf8leaf_t *utf8nlookup(const struct utf8data *data, * * Forwards to utf8nlookup(). */ -static utf8leaf_t *utf8lookup(const struct utf8data *data, - unsigned char *hangul, const char *s) +static utf8leaf_t *utf8lookup(const struct unicode_map *um, + enum utf8_normalization n, unsigned char *hangul, const char *s) { - return utf8nlookup(data, hangul, s, (size_t)-1); + return utf8nlookup(um, n, hangul, s, (size_t)-1); } /* * Length of the normalization of s, touch at most len bytes. * Return -1 if s is not valid UTF-8 unicode. */ -ssize_t utf8nlen(const struct utf8data *data, const char *s, size_t len) +ssize_t utf8nlen(const struct unicode_map *um, enum utf8_normalization n, + const char *s, size_t len) { utf8leaf_t *leaf; size_t ret = 0; unsigned char hangul[UTF8HANGULLEAF]; - if (!data) - return -1; while (len && *s) { - leaf = utf8nlookup(data, hangul, s, len); + leaf = utf8nlookup(um, n, hangul, s, len); if (!leaf) return -1; - if (utf8agetab[LEAF_GEN(leaf)] > data->maxage) + if (utf8agetab[LEAF_GEN(leaf)] > um->ntab[n]->maxage) ret += utf8clen(s); else if (LEAF_CCC(leaf) == DECOMPOSE) ret += strlen(LEAF_STR(leaf)); @@ -430,14 +427,13 @@ EXPORT_SYMBOL(utf8nlen); * * Returns -1 on error, 0 on success. */ -int utf8ncursor(struct utf8cursor *u8c, const struct utf8data *data, - const char *s, size_t len) +int utf8ncursor(struct utf8cursor *u8c, const struct unicode_map *um, + enum utf8_normalization n, const char *s, size_t len) { - if (!data) - return -1; if (!s) return -1; - u8c->data = data; + u8c->um = um; + u8c->n = n; u8c->s = s; u8c->p = NULL; u8c->ss = NULL; @@ -512,9 +508,9 @@ int utf8byte(struct utf8cursor *u8c) /* Look up the data for the current character. */ if (u8c->p) { - leaf = utf8lookup(u8c->data, u8c->hangul, u8c->s); + leaf = utf8lookup(u8c->um, u8c->n, u8c->hangul, u8c->s); } else { - leaf = utf8nlookup(u8c->data, u8c->hangul, + leaf = utf8nlookup(u8c->um, u8c->n, u8c->hangul, u8c->s, u8c->len); } @@ -524,7 +520,8 @@ int utf8byte(struct utf8cursor *u8c) ccc = LEAF_CCC(leaf); /* Characters that are too new have CCC 0. */ - if (utf8agetab[LEAF_GEN(leaf)] > u8c->data->maxage) { + if (utf8agetab[LEAF_GEN(leaf)] > + u8c->um->ntab[u8c->n]->maxage) { ccc = STOPPER; } else if (ccc == DECOMPOSE) { u8c->len -= utf8clen(u8c->s); @@ -538,7 +535,7 @@ int utf8byte(struct utf8cursor *u8c) goto ccc_mismatch; } - leaf = utf8lookup(u8c->data, u8c->hangul, u8c->s); + leaf = utf8lookup(u8c->um, u8c->n, u8c->hangul, u8c->s); if (!leaf) return -1; ccc = LEAF_CCC(leaf); @@ -611,7 +608,6 @@ const struct utf8data *utf8nfdi(unsigned int maxage) return NULL; return &utf8nfdidata[i]; } -EXPORT_SYMBOL(utf8nfdi); const struct utf8data *utf8nfdicf(unsigned int maxage) { @@ -623,4 +619,3 @@ const struct utf8data *utf8nfdicf(unsigned int maxage) return NULL; return &utf8nfdicfdata[i]; } -EXPORT_SYMBOL(utf8nfdicf); diff --git a/fs/unicode/utf8-selftest.c b/fs/unicode/utf8-selftest.c index 04628b50351d..cfa3832b75f4 100644 --- a/fs/unicode/utf8-selftest.c +++ b/fs/unicode/utf8-selftest.c @@ -18,9 +18,7 @@ unsigned int failed_tests; unsigned int total_tests; /* Tests will be based on this version. */ -#define latest_maj 12 -#define latest_min 1 -#define latest_rev 0 +#define UTF8_LATEST UNICODE_AGE(12, 1, 0) #define _test(cond, func, line, fmt, ...) do { \ total_tests++; \ @@ -160,29 +158,22 @@ static const struct { } }; -static ssize_t utf8len(const struct utf8data *data, const char *s) +static ssize_t utf8len(const struct unicode_map *um, enum utf8_normalization n, + const char *s) { - return utf8nlen(data, s, (size_t)-1); + return utf8nlen(um, n, s, (size_t)-1); } -static int utf8cursor(struct utf8cursor *u8c, const struct utf8data *data, - const char *s) +static int utf8cursor(struct utf8cursor *u8c, const struct unicode_map *um, + enum utf8_normalization n, const char *s) { - return utf8ncursor(u8c, data, s, (unsigned int)-1); + return utf8ncursor(u8c, um, n, s, (unsigned int)-1); } -static void check_utf8_nfdi(void) +static void check_utf8_nfdi(struct unicode_map *um) { int i; struct utf8cursor u8c; - const struct utf8data *data; - - data = utf8nfdi(UNICODE_AGE(latest_maj, latest_min, latest_rev)); - if (!data) { - pr_err("%s: Unable to load utf8-%d.%d.%d. Skipping.\n", - __func__, latest_maj, latest_min, latest_rev); - return; - } for (i = 0; i < ARRAY_SIZE(nfdi_test_data); i++) { int len = strlen(nfdi_test_data[i].str); @@ -190,10 +181,11 @@ static void check_utf8_nfdi(void) int j = 0; unsigned char c; - test((utf8len(data, nfdi_test_data[i].str) == nlen)); - test((utf8nlen(data, nfdi_test_data[i].str, len) == nlen)); + test((utf8len(um, UTF8_NFDI, nfdi_test_data[i].str) == nlen)); + test((utf8nlen(um, UTF8_NFDI, nfdi_test_data[i].str, len) == + nlen)); - if (utf8cursor(&u8c, data, nfdi_test_data[i].str) < 0) + if (utf8cursor(&u8c, um, UTF8_NFDI, nfdi_test_data[i].str) < 0) pr_err("can't create cursor\n"); while ((c = utf8byte(&u8c)) > 0) { @@ -207,18 +199,10 @@ static void check_utf8_nfdi(void) } } -static void check_utf8_nfdicf(void) +static void check_utf8_nfdicf(struct unicode_map *um) { int i; struct utf8cursor u8c; - const struct utf8data *data; - - data = utf8nfdicf(UNICODE_AGE(latest_maj, latest_min, latest_rev)); - if (!data) { - pr_err("%s: Unable to load utf8-%d.%d.%d. Skipping.\n", - __func__, latest_maj, latest_min, latest_rev); - return; - } for (i = 0; i < ARRAY_SIZE(nfdicf_test_data); i++) { int len = strlen(nfdicf_test_data[i].str); @@ -226,10 +210,13 @@ static void check_utf8_nfdicf(void) int j = 0; unsigned char c; - test((utf8len(data, nfdicf_test_data[i].str) == nlen)); - test((utf8nlen(data, nfdicf_test_data[i].str, len) == nlen)); + test((utf8len(um, UTF8_NFDICF, nfdicf_test_data[i].str) == + nlen)); + test((utf8nlen(um, UTF8_NFDICF, nfdicf_test_data[i].str, len) == + nlen)); - if (utf8cursor(&u8c, data, nfdicf_test_data[i].str) < 0) + if (utf8cursor(&u8c, um, UTF8_NFDICF, + nfdicf_test_data[i].str) < 0) pr_err("can't create cursor\n"); while ((c = utf8byte(&u8c)) > 0) { @@ -243,16 +230,9 @@ static void check_utf8_nfdicf(void) } } -static void check_utf8_comparisons(void) +static void check_utf8_comparisons(struct unicode_map *table) { int i; - struct unicode_map *table = utf8_load(UNICODE_AGE(12, 1, 0)); - - if (IS_ERR(table)) { - pr_err("%s: Unable to load utf8 %d.%d.%d. Skipping.\n", - __func__, latest_maj, latest_min, latest_rev); - return; - } for (i = 0; i < ARRAY_SIZE(nfdi_test_data); i++) { const struct qstr s1 = {.name = nfdi_test_data[i].str, @@ -273,8 +253,6 @@ static void check_utf8_comparisons(void) test_f(!utf8_strncasecmp(table, &s1, &s2), "%s %s comparison mismatch\n", s1.name, s2.name); } - - utf8_unload(table); } static void check_supported_versions(void) @@ -286,8 +264,7 @@ static void check_supported_versions(void) test(utf8version_is_supported(UNICODE_AGE(9, 0, 0))); /* Unicode 1x.0.0 (the latest version) should be supported. */ - test(utf8version_is_supported( - UNICODE_AGE(latest_maj, latest_min, latest_rev))); + test(utf8version_is_supported(UTF8_LATEST)); /* Next versions don't exist. */ test(!utf8version_is_supported(UNICODE_AGE(13, 0, 0))); @@ -297,19 +274,28 @@ static void check_supported_versions(void) static int __init init_test_ucd(void) { + struct unicode_map *um; + failed_tests = 0; total_tests = 0; + um = utf8_load(UTF8_LATEST); + if (IS_ERR(um)) { + pr_err("%s: Unable to load utf8 table.\n", __func__); + return PTR_ERR(um); + } + check_supported_versions(); - check_utf8_nfdi(); - check_utf8_nfdicf(); - check_utf8_comparisons(); + check_utf8_nfdi(um); + check_utf8_nfdicf(um); + check_utf8_comparisons(um); if (!failed_tests) pr_info("All %u tests passed\n", total_tests); else pr_err("%u out of %u tests failed\n", failed_tests, total_tests); + utf8_unload(um); return 0; } diff --git a/fs/unicode/utf8n.h b/fs/unicode/utf8n.h index 736b6460a38c..206c89f0dbf7 100644 --- a/fs/unicode/utf8n.h +++ b/fs/unicode/utf8n.h @@ -39,7 +39,8 @@ extern const struct utf8data *utf8nfdicf(unsigned int maxage); * Returns 0 if only ignorable code points are present. * Returns -1 if the input is not valid UTF-8. */ -extern ssize_t utf8nlen(const struct utf8data *data, const char *s, size_t len); +ssize_t utf8nlen(const struct unicode_map *um, enum utf8_normalization n, + const char *s, size_t len); /* Needed in struct utf8cursor below. */ #define UTF8HANGULLEAF (12) @@ -48,7 +49,8 @@ extern ssize_t utf8nlen(const struct utf8data *data, const char *s, size_t len); * Cursor structure used by the normalizer. */ struct utf8cursor { - const struct utf8data *data; + const struct unicode_map *um; + enum utf8_normalization n; const char *s; const char *p; const char *ss; @@ -65,8 +67,8 @@ struct utf8cursor { * Returns 0 on success. * Returns -1 on failure. */ -extern int utf8ncursor(struct utf8cursor *u8c, const struct utf8data *data, - const char *s, size_t len); +int utf8ncursor(struct utf8cursor *u8c, const struct unicode_map *um, + enum utf8_normalization n, const char *s, size_t len); /* * Get the next byte in the normalization. diff --git a/include/linux/unicode.h b/include/linux/unicode.h index a5cc44a8f90c..3e502c7456e8 100644 --- a/include/linux/unicode.h +++ b/include/linux/unicode.h @@ -5,6 +5,8 @@ #include #include +struct utf8data; + /* Encoding a unicode version number as a single unsigned int. */ #define UNICODE_MAJ_SHIFT (16) #define UNICODE_MIN_SHIFT (8) @@ -14,8 +16,25 @@ ((unsigned int)(MIN) << UNICODE_MIN_SHIFT) | \ ((unsigned int)(REV))) +/* + * Two normalization forms are supported: + * 1) NFDI + * - Apply unicode normalization form NFD. + * - Remove any Default_Ignorable_Code_Point. + * 2) NFDICF + * - Apply unicode normalization form NFD. + * - Remove any Default_Ignorable_Code_Point. + * - Apply a full casefold (C + F). + */ +enum utf8_normalization { + UTF8_NFDI = 0, + UTF8_NFDICF, + UTF8_NMAX, +}; + struct unicode_map { unsigned int version; + const struct utf8data *ntab[UTF8_NMAX]; }; int utf8_validate(const struct unicode_map *um, const struct qstr *str); From patchwork Wed Aug 18 14:06:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12444585 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B919AC4320A for ; Wed, 18 Aug 2021 14:25:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 988D3610CE for ; Wed, 18 Aug 2021 14:25:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238113AbhHRO0L (ORCPT ); Wed, 18 Aug 2021 10:26:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48670 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239854AbhHROYD (ORCPT ); Wed, 18 Aug 2021 10:24:03 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 86C3DC061155; Wed, 18 Aug 2021 07:21:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=jzP03WJks1rcorPHhNxvobSIkXwAyFDxv0GxDOTQtjc=; b=dU066UNDgZs+u11V6bzMLybesP RS9a+S3me1UMJVq8VbIaXEFbBsoN5NBxSS+fD0RPSEMAABCmiklz8JhBafHO0UENhRgDQxonwW9YN j3YGWlH5REELTZOlMofSULozPVwIF2wxiMxQnQQi5k38+leOtsy5KuiZU43TQe3zzzhKkbPL9s+VC CoVWmjVEQsuZ7xTCxAWXAbIaP3kkK4INDBDnjmQKkfkx0eX6HyU6v7LtYvXVKmSg6ootaEqtuAVRV gXoDDszvA9OWA/xgr5PWMh6+WIYCQGOONkLW7VZhJXzDzDbQUoN+nXKMMZ8ZjGzINZvuqrWShNIP+ Zml1pzDg==; Received: from [2001:4bb8:188:1b1:5a9e:9f39:5a86:b20c] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mGMQt-003v9V-Dg; Wed, 18 Aug 2021 14:20:58 +0000 From: Christoph Hellwig To: Gabriel Krisman Bertazi Cc: Shreeya Patel , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: [PATCH 10/11] unicode: Add utf8-data module Date: Wed, 18 Aug 2021 16:06:50 +0200 Message-Id: <20210818140651.17181-11-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818140651.17181-1-hch@lst.de> References: <20210818140651.17181-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org utf8data.h contains a large database table which is an auto-generated decodification trie for the unicode normalization functions. Allow building it into a separate module. Based on a patch from Shreeya Patel . Signed-off-by: Christoph Hellwig --- fs/unicode/Kconfig | 13 ++++- fs/unicode/Makefile | 13 ++--- fs/unicode/mkutf8data.c | 24 ++++++++-- fs/unicode/utf8-core.c | 35 +++++++++++--- fs/unicode/utf8-norm.c | 48 ++++--------------- fs/unicode/utf8-selftest.c | 16 +++---- ...{utf8data.h_shipped => utf8data.c_shipped} | 22 +++++++-- fs/unicode/utf8n.h | 40 ++++++++-------- include/linux/unicode.h | 2 + 9 files changed, 123 insertions(+), 90 deletions(-) rename fs/unicode/{utf8data.h_shipped => utf8data.c_shipped} (99%) diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig index 2c27b9a5cd6c..610d7bc05d6e 100644 --- a/fs/unicode/Kconfig +++ b/fs/unicode/Kconfig @@ -8,7 +8,16 @@ config UNICODE Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding support. +config UNICODE_UTF8_DATA + tristate "UTF-8 normalization and casefolding tables" + depends on UNICODE + default UNICODE + help + This contains a large table of case foldings, which can be loaded as + a separate module if you say M here. To be on the safe side stick + to the default of Y. Saying N here makes no sense, if you do not want + utf8 casefolding support, disable CONFIG_UNICODE instead. + config UNICODE_NORMALIZATION_SELFTEST tristate "Test UTF-8 normalization support" - depends on UNICODE - default n + depends on UNICODE_UTF8_DATA diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile index b88aecc86550..2f9d9188852b 100644 --- a/fs/unicode/Makefile +++ b/fs/unicode/Makefile @@ -2,14 +2,15 @@ obj-$(CONFIG_UNICODE) += unicode.o obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o +obj-$(CONFIG_UNICODE_UTF8_DATA) += utf8data.o unicode-y := utf8-norm.o utf8-core.o -$(obj)/utf8-norm.o: $(obj)/utf8data.h +$(obj)/utf8-data.o: $(obj)/utf8data.c -# In the normal build, the checked-in utf8data.h is just shipped. +# In the normal build, the checked-in utf8data.c is just shipped. # -# To generate utf8data.h from UCD, put *.txt files in this directory +# To generate utf8data.c from UCD, put *.txt files in this directory # and pass REGENERATE_UTF8DATA=1 from the command line. ifdef REGENERATE_UTF8DATA @@ -24,15 +25,15 @@ quiet_cmd_utf8data = GEN $@ -t $(srctree)/$(src)/NormalizationTest.txt \ -o $@ -$(obj)/utf8data.h: $(obj)/mkutf8data $(filter %.txt, $(cmd_utf8data)) FORCE +$(obj)/utf8data.c: $(obj)/mkutf8data $(filter %.txt, $(cmd_utf8data)) FORCE $(call if_changed,utf8data) else -$(obj)/utf8data.h: $(src)/utf8data.h_shipped FORCE +$(obj)/utf8data.c: $(src)/utf8data.c_shipped FORCE $(call if_changed,shipped) endif -targets += utf8data.h +targets += utf8data.c hostprogs += mkutf8data diff --git a/fs/unicode/mkutf8data.c b/fs/unicode/mkutf8data.c index ff2025ac5a32..bc1a7c8b5c8d 100644 --- a/fs/unicode/mkutf8data.c +++ b/fs/unicode/mkutf8data.c @@ -3287,12 +3287,10 @@ static void write_file(void) open_fail(utf8_name, errno); fprintf(file, "/* This file is generated code, do not edit. */\n"); - fprintf(file, "#ifndef __INCLUDED_FROM_UTF8NORM_C__\n"); - fprintf(file, "#error Only nls_utf8-norm.c should include this file.\n"); - fprintf(file, "#endif\n"); fprintf(file, "\n"); - fprintf(file, "static const unsigned int utf8vers = %#x;\n", - unicode_maxage); + fprintf(file, "#include \n"); + fprintf(file, "#include \n"); + fprintf(file, "#include \"utf8n.h\"\n"); fprintf(file, "\n"); fprintf(file, "static const unsigned int utf8agetab[] = {\n"); for (i = 0; i != ages_count; i++) @@ -3339,6 +3337,22 @@ static void write_file(void) fprintf(file, "\n"); } fprintf(file, "};\n"); + fprintf(file, "\n"); + fprintf(file, "struct utf8data_table utf8_data_table = {\n"); + fprintf(file, "\t.utf8agetab = utf8agetab,\n"); + fprintf(file, "\t.utf8agetab_size = ARRAY_SIZE(utf8agetab),\n"); + fprintf(file, "\n"); + fprintf(file, "\t.utf8nfdicfdata = utf8nfdicfdata,\n"); + fprintf(file, "\t.utf8nfdicfdata_size = ARRAY_SIZE(utf8nfdicfdata),\n"); + fprintf(file, "\n"); + fprintf(file, "\t.utf8nfdidata = utf8nfdidata,\n"); + fprintf(file, "\t.utf8nfdidata_size = ARRAY_SIZE(utf8nfdidata),\n"); + fprintf(file, "\n"); + fprintf(file, "\t.utf8data = utf8data,\n"); + fprintf(file, "};\n"); + fprintf(file, "EXPORT_SYMBOL_GPL(utf8_data_table);"); + fprintf(file, "\n"); + fprintf(file, "MODULE_LICENSE(\"GPL v2\");\n"); fclose(file); } diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c index d9f713d38c0a..38ca824f1015 100644 --- a/fs/unicode/utf8-core.c +++ b/fs/unicode/utf8-core.c @@ -160,25 +160,45 @@ int utf8_normalize(const struct unicode_map *um, const struct qstr *str, } EXPORT_SYMBOL(utf8_normalize); +static const struct utf8data *find_table_version(const struct utf8data *table, + size_t nr_entries, unsigned int version) +{ + size_t i = nr_entries - 1; + + while (version < table[i].maxage) + i--; + if (version > table[i].maxage) + return NULL; + return &table[i]; +} + struct unicode_map *utf8_load(unsigned int version) { struct unicode_map *um; - if (!utf8version_is_supported(version)) - return ERR_PTR(-EINVAL); - um = kzalloc(sizeof(struct unicode_map), GFP_KERNEL); if (!um) return ERR_PTR(-ENOMEM); um->version = version; - um->ntab[UTF8_NFDI] = utf8nfdi(version); - if (!um->ntab[UTF8_NFDI]) + + um->tables = symbol_request(utf8_data_table); + if (!um->tables) goto out_free_um; - um->ntab[UTF8_NFDICF] = utf8nfdicf(version); + + if (!utf8version_is_supported(um, version)) + goto out_symbol_put; + um->ntab[UTF8_NFDI] = find_table_version(um->tables->utf8nfdidata, + um->tables->utf8nfdidata_size, um->version); + if (!um->ntab[UTF8_NFDI]) + goto out_symbol_put; + um->ntab[UTF8_NFDICF] = find_table_version(um->tables->utf8nfdicfdata, + um->tables->utf8nfdicfdata_size, um->version); if (!um->ntab[UTF8_NFDICF]) - goto out_free_um; + goto out_symbol_put; return um; +out_symbol_put: + symbol_put(um->tables); out_free_um: kfree(um); return ERR_PTR(-EINVAL); @@ -187,6 +207,7 @@ EXPORT_SYMBOL(utf8_load); void utf8_unload(struct unicode_map *um) { + symbol_put(utf8_data_table); kfree(um); } EXPORT_SYMBOL(utf8_unload); diff --git a/fs/unicode/utf8-norm.c b/fs/unicode/utf8-norm.c index 7c1f28ab31a8..829c7e2ad764 100644 --- a/fs/unicode/utf8-norm.c +++ b/fs/unicode/utf8-norm.c @@ -6,21 +6,12 @@ #include "utf8n.h" -struct utf8data { - unsigned int maxage; - unsigned int offset; -}; - -#define __INCLUDED_FROM_UTF8NORM_C__ -#include "utf8data.h" -#undef __INCLUDED_FROM_UTF8NORM_C__ - -int utf8version_is_supported(unsigned int version) +int utf8version_is_supported(const struct unicode_map *um, unsigned int version) { - int i = ARRAY_SIZE(utf8agetab) - 1; + int i = um->tables->utf8agetab_size - 1; - while (i >= 0 && utf8agetab[i] != 0) { - if (version == utf8agetab[i]) + while (i >= 0 && um->tables->utf8agetab[i] != 0) { + if (version == um->tables->utf8agetab[i]) return 1; i--; } @@ -161,7 +152,7 @@ typedef const unsigned char utf8trie_t; * underlying datatype: unsigned char. * * leaf[0]: The unicode version, stored as a generation number that is - * an index into utf8agetab[]. With this we can filter code + * an index into ->utf8agetab[]. With this we can filter code * points based on the unicode version in which they were * defined. The CCC of a non-defined code point is 0. * leaf[1]: Canonical Combining Class. During normalization, we need @@ -313,7 +304,7 @@ static utf8leaf_t *utf8nlookup(const struct unicode_map *um, enum utf8_normalization n, unsigned char *hangul, const char *s, size_t len) { - utf8trie_t *trie = utf8data + um->ntab[n]->offset; + utf8trie_t *trie = um->tables->utf8data + um->ntab[n]->offset; int offlen; int offset; int mask; @@ -404,7 +395,8 @@ ssize_t utf8nlen(const struct unicode_map *um, enum utf8_normalization n, leaf = utf8nlookup(um, n, hangul, s, len); if (!leaf) return -1; - if (utf8agetab[LEAF_GEN(leaf)] > um->ntab[n]->maxage) + if (um->tables->utf8agetab[LEAF_GEN(leaf)] > + um->ntab[n]->maxage) ret += utf8clen(s); else if (LEAF_CCC(leaf) == DECOMPOSE) ret += strlen(LEAF_STR(leaf)); @@ -520,7 +512,7 @@ int utf8byte(struct utf8cursor *u8c) ccc = LEAF_CCC(leaf); /* Characters that are too new have CCC 0. */ - if (utf8agetab[LEAF_GEN(leaf)] > + if (u8c->um->tables->utf8agetab[LEAF_GEN(leaf)] > u8c->um->ntab[u8c->n]->maxage) { ccc = STOPPER; } else if (ccc == DECOMPOSE) { @@ -597,25 +589,3 @@ int utf8byte(struct utf8cursor *u8c) } } EXPORT_SYMBOL(utf8byte); - -const struct utf8data *utf8nfdi(unsigned int maxage) -{ - int i = ARRAY_SIZE(utf8nfdidata) - 1; - - while (maxage < utf8nfdidata[i].maxage) - i--; - if (maxage > utf8nfdidata[i].maxage) - return NULL; - return &utf8nfdidata[i]; -} - -const struct utf8data *utf8nfdicf(unsigned int maxage) -{ - int i = ARRAY_SIZE(utf8nfdicfdata) - 1; - - while (maxage < utf8nfdicfdata[i].maxage) - i--; - if (maxage > utf8nfdicfdata[i].maxage) - return NULL; - return &utf8nfdicfdata[i]; -} diff --git a/fs/unicode/utf8-selftest.c b/fs/unicode/utf8-selftest.c index cfa3832b75f4..eb2bbdd688d7 100644 --- a/fs/unicode/utf8-selftest.c +++ b/fs/unicode/utf8-selftest.c @@ -255,21 +255,21 @@ static void check_utf8_comparisons(struct unicode_map *table) } } -static void check_supported_versions(void) +static void check_supported_versions(struct unicode_map *um) { /* Unicode 7.0.0 should be supported. */ - test(utf8version_is_supported(UNICODE_AGE(7, 0, 0))); + test(utf8version_is_supported(um, UNICODE_AGE(7, 0, 0))); /* Unicode 9.0.0 should be supported. */ - test(utf8version_is_supported(UNICODE_AGE(9, 0, 0))); + test(utf8version_is_supported(um, UNICODE_AGE(9, 0, 0))); /* Unicode 1x.0.0 (the latest version) should be supported. */ - test(utf8version_is_supported(UTF8_LATEST)); + test(utf8version_is_supported(um, UTF8_LATEST)); /* Next versions don't exist. */ - test(!utf8version_is_supported(UNICODE_AGE(13, 0, 0))); - test(!utf8version_is_supported(UNICODE_AGE(0, 0, 0))); - test(!utf8version_is_supported(UNICODE_AGE(-1, -1, -1))); + test(!utf8version_is_supported(um, UNICODE_AGE(13, 0, 0))); + test(!utf8version_is_supported(um, UNICODE_AGE(0, 0, 0))); + test(!utf8version_is_supported(um, UNICODE_AGE(-1, -1, -1))); } static int __init init_test_ucd(void) @@ -285,7 +285,7 @@ static int __init init_test_ucd(void) return PTR_ERR(um); } - check_supported_versions(); + check_supported_versions(um); check_utf8_nfdi(um); check_utf8_nfdicf(um); check_utf8_comparisons(um); diff --git a/fs/unicode/utf8data.h_shipped b/fs/unicode/utf8data.c_shipped similarity index 99% rename from fs/unicode/utf8data.h_shipped rename to fs/unicode/utf8data.c_shipped index 76e4f0e1b089..d9b62901aa96 100644 --- a/fs/unicode/utf8data.h_shipped +++ b/fs/unicode/utf8data.c_shipped @@ -1,9 +1,8 @@ /* This file is generated code, do not edit. */ -#ifndef __INCLUDED_FROM_UTF8NORM_C__ -#error Only nls_utf8-norm.c should include this file. -#endif -static const unsigned int utf8vers = 0xc0100; +#include +#include +#include "utf8n.h" static const unsigned int utf8agetab[] = { 0, @@ -4107,3 +4106,18 @@ static const unsigned char utf8data[64256] = { 0x52,0x04,0x00,0x00,0x11,0x04,0x00,0x00,0x02,0x00,0xcf,0x86,0xcf,0x06,0x02,0x00, 0x81,0x80,0xcf,0x86,0x85,0x84,0xcf,0x86,0xcf,0x06,0x02,0x00,0x00,0x00,0x00,0x00 }; + +struct utf8data_table utf8_data_table = { + .utf8agetab = utf8agetab, + .utf8agetab_size = ARRAY_SIZE(utf8agetab), + + .utf8nfdicfdata = utf8nfdicfdata, + .utf8nfdicfdata_size = ARRAY_SIZE(utf8nfdicfdata), + + .utf8nfdidata = utf8nfdidata, + .utf8nfdidata_size = ARRAY_SIZE(utf8nfdidata), + + .utf8data = utf8data, +}; +EXPORT_SYMBOL_GPL(utf8_data_table); +MODULE_LICENSE("GPL v2"); diff --git a/fs/unicode/utf8n.h b/fs/unicode/utf8n.h index 206c89f0dbf7..bd00d587747a 100644 --- a/fs/unicode/utf8n.h +++ b/fs/unicode/utf8n.h @@ -13,25 +13,7 @@ #include #include -int utf8version_is_supported(unsigned int version); - -/* - * Look for the correct const struct utf8data for a unicode version. - * Returns NULL if the version requested is too new. - * - * Two normalization forms are supported: nfdi and nfdicf. - * - * nfdi: - * - Apply unicode normalization form NFD. - * - Remove any Default_Ignorable_Code_Point. - * - * nfdicf: - * - Apply unicode normalization form NFD. - * - Remove any Default_Ignorable_Code_Point. - * - Apply a full casefold (C + F). - */ -extern const struct utf8data *utf8nfdi(unsigned int maxage); -extern const struct utf8data *utf8nfdicf(unsigned int maxage); +int utf8version_is_supported(const struct unicode_map *um, unsigned int version); /* * Determine the length of the normalized from of the string, @@ -78,4 +60,24 @@ int utf8ncursor(struct utf8cursor *u8c, const struct unicode_map *um, */ extern int utf8byte(struct utf8cursor *u8c); +struct utf8data { + unsigned int maxage; + unsigned int offset; +}; + +struct utf8data_table { + const unsigned int *utf8agetab; + int utf8agetab_size; + + const struct utf8data *utf8nfdicfdata; + int utf8nfdicfdata_size; + + const struct utf8data *utf8nfdidata; + int utf8nfdidata_size; + + const unsigned char *utf8data; +}; + +extern struct utf8data_table utf8_data_table; + #endif /* UTF8NORM_H */ diff --git a/include/linux/unicode.h b/include/linux/unicode.h index 3e502c7456e8..2b3849e7cd64 100644 --- a/include/linux/unicode.h +++ b/include/linux/unicode.h @@ -6,6 +6,7 @@ #include struct utf8data; +struct utf8data_table; /* Encoding a unicode version number as a single unsigned int. */ #define UNICODE_MAJ_SHIFT (16) @@ -35,6 +36,7 @@ enum utf8_normalization { struct unicode_map { unsigned int version; const struct utf8data *ntab[UTF8_NMAX]; + const struct utf8data_table *tables; }; int utf8_validate(const struct unicode_map *um, const struct qstr *str); From patchwork Wed Aug 18 14:06:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12444577 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A62D3C4320E for ; Wed, 18 Aug 2021 14:24:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8A34D610CE for ; Wed, 18 Aug 2021 14:24:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239034AbhHROYi (ORCPT ); Wed, 18 Aug 2021 10:24:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48674 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239958AbhHROYX (ORCPT ); Wed, 18 Aug 2021 10:24:23 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38145C0612A3; Wed, 18 Aug 2021 07:22:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=EItct/4LybJFgbzlVrgpqSv7WDkE+DpQx7uqSXk8vHY=; b=Oi9YlwS38JMx8g2TCuOBCZ/rGs FvpX/QMst217W1jvJ/yiGztjw5E2p9zV9VDhXbodHOXs9Gb2Ka2LUWolbL4+Q0sF+deWk9K6/lRAI RN76vlgkG69i1yiVq0wX/GGS8/8xmU24iFYX7fdR23FRDtBRYs7N2Wvcfs6U2eo6C0tcQ8+USY4bn 03ZTm/qeArw87v+PRoTkOmIAuoruSYRtkoZPK/J0QMITKJd1ZX9FxM+eWsrfruc5qodR2tx5vNkWA f6spZbtYv/Xs2ApMtsuX9sNdXziaMqzs4Eqm1GMLiGKD9xdioADPx4Op+s39mNMwQLAnzgS8LDAHY LtaUuS0Q==; Received: from [2001:4bb8:188:1b1:5a9e:9f39:5a86:b20c] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mGMRk-003vEu-0m; Wed, 18 Aug 2021 14:21:52 +0000 From: Christoph Hellwig To: Gabriel Krisman Bertazi Cc: Shreeya Patel , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: [PATCH 11/11] unicode: only export internal symbols for the selftests Date: Wed, 18 Aug 2021 16:06:51 +0200 Message-Id: <20210818140651.17181-12-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818140651.17181-1-hch@lst.de> References: <20210818140651.17181-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org The exported symbols in utf8-norm.c are not needed for normal file system consumers, so move them to conditional _GPL exports just for the selftest. Signed-off-by: Christoph Hellwig --- fs/unicode/utf8-norm.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/unicode/utf8-norm.c b/fs/unicode/utf8-norm.c index 829c7e2ad764..768f8ab448b8 100644 --- a/fs/unicode/utf8-norm.c +++ b/fs/unicode/utf8-norm.c @@ -17,7 +17,6 @@ int utf8version_is_supported(const struct unicode_map *um, unsigned int version) } return 0; } -EXPORT_SYMBOL(utf8version_is_supported); /* * UTF-8 valid ranges. @@ -407,7 +406,6 @@ ssize_t utf8nlen(const struct unicode_map *um, enum utf8_normalization n, } return ret; } -EXPORT_SYMBOL(utf8nlen); /* * Set up an utf8cursor for use by utf8byte(). @@ -442,7 +440,6 @@ int utf8ncursor(struct utf8cursor *u8c, const struct unicode_map *um, return -1; return 0; } -EXPORT_SYMBOL(utf8ncursor); /* * Get one byte from the normalized form of the string described by u8c. @@ -588,4 +585,10 @@ int utf8byte(struct utf8cursor *u8c) } } } -EXPORT_SYMBOL(utf8byte); + +#ifdef CONFIG_UNICODE_NORMALIZATION_SELFTEST_MODULE +EXPORT_SYMBOL_GPL(utf8version_is_supported); +EXPORT_SYMBOL_GPL(utf8nlen); +EXPORT_SYMBOL_GPL(utf8ncursor); +EXPORT_SYMBOL_GPL(utf8byte); +#endif