From: André Almeida
Date: Mon, 21 Oct 2024 13:37:22 -0300
Subject: [PATCH v8 6/9] tmpfs: Add casefold lookup support
Message-Id: <20241021-tonyk-tmpfs-v8-6-f443d5814194@igalia.com>
References: <20241021-tonyk-tmpfs-v8-0-f443d5814194@igalia.com>
In-Reply-To: <20241021-tonyk-tmpfs-v8-0-f443d5814194@igalia.com>
To: Gabriel Krisman Bertazi, Alexander Viro, Christian Brauner, Jan Kara, Theodore Ts'o, Andreas Dilger, Hugh Dickins, Andrew Morton, Jonathan Corbet, smcv@collabora.com
Cc: kernel-dev@igalia.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, Gabriel Krisman Bertazi, Gabriel Krisman Bertazi, André Almeida

Enable casefold lookup in tmpfs, based on the encoding defined by userspace. This means that instead of comparing file names byte by byte, tmpfs compares their case-insensitive (casefolded) Unicode equivalents.

* Dcache handling

Case-insensitive dentries need special handling. First of all, we currently invalidate every negative casefold dentry. This is needed because the VFS has no proper support for case-insensitive negative dentries: it could incorrectly reuse a previous filename for a new file that has a casefold match. For instance, this could happen:

$ mkdir DIR
$ rm -r DIR
$ mkdir dir
$ ls
DIR/

Here the new directory is listed as DIR/ rather than dir/, because the name of the old dentry was reused. This would be perceived as an inconsistency from userspace's point of view: even though we match files in a case-insensitive manner, we still preserve whatever the initial filename was.

Along with that, tmpfs keeps only one dentry in the dcache for a set of casefold-equivalent names (the first one used), preventing duplicate dentries. The d_compare() used for casefolded directories works on normalized strings, so the name being looked up is compared against the normalized name of the existing file, which achieves a casefolded lookup.

* Enabling casefold via mount options

Most filesystems store their data on disk, so the casefold option needs to be enabled when creating the filesystem on a device (via mkfs). However, tmpfs is a RAM-backed filesystem: there is no on-disk information and thus no mkfs step to record casefold settings.
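To make the dcache contract from the "Dcache handling" section concrete: the patch installs d_hash = generic_ci_d_hash and d_compare = generic_ci_d_compare, and a casefolded lookup only works if the two agree, i.e. names that compare equal must also hash to the same bucket. The user-space sketch below models that with a toy open-addressing table. It is a hypothetical illustration, not kernel code: ci_hash(), ci_equal(), ci_lookup(), ci_insert() and the table layout are invented for this example, and it folds only ASCII letters with tolower() instead of doing full Unicode casefolding and normalization.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space model of a case-insensitive name cache.
 * Simplification: only ASCII letters are folded with tolower(); the
 * kernel instead casefolds and normalizes full Unicode names.
 */
#define NBUCKETS 16
#define MAXNAME 64

struct entry {
	char name[MAXNAME]; /* spelling used when the entry was created */
	bool used;
};

static struct entry table[NBUCKETS];

/* Role of d_hash: hash the casefolded name, so equivalent names collide. */
static unsigned int ci_hash(const char *name)
{
	unsigned int h = 5381;

	for (; *name; name++)
		h = h * 33 + (unsigned int)tolower((unsigned char)*name);
	return h % NBUCKETS;
}

/* Role of d_compare: compare casefolded names; must agree with ci_hash(). */
static bool ci_equal(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return false;
	return *a == *b; /* equal only if both strings ended together */
}

/* Return the cached spelling equivalent to name, or NULL if none. */
static const char *ci_lookup(const char *name)
{
	unsigned int i = ci_hash(name);

	for (unsigned int n = 0; n < NBUCKETS; n++, i = (i + 1) % NBUCKETS) {
		if (!table[i].used)
			return NULL;
		if (ci_equal(table[i].name, name))
			return table[i].name;
	}
	return NULL;
}

/* Insert name unless an equivalent exists; the first spelling wins. */
static const char *ci_insert(const char *name)
{
	const char *existing = ci_lookup(name);
	unsigned int i = ci_hash(name);

	assert(strlen(name) < MAXNAME);
	if (existing)
		return existing;
	for (unsigned int n = 0; n < NBUCKETS; n++, i = (i + 1) % NBUCKETS) {
		if (!table[i].used) {
			snprintf(table[i].name, MAXNAME, "%s", name);
			table[i].used = true;
			return table[i].name;
		}
	}
	return NULL; /* table full */
}
```

For example, after ci_insert("README"), both ci_insert("readme") and ci_lookup("ReadMe") return the stored "README" spelling, mirroring how only the first equivalent name is kept. The toy table never removes entries; the patch sidesteps the harder problem of case-insensitive negative dentries by invalidating them in shmem_unlink() and not adding them in simple_lookup().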
For tmpfs, add casefold mount options instead. Userspace can then enable casefold support for a mount point using:

$ mount -t tmpfs -o casefold=utf8-12.1.0 fs_name mount_dir/

Userspace must specify which version of the Unicode standard to use. The available versions depend on what the kernel Unicode subsystem supports. If `casefold` is given without a version, the latest supported UTF-8 version is loaded.

And for strict encoding:

$ mount -t tmpfs -o casefold=utf8-12.1.0,strict_encoding fs_name mount_dir/

Strict encoding means that tmpfs will refuse to create files whose names are invalid UTF-8 sequences. When this option is not enabled, any invalid sequence is treated as an opaque byte sequence, ignoring the encoding, so such names cannot be looked up in a case-insensitive way. Using strict_encoding without casefold fails the mount.

* Check for casefold dirs on simple_lookup()

On simple_lookup(), do not add negative dentries for lookups in casefolded directories. Currently, the VFS does not support case-insensitive negative dentries, and they can create inconsistencies in the filesystem. Prevent such dentries from being created in the first place.

Reviewed-by: Gabriel Krisman Bertazi
Reviewed-by: Gabriel Krisman Bertazi
Signed-off-by: André Almeida
---
Changes from v7:
- Consistently guard the `encoding` and `strict_encoding` fields of struct shmem_options, so those fields only exist with CONFIG_UNICODE.

Changes from v6:
- Dropped patch "tmpfs: Always set simple_dentry_operations as dentry ops"
- Placed generic_ci_validate_strict_name() before inode creation again

Changes from v4:
- Squashed commit "Check for casefold dirs on simple_lookup()" into this one
- Fail to mount if strict_encoding is used without encoding
- tmpfs doesn't support fscrypt, so I dropped the d_revalidate line

Changes from v3:
- Simplified shmem_parse_opt_casefold()
- sb->s_d_op is set to shmem_ci_dentry_ops at mount time
- Got rid of shmem_lookup(), modified simple_lookup()

Changes from v2:
- simple_lookup() now sets d_ops
- Reworked shmem_parse_opt_casefold()
- If `mount -o casefold` has no parameter, load the latest UTF-8 version
- Use (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir)) when possible
---
 fs/libfs.c | 4 ++
 mm/shmem.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 7b290404c5f9901010ada2f921a214dbc94eb5fa..a168ece5cc61b74114f537f5b7b8a07f2d48b2aa 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -77,6 +77,10 @@ struct dentry *simple_lookup(struct inode *dir, struct dentry *dentry, unsigned
 		return ERR_PTR(-ENAMETOOLONG);
 	if (!dentry->d_sb->s_d_op)
 		d_set_d_op(dentry, &simple_dentry_operations);
+
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		return NULL;
+
 	d_add(dentry, NULL);
 	return NULL;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index c5adb987b23cf9ba5b8117ece2b467a434f7c0a3..f26488ff3d6ae1abb9b63d55ca74909249dbf4eb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include
 #include "swap.h"
 
 static struct vfsmount *shm_mnt __ro_after_init;
@@ -123,6 +124,10 @@ struct shmem_options {
 	bool noswap;
 	unsigned short quota_types;
 	struct shmem_quota_limits qlimits;
+#if IS_ENABLED(CONFIG_UNICODE)
+	struct unicode_map *encoding;
+	bool strict_encoding;
+#endif
 #define SHMEM_SEEN_BLOCKS 1
 #define SHMEM_SEEN_INODES 2
 #define SHMEM_SEEN_HUGE 4
@@ -3565,6 +3570,9 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
 	struct inode *inode;
 	int error;
 
+	if (!generic_ci_validate_strict_name(dir, &dentry->d_name))
+		return -EINVAL;
+
 	inode = shmem_get_inode(idmap, dir->i_sb, dir, mode, dev, VM_NORESERVE);
 	if (IS_ERR(inode))
 		return PTR_ERR(inode);
@@ -3584,7 +3592,12 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
 	dir->i_size += BOGO_DIRENT_SIZE;
 	inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
 	inode_inc_iversion(dir);
-	d_instantiate(dentry, inode);
+
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_add(dentry, inode);
+	else
+		d_instantiate(dentry, inode);
+
 	dget(dentry); /* Extra count - pin the dentry in core */
 	return error;
 
@@ -3675,7 +3688,10 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir,
 	inc_nlink(inode);
 	ihold(inode);	/* New dentry reference */
 	dget(dentry);	/* Extra pinning count for the created dentry */
-	d_instantiate(dentry, inode);
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_add(dentry, inode);
+	else
+		d_instantiate(dentry, inode);
 out:
 	return ret;
 }
@@ -3695,6 +3711,14 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 	inode_inc_iversion(dir);
 	drop_nlink(inode);
 	dput(dentry);	/* Undo the count from "create" - does all the work */
+
+	/*
+	 * For now, VFS can't deal with case-insensitive negative dentries, so
+	 * we invalidate them
+	 */
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_invalidate(dentry);
+
 	return 0;
 }
 
@@ -3839,7 +3863,10 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
 	dir->i_size += BOGO_DIRENT_SIZE;
 	inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
 	inode_inc_iversion(dir);
-	d_instantiate(dentry, inode);
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_add(dentry, inode);
+	else
+		d_instantiate(dentry, inode);
 	dget(dentry);
 	return 0;
 
@@ -4192,6 +4219,9 @@ enum shmem_param {
 	Opt_usrquota_inode_hardlimit,
 	Opt_grpquota_block_hardlimit,
 	Opt_grpquota_inode_hardlimit,
+	Opt_casefold_version,
+	Opt_casefold,
+	Opt_strict_encoding,
 };
 
 static const struct constant_table shmem_param_enums_huge[] = {
@@ -4223,9 +4253,54 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
 	fsparam_string("grpquota_block_hardlimit", Opt_grpquota_block_hardlimit),
 	fsparam_string("grpquota_inode_hardlimit", Opt_grpquota_inode_hardlimit),
 #endif
+	fsparam_string("casefold",	Opt_casefold_version),
+	fsparam_flag  ("casefold",	Opt_casefold),
+	fsparam_flag  ("strict_encoding", Opt_strict_encoding),
 	{}
 };
 
+#if IS_ENABLED(CONFIG_UNICODE)
+static int shmem_parse_opt_casefold(struct fs_context *fc, struct fs_parameter *param,
+				    bool latest_version)
+{
+	struct shmem_options *ctx = fc->fs_private;
+	int version = UTF8_LATEST;
+	struct unicode_map *encoding;
+	char *version_str = param->string + 5;
+
+	if (!latest_version) {
+		if (strncmp(param->string, "utf8-", 5))
+			return invalfc(fc, "Only UTF-8 encodings are supported "
+				       "in the format: utf8-");
+
+		version = utf8_parse_version(version_str);
+		if (version < 0)
+			return invalfc(fc, "Invalid UTF-8 version: %s", version_str);
+	}
+
+	encoding = utf8_load(version);
+
+	if (IS_ERR(encoding)) {
+		return invalfc(fc, "Failed loading UTF-8 version: utf8-%u.%u.%u\n",
+			       unicode_major(version), unicode_minor(version),
+			       unicode_rev(version));
+	}
+
+	pr_info("tmpfs: Using encoding : utf8-%u.%u.%u\n",
+		unicode_major(version), unicode_minor(version), unicode_rev(version));
+
+	ctx->encoding = encoding;
+
+	return 0;
+}
+#else
+static int shmem_parse_opt_casefold(struct fs_context *fc, struct fs_parameter *param,
+				    bool latest_version)
+{
+	return invalfc(fc, "tmpfs: Kernel not built with CONFIG_UNICODE\n");
+}
+#endif
+
 static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 {
 	struct shmem_options *ctx = fc->fs_private;
@@ -4384,6 +4459,17 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 				       "Group quota inode hardlimit too large.");
 		ctx->qlimits.grpquota_ihardlimit = size;
 		break;
+	case Opt_casefold_version:
+		return shmem_parse_opt_casefold(fc, param, false);
+	case Opt_casefold:
+		return shmem_parse_opt_casefold(fc, param, true);
+	case Opt_strict_encoding:
+#if IS_ENABLED(CONFIG_UNICODE)
+		ctx->strict_encoding = true;
+		break;
+#else
+		return invalfc(fc, "tmpfs: Kernel not built with CONFIG_UNICODE\n");
+#endif
 	}
 
 	return 0;
@@ -4613,6 +4699,11 @@ static void shmem_put_super(struct super_block *sb)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 
+#if IS_ENABLED(CONFIG_UNICODE)
+	if (sb->s_encoding)
+		utf8_unload(sb->s_encoding);
+#endif
+
 #ifdef CONFIG_TMPFS_QUOTA
 	shmem_disable_quotas(sb);
 #endif
@@ -4623,6 +4714,14 @@ static void shmem_put_super(struct super_block *sb)
 	sb->s_fs_info = NULL;
 }
 
+#if IS_ENABLED(CONFIG_UNICODE) && defined(CONFIG_TMPFS)
+static const struct dentry_operations shmem_ci_dentry_ops = {
+	.d_hash = generic_ci_d_hash,
+	.d_compare = generic_ci_d_compare,
+	.d_delete = always_delete_dentry,
+};
+#endif
+
 static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 {
 	struct shmem_options *ctx = fc->fs_private;
@@ -4657,9 +4756,25 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	}
 	sb->s_export_op = &shmem_export_ops;
 	sb->s_flags |= SB_NOSEC | SB_I_VERSION;
+
+#if IS_ENABLED(CONFIG_UNICODE)
+	if (!ctx->encoding && ctx->strict_encoding) {
+		pr_err("tmpfs: strict_encoding option without encoding is forbidden\n");
+		error = -EINVAL;
+		goto failed;
+	}
+
+	if (ctx->encoding) {
+		sb->s_encoding = ctx->encoding;
+		sb->s_d_op = &shmem_ci_dentry_ops;
+		if (ctx->strict_encoding)
+			sb->s_encoding_flags = SB_ENC_STRICT_MODE_FL;
+	}
+#endif
+
 #else
 	sb->s_flags |= SB_NOUSER;
-#endif
+#endif /* CONFIG_TMPFS */
 	sbinfo->max_blocks = ctx->blocks;
 	sbinfo->max_inodes = ctx->inodes;
 	sbinfo->free_ispace = sbinfo->max_inodes * BOGO_INODE_SIZE;
@@ -4933,6 +5048,10 @@ int shmem_init_fs_context(struct fs_context *fc)
 	ctx->uid = current_fsuid();
 	ctx->gid = current_fsgid();
 
+#if IS_ENABLED(CONFIG_UNICODE)
+	ctx->encoding = NULL;
+#endif
+
 	fc->fs_private = ctx;
 	fc->ops = &shmem_fs_context_ops;
 	return 0;
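The casefold mount options described in the commit message reduce to three checks: a bare `casefold` selects the latest version, `casefold=utf8-X.Y.Z` must parse to a valid version, and `strict_encoding` is rejected when no encoding was chosen. The user-space sketch below models those checks under stated assumptions; it is not the kernel code. parse_component(), parse_casefold_version(), apply_option(), validate_opts(), struct ci_opts, PACK_VERSION() and SKETCH_LATEST are hypothetical names. PACK_VERSION() follows the layout of the kernel's UNICODE_AGE() macro (major << 16 | minor << 8 | revision), and SKETCH_LATEST = 12.1.0 is an assumed stand-in for UTF8_LATEST. The sketch also shows why the parsed version belongs in a signed int: with `unsigned int version`, a `version < 0` error check can never be true.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch of the casefold mount-option checks, not the
 * kernel code. PACK_VERSION() follows the UNICODE_AGE() layout;
 * SKETCH_LATEST is an assumed stand-in for UTF8_LATEST.
 */
#define PACK_VERSION(maj, min, rev) (((maj) << 16) | ((min) << 8) | (rev))
#define SKETCH_LATEST PACK_VERSION(12, 1, 0)

struct ci_opts {
	bool casefold;        /* "casefold" or "casefold=..." was given */
	int version;          /* packed Unicode version */
	bool strict_encoding; /* "strict_encoding" was given */
};

/* Parse one decimal component in [0, 255] and advance *p; -1 on error. */
static int parse_component(const char **p)
{
	int val = 0;

	if (!isdigit((unsigned char)**p))
		return -1;
	while (isdigit((unsigned char)**p)) {
		val = val * 10 + (**p - '0');
		if (val > 255)
			return -1;
		(*p)++;
	}
	return val;
}

/*
 * Parse "utf8-MAJ.MIN.REV" into a packed version, or return -1.
 * The result is a signed int so that callers can test "< 0"; with an
 * unsigned variable that test would always be false.
 */
static int parse_casefold_version(const char *s)
{
	int maj, min, rev;

	if (strncmp(s, "utf8-", 5) != 0)
		return -1;
	s += 5;
	if ((maj = parse_component(&s)) < 0 || *s++ != '.')
		return -1;
	if ((min = parse_component(&s)) < 0 || *s++ != '.')
		return -1;
	if ((rev = parse_component(&s)) < 0 || *s != '\0')
		return -1;
	return PACK_VERSION(maj, min, rev);
}

/* Handle one option, roughly like the casefold cases in shmem_parse_one(). */
static int apply_option(struct ci_opts *o, const char *opt)
{
	assert(o != NULL && opt != NULL);
	if (strcmp(opt, "casefold") == 0) {
		o->casefold = true;
		o->version = SKETCH_LATEST; /* no version given: use the latest */
		return 0;
	}
	if (strncmp(opt, "casefold=", 9) == 0) {
		int v = parse_casefold_version(opt + 9);

		if (v < 0)
			return -1;
		o->casefold = true;
		o->version = v;
		return 0;
	}
	if (strcmp(opt, "strict_encoding") == 0) {
		o->strict_encoding = true;
		return 0;
	}
	return -1; /* unknown option */
}

/* Like the check in shmem_fill_super(): strict_encoding needs casefold. */
static int validate_opts(const struct ci_opts *o)
{
	return (o->strict_encoding && !o->casefold) ? -1 : 0;
}
```

The split mirrors the patch: shmem_parse_one() handles each option as it is seen, so `strict_encoding` may legitimately appear before `casefold=...`, while the "strict_encoding without encoding" check only runs in shmem_fill_super(), once all options are known.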