[v8,4/4] fs: unicode: Add utf8 module and a unicode layer

utf8data.h_shipped contains a large, auto-generated table: the decoding
trie used by the unicode normalization functions.
We can avoid carrying this table in the kernel unless the filesystem
requires it during the boot process.

Hence, make the UTF-8 encoding loadable by converting it into a module,
and also add a built-in UTF-8 option so that it can be compiled into the
kernel whenever the filesystem requires it.

Modify unicode-core so that it acts as a layer for the unicode
subsystem. It is responsible for loading the UTF-8 module and
accessing its functions.

Currently, only the UTF-8 encoding is supported, but if other encodings
are supported in the future, the layer would be responsible for
loading the desired encoding module.

Also, indirect calls through function pointers are slow, so use static
calls to avoid the overhead of repeated indirect calls. Static calls
improve performance by calling the functions directly instead of going
through a pointer.
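
The default-target behaviour described in the changelog (return an error
while no module is loaded, restore the default on unload) can be sketched
in userspace. This is only an illustrative model, not the code in this
patch: a plain function pointer stands in for the static call, since real
static calls are patched in place by the kernel, and all names below
(validate_fn, layer_register, and so on) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature standing in for one UTF-8 operation. */
typedef int (*validate_fn)(const char *str, size_t len);

/* Default target: fail cleanly while no encoding module is registered. */
static int validate_default(const char *str, size_t len)
{
	(void)str;
	(void)len;
	return -EIO;
}

/* Stand-in for the implementation a loaded utf8 module would provide. */
static int validate_utf8_stub(const char *str, size_t len)
{
	(void)str;
	return len > 0 ? 0 : -EINVAL;
}

/* In this model a plain pointer plays the role of the static call target. */
static validate_fn validate_target = validate_default;

/* Called when the (hypothetical) module loads. */
static void layer_register(validate_fn fn)
{
	assert(fn != NULL);
	validate_target = fn;
}

/* Called when the module unloads: restore the default target. */
static void layer_unregister(void)
{
	validate_target = validate_default;
}

/* Wrapper that callers use; they never see the target directly. */
static int layer_validate(const char *str, size_t len)
{
	return validate_target(str, len);
}
```

In this model, layer_validate() returns -EIO before layer_register() and
again after layer_unregister(), so callers get an error instead of a NULL
pointer dereference when the module is absent.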

Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
---
Changes in v8
  - Improve the commit messages to better explain the use of the built-in option.
  - Improve the help text in Kconfig to avoid contradictory statements.
  - Make spinlock definition static.
  - Use int instead of bool to avoid gcc warning.
  - Add a comment describing why we are using try_then_request_module()
    instead of request_module()

Changes in v7
  - Update the help text in Kconfig
  - Handle the unicode_load_static_call function failure by decrementing
    the reference.
  - Correct the code for handling built-in utf8 option as well.
  - Correct the synchronization for accessing utf8mod.
  - Make changes to unicode_unload() for handling the situation where
    utf8mod != NULL and um == NULL.
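
The "decrement the reference on failure" item above can be modelled in
userspace as follows. This is a sketch under stated assumptions, not the
code in this patch: struct fake_module and the fake_* helpers are
hypothetical stand-ins for a module and for try_module_get() /
module_put(), and no locking is shown.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a module with a reference count. */
struct fake_module {
	bool live;	/* false once unloading has started */
	int refcnt;
};

/* Analogue of try_module_get(): refuse a reference if the module is going away. */
static bool fake_try_get(struct fake_module *m)
{
	if (m == NULL || !m->live)
		return false;
	m->refcnt++;
	return true;
}

/* Analogue of module_put(). */
static void fake_put(struct fake_module *m)
{
	if (m != NULL && m->refcnt > 0)
		m->refcnt--;
}

/* Hypothetical setup step that may fail after the reference is taken. */
static int fake_setup(bool should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Take a reference, run setup, and drop the reference if setup fails. */
static int fake_load(struct fake_module *m, bool setup_fails)
{
	int ret;

	if (!fake_try_get(m))
		return -ENODEV;

	ret = fake_setup(setup_fails);
	if (ret) {
		fake_put(m);	/* do not leak the reference on failure */
		return ret;
	}
	return 0;
}
```

A successful fake_load() leaves the count raised by one, while a failed
setup leaves it unchanged, which is the property the v7 fix is after.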

Changes in v6
  - Add spinlock to protect utf8mod and avoid NULL pointer
    dereference.
  - Change the static call function names to be consistent with
    kernel coding style.
  - Merge the unicode_load_module function into unicode_load, as a
    separate function is not really needed.
  - Use try_then_request_module instead of request_module to avoid
    loading the module when it is already loaded.
  - Improve the commit message.

Changes in v5
  - Rename global variables and default static call functions for better
    understanding
  - Make only config UNICODE_UTF8 visible, with config UNICODE always
    enabled whenever UNICODE_UTF8 is enabled.
  - Improve the documentation for Kconfig
  - Improve the commit message.

Changes in v4
  - Return error from the static calls instead of doing nothing and
    succeeding even without loading the module.
  - Remove the complete usage of utf8_ops and use static calls at all
    places.
  - Restore the static calls to default values when module is unloaded.
  - Decrement the reference of module after calling the unload function.
  - Remove spinlock as there will be no race conditions after removing
    utf8_ops.

Changes in v3
  - Add a patch which checks if utf8 is loaded before calling utf8_unload()
    in ext4 and f2fs filesystems
  - Return error if strscpy() returns value < 0
  - Correct the conditions to prevent NULL pointer dereference while
    accessing functions via utf8_ops variable.
  - Add spinlock to avoid race conditions.
  - Use static_call() for preventing speculative execution attacks.

Changes in v2
  - Remove the duplicate file from the last patch.
  - Make the wrapper functions inline.
  - Remove msleep and use try_module_get() and module_put()
    for ensuring that module is loaded correctly and also
    doesn't get unloaded while in use.
  - Resolve the warning reported by kernel test robot.
  - Resolve all the checkpatch.pl warnings.

 fs/unicode/Kconfig        |  26 +++-
 fs/unicode/Makefile       |   5 +-
 fs/unicode/unicode-core.c | 310 +++++++++++++++-----------------------
 fs/unicode/unicode-utf8.c | 264 ++++++++++++++++++++++++++++++++
 include/linux/unicode.h   |  96 ++++++++++--
 5 files changed, 496 insertions(+), 205 deletions(-)
 create mode 100644 fs/unicode/unicode-utf8.c

Message ID	20210423205136.1015456-5-shreeya.patel@collabora.com (mailing list archive)
State	New, archived
From: Shreeya Patel <shreeya.patel@collabora.com>
To: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org, chao@kernel.org, krisman@collabora.com, ebiggers@google.com, drosen@google.com, ebiggers@kernel.org, yuchao0@huawei.com
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, kernel@collabora.com, andre.almeida@collabora.com
Subject: [PATCH v8 4/4] fs: unicode: Add utf8 module and a unicode layer
Date: Sat, 24 Apr 2021 02:21:36 +0530
Message-Id: <20210423205136.1015456-5-shreeya.patel@collabora.com>
In-Reply-To: <20210423205136.1015456-1-shreeya.patel@collabora.com>
References: <20210423205136.1015456-1-shreeya.patel@collabora.com>
Series: Make UTF-8 encoding loadable
  [v8,0/4] Make UTF-8 encoding loadable
  [v8,1/4] fs: unicode: Use strscpy() instead of strncpy()
  [v8,2/4] fs: unicode: Rename function names from utf8 to unicode
  [v8,3/4] fs: unicode: Rename utf8-core file to unicode-core
  [v8,4/4] fs: unicode: Add utf8 module and a unicode layer
