[v8,4/4] fs: unicode: Add utf8 module and a unicode layer

Message ID 20210423205136.1015456-5-shreeya.patel@collabora.com (mailing list archive)
State New, archived
Series Make UTF-8 encoding loadable

Commit Message

Shreeya Patel April 23, 2021, 8:51 p.m. UTC
utf8data.h_shipped has a large database table which is an auto-generated
decodification trie for the unicode normalization functions.
We can avoid carrying this large table in the kernel unless it is required
by the filesystem during boot process.

Hence, make UTF-8 encoding loadable by converting it into a module and
also add built-in UTF-8 support option for compiling it into the
kernel whenever required by the filesystem.

Modify the file called unicode-core, which will act as a layer for the
unicode subsystem. It will be responsible for loading the UTF-8 module
and accessing its functions.

Currently, only UTF-8 encoding is supported but if any other encodings
are supported in future then the layer file would be responsible for
loading the desired encoding module.

Also, indirect calls through function pointers are slow, so use static
calls to avoid the overhead of repeated indirect calls. Static calls
improve performance by calling the functions directly rather than
indirectly.

Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
---
Changes in v8
  - Improve the commit messages to better understand the use of built-in option.
  - Improve the help text in Kconfig for avoiding contradictory statements.
  - Make spinlock definition static.
  - Use int instead of bool to avoid gcc warning.
  - Add a comment describing why we are using try_then_request_module()
    instead of request_module()

Changes in v7
  - Update the help text in Kconfig
  - Handle the unicode_load_static_call function failure by decrementing
    the reference.
  - Correct the code for handling built-in utf8 option as well.
  - Correct the synchronization for accessing utf8mod.
  - Make changes to unicode_unload() for handling the situation where
    utf8mod != NULL and um == NULL.

Changes in v6
  - Add spinlock to protect utf8mod and avoid NULL pointer
    dereference.
  - Change the static call function names for being consistent with
    kernel coding style.
  - Merge the unicode_load_module function with unicode_load as it is
    not really needed to have a separate function.
  - Use try_then_module_get instead of module_get to avoid loading the
    module even when it is already loaded.
  - Improve the commit message.

Changes in v5
  - Rename global variables and default static call functions for better
    understanding
  - Make only config UNICODE_UTF8 visible and config UNICODE to be always
    enabled provided UNICODE_UTF8 is enabled.  
  - Improve the documentation for Kconfig
  - Improve the commit message.
 
Changes in v4
  - Return error from the static calls instead of doing nothing and
    succeeding even without loading the module.
  - Remove the complete usage of utf8_ops and use static calls at all
    places.
  - Restore the static calls to default values when module is unloaded.
  - Decrement the reference of module after calling the unload function.
  - Remove spinlock as there will be no race conditions after removing
    utf8_ops.

Changes in v3
  - Add a patch which checks if utf8 is loaded before calling utf8_unload()
    in ext4 and f2fs filesystems
  - Return error if strscpy() returns value < 0
  - Correct the conditions to prevent NULL pointer dereference while
    accessing functions via utf8_ops variable.
  - Add spinlock to avoid race conditions.
  - Use static_call() for preventing speculative execution attacks.

Changes in v2
  - Remove the duplicate file from the last patch.
  - Make the wrapper functions inline.
  - Remove msleep and use try_module_get() and module_put()
    for ensuring that module is loaded correctly and also
    doesn't get unloaded while in use.
  - Resolve the warning reported by kernel test robot.
  - Resolve all the checkpatch.pl warnings.
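
The changelog entries about try_module_get() and module_put() come down to
one rule: a caller may only take a reference while the backend is still
alive. The following is a minimal userspace sketch of that rule, assuming a
hypothetical struct backend whose counter holds one base reference while it
is loaded. It uses C11 atomics rather than the kernel's module refcounting
and spinlock, so it only illustrates the idea and is not the patch's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a module: refs == 0 means not loaded or going away. */
struct backend {
	atomic_int refs;
};

/* Take a reference only if the backend is still alive (refs > 0). */
static bool backend_tryget(struct backend *b)
{
	int old = atomic_load(&b->refs);

	while (old > 0) {
		/* On failure the CAS reloads 'old', and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&b->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously obtained, or the loader's base reference. */
static void backend_put(struct backend *b)
{
	atomic_fetch_sub(&b->refs, 1);
}

/* Usage walk-through: get fails before load and after unload. */
static void backend_demo(void)
{
	struct backend b;

	atomic_init(&b.refs, 0);
	assert(!backend_tryget(&b));	/* not loaded yet */

	atomic_store(&b.refs, 1);	/* "load": base reference */
	assert(backend_tryget(&b));	/* a user takes a reference */
	backend_put(&b);		/* the user is done */

	backend_put(&b);		/* "unload": drop the base reference */
	assert(!backend_tryget(&b));	/* late callers are refused */
}
```

The point of the "increment unless zero" loop is that a caller can never
revive a backend that has already started to go away, which is the
situation the v2 and v6 changes guard against.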

 fs/unicode/Kconfig        |  26 +++-
 fs/unicode/Makefile       |   5 +-
 fs/unicode/unicode-core.c | 310 +++++++++++++++-----------------------
 fs/unicode/unicode-utf8.c | 264 ++++++++++++++++++++++++++++++++
 include/linux/unicode.h   |  96 ++++++++++--
 5 files changed, 496 insertions(+), 205 deletions(-)
 create mode 100644 fs/unicode/unicode-utf8.c
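
As a rough illustration of the layering described in the commit message,
the sketch below shows a wrapper that dispatches to a default stub
returning -EIO until a backend registers itself, and that is restored to
the stub on unload. It is a userspace approximation: it uses a plain
function pointer, whereas the patch uses static calls, which patch the call
sites instead of loading a pointer. The names layer_validate,
backend_register, backend_unregister and validate_ascii are hypothetical,
and the ASCII check is only a stand-in for real UTF-8 validation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature; the real wrappers take struct unicode_map and struct qstr. */
typedef int (*validate_fn)(const char *name, size_t len);

/* Default target used while no encoding backend is registered. */
static int validate_default(const char *name, size_t len)
{
	(void)name;
	(void)len;
	return -EIO;
}

/* Stand-in backend: accepts only 7-bit ASCII. Not real UTF-8 validation. */
static int validate_ascii(const char *name, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if ((unsigned char)name[i] > 0x7f)
			return -1;
	return 0;
}

/* A plain function pointer plays the role of the static call target here. */
static validate_fn validate_target = validate_default;

static void backend_register(validate_fn fn)
{
	validate_target = fn;
}

static void backend_unregister(void)
{
	validate_target = validate_default;
}

/* The layer's entry point: callers never see which backend is active. */
static int layer_validate(const char *name, size_t len)
{
	return validate_target(name, len);
}

/* Usage walk-through: error before load, backend result after, error again after unload. */
static void layer_demo(void)
{
	assert(layer_validate("abc", 3) == -EIO);
	backend_register(validate_ascii);
	assert(layer_validate("abc", 3) == 0);
	assert(layer_validate("\xc3\xa9", 2) == -1);
	backend_unregister();
	assert(layer_validate("abc", 3) == -EIO);
}
```

Returning an error from the default target, rather than silently
succeeding, matches the v4 changelog entry above.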

Comments

Christoph Hellwig April 27, 2021, 6:29 a.m. UTC | #1
On Sat, Apr 24, 2021 at 02:21:36AM +0530, Shreeya Patel wrote:
> utf8data.h_shipped has a large database table which is an auto-generated
> decodification trie for the unicode normalization functions.
> We can avoid carrying this large table in the kernel unless it is required
> by the filesystem during boot process.
> 
> Hence, make UTF-8 encoding loadable by converting it into a module and
> also add built-in UTF-8 support option for compiling it into the
> kernel whenever required by the filesystem.

The way this is implemented looks rather awkward.

Given that the large memory usage is for a data table and not for code,
why not treat it as a firmware blob and load it using request_firmware?
Shreeya Patel April 27, 2021, 10:09 a.m. UTC | #2
On 27/04/21 11:59 am, Christoph Hellwig wrote:
> On Sat, Apr 24, 2021 at 02:21:36AM +0530, Shreeya Patel wrote:
>> utf8data.h_shipped has a large database table which is an auto-generated
>> decodification trie for the unicode normalization functions.
>> We can avoid carrying this large table in the kernel unless it is required
>> by the filesystem during boot process.
>>
>> Hence, make UTF-8 encoding loadable by converting it into a module and
>> also add built-in UTF-8 support option for compiling it into the
>> kernel whenever required by the filesystem.
> The way this is implemement looks rather awkward.
>
> Given that the large memory usage is for a data table and not for code,
> why not treat is as a firmware blob and load it using request_firmware?


The utf8 module has not just the data table but also some kernel code.
The big part that we are trying to keep out of the kernel is a tree 
structure that gets traversed based on a key that is the file name.
This is done when issuing a lookup in the filesystem, which has to be 
very fast. So maybe it would not be so good to use request_firmware for
such a core feature.
Theodore Ts'o April 27, 2021, 2:50 p.m. UTC | #3
On Tue, Apr 27, 2021 at 03:39:15PM +0530, Shreeya Patel wrote:
> > > Hence, make UTF-8 encoding loadable by converting it into a module and
> > > also add built-in UTF-8 support option for compiling it into the
> > > kernel whenever required by the filesystem.
> > The way this is implemement looks rather awkward.

I think what's a bit awkward is trying to create an abstraction
separation between the unicode and utf8 layers, just in case, at some
point, we want fs/unicode to support more than just utf8.

I think we're better off being opinionated here, and say that the only
unicode encoding that will be supported by the kernel is UTF-8.
Period.  In which case, we don't need to try to insert this unneeded
abstraction layer.

If you really want to make fs/unicode support more than one
encoding --- say, UTF-16LE, as used by NTFS --- at that point we can
think about what the abstractions should look like.  For example, it
doesn't _actually_ make sense for the data-trie structures to be part
of the utf-8 encoding.  The normalization tables are for Unicode, and
it wouldn't make sense for UTF-16 to have its own normalization
tables, bloating the kernel even more.

It *is* true that the normalization tables have been optimized for
utf-8, because that's what the whole world actually uses; utf-16le is
really a legacy use case.  So presumably, we would probably find a way
to code up the utf-16 functions in a way that used the utf-8 data
tables, even if it wasn't 100% optimal in terms of speed.

But it's probably not worth it at this point.

> > Given that the large memory usage is for a data table and not for code,
> > why not treat is as a firmware blob and load it using request_firmware?
> 
> utf8 module not just has the data table but also has some kernel code.
> The big part that we are trying to keep out of the kernel is a tree
> structure that gets traversed based on a key that is the file name.
> This is done when issuing a lookup in the filesystem, which has to be very
> fast. So maybe it would not be so good to use request_firmware for
> such a core feature.

Speed really isn't a great argument here; the request_firmware is
something that would only need to be done once, when a file system
which requires Unicode normalization and/or case-folding is mounted.

I think the better argument to make is just one of simplicity;
separating the Unicode data table from the kernel adds complexity.  It
also reduces flexibility, since for use cases where it's actually
_preferable_ to have Unicode functionality permanently built-in the
kernel, we now force the use of some kind of initial ramdisk to load a
module before the root file system (which might require Unicode
support) could even be mounted.

The argument *for* making the Unicode table be a loadable firmware is
that it might make it possible to upgrade to a newer version of
Unicode without needing to do a kernel recompile.  On average, Unicode
releases a new version to support new character sets every year or so,
or when there is a Japanese Emperor requiring a new reign name :-).  Usually the
new character sets are for obscure ancient alphabets, and so it's
really not a big deal if the kernel doesn't support, say,
Chorasmian[1] or Dives Akuru[2].  Perhaps people would make a much
bigger deal about new Emoji characters, or new code points for the
Creative Commons symbols.  I'm personally not excited enough to claim
that it's worth the extra complexity, but some people might think so.  :-)

[1] used in Central Asia across Uzbekistan, Kazakhstan, and
Turkmenistan to write an extinct Eastern Iranian language.

[2] historically used in the Maldives until the 20th century.

Of course, using those new Emoji symbols in file names would reduce
portability of that file system if Strict Normalization was mandated.
Fortunately, ext4 and f2fs don't enable strict normalization by
default, which is also good, because it means if we don't have the
latest Unicode update in the kernel, it doesn't really matter that
much.... again, not worth the extra complexity/headache IMHO.

Cheers,

					- Ted
Gabriel Krisman Bertazi April 27, 2021, 3:06 p.m. UTC | #4
"Theodore Ts'o" <tytso@mit.edu> writes:

> On Tue, Apr 27, 2021 at 03:39:15PM +0530, Shreeya Patel wrote:
>> > > Hence, make UTF-8 encoding loadable by converting it into a module and
>> > > also add built-in UTF-8 support option for compiling it into the
>> > > kernel whenever required by the filesystem.
>> > The way this is implemement looks rather awkward.
>
> I think that's a bit awkard is the trying to create an abstraction
> separation between the unicode and utf8 layers, just in case, at some
> point, we want fs/unicode to support more than just utf8.
>
> I think we're better off being opinionated here, and say that the only
> unicode encoding that will be supported by the kernel is UTF-8.
> Period.  In which case, we don't need to try to insert this unneeded
> abstraction layer.
>
> If you really want to make make fs/unicode support more than one
> encoding --- say, UTF-16LE, as used by NTFS --- at that point we can
> think about what the abstractions should look like.  For example, it
> doesn't _actually_ make sense for the data-trie structures to be part
> of the utf-8 encoding.  The normalization tables are for Unicode, and
> it wouldn't make sense for UTF-16 to have its own normalization
> tables, bloating the kernel even more.
>
> It *is* true that the normalization tables have been optimized for
> utf-8, because that's what the whole world actually uses; utf-16le is
> really a legacy use case.  So presumably, we would probably find a way
> to code up the utf-16 functions in a way that used the utf-8 data
> tables, even if it wasn't 100% optimal in terms of speed.
>
> But it's probably not worth it at this point.
>
>> > Given that the large memory usage is for a data table and not for code,
>> > why not treat is as a firmware blob and load it using request_firmware?
>> 
>> utf8 module not just has the data table but also has some kernel code.
>> The big part that we are trying to keep out of the kernel is a tree
>> structure that gets traversed based on a key that is the file name.
>> This is done when issuing a lookup in the filesystem, which has to be very
>> fast. So maybe it would not be so good to use request_firmware for
>> such a core feature.
>
> Speed really isn't a great argument here; the request_firmware is
> something that would only need to be done once, when a file system
> which requires Unicode normalization and/or case-folding is mounted.
>
> I think the better argument to make is just one of simplicity;
> separating the Unicode data table from the kernel adds complexity.  It
> also reduces flexibility, since for use cases where it's actually
> _preferable_ to have Unicode functionality permanently built-in the
> kernel, we now force the use of some kind of initial ramdisk to load a
> module before the root file system (which might require Unicode
> support) could even be mounted.

FWIW, embedding FW images to the kernel is also well supported.  Making
the data trie a firmware doesn't make a ramdisk more of a requirement
than the module solution, I think.

> The argument *for* making the Unicode table be a loadable firmware is
> that it might make it possible to upgrade to a newer version of
> Unicode without needing to do a kernel recompile.  On average, Unicode
> relases a new to support new character sets every year or so, or when
> there Japanese Emperor requiring a new reign name :-).  Usually the
> new character sets are for obscure ancient alphabets, and so it's
> really not a big deal if the kernel doesn't support, say,
> Chorasmian[1] or Dives Akuru[2].  Perhaps people would make a much
> bigger deal about new Emoji characters, or new code points for the
> Creative Commons symbols.  I'm personally not excited enough to claim
> that it's worth the extra complexity, but some people might think so.
> :-)

We don't really care about emojis since they are not usually
normalized/folded, and unless you are using strict mode, they will be
invisible for the user. On an unrelated note, newer scripts are more
interesting and we should come up with some update policy someday, since
we are already lagging the unicode spec.  At least we are still in the
Reiwa Era, which was first supported in 12.1 :)

>
> [1] used in Central Asia across Uzbekistan, Kazakhstan, and
> Turkmenistan to write an extinct Eastern Iranian language.
>
> [2] historically used in the Maldives until the 20th century.
>
> Of course, using those new Emoji symbols in file names would reduce
> portability of that file system if Strict Normalization was mandated.
> Fortunately, ext4 and f2fs don't enable strict normalizaation by
> default, which is also good, because it means if we don't have the
> latest Unicode update in the kernel, it doesn't really matter that
> much.... again, not worth the extra complexity/headache IMHO.

ah yes, exactly.

>
> Cheers,
>
> 					- Ted
Theodore Ts'o April 28, 2021, 2:12 p.m. UTC | #5
On Tue, Apr 27, 2021 at 11:06:33AM -0400, Gabriel Krisman Bertazi wrote:
> > I think the better argument to make is just one of simplicity;
> > separating the Unicode data table from the kernel adds complexity.  It
> > also reduces flexibility, since for use cases where it's actually
> > _preferable_ to have Unicode functionality permanently built-in the
> > kernel, we now force the use of some kind of initial ramdisk to load a
> > module before the root file system (which might require Unicode
> > support) could even be mounted.
> 
> FWIW, embedding FW images to the kernel is also well supported.  Making
> the data trie a firmware doesn't make a ramdisk more of a requirement
> than the module solution, I think.

I don't think we support building firmware directly into the kernel
any more.  We used to, but IIRC, there was the feeling that 99.99% of
the time, firmware modules were not GPL compliant, and so we ripped
out that support.

So my point was with the module support, it's *optional* that it be
compiled as a module, which is convenient for those use cases, such as
for example a mobile handset --- where there is no need for modules
since the hardware doesn't change, and so modules and an initrd is
just unnecessary complexity --- and firmware, which would make an
initial ramdisk mandatory if you wanted to use the casefold feature.

Put another way, the only reason for putting the unicode tables in a
module is to make life easier for desktop distros.  For mobile
handsets, modules are an anti-feature, which is why there was no call
for supporting this initially, given the initial use case for the
casefold feature.

Cheers,

					- Ted
Gabriel Krisman Bertazi April 28, 2021, 6:58 p.m. UTC | #6
"Theodore Ts'o" <tytso@mit.edu> writes:

> On Tue, Apr 27, 2021 at 11:06:33AM -0400, Gabriel Krisman Bertazi wrote:
>> > I think the better argument to make is just one of simplicity;
>> > separating the Unicode data table from the kernel adds complexity.  It
>> > also reduces flexibility, since for use cases where it's actually
>> > _preferable_ to have Unicode functionality permanently built-in the
>> > kernel, we now force the use of some kind of initial ramdisk to load a
>> > module before the root file system (which might require Unicode
>> > support) could even be mounted.
>> 
>> FWIW, embedding FW images to the kernel is also well supported.  Making
>> the data trie a firmware doesn't make a ramdisk more of a requirement
>> than the module solution, I think.
>
> I don't think we support building firmware directly into the kernel
> any more.  We used to, but IIRC, there was the feeling that 99.99% of
> the time, firmware modules were not GPL compliant, and so we ripped
> out that support.

Support is still there on 5.12. See
Documentation/driver-api/firmware/built-in-fw.rst.

Personally, I use this feature very often on my development workflow,
for similar reasons to what you said below. In my case, avoiding the
complexity of initramfs but still being able to use my crappy
FW-dependent NIC card to boot from NFS. :)

> compiled as a module, which is convenient for those use cases, such as
> for example a mobile handset --- where there is no need for modules
> since the hardware doesn't change, and so modules and an initrd is
> just unnecessary complexity --- and firmware, which would make an
> initial ramdisk mandatory if you wanted to use the casefold feature.
>
> Put another way, the only reason why putting the unicode tables in a
> module is to make life easier for desktop distros.  For mobile
> handsets, modules are an anti-feature, which is why there was no call
> for supporting this initially, given the initial use case for the
> casefold feature.

What about support for firmware generation from the kernel tree and
installation to /lib/firmware? With a module, I can just call
modules_install, and dealing with modules is hardcoded in the mind of
every kernel developer.  Dealing with firmwares inside the kernel tree
is not common, and I didn't find an equivalent Makefile rule to build
and deploy firmwares on a path that firmware_loader knows about.

I think of firmware as code/data for a device, while modules is for the
kernel domain, even if it is a gross oversimplification.  Are there
other examples of firmware built from the kernel tree that are meant
exclusively to be used by the kernel, without hardware involvement?

For mobile devices, it wouldn't really matter whether it is built-in or
a firmware, right?  On a controlled environment like Android, you know
what to expect of your filesystem, so you know beforehand if your kernel
needs the table loaded (apart from sd cards.  Do people use ext4 for
sdcards in Android or is it all exfat?).  In those scenarios, you gain
very little by not making it built-in.
Christoph Hellwig May 11, 2021, 4:35 a.m. UTC | #7
On Tue, May 11, 2021 at 02:17:00AM +0530, Shreeya Patel wrote:
> Theodore / Christoph, since we haven't come up with any final decision with
> this discussion, how do you think we should proceed on this?

I think loading it as a firmware-like table is much preferable to
a module with all the static call magic papering over that it really is
just one specific table.
Shreeya Patel May 20, 2021, 8:19 p.m. UTC | #8
On 11/05/21 10:05 am, Christoph Hellwig wrote:
> On Tue, May 11, 2021 at 02:17:00AM +0530, Shreeya Patel wrote:
>> Theodore / Christoph, since we haven't come up with any final decision with
>> this discussion, how do you think we should proceed on this?
> I think loading it as a firmware-like table is much preferable to
> a module with all the static call magic papering over that it really is
> just one specific table.


Christoph, I get your point here, but the request_firmware API requires a
device pointer and I don't see it being NULL anywhere, so I am not sure
we are going the right way by loading the data as a firmware-like table.
Gabriel Krisman Bertazi June 3, 2021, 12:07 a.m. UTC | #9
Shreeya Patel <shreeya.patel@collabora.com> writes:

> On 11/05/21 10:05 am, Christoph Hellwig wrote:
>> On Tue, May 11, 2021 at 02:17:00AM +0530, Shreeya Patel wrote:
>>> Theodore / Christoph, since we haven't come up with any final decision with
>>> this discussion, how do you think we should proceed on this?
>> I think loading it as a firmware-like table is much preferable to
>> a module with all the static call magic papering over that it really is
>> just one specific table.
>
>
> Christoph, I get you point here but request_firmware API requires a
> device pointer and I don't
> see anywhere it being NULL so I am not sure if we are going in the right
> way by loading the data as firmware like table.

I wasn't going to really oppose it from being a firmware but this
detail, if required, makes the whole firmware idea more awkward.  If the
whole reason to make it a firmware is to avoid the module boilerplate,
this is just different boilerplate.  Once again, I don't know about
precedent of kernel data as a module, and there is the problem with
Makefile rules to install this stuff, that I mentioned.

We know we can get rid of the static call stuff already, since we likely
won't support more encodings anyway, so that would simplify a lot the
module specific code.
Christoph Hellwig June 16, 2021, 4:09 a.m. UTC | #10
On Wed, Jun 02, 2021 at 08:07:07PM -0400, Gabriel Krisman Bertazi wrote:
> I wasn't going to really oppose it from being a firmware but this
> detail, if required, makes the whole firmware idea more awkward.  If the
> whole reason to make it a firmware is to avoid the module boilerplate,
> this is just different boilerplate.  Once again, I don't know about
> precedent of kernel data as a module, and there is the problem with
> Makefile rules to install this stuff, that I mentioned.
> 
> We know we can get rid of the static call stuff already, since we likely
> won't support more encodings anyway, so that would simplify a lot the
> module specific code.

Well, another thing we can do is a data-only module.  That is a module
that just contains the tables, with the core code doing a symbol_get
on them.
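
To make the data-only module idea concrete, here is a small userspace
analogy, not kernel code. A "module" that contains nothing but a table
exports it by name, and the core code looks the name up and fails cleanly
if the table is absent. In the kernel the lookup would be symbol_get() with
a matching symbol_put(); the registry, export_symbol, lookup_symbol and
fold_table names below are hypothetical stand-ins, and the table is a toy
ASCII case-fold map rather than the real normalization trie.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol registry standing in for the kernel's exported symbols. */
struct symbol {
	const char *name;
	const void *addr;
};

#define MAX_SYMBOLS 8
static struct symbol registry[MAX_SYMBOLS];
static size_t nr_symbols;

static int export_symbol(const char *name, const void *addr)
{
	if (nr_symbols == MAX_SYMBOLS)
		return -1;
	registry[nr_symbols].name = name;
	registry[nr_symbols].addr = addr;
	nr_symbols++;
	return 0;
}

static const void *lookup_symbol(const char *name)
{
	for (size_t i = 0; i < nr_symbols; i++)
		if (strcmp(registry[i].name, name) == 0)
			return registry[i].addr;
	return NULL;
}

/* The "data-only module": a table plus an init that exports it. No logic. */
static unsigned char fold_table[256];

static void table_module_init(void)
{
	for (int i = 0; i < 256; i++)
		fold_table[i] = (unsigned char)i;
	for (int c = 'A'; c <= 'Z'; c++)
		fold_table[c] = (unsigned char)(c - 'A' + 'a');
	export_symbol("fold_table", fold_table);
}

/* Core code: owns all the logic and only borrows the table. */
static int core_fold(unsigned char c)
{
	const unsigned char *table = lookup_symbol("fold_table");

	if (!table)
		return -1;	/* table not loaded */
	return table[c];
}

/* Usage walk-through: lookup fails before the table is exported. */
static void table_demo(void)
{
	assert(core_fold('A') == -1);
	table_module_init();
	assert(core_fold('A') == 'a');
	assert(core_fold('z') == 'z');
}
```

With this split the core code stays built in and the only thing that is
loaded on demand is data, so there are no function pointers or static
calls left to redirect.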

Patch

diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
index 2c27b9a5cd6c..250b3671a0e2 100644
--- a/fs/unicode/Kconfig
+++ b/fs/unicode/Kconfig
@@ -2,13 +2,31 @@ 
 #
 # UTF-8 normalization
 #
+# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8
+# is enabled. This config option adds the unicode subsystem layer which loads
+# the UTF-8 module whenever any filesystem needs it.
 config UNICODE
-	bool "UTF-8 normalization and casefolding support"
+	bool
+
+config UNICODE_UTF8
+	tristate "UTF-8 support for native Case-Insensitive filesystems"
+	select UNICODE
 	help
-	  Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding
-	  support.
+	  Say M here to enable UTF-8 NFD normalization and NFD+CF casefolding
+	  support as a loadable module or say Y for building it into the kernel.
+	  It is currently supported by EXT4 and F2FS filesystems.
+
+	  utf8data.h_shipped has a large database table which is an
+	  auto-generated decodification trie for the unicode normalization
+	  functions. Enabling UNICODE_UTF8 as M will allow you to avoid
+	  carrying this large table into the kernel and module will only be
+	  loaded whenever required by any filesystem.
+	  Please note, in this case utf8 module will only be available after
+	  booting into the compiled kernel. If your filesystem requires it to
+	  have utf8 during boot time then you should have it built into the
+	  kernel by saying Y here to avoid boot failure.
 
 config UNICODE_NORMALIZATION_SELFTEST
 	tristate "Test UTF-8 normalization support"
-	depends on UNICODE
+	depends on UNICODE_UTF8
 	default n
diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
index fbf9a629ed0d..49d50083e6ee 100644
--- a/fs/unicode/Makefile
+++ b/fs/unicode/Makefile
@@ -1,11 +1,14 @@ 
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_UNICODE) += unicode.o
+obj-$(CONFIG_UNICODE_UTF8) += utf8.o
 obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o
 
-unicode-y := utf8-norm.o unicode-core.o
+unicode-y := unicode-core.o
+utf8-y := unicode-utf8.o utf8-norm.o
 
 $(obj)/utf8-norm.o: $(obj)/utf8data.h
+$(obj)/unicode-utf8.o: $(obj)/utf8-norm.o
 
 # In the normal build, the checked-in utf8data.h is just shipped.
 #
diff --git a/fs/unicode/unicode-core.c b/fs/unicode/unicode-core.c
index 730dbaedf593..bf442446ed89 100644
--- a/fs/unicode/unicode-core.c
+++ b/fs/unicode/unicode-core.c
@@ -1,228 +1,145 @@ 
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <linux/module.h>
 #include <linux/kernel.h>
-#include <linux/string.h>
 #include <linux/slab.h>
-#include <linux/parser.h>
 #include <linux/errno.h>
 #include <linux/unicode.h>
-#include <linux/stringhash.h>
+#include <linux/spinlock.h>
 
-#include "utf8n.h"
+static DEFINE_SPINLOCK(utf8mod_lock);
 
-int unicode_validate(const struct unicode_map *um, const struct qstr *str)
-{
-	const struct utf8data *data = utf8nfdi(um->version);
+static struct module *utf8mod;
+static bool utf8mod_loaded;
 
-	if (utf8nlen(data, str->name, str->len) < 0)
-		return -1;
-	return 0;
+int unicode_validate_default(const struct unicode_map *um,
+			     const struct qstr *str)
+{
+	WARN_ON(1);
+	return -EIO;
 }
-EXPORT_SYMBOL(unicode_validate);
+EXPORT_SYMBOL(unicode_validate_default);
 
-int unicode_strncmp(const struct unicode_map *um,
-		    const struct qstr *s1, const struct qstr *s2)
+int unicode_strncmp_default(const struct unicode_map *um,
+			    const struct qstr *s1,
+			    const struct qstr *s2)
 {
-	const struct utf8data *data = utf8nfdi(um->version);
-	struct utf8cursor cur1, cur2;
-	int c1, c2;
-
-	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
-		return -EINVAL;
-
-	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
-		return -EINVAL;
-
-	do {
-		c1 = utf8byte(&cur1);
-		c2 = utf8byte(&cur2);
-
-		if (c1 < 0 || c2 < 0)
-			return -EINVAL;
-		if (c1 != c2)
-			return 1;
-	} while (c1);
-
-	return 0;
+	WARN_ON(1);
+	return -EIO;
 }
-EXPORT_SYMBOL(unicode_strncmp);
+EXPORT_SYMBOL(unicode_strncmp_default);
 
-int unicode_strncasecmp(const struct unicode_map *um,
-			const struct qstr *s1, const struct qstr *s2)
+int unicode_strncasecmp_default(const struct unicode_map *um,
+				const struct qstr *s1,
+				const struct qstr *s2)
 {
-	const struct utf8data *data = utf8nfdicf(um->version);
-	struct utf8cursor cur1, cur2;
-	int c1, c2;
-
-	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
-		return -EINVAL;
-
-	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
-		return -EINVAL;
-
-	do {
-		c1 = utf8byte(&cur1);
-		c2 = utf8byte(&cur2);
-
-		if (c1 < 0 || c2 < 0)
-			return -EINVAL;
-		if (c1 != c2)
-			return 1;
-	} while (c1);
-
-	return 0;
+	WARN_ON(1);
+	return -EIO;
 }
-EXPORT_SYMBOL(unicode_strncasecmp);
-
-/* String cf is expected to be a valid UTF-8 casefolded
- * string.
- */
-int unicode_strncasecmp_folded(const struct unicode_map *um,
-			       const struct qstr *cf,
-			       const struct qstr *s1)
+EXPORT_SYMBOL(unicode_strncasecmp_default);
+
+int unicode_strncasecmp_folded_default(const struct unicode_map *um,
+				       const struct qstr *cf,
+				       const struct qstr *s1)
 {
-	const struct utf8data *data = utf8nfdicf(um->version);
-	struct utf8cursor cur1;
-	int c1, c2;
-	int i = 0;
-
-	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
-		return -EINVAL;
-
-	do {
-		c1 = utf8byte(&cur1);
-		c2 = cf->name[i++];
-		if (c1 < 0)
-			return -EINVAL;
-		if (c1 != c2)
-			return 1;
-	} while (c1);
-
-	return 0;
+	WARN_ON(1);
+	return -EIO;
 }
-EXPORT_SYMBOL(unicode_strncasecmp_folded);
+EXPORT_SYMBOL(unicode_strncasecmp_folded_default);
 
-int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
-		     unsigned char *dest, size_t dlen)
+int unicode_normalize_default(const struct unicode_map *um,
+			      const struct qstr *str,
+			      unsigned char *dest, size_t dlen)
 {
-	const struct utf8data *data = utf8nfdicf(um->version);
-	struct utf8cursor cur;
-	size_t nlen = 0;
-
-	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
-		return -EINVAL;
-
-	for (nlen = 0; nlen < dlen; nlen++) {
-		int c = utf8byte(&cur);
+	WARN_ON(1);
+	return -EIO;
+}
+EXPORT_SYMBOL(unicode_normalize_default);
 
-		dest[nlen] = c;
-		if (!c)
-			return nlen;
-		if (c == -1)
-			break;
-	}
-	return -EINVAL;
+int unicode_casefold_default(const struct unicode_map *um,
+			     const struct qstr *str,
+			     unsigned char *dest, size_t dlen)
+{
+	WARN_ON(1);
+	return -EIO;
 }
-EXPORT_SYMBOL(unicode_casefold);
+EXPORT_SYMBOL(unicode_casefold_default);
 
-int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
-			  struct qstr *str)
+int unicode_casefold_hash_default(const struct unicode_map *um,
+				  const void *salt, struct qstr *str)
 {
-	const struct utf8data *data = utf8nfdicf(um->version);
-	struct utf8cursor cur;
-	int c;
-	unsigned long hash = init_name_hash(salt);
-
-	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
-		return -EINVAL;
-
-	while ((c = utf8byte(&cur))) {
-		if (c < 0)
-			return -EINVAL;
-		hash = partial_name_hash((unsigned char)c, hash);
-	}
-	str->hash = end_name_hash(hash);
-	return 0;
+	WARN_ON(1);
+	return -EIO;
 }
-EXPORT_SYMBOL(unicode_casefold_hash);
+EXPORT_SYMBOL(unicode_casefold_hash_default);
 
-int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
-		      unsigned char *dest, size_t dlen)
+struct unicode_map *unicode_load_default(const char *version)
 {
-	const struct utf8data *data = utf8nfdi(um->version);
-	struct utf8cursor cur;
-	ssize_t nlen = 0;
+	WARN_ON(1);
+	return ERR_PTR(-EIO);
+}
+EXPORT_SYMBOL(unicode_load_default);
 
-	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
-		return -EINVAL;
+DEFINE_STATIC_CALL(unicode_validate_static_call, unicode_validate_default);
+EXPORT_STATIC_CALL(unicode_validate_static_call);
 
-	for (nlen = 0; nlen < dlen; nlen++) {
-		int c = utf8byte(&cur);
+DEFINE_STATIC_CALL(unicode_strncmp_static_call, unicode_strncmp_default);
+EXPORT_STATIC_CALL(unicode_strncmp_static_call);
 
-		dest[nlen] = c;
-		if (!c)
-			return nlen;
-		if (c == -1)
-			break;
-	}
-	return -EINVAL;
-}
-EXPORT_SYMBOL(unicode_normalize);
+DEFINE_STATIC_CALL(unicode_strncasecmp_static_call,
+		   unicode_strncasecmp_default);
+EXPORT_STATIC_CALL(unicode_strncasecmp_static_call);
 
-static int unicode_parse_version(const char *version, unsigned int *maj,
-				 unsigned int *min, unsigned int *rev)
-{
-	substring_t args[3];
-	char version_string[12];
-	static const struct match_token token[] = {
-		{1, "%d.%d.%d"},
-		{0, NULL}
-	};
-	int ret = strscpy(version_string, version, sizeof(version_string));
+DEFINE_STATIC_CALL(unicode_strncasecmp_folded_static_call,
+		   unicode_strncasecmp_folded_default);
+EXPORT_STATIC_CALL(unicode_strncasecmp_folded_static_call);
 
-	if (ret < 0)
-		return ret;
+DEFINE_STATIC_CALL(unicode_normalize_static_call, unicode_normalize_default);
+EXPORT_STATIC_CALL(unicode_normalize_static_call);
 
-	if (match_token(version_string, token, args) != 1)
-		return -EINVAL;
+DEFINE_STATIC_CALL(unicode_casefold_static_call, unicode_casefold_default);
+EXPORT_STATIC_CALL(unicode_casefold_static_call);
 
-	if (match_int(&args[0], maj) || match_int(&args[1], min) ||
-	    match_int(&args[2], rev))
-		return -EINVAL;
+DEFINE_STATIC_CALL(unicode_casefold_hash_static_call,
+		   unicode_casefold_hash_default);
+EXPORT_STATIC_CALL(unicode_casefold_hash_static_call);
 
-	return 0;
+DEFINE_STATIC_CALL(unicode_load_static_call, unicode_load_default);
+EXPORT_STATIC_CALL(unicode_load_static_call);
+
+static int utf8mod_get(void)
+{
+	int ret;
+
+	spin_lock(&utf8mod_lock);
+	ret = utf8mod_loaded && try_module_get(utf8mod);
+	spin_unlock(&utf8mod_lock);
+	return ret;
 }
 
 struct unicode_map *unicode_load(const char *version)
 {
-	struct unicode_map *um = NULL;
-	int unicode_version;
-
-	if (version) {
-		unsigned int maj, min, rev;
-
-		if (unicode_parse_version(version, &maj, &min, &rev) < 0)
-			return ERR_PTR(-EINVAL);
-
-		if (!utf8version_is_supported(maj, min, rev))
-			return ERR_PTR(-EINVAL);
-
-		unicode_version = UNICODE_AGE(maj, min, rev);
-	} else {
-		unicode_version = utf8version_latest();
-		printk(KERN_WARNING"UTF-8 version not specified. "
-		       "Assuming latest supported version (%d.%d.%d).",
-		       (unicode_version >> 16) & 0xff,
-		       (unicode_version >> 8) & 0xff,
-		       (unicode_version & 0xff));
+	struct unicode_map *um;
+
+	/*
+	 * try_then_request_module() is used here instead of using
+	 * request_module() because of the following problems that
+	 * could occur with the usage of request_module().
+	 * 1) Multiple calls in parallel to unicode_load() would fail if
+	 * kmod_concurrent_max == 0
+	 * 2) There would be unnecessary memory allocation and userspace
+	 * invocation in call_modprobe() that would always happen even if
+	 * the module is already loaded.
+	 * Hence, try_then_request_module() first checks whether the
+	 * module is already loaded; if not, it calls request_module(),
+	 * and finally acquires a reference to the loaded module.
+	 */
+	if (!try_then_request_module(utf8mod_get(), "utf8")) {
+		pr_err("Failed to load UTF-8 module\n");
+		return ERR_PTR(-ENODEV);
 	}
-
-	um = kzalloc(sizeof(struct unicode_map), GFP_KERNEL);
-	if (!um)
-		return ERR_PTR(-ENOMEM);
-
-	um->charset = "UTF-8";
-	um->version = unicode_version;
+	um = static_call(unicode_load_static_call)(version);
+	if (IS_ERR(um))
+		module_put(utf8mod);
 
 	return um;
 }
@@ -230,8 +147,29 @@  EXPORT_SYMBOL(unicode_load);
 
 void unicode_unload(struct unicode_map *um)
 {
-	kfree(um);
+	if (um) {
+		kfree(um);
+		module_put(utf8mod);
+	}
 }
 EXPORT_SYMBOL(unicode_unload);
 
+void unicode_register(struct module *owner)
+{
+	spin_lock(&utf8mod_lock);
+	utf8mod = owner;
+	utf8mod_loaded = true;
+	spin_unlock(&utf8mod_lock);
+}
+EXPORT_SYMBOL(unicode_register);
+
+void unicode_unregister(void)
+{
+	spin_lock(&utf8mod_lock);
+	utf8mod = NULL;
+	utf8mod_loaded = false;
+	spin_unlock(&utf8mod_lock);
+}
+EXPORT_SYMBOL(unicode_unregister);
+
 MODULE_LICENSE("GPL v2");
diff --git a/fs/unicode/unicode-utf8.c b/fs/unicode/unicode-utf8.c
new file mode 100644
index 000000000000..e0180f1c5ea8
--- /dev/null
+++ b/fs/unicode/unicode-utf8.c
@@ -0,0 +1,264 @@ 
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/parser.h>
+#include <linux/errno.h>
+#include <linux/unicode.h>
+#include <linux/stringhash.h>
+#include <linux/static_call.h>
+
+#include "utf8n.h"
+
+static int utf8_validate(const struct unicode_map *um, const struct qstr *str)
+{
+	const struct utf8data *data = utf8nfdi(um->version);
+
+	if (utf8nlen(data, str->name, str->len) < 0)
+		return -1;
+	return 0;
+}
+
+static int utf8_strncmp(const struct unicode_map *um,
+			const struct qstr *s1, const struct qstr *s2)
+{
+	const struct utf8data *data = utf8nfdi(um->version);
+	struct utf8cursor cur1, cur2;
+	int c1, c2;
+
+	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
+		return -EINVAL;
+
+	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
+		return -EINVAL;
+
+	do {
+		c1 = utf8byte(&cur1);
+		c2 = utf8byte(&cur2);
+
+		if (c1 < 0 || c2 < 0)
+			return -EINVAL;
+		if (c1 != c2)
+			return 1;
+	} while (c1);
+
+	return 0;
+}
+
+static int utf8_strncasecmp(const struct unicode_map *um,
+			    const struct qstr *s1, const struct qstr *s2)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur1, cur2;
+	int c1, c2;
+
+	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
+		return -EINVAL;
+
+	if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
+		return -EINVAL;
+
+	do {
+		c1 = utf8byte(&cur1);
+		c2 = utf8byte(&cur2);
+
+		if (c1 < 0 || c2 < 0)
+			return -EINVAL;
+		if (c1 != c2)
+			return 1;
+	} while (c1);
+
+	return 0;
+}
+
+/* String cf is expected to be a valid UTF-8 casefolded
+ * string.
+ */
+static int utf8_strncasecmp_folded(const struct unicode_map *um,
+				   const struct qstr *cf,
+				   const struct qstr *s1)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur1;
+	int c1, c2;
+	int i = 0;
+
+	if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
+		return -EINVAL;
+
+	do {
+		c1 = utf8byte(&cur1);
+		c2 = cf->name[i++];
+		if (c1 < 0)
+			return -EINVAL;
+		if (c1 != c2)
+			return 1;
+	} while (c1);
+
+	return 0;
+}
+
+static int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
+			 unsigned char *dest, size_t dlen)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur;
+	size_t nlen = 0;
+
+	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
+		return -EINVAL;
+
+	for (nlen = 0; nlen < dlen; nlen++) {
+		int c = utf8byte(&cur);
+
+		dest[nlen] = c;
+		if (!c)
+			return nlen;
+		if (c == -1)
+			break;
+	}
+	return -EINVAL;
+}
+
+static int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
+			      struct qstr *str)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur;
+	int c;
+	unsigned long hash = init_name_hash(salt);
+
+	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
+		return -EINVAL;
+
+	while ((c = utf8byte(&cur))) {
+		if (c < 0)
+			return -EINVAL;
+		hash = partial_name_hash((unsigned char)c, hash);
+	}
+	str->hash = end_name_hash(hash);
+	return 0;
+}
+
+static int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
+			  unsigned char *dest, size_t dlen)
+{
+	const struct utf8data *data = utf8nfdi(um->version);
+	struct utf8cursor cur;
+	ssize_t nlen = 0;
+
+	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
+		return -EINVAL;
+
+	for (nlen = 0; nlen < dlen; nlen++) {
+		int c = utf8byte(&cur);
+
+		dest[nlen] = c;
+		if (!c)
+			return nlen;
+		if (c == -1)
+			break;
+	}
+	return -EINVAL;
+}
+
+static int utf8_parse_version(const char *version, unsigned int *maj,
+			      unsigned int *min, unsigned int *rev)
+{
+	substring_t args[3];
+	char version_string[12];
+	static const struct match_token token[] = {
+		{1, "%d.%d.%d"},
+		{0, NULL}
+	};
+
+	int ret = strscpy(version_string, version, sizeof(version_string));
+
+	if (ret < 0)
+		return ret;
+
+	if (match_token(version_string, token, args) != 1)
+		return -EINVAL;
+
+	if (match_int(&args[0], maj) || match_int(&args[1], min) ||
+	    match_int(&args[2], rev))
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct unicode_map *utf8_load(const char *version)
+{
+	struct unicode_map *um = NULL;
+	int unicode_version;
+
+	if (version) {
+		unsigned int maj, min, rev;
+
+		if (utf8_parse_version(version, &maj, &min, &rev) < 0)
+			return ERR_PTR(-EINVAL);
+
+		if (!utf8version_is_supported(maj, min, rev))
+			return ERR_PTR(-EINVAL);
+
+		unicode_version = UNICODE_AGE(maj, min, rev);
+	} else {
+		unicode_version = utf8version_latest();
+		pr_warn("UTF-8 version not specified. Assuming latest supported version (%d.%d.%d).",
+			(unicode_version >> 16) & 0xff,
+			(unicode_version >> 8) & 0xff,
+			(unicode_version & 0xff));
+	}
+
+	um = kzalloc(sizeof(*um), GFP_KERNEL);
+	if (!um)
+		return ERR_PTR(-ENOMEM);
+
+	um->charset = "UTF-8";
+	um->version = unicode_version;
+
+	return um;
+}
+
+static int __init utf8_init(void)
+{
+	static_call_update(unicode_validate_static_call, utf8_validate);
+	static_call_update(unicode_strncmp_static_call, utf8_strncmp);
+	static_call_update(unicode_strncasecmp_static_call, utf8_strncasecmp);
+	static_call_update(unicode_strncasecmp_folded_static_call,
+			   utf8_strncasecmp_folded);
+	static_call_update(unicode_normalize_static_call, utf8_normalize);
+	static_call_update(unicode_casefold_static_call, utf8_casefold);
+	static_call_update(unicode_casefold_hash_static_call,
+			   utf8_casefold_hash);
+	static_call_update(unicode_load_static_call, utf8_load);
+
+	unicode_register(THIS_MODULE);
+	return 0;
+}
+
+static void __exit utf8_exit(void)
+{
+	static_call_update(unicode_validate_static_call,
+			   unicode_validate_default);
+	static_call_update(unicode_strncmp_static_call, unicode_strncmp_default);
+	static_call_update(unicode_strncasecmp_static_call,
+			   unicode_strncasecmp_default);
+	static_call_update(unicode_strncasecmp_folded_static_call,
+			   unicode_strncasecmp_folded_default);
+	static_call_update(unicode_normalize_static_call,
+			   unicode_normalize_default);
+	static_call_update(unicode_casefold_static_call,
+			   unicode_casefold_default);
+	static_call_update(unicode_casefold_hash_static_call,
+			   unicode_casefold_hash_default);
+	static_call_update(unicode_load_static_call, unicode_load_default);
+
+	unicode_unregister();
+}
+
+module_init(utf8_init);
+module_exit(utf8_exit);
+
+MODULE_LICENSE("GPL v2");
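
Note on the version encoding used by utf8_load(): the shifts and masks in
the pr_warn() call imply that the version is packed one byte per field
(major, minor, revision). The following is a minimal userspace sketch of
that packing, not the kernel code. PACK_AGE() and the age_*() helpers are
hypothetical names; PACK_AGE() only stands in for UNICODE_AGE() under that
one-byte-per-field assumption.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the kernel's UNICODE_AGE(): assumes each of
 * major, minor and revision occupies one byte of the packed value.
 */
#define PACK_AGE(maj, min, rev) \
	(((unsigned int)(maj) << 16) | \
	 ((unsigned int)(min) << 8) | \
	 (unsigned int)(rev))

/* Each field is recovered by shifting it down and masking a full byte. */
static inline unsigned int age_major(unsigned int age)
{
	return (age >> 16) & 0xff;
}

static inline unsigned int age_minor(unsigned int age)
{
	return (age >> 8) & 0xff;
}

static inline unsigned int age_rev(unsigned int age)
{
	return age & 0xff;
}

/* Format a packed version as "maj.min.rev"; returns snprintf()'s result. */
static inline int age_format(char *buf, size_t len, unsigned int age)
{
	return snprintf(buf, len, "%u.%u.%u",
			age_major(age), age_minor(age), age_rev(age));
}
```

With this layout, PACK_AGE(12, 1, 3) decodes back to 12, 1 and 3. Every
field needs the full 0xff mask; a narrower mask would drop bits from odd
revision numbers.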
diff --git a/include/linux/unicode.h b/include/linux/unicode.h
index de23f9ee720b..0b157c6830c6 100644
--- a/include/linux/unicode.h
+++ b/include/linux/unicode.h
@@ -4,33 +4,101 @@ 
 
 #include <linux/init.h>
 #include <linux/dcache.h>
+#include <linux/static_call.h>
+
 
 struct unicode_map {
 	const char *charset;
 	int version;
 };
 
-int unicode_validate(const struct unicode_map *um, const struct qstr *str);
+int unicode_validate_default(const struct unicode_map *um,
+			     const struct qstr *str);
+
+int unicode_strncmp_default(const struct unicode_map *um,
+			    const struct qstr *s1,
+			    const struct qstr *s2);
+
+int unicode_strncasecmp_default(const struct unicode_map *um,
+				const struct qstr *s1,
+				const struct qstr *s2);
+
+int unicode_strncasecmp_folded_default(const struct unicode_map *um,
+				       const struct qstr *cf,
+				       const struct qstr *s1);
+
+int unicode_normalize_default(const struct unicode_map *um,
+			      const struct qstr *str,
+			      unsigned char *dest, size_t dlen);
+
+int unicode_casefold_default(const struct unicode_map *um,
+			     const struct qstr *str,
+			     unsigned char *dest, size_t dlen);
+
+int unicode_casefold_hash_default(const struct unicode_map *um,
+				  const void *salt, struct qstr *str);
 
-int unicode_strncmp(const struct unicode_map *um,
-		    const struct qstr *s1, const struct qstr *s2);
+struct unicode_map *unicode_load_default(const char *version);
 
-int unicode_strncasecmp(const struct unicode_map *um,
-			const struct qstr *s1, const struct qstr *s2);
-int unicode_strncasecmp_folded(const struct unicode_map *um,
-			       const struct qstr *cf,
-			       const struct qstr *s1);
+DECLARE_STATIC_CALL(unicode_validate_static_call, unicode_validate_default);
+DECLARE_STATIC_CALL(unicode_strncmp_static_call, unicode_strncmp_default);
+DECLARE_STATIC_CALL(unicode_strncasecmp_static_call,
+		    unicode_strncasecmp_default);
+DECLARE_STATIC_CALL(unicode_strncasecmp_folded_static_call,
+		    unicode_strncasecmp_folded_default);
+DECLARE_STATIC_CALL(unicode_normalize_static_call, unicode_normalize_default);
+DECLARE_STATIC_CALL(unicode_casefold_static_call, unicode_casefold_default);
+DECLARE_STATIC_CALL(unicode_casefold_hash_static_call,
+		    unicode_casefold_hash_default);
+DECLARE_STATIC_CALL(unicode_load_static_call, unicode_load_default);
 
-int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
-		      unsigned char *dest, size_t dlen);
 
-int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
-		     unsigned char *dest, size_t dlen);
+static inline int unicode_validate(const struct unicode_map *um, const struct qstr *str)
+{
+	return static_call(unicode_validate_static_call)(um, str);
+}
 
-int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
-			  struct qstr *str);
+static inline int unicode_strncmp(const struct unicode_map *um,
+				  const struct qstr *s1, const struct qstr *s2)
+{
+	return static_call(unicode_strncmp_static_call)(um, s1, s2);
+}
+
+static inline int unicode_strncasecmp(const struct unicode_map *um,
+				      const struct qstr *s1, const struct qstr *s2)
+{
+	return static_call(unicode_strncasecmp_static_call)(um, s1, s2);
+}
+
+static inline int unicode_strncasecmp_folded(const struct unicode_map *um,
+					     const struct qstr *cf,
+					     const struct qstr *s1)
+{
+	return static_call(unicode_strncasecmp_folded_static_call)(um, cf, s1);
+}
+
+static inline int unicode_normalize(const struct unicode_map *um, const struct qstr *str,
+				    unsigned char *dest, size_t dlen)
+{
+	return static_call(unicode_normalize_static_call)(um, str, dest, dlen);
+}
+
+static inline int unicode_casefold(const struct unicode_map *um, const struct qstr *str,
+				   unsigned char *dest, size_t dlen)
+{
+	return static_call(unicode_casefold_static_call)(um, str, dest, dlen);
+}
+
+static inline int unicode_casefold_hash(const struct unicode_map *um, const void *salt,
+					struct qstr *str)
+{
+	return static_call(unicode_casefold_hash_static_call)(um, salt, str);
+}
 
 struct unicode_map *unicode_load(const char *version);
 void unicode_unload(struct unicode_map *um);
 
+void unicode_register(struct module *owner);
+void unicode_unregister(void);
+
 #endif /* _LINUX_UNICODE_H */
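
For readers unfamiliar with the dispatch pattern in this patch, the sketch
below shows the same shape in plain userspace C: a default stub that
reports an error, a backend that replaces it on registration, and an
inline wrapper that callers use. It is only an illustration under stated
assumptions. A plain function pointer stands in for the static call, so
it does not show the performance benefit, and all names here
(validate_default, ascii_validate, backend_register, and so on) are
hypothetical and not part of the kernel API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical analogue: a function pointer stands in for a static call. */
typedef int (*validate_fn)(const char *s, size_t len);

/* Default target, used while no backend is registered. */
static int validate_default(const char *s, size_t len)
{
	(void)s;
	(void)len;
	return -EIO;
}

/* Toy backend: accepts only 7-bit ASCII input. */
static int ascii_validate(const char *s, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if ((unsigned char)s[i] > 0x7f)
			return -1;
	return 0;
}

static validate_fn validate_target = validate_default;

/* Roughly what the module init and exit paths do for each call. */
static inline void backend_register(void)
{
	validate_target = ascii_validate;
}

static inline void backend_unregister(void)
{
	validate_target = validate_default;
}

/* Callers only ever see this wrapper. */
static inline int validate(const char *s, size_t len)
{
	return validate_target(s, len);
}
```

Before backend_register() runs, validate() returns -EIO from the stub;
afterwards it returns the backend's result, and backend_unregister()
restores the stub. The sketch omits the locking and module reference
counting that the patch handles in utf8mod_get() and unicode_unload().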