
[GIT,PULL] unicode patches for 5.17

Message ID 87a6g11zq9.fsf@collabora.com
State New, archived

Pull-request

https://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode.git tags/unicode-for-next-5.17

Message

Gabriel Krisman Bertazi Jan. 12, 2022, 1:58 a.m. UTC
The following changes since commit 9e1ff307c779ce1f0f810c7ecce3d95bbae40896:

  Linux 5.15-rc4 (2021-10-03 14:08:47 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode.git tags/unicode-for-next-5.17

for you to fetch changes up to e2a58d2d3416aceeae63dfc7bf680dd390ff331d:

  unicode: only export internal symbols for the selftests (2021-10-12 11:41:39 -0300)

----------------------------------------------------------------
This branch has patches from Christoph Hellwig to split the large data
tables of the unicode subsystem into a loadable module, which allow
users to not have them around if case-insensitive filesystems are not to
be used.  It also includes minor code fixes to unicode and its users,
from the same author.

There is a trivial conflict in the function encoding_show in
fs/f2fs/sysfs.c reported by linux-next between commit

84eab2a899f2 ("f2fs: replace snprintf in show functions with sysfs_emit")

and commit a440943e68cd ("unicode: remove the charset field from struct
unicode_map") from my tree.

I left an example of how I would solve it on the branch
unicode-f2fs-mergeconflict of my tree.

All the patches here have been in linux-next releases for the past few
months.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

----------------------------------------------------------------
Christoph Hellwig (11):
      ext4: simplify ext4_sb_read_encoding
      f2fs: simplify f2fs_sb_read_encoding
      unicode: remove the charset field from struct unicode_map
      unicode: mark the version field in struct unicode_map unsigned
      unicode: pass a UNICODE_AGE() tripple to utf8_load
      unicode: remove the unused utf8{,n}age{min,max} functions
      unicode: simplify utf8len
      unicode: move utf8cursor to utf8-selftest.c
      unicode: cache the normalization tables in struct unicode_map
      unicode: Add utf8-data module
      unicode: only export internal symbols for the selftests

 fs/ext4/super.c                                    |  39 ++-
 fs/f2fs/super.c                                    |  38 +--
 fs/f2fs/sysfs.c                                    |   3 +-
 fs/unicode/Kconfig                                 |  13 +-
 fs/unicode/Makefile                                |  13 +-
 fs/unicode/mkutf8data.c                            |  24 +-
 fs/unicode/utf8-core.c                             | 109 ++++-----
 fs/unicode/utf8-norm.c                             | 262 +++------------------
 fs/unicode/utf8-selftest.c                         |  94 ++++----
 .../{utf8data.h_shipped => utf8data.c_shipped}     |  22 +-
 fs/unicode/utf8n.h                                 |  81 +++----
 include/linux/unicode.h                            |  49 +++-
 12 files changed, 291 insertions(+), 456 deletions(-)
 rename fs/unicode/{utf8data.h_shipped => utf8data.c_shipped} (99%)

Comments

pr-tracker-bot@kernel.org Jan. 17, 2022, 5:18 a.m. UTC | #1
The pull request you sent on Tue, 11 Jan 2022 20:58:54 -0500:

> https://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode.git tags/unicode-for-next-5.17

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6661224e66f03706daea8e27714436851cf01731

Thank you!

Linus Torvalds Jan. 17, 2022, 5:55 a.m. UTC | #2
On Wed, Jan 12, 2022 at 3:59 AM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> This branch has patches from Christoph Hellwig to split the large data
> tables of the unicode subsystem into a loadable module, which allow
> users to not have them around if case-insensitive filesystems are not to
> be used.

As seen by the pr-tracker-bot, I've merged this, but it had several rough spots.

One of them was around the renaming of the utf8data.h file to a .c
file: I fixed up the .gitignore problem myself, but the incorrect
comments still remain.

The Kconfig thing is also just plain badly done.

It's completely pointless and stupid to first have a "bool UNICODE"
question, and then have a "tristate UNICODE_UTF8_DATA" question that
depends on it.

The Kconfig file even *knows* it's pointless and stupid, because it
has a comment to that effect, but despite writing that comment,
apparently nobody spent the five seconds actually thinking about how
to do it properly.

The sane and proper thing would have been to have *one* single
tristate question ("unicode y/m/n"), and that's used for the unicode
data module status.

Then the "core unicode" option (currently that "UNICODE" bool
question) would become something just computed off the modular
question:

  config UNICODE
         def_bool UNICODE_UTF8_DATA != n

with no actual user input being needed for it.

And yes, it might be even nicer to just make "UNICODE" itself be the
tristate, and not have a separate config variable at all, but that
would require changes to the users.

In particular, the filesystems that have

    #ifdef CONFIG_UNICODE

would have to be updated to use something like

    #if IS_ENABLED(CONFIG_UNICODE)

instead.

That would probably be a good change, though, and then the 'UNICODE'
config option could just be a tristate, with the support code always
built in (even in the module case) and only the data being
(potentially) modular.

ANYWAY. I didn't do the above, I only fixed up the trivially annoying
gitignore thing.

I've said this before, and I'll probably have to say it again: the
kernel config part is likely one of the most painful barriers to
people building their own kernel. Some of it is just because we have
*so* many modules, and there's just a lot of configuration you can do.

But the fact that it's already painful is no excuse to then ask people
_stupid_ questions and make the whole process unnecessarily even
more painful.

                  Linus