diff mbox series

[RFC,2/2] wcslen() prototype in string.h

Message ID 20250325-string-add-wcslen-for-llvm-opt-v1-2-b8f1e2c17888@kernel.org (mailing list archive)
State New
Headers show
Series string.c: Add wcslen() | expand

Commit Message

Nathan Chancellor March 25, 2025, 3:45 p.m. UTC
If this is desired, it should be squashed into the previous change. I
wrote it separately because it is slightly more invasive.

In order to export wcslen() to the rest of the kernel (should it ever be
necessary elsewhere), it needs to be added to string.h, along with nls.h
for the typedef of wchar_t. However, dragging in nls.h into string.h
causes an error in the efistub due to a conflicting function name:

  drivers/firmware/efi/libstub/printk.c:27:5: error: static declaration of 'utf8_to_utf32' follows non-static declaration
     27 | u32 utf8_to_utf32(const u8 **s8)
        |     ^
  include/linux/nls.h:55:12: note: previous declaration is here
     55 | extern int utf8_to_utf32(const u8 *s, int len, unicode_t *pu);
        |            ^
  drivers/firmware/efi/libstub/printk.c:85:26: error: too few arguments to function call, expected 3, have 1
     85 |                 c32 = utf8_to_utf32(&s8);
        |                       ~~~~~~~~~~~~~    ^
  include/linux/nls.h:55:12: note: 'utf8_to_utf32' declared here
     55 | extern int utf8_to_utf32(const u8 *s, int len, unicode_t *pu);
        |            ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  2 errors generated.

Rename the efi function to avoid the conflict.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
 drivers/firmware/efi/libstub/printk.c | 4 ++--
 include/linux/string.h                | 2 ++
 lib/string.c                          | 1 -
 3 files changed, 4 insertions(+), 3 deletions(-)

Comments

Andy Shevchenko March 25, 2025, 4:17 p.m. UTC | #1
On Tue, Mar 25, 2025 at 08:45:19AM -0700, Nathan Chancellor wrote:
> If this is desired, it should be squashed into the previous change. I
> wrote it separately because it is slightly more invasive.
> 
> In order to export wcslen() to the rest of the kernel (should it ever be
> necessary elsewhere), it needs to be added to string.h, along with nls.h
> for the typedef of wchar_t. However, dragging in nls.h into string.h
> causes an error in the efistub due to a conflicting function name:
> 
>   drivers/firmware/efi/libstub/printk.c:27:5: error: static declaration of 'utf8_to_utf32' follows non-static declaration
>      27 | u32 utf8_to_utf32(const u8 **s8)
>         |     ^
>   include/linux/nls.h:55:12: note: previous declaration is here
>      55 | extern int utf8_to_utf32(const u8 *s, int len, unicode_t *pu);
>         |            ^
>   drivers/firmware/efi/libstub/printk.c:85:26: error: too few arguments to function call, expected 3, have 1
>      85 |                 c32 = utf8_to_utf32(&s8);
>         |                       ~~~~~~~~~~~~~    ^
>   include/linux/nls.h:55:12: note: 'utf8_to_utf32' declared here
>      55 | extern int utf8_to_utf32(const u8 *s, int len, unicode_t *pu);
>         |            ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   2 errors generated.
> 
> Rename the efi function to avoid the conflict.

Hmm... Why not split this to two, rename patch as a standalone makes sense to
me even outside of this series.
Nathan Chancellor March 25, 2025, 4:58 p.m. UTC | #2
On Tue, Mar 25, 2025 at 06:17:34PM +0200, Andy Shevchenko wrote:
> On Tue, Mar 25, 2025 at 08:45:19AM -0700, Nathan Chancellor wrote:
> > If this is desired, it should be squashed into the previous change. I
> > wrote it separately because it is slightly more invasive.
> > 
> > In order to export wcslen() to the rest of the kernel (should it ever be
> > necessary elsewhere), it needs to be added to string.h, along with nls.h
> > for the typedef of wchar_t. However, dragging in nls.h into string.h
> > causes an error in the efistub due to a conflicting function name:
> > 
> >   drivers/firmware/efi/libstub/printk.c:27:5: error: static declaration of 'utf8_to_utf32' follows non-static declaration
> >      27 | u32 utf8_to_utf32(const u8 **s8)
> >         |     ^
> >   include/linux/nls.h:55:12: note: previous declaration is here
> >      55 | extern int utf8_to_utf32(const u8 *s, int len, unicode_t *pu);
> >         |            ^
> >   drivers/firmware/efi/libstub/printk.c:85:26: error: too few arguments to function call, expected 3, have 1
> >      85 |                 c32 = utf8_to_utf32(&s8);
> >         |                       ~~~~~~~~~~~~~    ^
> >   include/linux/nls.h:55:12: note: 'utf8_to_utf32' declared here
> >      55 | extern int utf8_to_utf32(const u8 *s, int len, unicode_t *pu);
> >         |            ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >   2 errors generated.
> > 
> > Rename the efi function to avoid the conflict.
> 
> Hmm... Why not split this to two, rename patch as a standalone makes sense to
> me even outside of this series.

How so? If nls.h is not included in printk.c via string.h, which does
not happen without this series, what value does the rename have? I do
not mind splitting it up that way to keep things cleaner, I am just
wondering what would be the justification in the changelog (I guess just
that nls.h may get included in the future for some reason)?

Cheers,
Nathan
Andy Shevchenko March 25, 2025, 5:05 p.m. UTC | #3
On Tue, Mar 25, 2025 at 09:58:47AM -0700, Nathan Chancellor wrote:
> On Tue, Mar 25, 2025 at 06:17:34PM +0200, Andy Shevchenko wrote:
> > On Tue, Mar 25, 2025 at 08:45:19AM -0700, Nathan Chancellor wrote:

...

> > > Rename the efi function to avoid the conflict.
> > 
> > Hmm... Why not split this to two, rename patch as a standalone makes sense to
> > me even outside of this series.
> 
> How so? If nls.h is not included in printk.c via string.h, which does
> not happen without this series, what value does the rename have? I do
> not mind splitting it up that way to keep things cleaner, I am just
> wondering what would be the justification in the changelog (I guess just
> that nls.h may get included in the future for some reason)?

Inside this series the justification is obvious (a.k.a. the same), outside
yes something like "Put EFI specific function to the respective namespace
to avoid potential clash in the future when including another header."
Nathan Chancellor March 25, 2025, 9:45 p.m. UTC | #4
On Tue, Mar 25, 2025 at 07:05:28PM +0200, Andy Shevchenko wrote:
> On Tue, Mar 25, 2025 at 09:58:47AM -0700, Nathan Chancellor wrote:
> > On Tue, Mar 25, 2025 at 06:17:34PM +0200, Andy Shevchenko wrote:
> > > On Tue, Mar 25, 2025 at 08:45:19AM -0700, Nathan Chancellor wrote:
> 
> ...
> 
> > > > Rename the efi function to avoid the conflict.
> > > 
> > > Hmm... Why not split this to two, rename patch as a standalone makes sense to
> > > me even outside of this series.
> > 
> > How so? If nls.h is not included in printk.c via string.h, which does
> > not happen without this series, what value does the rename have? I do
> > not mind splitting it up that way to keep things cleaner, I am just
> > wondering what would be the justification in the changelog (I guess just
> > that nls.h may get included in the future for some reason)?
> 
> Inside this series the justification is obvious (a.k.a. the same), outside
> yes something like "Put EFI specific function to the respective namespace
> to avoid potential clash in the future when including another header."

Okay, sounds reasonable to me. This is what I ended up with for that
change, which will become patch one of the series.

From f79267c05d2853c034e2c490f171aed064b85dd6 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan@kernel.org>
Date: Tue, 25 Mar 2025 14:31:25 -0700
Subject: [PATCH] efi: libstub: Rename utf8_to_utf32() to avoid collision with
 NLS version

A future change will add nls.h to string.h, which will break the libstub
build because NLS includes its own version of utf8_to_utf32():

  drivers/firmware/efi/libstub/printk.c:27:5: error: static declaration of 'utf8_to_utf32' follows non-static declaration
     27 | u32 utf8_to_utf32(const u8 **s8)
        |     ^
  include/linux/nls.h:55:12: note: previous declaration is here
     55 | extern int utf8_to_utf32(const u8 *s, int len, unicode_t *pu);
        |            ^
  drivers/firmware/efi/libstub/printk.c:85:26: error: too few arguments to function call, expected 3, have 1
     85 |                 c32 = utf8_to_utf32(&s8);
        |                       ~~~~~~~~~~~~~    ^
  include/linux/nls.h:55:12: note: 'utf8_to_utf32' declared here
     55 | extern int utf8_to_utf32(const u8 *s, int len, unicode_t *pu);
        |            ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  2 errors generated.

Rename the libstub version since it is static to avoid the conflict.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
 drivers/firmware/efi/libstub/printk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/libstub/printk.c b/drivers/firmware/efi/libstub/printk.c
index 3a67a2cea7bd..334f7e89845c 100644
--- a/drivers/firmware/efi/libstub/printk.c
+++ b/drivers/firmware/efi/libstub/printk.c
@@ -24,7 +24,7 @@ void efi_char16_puts(efi_char16_t *str)
 }
 
 static
-u32 utf8_to_utf32(const u8 **s8)
+u32 efi_utf8_to_utf32(const u8 **s8)
 {
 	u32 c32;
 	u8 c0, cx;
@@ -82,7 +82,7 @@ void efi_puts(const char *str)
 	while (*s8) {
 		if (*s8 == '\n')
 			buf[pos++] = L'\r';
-		c32 = utf8_to_utf32(&s8);
+		c32 = efi_utf8_to_utf32(&s8);
 		if (c32 < 0x10000) {
 			/* Characters in plane 0 use a single word. */
 			buf[pos++] = c32;
Nathan Chancellor March 26, 2025, 12:33 a.m. UTC | #5
On Tue, Mar 25, 2025 at 02:45:21PM -0700, Nathan Chancellor wrote:
> Okay, sounds reasonable to me. This is what I ended up with for that
> change, which will become patch one of the series.
...
> diff --git a/drivers/firmware/efi/libstub/printk.c b/drivers/firmware/efi/libstub/printk.c
> index 3a67a2cea7bd..334f7e89845c 100644
> --- a/drivers/firmware/efi/libstub/printk.c
> +++ b/drivers/firmware/efi/libstub/printk.c
> @@ -24,7 +24,7 @@ void efi_char16_puts(efi_char16_t *str)
>  }
>  
>  static
> -u32 utf8_to_utf32(const u8 **s8)
> +u32 efi_utf8_to_utf32(const u8 **s8)
>  {
>  	u32 c32;
>  	u8 c0, cx;
> @@ -82,7 +82,7 @@ void efi_puts(const char *str)
>  	while (*s8) {
>  		if (*s8 == '\n')
>  			buf[pos++] = L'\r';
> -		c32 = utf8_to_utf32(&s8);
> +		c32 = efi_utf8_to_utf32(&s8);
>  		if (c32 < 0x10000) {
>  			/* Characters in plane 0 use a single word. */
>  			buf[pos++] = c32;
> -- 
> 2.49.0
> 
> Then the first patch (which becomes the second) becomes:
> 
> diff --git a/include/linux/string.h b/include/linux/string.h
> index 0403a4ca4c11..45e01cf3434c 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -7,6 +7,7 @@
>  #include <linux/cleanup.h>	/* for DEFINE_FREE() */
>  #include <linux/compiler.h>	/* for inline */
>  #include <linux/types.h>	/* for size_t */
> +#include <linux/nls.h>		/* for wchar_t */

Good thing I waited :) This include makes s390 unhappy:

https://lore.kernel.org/202503260611.MDurOUhF-lkp@intel.com/

It is possible that should be fixed by adding -Wno-pointer-sign to
KBUILD_CFLAGS_DECOMPRESSOR so that arch/s390/boot matches the rest of
the kernel but...

> diff --git a/lib/string.c b/lib/string.c
> index eb4486ed40d2..1aa09925254b 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -21,6 +21,7 @@
>  #include <linux/errno.h>
>  #include <linux/limits.h>
>  #include <linux/linkage.h>
> +#include <linux/nls.h>
>  #include <linux/stddef.h>
>  #include <linux/string.h>
>  #include <linux/types.h>

I wonder if would be better to do something like the below patch in lieu
of the EFI change above (since there is no chance for a collision) then
change both of the includes for wchar_t in this diff to nls_types.h? I
have no strong opinion but this seems like it would be cleaner for the
sake of backports while not being a bad solution upstream?

Subject: [PATCH] include: Move typedefs in nls.h to their own header

In order to allow commonly included headers such as string.h to access
typedefs such as wchar_t without running into issues with the rest of
the NLS library, refactor the typedefs out into their own header that
can be included in a much safer manner.

Cc: stable@vger.kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
 include/linux/nls.h       | 19 +------------------
 include/linux/nls_types.h | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 18 deletions(-)
 create mode 100644 include/linux/nls_types.h

diff --git a/include/linux/nls.h b/include/linux/nls.h
index e0bf8367b274..3d416d1f60b6 100644
--- a/include/linux/nls.h
+++ b/include/linux/nls.h
@@ -3,24 +3,7 @@
 #define _LINUX_NLS_H
 
 #include <linux/init.h>
-
-/* Unicode has changed over the years.  Unicode code points no longer
- * fit into 16 bits; as of Unicode 5 valid code points range from 0
- * to 0x10ffff (17 planes, where each plane holds 65536 code points).
- *
- * The original decision to represent Unicode characters as 16-bit
- * wchar_t values is now outdated.  But plane 0 still includes the
- * most commonly used characters, so we will retain it.  The newer
- * 32-bit unicode_t type can be used when it is necessary to
- * represent the full Unicode character set.
- */
-
-/* Plane-0 Unicode character */
-typedef u16 wchar_t;
-#define MAX_WCHAR_T	0xffff
-
-/* Arbitrary Unicode character */
-typedef u32 unicode_t;
+#include <linux/nls_types.h>
 
 struct nls_table {
 	const char *charset;
diff --git a/include/linux/nls_types.h b/include/linux/nls_types.h
new file mode 100644
index 000000000000..8caefdba19b1
--- /dev/null
+++ b/include/linux/nls_types.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_NLS_TYPES_H
+#define _LINUX_NLS_TYPES_H
+
+#include <linux/types.h>
+
+/* Unicode has changed over the years.  Unicode code points no longer
+ * fit into 16 bits; as of Unicode 5 valid code points range from 0
+ * to 0x10ffff (17 planes, where each plane holds 65536 code points).
+ *
+ * The original decision to represent Unicode characters as 16-bit
+ * wchar_t values is now outdated.  But plane 0 still includes the
+ * most commonly used characters, so we will retain it.  The newer
+ * 32-bit unicode_t type can be used when it is necessary to
+ * represent the full Unicode character set.
+ */
+
+/* Plane-0 Unicode character */
+typedef u16 wchar_t;
+#define MAX_WCHAR_T	0xffff
+
+/* Arbitrary Unicode character */
+typedef u32 unicode_t;
+
+#endif /* _LINUX_NLS_TYPES_H */
Andy Shevchenko March 26, 2025, 8:52 a.m. UTC | #6
On Tue, Mar 25, 2025 at 02:45:16PM -0700, Nathan Chancellor wrote:
> On Tue, Mar 25, 2025 at 07:05:28PM +0200, Andy Shevchenko wrote:
> > On Tue, Mar 25, 2025 at 09:58:47AM -0700, Nathan Chancellor wrote:
> > > On Tue, Mar 25, 2025 at 06:17:34PM +0200, Andy Shevchenko wrote:
> > > > On Tue, Mar 25, 2025 at 08:45:19AM -0700, Nathan Chancellor wrote:

...

> > > > > Rename the efi function to avoid the conflict.
> > > > 
> > > > Hmm... Why not split this to two, rename patch as a standalone makes sense to
> > > > me even outside of this series.
> > > 
> > > How so? If nls.h is not included in printk.c via string.h, which does
> > > not happen without this series, what value does the rename have? I do
> > > not mind splitting it up that way to keep things cleaner, I am just
> > > wondering what would be the justification in the changelog (I guess just
> > > that nls.h may get included in the future for some reason)?
> > 
> > Inside this series the justification is obvious (a.k.a. the same), outside
> > yes something like "Put EFI specific function to the respective namespace
> > to avoid potential clash in the future when including another header."
> 
> Okay, sounds reasonable to me. This is what I ended up with for that
> change, which will become patch one of the series.

LGTM.
However, now I'm thinking about nls.h appearing in the string.h.
The former actually lacks two headers in it: module.h and types.h.
I'm wondering if fixing nls.h would be even possible after your series.


> with basically the same changelog.
> 
> I will send v2 either tomorrow afternoon or Thursday morning to give a
> little more time for initial review/comments.
Andy Shevchenko March 26, 2025, 8:59 a.m. UTC | #7
On Tue, Mar 25, 2025 at 05:33:03PM -0700, Nathan Chancellor wrote:
> On Tue, Mar 25, 2025 at 02:45:21PM -0700, Nathan Chancellor wrote:

...

> > +#include <linux/nls.h>		/* for wchar_t */
> 
> Good thing I waited :) This include makes s390 unhappy:
> 
> https://lore.kernel.org/202503260611.MDurOUhF-lkp@intel.com/
> 
> It is possible that should be fixed by adding -Wno-pointer-sign to
> KBUILD_CFLAGS_DECOMPRESSOR so that arch/s390/boot matches the rest of
> the kernel but...

Ah, yes, you beat me up to commenting on this, the string.h and string.c made
in a way that they may be and are used in early boot code, i.e. it must not be
dirtyfied with the kernel code.

...

> >  #include <linux/errno.h>
> >  #include <linux/limits.h>
> >  #include <linux/linkage.h>
> > +#include <linux/nls.h>
> >  #include <linux/stddef.h>
> >  #include <linux/string.h>
> >  #include <linux/types.h>
> 
> I wonder if would be better to do something like the below patch in lieu
> of the EFI change above (since there is no chance for a collision) then
> change both of the includes for wchar_t in this diff to nls_types.h? I
> have no strong opinion but this seems like it would be cleaner for the
> sake of backports while not being a bad solution upstream?

>  #define _LINUX_NLS_H
>  
>  #include <linux/init.h>

As I just replied to your previous mail, consider fixing this list as well
by adding module.h and types.h.

...

Overall, can you browse the Ingo's series [1] for the stuff related to this,
if any?

I would avoid doing double efforts or different approaches if we already have
something ready.

[1]: https://lore.kernel.org/linux-kernel/YjBr10JXLGHfEFfi@gmail.com/
Nathan Chancellor March 26, 2025, 3:37 p.m. UTC | #8
On Wed, Mar 26, 2025 at 10:59:52AM +0200, Andy Shevchenko wrote:
> On Tue, Mar 25, 2025 at 05:33:03PM -0700, Nathan Chancellor wrote:
> >  #define _LINUX_NLS_H
> >  
> >  #include <linux/init.h>
> 
> As I just replied to your previous mail, consider fixing this list as well
> by adding module.h and types.h.
> 
> ...
> 
> Overall, can you browse the Ingo's series [1] for the stuff related to this,
> if any?
> 
> I would avoid doing double efforts or different approaches if we already have
> something ready.

In Ingo's last fast-headers tree [1], nls.h only has export.h and init.h
included, so it does not look like anything around this was changed from
what I can tell.

types.h is going to be included via the new nls_types.h and while it
does definitely look like module.h should be included, I do not really
have the time and build capacity at the moment to incorporate testing
that change into this series. I will stick with these two changes for
now then I, you, or someone else can revisit cleaning up nls.h later.

[1]: https://web.git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git/tree/include/linux/nls.h?h=sched/headers

Cheers,
Nathan
Andy Shevchenko March 26, 2025, 3:43 p.m. UTC | #9
On Wed, Mar 26, 2025 at 08:37:56AM -0700, Nathan Chancellor wrote:
> On Wed, Mar 26, 2025 at 10:59:52AM +0200, Andy Shevchenko wrote:
> > On Tue, Mar 25, 2025 at 05:33:03PM -0700, Nathan Chancellor wrote:

...

> > >  #include <linux/init.h>
> > 
> > As I just replied to your previous mail, consider fixing this list as well
> > by adding module.h and types.h.

...

> > Overall, can you browse the Ingo's series [1] for the stuff related to this,
> > if any?
> > 
> > I would avoid doing double efforts or different approaches if we already have
> > something ready.
> 
> In Ingo's last fast-headers tree [1], nls.h only has export.h and init.h
> included, so it does not look like anything around this was changed from
> what I can tell.

Thanks for checking!

> types.h is going to be included via the new nls_types.h and while it
> does definitely look like module.h should be included, I do not really
> have the time and build capacity at the moment to incorporate testing
> that change into this series. I will stick with these two changes for
> now then I, you, or someone else can revisit cleaning up nls.h later.

Fair enough.

> [1]: https://web.git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git/tree/include/linux/nls.h?h=sched/headers
diff mbox series

Patch

diff --git a/drivers/firmware/efi/libstub/printk.c b/drivers/firmware/efi/libstub/printk.c
index 3a67a2cea7bd..334f7e89845c 100644
--- a/drivers/firmware/efi/libstub/printk.c
+++ b/drivers/firmware/efi/libstub/printk.c
@@ -24,7 +24,7 @@  void efi_char16_puts(efi_char16_t *str)
 }
 
 static
-u32 utf8_to_utf32(const u8 **s8)
+u32 efi_utf8_to_utf32(const u8 **s8)
 {
 	u32 c32;
 	u8 c0, cx;
@@ -82,7 +82,7 @@  void efi_puts(const char *str)
 	while (*s8) {
 		if (*s8 == '\n')
 			buf[pos++] = L'\r';
-		c32 = utf8_to_utf32(&s8);
+		c32 = efi_utf8_to_utf32(&s8);
 		if (c32 < 0x10000) {
 			/* Characters in plane 0 use a single word. */
 			buf[pos++] = c32;
diff --git a/include/linux/string.h b/include/linux/string.h
index 0403a4ca4c11..45e01cf3434c 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -7,6 +7,7 @@ 
 #include <linux/cleanup.h>	/* for DEFINE_FREE() */
 #include <linux/compiler.h>	/* for inline */
 #include <linux/types.h>	/* for size_t */
+#include <linux/nls.h>		/* for wchar_t */
 #include <linux/stddef.h>	/* for NULL */
 #include <linux/err.h>		/* for ERR_PTR() */
 #include <linux/errno.h>	/* for E2BIG */
@@ -203,6 +204,7 @@  extern __kernel_size_t strlen(const char *);
 #ifndef __HAVE_ARCH_STRNLEN
 extern __kernel_size_t strnlen(const char *,__kernel_size_t);
 #endif
+extern __kernel_size_t wcslen(const wchar_t *s);
 #ifndef __HAVE_ARCH_STRPBRK
 extern char * strpbrk(const char *,const char *);
 #endif
diff --git a/lib/string.c b/lib/string.c
index bbee8a9e4d83..1aa09925254b 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -430,7 +430,6 @@  size_t strnlen(const char *s, size_t count)
 EXPORT_SYMBOL(strnlen);
 #endif
 
-size_t wcslen(const wchar_t *s);
 size_t wcslen(const wchar_t *s)
 {
 	const wchar_t *sc;