diff mbox series

[v5] tools/nolibc: fix up size inflate regression

Message ID b6ff2684f557f6ce00151905990643e651391614.1691437328.git.falcon@tinylab.org (mailing list archive)
State New
Headers show
Series [v5] tools/nolibc: fix up size inflate regression | expand

Commit Message

Zhangjin Wu Aug. 7, 2023, 8:04 p.m. UTC
As reported and suggested by Willy, the inline __sysret() helper
introduces three types of conversions and increases the size:

(1) the "unsigned long" argument to __sysret() forces a sign extension
from all sys_* functions that used to return 'int'

(2) the comparison with the error range now has to be performed on a
'unsigned long' instead of an 'int'

(3) the return value from __sysret() is a 'long' (note, a signed long)
which then has to be turned back to an 'int' before being returned by the
caller to satisfy the caller's prototype.

To fix up this, firstly, let's use macro instead of inline function to
preserves the input type and avoids these useless conversions (1), (3).

Secondly, comparison to -MAX_ERRNO inflicts on all integer returns where
we could previously keep a simple sign comparison, let's use a new
__is_pointer() macro suggested by David Laight to limit the comparison
to -MAX_ERRNO (2) only for pointer returns and preserve a simple sign
comparison for integer returns as before. The __builtin_choose_expr()
is suggested by David Laight to choose different comparisons based on
the types to share code.

Thirdly, fix up the following warning by an explicit conversion and let
__sysret() be able to accept the (void *) type of argument:

    sysroot/powerpc/include/sys.h: In function 'sbrk':
    sysroot/powerpc/include/sys.h:104:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      104 |         return (void *)__sysret(-ENOMEM);

Fourthly, to further workaround the argument type with 'const' flag,
must use __auto_type for gcc >= 11.0 and __typeof__((arg) + 0) suggested
by David Laight for old gcc versions.

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/
Link: https://lore.kernel.org/lkml/20230806134348.GA19145@1wt.eu/
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Link: https://lore.kernel.org/lkml/f51e54bcf470451ea36f24640f000e61@AcuMS.aculab.com/
Link: https://lore.kernel.org/lkml/a1732bbffd1542d3b9dd34c92f45076c@AcuMS.aculab.com/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
---

Hi, Willy, Hi, David

v5 applies suggestions from David Laight, it further drops the fixed
'long' conversion branch by using a __typeof__((arg) + 0) trick and also
merges the pointer type and integer type comparisons with
__bultin_choose_expr() and a new __is_pointer() macro, now, the code is
cleaner than before versions.

David, Thanks a lot!

Like before, tests run for all nolibc supported boards.

Changes from v4 --> v5:

* Use __typeof__((arg) + 0) to lose the 'const' flag for old gcc
  versions.

* Import the famous __is_constexpr() macro from kernel side and add a
  __is_pointer() macro based on it. (David, to avoid introduce extra
  discuss on the prove-in-use __is_constexpr macro, this patch uses the
  original version instead of your suggested version, more info here:
  https://lore.kernel.org/lkml/20220131204357.1133674-1-keescook@chromium.org/)

* Use __builtin_choose_expr() to merge two comparisons to share the same
  errno setting code and the -1L assignment code.

Changes from v3 --> v4:

* fix up a new warning about 'ret < 0' when the input arg type is (void *)

Changes from v2 --> v3:

* define a __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT for gcc >= 11.0 (ABI_VERSION >= 1016)
* split __sysret() to two versions by the macro instead of a mixed unified and unreadable version
* use shorter __ret instead of __sysret_arg

Changes from v1 --> v2:

* fix up argument with 'const' in the type
* support "void *" argument

Best regards,
Zhangjin
---

v4: https://lore.kernel.org/lkml/a4084f7fac7a89f861b5582774bc7a98634d1e76.1691392805.git.falcon@tinylab.org/
v3: https://lore.kernel.org/lkml/8eaab5da2dcbba42e3f3efc2ae686a22c95f84f0.1691386601.git.falcon@tinylab.org/
v2: https://lore.kernel.org/lkml/95fe3e732f455fab653fe1427118d905e4d04257.1691339836.git.falcon@tinylab.org/
v1: https://lore.kernel.org/lkml/20230806131921.52453-1-falcon@tinylab.org/

---
 tools/include/nolibc/sys.h | 74 ++++++++++++++++++++++++++++++--------
 1 file changed, 59 insertions(+), 15 deletions(-)

Comments

Willy Tarreau Aug. 8, 2023, 6:49 p.m. UTC | #1
Hi Zhangjin,

On Tue, Aug 08, 2023 at 04:04:05AM +0800, Zhangjin Wu wrote:
> As reported and suggested by Willy, the inline __sysret() helper
> introduces three types of conversions and increases the size:
> 
> (1) the "unsigned long" argument to __sysret() forces a sign extension
> from all sys_* functions that used to return 'int'
> 
> (2) the comparison with the error range now has to be performed on a
> 'unsigned long' instead of an 'int'
> 
> (3) the return value from __sysret() is a 'long' (note, a signed long)
> which then has to be turned back to an 'int' before being returned by the
> caller to satisfy the caller's prototype.
> 
> To fix up this, firstly, let's use macro instead of inline function to
> preserves the input type and avoids these useless conversions (1), (3).
> 
> Secondly, comparison to -MAX_ERRNO inflicts on all integer returns where
> we could previously keep a simple sign comparison, let's use a new
> __is_pointer() macro suggested by David Laight to limit the comparison
> to -MAX_ERRNO (2) only for pointer returns and preserve a simple sign
> comparison for integer returns as before. The __builtin_choose_expr()
> is suggested by David Laight to choose different comparisons based on
> the types to share code.
> 
> Thirdly, fix up the following warning by an explicit conversion and let
> __sysret() be able to accept the (void *) type of argument:
> 
>     sysroot/powerpc/include/sys.h: In function 'sbrk':
>     sysroot/powerpc/include/sys.h:104:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>       104 |         return (void *)__sysret(-ENOMEM);
> 
> Fourthly, to further workaround the argument type with 'const' flag,
> must use __auto_type for gcc >= 11.0 and __typeof__((arg) + 0) suggested
> by David Laight for old gcc versions.
(...)
> tools/include/nolibc/sys.h | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 59 insertions(+), 15 deletions(-)

Quite frankly, even if it can be looked at as a piece of art, I don't
like it. It's overkill for what we need and it brings in several tricky
macros that we don't need and that require a link to their analysis so
that nobody breaks them by accident. I mean, if one day we need them,
okay we know we can find them, they're perfect for certain use cases.
But all this just to avoid a ternary operation is far too much IMHO.
That's without counting on the compiler tricks to use the ugly
__auto_type when available, and the macro names which continue to
possibly interact with user code.

And if you remember, originally you proposed to factor the SET_ERRNO()
stuff in every syscall in order to "simplify the code and improve
maintainability". It's clear that we've long abandonned that goal here.
If we had no other choice, I'd rather roll back to the clean, readable
and trustable SET_ERRNO() in every syscall!

So I just restarted from what I proposed the other day, using a ternary
operator as I suggested in order to address the const case, and it gives
me the following patch, which is way simpler and still a bit readable.
It's made of two nested (?:) :
  - the first one to determine if we have to check for the sign or
    against -MAX_ERRNO to detect an error (depends on the arg's
    signedness)
  - the second one to return either the argument as-is or -1.

The only two tricks are that (typeof(arg))-1 is compared to 1 instead of
zero so that gcc doesn't complain that we're comparing against a null
pointer, and similarly we compare arg+1 to 1 instead of arg to 0 for the
negative case, and that's all. It gives me the expected code and output
from gcc-4.7 to 12.3, and clang-13.

I've checked against your version and it's always exactly the same (in
fact to be more precise sometimes it's 1-2 bytes smaller but that's only
due to the compiler taking liberties with the code ordering, it could as
well have done it the other way around, though it did not this time):

 26144 zhangjin-v5/nolibc-test--Os-arm64     | 26144 willy/nolibc-test--Os-arm64
 23340 zhangjin-v5/nolibc-test--Os-armv5     | 23340 willy/nolibc-test--Os-armv5
 23232 zhangjin-v5/nolibc-test--Os-armv7     | 23232 willy/nolibc-test--Os-armv7
 17508 zhangjin-v5/nolibc-test--Os-armv7t    | 17508 willy/nolibc-test--Os-armv7t
 19674 zhangjin-v5/nolibc-test--Os-i386      | 19673 willy/nolibc-test--Os-i386
 19821 zhangjin-v5/nolibc-test--Os-i586      | 19820 willy/nolibc-test--Os-i586
 23084 zhangjin-v5/nolibc-test--Os-loongarch | 23084 willy/nolibc-test--Os-loongarch
 23404 zhangjin-v5/nolibc-test--Os-mips      | 23404 willy/nolibc-test--Os-mips
 27108 zhangjin-v5/nolibc-test--Os-ppc32     | 27108 willy/nolibc-test--Os-ppc32
 27652 zhangjin-v5/nolibc-test--Os-ppc64     | 27652 willy/nolibc-test--Os-ppc64
 27652 zhangjin-v5/nolibc-test--Os-ppc64le   | 27652 willy/nolibc-test--Os-ppc64le
 19356 zhangjin-v5/nolibc-test--Os-riscv     | 19356 willy/nolibc-test--Os-riscv
 22152 zhangjin-v5/nolibc-test--Os-s390      | 22152 willy/nolibc-test--Os-s390
 22300 zhangjin-v5/nolibc-test--Os-x86_64    | 22299 willy/nolibc-test--Os-x86_64

Unless there's any objection, I'll queue this one. And if __sysret()
annoys us again in the future I might very well revert that simplification.

Any question about the patch ?

Thanks,
Willy

---

From 73dd63ed6666c6f212ba09247e68b6b5711ed6ec Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Tue, 8 Aug 2023 20:12:50 +0200
Subject: tools/nolibc: avoid undesired casts in the __sysret() macro

Having __sysret() as an inline function has the unfortunate effect of
adding casts and large constants comparisons after the syscall returns
that significantly inflate some light code that's otherwise syscall-
heavy. Even nolibc-test grew by ~1%.

Let's switch back to a macro for this, and apply the following rule:
  - if the argument (the local variable containing the syscall return
    value) is signed, any negative value is an error, so the check is
    performed compared to zero with the argument's type ;

  - otherwise if the argument is unsigned, only values >= -MAX_ERRNO
    indicate an error. That's what is used for mmap() for example.

The result is calculated using a ternary operator so that we don't need
to assign values to a temporary variable and that it does work fine with
const if needed.

The sbrk() and mmap() syscalls were also fixed to return the correct
type (they used to have double casts to hide the pointer and restore
it).

Fixes: 428905da6ec4 ("tools/nolibc: sys.h: add a syscall return helper")
Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/
Cc: Zhangjin Wu <falcon@tinylab.org>
Cc: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 tools/include/nolibc/sys.h | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
index 833d6c5e86dc..6b5f39fbf998 100644
--- a/tools/include/nolibc/sys.h
+++ b/tools/include/nolibc/sys.h
@@ -28,22 +28,25 @@
 #include "types.h"
 
 
-/* Syscall return helper for library routines, set errno as -ret when ret is in
- * range of [-MAX_ERRNO, -1]
- *
- * Note, No official reference states the errno range here aligns with musl
- * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
+/* Syscall return helper: takes the syscall value in argument and checks for an
+ * error in it. For unsigned returns, an error is within [-MAX_ERRNO, -1]. For
+ * signed returns, an error is any value < 0. When an error is encountered,
+ * -ret is set into errno and -1 is returned. Otherwise the returned value is
+ * passed as-is with its type preserved.
  */
 
-static __inline__ __attribute__((unused, always_inline))
-long __sysret(unsigned long ret)
-{
-	if (ret >= (unsigned long)-MAX_ERRNO) {
-		SET_ERRNO(-(long)ret);
-		return -1;
-	}
-	return ret;
-}
+#define __sysret(arg)								\
+({										\
+	__typeof__(arg) __sysret_arg = (arg);					\
+	((((__typeof__(arg)) -1) > (__typeof__(arg)) 1) ?   /* unsigned arg? */	\
+	 (uintptr_t)__sysret_arg >= (uintptr_t)-(MAX_ERRNO) :      /* errors */	\
+	 (__sysret_arg + 1) < ((__typeof__(arg))1)     /* signed: <0 = error */	\
+	) ? ({									\
+		SET_ERRNO(-(intptr_t)__sysret_arg);				\
+		((__typeof__(arg)) -1);              /* return -1 upon error */	\
+	}) : __sysret_arg;        /* return original value & type on success */	\
+})
+
 
 /* Functions in this file only describe syscalls. They're declared static so
  * that the compiler usually decides to inline them while still being allowed
@@ -94,7 +97,7 @@ void *sbrk(intptr_t inc)
 	if (ret && sys_brk(ret + inc) == ret + inc)
 		return ret + inc;
 
-	return (void *)__sysret(-ENOMEM);
+	return __sysret((void *)-ENOMEM);
 }
 
 
@@ -682,7 +685,7 @@ void *sys_mmap(void *addr, size_t length, int prot, int flags, int fd,
 static __attribute__((unused))
 void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
 {
-	return (void *)__sysret((unsigned long)sys_mmap(addr, length, prot, flags, fd, offset));
+	return __sysret(sys_mmap(addr, length, prot, flags, fd, offset));
 }
 
 static __attribute__((unused))
Zhangjin Wu Aug. 9, 2023, 10:17 p.m. UTC | #2
Hi, Willy

> Hi Zhangjin,
> 
> On Tue, Aug 08, 2023 at 04:04:05AM +0800, Zhangjin Wu wrote:
> > As reported and suggested by Willy, the inline __sysret() helper
> > introduces three types of conversions and increases the size:
> > 
> > (1) the "unsigned long" argument to __sysret() forces a sign extension
> > from all sys_* functions that used to return 'int'
> > 
> > (2) the comparison with the error range now has to be performed on a
> > 'unsigned long' instead of an 'int'
> > 
> > (3) the return value from __sysret() is a 'long' (note, a signed long)
> > which then has to be turned back to an 'int' before being returned by the
> > caller to satisfy the caller's prototype.
> > 
> > To fix up this, firstly, let's use macro instead of inline function to
> > preserves the input type and avoids these useless conversions (1), (3).
> > 
> > Secondly, comparison to -MAX_ERRNO inflicts on all integer returns where
> > we could previously keep a simple sign comparison, let's use a new
> > __is_pointer() macro suggested by David Laight to limit the comparison
> > to -MAX_ERRNO (2) only for pointer returns and preserve a simple sign
> > comparison for integer returns as before. The __builtin_choose_expr()
> > is suggested by David Laight to choose different comparisons based on
> > the types to share code.
> > 
> > Thirdly, fix up the following warning by an explicit conversion and let
> > __sysret() be able to accept the (void *) type of argument:
> > 
> >     sysroot/powerpc/include/sys.h: In function 'sbrk':
> >     sysroot/powerpc/include/sys.h:104:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
> >       104 |         return (void *)__sysret(-ENOMEM);
> > 
> > Fourthly, to further workaround the argument type with 'const' flag,
> > must use __auto_type for gcc >= 11.0 and __typeof__((arg) + 0) suggested
> > by David Laight for old gcc versions.
> (...)
> > tools/include/nolibc/sys.h | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------
> > 1 file changed, 59 insertions(+), 15 deletions(-)
> 
> Quite frankly, even if it can be looked at as a piece of art, I don't
> like it. It's overkill for what we need and it brings in several tricky
> macros that we don't need and that require a link to their analysis so
> that nobody breaks them by accident. I mean, if one day we need them,
> okay we know we can find them, they're perfect for certain use cases.
> But all this just to avoid a ternary operation is far too much IMHO.
> That's without counting on the compiler tricks to use the ugly
> __auto_type when available, and the macro names which continue to
> possibly interact with user code.
>

Agree, I don't like __auto_type too, although I have tried to find whether
there is a direct macro for it, but NO such one, and the __auto_type in some
older versions don't accept 'const' flag, so, I'm also worried about if gcc
will change it in the future ;-(

Seems __sysret() is mainly used by us in sys.h, perhaps we can simply
assume and guarantee nobody will use 'const' in such cases.

> And if you remember, originally you proposed to factor the SET_ERRNO()
> stuff in every syscall in order to "simplify the code and improve
> maintainability". It's clear that we've long abandonned that goal here.
> If we had no other choice, I'd rather roll back to the clean, readable
> and trustable SET_ERRNO() in every syscall!
>

Agree, or we simply use the original version without pointer returns support
(only sbrk and mmap currently) but convert it to the macro version.

Or, as the idea mentioned by Thomas in a reply: if we can let the sys_
functions use 'long' returns, or even further, we convert all of the sys_
functions to macros and let them preserve input types from library routines and
preserve return types from the my_syscall<N> macros.

As we discussed in my our syscall.h proposal, if there is a common
my_syscall(), every sys_ function can be simply defined to something
like:

    #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)

In my_syscall(), it can even simply return -ENOSYS if the __NR_xxx is
not defined (we init such __NR_xxx to something like __NR_NOSYS):

    // sysnr.h

    // If worried about the applications use this macro, perhaps we can
    // use a different prefix, for example, NOLIBC_NR_xxx

    #define NOLIBC_NR_NOSYS (-1L)

    #ifndef __NR_xxx
    #define NOLIBC_NR_xxx NOLIBC_NR_NOSYS
    #else
    #define NOLIBC_NR_xxx __NR_xxx
    #endif

    // syscall.h

    // _my_syscall is similar to syscall() in unistd.h, but without the
    // __sysret normalization

    #define _my_syscalln(N, ...) my_syscall##N(__VA_ARGS__)
    #define _my_syscall_n(N, ...) _my_syscalln(N, __VA_ARGS__)
    #define _my_syscall(...) _my_syscall_n(_syscall_narg(__VA_ARGS__), ##__VA_ARGS__)

    #define my_syscall(name, ...)                                       \
    ({                                                                  \
           long _ret;                                                   \
           if (NOLIBC_NR_##name == NOLIBC_NR_NOSYS)                     \
                   _ret = -ENOSYS;                                      \
           else                                                         \
                   _ret = _my_syscall(NOLIBC_NR_##name, ##__VA_ARGS__); \
           _ret;                                                        \
    })

    // sys_<NAME> list, based on unistd.h

    #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)
    #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)
    #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)

With above conversions, we may be able to predefine all of the
sys_<NAME> functions to preserve the input types from library rountines
and return types from my_syscall<N> (by default, 'long'). This also
follows the suggestion from Arnd: let sys_ not use the other low level
syscalls, only use its own.

This may also help us to remove all of the `#ifdef __NR_` wrappers, we
can directly check the -ENOSYS in the library routines and try another
sys_<NAME> if required, at last, call __sysret() to normalize the errors
and return value.

Use dup2 and dup3 as examples, with sysnr.h and syscall.h above, sys.h
will work like this, without any #ifdef's:

    /*
     * int dup2(int old, int new);
     */
   
    static __attribute__((unused))
    int dup2(int old, int new)
    {
	int ret = sys_dup3(old, new, 0);

	if (ret == -ENOSYS)
		ret = sys_dup2(old, new);

	return __sysret(ret);
    }
    
    /*
     * int dup3(int old, int new, int flags);
     */
    
    static __attribute__((unused))
    int dup3(int old, int new, int flags)
    {
    	return __sysret(sys_dup3(old, new, flags));
    }

If the above description is not clear enough, I have changed something for this
idea with more cleanups and have done some simple tests:

Compare with v5 --> v6 (with gcc-13.2.0):

     i386: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning  19509 -> 19250
   x86_64: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning  22012 -> 21758
    arm64: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning  25868 -> 25804
      arm: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning  23112 -> 22828
     mips: 160 test(s): 157 passed,   3 skipped,   0 failed => status: warning  22924 -> 22740
      ppc: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning  26628 -> 26376
    ppc64: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning  27204 -> 26756
  ppc64le: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning  27828 -> 27364
    riscv: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning  21870 -> 21772
     s390: 160 test(s): 157 passed,   3 skipped,   0 failed => status: warning  22192 -> 21992

And now we get:

    $ wc -l tools/include/nolibc/sys.h
    746 tools/include/nolibc/sys.h

    $ wc -l tools/include/nolibc/sysnr.h 
    410 tools/include/nolibc/sysnr.h
    $ wc -l tools/include/nolibc/syscall.h 
    110 tools/include/nolibc/syscall.h

Before:

    $ wc -l tools/include/nolibc/sys.h 
    1222 tools/include/nolibc/sys.h

Most of the library routines become one line code.

The main size reduced may be the carefully tuned sys_ selection, for example,
linkat v.s. link,  the patchset still require some cleanups, will send v6 for
more discussion if you agree. 

> So I just restarted from what I proposed the other day, using a ternary
> operator as I suggested in order to address the const case, and it gives
> me the following patch, which is way simpler and still a bit readable.
> It's made of two nested (?:) :
>   - the first one to determine if we have to check for the sign or
>     against -MAX_ERRNO to detect an error (depends on the arg's
>     signedness)
>   - the second one to return either the argument as-is or -1.
> 
> The only two tricks are that (typeof(arg))-1 is compared to 1 instead of
> zero so that gcc doesn't complain that we're comparing against a null
> pointer, and similarly we compare arg+1 to 1 instead of arg to 0 for the
> negative case, and that's all. It gives me the expected code and output
> from gcc-4.7 to 12.3, and clang-13.
> 
> I've checked against your version and it's always exactly the same (in
> fact to be more precise sometimes it's 1-2 bytes smaller but that's only
> due to the compiler taking liberties with the code ordering, it could as
> well have done it the other way around, though it did not this time):
> 
>  26144 zhangjin-v5/nolibc-test--Os-arm64     | 26144 willy/nolibc-test--Os-arm64
>  23340 zhangjin-v5/nolibc-test--Os-armv5     | 23340 willy/nolibc-test--Os-armv5
[...]
> 
> Unless there's any objection, I'll queue this one. And if __sysret()
> annoys us again in the future I might very well revert that simplification.
> 
> Any question about the patch ?
> 
[...]
>  
> -/* Syscall return helper for library routines, set errno as -ret when ret is in
> - * range of [-MAX_ERRNO, -1]
> - *
> - * Note, No official reference states the errno range here aligns with musl
> - * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
> +/* Syscall return helper: takes the syscall value in argument and checks for an
> + * error in it. For unsigned returns, an error is within [-MAX_ERRNO, -1]. For
> + * signed returns, an error is any value < 0. When an error is encountered,
> + * -ret is set into errno and -1 is returned. Otherwise the returned value is
> + * passed as-is with its type preserved.
>   */
>  
> -static __inline__ __attribute__((unused, always_inline))
> -long __sysret(unsigned long ret)
> -{
> -	if (ret >= (unsigned long)-MAX_ERRNO) {
> -		SET_ERRNO(-(long)ret);
> -		return -1;
> -	}
> -	return ret;
> -}
> +#define __sysret(arg)								\
> +({										\
> +	__typeof__(arg) __sysret_arg = (arg);					\

Here ignores the 'const' flag in input type?

> +	((((__typeof__(arg)) -1) > (__typeof__(arg)) 1) ?   /* unsigned arg? */	\
> +	 (uintptr_t)__sysret_arg >= (uintptr_t)-(MAX_ERRNO) :      /* errors */	\
> +	 (__sysret_arg + 1) < ((__typeof__(arg))1)     /* signed: <0 = error */	\
> +	) ? ({									\
> +		SET_ERRNO(-(intptr_t)__sysret_arg);				\
> +		((__typeof__(arg)) -1);              /* return -1 upon error */	\
> +	}) : __sysret_arg;        /* return original value & type on success */	\
> +})
> +
>

To be honest, it is also a little complex when with one "?:" embedded in
another, I even don't understand how the 'unsigned arg' branch works,
sorry, is it dark magic like the __is_constexpr? ;-)

Thanks,
Zhangjin

>  /* Functions in this file only describe syscalls. They're declared static so
>   * that the compiler usually decides to inline them while still being allowed
> @@ -94,7 +97,7 @@ void *sbrk(intptr_t inc)
>  	if (ret && sys_brk(ret + inc) == ret + inc)
>  		return ret + inc;
>  
> -	return (void *)__sysret(-ENOMEM);
> +	return __sysret((void *)-ENOMEM);
>  }
>  
>  
> @@ -682,7 +685,7 @@ void *sys_mmap(void *addr, size_t length, int prot, int flags, int fd,
>  static __attribute__((unused))
>  void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
>  {
> -	return (void *)__sysret((unsigned long)sys_mmap(addr, length, prot, flags, fd, offset));
> +	return __sysret(sys_mmap(addr, length, prot, flags, fd, offset));
>  }
>  
>  static __attribute__((unused))
> -- 
> 2.35.3
Willy Tarreau Aug. 13, 2023, 8:51 a.m. UTC | #3
Hi Zhangjin,

On Thu, Aug 10, 2023 at 06:17:43AM +0800, Zhangjin Wu wrote:
> > Quite frankly, even if it can be looked at as a piece of art, I don't
> > like it. It's overkill for what we need and it brings in several tricky
> > macros that we don't need and that require a link to their analysis so
> > that nobody breaks them by accident. I mean, if one day we need them,
> > okay we know we can find them, they're perfect for certain use cases.
> > But all this just to avoid a ternary operation is far too much IMHO.
> > That's without counting on the compiler tricks to use the ugly
> > __auto_type when available, and the macro names which continue to
> > possibly interact with user code.
> >
> 
> Agree, I don't like __auto_type too, although I have tried to find whether
> there is a direct macro for it, but NO such one, and the __auto_type in some
> older versions don't accept 'const' flag, so, I'm also worried about if gcc
> will change it in the future ;-(

I mean, it's just that we do not need it at all.

> Seems __sysret() is mainly used by us in sys.h,

Sure, it was added not long ago by you to factor all the calls to
SET_ERRNO():

   428905da6ec4 ("tools/nolibc: sys.h: add a syscall return helper")

> perhaps we can simply
> assume and guarantee nobody will use 'const' in such cases.

There is absolutely *no* problem with const since the value is use by
a "return" statement.

> > And if you remember, originally you proposed to factor the SET_ERRNO()
> > stuff in every syscall in order to "simplify the code and improve
> > maintainability". It's clear that we've long abandonned that goal here.
> > If we had no other choice, I'd rather roll back to the clean, readable
> > and trustable SET_ERRNO() in every syscall!
> >
> 
> Agree, or we simply use the original version without pointer returns support
> (only sbrk and mmap currently) but convert it to the macro version.

I indeed think that's the cleanest approach. There will hardly be more
than 2 syscalls returning pointers or unsigned values and all this extra
complexity added just to avoid *two* SET_ERRNO() calls is totally
pointless.

> Or, as the idea mentioned by Thomas in a reply: if we can let the sys_
> functions use 'long' returns, or even further, we convert all of the sys_
> functions to macros and let them preserve input types from library routines and
> preserve return types from the my_syscall<N> macros.

It would be annoying because the sys_* implement some fallbacks, themselves
based on #ifdef and such stuff. Macros are really a pain when they're
repeated. They're a pain to edit, to debug, to modify and you'll see that
editors are even not good with them, you often end up modifying more than
you want to try to keep trailing backslashes aligned.

> As we discussed in my our syscall.h proposal, if there is a common
> my_syscall(), every sys_ function can be simply defined to something
> like:
> 
>     #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)
> 
> In my_syscall(), it can even simply return -ENOSYS if the __NR_xxx is
> not defined (we init such __NR_xxx to something like __NR_NOSYS):
> 
>     // sysnr.h
> 
>     // If worried about the applications use this macro, perhaps we can
>     // use a different prefix, for example, NOLIBC_NR_xxx
> 
>     #define NOLIBC_NR_NOSYS (-1L)
> 
>     #ifndef __NR_xxx
>     #define NOLIBC_NR_xxx NOLIBC_NR_NOSYS
>     #else
>     #define NOLIBC_NR_xxx __NR_xxx
>     #endif
> 
>     // syscall.h
> 
>     // _my_syscall is similar to syscall() in unistd.h, but without the
>     // __sysret normalization
> 
>     #define _my_syscalln(N, ...) my_syscall##N(__VA_ARGS__)
>     #define _my_syscall_n(N, ...) _my_syscalln(N, __VA_ARGS__)
>     #define _my_syscall(...) _my_syscall_n(_syscall_narg(__VA_ARGS__), ##__VA_ARGS__)
> 
>     #define my_syscall(name, ...)                                       \
>     ({                                                                  \
>            long _ret;                                                   \
>            if (NOLIBC_NR_##name == NOLIBC_NR_NOSYS)                     \
>                    _ret = -ENOSYS;                                      \
>            else                                                         \
>                    _ret = _my_syscall(NOLIBC_NR_##name, ##__VA_ARGS__); \
>            _ret;                                                        \
>     })
> 
>     // sys_<NAME> list, based on unistd.h
> 
>     #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)
>     #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)
>     #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)
> 
> With above conversions, we may be able to predefine all of the
> sys_<NAME> functions to preserve the input types from library rountines
> and return types from my_syscall<N> (by default, 'long'). This also
> follows the suggestion from Arnd: let sys_ not use the other low level
> syscalls, only use its own.

Maybe, but I'm not sure there is much to gain here, compared to the
flexibility to map one to another (e.g. see sys_chmod()).

> This may also help us to remove all of the `#ifdef __NR_` wrappers, we
> can directly check the -ENOSYS in the library routines and try another
> sys_<NAME> if required, at last, call __sysret() to normalize the errors
> and return value.
> 
> Use dup2 and dup3 as examples, with sysnr.h and syscall.h above, sys.h
> will work like this, without any #ifdef's:
> 
>     /*
>      * int dup2(int old, int new);
>      */
>    
>     static __attribute__((unused))
>     int dup2(int old, int new)
>     {
> 	int ret = sys_dup3(old, new, 0);
> 
> 	if (ret == -ENOSYS)
> 		ret = sys_dup2(old, new);
> 
> 	return __sysret(ret);
>     }

But this will add a useless test after all such syscalls, we'd rather
not do that!

> > -static __inline__ __attribute__((unused, always_inline))
> > -long __sysret(unsigned long ret)
> > -{
> > -	if (ret >= (unsigned long)-MAX_ERRNO) {
> > -		SET_ERRNO(-(long)ret);
> > -		return -1;
> > -	}
> > -	return ret;
> > -}
> > +#define __sysret(arg)								\
> > +({										\
> > +	__typeof__(arg) __sysret_arg = (arg);					\
> 
> Here ignores the 'const' flag in input type?

Yes, as explained above, there's no issue with const. The issue
that was met in the version I suggested in the message was that
there was an assignment to the variable of value -1 to be returned,
which is not permitted when it's const, and I said that it was not
necessary, it was just a convenience, but that using "?:" does the
job as well without having to do any assignment.

> > +	((((__typeof__(arg)) -1) > (__typeof__(arg)) 1) ?   /* unsigned arg? */	\
> > +	 (uintptr_t)__sysret_arg >= (uintptr_t)-(MAX_ERRNO) :      /* errors */	\
> > +	 (__sysret_arg + 1) < ((__typeof__(arg))1)     /* signed: <0 = error */	\
> > +	) ? ({									\
> > +		SET_ERRNO(-(intptr_t)__sysret_arg);				\
> > +		((__typeof__(arg)) -1);              /* return -1 upon error */	\
> > +	}) : __sysret_arg;        /* return original value & type on success */	\
> > +})
> > +
> >
> 
> To be honest, it is also a little complex when with one "?:" embedded in
> another, I even don't understand how the 'unsigned arg' branch works,
> sorry, is it dark magic like the __is_constexpr? ;-)

The thing is that we don't need to do anything specific for consts, we
just need to check whether an argument is signed or unsigned. The test
for unsigned is that all unsigned integers are positive, so
((unsigned)-1 > 0) is always true. We just compare it to 1 instead of
0 to shut up the compiler which was seeing a comparison against NULL.

The rest is just checking if arg < 0 if arg is signed, or
arg >= -MAX_ERRNO if it's unsigned, and if so, assigns its negation to
errno and returns -1 otherwise returns it as-is. So it's not dark magic,
doesn't rely on compiler's behavior and does not require links to external
books explaining why the macro works in modern compilers.

Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be
honest, because we're there just because of the temptation to remove
lines that were not causing any difficulties :-/

I think we can do something in-between and deal only with signed returns,
and explicitly place the test for MAX_ERRNO on the two unsigned ones
(brk and mmap). It should look approximately like this:

 #define __sysret(arg)							\
 ({									\
 	__typeof__(arg) __sysret_arg = (arg);				\
 	(__sysret_arg < 0) ? ({           /* error ? */                 \
 		SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */      \
 		((__typeof__(arg)) -1);   /*      return -1 */          \
 	}) : __sysret_arg;                /* return original value */   \
 })

Willy
Zhangjin Wu Aug. 13, 2023, 1:26 p.m. UTC | #4
Hi Zhangjin,

> On Thu, Aug 10, 2023 at 06:17:43AM +0800, Zhangjin Wu wrote:
> [...]
> 
> > > And if you remember, originally you proposed to factor the SET_ERRNO()
> > > stuff in every syscall in order to "simplify the code and improve
> > > maintainability". It's clear that we've long abandonned that goal here.
> > > If we had no other choice, I'd rather roll back to the clean, readable
> > > and trustable SET_ERRNO() in every syscall!
> > >
> > 
> > Agree, or we simply use the original version without pointer returns support
> > (only sbrk and mmap currently) but convert it to the macro version.
> 
> I indeed think that's the cleanest approach. There will hardly be more
> than 2 syscalls returning pointers or unsigned values and all this extra
> complexity added just to avoid *two* SET_ERRNO() calls is totally
> pointless.
>

Agree.

> > Or, as the idea mentioned by Thomas in a reply: if we can let the sys_
> > functions use 'long' returns, or even further, we convert all of the sys_
> > functions to macros and let them preserve input types from library routines and
> > preserve return types from the my_syscall<N> macros.
> 
> It would be annoying because the sys_* implement some fallbacks, themselves
> based on #ifdef and such stuff.

Yeah, this is an issue we must solve, let's talk about it in the 'if
(ret == -ENOSYS)' part later. 

> Macros are really a pain when they're
> repeated. They're a pain to edit, to debug, to modify and you'll see that
> editors are even not good with them, you often end up modifying more than
> you want to try to keep trailing backslashes aligned.
>

Agree very much.

> > As we discussed in my our syscall.h proposal, if there is a common
> > my_syscall(), every sys_ function can be simply defined to something
> > like:
> > 
> >     #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)
> > 
> > In my_syscall(), it can even simply return -ENOSYS if the __NR_xxx is
> > not defined (we init such __NR_xxx to something like __NR_NOSYS):
> > 
> >     // sysnr.h
> > 
> >     // If worried about the applications use this macro, perhaps we can
> >     // use a different prefix, for example, NOLIBC_NR_xxx
> > 
> >     #define NOLIBC_NR_NOSYS (-1L)
> > 
> >     #ifndef __NR_xxx
> >     #define NOLIBC_NR_xxx NOLIBC_NR_NOSYS
> >     #else
> >     #define NOLIBC_NR_xxx __NR_xxx
> >     #endif
> > 
> >     // syscall.h
> > 
> >     // _my_syscall is similar to syscall() in unistd.h, but without the
> >     // __sysret normalization
> > 
> >     #define _my_syscalln(N, ...) my_syscall##N(__VA_ARGS__)
> >     #define _my_syscall_n(N, ...) _my_syscalln(N, __VA_ARGS__)
> >     #define _my_syscall(...) _my_syscall_n(_syscall_narg(__VA_ARGS__), ##__VA_ARGS__)
> > 
> >     #define my_syscall(name, ...)                                       \
> >     ({                                                                  \
> >            long _ret;                                                   \
> >            if (NOLIBC_NR_##name == NOLIBC_NR_NOSYS)                     \
> >                    _ret = -ENOSYS;                                      \
> >            else                                                         \
> >                    _ret = _my_syscall(NOLIBC_NR_##name, ##__VA_ARGS__); \
> >            _ret;                                                        \
> >     })
> > 
> >     // sys_<NAME> list, based on unistd.h
> > 
> >     #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)
> >     #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)
> >     #define sys_<NAME>(...) my_syscall(<NAME>, __VA_ARGS__)
> > 
> > With above conversions, we may be able to predefine all of the
> > sys_<NAME> functions to preserve the input types from library rountines
> > and return types from my_syscall<N> (by default, 'long'). This also
> > follows the suggestion from Arnd: let sys_ not use the other low level
> > syscalls, only use its own.
> 
> Maybe, but I'm not sure there is much to gain here, compared to the
> flexibility to map one to another (e.g. see sys_chmod()).
>

But we can do the mapping in library routines too (still need to solve
the #ifdef switch, let's talk later), and the sys_* macros will be purely a
user-space name for the kernel-side syscall (of course, this is really
the ideal status).

Sometimes, the __NR_mmap means old_map, the __NR_select means
old_select, and the __NR_clone means different backwards, we still need
to let architecture to select what they really want and use some macros
like '__ARCH_WANT_SYS_OLD_SELECT' to select the right kernel mmap
implementation.

From the sys.h side, we can assume every sys_* are just the one (with
the same name) provided by kernel side.

From the syscall.h side (**this new syscall.h completely differs from
the old one with reorg about the my_syscall<N>, it only adds a syscall()
like macros: my_syscall() and __sysdef()**), except the one by one
mapping:

    // sysnr.h
    #define NOLIBC__NR_NOSYS (-1L)
    
    #ifndef __NR_brk
    #define NOLIBC__NR_brk NOLIBC__NR_NOSYS
    #else
    #define NOLIBC__NR_brk __NR_brk
    #endif

    ...

    #ifndef __NR_write
    #define NOLIBC__NR_write NOLIBC__NR_NOSYS
    #else
    #define NOLIBC__NR_write __NR_write
    #endif

    // syscall.h

    #define _my_syscall(N, ...) my_syscall##N(__VA_ARGS__)
    #define _my_syscall_n(N, ...) _my_syscall(N, __VA_ARGS__)
    #define my_syscall(...) _my_syscall_n(_syscall_narg(__VA_ARGS__), ##__VA_ARGS__)
    
    /* syscall() is used from application with normalized error and return value */
    #define syscall(...) __sysret(my_syscall(__VA_ARGS__))
    
    /* __sysdef() is used to define sys_* macros with original return value */
    #define __sysdef(name, ...) \
    	((NOLIBC__NR_##name == NOLIBC__NR_NOSYS) ? (long)-ENOSYS : my_syscall(NOLIBC__NR_##name, ##__VA_ARGS__))

    /* sys_* macros */

    #define sys_brk(...)                       __sysdef(brk, __VA_ARGS__)
    #define sys_chdir(...)                     __sysdef(chdir, __VA_ARGS__)
    #define sys_chmod(...)                     __sysdef(chmod, __VA_ARGS__)
    #define sys_chown(...)                     __sysdef(chown, __VA_ARGS__)
    #define sys_chroot(...)                    __sysdef(chroot, __VA_ARGS__)
    #define sys_clone(...)                     __sysdef(clone, __VA_ARGS__)
    #define sys_close(...)                     __sysdef(close, __VA_ARGS__)
    #define sys_dup(...)                       __sysdef(dup, __VA_ARGS__)
    #define sys_dup2(...)                      __sysdef(dup2, __VA_ARGS__)
    #define sys_dup3(...)                      __sysdef(dup3, __VA_ARGS__)
    #define sys_execve(...)                    __sysdef(execve, __VA_ARGS__)
    ...
    #define sys_unlinkat(...)                  __sysdef(unlinkat, __VA_ARGS__)
    #define sys_wait4(...)                     __sysdef(wait4, __VA_ARGS__)
    #define sys_write(...)                     __sysdef(write, __VA_ARGS__)

We also add exceptions for the ones like old_select, old_mmap, and
backwards of clones, here use old_map as an example:

    /* sys_* exceptions */

    /* Some architectures' mmap() is implemented as old_mmap() in kernel side,
     * let's correct them
     */
    #ifdef __ARCH_WANT_SYS_OLD_MMAP
    #undef sys_mmap
    #define sys_old_mmap(...) __sysdef(mmap, __VA_ARGS__)
    static __attribute__((unused))
    long sys_mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
    {
           struct mmap_arg_struct args = {
                   .addr = (unsigned long)addr,
                   .len = (unsigned long)length,
                   .prot = prot,
                   .flags = flags,
                   .fd = fd,
                   .offset = (unsigned long)offset
           };
           return sys_old_mmap(&args);
    }
    #endif

With this exception, s390 no long need to provide its own mmap
definition, it (seems i386 too, but it uses mmap2 currently) can simply
define '__ARCH_WANT_SYS_OLD_MMAP' as the '__ARCH_WANT_SYS_OLD_SELECT' we
are using for old_select.

The same method applies to the selection of the different backward
version of the sys_clone() syscall (from kernel/fork.c):

    /*
     * Note: Different archs have a different API of the clone() syscall, let's
     * normalize sys_clone() for all of them and allow select a backward version by
     * architecture.
     */

    #ifdef __NR_clone
    #undef sys_clone
    #define __sys_clone(...) __sysdef(clone, __VA_ARGS__)
    
    static __attribute__((unused))
    int sys_clone(unsigned long clone_flags, unsigned long newsp,
                  int __attribute__((unused)) stack_size,
                  int parent_tidptr, int child_tidptr, unsigned long tls)
    {
            long ret;
    #ifdef __ARCH_WANT_SYS_CLONE_BACKWARDS
            ret = __sys_clone(clone_flags, newsp, parent_tidptr, tls, child_tidptr);
    #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS2)
            ret = __sys_clone(newsp, clone_flags, parent_tidptr, child_tidptr, tls);
    #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS3)
            ret = __sys_clone(clone_flags, newsp, stack_size, parent_tidptr, child_tidptr, tls);
    #else
            ret = __sys_clone(clone_flags, newsp, parent_tidptr, child_tidptr, tls);
    #endif
            return ret;
    }
    #endif /* __NR_clone */

s390 only requires to define '__ARCH_WANT_SYS_CLONE_BACKWARDS2', no need
to provide its own sys_fork() version, in the __NR_clone branch of
fork(), __ARCH_WANT_SYS_CLONE_BACKWARDS2 can directly select the right
version of sys_clone() for s390).

The one for old_select/select is more simple:

    /*
     * For historic reasons, the select() syscall on arm, powerpc and i386 is
     * old_select in kernel side, they all provide _newselect() syscall as the new
     * version (arm and powerpc from v2.6.27, i386 from v2.6.28) like the select()
     * syscall on the other architectures.
     *
     * To align with them, here defines a new NOLIBC__NR_newselect and map it to
     * __NR__newselect or __NR_select accordingly.
     *
     * Note, since the oldest stable branch is v4.14, it is fair to no long support
     * the versions older than v2.6.27.
     */

    #ifndef NOLIBC__NR_newselect
    #ifndef __NR__newselect
    #ifndef __NR_select
    #define NOLIBC__NR_newselect NOLIBC__NR_NOSYS
    #else  /* __NR_select */
    #define NOLIBC__NR_newselect __NR_select
    #endif
    #else  /* __NR__newselect */
    #define NOLIBC__NR_newselect __NR__newselect
    #endif /* __NR__newselect */
    #endif /* ! NOLIBC__NR_newselect */
    
    /*
     * sys_newselect is not used by any architecture currently, use it as the new
     * version of select, it is mapped to sys__newselect or sys_select by the above
     * definition of NOLIBC__NR_newselect
     */
    #define sys_newselect(...) __sysdef(newselect, __VA_ARGS__)

We only have these three exceptions currently, with this normalization,
the library routines from sys.h can directly think sys_* macros are
generic, if not, let syscall.h take care of the right exceptions.

> > This may also help us to remove all of the `#ifdef __NR_` wrappers, we
> > can directly check the -ENOSYS in the library routines and try another
> > sys_<NAME> if required, at last, call __sysret() to normalize the errors
> > and return value.
> > 
> > Use dup2 and dup3 as examples, with sysnr.h and syscall.h above, sys.h
> > will work like this, without any #ifdef's:
> > 
> >     /*
> >      * int dup2(int old, int new);
> >      */
> >    
> >     static __attribute__((unused))
> >     int dup2(int old, int new)
> >     {
> > 	int ret = sys_dup3(old, new, 0);
> > 
> > 	if (ret == -ENOSYS)
> > 		ret = sys_dup2(old, new);
> > 
> > 	return __sysret(ret);
> >     }
> 
> But this will add a useless test after all such syscalls, we'd rather
> not do that!
>

Indeed, I found this issue too, when __NR_dup3 not defined, it returns
-ENOSYS, than, no size issue, otherwise, the compiler will not be able
to learn what the ret of sys_dup3() will be, so, it can not optimize the
second call to sys_dup2().

So, the '#ifdef' logic must be used like we did in sys_* functions, but
it is really not that meaningful (no big gain as you mentioned above) if
we only move them from the sys_* functions to the library routines.

At last, I found the ternary operation together with the initialization
of the not-defined __NR_* as NOLIBC__NR_NOSYS help this a lot, at last,
we get something like this:

    /* __systry2() is used to select one of two provided low level syscalls */
    #define __systry2(a, sys_a, sys_b) \
    	((NOLIBC__NR_##a != NOLIBC__NR_NOSYS) ? (sys_a) : (sys_b))

It can eliminate all of the '#ifdef' stuffs, using the chmod example you
mentioned above, it becomes something like this:

    /*
     * int chmod(const char *path, mode_t mode);
     */
    
    static __attribute__((unused))
    int chmod(const char *path, mode_t mode)
    {
    	return __sysret(__systry2(chmod, sys_chmod(path, mode), sys_fchmodat(AT_FDCWD, path, mode, 0)));
    }

Purely clean and clear.

Even with the complex select, mmap and fork library routines, we get the same
result (we have moved the __ARCH_WANT_SYS_* to syscall.h to normalize sys_*
there):

    /*
     * int select(int nfds, fd_set *read_fds, fd_set *write_fds,
     *            fd_set *except_fds, struct timeval *timeout);
     */
    
    static __attribute__((unused))
    int select(int nfds, fd_set *rfds, fd_set *wfds, fd_set *efds, struct timeval *timeout)
    {
    	struct timespec t;
    
    	if (timeout) {
    		t.tv_sec  = timeout->tv_sec;
    		t.tv_nsec = timeout->tv_usec * 1000;
    	}
    
    	return __sysret(__systry2(newselect, sys_newselect(nfds, rfds, wfds, efds, timeout),
    					     sys_pselect6(nfds, rfds, wfds, efds, timeout ? &t : NULL, NULL)));
    }

    /*
     * void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
     * int munmap(void *addr, size_t length);
     */
    
    /* Note that on Linux, MAP_FAILED is -1 so we can use the generic __sysret()
     * which returns -1 upon error and still satisfy user land that checks for
     * MAP_FAILED.
     */
    
    static __attribute__((unused))
    void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
    {
    	return (void *)__sysret(__systry2(mmap2, sys_mmap2(addr, length, prot, flags, fd, offset >> 12),
    						 sys_mmap(addr, length, prot, flags, fd, offset)));
    }

    /*
     * pid_t fork(void);
     */
    
    static __attribute__((unused))
    pid_t fork(void)
    {
    	return __sysret(__systry2(fork, sys_fork(), sys_clone(SIGCHLD, 0, 0, 0, 0, 0)));
    }
    
Currently, except the select() library routine has used 3 sys_* before,
the other library routines have only used 2 sys_*, so, __systry2() is
enough to select one of two. If we really want to use the third one,
based on __systry2(), it is very easy to add __systry3() and even
__systry4():

    #define __systry3(a, b, sys_a, sys_b, sys_c) \
            __systry2(a, (sys_a), __systry2(b, (sys_b), (sys_c)))
    
    #define __systry4(a, b, c, sys_a, sys_b, sys_c, sys_d) \
            __systry3(a, b, (sys_a), (sys_b), __systry2(c, (sys_c), (sys_d)))

Perhaps the coming time64 ones may need __systry3(), not tested yet.

> > > -static __inline__ __attribute__((unused, always_inline))
> > > -long __sysret(unsigned long ret)
> > > -{
> > > -	if (ret >= (unsigned long)-MAX_ERRNO) {
> > > -		SET_ERRNO(-(long)ret);
> > > -		return -1;
> > > -	}
> > > -	return ret;
> > > -}
> > > +#define __sysret(arg)								\
> > > +({										\
> > > +	__typeof__(arg) __sysret_arg = (arg);					\
> > 
> > Here ignores the 'const' flag in input type?
> 
> Yes, as explained above, there's no issue with const. The issue
> that was met in the version I suggested in the message was that
> there was an assignment to the variable of value -1 to be returned,
> which is not permitted when it's const, and I said that it was not
> necessary, it was just a convenience, but that using "?:" does the
> job as well without having to do any assignment.

Sorry, I have mixed the 'assignment' in definition (no problem here) and
the 'assignment' in late change (__sysret_arg = -1). The ternary
operation here used is really a great idea ;-) 

> 
> > > +	((((__typeof__(arg)) -1) > (__typeof__(arg)) 1) ?   /* unsigned arg? */	\
> > > +	 (uintptr_t)__sysret_arg >= (uintptr_t)-(MAX_ERRNO) :      /* errors */	\
> > > +	 (__sysret_arg + 1) < ((__typeof__(arg))1)     /* signed: <0 = error */	\
> > > +	) ? ({									\
> > > +		SET_ERRNO(-(intptr_t)__sysret_arg);				\
> > > +		((__typeof__(arg)) -1);              /* return -1 upon error */	\
> > > +	}) : __sysret_arg;        /* return original value & type on success */	\
> > > +})
> > > +
> > >
> > 
> > To be honest, it is also a little complex when with one "?:" embedded in
> > another, I even don't understand how the 'unsigned arg' branch works,
> > sorry, is it dark magic like the __is_constexpr? ;-)
> 
> The thing is that we don't need to do anything specific for consts, we
> just need to check whether an argument is signed or unsigned. The test
> for unsigned is that all unsigned integers are positive, so
> ((unsigned)-1 > 0) is always true. We just compare it to 1 instead of
> 0 to shut up the compiler which was seeing a comparison against NULL.
> 
> The rest is just checking if arg < 0 if arg is signed, or
> arg >= -MAX_ERRNO if it's unsigned, and if so, assigns its negation to
> errno and returns -1 otherwise returns it as-is. So it's not dark magic,
> doesn't rely on compiler's behavior and does not require links to external
> books explaining why the macro works in modern compilers.

Yeah, I have mixed the outside '?:' with the inside '?:', now get it,
thanks a lot!

    ({                                                                             \
      (                                                                            \
        (((__typeof__(arg)) -1) > (__typeof__(arg)) 1)     ?   /* unsigned arg? */ \
        (uintptr_t)__sysret_arg >= (uintptr_t)-(MAX_ERRNO) :      /* errors */	   \
        (__sysret_arg + 1) < ((__typeof__(arg))1)     /* signed: <0 = error */	   \
      ) ? ({									   \
       		SET_ERRNO(-(intptr_t)__sysret_arg);				   \
       		((__typeof__(arg)) -1);              /* return -1 upon error */	   \
       	  })                                                                       \
        : __sysret_arg;        /* return original value & type on success */	   \
    })

This looks better although you have prepared a pretty one below ;-)

> Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be
> honest, because we're there just because of the temptation to remove
> lines that were not causing any difficulties :-/
>
> I think we can do something in-between and deal only with signed returns,
> and explicitly place the test for MAX_ERRNO on the two unsigned ones
> (brk and mmap). It should look approximately like this:
> 
>  #define __sysret(arg)                                                \
>  ({                                                                   \
>  	__typeof__(arg) __sysret_arg = (arg);                           \
>  	(__sysret_arg < 0) ? ({           /* error ? */                 \
>  		SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */      \
>  		((__typeof__(arg)) -1);   /*      return -1 */          \
>  	}) : __sysret_arg;                /* return original value */   \
>  })
>

I like this one very much, a simple test shows, it saves one more byte.

Only a quesiton, why 'errno != -ret' has a '!'? and we have post-tab in
above two lines of __sysret() too, I have changed them to whitespaces.

For the brk and mmap part, it is ok to restore them as before, and it
also works with the new patchset we proposed above, only need such
tuning:

    #define sys_brk(...)                               __sysdef(brk, __VA_ARGS__)
    #define sys_mmap(...)                              __sysdef(mmap, __VA_ARGS__)
    #define sys_mmap2(...)                             __sysdef(mmap2, __VA_ARGS__)

    -->

    #define sys_brk(...)                       (void *)__sysdef(brk, __VA_ARGS__)
    #define sys_mmap(...)                      (void *)__sysdef(mmap, __VA_ARGS__)
    #define sys_mmap2(...)                     (void *)__sysdef(mmap2, __VA_ARGS__)

But let them align with the others may be better, so, most of the sys_*
macros can be simply mapped with a simple line (all of them are
generated automatically), without the care of the return types changing.

So, Willy, as a summary:

- one solution is your new __sysret() + restore the original SET_ERRNO
  for mmap and brk [1].

- another solution is your new __sysret() + my patch [2] to let mmap and brk
  return 'long' as the other sys_* function does.

Both of them are ok for me, but If we can apply the second one, the
proposed patchset above may be cleaner. The patch [2] is really a
prepare patch for the above proposed patchset, that is why I send it
before that patchset.

It is time to finish the size inflate regression issue, after you apply
any of the above solutions, I will resell my proposed patchset above, at
least a RFC patchset, please ignore this currently ;-) 

[1]: https://lore.kernel.org/lkml/20230813090037.GE8237@1wt.eu/#t
[2]: https://lore.kernel.org/lkml/82b584cbda5cee8d5318986644a2a64ba749a098.1691788036.git.falcon@tinylab.org/

Best regards,
Zhangjin

> Willy
Willy Tarreau Aug. 14, 2023, 8:22 a.m. UTC | #5
On Sun, Aug 13, 2023 at 09:26:20PM +0800, Zhangjin Wu wrote:
> > > With above conversions, we may be able to predefine all of the
> > > sys_<NAME> functions to preserve the input types from library rountines
> > > and return types from my_syscall<N> (by default, 'long'). This also
> > > follows the suggestion from Arnd: let sys_ not use the other low level
> > > syscalls, only use its own.
> > 
> > Maybe, but I'm not sure there is much to gain here, compared to the
> > flexibility to map one to another (e.g. see sys_chmod()).
> >
> 
> But we can do the mapping in library routines too (still need to solve
> the #ifdef switch, let's talk later), and the sys_* macros will be purely a
> user-space name for the kernel-side syscall (of course, this is really
> the ideal status).

"we can", sure, but should we ?

> Sometimes, the __NR_mmap means old_map, the __NR_select means
> old_select, and the __NR_clone means different backwards, we still need
> to let architecture to select what they really want and use some macros
> like '__ARCH_WANT_SYS_OLD_SELECT' to select the right kernel mmap
> implementation.

Yes but that's already what we have. I mean, we haven't invented these
macros, they're already used in the kernel itself to indicate what
variant of a syscall an arch depends on. The __NR_ thing doesn't imply
anything, at best its absence implies the syscall doesn't exist in this
form.

> >From the sys.h side, we can assume every sys_* are just the one (with
> the same name) provided by kernel side.
> 
> >From the syscall.h side (**this new syscall.h completely differs from
> the old one with reorg about the my_syscall<N>, it only adds a syscall()
> like macros: my_syscall() and __sysdef()**), except the one by one
> mapping:
> 
>     // sysnr.h
>     #define NOLIBC__NR_NOSYS (-1L)
>     
>     #ifndef __NR_brk
>     #define NOLIBC__NR_brk NOLIBC__NR_NOSYS
>     #else
>     #define NOLIBC__NR_brk __NR_brk
>     #endif
> 
>     ...
> 
>     #ifndef __NR_write
>     #define NOLIBC__NR_write NOLIBC__NR_NOSYS
>     #else
>     #define NOLIBC__NR_write __NR_write
>     #endif

I really don't like the idea of having to add code to redefine everything
that already exists for the purpose of removing code later. There might
be other solutions to this. If we had to go through this, at least I
would like to be sure that we only use the original __NR_xxx so that
nobody has to go through the pain of redefining them like this. Because
what's written above is work for a computer, not a human.

>     ...
>     #define sys_unlinkat(...)                  __sysdef(unlinkat, __VA_ARGS__)
>     #define sys_wait4(...)                     __sysdef(wait4, __VA_ARGS__)
>     #define sys_write(...)                     __sysdef(write, __VA_ARGS__)
> 
> We also add exceptions for the ones like old_select, old_mmap, and
> backwards of clones, here use old_map as an example:
> 
>     /* sys_* exceptions */
> 
>     /* Some architectures' mmap() is implemented as old_mmap() in kernel side,
>      * let's correct them
>      */
>     #ifdef __ARCH_WANT_SYS_OLD_MMAP
>     #undef sys_mmap

This one as well should be avoided. Undefining something already defined
is very brittle as it prevents any further code reorganization, and is
often confusing for those trying to follow the code.

>     #define sys_old_mmap(...) __sysdef(mmap, __VA_ARGS__)
>     static __attribute__((unused))
>     long sys_mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
>     {
>            struct mmap_arg_struct args = {
>                    .addr = (unsigned long)addr,
>                    .len = (unsigned long)length,
>                    .prot = prot,
>                    .flags = flags,
>                    .fd = fd,
>                    .offset = (unsigned long)offset
>            };
>            return sys_old_mmap(&args);
>     }
>     #endif

So that's the perfect example of adding unneeded obfuscation. From what
I understand it does nothing but:

    long sys_mmap(...)
    {
          struct ... args = {
          ...
          };
          return my_syscall(&args);
    }

Right ? If so better remove the unneeded #define.

> With this exception, s390 no long need to provide its own mmap
> definition, it (seems i386 too, but it uses mmap2 currently) can simply
> define '__ARCH_WANT_SYS_OLD_MMAP' as the '__ARCH_WANT_SYS_OLD_SELECT' we
> are using for old_select.
> 
> The same method applies to the selection of the different backward
> version of the sys_clone() syscall (from kernel/fork.c):
(...)

>     #ifdef __NR_clone
>     #undef sys_clone
>     #define __sys_clone(...) __sysdef(clone, __VA_ARGS__)
>     
>     static __attribute__((unused))
>     int sys_clone(unsigned long clone_flags, unsigned long newsp,
>                   int __attribute__((unused)) stack_size,
>                   int parent_tidptr, int child_tidptr, unsigned long tls)
>     {
>             long ret;
>     #ifdef __ARCH_WANT_SYS_CLONE_BACKWARDS
>             ret = __sys_clone(clone_flags, newsp, parent_tidptr, tls, child_tidptr);
>     #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS2)
>             ret = __sys_clone(newsp, clone_flags, parent_tidptr, child_tidptr, tls);
>     #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS3)
>             ret = __sys_clone(clone_flags, newsp, stack_size, parent_tidptr, child_tidptr, tls);
>     #else
>             ret = __sys_clone(clone_flags, newsp, parent_tidptr, child_tidptr, tls);
>     #endif
>             return ret;
>     }
>     #endif /* __NR_clone */
> 
> s390 only requires to define '__ARCH_WANT_SYS_CLONE_BACKWARDS2', no need
> to provide its own sys_fork() version, in the __NR_clone branch of
> fork(), __ARCH_WANT_SYS_CLONE_BACKWARDS2 can directly select the right
> version of sys_clone() for s390).

Maybe but with much less #define indirections it would be significantly
better.

(...)
> We only have these three exceptions currently, with this normalization,
> the library routines from sys.h can directly think sys_* macros are
> generic, if not, let syscall.h take care of the right exceptions.

I see the point. But that doesn't remove the need to write the exported
function itself. I'm not saying there's nothing to save here, I see your
point, I'm just wondering if we really gain something in terms of ease
of declaring new syscalls especially for first-time contributors and if
we're not losing in maintenance. If at least it's easy enough to implement
exceptions, maybe it could be worth further investigating.

> > >     static __attribute__((unused))
> > >     int dup2(int old, int new)
> > >     {
> > > 	int ret = sys_dup3(old, new, 0);
> > > 
> > > 	if (ret == -ENOSYS)
> > > 		ret = sys_dup2(old, new);
> > > 
> > > 	return __sysret(ret);
> > >     }
> > 
> > But this will add a useless test after all such syscalls, we'd rather
> > not do that!
> >
> 
> Indeed, I found this issue too, when __NR_dup3 not defined, it returns
> -ENOSYS, than, no size issue, otherwise, the compiler will not be able
> to learn what the ret of sys_dup3() will be, so, it can not optimize the
> second call to sys_dup2().
> 
> So, the '#ifdef' logic must be used like we did in sys_* functions, but
> it is really not that meaningful (no big gain as you mentioned above) if
> we only move them from the sys_* functions to the library routines.
> 
> At last, I found the ternary operation together with the initialization
> of the not-defined __NR_* as NOLIBC__NR_NOSYS help this a lot, at last,
> we get something like this:
> 
>     /* __systry2() is used to select one of two provided low level syscalls */
>     #define __systry2(a, sys_a, sys_b) \
>     	((NOLIBC__NR_##a != NOLIBC__NR_NOSYS) ? (sys_a) : (sys_b))

But this supposes that all of them are manually defined as you did above.
I'd rather implement an ugly is_numeric() macro based on argument
resolution. I've done it once in another project, I don't remember
precisely where it is but I vaguely remember that it used to check
that the string resolution of the argument gave a letter (when it
does not exist) or a digit (when it does). I can look into that later
if needed. But please avoid extra macro definitions as much as possible,
they're a real pain to handle in the code. There's no error when one is
missing or has a typo, it's difficult to follow them and they don't
appear in the debugger.

> It can eliminate all of the '#ifdef' stuffs, using the chmod example you
> mentioned above, it becomes something like this:
> 
>     /*
>      * int chmod(const char *path, mode_t mode);
>      */
>     
>     static __attribute__((unused))
>     int chmod(const char *path, mode_t mode)
>     {
>     	return __sysret(__systry2(chmod, sys_chmod(path, mode), sys_fchmodat(AT_FDCWD, path, mode, 0)));
>     }
> 
> Purely clean and clear.

That's a matter of taste and it may explain why we often disagree. For me
it's horrid. If I'm the one implementing chmod for my platform and it does
not work, what should I do when facing that, except want to cry ? Think
that right now we have this:

  static __attribute__((unused))
  int sys_chmod(const char *path, mode_t mode)
  {
  #ifdef __NR_fchmodat
          return my_syscall4(__NR_fchmodat, AT_FDCWD, path, mode, 0);
  #elif defined(__NR_chmod)
          return my_syscall2(__NR_chmod, path, mode);
  #else
          return -ENOSYS;
  #endif
  }

Sure it can be called not pretty, but I think it has the merit of being
totally explicit, and whoever sees chmod() fail can quickly check based
on the test in what situation they're supposed to be and what to check.

One thing I'm worried about also regarding using my_syscall() without the
number is that it's easy to miss an argument and have random values taken
from registers (or the stack) passed as argument. For example above we can
see that the flags part is 0 in fchmodat(). It's easy to miss themn and
while the syscall4() will complain, syscall() will silently turn that
into syscall3(). That's not necessarily a big deal, but we've seen during
the development that it's easy to make mistakes and they're not trivial
to spot. So again I'm really wondering about the benefits in such a case.

This is well illustrated in your example below:

>     	return __sysret(__systry2(newselect, sys_newselect(nfds, rfds, wfds, efds, timeout),
>     					     sys_pselect6(nfds, rfds, wfds, efds, timeout ? &t : NULL, NULL)));

How many attempts to get it right ? Just skip one NULL and you don't
see it.

>     static __attribute__((unused))
>     pid_t fork(void)
>     {
>     	return __sysret(__systry2(fork, sys_fork(), sys_clone(SIGCHLD, 0, 0, 0, 0, 0)));
>     }

I'd rather really write more explicit code.

> > > > -static __inline__ __attribute__((unused, always_inline))
> > > > -long __sysret(unsigned long ret)
> > > > -{
> > > > -	if (ret >= (unsigned long)-MAX_ERRNO) {
> > > > -		SET_ERRNO(-(long)ret);
> > > > -		return -1;
> > > > -	}
> > > > -	return ret;
> > > > -}
> > > > +#define __sysret(arg)								\
> > > > +({										\
> > > > +	__typeof__(arg) __sysret_arg = (arg);					\
> > > 
> > > Here ignores the 'const' flag in input type?
> > 
> > Yes, as explained above, there's no issue with const. The issue
> > that was met in the version I suggested in the message was that
> > there was an assignment to the variable of value -1 to be returned,
> > which is not permitted when it's const, and I said that it was not
> > necessary, it was just a convenience, but that using "?:" does the
> > job as well without having to do any assignment.
> 
> Sorry, I have mixed the 'assignment' in definition (no problem here) and
> the 'assignment' in late change (__sysret_arg = -1). The ternary
> operation here used is really a great idea ;-) 

But please do not call that an "idea", it's just one of many language
constructs, and precisely the one designed exactly for this. Use the
right tool for the job!

> > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be
> > honest, because we're there just because of the temptation to remove
> > lines that were not causing any difficulties :-/
> >
> > I think we can do something in-between and deal only with signed returns,
> > and explicitly place the test for MAX_ERRNO on the two unsigned ones
> > (brk and mmap). It should look approximately like this:
> > 
> >  #define __sysret(arg)                                                \
> >  ({                                                                   \
> >  	__typeof__(arg) __sysret_arg = (arg);                           \
> >  	(__sysret_arg < 0) ? ({           /* error ? */                 \
> >  		SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */      \
> >  		((__typeof__(arg)) -1);   /*      return -1 */          \
> >  	}) : __sysret_arg;                /* return original value */   \
> >  })
> >
> 
> I like this one very much, a simple test shows, it saves one more byte.

I'm going to do that and revert the 3 affected syscalls.

> Only a quesiton, why 'errno != -ret' has a '!'? and we have post-tab in
> above two lines of __sysret() too, I have changed them to whitespaces.

Just because I'm not good at writing illustrative code directly in e-mail,
which was warned against via "approximately like this". I do not even
expect it to compile since I'm not sure it has the right number of braces
and parenthesis but indent gives the idea.

> For the brk and mmap part, it is ok to restore them as before, and it
> also works with the new patchset we proposed above, only need such
> tuning:
> 
>     #define sys_brk(...)                               __sysdef(brk, __VA_ARGS__)
>     #define sys_mmap(...)                              __sysdef(mmap, __VA_ARGS__)
>     #define sys_mmap2(...)                             __sysdef(mmap2, __VA_ARGS__)
> 
>     -->
> 
>     #define sys_brk(...)                       (void *)__sysdef(brk, __VA_ARGS__)
>     #define sys_mmap(...)                      (void *)__sysdef(mmap, __VA_ARGS__)
>     #define sys_mmap2(...)                     (void *)__sysdef(mmap2, __VA_ARGS__)

I don't care what it implies for now. We're talking about fixing some
breakage, you'll have plenty of time later to see how to adapt your
future proposals to what is committed. We DO NOT adapt fixes to what's
expected to come later.

> But let them align with the others may be better, so, most of the sys_*
> macros can be simply mapped with a simple line (all of them are
> generated automatically), without the care of the return types changing.
> 
> So, Willy, as a summary:
> 
> - one solution is your new __sysret() + restore the original SET_ERRNO
>   for mmap and brk [1].
> 
> - another solution is your new __sysret() + my patch [2] to let mmap and brk
>   return 'long' as the other sys_* function does.

No, because it will completely break them when they'll need to access the
second half of the memory, as I already explained somewhere else in one
of these numerous discussions. 

> Both of them are ok for me, but If we can apply the second one, the
> proposed patchset above may be cleaner. The patch [2] is really a
> prepare patch for the above proposed patchset, that is why I send it
> before that patchset.
> 
> It is time to finish the size inflate regression issue, after you apply
> any of the above solutions,

I'll do, because this has lasted far too long and gone way too far for
what was supposed to be a trivial fix.


> I will resell my proposed patchset above, at
> least a RFC patchset, please ignore this currently ;-) 

Please allow me to breathe a little bit. Really I mean, I'm already worn
by having to constantly review breaking changes that either introduce
bugs or break maintainability, and to have to justify myself for things
that I thought would be obvious to anyone. Massive changes are extremely
time consuming to review, and trying to figure subtle breakage in such
low-level stuff is even harder. I simply can't assign more time to this,
particularly for the expected gains which for me or often perceived as
losses of maintainability instead :-/

Thanks,
Willy
Zhangjin Wu Aug. 14, 2023, 10:42 a.m. UTC | #6
Hi, Willy

> On Sun, Aug 13, 2023 at 09:26:20PM +0800, Zhangjin Wu wrote:
> [...]
> > With this exception, s390 no long need to provide its own mmap
> > definition, it (seems i386 too, but it uses mmap2 currently) can simply
> > define '__ARCH_WANT_SYS_OLD_MMAP' as the '__ARCH_WANT_SYS_OLD_SELECT' we
> > are using for old_select.
> > 
> > The same method applies to the selection of the different backward
> > version of the sys_clone() syscall (from kernel/fork.c):
> (...)
> 
> >     #ifdef __NR_clone
> >     #undef sys_clone
> >     #define __sys_clone(...) __sysdef(clone, __VA_ARGS__)
> >     
> >     static __attribute__((unused))
> >     int sys_clone(unsigned long clone_flags, unsigned long newsp,
> >                   int __attribute__((unused)) stack_size,
> >                   int parent_tidptr, int child_tidptr, unsigned long tls)
> >     {
> >             long ret;
> >     #ifdef __ARCH_WANT_SYS_CLONE_BACKWARDS
> >             ret = __sys_clone(clone_flags, newsp, parent_tidptr, tls, child_tidptr);
> >     #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS2)
> >             ret = __sys_clone(newsp, clone_flags, parent_tidptr, child_tidptr, tls);
> >     #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS3)
> >             ret = __sys_clone(clone_flags, newsp, stack_size, parent_tidptr, child_tidptr, tls);
> >     #else
> >             ret = __sys_clone(clone_flags, newsp, parent_tidptr, child_tidptr, tls);
> >     #endif
> >             return ret;
> >     }
> >     #endif /* __NR_clone */
> > 
> > s390 only requires to define '__ARCH_WANT_SYS_CLONE_BACKWARDS2', no need
> > to provide its own sys_fork() version, in the __NR_clone branch of
> > fork(), __ARCH_WANT_SYS_CLONE_BACKWARDS2 can directly select the right
> > version of sys_clone() for s390).
> 
> Maybe but with much less #define indirections it would be significantly
> better.
> 
> (...)
> > We only have these three exceptions currently, with this normalization,
> > the library routines from sys.h can directly think sys_* macros are
> > generic, if not, let syscall.h take care of the right exceptions.
> 
> I see the point. But that doesn't remove the need to write the exported
> function itself. I'm not saying there's nothing to save here, I see your
> point, I'm just wondering if we really gain something in terms of ease
> of declaring new syscalls especially for first-time contributors and if
> we're not losing in maintenance. If at least it's easy enough to implement
> exceptions, maybe it could be worth further investigating.
>

I will delay the whole work about __sysdef(), but work on some more generic
parts (like the exceptions above) at first.

> > > >     static __attribute__((unused))
> > > >     int dup2(int old, int new)
> > > >     {
> > > > 	int ret = sys_dup3(old, new, 0);
> > > > 
> > > > 	if (ret == -ENOSYS)
> > > > 		ret = sys_dup2(old, new);
> > > > 
> > > > 	return __sysret(ret);
> > > >     }
> > > 
> > > But this will add a useless test after all such syscalls, we'd rather
> > > not do that!
> > >
> > 
> > Indeed, I found this issue too, when __NR_dup3 not defined, it returns
> > -ENOSYS, than, no size issue, otherwise, the compiler will not be able
> > to learn what the ret of sys_dup3() will be, so, it can not optimize the
> > second call to sys_dup2().
> > 
> > So, the '#ifdef' logic must be used like we did in sys_* functions, but
> > it is really not that meaningful (no big gain as you mentioned above) if
> > we only move them from the sys_* functions to the library routines.
> > 
> > At last, I found the ternary operation together with the initialization
> > of the not-defined __NR_* as NOLIBC__NR_NOSYS help this a lot, at last,
> > we get something like this:
> > 
> >     /* __systry2() is used to select one of two provided low level syscalls */
> >     #define __systry2(a, sys_a, sys_b) \
> >     	((NOLIBC__NR_##a != NOLIBC__NR_NOSYS) ? (sys_a) : (sys_b))
> 
> But this supposes that all of them are manually defined as you did above.
> I'd rather implement an ugly is_numeric() macro based on argument
> resolution. I've done it once in another project, I don't remember
> precisely where it is but I vaguely remember that it used to check
> that the string resolution of the argument gave a letter (when it
> does not exist) or a digit (when it does). I can look into that later
> if needed. But please avoid extra macro definitions as much as possible,
> they're a real pain to handle in the code. There's no error when one is
> missing or has a typo, it's difficult to follow them and they don't
> appear in the debugger.
>

Yeah, your reply inspired me to look into the IS_ENABLED() from
../include/linux/kconfig.h macro again, there was a __is_defined() there, let's
throw away the ugly sysnr.h. I thought of IS_ENABLED() was only for y/n/m
before, but it does return 0 when the macro is not defined, it uses the same
trick in syscall() to calculate the number of arguments, if the macro is not
defined, then, 0 "argument".

> > It can eliminate all of the '#ifdef' stuffs, using the chmod example you
> > mentioned above, it becomes something like this:
> > 
> >     /*
> >      * int chmod(const char *path, mode_t mode);
> >      */
> >     
> >     static __attribute__((unused))
> >     int chmod(const char *path, mode_t mode)
> >     {
> >     	return __sysret(__systry2(chmod, sys_chmod(path, mode), sys_fchmodat(AT_FDCWD, path, mode, 0)));
> >     }
> > 
> > Purely clean and clear.
> 
> That's a matter of taste and it may explain why we often disagree. For me
> it's horrid. If I'm the one implementing chmod for my platform and it does
> not work, what should I do when facing that, except want to cry ? Think
> that right now we have this:
> 
>   static __attribute__((unused))
>   int sys_chmod(const char *path, mode_t mode)
>   {
>   #ifdef __NR_fchmodat
>           return my_syscall4(__NR_fchmodat, AT_FDCWD, path, mode, 0);
>   #elif defined(__NR_chmod)
>           return my_syscall2(__NR_chmod, path, mode);
>   #else
>           return -ENOSYS;
>   #endif
>   }
> 
> Sure it can be called not pretty, but I think it has the merit of being
> totally explicit, and whoever sees chmod() fail can quickly check based
> on the test in what situation they're supposed to be and what to check.
> 
> One thing I'm worried about also regarding using my_syscall() without the
> number is that it's easy to miss an argument and have random values taken
> from registers (or the stack) passed as argument. For example above we can
> see that the flags part is 0 in fchmodat(). It's easy to miss themn and
> while the syscall4() will complain, syscall() will silently turn that
> into syscall3(). That's not necessarily a big deal, but we've seen during
> the development that it's easy to make mistakes and they're not trivial
> to spot. So again I'm really wondering about the benefits in such a case.
> 
> This is well illustrated in your example below:
> 
> >     	return __sysret(__systry2(newselect, sys_newselect(nfds, rfds, wfds, efds, timeout),
> >     					     sys_pselect6(nfds, rfds, wfds, efds, timeout ? &t : NULL, NULL)));
> 
> How many attempts to get it right ? Just skip one NULL and you don't
> see it.

Yeah, seems we have missed the last 0 in ppoll() before and the test may not
report about it either.

[...]
> > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be
> > > honest, because we're there just because of the temptation to remove
> > > lines that were not causing any difficulties :-/
> > >
> > > I think we can do something in-between and deal only with signed returns,
> > > and explicitly place the test for MAX_ERRNO on the two unsigned ones
> > > (brk and mmap). It should look approximately like this:
> > > 
> > >  #define __sysret(arg)                                                \
> > >  ({                                                                   \
> > >  	__typeof__(arg) __sysret_arg = (arg);                           \
> > >  	(__sysret_arg < 0) ? ({           /* error ? */                 \
> > >  		SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */      \
> > >  		((__typeof__(arg)) -1);   /*      return -1 */          \
> > >  	}) : __sysret_arg;                /* return original value */   \
> > >  })
> > >
> > 
> > I like this one very much, a simple test shows, it saves one more byte.
> 
> I'm going to do that and revert the 3 affected syscalls.
>

Ok.
 
> > Only a quesiton, why 'errno != -ret' has a '!'? and we have post-tab in
> > above two lines of __sysret() too, I have changed them to whitespaces.
> 
[...]
> 
> > But let them align with the others may be better, so, most of the sys_*
> > macros can be simply mapped with a simple line (all of them are
> > generated automatically), without the care of the return types changing.
> > 
> > So, Willy, as a summary:
> > 
> > - one solution is your new __sysret() + restore the original SET_ERRNO
> >   for mmap and brk [1].
> > 
> > - another solution is your new __sysret() + my patch [2] to let mmap and brk
> >   return 'long' as the other sys_* function does.
> 
> No, because it will completely break them when they'll need to access the
> second half of the memory, as I already explained somewhere else in one
> of these numerous discussions. 
>

Sorry, will recheck this part later, please ignore it.

[...]
> 
> > I will resell my proposed patchset above, at
> > least a RFC patchset, please ignore this currently ;-) 
> 
> Please allow me to breathe a little bit. Really I mean, I'm already worn
> by having to constantly review breaking changes that either introduce
> bugs or break maintainability, and to have to justify myself for things
> that I thought would be obvious to anyone. Massive changes are extremely
> time consuming to review, and trying to figure subtle breakage in such
> low-level stuff is even harder. I simply can't assign more time to this,
> particularly for the expected gains which for me or often perceived as
> losses of maintainability instead :-/
>

Take a rest, I will delay the whole __sysdef() stuff and continue to work on
the last tinyconfig patchset after v6.5, it is the last one before our early
rv32 work ;-)

Thanks,
Zhangjin

> Thanks,
> Willy
David Laight Aug. 14, 2023, 11:15 a.m. UTC | #7
From: Zhangjin Wu
> Sent: 14 August 2023 11:42
...
> [...]
> > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be
> > > > honest, because we're there just because of the temptation to remove
> > > > lines that were not causing any difficulties :-/
> > > >
> > > > I think we can do something in-between and deal only with signed returns,
> > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones
> > > > (brk and mmap). It should look approximately like this:
> > > >
> > > >  #define __sysret(arg)                                                \
> > > >  ({                                                                   \
> > > >  	__typeof__(arg) __sysret_arg = (arg);                           \
> > > >  	(__sysret_arg < 0) ? ({           /* error ? */                 \
> > > >  		SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */      \
> > > >  		((__typeof__(arg)) -1);   /*      return -1 */          \

I'm pretty sure you don't need the explicit cast.
(It would be needed for a pointer type.)
Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg

Thinking, maybe it should be:

#define __sysret(syscall_fn_args)
({
	__typeof__(syscall_fn_args) __rval = syscall_fn_args;
	__rval >= 0 ? __rval : SET_ERRNO(-__rval), -1;
})

Since, IIRC, the usage is return __sysret(sycall_fn(args));

I'm not sure how public SET_ERRO() is.
But it could include the negate have the value of -1 cast to its argument type?
I think:
	error = -(int)(long)(arg + 0u);
will avoid any sign extension - the (int) might not even be needed.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Willy Tarreau Aug. 14, 2023, 12:09 p.m. UTC | #8
Hi David,

On Mon, Aug 14, 2023 at 11:15:51AM +0000, David Laight wrote:
> From: Zhangjin Wu
> > Sent: 14 August 2023 11:42
> ...
> > [...]
> > > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be
> > > > > honest, because we're there just because of the temptation to remove
> > > > > lines that were not causing any difficulties :-/
> > > > >
> > > > > I think we can do something in-between and deal only with signed returns,
> > > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones
> > > > > (brk and mmap). It should look approximately like this:
> > > > >
> > > > >  #define __sysret(arg)                                                \
> > > > >  ({                                                                   \
> > > > >  	__typeof__(arg) __sysret_arg = (arg);                           \
> > > > >  	(__sysret_arg < 0) ? ({           /* error ? */                 \
> > > > >  		SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */      \
> > > > >  		((__typeof__(arg)) -1);   /*      return -1 */          \
> 
> I'm pretty sure you don't need the explicit cast.
> (It would be needed for a pointer type.)
> Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg
> 
> Thinking, maybe it should be:
> 
> #define __sysret(syscall_fn_args)
> ({
> 	__typeof__(syscall_fn_args) __rval = syscall_fn_args;
> 	__rval >= 0 ? __rval : SET_ERRNO(-__rval), -1;
> })

Yeah almost, since arg is necessarily signed in this version, it's
just that I manually edited the previous macro in the mail and limited
the amount of changes to what was necessary. It's just that SET_ERRNO
only is an instruction, not an expression:

   #define SET_ERRNO(v) do { errno = (v); } while (0)

Thus the return value doesn't even pass through it. That's why it was
so much simpler before. The rationale behind this was to bring the
ability to completely drop errno for programs where you didn't care
about it. It's particularly interesting when you don't need any other
data either as the program gets strunk from a complete section.

> Since, IIRC, the usage is return __sysret(sycall_fn(args));
 
> I'm not sure how public SET_ERRO() is.

For now it is entirely, though it's not supposed to. Thomas and I
have been discussing about renaming some internal-use macros and
functions to avoid needlessly exposing them by accident to the
application. These one definitely qualifies.

> But it could include the negate have the value of -1 cast to its argument type?
> I think:
> 	error = -(int)(long)(arg + 0u);
> will avoid any sign extension - the (int) might not even be needed.

So with a signed (int/long) input and errno as int, I don't think
we can have any case where there's any such extension anyway. In
any case we're either copying the int as-is or truncating it.

Regards,
Willy
David Laight Aug. 14, 2023, 12:27 p.m. UTC | #9
From: Willy Tarreau
> Sent: 14 August 2023 13:10
> 
> Hi David,
> 
> On Mon, Aug 14, 2023 at 11:15:51AM +0000, David Laight wrote:
> > From: Zhangjin Wu
> > > Sent: 14 August 2023 11:42
> > ...
> > > [...]
> > > > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be
> > > > > > honest, because we're there just because of the temptation to remove
> > > > > > lines that were not causing any difficulties :-/
> > > > > >
> > > > > > I think we can do something in-between and deal only with signed returns,
> > > > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones
> > > > > > (brk and mmap). It should look approximately like this:
> > > > > >
> > > > > >  #define __sysret(arg)                                                \
> > > > > >  ({                                                                   \
> > > > > >  	__typeof__(arg) __sysret_arg = (arg);                           \
> > > > > >  	(__sysret_arg < 0) ? ({           /* error ? */                 \
> > > > > >  		SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */      \
> > > > > >  		((__typeof__(arg)) -1);   /*      return -1 */          \
> >
> > I'm pretty sure you don't need the explicit cast.
> > (It would be needed for a pointer type.)
> > Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg
> >
> > Thinking, maybe it should be:
> >
> > #define __sysret(syscall_fn_args)
> > ({
> > 	__typeof__(syscall_fn_args) __rval = syscall_fn_args;
> > 	__rval >= 0 ? __rval : SET_ERRNO(-__rval), -1;
> > })
> 
> Yeah almost, since arg is necessarily signed in this version, it's
> just that I manually edited the previous macro in the mail and limited
> the amount of changes to what was necessary. It's just that SET_ERRNO
> only is an instruction, not an expression:
> 
>    #define SET_ERRNO(v) do { errno = (v); } while (0)
> 
> Thus the return value doesn't even pass through it. That's why it was
> so much simpler before. The rationale behind this was to bring the
> ability to completely drop errno for programs where you didn't care
> about it. It's particularly interesting when you don't need any other
> data either as the program gets strunk from a complete section.

Actually something like:

#define SET_ERRNO(v) (errno = -(long)(v), __typeof__(v)-1)

seems to work and allows the errno assignment be removed.
Also works for pointer types (after a different compare).

A quick check with godbolt doesn't show any sign extensions happening.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Willy Tarreau Aug. 14, 2023, 1:22 p.m. UTC | #10
On Mon, Aug 14, 2023 at 12:27:48PM +0000, David Laight wrote:
> From: Willy Tarreau
> > Sent: 14 August 2023 13:10
> > 
> > Hi David,
> > 
> > On Mon, Aug 14, 2023 at 11:15:51AM +0000, David Laight wrote:
> > > From: Zhangjin Wu
> > > > Sent: 14 August 2023 11:42
> > > ...
> > > > [...]
> > > > > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be
> > > > > > > honest, because we're there just because of the temptation to remove
> > > > > > > lines that were not causing any difficulties :-/
> > > > > > >
> > > > > > > I think we can do something in-between and deal only with signed returns,
> > > > > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones
> > > > > > > (brk and mmap). It should look approximately like this:
> > > > > > >
> > > > > > >  #define __sysret(arg)                                                \
> > > > > > >  ({                                                                   \
> > > > > > >  	__typeof__(arg) __sysret_arg = (arg);                           \
> > > > > > >  	(__sysret_arg < 0) ? ({           /* error ? */                 \
> > > > > > >  		SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */      \
> > > > > > >  		((__typeof__(arg)) -1);   /*      return -1 */          \
> > >
> > > I'm pretty sure you don't need the explicit cast.
> > > (It would be needed for a pointer type.)
> > > Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg
> > >
> > > Thinking, maybe it should be:
> > >
> > > #define __sysret(syscall_fn_args)
> > > ({
> > > 	__typeof__(syscall_fn_args) __rval = syscall_fn_args;
> > > 	__rval >= 0 ? __rval : SET_ERRNO(-__rval), -1;
> > > })
> > 
> > Yeah almost, since arg is necessarily signed in this version, it's
> > just that I manually edited the previous macro in the mail and limited
> > the amount of changes to what was necessary. It's just that SET_ERRNO
> > only is an instruction, not an expression:
> > 
> >    #define SET_ERRNO(v) do { errno = (v); } while (0)
> > 
> > Thus the return value doesn't even pass through it. That's why it was
> > so much simpler before. The rationale behind this was to bring the
> > ability to completely drop errno for programs where you didn't care
> > about it. It's particularly interesting when you don't need any other
> > data either as the program gets strunk from a complete section.
> 
> Actually something like:
> 
> #define SET_ERRNO(v) (errno = -(long)(v), __typeof__(v)-1)
> 
> seems to work and allows the errno assignment be removed.
> Also works for pointer types (after a different compare).

Yes, that's something we can do (with the parenthesis around
__typeof__(v) though).

> A quick check with godbolt doesn't show any sign extensions happening.

I agree there's none here. 

Willy
Zhangjin Wu Aug. 14, 2023, 2:18 p.m. UTC | #11
> On Mon, Aug 14, 2023 at 12:27:48PM +0000, David Laight wrote:
> > From: Willy Tarreau
> > > Sent: 14 August 2023 13:10
> > > 
> > > Hi David,
> > > 
> > > On Mon, Aug 14, 2023 at 11:15:51AM +0000, David Laight wrote:
> > > > From: Zhangjin Wu
> > > > > Sent: 14 August 2023 11:42
> > > > ...
> > > > > [...]
> > > > > > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be
> > > > > > > > honest, because we're there just because of the temptation to remove
> > > > > > > > lines that were not causing any difficulties :-/
> > > > > > > >
> > > > > > > > I think we can do something in-between and deal only with signed returns,
> > > > > > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones
> > > > > > > > (brk and mmap). It should look approximately like this:
> > > > > > > >
> > > > > > > >  #define __sysret(arg)                                                \
> > > > > > > >  ({                                                                   \
> > > > > > > >  	__typeof__(arg) __sysret_arg = (arg);                           \
> > > > > > > >  	(__sysret_arg < 0) ? ({           /* error ? */                 \
> > > > > > > >  		SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */      \
> > > > > > > >  		((__typeof__(arg)) -1);   /*      return -1 */          \
> > > >
> > > > I'm pretty sure you don't need the explicit cast.
> > > > (It would be needed for a pointer type.)
> > > > Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg
> > > >
> > > > Thinking, maybe it should be:
> > > >
> > > > #define __sysret(syscall_fn_args)
> > > > ({
> > > > 	__typeof__(syscall_fn_args) __rval = syscall_fn_args;
> > > > 	__rval >= 0 ? __rval : SET_ERRNO(-__rval), -1;
> > > > })
> > > 
> > > Yeah almost, since arg is necessarily signed in this version, it's
> > > just that I manually edited the previous macro in the mail and limited
> > > the amount of changes to what was necessary. It's just that SET_ERRNO
> > > only is an instruction, not an expression:
> > > 
> > >    #define SET_ERRNO(v) do { errno = (v); } while (0)
> > > 
> > > Thus the return value doesn't even pass through it. That's why it was
> > > so much simpler before. The rationale behind this was to bring the
> > > ability to completely drop errno for programs where you didn't care
> > > about it. It's particularly interesting when you don't need any other
> > > data either as the program gets strunk from a complete section.
> > 
> > Actually something like:
> > 
> > #define SET_ERRNO(v) (errno = -(long)(v), __typeof__(v)-1)
> > 
> > seems to work and allows the errno assignment be removed.
> > Also works for pointer types (after a different compare).
> 
> Yes, that's something we can do (with the parenthesis around
> __typeof__(v) though).
>

Yes, we need the parenthesis, this works:

    #define SET_ERRNO(v) ( errno = -(long)(v), ((__typeof__(v))-1))

    #define __is_unsigned_type(v) ((__typeof__(v))(-1) > (__typeof__(v))1)
    #define __is_syserr(v)        (__is_unsigned_type(v) ? (long)v & ~(-4095UL) : (v + 1 < (__typeof__(v))1))

    #define __sysret(arg)                                                          \
    ({                                                                             \
            __typeof__(arg) __sysret_arg = (arg);                                  \
            (!__is_syserr(__sysret_arg)) ? __sysret_arg : SET_ERRNO(__sysret_arg); \
    })

It looks better now, even for 'void *'.

'(long)x & ~(-4095UL)' saves another 2+ bytes in some architectures:

     i386: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 19256
   x86_64: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 21740
    arm64: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 25812
      arm: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 22856
     mips: 160 test(s): 157 passed,   3 skipped,   0 failed => status: warning 22756
      ppc: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 26380
    ppc64: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 26756
  ppc64le: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 27364
    riscv: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 21758
     s390: 160 test(s): 157 passed,   3 skipped,   0 failed => status: warning 21936

BR,
Zhangjin

> > A quick check with godbolt doesn't show any sign extensions happening.
> 
> I agree there's none here. 
> 
> Willy
Zhangjin Wu Aug. 14, 2023, 3:03 p.m. UTC | #12
> > On Mon, Aug 14, 2023 at 12:27:48PM +0000, David Laight wrote:
> > > From: Willy Tarreau
> > > > Sent: 14 August 2023 13:10
> > > > 
> > > > Hi David,
> > > > 
> > > > On Mon, Aug 14, 2023 at 11:15:51AM +0000, David Laight wrote:
> > > > > From: Zhangjin Wu
> > > > > > Sent: 14 August 2023 11:42
> > > > > ...
> > > > > > [...]
> > > > > > > > > Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be
> > > > > > > > > honest, because we're there just because of the temptation to remove
> > > > > > > > > lines that were not causing any difficulties :-/
> > > > > > > > >
> > > > > > > > > I think we can do something in-between and deal only with signed returns,
> > > > > > > > > and explicitly place the test for MAX_ERRNO on the two unsigned ones
> > > > > > > > > (brk and mmap). It should look approximately like this:
> > > > > > > > >
> > > > > > > > >  #define __sysret(arg)                                                \
> > > > > > > > >  ({                                                                   \
> > > > > > > > >  	__typeof__(arg) __sysret_arg = (arg);                           \
> > > > > > > > >  	(__sysret_arg < 0) ? ({           /* error ? */                 \
> > > > > > > > >  		SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */      \
> > > > > > > > >  		((__typeof__(arg)) -1);   /*      return -1 */          \
> > > > >
> > > > > I'm pretty sure you don't need the explicit cast.
> > > > > (It would be needed for a pointer type.)
> > > > > Can you use __arg < ? SET_ERRNO(-__arg), -1 : __arg
> > > > >
> > > > > Thinking, maybe it should be:
> > > > >
> > > > > #define __sysret(syscall_fn_args)
> > > > > ({
> > > > > 	__typeof__(syscall_fn_args) __rval = syscall_fn_args;
> > > > > 	__rval >= 0 ? __rval : SET_ERRNO(-__rval), -1;
> > > > > })
> > > > 
> > > > Yeah almost, since arg is necessarily signed in this version, it's
> > > > just that I manually edited the previous macro in the mail and limited
> > > > the amount of changes to what was necessary. It's just that SET_ERRNO
> > > > only is an instruction, not an expression:
> > > > 
> > > >    #define SET_ERRNO(v) do { errno = (v); } while (0)
> > > > 
> > > > Thus the return value doesn't even pass through it. That's why it was
> > > > so much simpler before. The rationale behind this was to bring the
> > > > ability to completely drop errno for programs where you didn't care
> > > > about it. It's particularly interesting when you don't need any other
> > > > data either as the program gets strunk from a complete section.
> > > 
> > > Actually something like:
> > > 
> > > #define SET_ERRNO(v) (errno = -(long)(v), __typeof__(v)-1)
> > > 
> > > seems to work and allows the errno assignment be removed.
> > > Also works for pointer types (after a different compare).
> > 
> > Yes, that's something we can do (with the parenthesis around
> > __typeof__(v) though).
> >
> 
> Yes, we need the parenthesis, this works:
> 
>     #define SET_ERRNO(v) ( errno = -(long)(v), ((__typeof__(v))-1))
>

Willy, David, We still need "({})":

	#define SET_ERRNO(v) ({ errno = -(long)(v); (__typeof__(v))-1; })

to silence the warning:

	sysroot/x86_64/include/errno.h:13:42: warning: right-hand operand of comma expression has no effect [-Wunused-value]
	   13 | #define SET_ERRNO(v) ( errno = -(long)(v), (__typeof__(v))-1)
	      |                      ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~

BR,
Zhangjin

>     #define __is_unsigned_type(v) ((__typeof__(v))(-1) > (__typeof__(v))1)
>     #define __is_syserr(v)        (__is_unsigned_type(v) ? (long)v & ~(-4095UL) : (v + 1 < (__typeof__(v))1))
> 
>     #define __sysret(arg)                                                          \
>     ({                                                                             \
>             __typeof__(arg) __sysret_arg = (arg);                                  \
>             (!__is_syserr(__sysret_arg)) ? __sysret_arg : SET_ERRNO(__sysret_arg); \
>     })
> 
> It looks better now, even for 'void *'.
> 
> '(long)x & ~(-4095UL)' saves another 2+ bytes in some architectures:
> 
>      i386: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 19256
>    x86_64: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 21740
>     arm64: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 25812
>       arm: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 22856
>      mips: 160 test(s): 157 passed,   3 skipped,   0 failed => status: warning 22756
>       ppc: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 26380
>     ppc64: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 26756
>   ppc64le: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 27364
>     riscv: 160 test(s): 158 passed,   2 skipped,   0 failed => status: warning 21758
>      s390: 160 test(s): 157 passed,   3 skipped,   0 failed => status: warning 21936
> 
> BR,
> Zhangjin
Zhangjin Wu Aug. 14, 2023, 5:22 p.m. UTC | #13
Hi, Willy

> [...]
> > > 
> > >     /* __systry2() is used to select one of two provided low level syscalls */
> > >     #define __systry2(a, sys_a, sys_b) \
> > >     	((NOLIBC__NR_##a != NOLIBC__NR_NOSYS) ? (sys_a) : (sys_b))
> > 
> > But this supposes that all of them are manually defined as you did above.
> > I'd rather implement an ugly is_numeric() macro based on argument
> > resolution. I've done it once in another project, I don't remember
> > precisely where it is but I vaguely remember that it used to check
> > that the string resolution of the argument gave a letter (when it
> > does not exist) or a digit (when it does). I can look into that later
> > if needed. But please avoid extra macro definitions as much as possible,
> > they're a real pain to handle in the code. There's no error when one is
> > missing or has a typo, it's difficult to follow them and they don't
> > appear in the debugger.
> >
> 
> Yeah, your reply inspired me to look into the IS_ENABLED() from
> ../include/linux/kconfig.h macro again, there was a __is_defined() there, let's
> throw away the ugly sysnr.h. I thought of IS_ENABLED() was only for y/n/m
> before, but it does return 0 when the macro is not defined, it uses the same
> trick in syscall() to calculate the number of arguments, if the macro is not
> defined, then, 0 "argument".
>

The above trick is only for ""#define something 1" ;-)

BR,
Zhangjin
diff mbox series

Patch

diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
index 833d6c5e86dc..6bdd18716e84 100644
--- a/tools/include/nolibc/sys.h
+++ b/tools/include/nolibc/sys.h
@@ -27,23 +27,67 @@ 
 #include "errno.h"
 #include "types.h"
 
+/*
+ * This returns a constant expression while determining if an argument is
+ * a constant expression, most importantly without evaluating the argument.
+ * Glory to Martin Uecker <Martin.Uecker@med.uni-goettingen.de>
+ * (from include/linux/const.h)
+ */
+#define __is_constexpr(x) \
+	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
+
+/*
+ * "(void *)0 isn't 'constant enough' for is_constexpr() - so
+ * is_constexpr((type)0) can be used to detect pointer types."
+ * (from David Laight <David.Laight@ACULAB.COM>)
+ */
+#define __is_pointer(x) (!__is_constexpr((__typeof__(x))0))
 
-/* Syscall return helper for library routines, set errno as -ret when ret is in
- * range of [-MAX_ERRNO, -1]
+/*
+ * To preserve the input type and workaround the 'error: assignment of
+ * read-only variable' when the input type has 'const' flag.
+ *
+ * For gcc >= 11.0 (__GXX_ABI_VERSION = 1016), use the new __auto_type keyword
+ * instead of __typeof__().
  *
- * Note, No official reference states the errno range here aligns with musl
- * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
+ * For old gcc versions, "use typeof((x) + 0) to lose the 'const' flag. The
+ * only downside is that char/short become int." (from David Laight
+ * <David.Laight@ACULAB.COM>)
  */
 
-static __inline__ __attribute__((unused, always_inline))
-long __sysret(unsigned long ret)
-{
-	if (ret >= (unsigned long)-MAX_ERRNO) {
-		SET_ERRNO(-(long)ret);
-		return -1;
-	}
-	return ret;
-}
+#if __GXX_ABI_VERSION >= 1016
+#define __typeofdecl(arg) __auto_type
+#else
+#define __typeofdecl(arg) __typeof__((arg) + 0)
+#endif
+
+/* Syscall return helper for library routines
+ *
+ * - for pointer returns, set errno as -ret when ret is in [-MAX_ERRNO, -1]
+ * - for integer returns, set errno as -ret when ret < 0
+ *
+ * Note,
+ *
+ * - No official reference states the errno range, here aligns with musl
+ *   (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h).
+ *
+ * - To reduce binary size by removing useless type conversions and sign
+ *   extensions, the helper is defined as a macro to preserve input type and
+ *   provide two comparisons for both pointer and integer types during the
+ *   compiling stage.
+ */
+
+#define __sysret(arg)                                                                              \
+({                                                                                                 \
+	__typeofdecl(arg) __ret = (arg);                                                           \
+	if (__builtin_choose_expr(__is_pointer(arg), (unsigned long)-(MAX_ERRNO + 1), (long)__ret) \
+		< __builtin_choose_expr(__is_pointer(arg), (unsigned long)__ret, 0)) {             \
+		SET_ERRNO(-(long)__ret);                                                           \
+		__ret = (__typeof__(arg))-1L;                                                      \
+	}                                                                                          \
+	__ret;                                                                                     \
+})
+
 
 /* Functions in this file only describe syscalls. They're declared static so
  * that the compiler usually decides to inline them while still being allowed
@@ -94,7 +138,7 @@  void *sbrk(intptr_t inc)
 	if (ret && sys_brk(ret + inc) == ret + inc)
 		return ret + inc;
 
-	return (void *)__sysret(-ENOMEM);
+	return __sysret((void *)-ENOMEM);
 }
 
 
@@ -682,7 +726,7 @@  void *sys_mmap(void *addr, size_t length, int prot, int flags, int fd,
 static __attribute__((unused))
 void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
 {
-	return (void *)__sysret((unsigned long)sys_mmap(addr, length, prot, flags, fd, offset));
+	return __sysret(sys_mmap(addr, length, prot, flags, fd, offset));
 }
 
 static __attribute__((unused))