diff mbox series

configure: Avoid using strings binary

Message ID 83824abdddf124d76f9f265f77808e859dc094a8.1665650275.git.mprivozn@redhat.com (mailing list archive)
State New, archived
Headers show
Series configure: Avoid using strings binary | expand

Commit Message

Michal Prívozník Oct. 13, 2022, 8:37 a.m. UTC
When determining the endiandness of the target architecture we're
building for a small program is compiled, which in an obfuscated
way declares two strings. Then, we look which string is in
correct order (using strings binary) and deduct the endiandness.
But using the strings binary is problematic, because it's part of
toolchain (strings is just a symlink to
x86_64-pc-linux-gnu-strings or llvm-strings). And when
(cross-)compiling, it requires users to set the symlink to the
correct toolchain.

Fortunately, we have a better alternative anyways. Since we
require either clang or gcc we can rely on macros they declare.

Bug: https://bugs.gentoo.org/876933
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 configure | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

Comments

Michal Prívozník Oct. 13, 2022, 9:17 a.m. UTC | #1
On 10/13/22 10:37, Michal Privoznik wrote:
> When determining the endiandness of the target architecture we're
> building for a small program is compiled, which in an obfuscated
> way declares two strings. Then, we look which string is in
> correct order (using strings binary) and deduct the endiandness.
> But using the strings binary is problematic, because it's part of
> toolchain (strings is just a symlink to
> x86_64-pc-linux-gnu-strings or llvm-strings). And when
> (cross-)compiling, it requires users to set the symlink to the
> correct toolchain.
> 
> Fortunately, we have a better alternative anyways. Since we
> require either clang or gcc we can rely on macros they declare.
> 
> Bug: https://bugs.gentoo.org/876933
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>  configure | 33 ++++++++++++++++++---------------
>  1 file changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/configure b/configure
> index 45ee6f4eb3..91e04635cb 100755
> --- a/configure
> +++ b/configure
> @@ -1426,27 +1426,30 @@ fi
>  # ---
>  # big/little endian test
>  cat > $TMPC << EOF
> -#include <stdio.h>
> -short big_endian[] = { 0x4269, 0x4765, 0x4e64, 0x4961, 0x4e00, 0, };
> -short little_endian[] = { 0x694c, 0x7454, 0x654c, 0x6e45, 0x6944, 0x6e41, 0, };
> -int main(int argc, char *argv[])
> -{
> -    return printf("%s %s\n", (char *)big_endian, (char *)little_endian);
> -}
> +#if defined(__BYTE_ORDER) && __BYTE_ORDER == __BIG_ENDIAN || \

Actually, this needs to be __BYTE_ORDER__ (missing those two underscores
at the end).

> +    defined(__BIG_ENDIAN__)
> +# error BIG
> +#endif
> +int main(void) { return 0; }
>  EOF
>  
>  if compile_prog ; then
> -    if strings -a $TMPE | grep -q BiGeNdIaN ; then
> -        bigendian="yes"
> -    elif strings -a $TMPE | grep -q LiTtLeEnDiAn ; then
> -        bigendian="no"
> -    else
> -        echo big/little test failed
> -        exit 1
> -    fi
> +  bigendian="yes"

And this needs to be no. Will post v2 shortly.

Michal
Peter Maydell Oct. 13, 2022, 10:39 a.m. UTC | #2
On Thu, 13 Oct 2022 at 09:47, Michal Privoznik <mprivozn@redhat.com> wrote:
>
> When determining the endiandness of the target architecture we're
> building for a small program is compiled, which in an obfuscated
> way declares two strings. Then, we look which string is in
> correct order (using strings binary) and deduct the endiandness.
> But using the strings binary is problematic, because it's part of
> toolchain (strings is just a symlink to
> x86_64-pc-linux-gnu-strings or llvm-strings). And when
> (cross-)compiling, it requires users to set the symlink to the
> correct toolchain.
>
> Fortunately, we have a better alternative anyways. Since we
> require either clang or gcc we can rely on macros they declare.
>
> Bug: https://bugs.gentoo.org/876933
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

If we can determine this just by looking at C macros, does
this really need to be a configure test at all ? Paolo?



> ---
>  configure | 33 ++++++++++++++++++---------------
>  1 file changed, 18 insertions(+), 15 deletions(-)
>
> diff --git a/configure b/configure
> index 45ee6f4eb3..91e04635cb 100755
> --- a/configure
> +++ b/configure
> @@ -1426,27 +1426,30 @@ fi
>  # ---
>  # big/little endian test
>  cat > $TMPC << EOF
> -#include <stdio.h>
> -short big_endian[] = { 0x4269, 0x4765, 0x4e64, 0x4961, 0x4e00, 0, };
> -short little_endian[] = { 0x694c, 0x7454, 0x654c, 0x6e45, 0x6944, 0x6e41, 0, };
> -int main(int argc, char *argv[])
> -{
> -    return printf("%s %s\n", (char *)big_endian, (char *)little_endian);
> -}
> +#if defined(__BYTE_ORDER) && __BYTE_ORDER == __BIG_ENDIAN || \
> +    defined(__BIG_ENDIAN__)
> +# error BIG
> +#endif
> +int main(void) { return 0; }
>  EOF
>
>  if compile_prog ; then
> -    if strings -a $TMPE | grep -q BiGeNdIaN ; then
> -        bigendian="yes"
> -    elif strings -a $TMPE | grep -q LiTtLeEnDiAn ; then
> -        bigendian="no"
> -    else
> -        echo big/little test failed
> -        exit 1
> -    fi
> +  bigendian="yes"
>  else
> +  cat > $TMPC << EOF
> +#if defined(__BYTE_ORDER) && __BYTE_ORDER == __LITTLE_ENDIAN || \
> +    defined(__LITTLE_ENDIAN__)
> +# error LITTLE
> +#endif
> +int main(void) { return 0; }
> +EOF
> +
> +  if compile_prog ; then
> +    bigendian="no"
> +  else
>      echo big/little test failed
>      exit 1
> +  fi
>  fi

thanks
-- PMM
Michal Prívozník Oct. 13, 2022, 10:43 a.m. UTC | #3
On 10/13/22 12:39, Peter Maydell wrote:
> On Thu, 13 Oct 2022 at 09:47, Michal Privoznik <mprivozn@redhat.com> wrote:
>>
>> When determining the endiandness of the target architecture we're
>> building for a small program is compiled, which in an obfuscated
>> way declares two strings. Then, we look which string is in
>> correct order (using strings binary) and deduct the endiandness.
>> But using the strings binary is problematic, because it's part of
>> toolchain (strings is just a symlink to
>> x86_64-pc-linux-gnu-strings or llvm-strings). And when
>> (cross-)compiling, it requires users to set the symlink to the
>> correct toolchain.
>>
>> Fortunately, we have a better alternative anyways. Since we
>> require either clang or gcc we can rely on macros they declare.
>>
>> Bug: https://bugs.gentoo.org/876933
>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> 
> If we can determine this just by looking at C macros, does
> this really need to be a configure test at all ? Paolo?

Yes, because we're using this information to generate a file for meson
that's later used during cross compilation.

Michal
Daniel P. Berrangé Oct. 13, 2022, 11:07 a.m. UTC | #4
On Thu, Oct 13, 2022 at 11:39:34AM +0100, Peter Maydell wrote:
> On Thu, 13 Oct 2022 at 09:47, Michal Privoznik <mprivozn@redhat.com> wrote:
> >
> > When determining the endiandness of the target architecture we're
> > building for a small program is compiled, which in an obfuscated
> > way declares two strings. Then, we look which string is in
> > correct order (using strings binary) and deduct the endiandness.
> > But using the strings binary is problematic, because it's part of
> > toolchain (strings is just a symlink to
> > x86_64-pc-linux-gnu-strings or llvm-strings). And when
> > (cross-)compiling, it requires users to set the symlink to the
> > correct toolchain.
> >
> > Fortunately, we have a better alternative anyways. Since we
> > require either clang or gcc we can rely on macros they declare.
> >
> > Bug: https://bugs.gentoo.org/876933
> > Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> 
> If we can determine this just by looking at C macros, does
> this really need to be a configure test at all ? Paolo?

We don't need to rely on CLang / GCC macros either, as this
is exposed by GLib 

$ grep BYTE_ORDER /usr/lib64/glib-2.0/include/glibconfig.h
#define G_BYTE_ORDER G_LITTLE_ENDIAN

IOW, any code that needs to know can do one of:

  #if G_BYTE_ORDER == G_LITTLE_ENDIAN

  #if G_BYTE_ORDER == G_BIG_ENDIAN


The only thing 'configure' seems to be doing with the 'bigendian'
env var it sets, is to construct a meson cross compiler spec

  if test "$cross_compile" = "yes"; then
    cross_arg="--cross-file config-meson.cross"
    echo "[host_machine]" >> $cross
    echo "system = '$targetos'" >> $cross
    case "$cpu" in
        i386)
            echo "cpu_family = 'x86'" >> $cross
            ;;
        *)
            echo "cpu_family = '$cpu'" >> $cross
            ;;
    esac
    echo "cpu = '$cpu'" >> $cross
    if test "$bigendian" = "yes" ; then
        echo "endian = 'big'" >> $cross
    else
        echo "endian = 'little'" >> $cross
    fi

so we do need a compile time test in configure, but I'd suggest
using G_BYTE_ORDER

With regards,
Daniel
Peter Maydell Oct. 13, 2022, 11:41 a.m. UTC | #5
On Thu, 13 Oct 2022 at 12:08, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Oct 13, 2022 at 11:39:34AM +0100, Peter Maydell wrote:
> > On Thu, 13 Oct 2022 at 09:47, Michal Privoznik <mprivozn@redhat.com> wrote:
> > >
> > > When determining the endiandness of the target architecture we're
> > > building for a small program is compiled, which in an obfuscated
> > > way declares two strings. Then, we look which string is in
> > > correct order (using strings binary) and deduct the endiandness.
> > > But using the strings binary is problematic, because it's part of
> > > toolchain (strings is just a symlink to
> > > x86_64-pc-linux-gnu-strings or llvm-strings). And when
> > > (cross-)compiling, it requires users to set the symlink to the
> > > correct toolchain.
> > >
> > > Fortunately, we have a better alternative anyways. Since we
> > > require either clang or gcc we can rely on macros they declare.
> > >
> > > Bug: https://bugs.gentoo.org/876933
> > > Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> >
> > If we can determine this just by looking at C macros, does
> > this really need to be a configure test at all ? Paolo?
>
> We don't need to rely on CLang / GCC macros either, as this
> is exposed by GLib
>
> $ grep BYTE_ORDER /usr/lib64/glib-2.0/include/glibconfig.h
> #define G_BYTE_ORDER G_LITTLE_ENDIAN
>
> IOW, any code that needs to know can do one of:
>
>   #if G_BYTE_ORDER == G_LITTLE_ENDIAN
>
>   #if G_BYTE_ORDER == G_BIG_ENDIAN

It would be more consistent for configure to do this the same
way that compiler.h does, though:

#define HOST_BIG_ENDIAN (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)

thanks
-- PMM
Marc-André Lureau Oct. 13, 2022, 12:26 p.m. UTC | #6
Hi


On Thu, Oct 13, 2022 at 3:50 PM Peter Maydell <peter.maydell@linaro.org>
wrote:

> On Thu, 13 Oct 2022 at 12:08, Daniel P. Berrangé <berrange@redhat.com>
> wrote:
> >
> > On Thu, Oct 13, 2022 at 11:39:34AM +0100, Peter Maydell wrote:
> > > On Thu, 13 Oct 2022 at 09:47, Michal Privoznik <mprivozn@redhat.com>
> wrote:
> > > >
> > > > When determining the endiandness of the target architecture we're
> > > > building for a small program is compiled, which in an obfuscated
> > > > way declares two strings. Then, we look which string is in
> > > > correct order (using strings binary) and deduct the endiandness.
> > > > But using the strings binary is problematic, because it's part of
> > > > toolchain (strings is just a symlink to
> > > > x86_64-pc-linux-gnu-strings or llvm-strings). And when
> > > > (cross-)compiling, it requires users to set the symlink to the
> > > > correct toolchain.
> > > >
> > > > Fortunately, we have a better alternative anyways. Since we
> > > > require either clang or gcc we can rely on macros they declare.
> > > >
> > > > Bug: https://bugs.gentoo.org/876933
> > > > Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> > >
> > > If we can determine this just by looking at C macros, does
> > > this really need to be a configure test at all ? Paolo?
> >
> > We don't need to rely on CLang / GCC macros either, as this
> > is exposed by GLib
> >
> > $ grep BYTE_ORDER /usr/lib64/glib-2.0/include/glibconfig.h
> > #define G_BYTE_ORDER G_LITTLE_ENDIAN
> >
> > IOW, any code that needs to know can do one of:
> >
> >   #if G_BYTE_ORDER == G_LITTLE_ENDIAN
> >
> >   #if G_BYTE_ORDER == G_BIG_ENDIAN
>
> It would be more consistent for configure to do this the same
> way that compiler.h does, though:
>
> #define HOST_BIG_ENDIAN (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
>
>
Weird, it should have been introduced with commit e03b56863d ("Replace
config-time define HOST_WORDS_BIGENDIAN"), and it's part of commit
519655970 ("Move HOST_LONG_BITS to compiler.h")...probably my bad with a
rebase.
Paolo Bonzini Oct. 13, 2022, 1:26 p.m. UTC | #7
On 10/13/22 13:07, Daniel P. Berrangé wrote:
> The only thing 'configure' seems to be doing with the 'bigendian'
> env var it sets, is to construct a meson cross compiler spec

Yes, this is then available from host_machine.endian() in Meson.  It was 
used via

config_host_data.set('HOST_WORDS_BIGENDIAN',
                      host_machine.endian() == 'big')

until 6.2.  Now there's no explicit use but I think it's better to keep 
it correct just in case, and avoid future hard to debug issues[1].

> so we do need a compile time test in configure, but I'd suggest
> using G_BYTE_ORDER

It is imaginable that configure produces multiple cross-compilation 
environments for meson, e.g. to compile qboot from sources.  In that 
case glib would not be available, so please do not introduce any uses of 
glib in configure.

More in general, configure should not be doing any compile tests / 
package detection specific to building the emulators.  Meson 0.63 will 
*finally* make it possible to get there.

That said, using __BYTE_ORDER__ is a good idea!

Paolo

[1] there is one build_machine.endian() which I think wants host_machine 
instead, in fact, but it only affects the final summary.
diff mbox series

Patch

diff --git a/configure b/configure
index 45ee6f4eb3..91e04635cb 100755
--- a/configure
+++ b/configure
@@ -1426,27 +1426,30 @@  fi
 # ---
 # big/little endian test
 cat > $TMPC << EOF
-#include <stdio.h>
-short big_endian[] = { 0x4269, 0x4765, 0x4e64, 0x4961, 0x4e00, 0, };
-short little_endian[] = { 0x694c, 0x7454, 0x654c, 0x6e45, 0x6944, 0x6e41, 0, };
-int main(int argc, char *argv[])
-{
-    return printf("%s %s\n", (char *)big_endian, (char *)little_endian);
-}
+#if defined(__BYTE_ORDER) && __BYTE_ORDER == __BIG_ENDIAN || \
+    defined(__BIG_ENDIAN__)
+# error BIG
+#endif
+int main(void) { return 0; }
 EOF
 
 if compile_prog ; then
-    if strings -a $TMPE | grep -q BiGeNdIaN ; then
-        bigendian="yes"
-    elif strings -a $TMPE | grep -q LiTtLeEnDiAn ; then
-        bigendian="no"
-    else
-        echo big/little test failed
-        exit 1
-    fi
+  bigendian="yes"
 else
+  cat > $TMPC << EOF
+#if defined(__BYTE_ORDER) && __BYTE_ORDER == __LITTLE_ENDIAN || \
+    defined(__LITTLE_ENDIAN__)
+# error LITTLE
+#endif
+int main(void) { return 0; }
+EOF
+
+  if compile_prog ; then
+    bigendian="no"
+  else
     echo big/little test failed
     exit 1
+  fi
 fi
 
 ##########################################