Message ID | 20201023200645.1055-4-dbuono@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add support for Control-Flow Integrity |
On 23/10/20 22:06, Daniele Buono wrote:
> This patch allows compiling QEMU with link-time optimization (LTO).
> Compilation with LTO is handled directly by meson. This patch adds checks
> in configure to make sure the toolchain supports LTO.
>
> Currently, allow LTO only with clang, since I have found a couple of issues
> with gcc-based LTO.
>
> In case fuzzing is enabled, automatically switch to llvm's linker (lld).
> The standard bfd linker has a bug when function wrapping (used by the fuzz*
> targets) is used in conjunction with LTO.
>
> Tested with all major versions of clang from 6 to 12
>
> Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>

What are the problems like if you have GCC or your ar/linker are not up to the job? I wouldn't mind omitting the tests since this has to be enabled explicitly by the user.

Paolo

On Mon, Oct 26, 2020 at 10:51:43AM +0100, Paolo Bonzini wrote:
> What are the problems like if you have GCC or your ar/linker are not up
> to the job? I wouldn't mind omitting the tests since this has to be
> enabled explicitly by the user.

We temporarily disabled LTO in Fedora rawhide due to GCC bugs causing weird test suite asserts. Those were pre-release versions of GCC/binutils, though. I've just tested again and LTO works correctly, so I've enabled LTO once again.

Regards,
Daniel

In terms of ar and linker, if you don't have the right mix it will just stop at link time with an error.

In terms of using gcc, the errors may be a bit more subtle, similar to what Daniel mentioned: the build succeeds, but issues then show up at runtime or in the test suite.

I'm using Ubuntu 18.04, and the stock compiler (based on gcc 7.5) issues a bunch of warnings but compiles successfully with LTO. However, the tcg binary for sparc64 is broken. System-wide emulation stops in OpenFirmware with an exception. User emulation triggers a segmentation fault in some of the test cases. If I compile QEMU with --enable-debug, the tests magically work.

I briefly tested with gcc-9 and that seemed to work ok, but your mileage may vary.

On 10/26/2020 11:50 AM, Daniel P. Berrangé wrote:
> We temporarily disabled LTO in Fedora rawhide due to GCC bugs causing
> weird test suite asserts. Those were pre-release versions of GCC/binutils
> though. I've just tested again and LTO works correctly, so I've enabled
> LTO once again.
>
> Regards,
> Daniel

On Tue, Oct 27, 2020 at 10:57:14AM -0400, Daniele Buono wrote:
> In terms of ar and linker, if you don't have the right mix it will just
> stop at link time with an error.
>
> In terms of using gcc the errors may be a bit more subtle, similar to
> what Daniel mentioned. Successfully compiling but then showing issues at
> runtime or in the test suite.
>
> I'm using ubuntu 18.04 and the stock compiler (based on gcc 7.5) issues
> a bunch of warnings but compiles successfully with LTO.
> However, the tcg binary for sparc64 is broken. System-wide emulation
> stops in OpenFirmware with an exception. User emulation triggers a
> segmentation fault in some of the test cases. If I compile QEMU with
> --enable-debug the tests magically work.
>
> I briefly tested with gcc-9 and that seemed to work ok, but your mileage
> may vary

This is why we shouldn't artificially block use of LTO with GCC in the configure script. It blocks completely legitimate usage of LTO with GCC versions where it works.

The user can detect if their version of GCC is broken by running the test suite during their build process, which is best practice already, and actually testing the result.

Regards,
Daniel

Ok, no problem. I can definitely disable the check on GCC.

Paolo, would you like me to disable the checks on AR/linker for LTO too? If so, should I add some of this information to a document, perhaps docs/devel/lto.rst, so it is written down somewhere for future use?

--
Btw, using LTO with gcc I found another interesting warning here (adding the SCSI maintainer so they can chip in on the solution):

In function 'scsi_disk_new_request_dump',
    inlined from 'scsi_new_request' at ../qemu-cfi-v3/hw/scsi/scsi-disk.c:2588:9:
../qemu-cfi-v3/hw/scsi/scsi-disk.c:2562:17: warning: argument 1 value '18446744073709551612' exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
     line_buffer = g_malloc(len * 5 + 1);
                 ^
../qemu-cfi-v3/hw/scsi/scsi-disk.c: In function 'scsi_new_request':
/usr/include/glib-2.0/glib/gmem.h:78:10: note: in a call to allocation function 'g_malloc' declared here
 gpointer g_malloc (gsize n_bytes) G_GNUC_MALLOC G_GNUC_ALLOC_SIZE(1);

This seems like a bug to me. len is a signed integer filled in by scsi_cdb_length, which can return -1 if it can't decode the command. What would probably happen is that we attempt a g_malloc with a size that is far too big, and that would fail.

However, scsi_disk_new_request_dump is used for tracing, and:

a) I believe an unknown command here is a possibility, and it is handled by the caller, scsi_new_request, which has the following:

    command = buf[0];
    ops = scsi_disk_reqops_dispatch[command];
    if (!ops) {
        ops = &scsi_disk_emulate_reqops;
    }

so a termination here on the malloc is probably not desired.

b) In the tracing, we should probably print the content of the buffer anyway, so that the unknown command can be debugged. However, I don't know what size I should use here. I'm thinking either 1, to print just the command header in the buffer, or the max size of the buffer, which I am not sure how to get.

Any ideas, or would you prefer that I send an initial patch so we can discuss it there?

On 10/27/2020 11:17 AM, Daniel P. Berrangé wrote:
> This is why we shouldn't artificially block use of LTO with GCC in
> the configure script. It blocks completely legitimate usage of
> LTO with GCC versions where it works.
>
> The user can detect if their version of GCC is broken by running the
> test suite during their build process, which is best practice already,
> and actually testing the result.
>
> Regards,
> Daniel

On 27/10/20 21:42, Daniele Buono wrote:
> Ok, no problem. I can definitely disable the check on GCC.
>
> Paolo, would you like me to disable checks on AR/linker for lto too?
> If so, should I add some of this information on a document, perhaps
> docs/devel/lto.rst, so it is written somewhere for future uses?

I am not sure of the effects. Does it simply effectively disable LTO or is it something worse?

I'll look into the SCSI issue.

Paolo

Daniele Buono <dbuono@linux.vnet.ibm.com> writes:

> In terms of ar and linker, if you don't have the right mix it will just
> stop at link time with an error.
>
> In terms of using gcc the errors may be a bit more subtle, similar to
> what Daniel mentioned. Successfully compiling but then showing issues at
> runtime or in the test suite.
>
> I'm using ubuntu 18.04 and the stock compiler (based on gcc 7.5) issues
> a bunch of warnings but compiles successfully with LTO.
> However, the tcg binary for sparc64 is broken.

sparc64-linux-user? I think that might be in a bit of a bit-rotted state - we had to disable running check-tcg on it in CI because of instability, so I wouldn't be surprised if messing around with LTO has dug up even more gremlins.

> System-wide emulation
> stops in OpenFirmware with an exception. User emulation triggers a
> segmentation fault in some of the test cases. If I compile QEMU with
> --enable-debug the tests magically work.

Breakage in both system and linux-user emulation probably points at something in the instruction decode being broken. Shame we don't have a working risu setup for sparc64 to give the instruction handling a proper workout.

> I briefly tested with gcc-9 and that seemed to work ok, but your mileage
> may vary

If LTO is enabled with the wrong linker/ar:
- with the checks, it will exit at configure with an error. I can change this to a warning and disable LTO if preferred.
- without the checks, compilation will fail.

If LTO is enabled with the wrong compiler (e.g. old gcc), you may get a bunch of warnings at compile time, and a binary that won't pass some of the tests in make check.

On 10/28/2020 2:44 AM, Paolo Bonzini wrote:
> I am not sure of the effects. Does it simply effectively disable LTO or
> is it something worse?
>
> I'll look into the SCSI issue.
>
> Paolo

On 10/28/2020 5:35 AM, Alex Bennée wrote:
> Breakage in both system and linux-user emulation probably points at
> something in the instruction decode being broken. Shame we don't have a
> working risu setup for sparc64 to give the instruction handling a proper
> work out.

This is what I'm thinking too. The interesting bit is that sparc32 seems to work fine, and it should be the same codebase. I played with it a bit for a couple of days but couldn't isolate the faulty instruction. But I'd be happy to work on this issue with someone, perhaps from the sparc maintainers, to see if we can find out what's happening.

On 28/10/20 19:22, Daniele Buono wrote:
> If LTO is enabled with the wrong linker/ar:
> - with the checks, it will exit at configure with an error. I can change
> this in a warning and disabling LTO if preferred.
> - without the checks compilation will fail
>
> If LTO is enabled with the wrong compiler (e.g. old gcc), you may get a
> bunch of warnings at compile time, and a binary that won't pass some of
> the tests in make check.

I think both of these count as user error or compiler bug, which we generally don't protect against.

There is one exception. We check if the C++ compiler driver can link object files produced by the C compiler driver; this issue arises if the driver used for compilation (C) is GCC and the driver used for linking (C++) is clang, because GCC and clang's sanitizer libraries are not compatible with each other.

I think, however, that in this case the problem is not one of compatibility but just a broken install, so I think we can ignore it and simply forward b_lto.

Paolo

diff --git a/configure b/configure
index 9dc05cfb8a..e964040522 100755
--- a/configure
+++ b/configure
@@ -76,6 +76,7 @@ fi
 TMPB="qemu-conf"
 TMPC="${TMPDIR1}/${TMPB}.c"
 TMPO="${TMPDIR1}/${TMPB}.o"
+TMPA="${TMPDIR1}/lib${TMPB}.a"
 TMPCXX="${TMPDIR1}/${TMPB}.cxx"
 TMPE="${TMPDIR1}/${TMPB}.exe"
 TMPTXT="${TMPDIR1}/${TMPB}.txt"
@@ -180,6 +181,32 @@ compile_prog() {
       $LDFLAGS $CONFIGURE_LDFLAGS $QEMU_LDFLAGS $local_ldflags
 }
 
+do_run_filter() {
+    # Run a generic program, capturing its output to the log,
+    # but also filtering the output with grep.
+    # Returns the return value of grep.
+    # First argument is the filter string.
+    # Second argument is binary to execute.
+    local filter="$1"
+    local filter_pattern=""
+    if test "$filter" = "yes"; then
+        shift
+        filter_pattern="$1"
+    fi
+    shift
+    local program="$1"
+    shift
+    echo $program $@ >> config.log
+    $program $@ >> config.log 2>&1 || return $?
+    if test "$filter" = "yes"; then
+        $program $@ 2>&1 | grep "${filter_pattern}" >> /dev/null || return $?
+    fi
+}
+
+create_library() {
+  do_run_filter "no" "$ar" -rc${1} $TMPA $TMPO
+}
+
 # symbolically link $1 to $2. Portable version of "ln -sf".
 symlink() {
   rm -rf "$2"
@@ -242,6 +269,7 @@ host_cc="cc"
 audio_win_int=""
 libs_qga=""
 debug_info="yes"
+lto="false"
 stack_protector=""
 safe_stack=""
 use_containers="yes"
@@ -1159,6 +1187,10 @@ for opt do
   ;;
   --disable-werror) werror="no"
   ;;
+  --enable-lto) lto="true"
+  ;;
+  --disable-lto) lto="false"
+  ;;
   --enable-stack-protector) stack_protector="yes"
   ;;
   --disable-stack-protector) stack_protector="no"
@@ -1735,6 +1767,8 @@ disabled with --disable-FEATURE, default is enabled if available:
   module-upgrades try to load modules from alternate paths for upgrades
   debug-tcg       TCG debugging (default is disabled)
   debug-info      debugging information
+  lto             Enable Link-Time Optimization.
+                  Depends on clang/llvm >=6.0
   sparse          sparse checker
   safe-stack      SafeStack Stack Smash Protection. Depends on
                   clang/llvm >= 3.7 and requires coroutine backend ucontext.
@@ -5222,6 +5256,62 @@ if test "$plugins" = "yes" &&
 fi
 
 ########################################
+# lto (Link-Time Optimization)
+
+if test "$lto" = "true"; then
+  # Test compiler/ar/linker support for lto.
+  # compilation with lto is handled by meson. Just make sure that compiler
+  # support is fully functional, and add additional compatibility flags
+  # if necessary.
+
+  if ! echo | $cc -dM -E - | grep __clang__ > /dev/null 2>&1 ; then
+    # LTO with GCC and other compilers is not tested, and possibly broken
+    error_exit "QEMU only supports LTO with CLANG"
+  fi
+
+  # Check that lto is supported.
+  # Need to check for:
+  # - Valid compiler, that supports lto flags
+  # - Valid ar, able to support intermediate code
+  # - Valid linker, able to support intermediate code
+
+  #### Check for a valid *ar* for link-time optimization.
+  # Test it by creating a static library and linking it
+  # Compile an object first
+  cat > $TMPC << EOF
+int fun(int val);
+
+int fun(int val) {
+    return val;
+}
+EOF
+  if ! compile_object "-Werror -flto"; then
+    error_exit "LTO is not supported by your compiler"
+  fi
+  # Create a library out of it
+  if ! create_library "s" ; then
+    error_exit "LTO is not supported by ar. This usually happens when mixing GNU and LLVM toolchain."
+  fi
+  # Now create a binary using the library
+  cat > $TMPC << EOF
+int fun(int val);
+
+int main(int argc, char *argv[]) {
+    return fun(0);
+}
+EOF
+  if ! compile_prog "-Werror" "$test_ldflag -flto ${TMPA}"; then
+    error_exit "LTO is not supported by ar or the linker. This usually happens when mixing GNU and LLVM toolchain."
+  fi
+
+  #### All good, add the flags for CFI to our CFLAGS and LDFLAGS
+  # Flag needed both at compilation and at linking
+  QEMU_LDFLAGS="$QEMU_LDFLAGS $test_ldflag"
+  # Add -flto to CONFIGURE_*FLAGS since we need it in configure,
+  # but will be added by meson later
+  CONFIGURE_CFLAGS="$QEMU_CFLAGS -flto"
+  CONFIGURE_LDFLAGS="$QEMU_LDFLAGS -flto"
+fi
 # See if __attribute__((alias)) is supported.
 # This false for Xcode 9, but has been remedied for Xcode 10.
 # Unfortunately, travis uses Xcode 9 by default.
@@ -5532,6 +5622,43 @@ if test "$fuzzing" = "yes" && test -z "${LIB_FUZZING_ENGINE+xxx}"; then
     error_exit "Your compiler doesn't support -fsanitize=fuzzer"
     exit 1
   fi
+  # Make sure that the linker supports a custom linker script
+  # If LTO is enabled, switch linker to lld, since at the moment
+  # it is the only linker that works with lto and fuzzing:
+  # - gold does not support a custom script
+  # - bfd does not support wrapping functions with LTO
+  cat > $TMPC << EOF
+#include <stdlib.h>
+#include <stdio.h>
+void* __real_malloc(size_t size);
+void* __wrap_malloc(size_t size);
+
+void* __wrap_malloc(size_t size){
+    printf("Inside wrap_malloc\n");
+    return __real_malloc(size);
+}
+
+int main(int argc, char *argv[]) {
+    int *myint = (void*) malloc(sizeof(int));
+    *myint = 0;
+    return *myint;
+}
+EOF
+  extra_cflags="$CPU_CFLAGS -Werror"
+  extra_ldflags="-Wl,-T,${source_path}/tests/qtest/fuzz/fork_fuzz.ld"
+  extra_ldflags="${extra_ldflags} -Wl,--wrap,malloc"
+  if test "$lto" = "true"; then
+    extra_ldflags="${extra_ldflags} -fuse-ld=lld"
+  fi
+  if ! compile_prog "$extra_cflags" "$extra_ldflags"; then
+    error_exit "Your linker does not support our linker script"
+  fi
+  if ! do_run_filter "yes" "Inside wrap_malloc" ${TMPE} ""; then
+    error_exit "Your linker does not support our linker script"
+  fi
+  if test "$lto" = "true"; then
+    QEMU_LDFLAGS="${QEMU_LDFLAGS} -fuse-ld=lld"
+  fi
 fi
 
 # Thread sanitizer is, for now, much noisier than the other sanitizers;
@@ -7018,6 +7145,7 @@ NINJA=$ninja $meson setup \
         -Dcapstone=$capstone -Dslirp=$slirp -Dfdt=$fdt \
         -Diconv=$iconv -Dcurses=$curses -Dlibudev=$libudev\
         -Ddocs=$docs -Dsphinx_build=$sphinx_build \
+        -Db_lto=$lto \
         $cross_arg \
         "$PWD" "$source_path"
 
diff --git a/meson.build b/meson.build
index 7627a0ae46..50e5c527df 100644
--- a/meson.build
+++ b/meson.build
@@ -1959,6 +1959,7 @@ summary_info += {'gprof enabled': config_host.has_key('CONFIG_GPROF')}
 summary_info += {'sparse enabled': sparse.found()}
 summary_info += {'strip binaries': get_option('strip')}
 summary_info += {'profiler': config_host.has_key('CONFIG_PROFILER')}
+summary_info += {'link-time optimization (LTO)': get_option('b_lto')}
 summary_info += {'static build': config_host.has_key('CONFIG_STATIC')}
 if targetos == 'darwin'
   summary_info += {'Cocoa support': config_host.has_key('CONFIG_COCOA')}
This patch allows compiling QEMU with link-time optimization (LTO).
Compilation with LTO is handled directly by meson. This patch adds checks
in configure to make sure the toolchain supports LTO.

Currently, allow LTO only with clang, since I have found a couple of issues
with gcc-based LTO.

In case fuzzing is enabled, automatically switch to llvm's linker (lld).
The standard bfd linker has a bug when function wrapping (used by the fuzz*
targets) is used in conjunction with LTO.

Tested with all major versions of clang from 6 to 12.

Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
---
 configure   | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 meson.build |   1 +
 2 files changed, 129 insertions(+)