[2/2] configure: add support for Control-Flow Integrity

Message ID	20200702054948.10257-3-dbuono@linux.vnet.ibm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=VsHw=AN=nongnu.org=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B5DD92084C From: Daniele Buono <dbuono@linux.vnet.ibm.com> To: qemu-devel@nongnu.org Subject: [PATCH 2/2] configure: add support for Control-Flow Integrity Date: Thu, 2 Jul 2020 01:49:48 -0400 Message-Id: <20200702054948.10257-3-dbuono@linux.vnet.ibm.com> In-Reply-To: <20200702054948.10257-1-dbuono@linux.vnet.ibm.com> References: <20200702054948.10257-1-dbuono@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Received-SPF: none client-ip=148.163.158.5; envelope-from=dbuono@linux.vnet.ibm.com; helo=mx0a-001b2d01.pphosted.com Precedence: list Cc: Paolo Bonzini <pbonzini@redhat.com>, Daniele Buono <dbuono@linux.vnet.ibm.com> Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	Add support for Control-Flow Integrity \| expand [0/2] Add support for Control-Flow Integrity [1/2] check-block: enable iotests with cfi-icall [2/2] configure: add support for Control-Flow Integrity

Daniele Buono July 2, 2020, 5:49 a.m. UTC

This patch adds a flag to enable/disable control flow integrity checks
on indirect function calls. This feature is only provided by LLVM/Clang
v3.9 or higher, and only allows indirect function calls to functions
with compatible signatures.

We also add an option to enable a debugging version of cfi, with verbose
output in case of a CFI violation.

CFI on indirect function calls does not support calls to functions in
shared libraries (since they were not known at compile time), and such
calls are forbidden. QEMU relies on dlopen/dlsym when using modules,
so we make modules incompatible with CFI.

We introduce a blacklist file, to disable CFI checks in a limited number
of TCG functions.

The feature relies on link-time optimization (lto), which requires the
use of the gold linker, and the LLVM versions of ar, ranlib and nm.
This patch take care of checking that all the compiler toolchain
dependencies are met.

Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
---
 cfi-blacklist.txt |  27 +++++++
 configure         | 177 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 204 insertions(+)
 create mode 100644 cfi-blacklist.txt

Paolo Bonzini July 2, 2020, 9:45 a.m. UTC | #1

On 02/07/20 07:49, Daniele Buono wrote:
> This patch adds a flag to enable/disable control flow integrity checks
> on indirect function calls. This feature is only provided by LLVM/Clang
> v3.9 or higher, and only allows indirect function calls to functions
> with compatible signatures.
> 
> We also add an option to enable a debugging version of cfi, with verbose
> output in case of a CFI violation.
> 
> CFI on indirect function calls does not support calls to functions in
> shared libraries (since they were not known at compile time), and such
> calls are forbidden. QEMU relies on dlopen/dlsym when using modules,
> so we make modules incompatible with CFI.
> 
> We introduce a blacklist file, to disable CFI checks in a limited number
> of TCG functions.
> 
> The feature relies on link-time optimization (lto), which requires the
> use of the gold linker, and the LLVM versions of ar, ranlib and nm.
> This patch take care of checking that all the compiler toolchain
> dependencies are met.
> 
> Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>

Can you split this option in two parts, --enable-lto to enable link-time
optimization (perhaps also for GCC) and --enable-cfi which implies
--enable-lto?

This is because in the future we are considering switching to Meson,
where LTO support is built-in; having a separate switch would make it
easier to find the relevant code.

Paolo

> ---
>  cfi-blacklist.txt |  27 +++++++
>  configure         | 177 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 204 insertions(+)
>  create mode 100644 cfi-blacklist.txt
> 
> diff --git a/cfi-blacklist.txt b/cfi-blacklist.txt
> new file mode 100644
> index 0000000000..bf804431a5
> --- /dev/null
> +++ b/cfi-blacklist.txt
> @@ -0,0 +1,27 @@
> +# List of functions that should not be compiled with Control-Flow Integrity
> +
> +[cfi-icall]
> +# TCG creates binary blobs at runtime, with the transformed code.
> +# When it's time to execute it, the code is called with an indirect function
> +# call. Since such function did not exist at compile time, the runtime has no
> +# way to verify its signature. Disable CFI checks in the function that calls
> +# the binary blob
> +fun:cpu_tb_exec
> +
> +# TCI (Tiny Compiler Interpreter) is an interpreter for TCG pseudo code.
> +# One possible operation in the pseudo code is a call to binary code.
> +# Therefore, disable CFI checks in the interpreter function
> +fun:tcg_qemu_tb_exec
> +
> +# TCG Plugins Callback Functions. The mechanism rely on opening external
> +# shared libraries at runtime and get pointers to functions in such libraries
> +# Since these pointers are external to the QEMU binary, the runtime cannot
> +# verify their signature. Disable CFI Checks in all the functions that use
> +# such pointers.
> +fun:plugin_vcpu_cb__simple
> +fun:plugin_cb__simple
> +fun:plugin_cb__udata
> +fun:qemu_plugin_tb_trans_cb
> +fun:qemu_plugin_vcpu_syscall
> +fun:qemu_plugin_vcpu_syscall_ret
> +fun:plugin_load
> diff --git a/configure b/configure
> index 4a22dcd563..86fb0f390c 100755
> --- a/configure
> +++ b/configure
> @@ -27,6 +27,7 @@ fi
>  TMPB="qemu-conf"
>  TMPC="${TMPDIR1}/${TMPB}.c"
>  TMPO="${TMPDIR1}/${TMPB}.o"
> +TMPA="${TMPDIR1}/lib${TMPB}.a"
>  TMPCXX="${TMPDIR1}/${TMPB}.cxx"
>  TMPE="${TMPDIR1}/${TMPB}.exe"
>  TMPMO="${TMPDIR1}/${TMPB}.mo"
> @@ -134,6 +135,43 @@ compile_prog() {
>    do_cc $QEMU_CFLAGS $local_cflags -o $TMPE $TMPC $QEMU_LDFLAGS $local_ldflags
>  }
>  
> +do_run() {
> +    # Run a generic program, capturing its output to the log.
> +    # First argument is binary to execute.
> +    local program="$1"
> +    shift
> +    echo $program $@ >> config.log
> +    $program $@ >> config.log 2>&1 || return $?
> +}
> +
> +do_run_filter() {
> +    # Run a generic program, capturing its output to the log,
> +    # but also filtering the output with grep.
> +    # Returns the return value of grep.
> +    # First argument is the filter string.
> +    # Second argument is binary to execute.
> +    local filter="$1"
> +    shift
> +    local program="$1"
> +    shift
> +    echo $program $@ >> config.log
> +    $program $@ >> config.log 2>&1
> +    $program $@ 2>&1 | grep ${filter} >> /dev/null || return $?
> +
> +}
> +
> +create_library() {
> +  do_run "$ar" -rc${1} $TMPA $TMPO
> +}
> +
> +create_index() {
> +  do_run "$ranlib" $TMPA
> +}
> +
> +find_library_symbol() {
> +  do_run_filter ${1} "$nm" $TMPA
> +}
> +
>  # symbolically link $1 to $2.  Portable version of "ln -sf".
>  symlink() {
>    rm -rf "$2"
> @@ -306,6 +344,8 @@ libs_tools=""
>  audio_win_int=""
>  libs_qga=""
>  debug_info="yes"
> +cfi="no"
> +cfi_debug="no"
>  stack_protector=""
>  safe_stack=""
>  use_containers="yes"
> @@ -1285,6 +1325,14 @@ for opt do
>    ;;
>    --disable-werror) werror="no"
>    ;;
> +  --enable-cfi) cfi="yes"
> +  ;;
> +  --disable-cfi) cfi="no"
> +  ;;
> +  --enable-cfi-debug) cfi_debug="yes"
> +  ;;
> +  --disable-cfi-debug) cfi_debug="no"
> +  ;;
>    --enable-stack-protector) stack_protector="yes"
>    ;;
>    --disable-stack-protector) stack_protector="no"
> @@ -1838,6 +1886,10 @@ disabled with --disable-FEATURE, default is enabled if available:
>    module-upgrades try to load modules from alternate paths for upgrades
>    debug-tcg       TCG debugging (default is disabled)
>    debug-info      debugging information
> +  cfi             Enable Control-Flow Integrity for indirect function calls.
> +                  Depends on clang/llvm >= 3.9 and is incompatible with modules
> +  cfi-debug       In case of a cfi violation, a message containing the line that
> +                  triggered the error is written to stderr
>    sparse          sparse checker
>    safe-stack      SafeStack Stack Smash Protection. Depends on
>                    clang/llvm >= 3.7 and requires coroutine backend ucontext.
> @@ -5948,6 +6000,129 @@ if  test "$plugins" = "yes" &&
>        "for this purpose. You can't build with --static."
>  fi
>  
> +########################################
> +# cfi (Control Flow Integrity)
> +
> +if test "$cfi" = "yes"; then
> +  # Compiler/Linker Flags that needs to be added for cfi:
> +  # -fsanitize=cfi-icall to enable control-flow integrity checks on
> +  #            indirect function calls.
> +  # -fsanitize-cfi-icall-generalize-pointers to allow indirect function calls
> +  #            with pointers of a different type (i.e. pass a void* to a
> +  #            function that expects a char*). Used in some spots in QEMU,
> +  #            with compile-time type checks done by macros
> +  # -fsanitize-blacklist, to disable CFI on specific functions.
> +  #            required for some TCG functions that call runtime-created or
> +  #            runtime-linked code. More details in cfi-blacklist.txt
> +  # -flto=thin to enable link-time optimization. This is required for the
> +  #            implementation of CFI to work properly across object files
> +  # -fuse-ld=gold Since some of the objects are packed into static libraries,
> +  #               which are not supported by the bfd linker.
> +  test_cflag="-fsanitize=cfi-icall -fsanitize-cfi-icall-generalize-pointers -flto=thin -fsanitize-blacklist=${source_path}/cfi-blacklist.txt"
> +  test_ldflag="-fsanitize=cfi-icall -flto=thin -fuse-ld=gold -fsanitize-blacklist=${source_path}/cfi-blacklist.txt"
> +
> +  if test "$cfi_debug" = "yes"; then
> +    # Disable the default trap mechanism so that a error message is displayed
> +    # when a CFI violation happens. The code is still terminated after the
> +    # message
> +    test_cflag="${test_cflag} -fno-sanitize-trap=cfi-icall"
> +    test_ldflag="${test_ldflag} -fno-sanitize-trap=cfi-icall"
> +  fi
> +
> +  # Check that cfi is supported.
> +  # Need to check for:
> +  # - Valid compiler, that supports cfi flags
> +  # - Valid ar, ranlib and nm, able to work with intermediate code (for lto)
> +  # - Incompatible configure options (plugins and modules) that use dlsym at
> +  #   runtime (indirect function calls to shared libraries is not supported)
> +
> +  #### Check for a valid *ar* for link-time optimization.
> +  # Test it by creating a static library and linking it
> +  # Compile an object first
> +  cat > $TMPC << EOF
> +int fun(int val);
> +
> +int fun(int val) {
> +    return val;
> +}
> +EOF
> +  if ! compile_object "-Werror $test_cflag"; then
> +    error_exit "Control Flow Integrity is not supported by your compiler"
> +  fi
> +  # Create a library out of it
> +  if ! create_library "s" ; then
> +    error_exit "LTO is required for CFI, but is not supported by ar. This usually happens when using gnu ar. Try switching to LLVM ar"
> +  fi
> +  # Now create a binary using the library
> +  cat > $TMPC << EOF
> +int fun(int val);
> +
> +int main(int argc, char *argv[]) {
> +  return fun(0);
> +}
> +EOF
> +  if ! compile_prog "-Werror $test_cflag" "$test_ldflag -L${TMPDIR1} -l${TMPB}"; then
> +    error_exit "LTO is required for CFI, but is not supported by ar. This usually happens when using gnu ar. Try switching to LLVM ar"
> +  fi
> +
> +  #### Check for a valid *ranlib* for link-time optimization.
> +  # Test it by creating a static library without index, indexing and linking it
> +  cat > $TMPC << EOF
> +int fun(int val);
> +
> +int fun(int val) {
> +    return val;
> +}
> +EOF
> +  if ! compile_object "-Werror $test_cflag"; then
> +    error_exit "Control Flow Integrity is not supported by your compiler"
> +  fi
> +  # Create a library explicity without an index
> +  if ! create_library "S" ; then
> +    error_exit "LTO is required for CFI, but is not supported by ar. This usually happens when using gnu ar. Try switching to LLVM ar"
> +  fi
> +  # Now run ranlib to index it
> +  if ! create_index ; then
> +    error_exit "LTO is required for CFI, but is not supported by ranlib. This usually happens when using gnu ranlib. Try switching to LLVM ranlib"
> +  fi
> +  # If ranlib worked, we can now use the library
> +  cat > $TMPC << EOF
> +int fun(int val);
> +
> +int main(int argc, char *argv[]) {
> +  return fun(0);
> +}
> +EOF
> +  if ! compile_prog "-Werror $test_cflag" "$test_ldflag -L${TMPDIR1} -l${TMPB}"; then
> +    error_exit "LTO is required for CFI, but is not supported by ranlib. This usually happens when using gnu ranlib. Try switching to LLVM ranlib"
> +  fi
> +
> +  #### Check for a valid *nm* for link-time optimization.
> +  # nm does not return an error code if the file is unsupported, just
> +  # print a warning text. So, check if *fun* is one of the symbols found by nm
> +  # in the previously created static library
> +  if ! find_library_symbol "fun" ; then
> +    error_exit "LTO is required for CFI, but is not supported by nm. This usually happens when using gnu nm. Try switching to LLVM nm"
> +  fi
> +
> +  #### The toolchain supports CFI, let's check for incompatible options
> +
> +  if test "$modules" = "yes"; then
> +    error_exit "Control Flow Integrity is not compatible with modules"
> +  fi
> +
> +  #### All good, add the flags for CFI to our CFLAGS and LDFLAGS
> +  # Flag needed both at compilation and at linking
> +  QEMU_CFLAGS="$QEMU_CFLAGS $test_cflag"
> +  QEMU_LDFLAGS="$QEMU_LDFLAGS $test_ldflag"
> +
> +else
> +  if test "$cfi_debug" = "yes"; then
> +    error_exit "Cannot enable Control Flow Integrity debugging since CFI is not enabled"
> +  fi
> +fi
> +
> +
>  ########################################
>  # See if __attribute__((alias)) is supported.
>  # This false for Xcode 9, but has been remedied for Xcode 10.
> @@ -6856,6 +7031,8 @@ echo "gprof enabled     $gprof"
>  echo "sparse enabled    $sparse"
>  echo "strip binaries    $strip_opt"
>  echo "profiler          $profiler"
> +echo "cfi               $cfi"
> +echo "cfi debug         $cfi_debug"
>  echo "static build      $static"
>  echo "safe stack        $safe_stack"
>  if test "$darwin" = "yes" ; then
>

Daniel P. Berrangé July 2, 2020, 9:52 a.m. UTC | #2

On Thu, Jul 02, 2020 at 01:49:48AM -0400, Daniele Buono wrote:
> This patch adds a flag to enable/disable control flow integrity checks
> on indirect function calls. This feature is only provided by LLVM/Clang
> v3.9 or higher, and only allows indirect function calls to functions
> with compatible signatures.
> 
> We also add an option to enable a debugging version of cfi, with verbose
> output in case of a CFI violation.
> 
> CFI on indirect function calls does not support calls to functions in
> shared libraries (since they were not known at compile time), and such
> calls are forbidden. QEMU relies on dlopen/dlsym when using modules,
> so we make modules incompatible with CFI.
> 
> We introduce a blacklist file, to disable CFI checks in a limited number
> of TCG functions.
> 
> The feature relies on link-time optimization (lto), which requires the
> use of the gold linker, and the LLVM versions of ar, ranlib and nm.
> This patch take care of checking that all the compiler toolchain
> dependencies are met.
> 
> Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
> ---
>  cfi-blacklist.txt |  27 +++++++
>  configure         | 177 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 204 insertions(+)
>  create mode 100644 cfi-blacklist.txt
> 
> diff --git a/cfi-blacklist.txt b/cfi-blacklist.txt
> new file mode 100644
> index 0000000000..bf804431a5
> --- /dev/null
> +++ b/cfi-blacklist.txt
> @@ -0,0 +1,27 @@
> +# List of functions that should not be compiled with Control-Flow Integrity
> +
> +[cfi-icall]
> +# TCG creates binary blobs at runtime, with the transformed code.
> +# When it's time to execute it, the code is called with an indirect function
> +# call. Since such function did not exist at compile time, the runtime has no
> +# way to verify its signature. Disable CFI checks in the function that calls
> +# the binary blob
> +fun:cpu_tb_exec
> +
> +# TCI (Tiny Compiler Interpreter) is an interpreter for TCG pseudo code.
> +# One possible operation in the pseudo code is a call to binary code.
> +# Therefore, disable CFI checks in the interpreter function
> +fun:tcg_qemu_tb_exec
> +
> +# TCG Plugins Callback Functions. The mechanism rely on opening external
> +# shared libraries at runtime and get pointers to functions in such libraries
> +# Since these pointers are external to the QEMU binary, the runtime cannot
> +# verify their signature. Disable CFI Checks in all the functions that use
> +# such pointers.
> +fun:plugin_vcpu_cb__simple
> +fun:plugin_cb__simple
> +fun:plugin_cb__udata
> +fun:qemu_plugin_tb_trans_cb
> +fun:qemu_plugin_vcpu_syscall
> +fun:qemu_plugin_vcpu_syscall_ret
> +fun:plugin_load

The need to maintain this list of functions makes me feel very
uneasy.

How can we have any confidence that this list of functions is
accurate ? How will maintainers ensure that they correctly update
it as they are writing/changing code, and how will they test the
result ?

It feels like it has the same general maint problem as the original
seccomp code we used, where we were never confident we had added
the right exceptions to let QEMU run without crashing when users
tickled some feature we forgot about.


Regards,
Daniel

Daniele Buono July 2, 2020, 12:19 p.m. UTC | #3

On 7/2/2020 5:45 AM, Paolo Bonzini wrote:
> On 02/07/20 07:49, Daniele Buono wrote:
>> This patch adds a flag to enable/disable control flow integrity checks
>> on indirect function calls. This feature is only provided by LLVM/Clang
>> v3.9 or higher, and only allows indirect function calls to functions
>> with compatible signatures.
>>
>> We also add an option to enable a debugging version of cfi, with verbose
>> output in case of a CFI violation.
>>
>> CFI on indirect function calls does not support calls to functions in
>> shared libraries (since they were not known at compile time), and such
>> calls are forbidden. QEMU relies on dlopen/dlsym when using modules,
>> so we make modules incompatible with CFI.
>>
>> We introduce a blacklist file, to disable CFI checks in a limited number
>> of TCG functions.
>>
>> The feature relies on link-time optimization (lto), which requires the
>> use of the gold linker, and the LLVM versions of ar, ranlib and nm.
>> This patch take care of checking that all the compiler toolchain
>> dependencies are met.
>>
>> Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
> 
> Can you split this option in two parts, --enable-lto to enable link-time
> optimization (perhaps also for GCC) and --enable-cfi which implies
> --enable-lto?
> 
> This is because in the future we are considering switching to Meson,
> where LTO support is built-in; having a separate switch would make it
> easier to find the relevant code.
> 
> Paolo

Sure, that shouldn't be a big deal.

Thanks,
Daniele

> 
>> ---
>>   cfi-blacklist.txt |  27 +++++++
>>   configure         | 177 ++++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 204 insertions(+)
>>   create mode 100644 cfi-blacklist.txt
>>
>> diff --git a/cfi-blacklist.txt b/cfi-blacklist.txt
>> new file mode 100644
>> index 0000000000..bf804431a5
>> --- /dev/null
>> +++ b/cfi-blacklist.txt
>> @@ -0,0 +1,27 @@
>> +# List of functions that should not be compiled with Control-Flow Integrity
>> +
>> +[cfi-icall]
>> +# TCG creates binary blobs at runtime, with the transformed code.
>> +# When it's time to execute it, the code is called with an indirect function
>> +# call. Since such function did not exist at compile time, the runtime has no
>> +# way to verify its signature. Disable CFI checks in the function that calls
>> +# the binary blob
>> +fun:cpu_tb_exec
>> +
>> +# TCI (Tiny Compiler Interpreter) is an interpreter for TCG pseudo code.
>> +# One possible operation in the pseudo code is a call to binary code.
>> +# Therefore, disable CFI checks in the interpreter function
>> +fun:tcg_qemu_tb_exec
>> +
>> +# TCG Plugins Callback Functions. The mechanism rely on opening external
>> +# shared libraries at runtime and get pointers to functions in such libraries
>> +# Since these pointers are external to the QEMU binary, the runtime cannot
>> +# verify their signature. Disable CFI Checks in all the functions that use
>> +# such pointers.
>> +fun:plugin_vcpu_cb__simple
>> +fun:plugin_cb__simple
>> +fun:plugin_cb__udata
>> +fun:qemu_plugin_tb_trans_cb
>> +fun:qemu_plugin_vcpu_syscall
>> +fun:qemu_plugin_vcpu_syscall_ret
>> +fun:plugin_load
>> diff --git a/configure b/configure
>> index 4a22dcd563..86fb0f390c 100755
>> --- a/configure
>> +++ b/configure
>> @@ -27,6 +27,7 @@ fi
>>   TMPB="qemu-conf"
>>   TMPC="${TMPDIR1}/${TMPB}.c"
>>   TMPO="${TMPDIR1}/${TMPB}.o"
>> +TMPA="${TMPDIR1}/lib${TMPB}.a"
>>   TMPCXX="${TMPDIR1}/${TMPB}.cxx"
>>   TMPE="${TMPDIR1}/${TMPB}.exe"
>>   TMPMO="${TMPDIR1}/${TMPB}.mo"
>> @@ -134,6 +135,43 @@ compile_prog() {
>>     do_cc $QEMU_CFLAGS $local_cflags -o $TMPE $TMPC $QEMU_LDFLAGS $local_ldflags
>>   }
>>   
>> +do_run() {
>> +    # Run a generic program, capturing its output to the log.
>> +    # First argument is binary to execute.
>> +    local program="$1"
>> +    shift
>> +    echo $program $@ >> config.log
>> +    $program $@ >> config.log 2>&1 || return $?
>> +}
>> +
>> +do_run_filter() {
>> +    # Run a generic program, capturing its output to the log,
>> +    # but also filtering the output with grep.
>> +    # Returns the return value of grep.
>> +    # First argument is the filter string.
>> +    # Second argument is binary to execute.
>> +    local filter="$1"
>> +    shift
>> +    local program="$1"
>> +    shift
>> +    echo $program $@ >> config.log
>> +    $program $@ >> config.log 2>&1
>> +    $program $@ 2>&1 | grep ${filter} >> /dev/null || return $?
>> +
>> +}
>> +
>> +create_library() {
>> +  do_run "$ar" -rc${1} $TMPA $TMPO
>> +}
>> +
>> +create_index() {
>> +  do_run "$ranlib" $TMPA
>> +}
>> +
>> +find_library_symbol() {
>> +  do_run_filter ${1} "$nm" $TMPA
>> +}
>> +
>>   # symbolically link $1 to $2.  Portable version of "ln -sf".
>>   symlink() {
>>     rm -rf "$2"
>> @@ -306,6 +344,8 @@ libs_tools=""
>>   audio_win_int=""
>>   libs_qga=""
>>   debug_info="yes"
>> +cfi="no"
>> +cfi_debug="no"
>>   stack_protector=""
>>   safe_stack=""
>>   use_containers="yes"
>> @@ -1285,6 +1325,14 @@ for opt do
>>     ;;
>>     --disable-werror) werror="no"
>>     ;;
>> +  --enable-cfi) cfi="yes"
>> +  ;;
>> +  --disable-cfi) cfi="no"
>> +  ;;
>> +  --enable-cfi-debug) cfi_debug="yes"
>> +  ;;
>> +  --disable-cfi-debug) cfi_debug="no"
>> +  ;;
>>     --enable-stack-protector) stack_protector="yes"
>>     ;;
>>     --disable-stack-protector) stack_protector="no"
>> @@ -1838,6 +1886,10 @@ disabled with --disable-FEATURE, default is enabled if available:
>>     module-upgrades try to load modules from alternate paths for upgrades
>>     debug-tcg       TCG debugging (default is disabled)
>>     debug-info      debugging information
>> +  cfi             Enable Control-Flow Integrity for indirect function calls.
>> +                  Depends on clang/llvm >= 3.9 and is incompatible with modules
>> +  cfi-debug       In case of a cfi violation, a message containing the line that
>> +                  triggered the error is written to stderr
>>     sparse          sparse checker
>>     safe-stack      SafeStack Stack Smash Protection. Depends on
>>                     clang/llvm >= 3.7 and requires coroutine backend ucontext.
>> @@ -5948,6 +6000,129 @@ if  test "$plugins" = "yes" &&
>>         "for this purpose. You can't build with --static."
>>   fi
>>   
>> +########################################
>> +# cfi (Control Flow Integrity)
>> +
>> +if test "$cfi" = "yes"; then
>> +  # Compiler/Linker Flags that needs to be added for cfi:
>> +  # -fsanitize=cfi-icall to enable control-flow integrity checks on
>> +  #            indirect function calls.
>> +  # -fsanitize-cfi-icall-generalize-pointers to allow indirect function calls
>> +  #            with pointers of a different type (i.e. pass a void* to a
>> +  #            function that expects a char*). Used in some spots in QEMU,
>> +  #            with compile-time type checks done by macros
>> +  # -fsanitize-blacklist, to disable CFI on specific functions.
>> +  #            required for some TCG functions that call runtime-created or
>> +  #            runtime-linked code. More details in cfi-blacklist.txt
>> +  # -flto=thin to enable link-time optimization. This is required for the
>> +  #            implementation of CFI to work properly across object files
>> +  # -fuse-ld=gold Since some of the objects are packed into static libraries,
>> +  #               which are not supported by the bfd linker.
>> +  test_cflag="-fsanitize=cfi-icall -fsanitize-cfi-icall-generalize-pointers -flto=thin -fsanitize-blacklist=${source_path}/cfi-blacklist.txt"
>> +  test_ldflag="-fsanitize=cfi-icall -flto=thin -fuse-ld=gold -fsanitize-blacklist=${source_path}/cfi-blacklist.txt"
>> +
>> +  if test "$cfi_debug" = "yes"; then
>> +    # Disable the default trap mechanism so that a error message is displayed
>> +    # when a CFI violation happens. The code is still terminated after the
>> +    # message
>> +    test_cflag="${test_cflag} -fno-sanitize-trap=cfi-icall"
>> +    test_ldflag="${test_ldflag} -fno-sanitize-trap=cfi-icall"
>> +  fi
>> +
>> +  # Check that cfi is supported.
>> +  # Need to check for:
>> +  # - Valid compiler, that supports cfi flags
>> +  # - Valid ar, ranlib and nm, able to work with intermediate code (for lto)
>> +  # - Incompatible configure options (plugins and modules) that use dlsym at
>> +  #   runtime (indirect function calls to shared libraries is not supported)
>> +
>> +  #### Check for a valid *ar* for link-time optimization.
>> +  # Test it by creating a static library and linking it
>> +  # Compile an object first
>> +  cat > $TMPC << EOF
>> +int fun(int val);
>> +
>> +int fun(int val) {
>> +    return val;
>> +}
>> +EOF
>> +  if ! compile_object "-Werror $test_cflag"; then
>> +    error_exit "Control Flow Integrity is not supported by your compiler"
>> +  fi
>> +  # Create a library out of it
>> +  if ! create_library "s" ; then
>> +    error_exit "LTO is required for CFI, but is not supported by ar. This usually happens when using gnu ar. Try switching to LLVM ar"
>> +  fi
>> +  # Now create a binary using the library
>> +  cat > $TMPC << EOF
>> +int fun(int val);
>> +
>> +int main(int argc, char *argv[]) {
>> +  return fun(0);
>> +}
>> +EOF
>> +  if ! compile_prog "-Werror $test_cflag" "$test_ldflag -L${TMPDIR1} -l${TMPB}"; then
>> +    error_exit "LTO is required for CFI, but is not supported by ar. This usually happens when using gnu ar. Try switching to LLVM ar"
>> +  fi
>> +
>> +  #### Check for a valid *ranlib* for link-time optimization.
>> +  # Test it by creating a static library without index, indexing and linking it
>> +  cat > $TMPC << EOF
>> +int fun(int val);
>> +
>> +int fun(int val) {
>> +    return val;
>> +}
>> +EOF
>> +  if ! compile_object "-Werror $test_cflag"; then
>> +    error_exit "Control Flow Integrity is not supported by your compiler"
>> +  fi
>> +  # Create a library explicity without an index
>> +  if ! create_library "S" ; then
>> +    error_exit "LTO is required for CFI, but is not supported by ar. This usually happens when using gnu ar. Try switching to LLVM ar"
>> +  fi
>> +  # Now run ranlib to index it
>> +  if ! create_index ; then
>> +    error_exit "LTO is required for CFI, but is not supported by ranlib. This usually happens when using gnu ranlib. Try switching to LLVM ranlib"
>> +  fi
>> +  # If ranlib worked, we can now use the library
>> +  cat > $TMPC << EOF
>> +int fun(int val);
>> +
>> +int main(int argc, char *argv[]) {
>> +  return fun(0);
>> +}
>> +EOF
>> +  if ! compile_prog "-Werror $test_cflag" "$test_ldflag -L${TMPDIR1} -l${TMPB}"; then
>> +    error_exit "LTO is required for CFI, but is not supported by ranlib. This usually happens when using gnu ranlib. Try switching to LLVM ranlib"
>> +  fi
>> +
>> +  #### Check for a valid *nm* for link-time optimization.
>> +  # nm does not return an error code if the file is unsupported, just
>> +  # print a warning text. So, check if *fun* is one of the symbols found by nm
>> +  # in the previously created static library
>> +  if ! find_library_symbol "fun" ; then
>> +    error_exit "LTO is required for CFI, but is not supported by nm. This usually happens when using gnu nm. Try switching to LLVM nm"
>> +  fi
>> +
>> +  #### The toolchain supports CFI, let's check for incompatible options
>> +
>> +  if test "$modules" = "yes"; then
>> +    error_exit "Control Flow Integrity is not compatible with modules"
>> +  fi
>> +
>> +  #### All good, add the flags for CFI to our CFLAGS and LDFLAGS
>> +  # Flag needed both at compilation and at linking
>> +  QEMU_CFLAGS="$QEMU_CFLAGS $test_cflag"
>> +  QEMU_LDFLAGS="$QEMU_LDFLAGS $test_ldflag"
>> +
>> +else
>> +  if test "$cfi_debug" = "yes"; then
>> +    error_exit "Cannot enable Control Flow Integrity debugging since CFI is not enabled"
>> +  fi
>> +fi
>> +
>> +
>>   ########################################
>>   # See if __attribute__((alias)) is supported.
>>   # This false for Xcode 9, but has been remedied for Xcode 10.
>> @@ -6856,6 +7031,8 @@ echo "gprof enabled     $gprof"
>>   echo "sparse enabled    $sparse"
>>   echo "strip binaries    $strip_opt"
>>   echo "profiler          $profiler"
>> +echo "cfi               $cfi"
>> +echo "cfi debug         $cfi_debug"
>>   echo "static build      $static"
>>   echo "safe stack        $safe_stack"
>>   if test "$darwin" = "yes" ; then
>>
>

Daniele Buono July 2, 2020, 12:50 p.m. UTC | #4

On 7/2/2020 5:52 AM, Daniel P. Berrangé wrote:
> On Thu, Jul 02, 2020 at 01:49:48AM -0400, Daniele Buono wrote:
>> This patch adds a flag to enable/disable control flow integrity checks
>> on indirect function calls. This feature is only provided by LLVM/Clang
>> v3.9 or higher, and only allows indirect function calls to functions
>> with compatible signatures.
>>
>> We also add an option to enable a debugging version of cfi, with verbose
>> output in case of a CFI violation.
>>
>> CFI on indirect function calls does not support calls to functions in
>> shared libraries (since they were not known at compile time), and such
>> calls are forbidden. QEMU relies on dlopen/dlsym when using modules,
>> so we make modules incompatible with CFI.
>>
>> We introduce a blacklist file, to disable CFI checks in a limited number
>> of TCG functions.
>>
>> The feature relies on link-time optimization (lto), which requires the
>> use of the gold linker, and the LLVM versions of ar, ranlib and nm.
>> This patch take care of checking that all the compiler toolchain
>> dependencies are met.
>>
>> Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
>> ---
>>   cfi-blacklist.txt |  27 +++++++
>>   configure         | 177 ++++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 204 insertions(+)
>>   create mode 100644 cfi-blacklist.txt
>>
>> diff --git a/cfi-blacklist.txt b/cfi-blacklist.txt
>> new file mode 100644
>> index 0000000000..bf804431a5
>> --- /dev/null
>> +++ b/cfi-blacklist.txt
>> @@ -0,0 +1,27 @@
>> +# List of functions that should not be compiled with Control-Flow Integrity
>> +
>> +[cfi-icall]
>> +# TCG creates binary blobs at runtime, with the transformed code.
>> +# When it's time to execute it, the code is called with an indirect function
>> +# call. Since such function did not exist at compile time, the runtime has no
>> +# way to verify its signature. Disable CFI checks in the function that calls
>> +# the binary blob
>> +fun:cpu_tb_exec
>> +
>> +# TCI (Tiny Compiler Interpreter) is an interpreter for TCG pseudo code.
>> +# One possible operation in the pseudo code is a call to binary code.
>> +# Therefore, disable CFI checks in the interpreter function
>> +fun:tcg_qemu_tb_exec
>> +
>> +# TCG Plugins Callback Functions. The mechanism rely on opening external
>> +# shared libraries at runtime and get pointers to functions in such libraries
>> +# Since these pointers are external to the QEMU binary, the runtime cannot
>> +# verify their signature. Disable CFI Checks in all the functions that use
>> +# such pointers.
>> +fun:plugin_vcpu_cb__simple
>> +fun:plugin_cb__simple
>> +fun:plugin_cb__udata
>> +fun:qemu_plugin_tb_trans_cb
>> +fun:qemu_plugin_vcpu_syscall
>> +fun:qemu_plugin_vcpu_syscall_ret
>> +fun:plugin_load
> 
> The need to maintain this list of functions makes me feel very
> uneasy.
> 
> How can we have any confidence that this list of functions is
> accurate ? How will maintainers ensure that they correctly update
> it as they are writing/changing code, and how will they test the
> result ?
> 
> It feels like it has the same general maint problem as the original
> seccomp code we used, where we were never confident we had added
> the right exceptions to let QEMU run without crashing when users
> tickled some feature we forgot about.
> 
> 
> Regards,
> Daniel
> 

I agree with you that keeping that list updated is a daunting task. In 
my opinion, however, it is not as difficult as a seccomp filter, for the 
following reasons:

1) Seccomp covers everything that runs in your process, including shared 
libraries that you have no control over. CFI covers only the code in the 
QEMU binary. So at least we don't have to guess what other libraries 
used by QEMU will or won't do during execution.

2) With seccomp you have to filter behavior that, while admissible, 
should not happen in your code. CFI can be seen as a run-time type 
checking system; if the signature of the function is wrong, that is a 
coding error... in theory. In practice, there is a corner-case because 
the type checking doesn't know the signature of code loaded or written 
at run-time, and that is why you have to use a CFI filter.

So yes, there is risk, but IMHO it's not as high as in seccomp.

I think with a bit of education, it would be easy to spot red flags in 
new code.
As for education/testing... I can definitely work on a doc to be put in 
docs/devel.
Testing for CFI violations may be more difficult, however if a test code 
that exercises it is written in tests/, compiling QEMU with CFI and 
running the test should be sufficient to hit the violation.
I also wonder if this is something that could be put in the fuzzing 
environment. It would probably also help in finding coding error in 
corner cases quicker.

Regards,
Daniele

Paolo Bonzini July 2, 2020, 12:59 p.m. UTC | #5

On 02/07/20 14:50, Daniele Buono wrote:
> I also wonder if this is something that could be put in the fuzzing
> environment. It would probably also help in finding coding error in
> corner cases quicker.

Yes, fuzzing and tests/docker/test-debug should enable CFI.  Also,
tests/docker/test-clang should enable LTO.

Paolo

Daniel P. Berrangé July 2, 2020, 1:12 p.m. UTC | #6

On Thu, Jul 02, 2020 at 08:50:08AM -0400, Daniele Buono wrote:
> 
> 
> On 7/2/2020 5:52 AM, Daniel P. Berrangé wrote:
> > On Thu, Jul 02, 2020 at 01:49:48AM -0400, Daniele Buono wrote:
> > > This patch adds a flag to enable/disable control flow integrity checks
> > > on indirect function calls. This feature is only provided by LLVM/Clang
> > > v3.9 or higher, and only allows indirect function calls to functions
> > > with compatible signatures.
> > > 
> > > We also add an option to enable a debugging version of cfi, with verbose
> > > output in case of a CFI violation.
> > > 
> > > CFI on indirect function calls does not support calls to functions in
> > > shared libraries (since they were not known at compile time), and such
> > > calls are forbidden. QEMU relies on dlopen/dlsym when using modules,
> > > so we make modules incompatible with CFI.
> > > 
> > > We introduce a blacklist file, to disable CFI checks in a limited number
> > > of TCG functions.
> > > 
> > > The feature relies on link-time optimization (lto), which requires the
> > > use of the gold linker, and the LLVM versions of ar, ranlib and nm.
> > > This patch take care of checking that all the compiler toolchain
> > > dependencies are met.
> > > 
> > > Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
> > > ---
> > >   cfi-blacklist.txt |  27 +++++++
> > >   configure         | 177 ++++++++++++++++++++++++++++++++++++++++++++++
> > >   2 files changed, 204 insertions(+)
> > >   create mode 100644 cfi-blacklist.txt
> > > 
> > > diff --git a/cfi-blacklist.txt b/cfi-blacklist.txt
> > > new file mode 100644
> > > index 0000000000..bf804431a5
> > > --- /dev/null
> > > +++ b/cfi-blacklist.txt
> > > @@ -0,0 +1,27 @@
> > > +# List of functions that should not be compiled with Control-Flow Integrity
> > > +
> > > +[cfi-icall]
> > > +# TCG creates binary blobs at runtime, with the transformed code.
> > > +# When it's time to execute it, the code is called with an indirect function
> > > +# call. Since such function did not exist at compile time, the runtime has no
> > > +# way to verify its signature. Disable CFI checks in the function that calls
> > > +# the binary blob
> > > +fun:cpu_tb_exec
> > > +
> > > +# TCI (Tiny Compiler Interpreter) is an interpreter for TCG pseudo code.
> > > +# One possible operation in the pseudo code is a call to binary code.
> > > +# Therefore, disable CFI checks in the interpreter function
> > > +fun:tcg_qemu_tb_exec
> > > +
> > > +# TCG Plugins Callback Functions. The mechanism rely on opening external
> > > +# shared libraries at runtime and get pointers to functions in such libraries
> > > +# Since these pointers are external to the QEMU binary, the runtime cannot
> > > +# verify their signature. Disable CFI Checks in all the functions that use
> > > +# such pointers.
> > > +fun:plugin_vcpu_cb__simple
> > > +fun:plugin_cb__simple
> > > +fun:plugin_cb__udata
> > > +fun:qemu_plugin_tb_trans_cb
> > > +fun:qemu_plugin_vcpu_syscall
> > > +fun:qemu_plugin_vcpu_syscall_ret
> > > +fun:plugin_load
> > 
> > The need to maintain this list of functions makes me feel very
> > uneasy.
> > 
> > How can we have any confidence that this list of functions is
> > accurate ? How will maintainers ensure that they correctly update
> > it as they are writing/changing code, and how will they test the
> > result ?
> > 
> > It feels like it has the same general maint problem as the original
> > seccomp code we used, where we were never confident we had added
> > the right exceptions to let QEMU run without crashing when users
> > tickled some feature we forgot about.
> > 
> > 
> > Regards,
> > Daniel
> > 
> 
> I agree with you that keeping that list updated is a daunting task. In my
> opinion, however, it is not as difficult as a seccomp filter, for the
> following reasons:
> 
> 1) Seccomp covers everything that runs in your process, including shared
> libraries that you have no control over. CFI covers only the code in the
> QEMU binary. So at least we don't have to guess what other libraries used by
> QEMU will or won't do during execution.
> 
> 2) With seccomp you have to filter behavior that, while admissible, should
> not happen in your code. CFI can be seen as a run-time type checking system;
> if the signature of the function is wrong, that is a coding error... in
> theory. In practice, there is a corner-case because the type checking
> doesn't know the signature of code loaded or written at run-time, and that
> is why you have to use a CFI filter.

Can you elaborate on this function signature rules here ? Is this referring
to scenarios where you cast between 2 different function prototypes ?

I'm wondering if this applies to the way GLib's main loop source callbacks
are registered.

eg the g_source_set_callback method takes a callback with a signature
of "GSourceFunc" which is

  gboolean (*)(void *)

but the way GSources are implemented means that each implementation gets
to define its own custom callback signature. So for example, in QIOChannel
we use

  int (*)(struct QIOChannel *, enum <anonymous>,  void *)

Thus, we always have an intentional bad function cast when invoking the
g_source_set_callback method.

GCC is able to warn about these if we add -Wcast-function-type, but we
don't do that because these bad casts are intentional.

eg

io/channel.c: In function ‘qio_channel_add_watch_full’:
io/channel.c:315:35: error: cast between incompatible function types from ‘QIOChannelFunc’ {aka ‘int (*)(struct QIOChannel *, enum <anonymous>,  void *)’} to ‘gboolean (*)(void *)’ {aka ‘int (*)(void *)’} [-Werror=cast-function-type]
  315 |     g_source_set_callback(source, (GSourceFunc)func, user_data, notify);
      |                                   ^
io/channel.c: In function ‘qio_channel_wait’:
io/channel.c:507:27: error: cast between incompatible function types from ‘gboolean (*)(QIOChannel *, GIOCondition,  void *)’ {aka ‘int (*)(struct QIOChannel *, enum <anonymous>,  void *)’} to ‘gboolean (*)(void *)’ {aka ‘int (*)(void *)’} [-Werror=cast-function-type]
  507 |                           (GSourceFunc)qio_channel_wait_complete,
      |                           ^


Regards,
Daniel

Alexander Bulekov July 2, 2020, 1:38 p.m. UTC | #7

Can't wait to try this out!

On 200702 1459, Paolo Bonzini wrote:
> On 02/07/20 14:50, Daniele Buono wrote:
> > I also wonder if this is something that could be put in the fuzzing
> > environment. It would probably also help in finding coding error in
> > corner cases quicker.
> 
> Yes, fuzzing and tests/docker/test-debug should enable CFI.  Also,
> tests/docker/test-clang should enable LTO.
> 
> Paolo

I believe CFI is most-useful as an active defensive measure. I can't
think of many cases where a fuzzer/test could raise a CFI alert that
wouldn't also be caught by something like canaries, ASan or UBsan,
though maybe I'm missing something. Maybe testing/fuzzing with CFI could
be useful to weed out any errors due to e.g. an incomplete
cfi-blacklist.txt

-Alex

Daniele Buono July 2, 2020, 3:02 p.m. UTC | #8

On 7/2/2020 9:12 AM, Daniel P. Berrangé wrote:
> On Thu, Jul 02, 2020 at 08:50:08AM -0400, Daniele Buono wrote:
>>
>>
>> On 7/2/2020 5:52 AM, Daniel P. Berrangé wrote:
>>> On Thu, Jul 02, 2020 at 01:49:48AM -0400, Daniele Buono wrote:
>>>> This patch adds a flag to enable/disable control flow integrity checks
>>>> on indirect function calls. This feature is only provided by LLVM/Clang
>>>> v3.9 or higher, and only allows indirect function calls to functions
>>>> with compatible signatures.
>>>>
>>>> We also add an option to enable a debugging version of cfi, with verbose
>>>> output in case of a CFI violation.
>>>>
>>>> CFI on indirect function calls does not support calls to functions in
>>>> shared libraries (since they were not known at compile time), and such
>>>> calls are forbidden. QEMU relies on dlopen/dlsym when using modules,
>>>> so we make modules incompatible with CFI.
>>>>
>>>> We introduce a blacklist file, to disable CFI checks in a limited number
>>>> of TCG functions.
>>>>
>>>> The feature relies on link-time optimization (lto), which requires the
>>>> use of the gold linker, and the LLVM versions of ar, ranlib and nm.
>>>> This patch take care of checking that all the compiler toolchain
>>>> dependencies are met.
>>>>
>>>> Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
>>>> ---
>>>>    cfi-blacklist.txt |  27 +++++++
>>>>    configure         | 177 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>    2 files changed, 204 insertions(+)
>>>>    create mode 100644 cfi-blacklist.txt
>>>>
>>>> diff --git a/cfi-blacklist.txt b/cfi-blacklist.txt
>>>> new file mode 100644
>>>> index 0000000000..bf804431a5
>>>> --- /dev/null
>>>> +++ b/cfi-blacklist.txt
>>>> @@ -0,0 +1,27 @@
>>>> +# List of functions that should not be compiled with Control-Flow Integrity
>>>> +
>>>> +[cfi-icall]
>>>> +# TCG creates binary blobs at runtime, with the transformed code.
>>>> +# When it's time to execute it, the code is called with an indirect function
>>>> +# call. Since such function did not exist at compile time, the runtime has no
>>>> +# way to verify its signature. Disable CFI checks in the function that calls
>>>> +# the binary blob
>>>> +fun:cpu_tb_exec
>>>> +
>>>> +# TCI (Tiny Compiler Interpreter) is an interpreter for TCG pseudo code.
>>>> +# One possible operation in the pseudo code is a call to binary code.
>>>> +# Therefore, disable CFI checks in the interpreter function
>>>> +fun:tcg_qemu_tb_exec
>>>> +
>>>> +# TCG Plugins Callback Functions. The mechanism rely on opening external
>>>> +# shared libraries at runtime and get pointers to functions in such libraries
>>>> +# Since these pointers are external to the QEMU binary, the runtime cannot
>>>> +# verify their signature. Disable CFI Checks in all the functions that use
>>>> +# such pointers.
>>>> +fun:plugin_vcpu_cb__simple
>>>> +fun:plugin_cb__simple
>>>> +fun:plugin_cb__udata
>>>> +fun:qemu_plugin_tb_trans_cb
>>>> +fun:qemu_plugin_vcpu_syscall
>>>> +fun:qemu_plugin_vcpu_syscall_ret
>>>> +fun:plugin_load
>>>
>>> The need to maintain this list of functions makes me feel very
>>> uneasy.
>>>
>>> How can we have any confidence that this list of functions is
>>> accurate ? How will maintainers ensure that they correctly update
>>> it as they are writing/changing code, and how will they test the
>>> result ?
>>>
>>> It feels like it has the same general maint problem as the original
>>> seccomp code we used, where we were never confident we had added
>>> the right exceptions to let QEMU run without crashing when users
>>> tickled some feature we forgot about.
>>>
>>>
>>> Regards,
>>> Daniel
>>>
>>
>> I agree with you that keeping that list updated is a daunting task. In my
>> opinion, however, it is not as difficult as a seccomp filter, for the
>> following reasons:
>>
>> 1) Seccomp covers everything that runs in your process, including shared
>> libraries that you have no control over. CFI covers only the code in the
>> QEMU binary. So at least we don't have to guess what other libraries used by
>> QEMU will or won't do during execution.
>>
>> 2) With seccomp you have to filter behavior that, while admissible, should
>> not happen in your code. CFI can be seen as a run-time type checking system;
>> if the signature of the function is wrong, that is a coding error... in
>> theory. In practice, there is a corner-case because the type checking
>> doesn't know the signature of code loaded or written at run-time, and that
>> is why you have to use a CFI filter.
> 
> Can you elaborate on this function signature rules here ? Is this referring
> to scenarios where you cast between 2 different function prototypes ?

It is, partially. A CFI check, however, is done at the moment you call 
the function pointer. So if you cast a function pointer to a different 
type (say gboolean (*)(void *) ), but cast it back to the original type 
before you call it, it's going to be fine.

Chances are you are doing this anyway, since you really want to pass 
those parameters to the callback.
> 
> I'm wondering if this applies to the way GLib's main loop source callbacks
> are registered.
> 
> eg the g_source_set_callback method takes a callback with a signature
> of "GSourceFunc" which is
> 
>    gboolean (*)(void *)
> 
> but the way GSources are implemented means that each implementation gets
> to define its own custom callback signature. So for example, in QIOChannel
> we use
> 
>    int (*)(struct QIOChannel *, enum <anonymous>,  void *)
> 
> Thus, we always have an intentional bad function cast when invoking the
> g_source_set_callback method.
> 
> GCC is able to warn about these if we add -Wcast-function-type, but we
> don't do that because these bad casts are intentional.
> 
> eg
> 
> io/channel.c: In function ‘qio_channel_add_watch_full’:
> io/channel.c:315:35: error: cast between incompatible function types from ‘QIOChannelFunc’ {aka ‘int (*)(struct QIOChannel *, enum <anonymous>,  void *)’} to ‘gboolean (*)(void *)’ {aka ‘int (*)(void *)’} [-Werror=cast-function-type]
>    315 |     g_source_set_callback(source, (GSourceFunc)func, user_data, notify);
>        |                                   ^
> io/channel.c: In function ‘qio_channel_wait’:
> io/channel.c:507:27: error: cast between incompatible function types from ‘gboolean (*)(QIOChannel *, GIOCondition,  void *)’ {aka ‘int (*)(struct QIOChannel *, enum <anonymous>,  void *)’} to ‘gboolean (*)(void *)’ {aka ‘int (*)(void *)’} [-Werror=cast-function-type]
>    507 |                           (GSourceFunc)qio_channel_wait_complete,
>        |                           ^
> 
> 
> Regards,
> Daniel
> 
I think GLib's approach is clean and shouldn't be an issue with CFI. We 
are indeed passing the function with a wonky cast, but when the callback 
needs to be called back, that part is handled by a dispatcher function 
that we're writing in QEMU code, such as (io/channel-buffer.c:177)

static gboolean
qio_channel_buffer_source_dispatch(GSource *source,
                                    GSourceFunc callback,
                                    gpointer user_data)
{
     QIOChannelFunc func = (QIOChannelFunc)callback;
     QIOChannelBufferSource *bsource = (QIOChannelBufferSource *)source;

     return (*func)(QIO_CHANNEL(bsource->bioc),
                    ((G_IO_IN | G_IO_OUT) & bsource->condition),
                    user_data);
}

So the callback pointer is cast back to the original type, and CFI 
checks should be happy about it.

Daniele Buono July 2, 2020, 3:43 p.m. UTC | #9

Hey Alex!

I agree, in most cases (possibly all of them), a wrong indirect function 
call will end up with something that is catched by ASan or UBSan.
This way, however, you may catch it earlier and it may make debug easier 
(especially with --enable-cfi-debug which is printing an error with the 
calling and, most times, the called function).

UBSan does have a similar feature, -fsanitize=function, but 
unfortunately it works only on C++ code, and therefore is not good in 
the QEMU case.

And, of course, it will also be used to weed out missing functions in 
the CFI filter.

On 7/2/2020 9:38 AM, Alexander Bulekov wrote:
> Can't wait to try this out!
> 
> On 200702 1459, Paolo Bonzini wrote:
>> On 02/07/20 14:50, Daniele Buono wrote:
>>> I also wonder if this is something that could be put in the fuzzing
>>> environment. It would probably also help in finding coding error in
>>> corner cases quicker.
>>
>> Yes, fuzzing and tests/docker/test-debug should enable CFI.  Also,
>> tests/docker/test-clang should enable LTO.
>>
>> Paolo
> 
> I believe CFI is most-useful as an active defensive measure. I can't
> think of many cases where a fuzzer/test could raise a CFI alert that
> wouldn't also be caught by something like canaries, ASan or UBsan,
> though maybe I'm missing something. Maybe testing/fuzzing with CFI could
> be useful to weed out any errors due to e.g. an incomplete
> cfi-blacklist.txt
> 
> -Alex
>

Daniele Buono July 16, 2020, 9:57 p.m. UTC | #10

On 7/2/2020 5:52 AM, Daniel P. Berrangé wrote:
> The need to maintain this list of functions makes me feel very
> uneasy.
> 
> How can we have any confidence that this list of functions is
> accurate ? How will maintainers ensure that they correctly update
> it as they are writing/changing code, and how will they test the
> result ?

Hi Daniel,

I gave it some thought and studied more of clang's options. It is 
possible to disable cfi on specific functions by using an __attribute__ 
keyword, instead of providing a list in an external file.
In terms of maintaining, this is much better since we are removing the 
need to update the list. I would suggest defining a macro, 
__disable_cfi__, that can be prepended to a function. The macro would 
expand to nothing if cfi is disabled, or to the proper attribute if it 
is enabled. Here's example code snippet

/* Disable CFI checks.
  * The callback function has been loaded from an external library so we 
do not
  * have type information */
__disable_cfi__
void qemu_plugin_vcpu_syscall_ret(CPUState *cpu, int64_t num, int64_t ret)
{
...
}

This would take care of renaming and removal of functions that need cfi.
It would also probably be beneficial to the developers since they can 
see immediately that the function they are working with needs to have 
CFI checks disabled, and why.

If you think this is a better approach, I'll submit v2 with this 
approach instead of the external function list.

For new code, however, the best thing is proper education and testing.
I'll work on a document for docs/devel to detail what it is and how to 
make code cfi-safe.
A good approach should be to test code changes with CFI enabled. CFI is, 
after all, a sanitizer and therefore it makes sense to use it also 
during development, together with ASan, UBSan and the likes. 
Unfortunately, since it works only with clang, I don't think this can 
ever be a hard requirement.

Daniele

Daniele Buono Aug. 10, 2020, 7:01 p.m. UTC | #11

Hi Alex, Paolo
Hitting a small issue while adding support for lto (and therefore cfi) 
to the fuzzer.

The fuzzer requires a modified linker script to place all of the stuff 
that needs to persist across fuzzing runs into a contiguous section of 
memory.

It does that by inserting three new sections after the .data section of 
the binary. Instead of rewriting a full linker script, it's using the 
*INSERT* keyword to add this to the default linker script.

Now, LTO with LLVM requires to use the gold linker, which does not have 
a default linker script and therefore does not support the *INSERT* keyword.
This can be fixed by taking the default script from the bdf linker, 
adding the additional sections and ask gold to use the full script.

However, I don't like the idea of replacing the small, self-contained 
script snippet that is currently used, with a full script (which is 
probably also dependent on the bfd/os version). So I'm thinking of 
adding a check in configure. If gold is the linker, automatically create 
(somehow, still working on it) the full link script by obtaining the 
default bfd script and add the required parts. Would that work for you?

Cheers,
Daniele

On 7/2/2020 11:43 AM, Daniele Buono wrote:
> Hey Alex!
> 
> I agree, in most cases (possibly all of them), a wrong indirect function 
> call will end up with something that is catched by ASan or UBSan.
> This way, however, you may catch it earlier and it may make debug easier 
> (especially with --enable-cfi-debug which is printing an error with the 
> calling and, most times, the called function).
> 
> UBSan does have a similar feature, -fsanitize=function, but 
> unfortunately it works only on C++ code, and therefore is not good in 
> the QEMU case.
> 
> And, of course, it will also be used to weed out missing functions in 
> the CFI filter.
> 
> On 7/2/2020 9:38 AM, Alexander Bulekov wrote:
>> Can't wait to try this out!
>>
>> On 200702 1459, Paolo Bonzini wrote:
>>> On 02/07/20 14:50, Daniele Buono wrote:
>>>> I also wonder if this is something that could be put in the fuzzing
>>>> environment. It would probably also help in finding coding error in
>>>> corner cases quicker.
>>>
>>> Yes, fuzzing and tests/docker/test-debug should enable CFI.  Also,
>>> tests/docker/test-clang should enable LTO.
>>>
>>> Paolo
>>
>> I believe CFI is most-useful as an active defensive measure. I can't
>> think of many cases where a fuzzer/test could raise a CFI alert that
>> wouldn't also be caught by something like canaries, ASan or UBsan,
>> though maybe I'm missing something. Maybe testing/fuzzing with CFI could
>> be useful to weed out any errors due to e.g. an incomplete
>> cfi-blacklist.txt
>>
>> -Alex
>>
>

Paolo Bonzini Aug. 10, 2020, 7:39 p.m. UTC | #12

On 10/08/20 21:01, Daniele Buono wrote:
> So I'm thinking of adding a check in configure. If gold is the linker,
> automatically create (somehow, still working on it) the full link script
> by obtaining the default bfd script and add the required parts. Would
> that work for you?

Maybe even do it unconditionally?

Paolo

Alexander Bulekov Aug. 10, 2020, 9:33 p.m. UTC | #13

On 200810 2139, Paolo Bonzini wrote:
> On 10/08/20 21:01, Daniele Buono wrote:
> > So I'm thinking of adding a check in configure. If gold is the linker,
> > automatically create (somehow, still working on it) the full link script
> > by obtaining the default bfd script and add the required parts. Would
> > that work for you?
> 
> Maybe even do it unconditionally?

I agree.

I can try a respin of my compiler-rt/libFuzzer patches to add a built-in
fork-server to libFuzzer, so we can avoid the linker-script madness
altogether. Don't know how soon I can get to this, but I do think it is
worth another try.

TIL about these differences between ld.bfd and ld.gold.
So the idea is to use something like:
"ld --verbose | grep -n ".*:" | grep -A1 "\s.data\s" | tail -n1"
and insert the existing linker-script before that line?
Thanks
-Alex

> Paolo
>

Daniele Buono Aug. 13, 2020, 2 p.m. UTC | #14

Yes, Something like that, probably with a small python script.

On 8/10/2020 5:33 PM, Alexander Bulekov wrote:
> On 200810 2139, Paolo Bonzini wrote:
>> On 10/08/20 21:01, Daniele Buono wrote:
>>> So I'm thinking of adding a check in configure. If gold is the linker,
>>> automatically create (somehow, still working on it) the full link script
>>> by obtaining the default bfd script and add the required parts. Would
>>> that work for you?
>>
>> Maybe even do it unconditionally?
> 
> I agree.
> 
> I can try a respin of my compiler-rt/libFuzzer patches to add a built-in
> fork-server to libFuzzer, so we can avoid the linker-script madness
> altogether. Don't know how soon I can get to this, but I do think it is
> worth another try.
> 
> TIL about these differences between ld.bfd and ld.gold.
> So the idea is to use something like:
> "ld --verbose | grep -n ".*:" | grep -A1 "\s.data\s" | tail -n1"
> and insert the existing linker-script before that line?
> Thanks
> -Alex
> 
>> Paolo
>>
>

[2/2] configure: add support for Control-Flow Integrity

Commit Message

Comments

Patch