mbox series

[RFC,v2,00/21] KCFI support

Message ID 20220513202159.1550547-1-samitolvanen@google.com (mailing list archive)
Headers show
Series KCFI support | expand

Message

Sami Tolvanen May 13, 2022, 8:21 p.m. UTC
KCFI is a proposed forward-edge control-flow integrity scheme for
Clang, which is more suitable for kernel use than the existing CFI
scheme used by CONFIG_CFI_CLANG. KCFI doesn't require LTO, doesn't
alter function references to point to a jump table, and won't break
function address equality. The latest LLVM patch is here:

  https://reviews.llvm.org/D119296

This RFC series replaces the current arm64 CFI implementation with
KCFI and adds support for x86_64.

KCFI requires assembly functions that are indirectly called from C
code to be annotated with type identifiers. As type information is
only available in C, the compiler emits expected type identifiers into
the symbol table, so they can be referenced from assembly without
having to hardcode type hashes. Patch 7 adds helper macros for
annotating functions, and patches 9 and 17 add annotations.

In case of a type mismatch, KCFI always traps. To support error
handling, the compiler generates a .kcfi_traps section for x86_64,
which contains the locations of each trap, and for arm64, encodes
the necessary register information to the ESR. Patches 10 and 20 add
arch-specific error handlers.

To test this series, you'll need to compile your own Clang toolchain
with the patches linked above. You can also find the complete source
tree here:

  https://github.com/samitolvanen/llvm-project/commits/kcfi-rfc-v2

This series is also available in GitHub:

  https://github.com/samitolvanen/linux/commits/kcfi-rfc-v2

Sami Tolvanen (21):
  efi/libstub: Filter out CC_FLAGS_CFI
  arm64/vdso: Filter out CC_FLAGS_CFI
  kallsyms: Ignore __kcfi_typeid_
  cfi: Remove CONFIG_CFI_CLANG_SHADOW
  cfi: Drop __CFI_ADDRESSABLE
  cfi: Switch to -fsanitize=kcfi
  cfi: Add type helper macros
  psci: Fix the function type for psci_initcall_t
  arm64: Add types to indirect called assembly functions
  arm64: Add CFI error handling
  arm64: Drop unneeded __nocfi attributes
  treewide: Drop function_nocfi
  treewide: Drop WARN_ON_FUNCTION_MISMATCH
  treewide: Drop __cficanonical
  objtool: Don't warn about __cfi_ preambles falling through
  x86/tools/relocs: Ignore __kcfi_typeid_ relocations
  x86: Add types to indirectly called assembly functions
  x86/purgatory: Disable CFI
  x86/vdso: Disable CFI
  x86: Add support for CONFIG_CFI_CLANG
  init: Drop __nocfi from __init

 Makefile                                  |  13 +-
 arch/Kconfig                              |  21 +-
 arch/arm64/crypto/ghash-ce-core.S         |   5 +-
 arch/arm64/crypto/sm3-ce-core.S           |   3 +-
 arch/arm64/include/asm/brk-imm.h          |   6 +
 arch/arm64/include/asm/compiler.h         |  16 -
 arch/arm64/include/asm/ftrace.h           |   2 +-
 arch/arm64/include/asm/mmu_context.h      |   4 +-
 arch/arm64/kernel/acpi_parking_protocol.c |   2 +-
 arch/arm64/kernel/alternative.c           |   2 +-
 arch/arm64/kernel/cpu-reset.S             |   5 +-
 arch/arm64/kernel/cpufeature.c            |   4 +-
 arch/arm64/kernel/ftrace.c                |   2 +-
 arch/arm64/kernel/machine_kexec.c         |   2 +-
 arch/arm64/kernel/psci.c                  |   2 +-
 arch/arm64/kernel/smp_spin_table.c        |   2 +-
 arch/arm64/kernel/traps.c                 |  46 ++-
 arch/arm64/kernel/vdso/Makefile           |   3 +-
 arch/arm64/mm/proc.S                      |   5 +-
 arch/x86/Kconfig                          |   2 +
 arch/x86/crypto/blowfish-x86_64-asm_64.S  |   5 +-
 arch/x86/entry/vdso/Makefile              |   3 +-
 arch/x86/include/asm/linkage.h            |  12 +
 arch/x86/kernel/traps.c                   |  60 +++-
 arch/x86/lib/memcpy_64.S                  |   3 +-
 arch/x86/purgatory/Makefile               |   4 +
 arch/x86/tools/relocs.c                   |   1 +
 drivers/firmware/efi/libstub/Makefile     |   2 +
 drivers/firmware/psci/psci.c              |   6 +-
 drivers/misc/lkdtm/usercopy.c             |   2 +-
 include/asm-generic/bug.h                 |  16 -
 include/asm-generic/vmlinux.lds.h         |  37 +--
 include/linux/cfi.h                       |  65 ++--
 include/linux/cfi_types.h                 |  57 ++++
 include/linux/compiler-clang.h            |   6 +-
 include/linux/compiler.h                  |  16 +-
 include/linux/compiler_types.h            |   4 -
 include/linux/init.h                      |   6 +-
 include/linux/module.h                    |  10 +-
 include/linux/pci.h                       |   4 +-
 kernel/cfi.c                              | 343 ++++------------------
 kernel/kthread.c                          |   3 +-
 kernel/module.c                           |  49 +---
 kernel/workqueue.c                        |   2 +-
 scripts/kallsyms.c                        |   1 +
 scripts/module.lds.S                      |  23 +-
 tools/objtool/check.c                     |   4 +
 47 files changed, 357 insertions(+), 534 deletions(-)
 create mode 100644 include/linux/cfi_types.h

Comments

Sami Tolvanen May 16, 2022, 5:57 p.m. UTC | #1
On Mon, May 16, 2022 at 10:31 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>> Anything Like LLVM cmake Options to be condidered and Set?
>
>
> I activate Clang and LLD ad projects - OK - enough?

Clang and LLD should be sufficient.

>> This series is also available in GitHub:
>>
>>   https://github.com/samitolvanen/linux/commits/kcfi-rfc-v2
>>
>>
>>
>
>> Can you please add a history of what changed?

Oops, I forgot to include that.

Changes in v2:
- Changed the compiler patch to encode arm64 target and type details
in the ESR, and updated the kernel error handling patch accordingly.
- Changed the compiler patch to embed the x86_64 type hash in a valid
instruction to avoid special casing objtool instruction decoding, and
added a __cfi_ symbol for the preamble. Changed the kernel error
handling and manual type annotations to match.
- Dropped the .kcfi_types section as that’s no longer needed by
objtool, and changed the objtool patch to simply ignore the __cfi_
preambles falling through.
- Dropped the .kcfi_traps section on arm64 as it’s no longer needed,
and moved the trap look-up code behind CONFIG_ARCH_USES_CFI_TRAPS,
which is selected only for x86_64.
- Dropped __nocfi attributes from arm64 code where CFI was disabled
due to address space confusion issues, and added type annotations to
relevant assembly functions.
- Dropped __nocfi from __init.

> Nathan has a i915 cfi patch in His personal kernel.org Git.
> Is this relevant to kcfi?

It fixes a type mismatch, so in that sense it's relevant.

> To distinguish between clang-cfi we should use different kbuild variables for kcfi.

The plan is to replace the current CFI implementation. Does reusing
the kbuild variable names cause a problem?

Sami
Sedat Dilek May 17, 2022, 7:33 a.m. UTC | #2
On Mon, May 16, 2022 at 7:57 PM Sami Tolvanen <samitolvanen@google.com> wrote:
>
> On Mon, May 16, 2022 at 10:31 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >> Anything Like LLVM cmake Options to be condidered and Set?
> >
> >
> > I activate Clang and LLD ad projects - OK - enough?
>
> Clang and LLD should be sufficient.
>

Before I give this a try...
Some questions about the LLVM toolchain requirements...

I use tc-build to generate a selfmade LLVM toolchain for X86
(stage1-only + BPF).

$ python3 ./build-llvm.py --no-update --build-type Release -p
clang;lld -t X86;BPF --clang-vendor dileks -B
/home/dileks/src/llvm-toolchain/build -I /opt/llvm-toolchain
--check-targets clang lld --build-stage1-only --install-stage1-only -D
LLVM_PARALLEL_LINK_JOBS=1 --show-build-commands

Link: https://github.com/ClangBuiltLinux/tc-build.git

tc-build uses a GOOD_REVISION for regular kernel-builds, this is
currently 3b2e605e33bd.

$ git describe
llvmorg-15-init-5163-g3b2e605e33bd

$ git show -1 42a879eb2f22
commit 42a879eb2f22af6d1b85983c12fac68b2e9a5e03
Author: Nathan Chancellor <nathan@kernel.org>
Date:   Mon Mar 21 09:19:26 2022 -0700

   build-llvm.py: Update known good revision

   Qualified with https://github.com/nathanchance/llvm-kernel-testing.

   Signed-off-by: Nathan Chancellor <nathan@kernel.org>

diff --git a/build-llvm.py b/build-llvm.py
index c3aade1092ec..54e5d4729a1a 100755
--- a/build-llvm.py
+++ b/build-llvm.py
@@ -16,7 +16,7 @@ import urllib.request as request
from urllib.error import URLError

# This is a known good revision of LLVM for building the kernel
-GOOD_REVISION = '08f70adedb775ce6d41a1f8ad75c4bac225efb5b'
+GOOD_REVISION = '3b2e605e33bd9017ff2eff1493add07822f9d58b'


class Directories:

So, your LLVM git is approx. 5.200 commits further.

$ git log --oneline
tc-build/GOOD_REVISION-20210321..for-15/kcfi-rfc-v2-samitolvanen | wc
-l
5190

Is your tags/kcfi-rfc-v2 the recommended LLVM toolchain for testing kcfi-rfc-v2?

What I want to ask is if your commit well tested for x86 (and arm64)
means build and boot on bare metal?

From Nathan I know if he says I have run kernel-tests - it works.
"Qualified with https://github.com/nathanchance/llvm-kernel-testing."

Just for the records:
You definitely need a pre-LLVM-15 toolchain + KCFI sanitizer patch?
LLVM-14?

> >> This series is also available in GitHub:
> >>
> >>   https://github.com/samitolvanen/linux/commits/kcfi-rfc-v2
> >>
> >>
> >>
> >
> >> Can you please add a history of what changed?
>
> Oops, I forgot to include that.
>

Thanks.
Please fold this ChangeLog into kcfi-rfc-v3.

> Changes in v2:
> - Changed the compiler patch to encode arm64 target and type details
> in the ESR, and updated the kernel error handling patch accordingly.
> - Changed the compiler patch to embed the x86_64 type hash in a valid
> instruction to avoid special casing objtool instruction decoding, and
> added a __cfi_ symbol for the preamble. Changed the kernel error
> handling and manual type annotations to match.
> - Dropped the .kcfi_types section as that’s no longer needed by
> objtool, and changed the objtool patch to simply ignore the __cfi_
> preambles falling through.
> - Dropped the .kcfi_traps section on arm64 as it’s no longer needed,
> and moved the trap look-up code behind CONFIG_ARCH_USES_CFI_TRAPS,
> which is selected only for x86_64.
> - Dropped __nocfi attributes from arm64 code where CFI was disabled
> due to address space confusion issues, and added type annotations to
> relevant assembly functions.
> - Dropped __nocfi from __init.
>
> > Nathan has a i915 cfi patch in His personal kernel.org Git.
> > Is this relevant to kcfi?
>
> It fixes a type mismatch, so in that sense it's relevant.
>

Here the link to patch "drm/i915: Fix CFI violation with show_dynamic_id()":

https://git.kernel.org/pub/scm/linux/kernel/git/nathan/linux.git/commit/?h=submitted/i915-cfi-fix&id=53735be6dc53453fcfbac658e847b54360e73871

> > To distinguish between clang-cfi we should use different kbuild variables for kcfi.
>
> The plan is to replace the current CFI implementation. Does reusing
> the kbuild variable names cause a problem?
>

No hurry.
Afaics someone in the long list of comments to single patches was
thinking of when GCC has "KCFI" support implemented...

You say no need to build your kernel with LTO...
That sounds good.
Currently, I build my kernels with Clang-14 and CONFIG_LTO_CLANG_THIN=y.
Does something speak against using CONFIG_LTO_CLANG_THIN=y with KCFI support?
Build-time?
Disc-usage?

Thanks for answering my questions.

- Sedat -
Peter Zijlstra May 17, 2022, 8:57 a.m. UTC | #3
On Fri, May 13, 2022 at 01:21:38PM -0700, Sami Tolvanen wrote:
> KCFI is a proposed forward-edge control-flow integrity scheme for
> Clang, which is more suitable for kernel use than the existing CFI
> scheme used by CONFIG_CFI_CLANG. KCFI doesn't require LTO, doesn't
> alter function references to point to a jump table, and won't break
> function address equality. The latest LLVM patch is here:
> 
>   https://reviews.llvm.org/D119296
> 
> This RFC series replaces the current arm64 CFI implementation with
> KCFI and adds support for x86_64.

You have some weird behaviour vs weak functions (I so hate those)...

100: 0000000000000980     9 FUNC    LOCAL  DEFAULT    2 __cfi_free_initmem
233: 0000000000000989    35 FUNC    WEAK   DEFAULT    2 free_initmem

With the result that on the final link:

   179: 00000000000009b0     9 FUNC    LOCAL  DEFAULT    1 __cfi_free_initmem
  8689: 00000000000007f0     9 FUNC    LOCAL  DEFAULT   65 __cfi_free_initmem
173283: 00000000000007f9   198 FUNC    GLOBAL DEFAULT   65 free_initmem

This is getting me objtool issues (I'll fix them) but perhaps it's
something you can do something about as well.
Nathan Chancellor May 17, 2022, 6:49 p.m. UTC | #4
On Tue, May 17, 2022 at 09:33:47AM +0200, Sedat Dilek wrote:
> Is your tags/kcfi-rfc-v2 the recommended LLVM toolchain for testing kcfi-rfc-v2?
> 
> What I want to ask is if your commit well tested for x86 (and arm64)
> means build and boot on bare metal?

I have run kCFI (v1, I haven't had time to test v2) on x86_64 hardware,
both AMD and Intel. I have only found two failures so far: the i915
issue that I mention below and a failure in the KVM subsystem, which I
can see by just running QEMU:

[  124.344654] CFI failure at kvm_mmu_notifier_invalidate_range_end+0xca/0x177 [kvm] (target: kvm_null_fn+0x0/0x7 [kvm]; expected type: 0x1e2c5b9c)
[  124.344691] WARNING: CPU: 5 PID: 2767 at kvm_mmu_notifier_invalidate_range_end+0xca/0x177 [kvm]
[  124.344708] Modules linked in: ...
[  124.344737] CPU: 5 PID: 2767 Comm: qemu-system-x86 Tainted: P                  5.18.0-rc6-debug-00033-g1d5284aff7cd #1 8c4966c7fb24f3345076747feebac5fb340a4797
[  124.344738] Hardware name: ASUS System Product Name/PRIME Z590M-PLUS, BIOS 1203 10/27/2021
[  124.344738] RIP: 0010:kvm_mmu_notifier_invalidate_range_end+0xca/0x177 [kvm]
[  124.344756] Code: 48 89 da e8 88 16 86 dd 48 85 c0 89 6c 24 0c 74 30 4d 85 f6 74 36 48 c7 c5 19 c8 b4 c0 4c 8b 34 24 81 7d fa 9c 5b 2c 1e 74 02 <0f> 0b 48 89 c7 4c 89 fe 48 89 da e8 b6 16 86 dd 48 85 c0 75 e2 eb
[  124.344757] RSP: 0018:ffffc033039d78e8 EFLAGS: 00010282
[  124.344758] RAX: ffffa0a141ed9450 RBX: 00007fb897e9efff RCX: ffffa0a143c9d850
[  124.344758] RDX: 00007fb897e9efff RSI: 00007fb897e9e000 RDI: ffffc03304499cf8
[  124.344759] RBP: ffffffffc0b4c819 R08: 0000000000000000 R09: 0000000000000000
[  124.344759] R10: 0000000000000000 R11: ffffffffc0b4c639 R12: ffffc03304499000
[  124.344760] R13: ffffc033044a30a0 R14: ffffc033044a3120 R15: 00007fb897e9e000
[  124.344760] FS:  00007fbca4bc3640(0000) GS:ffffa0a87f540000(0000) knlGS:0000000000000000
[  124.344761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  124.344761] CR2: 0000000000000000 CR3: 0000000119f74006 CR4: 0000000000772ee0
[  124.344762] PKRU: 55555554
[  124.344762] Call Trace:
[  124.344762]  <TASK>
[  124.344763]  __mmu_notifier_invalidate_range_end+0xa1/0xd7
[  124.344764]  wp_page_copy+0x592/0x920
[  124.344766]  __handle_mm_fault+0x820/0x8f0
[  124.344767]  handle_mm_fault+0xe0/0x227
[  124.344768]  __get_user_pages+0x17a/0x430
[  124.344769]  get_user_pages_unlocked+0xd9/0x327
[  124.344770]  hva_to_pfn+0xfa/0x3f7 [kvm 950e483805e01f967a3e331506278d766494a4cc]
[  124.344788]  kvm_faultin_pfn+0xc3/0x2f0 [kvm 950e483805e01f967a3e331506278d766494a4cc]
[  124.344806]  ? fast_page_fault+0x400/0x4c0 [kvm 950e483805e01f967a3e331506278d766494a4cc]
[  124.344823]  direct_page_fault+0x130/0x350 [kvm 950e483805e01f967a3e331506278d766494a4cc]
[  124.344841]  kvm_mmu_page_fault+0x176/0x2f7 [kvm 950e483805e01f967a3e331506278d766494a4cc]
[  124.344858]  vmx_handle_exit+0xe/0x37 [kvm_intel 647f514f18fc365045dc57db46d2bb4930b746eb]
[  124.344863]  vcpu_enter_guest+0xbff/0x1030 [kvm 950e483805e01f967a3e331506278d766494a4cc]
[  124.344881]  vcpu_run+0x65/0x2f0 [kvm 950e483805e01f967a3e331506278d766494a4cc]
[  124.344898]  kvm_arch_vcpu_ioctl_run+0x15f/0x3f7 [kvm 950e483805e01f967a3e331506278d766494a4cc]
[  124.344916]  kvm_vcpu_ioctl+0x547/0x627 [kvm 950e483805e01f967a3e331506278d766494a4cc]
[  124.344933]  ? syscall_exit_to_user_mode+0x24/0x47
[  124.344934]  ? do_syscall_64+0x7a/0x97
[  124.344935]  ? __fget_files+0xa1/0xc0
[  124.344936]  __se_sys_ioctl+0x7c/0xc0
[  124.344937]  do_syscall_64+0x6c/0x97
[  124.344938]  ? do_syscall_64+0x7a/0x97
[  124.344939]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  124.344940] RIP: 0033:0x7fbca75f7b1f
[  124.344940] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[  124.344941] RSP: 002b:00007fbca4bc2550 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  124.344942] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fbca75f7b1f
[  124.344942] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000e
[  124.344943] RBP: 000056351ba661d0 R08: 000056351a74dc48 R09: 0000000000000004
[  124.344943] R10: 004818cf4f129e11 R11: 0000000000000246 R12: 0000000000000000
[  124.344943] R13: 0000000000000000 R14: 00007fbca814f004 R15: 0000000000000000
[  124.344944]  </TASK>
[  124.344945] ---[ end trace 0000000000000000 ]---

I don't do any bare metal testing for the tc-build known good revision
bumps, just QEMU tests with boot-utils.

> Just for the records:
> You definitely need a pre-LLVM-15 toolchain + KCFI sanitizer patch?
> LLVM-14?

The feature won't land in LLVM 14, so there is little point to
backporting and testing it on LLVM 14. This is a work in progress
feature so it has to target the main branch. I have fairly good coverage
of tip of tree locally and we have solid coverage through CI so we'll
know if something major happens, it should be pretty safe to test Sami's
series.

> > > Nathan has a i915 cfi patch in His personal kernel.org Git.
> > > Is this relevant to kcfi?
> >
> > It fixes a type mismatch, so in that sense it's relevant.
> >
> 
> Here the link to patch "drm/i915: Fix CFI violation with show_dynamic_id()":
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/nathan/linux.git/commit/?h=submitted/i915-cfi-fix&id=53735be6dc53453fcfbac658e847b54360e73871

This is now in the i915 tree so that branch is going to disappear:

https://cgit.freedesktop.org/drm/drm-intel/commit/?id=18fb42db05a0b93ab5dd5eab5315e50eaa3ca620

> You say no need to build your kernel with LTO...
> That sounds good.
> Currently, I build my kernels with Clang-14 and CONFIG_LTO_CLANG_THIN=y.
> Does something speak against using CONFIG_LTO_CLANG_THIN=y with KCFI support?
> Build-time?
> Disc-usage?

I wouldn't expect the build time or disk usage to significantly increase
with kCFI. I ran a quick benchmark with Arch Linux's configuration with
ThinLTO then ThinLTO + kCFI on an AMD EPYC 7513:

Benchmark 1: ThinLTO
  Time (abs ≡):        166.036 s               [User: 4687.951 s, System: 1636.767 s]

Benchmark 2: ThinLTO + kCFI
  Time (abs ≡):        168.739 s               [User: 4682.020 s, System: 1638.109 s]

Summary
  'ThinLTO' ran
    1.02 times faster than 'ThinLTO + kCFI'

Cheers,
Nathan
Sami Tolvanen May 17, 2022, 8:25 p.m. UTC | #5
On Tue, May 17, 2022 at 1:58 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, May 13, 2022 at 01:21:38PM -0700, Sami Tolvanen wrote:
> > KCFI is a proposed forward-edge control-flow integrity scheme for
> > Clang, which is more suitable for kernel use than the existing CFI
> > scheme used by CONFIG_CFI_CLANG. KCFI doesn't require LTO, doesn't
> > alter function references to point to a jump table, and won't break
> > function address equality. The latest LLVM patch is here:
> >
> >   https://reviews.llvm.org/D119296
> >
> > This RFC series replaces the current arm64 CFI implementation with
> > KCFI and adds support for x86_64.
>
> You have some weird behaviour vs weak functions (I so hate those)...
>
> 100: 0000000000000980     9 FUNC    LOCAL  DEFAULT    2 __cfi_free_initmem
> 233: 0000000000000989    35 FUNC    WEAK   DEFAULT    2 free_initmem
>
> With the result that on the final link:
>
>    179: 00000000000009b0     9 FUNC    LOCAL  DEFAULT    1 __cfi_free_initmem
>   8689: 00000000000007f0     9 FUNC    LOCAL  DEFAULT   65 __cfi_free_initmem
> 173283: 00000000000007f9   198 FUNC    GLOBAL DEFAULT   65 free_initmem
>
> This is getting me objtool issues (I'll fix them) but perhaps it's
> something you can do something about as well.

Good catch, I'll fix this.

Sami
Sedat Dilek May 19, 2022, 9:01 a.m. UTC | #6
On Tue, May 17, 2022 at 8:49 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Tue, May 17, 2022 at 09:33:47AM +0200, Sedat Dilek wrote:
> > Is your tags/kcfi-rfc-v2 the recommended LLVM toolchain for testing kcfi-rfc-v2?
> >
> > What I want to ask is if your commit well tested for x86 (and arm64)
> > means build and boot on bare metal?
>
> I have run kCFI (v1, I haven't had time to test v2) on x86_64 hardware,
> both AMD and Intel. I have only found two failures so far: the i915
> issue that I mention below and a failure in the KVM subsystem, which I
> can see by just running QEMU:
>

Is there a fix around this issue?

> [  124.344654] CFI failure at kvm_mmu_notifier_invalidate_range_end+0xca/0x177 [kvm] (target: kvm_null_fn+0x0/0x7 [kvm]; expected type: 0x1e2c5b9c)
> [  124.344691] WARNING: CPU: 5 PID: 2767 at kvm_mmu_notifier_invalidate_range_end+0xca/0x177 [kvm]
> [  124.344708] Modules linked in: ...
> [  124.344737] CPU: 5 PID: 2767 Comm: qemu-system-x86 Tainted: P                  5.18.0-rc6-debug-00033-g1d5284aff7cd #1 8c4966c7fb24f3345076747feebac5fb340a4797
> [  124.344738] Hardware name: ASUS System Product Name/PRIME Z590M-PLUS, BIOS 1203 10/27/2021
> [  124.344738] RIP: 0010:kvm_mmu_notifier_invalidate_range_end+0xca/0x177 [kvm]
> [  124.344756] Code: 48 89 da e8 88 16 86 dd 48 85 c0 89 6c 24 0c 74 30 4d 85 f6 74 36 48 c7 c5 19 c8 b4 c0 4c 8b 34 24 81 7d fa 9c 5b 2c 1e 74 02 <0f> 0b 48 89 c7 4c 89 fe 48 89 da e8 b6 16 86 dd 48 85 c0 75 e2 eb
> [  124.344757] RSP: 0018:ffffc033039d78e8 EFLAGS: 00010282
> [  124.344758] RAX: ffffa0a141ed9450 RBX: 00007fb897e9efff RCX: ffffa0a143c9d850
> [  124.344758] RDX: 00007fb897e9efff RSI: 00007fb897e9e000 RDI: ffffc03304499cf8
> [  124.344759] RBP: ffffffffc0b4c819 R08: 0000000000000000 R09: 0000000000000000
> [  124.344759] R10: 0000000000000000 R11: ffffffffc0b4c639 R12: ffffc03304499000
> [  124.344760] R13: ffffc033044a30a0 R14: ffffc033044a3120 R15: 00007fb897e9e000
> [  124.344760] FS:  00007fbca4bc3640(0000) GS:ffffa0a87f540000(0000) knlGS:0000000000000000
> [  124.344761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  124.344761] CR2: 0000000000000000 CR3: 0000000119f74006 CR4: 0000000000772ee0
> [  124.344762] PKRU: 55555554
> [  124.344762] Call Trace:
> [  124.344762]  <TASK>
> [  124.344763]  __mmu_notifier_invalidate_range_end+0xa1/0xd7
> [  124.344764]  wp_page_copy+0x592/0x920
> [  124.344766]  __handle_mm_fault+0x820/0x8f0
> [  124.344767]  handle_mm_fault+0xe0/0x227
> [  124.344768]  __get_user_pages+0x17a/0x430
> [  124.344769]  get_user_pages_unlocked+0xd9/0x327
> [  124.344770]  hva_to_pfn+0xfa/0x3f7 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> [  124.344788]  kvm_faultin_pfn+0xc3/0x2f0 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> [  124.344806]  ? fast_page_fault+0x400/0x4c0 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> [  124.344823]  direct_page_fault+0x130/0x350 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> [  124.344841]  kvm_mmu_page_fault+0x176/0x2f7 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> [  124.344858]  vmx_handle_exit+0xe/0x37 [kvm_intel 647f514f18fc365045dc57db46d2bb4930b746eb]
> [  124.344863]  vcpu_enter_guest+0xbff/0x1030 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> [  124.344881]  vcpu_run+0x65/0x2f0 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> [  124.344898]  kvm_arch_vcpu_ioctl_run+0x15f/0x3f7 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> [  124.344916]  kvm_vcpu_ioctl+0x547/0x627 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> [  124.344933]  ? syscall_exit_to_user_mode+0x24/0x47
> [  124.344934]  ? do_syscall_64+0x7a/0x97
> [  124.344935]  ? __fget_files+0xa1/0xc0
> [  124.344936]  __se_sys_ioctl+0x7c/0xc0
> [  124.344937]  do_syscall_64+0x6c/0x97
> [  124.344938]  ? do_syscall_64+0x7a/0x97
> [  124.344939]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  124.344940] RIP: 0033:0x7fbca75f7b1f
> [  124.344940] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> [  124.344941] RSP: 002b:00007fbca4bc2550 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [  124.344942] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fbca75f7b1f
> [  124.344942] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000e
> [  124.344943] RBP: 000056351ba661d0 R08: 000056351a74dc48 R09: 0000000000000004
> [  124.344943] R10: 004818cf4f129e11 R11: 0000000000000246 R12: 0000000000000000
> [  124.344943] R13: 0000000000000000 R14: 00007fbca814f004 R15: 0000000000000000
> [  124.344944]  </TASK>
> [  124.344945] ---[ end trace 0000000000000000 ]---
>
> I don't do any bare metal testing for the tc-build known good revision
> bumps, just QEMU tests with boot-utils.
>
> > Just for the records:
> > You definitely need a pre-LLVM-15 toolchain + KCFI sanitizer patch?
> > LLVM-14?
>
> The feature won't land in LLVM 14, so there is little point to
> backporting and testing it on LLVM 14. This is a work in progress
> feature so it has to target the main branch. I have fairly good coverage
> of tip of tree locally and we have solid coverage through CI so we'll
> know if something major happens, it should be pretty safe to test Sami's
> series.
>

OK, no backport.

> > > > Nathan has a i915 cfi patch in His personal kernel.org Git.
> > > > Is this relevant to kcfi?
> > >
> > > It fixes a type mismatch, so in that sense it's relevant.
> > >
> >
> > Here the link to patch "drm/i915: Fix CFI violation with show_dynamic_id()":
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/nathan/linux.git/commit/?h=submitted/i915-cfi-fix&id=53735be6dc53453fcfbac658e847b54360e73871
>
> This is now in the i915 tree so that branch is going to disappear:
>
> https://cgit.freedesktop.org/drm/drm-intel/commit/?id=18fb42db05a0b93ab5dd5eab5315e50eaa3ca620
>

Good to know.

> > You say no need to build your kernel with LTO...
> > That sounds good.
> > Currently, I build my kernels with Clang-14 and CONFIG_LTO_CLANG_THIN=y.
> > Does something speak against using CONFIG_LTO_CLANG_THIN=y with KCFI support?
> > Build-time?
> > Disc-usage?
>
> I wouldn't expect the build time or disk usage to significantly increase
> with kCFI. I ran a quick benchmark with Arch Linux's configuration with
> ThinLTO then ThinLTO + kCFI on an AMD EPYC 7513:
>
> Benchmark 1: ThinLTO
>   Time (abs ≡):        166.036 s               [User: 4687.951 s, System: 1636.767 s]
>
> Benchmark 2: ThinLTO + kCFI
>   Time (abs ≡):        168.739 s               [User: 4682.020 s, System: 1638.109 s]
>
> Summary
>   'ThinLTO' ran
>     1.02 times faster than 'ThinLTO + kCFI'
>

Thanks for your feedback.

-Sedat-
Nathan Chancellor May 19, 2022, 8:26 p.m. UTC | #7
On Thu, May 19, 2022 at 11:01:40AM +0200, Sedat Dilek wrote:
> On Tue, May 17, 2022 at 8:49 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > I have run kCFI (v1, I haven't had time to test v2) on x86_64 hardware,
> > both AMD and Intel. I have only found two failures so far: the i915
> > issue that I mention below and a failure in the KVM subsystem, which I
> > can see by just running QEMU:
> >
> 
> Is there a fix around this issue?

No, I mentioned it offhand to Sami in IRC but I never followed up. This
failure appears to be introduced by commit f922bd9bf33b ("KVM: Move MMU
notifier's mmu_lock acquisition into common helper").

Sami, do you want an issue opened around this somewhere?

> > [  124.344654] CFI failure at kvm_mmu_notifier_invalidate_range_end+0xca/0x177 [kvm] (target: kvm_null_fn+0x0/0x7 [kvm]; expected type: 0x1e2c5b9c)
> > [  124.344691] WARNING: CPU: 5 PID: 2767 at kvm_mmu_notifier_invalidate_range_end+0xca/0x177 [kvm]
> > [  124.344708] Modules linked in: ...
> > [  124.344737] CPU: 5 PID: 2767 Comm: qemu-system-x86 Tainted: P                  5.18.0-rc6-debug-00033-g1d5284aff7cd #1 8c4966c7fb24f3345076747feebac5fb340a4797
> > [  124.344738] Hardware name: ASUS System Product Name/PRIME Z590M-PLUS, BIOS 1203 10/27/2021
> > [  124.344738] RIP: 0010:kvm_mmu_notifier_invalidate_range_end+0xca/0x177 [kvm]
> > [  124.344756] Code: 48 89 da e8 88 16 86 dd 48 85 c0 89 6c 24 0c 74 30 4d 85 f6 74 36 48 c7 c5 19 c8 b4 c0 4c 8b 34 24 81 7d fa 9c 5b 2c 1e 74 02 <0f> 0b 48 89 c7 4c 89 fe 48 89 da e8 b6 16 86 dd 48 85 c0 75 e2 eb
> > [  124.344757] RSP: 0018:ffffc033039d78e8 EFLAGS: 00010282
> > [  124.344758] RAX: ffffa0a141ed9450 RBX: 00007fb897e9efff RCX: ffffa0a143c9d850
> > [  124.344758] RDX: 00007fb897e9efff RSI: 00007fb897e9e000 RDI: ffffc03304499cf8
> > [  124.344759] RBP: ffffffffc0b4c819 R08: 0000000000000000 R09: 0000000000000000
> > [  124.344759] R10: 0000000000000000 R11: ffffffffc0b4c639 R12: ffffc03304499000
> > [  124.344760] R13: ffffc033044a30a0 R14: ffffc033044a3120 R15: 00007fb897e9e000
> > [  124.344760] FS:  00007fbca4bc3640(0000) GS:ffffa0a87f540000(0000) knlGS:0000000000000000
> > [  124.344761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  124.344761] CR2: 0000000000000000 CR3: 0000000119f74006 CR4: 0000000000772ee0
> > [  124.344762] PKRU: 55555554
> > [  124.344762] Call Trace:
> > [  124.344762]  <TASK>
> > [  124.344763]  __mmu_notifier_invalidate_range_end+0xa1/0xd7
> > [  124.344764]  wp_page_copy+0x592/0x920
> > [  124.344766]  __handle_mm_fault+0x820/0x8f0
> > [  124.344767]  handle_mm_fault+0xe0/0x227
> > [  124.344768]  __get_user_pages+0x17a/0x430
> > [  124.344769]  get_user_pages_unlocked+0xd9/0x327
> > [  124.344770]  hva_to_pfn+0xfa/0x3f7 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> > [  124.344788]  kvm_faultin_pfn+0xc3/0x2f0 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> > [  124.344806]  ? fast_page_fault+0x400/0x4c0 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> > [  124.344823]  direct_page_fault+0x130/0x350 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> > [  124.344841]  kvm_mmu_page_fault+0x176/0x2f7 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> > [  124.344858]  vmx_handle_exit+0xe/0x37 [kvm_intel 647f514f18fc365045dc57db46d2bb4930b746eb]
> > [  124.344863]  vcpu_enter_guest+0xbff/0x1030 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> > [  124.344881]  vcpu_run+0x65/0x2f0 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> > [  124.344898]  kvm_arch_vcpu_ioctl_run+0x15f/0x3f7 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> > [  124.344916]  kvm_vcpu_ioctl+0x547/0x627 [kvm 950e483805e01f967a3e331506278d766494a4cc]
> > [  124.344933]  ? syscall_exit_to_user_mode+0x24/0x47
> > [  124.344934]  ? do_syscall_64+0x7a/0x97
> > [  124.344935]  ? __fget_files+0xa1/0xc0
> > [  124.344936]  __se_sys_ioctl+0x7c/0xc0
> > [  124.344937]  do_syscall_64+0x6c/0x97
> > [  124.344938]  ? do_syscall_64+0x7a/0x97
> > [  124.344939]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  124.344940] RIP: 0033:0x7fbca75f7b1f
> > [  124.344940] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> > [  124.344941] RSP: 002b:00007fbca4bc2550 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > [  124.344942] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fbca75f7b1f
> > [  124.344942] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000e
> > [  124.344943] RBP: 000056351ba661d0 R08: 000056351a74dc48 R09: 0000000000000004
> > [  124.344943] R10: 004818cf4f129e11 R11: 0000000000000246 R12: 0000000000000000
> > [  124.344943] R13: 0000000000000000 R14: 00007fbca814f004 R15: 0000000000000000
> > [  124.344944]  </TASK>
> > [  124.344945] ---[ end trace 0000000000000000 ]---

Cheers,
Nathan
Sami Tolvanen May 19, 2022, 8:41 p.m. UTC | #8
On Thu, May 19, 2022 at 1:26 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Thu, May 19, 2022 at 11:01:40AM +0200, Sedat Dilek wrote:
> > On Tue, May 17, 2022 at 8:49 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > > I have run kCFI (v1, I haven't had time to test v2) on x86_64 hardware,
> > > both AMD and Intel. I have only found two failures so far: the i915
> > > issue that I mention below and a failure in the KVM subsystem, which I
> > > can see by just running QEMU:
> > >
> >
> > Is there a fix around this issue?
>
> No, I mentioned it offhand to Sami in IRC but I never followed up. This
> failure appears to be introduced by commit f922bd9bf33b ("KVM: Move MMU
> notifier's mmu_lock acquisition into common helper").
>
> Sami, do you want an issue opened around this somewhere?

Yes, please file a bug. The ClangBuiltLinux Github is probably the best place.

Sami