From patchwork Mon Nov 13 02:23:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453568 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4016918647 for ; Mon, 13 Nov 2023 02:32:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="bV4lvDvQ" Received: from smtp-bc0c.mail.infomaniak.ch (smtp-bc0c.mail.infomaniak.ch [IPv6:2001:1600:4:17::bc0c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3DF8A137 for ; Sun, 12 Nov 2023 18:32:28 -0800 (PST) Received: from smtp-3-0000.mail.infomaniak.ch (unknown [10.4.36.107]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCsp5jGkzMpvS5; Mon, 13 Nov 2023 02:23:58 +0000 (UTC) Received: from unknown by smtp-3-0000.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCsn4w8yz3W; Mon, 13 Nov 2023 03:23:57 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842238; bh=tY8hlJfPopuy154H09Cr6nqcw/h0DUTJ1/nUw09u9eU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=bV4lvDvQTCac2IMzSmZNNLVBOtTqAuS2XLzxMp+OAzsv+rke/FG2TXyxhi8G9+60p uVBBHLsVIDwfgY2jYAyUCw2LIYtSt/u33MBaH9fX9Ljjyq03aPdOVNbFjYzptoLzUl mUsHCWJz6tf9YxVBhc8meVQg8fYV2sX1kdkepvvo= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . 
Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 01/19] virt: Introduce Hypervisor Enforced Kernel Integrity (Heki) Date: Sun, 12 Nov 2023 21:23:08 -0500 Message-ID: <20231113022326.24388-2-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha

From: Madhavan T. Venkataraman

Hypervisor Enforced Kernel Integrity (Heki) is a feature that will use the
hypervisor to enhance guest virtual machine security.

Implement minimal code to introduce Heki:

- Define the config variables.

- Define a kernel command line parameter "heki" to turn the feature on or
  off. By default, Heki is on.

- Define heki_early_init() and call it in start_kernel(). Currently, this
  function only prints the value of the "heki" command line parameter.

Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Madhavan T. Venkataraman
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Co-developed-by: Mickaël Salaün
Signed-off-by: Mickaël Salaün
Signed-off-by: Madhavan T. Venkataraman
---

Changes since v1:
* Shrunk this patch to only contain the minimal common parts.
* Moved heki_early_init() to start_kernel().
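
Note: the on/off behaviour of the "heki" parameter described in this commit message can be modelled in user space. The block below is only a minimal sketch under stated assumptions: parse_bool() is a simplified, hypothetical stand-in for the kernel's boolean string helper (it accepts fewer spellings), and heki_parse_config() here returns whether parsing succeeded rather than the value a __setup() handler returns. It shows that the feature defaults to on and that an invalid value leaves the default untouched.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model of the "heki=" switch; on by default. */
static bool heki_enabled = true;

/*
 * Simplified boolean parser (an assumption, not the kernel helper):
 * accepts "1", "y", "on" and "0", "n", "off".
 * Returns 0 on success and -1 on invalid input.
 */
static inline int parse_bool(const char *str, bool *res)
{
	if (!str)
		return -1;
	if (!strcmp(str, "1") || !strcmp(str, "y") || !strcmp(str, "on")) {
		*res = true;
		return 0;
	}
	if (!strcmp(str, "0") || !strcmp(str, "n") || !strcmp(str, "off")) {
		*res = false;
		return 0;
	}
	return -1;
}

/*
 * Returns true if the value was understood. On invalid input the
 * current setting is left unchanged, as in the patch.
 */
static inline bool heki_parse_config(const char *str)
{
	return parse_bool(str, &heki_enabled) == 0;
}
```

In this model, booting with "heki=off" corresponds to heki_parse_config("off"), after which heki_enabled is false; an unrecognised value such as "heki=maybe" is rejected and the previous setting is kept.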
--- Kconfig | 2 ++ arch/x86/Kconfig | 1 + include/linux/heki.h | 31 +++++++++++++++++++++++++++++++ init/main.c | 2 ++ mm/mm_init.c | 1 + virt/Makefile | 1 + virt/heki/Kconfig | 19 +++++++++++++++++++ virt/heki/Makefile | 3 +++ virt/heki/common.h | 16 ++++++++++++++++ virt/heki/main.c | 32 ++++++++++++++++++++++++++++++++ 10 files changed, 108 insertions(+) create mode 100644 include/linux/heki.h create mode 100644 virt/heki/Kconfig create mode 100644 virt/heki/Makefile create mode 100644 virt/heki/common.h create mode 100644 virt/heki/main.c diff --git a/Kconfig b/Kconfig index 745bc773f567..0c844d9bcb03 100644 --- a/Kconfig +++ b/Kconfig @@ -29,4 +29,6 @@ source "lib/Kconfig" source "lib/Kconfig.debug" +source "virt/heki/Kconfig" + source "Documentation/Kconfig" diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 66bfabae8814..424f949442bd 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -35,6 +35,7 @@ config X86_64 select SWIOTLB select ARCH_HAS_ELFCORE_COMPAT select ZONE_DMA32 + select ARCH_SUPPORTS_HEKI config FORCE_DYNAMIC_FTRACE def_bool y diff --git a/include/linux/heki.h b/include/linux/heki.h new file mode 100644 index 000000000000..4c18d2283392 --- /dev/null +++ b/include/linux/heki.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Hypervisor Enforced Kernel Integrity (Heki) - Definitions + * + * Copyright © 2023 Microsoft Corporation + */ + +#ifndef __HEKI_H__ +#define __HEKI_H__ + +#include +#include +#include +#include +#include + +#ifdef CONFIG_HEKI + +extern bool heki_enabled; + +void heki_early_init(void); + +#else /* !CONFIG_HEKI */ + +static inline void heki_early_init(void) +{ +} + +#endif /* CONFIG_HEKI */ + +#endif /* __HEKI_H__ */ diff --git a/init/main.c b/init/main.c index 436d73261810..0d28301c5402 100644 --- a/init/main.c +++ b/init/main.c @@ -99,6 +99,7 @@ #include #include #include +#include #include #include @@ -1047,6 +1048,7 @@ void start_kernel(void) uts_ns_init(); key_init(); security_init(); + 
heki_early_init(); dbg_late_init(); net_ns_init(); vfs_caches_init(); diff --git a/mm/mm_init.c b/mm/mm_init.c index 50f2f34745af..896977383cc3 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -26,6 +26,7 @@ #include #include #include +#include #include "internal.h" #include "slab.h" #include "shuffle.h" diff --git a/virt/Makefile b/virt/Makefile index 1cfea9436af9..4550dc624466 100644 --- a/virt/Makefile +++ b/virt/Makefile @@ -1,2 +1,3 @@ # SPDX-License-Identifier: GPL-2.0-only obj-y += lib/ +obj-$(CONFIG_HEKI) += heki/ diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig new file mode 100644 index 000000000000..49695fff6d21 --- /dev/null +++ b/virt/heki/Kconfig @@ -0,0 +1,19 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Hypervisor Enforced Kernel Integrity (Heki) + +config HEKI + bool "Hypervisor Enforced Kernel Integrity (Heki)" + depends on ARCH_SUPPORTS_HEKI + help + This feature enhances guest virtual machine security by taking + advantage of security features provided by the hypervisor for guests. + This feature is helpful in maintaining guest virtual machine security + even after the guest kernel has been compromised. + +config ARCH_SUPPORTS_HEKI + bool "Architecture support for Heki" + help + An architecture should select this when it can successfully build + and run with CONFIG_HEKI. That is, it should provide all of the + architecture support required for the HEKI feature. 
diff --git a/virt/heki/Makefile b/virt/heki/Makefile new file mode 100644 index 000000000000..354e567df71c --- /dev/null +++ b/virt/heki/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0-only + +obj-y += main.o diff --git a/virt/heki/common.h b/virt/heki/common.h new file mode 100644 index 000000000000..edd98fc650a8 --- /dev/null +++ b/virt/heki/common.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Hypervisor Enforced Kernel Integrity (Heki) - Common header + * + * Copyright © 2023 Microsoft Corporation + */ + +#ifndef _HEKI_COMMON_H + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) "heki-guest: " fmt + +#endif /* _HEKI_COMMON_H */ diff --git a/virt/heki/main.c b/virt/heki/main.c new file mode 100644 index 000000000000..f005dd74d586 --- /dev/null +++ b/virt/heki/main.c @@ -0,0 +1,32 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Hypervisor Enforced Kernel Integrity (Heki) - Common code + * + * Copyright © 2023 Microsoft Corporation + */ + +#include + +#include "common.h" + +bool heki_enabled __ro_after_init = true; + +/* + * Must be called after kmem_cache_init(). 
+ */ +__init void heki_early_init(void) +{ + if (!heki_enabled) { + pr_warn("Heki is not enabled\n"); + return; + } + pr_warn("Heki is enabled\n"); +} + +static int __init heki_parse_config(char *str) +{ + if (strtobool(str, &heki_enabled)) + pr_warn("Invalid option string for heki: '%s'\n", str); + return 1; +} +__setup("heki=", heki_parse_config); From patchwork Mon Nov 13 02:23:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453574 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4AC3D1BDE3 for ; Mon, 13 Nov 2023 02:32:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="X7pEw2w5" Received: from smtp-8faa.mail.infomaniak.ch (smtp-8faa.mail.infomaniak.ch [IPv6:2001:1600:4:17::8faa]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D03010C6 for ; Sun, 12 Nov 2023 18:32:29 -0800 (PST) Received: from smtp-3-0000.mail.infomaniak.ch (unknown [10.4.36.107]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCst2fcPzMpvWb; Mon, 13 Nov 2023 02:24:02 +0000 (UTC) Received: from unknown by smtp-3-0000.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCss1MkLz3X; Mon, 13 Nov 2023 03:24:01 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842242; bh=vhzaT58poP5okVAXmEebxmQ9WRq+VZ+aXO32eUAzlaQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=X7pEw2w51SkJJzAlrXjs8JOCbu2+ngtUoK310kXjccKuEe8o7gDyzceYr3iwHfDCI 6YZB4h6J3uY7B1p5cJbUODXmpzLQyLmqczUCEmD5HyT3UGnHxGOqFyygyEea1ipbc6 rprruE1hn5ZjbMnPK33x06IEzkmtAjdE/mMhNDU4= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . 
Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 02/19] KVM: x86: Add new hypercall to lock control registers Date: Sun, 12 Nov 2023 21:23:09 -0500 Message-ID: <20231113022326.24388-3-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha

This enables guests to lock their CR0 and CR4 registers with a subset of
X86_CR0_WP, X86_CR4_SMEP, X86_CR4_SMAP, X86_CR4_UMIP, X86_CR4_FSGSBASE and
X86_CR4_CET flags.

The new KVM_HC_LOCK_CR_UPDATE hypercall takes three arguments. The first
identifies the control register, the second is a bit mask of flags to pin
(i.e. mark as read-only), and the third is for optional flags.

Linux guests should already pin these register flags themselves, but a
compromised guest kernel could disable that self-protection. Pins set
through this dedicated hypercall are enforced by the hypervisor, so the
guest cannot remove them.

Once the guest has pinned CR flags, any attempt to clear them results in a
general protection fault being injected into the guest.
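
The pinning rule described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct cr_pin_state, cr_pin() and cr_write_allowed() are hypothetical names that do not appear in the patch, and the model covers only the bit arithmetic (pins accumulate, and a write must keep every pinned bit set), not the hypercall argument validation or the fault injection.

```c
#include <stdbool.h>

/* Architectural bit positions; the names mirror the kernel's macros. */
#define X86_CR0_WP   (1UL << 16)
#define X86_CR4_UMIP (1UL << 11)
#define X86_CR4_SMEP (1UL << 20)
#define X86_CR4_SMAP (1UL << 21)

/* Hypothetical per-VM state: the set of pinned bits for one register. */
struct cr_pin_state {
	unsigned long pinned;
};

/* Pins accumulate; nothing in this model ever clears a pinned bit. */
static inline void cr_pin(struct cr_pin_state *s, unsigned long mask)
{
	s->pinned |= mask;
}

/* A write is accepted only if every pinned bit is still set in val. */
static inline bool cr_write_allowed(const struct cr_pin_state *s,
				    unsigned long val)
{
	return (val & s->pinned) == s->pinned;
}
```

For example, after pinning X86_CR4_SMEP | X86_CR4_SMAP, a write that keeps both bits set (and possibly sets others) is accepted, while a write that drops X86_CR4_SMAP is rejected; in the patch, such a rejected write is the case where the guest receives a general protection fault.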
This hypercall may evolve and support new kinds of registers or pinning.
The optional KVM_LOCK_CR_UPDATE_VERSION flag lets guests discover the
supported abilities by mapping the returned version to the related
features.

Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Madhavan T. Venkataraman
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Signed-off-by: Mickaël Salaün
---

Changes since v1:
* Guard KVM_HC_LOCK_CR_UPDATE hypercall with CONFIG_HEKI.
* Move extern cr4_pinned_mask to x86.h (suggested by Kees Cook).
* Move VMX CR checks from vmx_set_cr*() to handle_cr() to make it possible
  to return to user space (see next commit).
* Change heki_check_cr()'s first argument to vcpu.
* Don't use -KVM_EPERM in heki_check_cr().
* Generate a fault when the guest requests a denied CR update.
* Add a flags argument to get the version of this hypercall. Being able to
  do a proper version check was suggested by Wei Liu.
--- Documentation/virt/kvm/x86/hypercalls.rst | 17 +++++ arch/x86/include/uapi/asm/kvm_para.h | 2 + arch/x86/kernel/cpu/common.c | 4 +- arch/x86/kvm/vmx/vmx.c | 5 ++ arch/x86/kvm/x86.c | 84 +++++++++++++++++++++++ arch/x86/kvm/x86.h | 22 ++++++ include/linux/kvm_host.h | 5 ++ include/uapi/linux/kvm_para.h | 1 + 8 files changed, 139 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/x86/hypercalls.rst b/Documentation/virt/kvm/x86/hypercalls.rst index 10db7924720f..3178576f4c47 100644 --- a/Documentation/virt/kvm/x86/hypercalls.rst +++ b/Documentation/virt/kvm/x86/hypercalls.rst @@ -190,3 +190,20 @@ the KVM_CAP_EXIT_HYPERCALL capability. Userspace must enable that capability before advertising KVM_FEATURE_HC_MAP_GPA_RANGE in the guest CPUID. In addition, if the guest supports KVM_FEATURE_MIGRATION_CONTROL, userspace must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL. + +9.
KVM_HC_LOCK_CR_UPDATE +------------------------ + +:Architecture: x86 +:Status: active +:Purpose: Request some control registers to be restricted. + +- a0: identify a control register +- a1: bit mask to make some flags read-only +- a2: optional KVM_LOCK_CR_UPDATE_VERSION flag that will return the version of + this hypercall. Version 1 supports CR0 and CR4 pinning. + +The hypercall lets a guest request control register flags to be pinned for +itself. + +Returns 0 on success or a KVM error code otherwise. diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 6e64b27b2c1e..efc5ccc0060f 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -150,4 +150,6 @@ struct kvm_vcpu_pv_apf_data { #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK #define KVM_PV_EOI_DISABLED 0x0 +#define KVM_LOCK_CR_UPDATE_VERSION (1 << 0) + #endif /* _UAPI_ASM_X86_KVM_PARA_H */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 4e5ffc8b0e46..f18ee7ce0496 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -400,9 +400,11 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c) } /* These bits should not change their value after CPU init is finished. 
*/ -static const unsigned long cr4_pinned_mask = +const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP | X86_CR4_FSGSBASE | X86_CR4_CET; +EXPORT_SYMBOL_GPL(cr4_pinned_mask); + static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning); static unsigned long cr4_pinned_bits __ro_after_init; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6e502ba93141..f487bf16dd96 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5452,6 +5452,11 @@ static int handle_cr(struct kvm_vcpu *vcpu) case 0: /* mov to cr */ val = kvm_register_read(vcpu, reg); trace_kvm_cr_write(cr, val); + + ret = heki_check_cr(vcpu, cr, val); + if (ret) + return ret; + switch (cr) { case 0: err = handle_set_cr0(vcpu, val); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e3eb608b6692..4e6c4c21f12c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8054,11 +8054,86 @@ static unsigned long emulator_get_cr(struct x86_emulate_ctxt *ctxt, int cr) return value; } +#ifdef CONFIG_HEKI + +#define HEKI_ABI_VERSION 1 + +static int heki_lock_cr(struct kvm_vcpu *const vcpu, const unsigned long cr, + unsigned long pin, unsigned long flags) +{ + if (flags) { + if ((flags == KVM_LOCK_CR_UPDATE_VERSION) && !cr && !pin) + return HEKI_ABI_VERSION; + return -KVM_EINVAL; + } + + if (!pin) + return -KVM_EINVAL; + + switch (cr) { + case 0: + /* Cf. arch/x86/kernel/cpu/common.c */ + if (!(pin & X86_CR0_WP)) + return -KVM_EINVAL; + + if ((pin & read_cr0()) != pin) + return -KVM_EINVAL; + + atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr0); + return 0; + case 4: + /* Checks for irrelevant bits. */ + if ((pin & cr4_pinned_mask) != pin) + return -KVM_EINVAL; + + /* Ignores bits not present in host. 
*/ + pin &= __read_cr4(); + atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr4); + return 0; + } + return -KVM_EINVAL; +} + +int heki_check_cr(struct kvm_vcpu *const vcpu, const unsigned long cr, + const unsigned long val) +{ + unsigned long pinned; + + switch (cr) { + case 0: + pinned = atomic_long_read(&vcpu->kvm->heki_pinned_cr0); + if ((val & pinned) != pinned) { + pr_warn_ratelimited( + "heki: Blocked CR0 update: 0x%lx\n", val); + kvm_inject_gp(vcpu, 0); + return 1; + } + return 0; + case 4: + pinned = atomic_long_read(&vcpu->kvm->heki_pinned_cr4); + if ((val & pinned) != pinned) { + pr_warn_ratelimited( + "heki: Blocked CR4 update: 0x%lx\n", val); + kvm_inject_gp(vcpu, 0); + return 1; + } + return 0; + } + return 0; +} +EXPORT_SYMBOL_GPL(heki_check_cr); + +#endif /* CONFIG_HEKI */ + static int emulator_set_cr(struct x86_emulate_ctxt *ctxt, int cr, ulong val) { struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); int res = 0; + res = heki_check_cr(vcpu, cr, val); + if (res) + return res; + switch (cr) { case 0: res = kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val)); @@ -9918,6 +9993,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) vcpu->arch.complete_userspace_io = complete_hypercall_exit; return 0; } +#ifdef CONFIG_HEKI + case KVM_HC_LOCK_CR_UPDATE: + if (a0 > U32_MAX) { + ret = -KVM_EINVAL; + } else { + ret = heki_lock_cr(vcpu, a0, a1, a2); + } + break; +#endif /* CONFIG_HEKI */ default: ret = -KVM_ENOSYS; break; diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 1e7be1f6ab29..193093112b55 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -290,6 +290,26 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk) return !(kvm->arch.disabled_quirks & quirk); } +#ifdef CONFIG_HEKI + +int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr, unsigned long val); + +#else /* CONFIG_HEKI */ + +static inline int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr, + unsigned long val) +{ + return 0; +} + +static inline int 
heki_lock_cr(struct kvm_vcpu *const vcpu, unsigned long cr, + unsigned long pin) +{ + return 0; +} + +#endif /* CONFIG_HEKI */ + void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip); u64 get_kvmclock_ns(struct kvm *kvm); @@ -325,6 +345,8 @@ extern u64 host_xcr0; extern u64 host_xss; extern u64 host_arch_capabilities; +extern const unsigned long cr4_pinned_mask; + extern struct kvm_caps kvm_caps; extern bool enable_pmu; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 687589ce9f63..6864c80ff936 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -835,6 +835,11 @@ struct kvm { bool vm_bugged; bool vm_dead; +#ifdef CONFIG_HEKI + atomic_long_t heki_pinned_cr0; + atomic_long_t heki_pinned_cr4; +#endif /* CONFIG_HEKI */ + #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER struct notifier_block pm_notifier; #endif diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h index 960c7e93d1a9..2ed418704603 100644 --- a/include/uapi/linux/kvm_para.h +++ b/include/uapi/linux/kvm_para.h @@ -30,6 +30,7 @@ #define KVM_HC_SEND_IPI 10 #define KVM_HC_SCHED_YIELD 11 #define KVM_HC_MAP_GPA_RANGE 12 +#define KVM_HC_LOCK_CR_UPDATE 13 /* * hypercalls use architecture specific From patchwork Mon Nov 13 02:23:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453572 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 088F71B278 for ; Mon, 13 Nov 2023 02:32:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="lCCpGzLo" Received: from smtp-8fac.mail.infomaniak.ch (smtp-8fac.mail.infomaniak.ch [IPv6:2001:1600:4:17::8fac]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E27B18C for ; Sun, 12 Nov 2023 18:32:29 -0800 (PST) Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCsy1PsbzMpvSP; Mon, 13 Nov 2023 02:24:06 +0000 (UTC) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCsw4qBmzMpnPj; Mon, 13 Nov 2023 03:24:04 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842246; bh=14gESsZbdf4eGyL7ckyTON/9RCrFW/bTzWgKvh9Y1QM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=lCCpGzLomRxznCOR/eHwVydQZR5isgftfHk7I92gfDtAO6CX5RswyL7i2tobhTzAE apr2Ceo33clgylfofcyYQRC+1ikfzHy4nOQYn3bmJlbss06S6fk2V+z1J7nyoxgujZ p/uhukd50aRDJnnIXpcjdg1Yj4GNzxC2Wp5ETI6s= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . 
Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 03/19] KVM: x86: Add notifications for Heki policy configuration and violation Date: Sun, 12 Nov 2023 21:23:10 -0500 Message-ID: <20231113022326.24388-4-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha

Add an interface for user space to be notified about guests' Heki policy
and related violations.

Extend the KVM_ENABLE_CAP IOCTL with KVM_CAP_HEKI_CONFIGURE and
KVM_CAP_HEKI_DENIAL. Each one takes a bitmask as first argument that can
contain KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4. Checking
either capability returns the bitmask of known Heki exit reasons, for now:
KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4.

If KVM_CAP_HEKI_CONFIGURE is set, KVM exits to user space with
KVM_EXIT_HEKI_CONFIGURE for each KVM_HC_LOCK_CR_UPDATE hypercall that
targets one of the requested control registers. This informs the VMM of
the restrictions the guest imposes on itself.

If KVM_CAP_HEKI_DENIAL is set, KVM exits to user space with
KVM_EXIT_HEKI_DENIAL for each pinned CR violation. This enables the VMM to
react to a policy violation.

Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Madhavan T.
Venkataraman Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Signed-off-by: Mickaël Salaün --- Changes since v1: * New patch. Making user space aware of Heki properties was requested by Sean Christopherson. --- arch/x86/kvm/vmx/vmx.c | 5 +- arch/x86/kvm/x86.c | 114 +++++++++++++++++++++++++++++++++++---- arch/x86/kvm/x86.h | 7 +-- include/linux/kvm_host.h | 2 + include/uapi/linux/kvm.h | 22 ++++++++ 5 files changed, 136 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index f487bf16dd96..b631b1d7ba30 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5444,6 +5444,7 @@ static int handle_cr(struct kvm_vcpu *vcpu) int reg; int err; int ret; + bool exit = false; exit_qualification = vmx_get_exit_qual(vcpu); cr = exit_qualification & 15; @@ -5453,8 +5454,8 @@ static int handle_cr(struct kvm_vcpu *vcpu) val = kvm_register_read(vcpu, reg); trace_kvm_cr_write(cr, val); - ret = heki_check_cr(vcpu, cr, val); - if (ret) + ret = heki_check_cr(vcpu, cr, val, &exit); + if (exit) return ret; switch (cr) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4e6c4c21f12c..43c28a6953bf 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -119,6 +119,10 @@ static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS; #define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE +#define KVM_HEKI_EXIT_REASON_VALID_MASK ( \ + KVM_HEKI_EXIT_REASON_CR0 | \ + KVM_HEKI_EXIT_REASON_CR4) + #define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \ KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK) @@ -4644,6 +4648,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM)) r |= BIT(KVM_X86_SW_PROTECTED_VM); break; + case KVM_CAP_HEKI_CONFIGURE: + case KVM_CAP_HEKI_DENIAL: + r = KVM_HEKI_EXIT_REASON_VALID_MASK; + break; default: break; } @@ -6518,6 +6526,22 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, } 
mutex_unlock(&kvm->lock); break; +#ifdef CONFIG_HEKI + case KVM_CAP_HEKI_CONFIGURE: + r = -EINVAL; + if (cap->args[0] & ~KVM_HEKI_EXIT_REASON_VALID_MASK) + break; + kvm->heki_configure_exit_reason = cap->args[0]; + r = 0; + break; + case KVM_CAP_HEKI_DENIAL: + r = -EINVAL; + if (cap->args[0] & ~KVM_HEKI_EXIT_REASON_VALID_MASK) + break; + kvm->heki_denial_exit_reason = cap->args[0]; + r = 0; + break; +#endif default: r = -EINVAL; break; @@ -8056,11 +8080,60 @@ static unsigned long emulator_get_cr(struct x86_emulate_ctxt *ctxt, int cr) #ifdef CONFIG_HEKI +static int complete_heki_configure_exit(struct kvm_vcpu *const vcpu) +{ + kvm_rax_write(vcpu, 0); + ++vcpu->stat.hypercalls; + return kvm_skip_emulated_instruction(vcpu); +} + +static int complete_heki_denial_exit(struct kvm_vcpu *const vcpu) +{ + kvm_inject_gp(vcpu, 0); + return 1; +} + +/* Returns true if the @exit_reason is handled by @vcpu->kvm. */ +static bool heki_exit_cr(struct kvm_vcpu *const vcpu, const __u32 exit_reason, + const u64 heki_reason, unsigned long value) +{ + switch (exit_reason) { + case KVM_EXIT_HEKI_CONFIGURE: + if (!(vcpu->kvm->heki_configure_exit_reason & heki_reason)) + return false; + + vcpu->run->heki_configure.reason = heki_reason; + memset(vcpu->run->heki_configure.reserved, 0, + sizeof(vcpu->run->heki_configure.reserved)); + vcpu->run->heki_configure.cr_pinned = value; + vcpu->arch.complete_userspace_io = complete_heki_configure_exit; + break; + case KVM_EXIT_HEKI_DENIAL: + if (!(vcpu->kvm->heki_denial_exit_reason & heki_reason)) + return false; + + vcpu->run->heki_denial.reason = heki_reason; + memset(vcpu->run->heki_denial.reserved, 0, + sizeof(vcpu->run->heki_denial.reserved)); + vcpu->run->heki_denial.cr_value = value; + vcpu->arch.complete_userspace_io = complete_heki_denial_exit; + break; + default: + WARN_ON_ONCE(1); + return false; + } + + vcpu->run->exit_reason = exit_reason; + return true; +} + #define HEKI_ABI_VERSION 1 static int heki_lock_cr(struct kvm_vcpu *const vcpu, 
const unsigned long cr, - unsigned long pin, unsigned long flags) + unsigned long pin, unsigned long flags, bool *exit) { + *exit = false; + if (flags) { if ((flags == KVM_LOCK_CR_UPDATE_VERSION) && !cr && !pin) return HEKI_ABI_VERSION; @@ -8080,6 +8153,8 @@ static int heki_lock_cr(struct kvm_vcpu *const vcpu, const unsigned long cr, return -KVM_EINVAL; atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr0); + *exit = heki_exit_cr(vcpu, KVM_EXIT_HEKI_CONFIGURE, + KVM_HEKI_EXIT_REASON_CR0, pin); return 0; case 4: /* Checks for irrelevant bits. */ @@ -8089,24 +8164,37 @@ static int heki_lock_cr(struct kvm_vcpu *const vcpu, const unsigned long cr, /* Ignores bits not present in host. */ pin &= __read_cr4(); atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr4); + *exit = heki_exit_cr(vcpu, KVM_EXIT_HEKI_CONFIGURE, + KVM_HEKI_EXIT_REASON_CR4, pin); return 0; } return -KVM_EINVAL; } +/* + * Sets @exit to true if the caller must exit (i.e. denied access) with the + * returned value: + * - 0 when kvm_run is configured; + * - 1 when there is no user space handler. 
+ */ int heki_check_cr(struct kvm_vcpu *const vcpu, const unsigned long cr, - const unsigned long val) + const unsigned long val, bool *exit) { unsigned long pinned; + *exit = false; + switch (cr) { case 0: pinned = atomic_long_read(&vcpu->kvm->heki_pinned_cr0); if ((val & pinned) != pinned) { pr_warn_ratelimited( "heki: Blocked CR0 update: 0x%lx\n", val); - kvm_inject_gp(vcpu, 0); - return 1; + *exit = true; + if (heki_exit_cr(vcpu, KVM_EXIT_HEKI_DENIAL, + KVM_HEKI_EXIT_REASON_CR0, val)) + return 0; + return complete_heki_denial_exit(vcpu); } return 0; case 4: @@ -8114,8 +8202,11 @@ int heki_check_cr(struct kvm_vcpu *const vcpu, const unsigned long cr, if ((val & pinned) != pinned) { pr_warn_ratelimited( "heki: Blocked CR4 update: 0x%lx\n", val); - kvm_inject_gp(vcpu, 0); - return 1; + *exit = true; + if (heki_exit_cr(vcpu, KVM_EXIT_HEKI_DENIAL, + KVM_HEKI_EXIT_REASON_CR4, val)) + return 0; + return complete_heki_denial_exit(vcpu); } return 0; } @@ -8129,9 +8220,10 @@ static int emulator_set_cr(struct x86_emulate_ctxt *ctxt, int cr, ulong val) { struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); int res = 0; + bool exit = false; - res = heki_check_cr(vcpu, cr, val); - if (res) + res = heki_check_cr(vcpu, cr, val, &exit); + if (exit) return res; switch (cr) { @@ -9998,7 +10090,11 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) if (a0 > U32_MAX) { ret = -KVM_EINVAL; } else { - ret = heki_lock_cr(vcpu, a0, a1, a2); + bool exit = false; + + ret = heki_lock_cr(vcpu, a0, a1, a2, &exit); + if (exit) + return ret; } break; #endif /* CONFIG_HEKI */ diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 193093112b55..f8f5c32bedd9 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -292,18 +292,19 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk) #ifdef CONFIG_HEKI -int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr, unsigned long val); +int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr, unsigned long val, + bool *exit); #else /* 
CONFIG_HEKI */ static inline int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr, - unsigned long val) + unsigned long val, bool *exit) { return 0; } static inline int heki_lock_cr(struct kvm_vcpu *const vcpu, unsigned long cr, - unsigned long pin) + unsigned long pin, bool *exit) { return 0; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6864c80ff936..ec32af17add8 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -838,6 +838,8 @@ struct kvm { #ifdef CONFIG_HEKI atomic_long_t heki_pinned_cr0; atomic_long_t heki_pinned_cr4; + u64 heki_configure_exit_reason; + u64 heki_denial_exit_reason; #endif /* CONFIG_HEKI */ #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5b5820d19e71..2477b4a16126 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -279,6 +279,8 @@ struct kvm_xen_exit { #define KVM_EXIT_RISCV_CSR 36 #define KVM_EXIT_NOTIFY 37 #define KVM_EXIT_MEMORY_FAULT 38 +#define KVM_EXIT_HEKI_CONFIGURE 39 +#define KVM_EXIT_HEKI_DENIAL 40 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -532,6 +534,24 @@ struct kvm_run { __u64 gpa; __u64 size; } memory_fault; + /* KVM_EXIT_HEKI_CONFIGURE */ + struct { +#define KVM_HEKI_EXIT_REASON_CR0 (1ULL << 0) +#define KVM_HEKI_EXIT_REASON_CR4 (1ULL << 1) + __u64 reason; + union { + __u64 cr_pinned; + __u64 reserved[7]; /* ignored */ + }; + } heki_configure; + /* KVM_EXIT_HEKI_DENIAL */ + struct { + __u64 reason; + union { + __u64 cr_value; + __u64 reserved[7]; /* ignored */ + }; + } heki_denial; /* Fix the size of the union. 
*/ char padding[256]; }; @@ -1219,6 +1239,8 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_MEMORY_ATTRIBUTES 232 #define KVM_CAP_GUEST_MEMFD 233 #define KVM_CAP_VM_TYPES 234 +#define KVM_CAP_HEKI_CONFIGURE 235 +#define KVM_CAP_HEKI_DENIAL 236 #ifdef KVM_CAP_IRQ_ROUTING From patchwork Mon Nov 13 02:23:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453571 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 13CC01B295 for ; Mon, 13 Nov 2023 02:32:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="j5ewNw8U" Received: from smtp-8fa8.mail.infomaniak.ch (smtp-8fa8.mail.infomaniak.ch [83.166.143.168]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D30319A3 for ; Sun, 12 Nov 2023 18:32:29 -0800 (PST) Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCt22fpHzMpvYh; Mon, 13 Nov 2023 02:24:10 +0000 (UTC) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCt065wzzMpnPj; Mon, 13 Nov 2023 03:24:08 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842250; bh=esVX+RxDYBu/Sf0VJM0uwsUmshvOSuRinJo44ok6d6I=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=j5ewNw8U+CG4uLmDWsFmdmmPUPds1RzaRFS3iHpbIedD8bQNFxISwi0x84JEiZJ2I NCDOM4+qkBQaGKfao4tSd+VlnabW4CgD40djzXTwDdf8kQnk1T5bqF0vW0NlssX/dI heNYXTA3TuNjmf/r++0u7uvbsMYZxag0QM+XHVLs= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . 
Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 04/19] heki: Lock guest control registers at the end of guest kernel init Date: Sun, 12 Nov 2023 21:23:11 -0500 Message-ID: <20231113022326.24388-5-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha The hypervisor needs to provide some functions to support Heki. These form the Heki-Hypervisor API. Define a heki_hypervisor structure to house the API functions. A hypervisor that supports Heki must instantiate a heki_hypervisor structure and pass it to the Heki common code. This allows the common code to access these functions in a hypervisor-agnostic way. The first function that is implemented is lock_crs() (lock control registers). That is, certain flags in the control registers are pinned so that they can never be changed for the lifetime of the guest. Implement Heki support in the guest: - Each supported hypervisor in x86 implements a set of functions for the guest kernel. 
Add an init_heki() function to that set. This function initializes Heki-related state. Call init_heki() for the detected hypervisor in init_hypervisor_platform().
- Implement init_heki() for the guest.
- Implement kvm_lock_crs() in the guest to lock down control registers. This function uses a KVM hypercall to do the job.
- Instantiate a heki_hypervisor structure that contains a pointer to kvm_lock_crs().
- Pass the heki_hypervisor structure to Heki common code in init_heki().

Implement a heki_late_init() function and call it at the end of kernel init. This function calls lock_crs(). In other words, control registers of a guest are locked down at the end of guest kernel init.

Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Co-developed-by: Madhavan T. Venkataraman
Signed-off-by: Madhavan T. Venkataraman
Signed-off-by: Mickaël Salaün
---
Changes since v1:
* Shrunk the patch to only manage the CR pinning.
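
For readers who want to see the Heki-Hypervisor API shape in isolation, here is a minimal user-space sketch of the pattern described above: a backend fills in a heki_hypervisor structure, hands it to common code, and the late-init step calls lock_crs() through that pointer. The two structures mirror the ones this patch adds to include/linux/heki.h; everything prefixed with demo_ is hypothetical and only stands in for the KVM guest code, which issues real hypercalls.

```c
#include <stdbool.h>

/* Same shape as the patch's struct heki_hypervisor: one callback per API function. */
struct heki_hypervisor {
	int (*lock_crs)(void); /* Lock control registers. */
};

/* Same shape as the patch's struct heki: filled in by the detected hypervisor. */
struct heki {
	struct heki_hypervisor *hypervisor;
};

static struct heki heki;         /* hypervisor stays NULL, e.g. on bare metal */
static bool heki_enabled = true; /* stands in for the "heki" command line switch */

/* Hypothetical backend: counts calls instead of issuing hypercalls. */
static int demo_lock_crs_calls;

static int demo_lock_crs(void)
{
	demo_lock_crs_calls++;
	return 0; /* 0 means success in this sketch */
}

static struct heki_hypervisor demo_hypervisor = {
	.lock_crs = demo_lock_crs,
};

/* Hypothetical counterpart of init_heki(): plug the backend into common code. */
static void demo_init_heki(void)
{
	heki.hypervisor = &demo_hypervisor;
}

/* Simplified late-init step: returns true only if the lock request succeeded. */
static bool demo_late_init(void)
{
	if (!heki_enabled || !heki.hypervisor)
		return false;

	return heki.hypervisor->lock_crs() == 0;
}
```

The common code never names a specific hypervisor here, which is the point of the indirection: another backend would only need to provide its own lock_crs() and register its structure.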
--- arch/x86/include/asm/x86_init.h | 1 + arch/x86/kernel/cpu/hypervisor.c | 1 + arch/x86/kernel/kvm.c | 56 ++++++++++++++++++++++++++++++++ arch/x86/kvm/Kconfig | 1 + include/linux/heki.h | 22 +++++++++++++ init/main.c | 1 + virt/heki/Kconfig | 9 ++++- virt/heki/main.c | 25 ++++++++++++++ 8 files changed, 115 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index 5240d88db52a..ff4dfd2f615e 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -127,6 +127,7 @@ struct x86_hyper_init { bool (*msi_ext_dest_id)(void); void (*init_mem_mapping)(void); void (*init_after_bootmem)(void); + void (*init_heki)(void); }; /** diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c index 553bfbfc3a1b..6085c8129e0c 100644 --- a/arch/x86/kernel/cpu/hypervisor.c +++ b/arch/x86/kernel/cpu/hypervisor.c @@ -106,4 +106,5 @@ void __init init_hypervisor_platform(void) x86_hyper_type = h->type; x86_init.hyper.init_platform(); + x86_init.hyper.init_heki(); } diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index b8ab9ee5896c..8349f4ad3bbd 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -997,6 +998,60 @@ static bool kvm_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs) } #endif +#ifdef CONFIG_HEKI + +extern unsigned long cr4_pinned_mask; + +/* + * TODO: Check SMP policy consistency, e.g. 
with + * this_cpu_read(cpu_tlbstate.cr4) + */ +static int kvm_lock_crs(void) +{ + unsigned long cr4; + int err; + + err = kvm_hypercall3(KVM_HC_LOCK_CR_UPDATE, 0, X86_CR0_WP, 0); + if (err) + return err; + + cr4 = __read_cr4(); + err = kvm_hypercall3(KVM_HC_LOCK_CR_UPDATE, 4, cr4 & cr4_pinned_mask, + 0); + return err; +} + +static struct heki_hypervisor kvm_heki_hypervisor = { + .lock_crs = kvm_lock_crs, +}; + +static void kvm_init_heki(void) +{ + long err; + + if (!kvm_para_available()) { + /* Cannot make KVM hypercalls. */ + return; + } + + err = kvm_hypercall3(KVM_HC_LOCK_CR_UPDATE, 0, 0, + KVM_LOCK_CR_UPDATE_VERSION); + if (err < 1) { + /* Ignores host not supporting at least the first version. */ + return; + } + + heki.hypervisor = &kvm_heki_hypervisor; +} + +#else /* CONFIG_HEKI */ + +static void kvm_init_heki(void) +{ +} + +#endif /* CONFIG_HEKI */ + const __initconst struct hypervisor_x86 x86_hyper_kvm = { .name = "KVM", .detect = kvm_detect, @@ -1005,6 +1060,7 @@ const __initconst struct hypervisor_x86 x86_hyper_kvm = { .init.x2apic_available = kvm_para_available, .init.msi_ext_dest_id = kvm_msi_ext_dest_id, .init.init_platform = kvm_init_platform, + .init.init_heki = kvm_init_heki, #if defined(CONFIG_AMD_MEM_ENCRYPT) .runtime.sev_es_hcall_prepare = kvm_sev_es_hcall_prepare, .runtime.sev_es_hcall_finish = kvm_sev_es_hcall_finish, diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 8452ed0228cb..7a3b52b7e456 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -49,6 +49,7 @@ config KVM select INTERVAL_TREE select HAVE_KVM_PM_NOTIFIER if PM select KVM_GENERIC_HARDWARE_ENABLING + select HYPERVISOR_SUPPORTS_HEKI help Support hosting fully virtualized guest machines using hardware virtualization extensions. 
You will need a fairly recent diff --git a/include/linux/heki.h b/include/linux/heki.h index 4c18d2283392..96ccb17657e5 100644 --- a/include/linux/heki.h +++ b/include/linux/heki.h @@ -9,6 +9,7 @@ #define __HEKI_H__ #include +#include #include #include #include @@ -16,15 +17,36 @@ #ifdef CONFIG_HEKI +/* + * A hypervisor that supports Heki will instantiate this structure to + * provide hypervisor specific functions for Heki. + */ +struct heki_hypervisor { + int (*lock_crs)(void); /* Lock control registers. */ +}; + +/* + * If the active hypervisor supports Heki, it will plug its heki_hypervisor + * pointer into this heki structure. + */ +struct heki { + struct heki_hypervisor *hypervisor; +}; + +extern struct heki heki; extern bool heki_enabled; void heki_early_init(void); +void heki_late_init(void); #else /* !CONFIG_HEKI */ static inline void heki_early_init(void) { } +static inline void heki_late_init(void) +{ +} #endif /* CONFIG_HEKI */ diff --git a/init/main.c b/init/main.c index 0d28301c5402..f1c998bbb370 100644 --- a/init/main.c +++ b/init/main.c @@ -1447,6 +1447,7 @@ static int __ref kernel_init(void *unused) exit_boot_config(); free_initmem(); mark_readonly(); + heki_late_init(); /* * Kernel mappings are now finalized - update the userspace page-table diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig index 49695fff6d21..5ea75b595667 100644 --- a/virt/heki/Kconfig +++ b/virt/heki/Kconfig @@ -4,7 +4,7 @@ config HEKI bool "Hypervisor Enforced Kernel Integrity (Heki)" - depends on ARCH_SUPPORTS_HEKI + depends on ARCH_SUPPORTS_HEKI && HYPERVISOR_SUPPORTS_HEKI help This feature enhances guest virtual machine security by taking advantage of security features provided by the hypervisor for guests. @@ -17,3 +17,10 @@ config ARCH_SUPPORTS_HEKI An architecture should select this when it can successfully build and run with CONFIG_HEKI. That is, it should provide all of the architecture support required for the HEKI feature. 
+ +config HYPERVISOR_SUPPORTS_HEKI + bool "Hypervisor support for Heki" + help + A hypervisor should select this when it can successfully build + and run with CONFIG_HEKI. That is, it should provide all of the + hypervisor support required for the Heki feature. diff --git a/virt/heki/main.c b/virt/heki/main.c index f005dd74d586..ff1937e1c946 100644 --- a/virt/heki/main.c +++ b/virt/heki/main.c @@ -10,6 +10,7 @@ #include "common.h" bool heki_enabled __ro_after_init = true; +struct heki heki; /* * Must be called after kmem_cache_init(). @@ -21,6 +22,30 @@ __init void heki_early_init(void) return; } pr_warn("Heki is enabled\n"); + + if (!heki.hypervisor) { + /* This happens for kernels running on bare metal as well. */ + pr_warn("No support for Heki in the active hypervisor\n"); + return; + } + pr_warn("Heki is supported by the active Hypervisor\n"); +} + +/* + * Must be called after mark_readonly(). + */ +void heki_late_init(void) +{ + struct heki_hypervisor *hypervisor = heki.hypervisor; + + if (!heki_enabled || !heki.hypervisor) + return; + + /* Locks control registers so a compromised guest cannot change them. 
*/ + if (WARN_ON(hypervisor->lock_crs())) + return; + + pr_warn("Control registers locked\n"); } static int __init heki_parse_config(char *str) From patchwork Mon Nov 13 02:23:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453550 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F01B7C153 for ; Mon, 13 Nov 2023 02:31:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="fY1T43fQ" Received: from smtp-8faf.mail.infomaniak.ch (smtp-8faf.mail.infomaniak.ch [IPv6:2001:1600:3:17::8faf]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA18ED42 for ; Sun, 12 Nov 2023 18:31:52 -0800 (PST) Received: from smtp-3-0000.mail.infomaniak.ch (unknown [10.4.36.107]) by smtp-2-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCt645CMzMpyLN; Mon, 13 Nov 2023 02:24:14 +0000 (UTC) Received: from unknown by smtp-3-0000.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCt50b6fz3W; Mon, 13 Nov 2023 03:24:13 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842254; bh=1rCZlaZ8yKqXCX830pyMWcBESjd8hIo3B7pvzokyEX0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fY1T43fQ69r5j3aWJFjq32XSu4yZEZYI2E+LP9K18ORFFEtvxB8OTkhmOFB9B8xuU 0sUkkutgL0KW5fBpSGVrCuj44f5CvbVH1lhuXp8SKoBYGSQSdqzxDgnfSd+jhoS0Th q4yvQzmeGfq3TKnbmej2xh2UjKNCfj2BICLeWDYg= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . 
Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 05/19] KVM: VMX: Add MBEC support Date: Sun, 12 Nov 2023 21:23:12 -0500 Message-ID: <20231113022326.24388-6-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha

This change adds support for VMX_FEATURE_MODE_BASED_EPT_EXEC (named ept_mode_based_exec in /proc/cpuinfo and MBEC elsewhere), which makes it possible to separate EPT execution bits for supervisor vs. user. It changes the semantics of VMX_EPT_EXECUTABLE_MASK from global execution to kernel execution, and uses the VMX_EPT_USER_EXECUTABLE_MASK bit to identify user execution. The main use case is to be able to restrict kernel execution while ignoring user space execution from the hypervisor's point of view. Indeed, user space execution can already be restricted by the guest kernel.

This change enables MBEC but doesn't change the default configuration, which is to allow execution for all guest memory.
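
To make the split between kernel and user execution concrete, here is a small stand-alone sketch of the permission check, loosely modeled on the is_executable_pte() change in this patch. It is an illustration under assumptions, not the kernel code: the DEMO_ names and demo_is_executable() are hypothetical, and the bit positions (bit 2 for supervisor execute, bit 10 for user execute) follow the EPT entry layout that the patch encodes as VMX_EPT_EXECUTABLE_MASK and VMX_EPT_USER_EXECUTABLE_MASK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two EPT execute bits used by the patch. */
#define DEMO_EPT_X      (1ULL << 2)  /* execute; kernel-only when MBEC is on */
#define DEMO_EPT_USER_X (1ULL << 10) /* user execute; ignored when MBEC is off */

/*
 * Returns true if an instruction fetch is allowed by this EPT entry.
 * Without MBEC, bit 2 governs every fetch. With MBEC, bit 2 governs
 * supervisor-mode fetches and bit 10 governs user-mode fetches.
 */
static bool demo_is_executable(uint64_t epte, bool mbec, bool kernel_mode)
{
	uint64_t mask = DEMO_EPT_X;

	if (mbec && !kernel_mode)
		mask = DEMO_EPT_USER_X;

	return (epte & mask) != 0;
}
```

With MBEC enabled, an entry that only sets bit 10 lets guest user space run from the page while the guest kernel cannot, which is the asymmetry the rest of the series relies on.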
The next commit, however, leverages MBEC to restrict kernel memory pages.

MBEC can be configured with the new "mbec" module parameter (backed by the enable_mbec variable), set to true by default. However, MBEC is disabled for L1 and L2 for now.

The MMU tracepoints are updated to reflect the difference between kernel and user space executions, see is_executable_pte().

Replace EPT_VIOLATION_RWX_MASK (3 bits) with 4 dedicated EPT_VIOLATION_READ, EPT_VIOLATION_WRITE, EPT_VIOLATION_KERNEL_INSTR, and EPT_VIOLATION_USER_INSTR bits.

From the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C (System Programming Guide), Part 3:

SECONDARY_EXEC_MODE_BASED_EPT_EXEC (bit 22): If either the "unrestricted guest" VM-execution control or the "mode-based execute control for EPT" VM-execution control is 1, the "enable EPT" VM-execution control must also be 1.

EPT_VIOLATION_KERNEL_INSTR_BIT (bit 5): The logical-AND of bit 2 in the EPT paging-structure entries used to translate the guest-physical address of the access causing the EPT violation. If the "mode-based execute control for EPT" VM-execution control is 0, this indicates whether the guest-physical address was executable. If that control is 1, this indicates whether the guest-physical address was executable for supervisor-mode linear addresses.

EPT_VIOLATION_USER_INSTR_BIT (bit 6): If the "mode-based execute control" VM-execution control is 0, the value of this bit is undefined. If that control is 1, this bit is the logical-AND of bit 10 in the EPT paging-structures entries used to translate the guest-physical address of the access causing the EPT violation. In this case, it indicates whether the guest-physical address was executable for user-mode linear addresses.

PT_USER_EXEC_MASK (bit 10): Execute access for user-mode linear addresses. If the "mode-based execute control for EPT" VM-execution control is 1, indicates whether instruction fetches are allowed from user-mode linear addresses in the 512-GByte region controlled by this entry.
If that control is 0, this bit is ignored. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. Venkataraman Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Signed-off-by: Mickaël Salaün --- Changes since v1: * Import the MMU tracepoint changes from the v1's "Enable guests to lock themselves thanks to MBEC" patch. --- arch/x86/include/asm/vmx.h | 11 +++++++++-- arch/x86/kvm/mmu.h | 3 ++- arch/x86/kvm/mmu/mmu.c | 8 ++++++-- arch/x86/kvm/mmu/mmutrace.h | 11 +++++++---- arch/x86/kvm/mmu/paging_tmpl.h | 16 ++++++++++++++-- arch/x86/kvm/mmu/spte.c | 4 +++- arch/x86/kvm/mmu/spte.h | 15 +++++++++++++-- arch/x86/kvm/vmx/capabilities.h | 7 +++++++ arch/x86/kvm/vmx/nested.c | 7 +++++++ arch/x86/kvm/vmx/vmx.c | 29 ++++++++++++++++++++++++++--- arch/x86/kvm/vmx/vmx.h | 1 + 11 files changed, 95 insertions(+), 17 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 0e73616b82f3..7fd390484b36 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -513,6 +513,7 @@ enum vmcs_field { #define VMX_EPT_IPAT_BIT (1ull << 6) #define VMX_EPT_ACCESS_BIT (1ull << 8) #define VMX_EPT_DIRTY_BIT (1ull << 9) +#define VMX_EPT_USER_EXECUTABLE_MASK (1ull << 10) #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | \ VMX_EPT_WRITABLE_MASK | \ VMX_EPT_EXECUTABLE_MASK) @@ -558,13 +559,19 @@ enum vm_entry_failure_code { #define EPT_VIOLATION_ACC_READ_BIT 0 #define EPT_VIOLATION_ACC_WRITE_BIT 1 #define EPT_VIOLATION_ACC_INSTR_BIT 2 -#define EPT_VIOLATION_RWX_SHIFT 3 +#define EPT_VIOLATION_READ_BIT 3 +#define EPT_VIOLATION_WRITE_BIT 4 +#define EPT_VIOLATION_KERNEL_INSTR_BIT 5 +#define EPT_VIOLATION_USER_INSTR_BIT 6 #define EPT_VIOLATION_GVA_IS_VALID_BIT 7 #define EPT_VIOLATION_GVA_TRANSLATED_BIT 8 #define EPT_VIOLATION_ACC_READ (1 << EPT_VIOLATION_ACC_READ_BIT) #define EPT_VIOLATION_ACC_WRITE (1 << EPT_VIOLATION_ACC_WRITE_BIT) #define 
EPT_VIOLATION_ACC_INSTR (1 << EPT_VIOLATION_ACC_INSTR_BIT) -#define EPT_VIOLATION_RWX_MASK (VMX_EPT_RWX_MASK << EPT_VIOLATION_RWX_SHIFT) +#define EPT_VIOLATION_READ (1 << EPT_VIOLATION_READ_BIT) +#define EPT_VIOLATION_WRITE (1 << EPT_VIOLATION_WRITE_BIT) +#define EPT_VIOLATION_KERNEL_INSTR (1 << EPT_VIOLATION_KERNEL_INSTR_BIT) +#define EPT_VIOLATION_USER_INSTR (1 << EPT_VIOLATION_USER_INSTR_BIT) #define EPT_VIOLATION_GVA_IS_VALID (1 << EPT_VIOLATION_GVA_IS_VALID_BIT) #define EPT_VIOLATION_GVA_TRANSLATED (1 << EPT_VIOLATION_GVA_TRANSLATED_BIT) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 253fb2093d5d..65168d3a4e31 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -24,6 +24,7 @@ extern bool __read_mostly enable_mmio_caching; #define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT) #define PT_PAT_MASK (1ULL << 7) #define PT_GLOBAL_MASK (1ULL << 8) +#define PT_USER_EXEC_MASK (1ULL << 10) #define PT64_NX_SHIFT 63 #define PT64_NX_MASK (1ULL << PT64_NX_SHIFT) @@ -102,7 +103,7 @@ static inline u8 kvm_get_shadow_phys_bits(void) void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask); void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask); -void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only); +void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only, bool has_mbec); void kvm_init_mmu(struct kvm_vcpu *vcpu); void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0, diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index baeba8fc1c38..7e053973125c 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -29,6 +29,9 @@ #include "cpuid.h" #include "spte.h" +/* Required by paging_tmpl.h for enable_mbec */ +#include "../vmx/capabilities.h" + #include #include #include @@ -3410,7 +3413,7 @@ static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte) { if (fault->exec) - return is_executable_pte(spte); + return 
is_executable_pte(spte, !fault->user); if (fault->write) return is_writable_pte(spte); @@ -3852,7 +3855,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) */ pm_mask = PT_PRESENT_MASK | shadow_me_value; if (mmu->root_role.level >= PT64_ROOT_4LEVEL) { - pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK; + pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK | + PT_USER_EXEC_MASK; if (WARN_ON_ONCE(!mmu->pml4_root)) { r = -EIO; diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h index ae86820cef69..cb7df95aec25 100644 --- a/arch/x86/kvm/mmu/mmutrace.h +++ b/arch/x86/kvm/mmu/mmutrace.h @@ -342,7 +342,8 @@ TRACE_EVENT( __field(u8, level) /* These depend on page entry type, so compute them now. */ __field(bool, r) - __field(bool, x) + __field(bool, kx) + __field(bool, ux) __field(signed char, u) ), @@ -352,15 +353,17 @@ TRACE_EVENT( __entry->sptep = virt_to_phys(sptep); __entry->level = level; __entry->r = shadow_present_mask || (__entry->spte & PT_PRESENT_MASK); - __entry->x = is_executable_pte(__entry->spte); + __entry->kx = is_executable_pte(__entry->spte, true); + __entry->ux = is_executable_pte(__entry->spte, false); __entry->u = shadow_user_mask ? !!(__entry->spte & shadow_user_mask) : -1; ), - TP_printk("gfn %llx spte %llx (%s%s%s%s) level %d at %llx", + TP_printk("gfn %llx spte %llx (%s%s%s%s%s) level %d at %llx", __entry->gfn, __entry->spte, __entry->r ? "r" : "-", __entry->spte & PT_WRITABLE_MASK ? "w" : "-", - __entry->x ? "x" : "-", + __entry->kx ? "X" : "-", + __entry->ux ? "x" : "-", __entry->u == -1 ? "" : (__entry->u ? 
"u" : "-"), __entry->level, __entry->sptep ) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index c85255073f67..08f0c8d28245 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -510,8 +510,20 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, * Note, pte_access holds the raw RWX bits from the EPTE, not * ACC_*_MASK flags! */ - vcpu->arch.exit_qualification |= (pte_access & VMX_EPT_RWX_MASK) << - EPT_VIOLATION_RWX_SHIFT; + vcpu->arch.exit_qualification |= + !!(pte_access & VMX_EPT_READABLE_MASK) + << EPT_VIOLATION_READ_BIT; + vcpu->arch.exit_qualification |= + !!(pte_access & VMX_EPT_WRITABLE_MASK) + << EPT_VIOLATION_WRITE_BIT; + vcpu->arch.exit_qualification |= + !!(pte_access & VMX_EPT_EXECUTABLE_MASK) + << EPT_VIOLATION_KERNEL_INSTR_BIT; + if (enable_mbec) { + vcpu->arch.exit_qualification |= + !!(pte_access & VMX_EPT_USER_EXECUTABLE_MASK) + << EPT_VIOLATION_USER_INSTR_BIT; + } } #endif walker->fault.address = addr; diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 4a599130e9c9..386cc1e8aab9 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -422,13 +422,15 @@ void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask) } EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask); -void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only) +void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only, bool has_mbec) { shadow_user_mask = VMX_EPT_READABLE_MASK; shadow_accessed_mask = has_ad_bits ? VMX_EPT_ACCESS_BIT : 0ull; shadow_dirty_mask = has_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull; shadow_nx_mask = 0ull; shadow_x_mask = VMX_EPT_EXECUTABLE_MASK; + if (has_mbec) + shadow_x_mask |= VMX_EPT_USER_EXECUTABLE_MASK; shadow_present_mask = has_exec_only ? 
0ull : VMX_EPT_READABLE_MASK; /* * EPT overrides the host MTRRs, and so KVM must program the desired diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index a129951c9a88..2f402f81ee15 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -3,8 +3,11 @@ #ifndef KVM_X86_MMU_SPTE_H #define KVM_X86_MMU_SPTE_H +#include + #include "mmu.h" #include "mmu_internal.h" +#include "../vmx/vmx.h" /* * A MMU present SPTE is backed by actual memory and may or may not be present @@ -320,9 +323,17 @@ static inline bool is_last_spte(u64 pte, int level) return (level == PG_LEVEL_4K) || is_large_pte(pte); } -static inline bool is_executable_pte(u64 spte) +static inline bool is_executable_pte(u64 spte, bool for_kernel_mode) { - return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask; + u64 x_mask = shadow_x_mask; + + if (enable_mbec) { + if (for_kernel_mode) + x_mask &= ~VMX_EPT_USER_EXECUTABLE_MASK; + else + x_mask &= ~VMX_EPT_EXECUTABLE_MASK; + } + return (spte & (x_mask | shadow_nx_mask)) == x_mask; } static inline kvm_pfn_t spte_to_pfn(u64 pte) diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index 41a4533f9989..70bd38645680 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -13,6 +13,7 @@ extern bool __read_mostly enable_vpid; extern bool __read_mostly flexpriority_enabled; extern bool __read_mostly enable_ept; extern bool __read_mostly enable_unrestricted_guest; +extern bool __read_mostly enable_mbec; extern bool __read_mostly enable_ept_ad_bits; extern bool __read_mostly enable_pml; extern bool __read_mostly enable_ipiv; @@ -255,6 +256,12 @@ static inline bool cpu_has_vmx_xsaves(void) SECONDARY_EXEC_ENABLE_XSAVES; } +static inline bool cpu_has_vmx_mbec(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl & + SECONDARY_EXEC_MODE_BASED_EPT_EXEC; +} + static inline bool cpu_has_vmx_waitpkg(void) { return vmcs_config.cpu_based_2nd_exec_ctrl & diff --git a/arch/x86/kvm/vmx/nested.c 
b/arch/x86/kvm/vmx/nested.c index c5ec0ef51ff7..23a0341729d6 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2324,6 +2324,9 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0 /* VMCS shadowing for L2 is emulated for now */ exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS; + /* MBEC is currently only handled for L0. */ + exec_control &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC; + /* * Preset *DT exiting when emulating UMIP, so that vmx_set_cr4() * will not have to rewrite the controls just for this bit. @@ -6863,6 +6866,10 @@ static void nested_vmx_setup_secondary_ctls(u32 ept_caps, { msrs->secondary_ctls_low = 0; + /* + * Currently, SECONDARY_EXEC_MODE_BASED_EPT_EXEC is only handled for + * L0 and doesn't need to be exposed to L1 nor L2. + */ msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl; msrs->secondary_ctls_high &= SECONDARY_EXEC_DESC | diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b631b1d7ba30..1b1581f578b0 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -94,6 +94,10 @@ bool __read_mostly enable_unrestricted_guest = 1; module_param_named(unrestricted_guest, enable_unrestricted_guest, bool, S_IRUGO); +bool __read_mostly enable_mbec = true; +EXPORT_SYMBOL_GPL(enable_mbec); +module_param_named(mbec, enable_mbec, bool, 0444); + bool __read_mostly enable_ept_ad_bits = 1; module_param_named(eptad, enable_ept_ad_bits, bool, S_IRUGO); @@ -4596,10 +4600,21 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) exec_control &= ~SECONDARY_EXEC_ENABLE_VPID; if (!enable_ept) { exec_control &= ~SECONDARY_EXEC_ENABLE_EPT; + /* + * From Intel's SDM: + * If either the "unrestricted guest" VM-execution control or + * the "mode-based execute control for EPT" VM-execution + * control is 1, the "enable EPT" VM-execution control must + * also be 1. 
+ */ enable_unrestricted_guest = 0; + enable_mbec = false; } if (!enable_unrestricted_guest) exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST; + if (!enable_mbec) + exec_control &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC; + if (kvm_pause_in_guest(vmx->vcpu.kvm)) exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; if (!kvm_vcpu_apicv_active(vcpu)) @@ -5742,7 +5757,7 @@ static int handle_task_switch(struct kvm_vcpu *vcpu) static int handle_ept_violation(struct kvm_vcpu *vcpu) { - unsigned long exit_qualification; + unsigned long exit_qualification, rwx_mask; gpa_t gpa; u64 error_code; @@ -5772,7 +5787,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR) ? PFERR_FETCH_MASK : 0; /* ept page table entry is present? */ - error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK) + rwx_mask = EPT_VIOLATION_READ | EPT_VIOLATION_WRITE | + EPT_VIOLATION_KERNEL_INSTR; + if (enable_mbec) + rwx_mask |= EPT_VIOLATION_USER_INSTR; + error_code |= (exit_qualification & rwx_mask) ? PFERR_PRESENT_MASK : 0; error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) != 0 ? 
@@ -8475,6 +8494,9 @@ static __init int hardware_setup(void) if (!cpu_has_vmx_unrestricted_guest() || !enable_ept) enable_unrestricted_guest = 0; + if (!cpu_has_vmx_mbec() || !enable_ept) + enable_mbec = false; + if (!cpu_has_vmx_flexpriority()) flexpriority_enabled = 0; @@ -8533,7 +8555,8 @@ static __init int hardware_setup(void) if (enable_ept) kvm_mmu_set_ept_masks(enable_ept_ad_bits, - cpu_has_vmx_ept_execute_only()); + cpu_has_vmx_ept_execute_only(), + enable_mbec); /* * Setup shadow_me_value/shadow_me_mask to include MKTME KeyID diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index c2130d2c8e24..882add2412e6 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -572,6 +572,7 @@ static inline u8 vmx_get_rvi(void) SECONDARY_EXEC_ENABLE_VMFUNC | \ SECONDARY_EXEC_BUS_LOCK_DETECTION | \ SECONDARY_EXEC_NOTIFY_VM_EXITING | \ + SECONDARY_EXEC_MODE_BASED_EPT_EXEC | \ SECONDARY_EXEC_ENCLS_EXITING) #define KVM_REQUIRED_VMX_TERTIARY_VM_EXEC_CONTROL 0 From patchwork Mon Nov 13 02:23:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453547 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3D0A5BE74 for ; Mon, 13 Nov 2023 02:31:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="C/i6Z8Sv" Received: from smtp-42af.mail.infomaniak.ch (smtp-42af.mail.infomaniak.ch [IPv6:2001:1600:3:17::42af]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 37D5410C6 for ; Sun, 12 Nov 2023 18:31:53 -0800 (PST) Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-2-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCt95GW0zMq1KB; Mon, 
13 Nov 2023 02:24:17 +0000 (UTC) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCt839JlzMpnPj; Mon, 13 Nov 2023 03:24:16 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842257; bh=PQNeiuyuaag6rY0/0aZvbY8DcQ+uJkpAZ3YhXfGbU/4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=C/i6Z8Svs7dAZPWHowQNO6lgYqfxZGIKRT4Rv+iVe68+nEKFccASjZlbdqodXOe1Q dwVNuK0bdmdi7h9FMblZ5QvHp8E1G+JUWSQCcbOCYk2XfDmL7p6wvj8iHnqyDwYJcQ tPLUIarjGKkAIHFzNY1thLtfdgMO1Vd24CZIiMHI= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 06/19] KVM: x86: Add kvm_x86_ops.fault_gva() Date: Sun, 12 Nov 2023 21:23:13 -0500 Message-ID: <20231113022326.24388-7-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha This function is needed for kvm_mmu_page_fault() to create synthetic page faults. 
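
As a rough illustration of the contract this callback follows on VMX, the sketch below returns the faulting guest virtual address when the exit qualification marks it as valid, and an all-ones sentinel otherwise. It is a user-space approximation under assumptions: demo_fault_gva() and the DEMO_ names are hypothetical, the guest linear address is passed in as a plain argument instead of being read from the VMCS, and bit 7 matches EPT_VIOLATION_GVA_IS_VALID_BIT from earlier in this series.

```c
#include <stdint.h>

/* Hypothetical stand-ins for EPT_VIOLATION_GVA_IS_VALID and the ~0ull sentinel. */
#define DEMO_GVA_IS_VALID (1UL << 7)
#define DEMO_INVALID_GVA  (~0ULL)

/*
 * Returns the guest virtual address tied to a fault, or DEMO_INVALID_GVA
 * when the exit qualification does not report a valid linear address.
 */
static uint64_t demo_fault_gva(unsigned long exit_qualification,
			       uint64_t guest_linear_address)
{
	if (exit_qualification & DEMO_GVA_IS_VALID)
		return guest_linear_address;

	return DEMO_INVALID_GVA;
}
```

A caller building a synthetic page fault would be expected to check for the sentinel before using the address.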
Code originally written by Mihai Donțu and Nicușor Cîțu: https://lore.kernel.org/r/20211006173113.26445-18-alazar@bitdefender.com Renamed fault_gla() to fault_gva() and use the new EPT_VIOLATION_GVA_IS_VALID. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. Venkataraman Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Co-developed-by: Mihai Donțu Signed-off-by: Mihai Donțu Co-developed-by: Nicușor Cîțu Signed-off-by: Nicușor Cîțu Signed-off-by: Mickaël Salaün --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/svm/svm.c | 9 +++++++++ arch/x86/kvm/vmx/vmx.c | 10 ++++++++++ 4 files changed, 22 insertions(+) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index e3054e3e46d5..ba3db679db2b 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -134,6 +134,7 @@ KVM_X86_OP(msr_filter_changed) KVM_X86_OP(complete_emulated_msr) KVM_X86_OP(vcpu_deliver_sipi_vector) KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons); +KVM_X86_OP(fault_gva) #undef KVM_X86_OP #undef KVM_X86_OP_OPTIONAL diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index dff10051e9b6..0415dacd4b28 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1750,6 +1750,8 @@ struct kvm_x86_ops { * Returns vCPU specific APICv inhibit reasons */ unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu); + + u64 (*fault_gva)(struct kvm_vcpu *vcpu); }; struct kvm_x86_nested_ops { diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index beea99c8e8e0..d32517a2cf9c 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4906,6 +4906,13 @@ static int svm_vm_init(struct kvm *kvm) return 0; } +static u64 svm_fault_gva(struct kvm_vcpu *vcpu) +{ + const struct vcpu_svm *svm = to_svm(vcpu); + + return 
svm->vcpu.arch.cr2 ? svm->vcpu.arch.cr2 : ~0ull; +} + static struct kvm_x86_ops svm_x86_ops __initdata = { .name = KBUILD_MODNAME, @@ -5037,6 +5044,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector, .vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons, + + .fault_gva = svm_fault_gva, }; /* diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 1b1581f578b0..a8158bc1dda9 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8233,6 +8233,14 @@ static void vmx_vm_destroy(struct kvm *kvm) free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm)); } +static u64 vmx_fault_gva(struct kvm_vcpu *vcpu) +{ + if (vcpu->arch.exit_qualification & EPT_VIOLATION_GVA_IS_VALID) + return vmcs_readl(GUEST_LINEAR_ADDRESS); + + return ~0ull; +} + static struct kvm_x86_ops vmx_x86_ops __initdata = { .name = KBUILD_MODNAME, @@ -8373,6 +8381,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .complete_emulated_msr = kvm_complete_insn_gp, .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, + + .fault_gva = vmx_fault_gva, }; static unsigned int vmx_handle_intel_pt_intr(void) From patchwork Mon Nov 13 02:23:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453548 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4FA16BE79 for ; Mon, 13 Nov 2023 02:31:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="y2lDmzRk" Received: from smtp-190d.mail.infomaniak.ch (smtp-190d.mail.infomaniak.ch [IPv6:2001:1600:3:17::190d]) by lindbergh.monkeyblade.net (Postfix) 
with ESMTPS id B074B18B for ; Sun, 12 Nov 2023 18:31:52 -0800 (PST) Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-2-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCtF0JdCzMpxDc; Mon, 13 Nov 2023 02:24:21 +0000 (UTC) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCtC4TrjzMpnPd; Mon, 13 Nov 2023 03:24:19 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842260; bh=3ZLY/nS0lwLKi/L6VqWKqVBloAFCSA+eB24SBhCzIXI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=y2lDmzRkLKrzf6YPi3D4QKjQPJhPLdTzsvCbaLH8n86bmNl5EAyUw5Vphk2fc4rJw TLfnFGEyP2jqJAbn8DJnwo1XABN8RimqjRgV1bCMpAPRjX8PzxhdSJt9kDm6zHLvGd /ggMXvnRZc6CA01WJiWQxAi6rqcHrhmU955Fp2TY= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . 
Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 07/19] KVM: x86: Make memory attribute helpers more generic Date: Sun, 12 Nov 2023 21:23:14 -0500 Message-ID: <20231113022326.24388-8-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha To make it useful for other use cases such as Heki, remove the private memory optimizations. I guess we could try to infer the applied attributes to get back these optimizations when it makes sense, but let's keep this simple for now. Main changes: - Replace slots_lock with slots_arch_lock to make it callable from a KVM hypercall. - Move this mutex lock into kvm_vm_ioctl_set_mem_attributes() to make it easier to use with other locks. - Export kvm_vm_set_mem_attributes(). - Remove the kvm_arch_pre_set_memory_attributes() and kvm_arch_post_set_memory_attributes() KVM_MEMORY_ATTRIBUTE_PRIVATE optimizations. Cc: Chao Peng Cc: Kees Cook Cc: Madhavan T. 
Venkataraman Cc: Sean Christopherson Cc: Yu Zhang Signed-off-by: Mickaël Salaün --- Changes since v1: * New patch --- arch/x86/kvm/mmu/mmu.c | 23 ----------------------- include/linux/kvm_host.h | 2 ++ virt/kvm/kvm_main.c | 19 ++++++++++--------- 3 files changed, 12 insertions(+), 32 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7e053973125c..4d378d308762 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -7251,20 +7251,6 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm) bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm, struct kvm_gfn_range *range) { - /* - * Zap SPTEs even if the slot can't be mapped PRIVATE. KVM x86 only - * supports KVM_MEMORY_ATTRIBUTE_PRIVATE, and so it *seems* like KVM - * can simply ignore such slots. But if userspace is making memory - * PRIVATE, then KVM must prevent the guest from accessing the memory - * as shared. And if userspace is making memory SHARED and this point - * is reached, then at least one page within the range was previously - * PRIVATE, i.e. the slot's possible hugepage ranges are changing. - * Zapping SPTEs in this case ensures KVM will reassess whether or not - * a hugepage can be used for affected ranges. - */ - if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm))) - return false; - return kvm_unmap_gfn_range(kvm, range); } @@ -7313,15 +7299,6 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm, lockdep_assert_held_write(&kvm->mmu_lock); lockdep_assert_held(&kvm->slots_lock); - /* - * Calculate which ranges can be mapped with hugepages even if the slot - * can't map memory PRIVATE. KVM mustn't create a SHARED hugepage over - * a range that has PRIVATE GFNs, and conversely converting a range to - * SHARED may now allow hugepages. - */ - if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm))) - return false; - /* * The sequence matters here: upper levels consume the result of lower * level's scanning. 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ec32af17add8..85b8648fd892 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2396,6 +2396,8 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_arch_post_set_memory_attributes(struct kvm *kvm, struct kvm_gfn_range *range); +int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, + unsigned long attributes); static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn) { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 23633984142f..0096ccfbb609 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2552,7 +2552,7 @@ static bool kvm_pre_set_memory_attributes(struct kvm *kvm, } /* Set @attributes for the gfn range [@start, @end). */ -static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, +int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, unsigned long attributes) { struct kvm_mmu_notifier_range pre_set_range = { @@ -2577,11 +2577,11 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, entry = attributes ? xa_mk_value(attributes) : NULL; - mutex_lock(&kvm->slots_lock); + lockdep_assert_held(&kvm->slots_arch_lock); /* Nothing to do if the entire range as the desired attributes. 
*/ if (kvm_range_has_memory_attributes(kvm, start, end, attributes)) - goto out_unlock; + return r; /* * Reserve memory ahead of time to avoid having to deal with failures @@ -2590,7 +2590,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, for (i = start; i < end; i++) { r = xa_reserve(&kvm->mem_attr_array, i, GFP_KERNEL_ACCOUNT); if (r) - goto out_unlock; + return r; } kvm_handle_gfn_range(kvm, &pre_set_range); @@ -2602,15 +2602,13 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, } kvm_handle_gfn_range(kvm, &post_set_range); - -out_unlock: - mutex_unlock(&kvm->slots_lock); - return r; } + static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm, struct kvm_memory_attributes *attrs) { + int r; gfn_t start, end; /* flags is currently not used. */ @@ -2633,7 +2631,10 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm, */ BUILD_BUG_ON(sizeof(attrs->attributes) != sizeof(unsigned long)); - return kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes); + mutex_lock(&kvm->slots_arch_lock); + r = kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes); + mutex_unlock(&kvm->slots_arch_lock); + return r; } #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */ From patchwork Mon Nov 13 02:23:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453570 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 92BEB18B08 for ; Mon, 13 Nov 2023 02:32:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="vrln9/b4" Received: from smtp-42ac.mail.infomaniak.ch (smtp-42ac.mail.infomaniak.ch 
[IPv6:2001:1600:4:17::42ac]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D16C10E6 for ; Sun, 12 Nov 2023 18:32:29 -0800 (PST) Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCtJ0ZdVzMpvZk; Mon, 13 Nov 2023 02:24:24 +0000 (UTC) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCtG6J2GzMpnPd; Mon, 13 Nov 2023 03:24:22 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842263; bh=oVoPkAWjPfqZGemyyPZ53MkaSuuZZCG0BkbPtJbgAiE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=vrln9/b46q/IyvbLiM7gr/pjSy6nNAlaXyCcWrkwCHL76184CKYv4PInqVYTmM3XU usQOOb2erA5xMR73/PzeYEi/AVI+wGVine2Q4AcSnr9lSM8X4meSz+MPB7uM5eZm7x SS5OjADjVFHRajKr3cVBhD/5XC7T3gtprqTK5mks= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . 
Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 08/19] KVM: x86: Extend kvm_vm_set_mem_attributes() with a mask Date: Sun, 12 Nov 2023 21:23:15 -0500 Message-ID: <20231113022326.24388-9-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha Allow updating only a subset of attributes. This is needed so that the XArray can be shared by different use cases without them interfering with each other (see a following commit). Cc: Chao Peng Cc: Kees Cook Cc: Madhavan T. 
Venkataraman Cc: Sean Christopherson Cc: Yu Zhang Signed-off-by: Mickaël Salaün --- Changes since v1: * New patch --- arch/x86/kvm/mmu/mmu.c | 2 +- include/linux/kvm_host.h | 2 +- virt/kvm/kvm_main.c | 27 +++++++++++++++++++-------- 3 files changed, 21 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 4d378d308762..d7010e09440d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -7283,7 +7283,7 @@ static bool hugepage_has_attrs(struct kvm *kvm, struct kvm_memory_slot *slot, for (gfn = start; gfn < end; gfn += KVM_PAGES_PER_HPAGE(level - 1)) { if (hugepage_test_mixed(slot, gfn, level - 1) || - attrs != kvm_get_memory_attributes(kvm, gfn)) + !(attrs & kvm_get_memory_attributes(kvm, gfn))) return false; } return true; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 85b8648fd892..de68390ab0f2 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2397,7 +2397,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm, bool kvm_arch_post_set_memory_attributes(struct kvm *kvm, struct kvm_gfn_range *range); int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, - unsigned long attributes); + unsigned long attributes, unsigned long mask); static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn) { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 0096ccfbb609..e2c178db17d5 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2436,7 +2436,7 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES /* * Returns true if _all_ gfns in the range [@start, @end) have attributes - * matching @attrs. + * matching the @attrs bitmask. 
*/ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, unsigned long attrs) @@ -2459,7 +2459,8 @@ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, entry = xas_next(&xas); } while (xas_retry(&xas, entry)); - if (xas.xa_index != index || xa_to_value(entry) != attrs) { + if (xas.xa_index != index || + (xa_to_value(entry) & attrs) != attrs) { has_attrs = false; break; } @@ -2553,7 +2554,7 @@ static bool kvm_pre_set_memory_attributes(struct kvm *kvm, /* Set @attributes for the gfn range [@start, @end). */ int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, - unsigned long attributes) + unsigned long attributes, unsigned long mask) { struct kvm_mmu_notifier_range pre_set_range = { .start = start, @@ -2572,11 +2573,8 @@ int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, .may_block = true, }; unsigned long i; - void *entry; int r = 0; - entry = attributes ? xa_mk_value(attributes) : NULL; - lockdep_assert_held(&kvm->slots_arch_lock); /* Nothing to do if the entire range as the desired attributes. */ @@ -2596,6 +2594,16 @@ int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, kvm_handle_gfn_range(kvm, &pre_set_range); for (i = start; i < end; i++) { + unsigned long value = 0; + void *entry; + + entry = xa_load(&kvm->mem_attr_array, i); + if (xa_is_value(entry)) + value = xa_to_value(entry) & ~mask; + + value |= attributes & mask; + entry = value ? xa_mk_value(value) : NULL; + r = xa_err(xa_store(&kvm->mem_attr_array, i, entry, GFP_KERNEL_ACCOUNT)); KVM_BUG_ON(r, kvm); @@ -2609,12 +2617,14 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm, struct kvm_memory_attributes *attrs) { int r; + unsigned long attrs_mask; gfn_t start, end; /* flags is currently not used. 
*/ if (attrs->flags) return -EINVAL; - if (attrs->attributes & ~kvm_supported_mem_attributes(kvm)) + attrs_mask = kvm_supported_mem_attributes(kvm); + if (attrs->attributes & ~attrs_mask) return -EINVAL; if (attrs->size == 0 || attrs->address + attrs->size < attrs->address) return -EINVAL; @@ -2632,7 +2642,8 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm, BUILD_BUG_ON(sizeof(attrs->attributes) != sizeof(unsigned long)); mutex_lock(&kvm->slots_arch_lock); - r = kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes); + r = kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes, + attrs_mask); mutex_unlock(&kvm->slots_arch_lock); return r; } From patchwork Mon Nov 13 02:23:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453573 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4FB181BDF6 for ; Mon, 13 Nov 2023 02:32:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="naLeBV79" Received: from smtp-8fa8.mail.infomaniak.ch (smtp-8fa8.mail.infomaniak.ch [83.166.143.168]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB3951BDA for ; Sun, 12 Nov 2023 18:32:29 -0800 (PST) Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCtM4fDdzMpvbm; Mon, 13 Nov 2023 02:24:27 +0000 (UTC) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCtL2gRXzMpnPj; Mon, 13 Nov 2023 03:24:26 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842267; bh=ch9T7FXLfTJgFWbno30mYt9y51RERoKANoOanV/XFpA=; 
h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=naLeBV79e37SdDz+D/dS7uyrmwvygGCWlJc+XJuRQzMv5XfDFnAdIMM6Uid+A4GTN RAhlLTNQ/RTX6yjhoengNgNRl5DnbJYrC4WmjoGlHXKyNa50wVn5zQJnJq4Mtvmw+J D2bNNNOJvWidC6klDm91oW1hugsA6v5+ManBoxf4= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 09/19] KVM: x86: Extend kvm_range_has_memory_attributes() with match_all Date: Sun, 12 Nov 2023 21:23:16 -0500 Message-ID: <20231113022326.24388-10-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha This makes it possible to check whether an attribute is set on any memory page in a range. This will be useful in a following commit to check for KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE. Cc: Chao Peng Cc: Kees Cook Cc: Madhavan T. 
Venkataraman Cc: Sean Christopherson Cc: Yu Zhang Signed-off-by: Mickaël Salaün --- Changes since v1: * New patch --- arch/x86/kvm/mmu/mmu.c | 2 +- include/linux/kvm_host.h | 2 +- virt/kvm/kvm_main.c | 27 ++++++++++++++++++--------- 3 files changed, 20 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index d7010e09440d..2024ff21d036 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -7279,7 +7279,7 @@ static bool hugepage_has_attrs(struct kvm *kvm, struct kvm_memory_slot *slot, const unsigned long end = start + KVM_PAGES_PER_HPAGE(level); if (level == PG_LEVEL_2M) - return kvm_range_has_memory_attributes(kvm, start, end, attrs); + return kvm_range_has_memory_attributes(kvm, start, end, attrs, true); for (gfn = start; gfn < end; gfn += KVM_PAGES_PER_HPAGE(level - 1)) { if (hugepage_test_mixed(slot, gfn, level - 1) || diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index de68390ab0f2..9ecb016a336f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2391,7 +2391,7 @@ static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn } bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, - unsigned long attrs); + unsigned long attrs, bool match_all); bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_arch_post_set_memory_attributes(struct kvm *kvm, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index e2c178db17d5..67dbaaf40c1c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2435,11 +2435,11 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES /* - * Returns true if _all_ gfns in the range [@start, @end) have attributes - * matching the @attrs bitmask. + * According to @match_all, returns true if _all_ (respectively _any_) gfns in + * the range [@start, @end) have attributes matching the @attrs bitmask. 
*/ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, - unsigned long attrs) + unsigned long attrs, bool match_all) { XA_STATE(xas, &kvm->mem_attr_array, start); unsigned long index; @@ -2453,16 +2453,25 @@ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end, goto out; } - has_attrs = true; + has_attrs = match_all; for (index = start; index < end; index++) { do { entry = xas_next(&xas); } while (xas_retry(&xas, entry)); - if (xas.xa_index != index || - (xa_to_value(entry) & attrs) != attrs) { - has_attrs = false; - break; + if (match_all) { + if (xas.xa_index != index || + (xa_to_value(entry) & attrs) != attrs) { + has_attrs = false; + break; + } + } else { + index = xas.xa_index; + if (index < end && + (xa_to_value(entry) & attrs) == attrs) { + has_attrs = true; + break; + } } } @@ -2578,7 +2587,7 @@ int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, lockdep_assert_held(&kvm->slots_arch_lock); /* Nothing to do if the entire range as the desired attributes. 
*/ - if (kvm_range_has_memory_attributes(kvm, start, end, attributes)) + if (kvm_range_has_memory_attributes(kvm, start, end, attributes, true)) return r; /* From patchwork Mon Nov 13 02:23:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453549 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8F3AFC144 for ; Mon, 13 Nov 2023 02:31:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="oy65yEMw" Received: from smtp-8fa9.mail.infomaniak.ch (smtp-8fa9.mail.infomaniak.ch [IPv6:2001:1600:3:17::8fa9]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B9EB818C for ; Sun, 12 Nov 2023 18:31:52 -0800 (PST) Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-2-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCtR28lmzMq1pb; Mon, 13 Nov 2023 02:24:31 +0000 (UTC) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCtP6n2FzMpnPd; Mon, 13 Nov 2023 03:24:29 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842271; bh=HO3RcULwQ802eeIPFZkXgFQnmnXm7kA4hgpvD90H0yM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=oy65yEMwFBZXkzY8/GbCSCY015c7FsKYAOxiyiLLg876EDse7qPjjQ2TawXHbDdpo RqYXUhhMZo8wrcdUoNJcKni3rzSjF0HzYO1MLe/71b+NvJhrkaU1fgwZGfleYIbVsx 7CZiYxZUd35oWxWwZ9c+J67orF3hA2vwNYlS1LAw= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . 
Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions Date: Sun, 12 Nov 2023 21:23:17 -0500 Message-ID: <20231113022326.24388-11-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha Define memory attributes that can be associated with guest physical pages in KVM. To begin with, define permissions as memory attributes (READ, WRITE and EXECUTE), and the IMMUTABLE property. In the future, other attributes could be defined. Use the memory attribute feature to implement the following functions in KVM: - kvm_permissions_set(): Set the permissions for a guest page in the memory attribute XArray. - kvm_permissions_get(): Retrieve the permissions associated with a guest page in the same XArray. These functions will be called in a following commit to associate proper permissions with guest pages instead of RWX for all the pages. 
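For illustration only (this is not part of the patch): a minimal userspace sketch of how the Heki-level permission bits could be translated to the KVM attribute bits and then merged into a stored per-page value under a mask, so that unrelated attributes such as PRIVATE stay untouched. The bit values mirror the definitions in this series; the helper names heki_to_kvm() and merge_attrs() and the HEKI_MASK macro are hypothetical and exist only for this sketch.

```c
/* Bit values copied from this series; everything else is illustrative. */
#define MEM_ATTR_READ                        (1ULL << 0)
#define MEM_ATTR_WRITE                       (1ULL << 1)
#define MEM_ATTR_EXEC                        (1ULL << 2)
#define MEM_ATTR_IMMUTABLE                   (1ULL << 3)

#define KVM_MEMORY_ATTRIBUTE_PRIVATE         (1ULL << 3)
#define KVM_MEMORY_ATTRIBUTE_HEKI_READ       (1ULL << 4)
#define KVM_MEMORY_ATTRIBUTE_HEKI_WRITE      (1ULL << 5)
#define KVM_MEMORY_ATTRIBUTE_HEKI_EXEC       (1ULL << 6)
#define KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE  (1ULL << 7)

/* Hypothetical name: all bits owned by the Heki use case. */
#define HEKI_MASK (KVM_MEMORY_ATTRIBUTE_HEKI_READ |  \
		   KVM_MEMORY_ATTRIBUTE_HEKI_WRITE | \
		   KVM_MEMORY_ATTRIBUTE_HEKI_EXEC |  \
		   KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE)

/* Hypothetical helper: map each Heki permission bit to its KVM attribute. */
static unsigned long long heki_to_kvm(unsigned long long heki_attr)
{
	unsigned long long kvm_attr = 0;

	if (heki_attr & MEM_ATTR_READ)
		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_READ;
	if (heki_attr & MEM_ATTR_WRITE)
		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_WRITE;
	if (heki_attr & MEM_ATTR_EXEC)
		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_EXEC;
	if (heki_attr & MEM_ATTR_IMMUTABLE)
		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE;
	return kvm_attr;
}

/*
 * Hypothetical helper: replace only the bits selected by @mask in the
 * stored value and keep every other bit as it was.
 */
static unsigned long long merge_attrs(unsigned long long stored,
				      unsigned long long attributes,
				      unsigned long long mask)
{
	return (stored & ~mask) | (attributes & mask);
}
```

With these helpers, merging heki_to_kvm(MEM_ATTR_READ) under HEKI_MASK into a page that currently holds PRIVATE and HEKI_WRITE leaves PRIVATE set, clears HEKI_WRITE and sets HEKI_READ. This is the same read-modify-write shape as the masked update in kvm_vm_set_mem_attributes() earlier in this series.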
Add 4 new memory attributes, private to the KVM implementation: - KVM_MEMORY_ATTRIBUTE_HEKI_READ - KVM_MEMORY_ATTRIBUTE_HEKI_WRITE - KVM_MEMORY_ATTRIBUTE_HEKI_EXEC - KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. Venkataraman Cc: Mickaël Salaün Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Co-developed-by: Madhavan T. Venkataraman Signed-off-by: Madhavan T. Venkataraman Signed-off-by: Mickaël Salaün --- Changes since v1: * New patch replacing the deprecated page tracking mechanism. * Add new files: virt/lib/kvm_permissions.c and include/linux/kvm_mem_attr.h * Add new kvm_permissions_get() and kvm_permissions_set() leveraging the to-be-upstream memory attributes for KVM. * Introduce the KVM_MEMORY_ATTRIBUTE_HEKI_* values. --- arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/Makefile | 4 +- include/linux/kvm_mem_attr.h | 32 +++++++++++ include/uapi/linux/kvm.h | 5 ++ virt/heki/Kconfig | 1 + virt/lib/kvm_permissions.c | 104 +++++++++++++++++++++++++++++++++++ 6 files changed, 146 insertions(+), 1 deletion(-) create mode 100644 include/linux/kvm_mem_attr.h create mode 100644 virt/lib/kvm_permissions.c diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 7a3b52b7e456..ea6d73241632 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -50,6 +50,7 @@ config KVM select HAVE_KVM_PM_NOTIFIER if PM select KVM_GENERIC_HARDWARE_ENABLING select HYPERVISOR_SUPPORTS_HEKI + select SPARSEMEM help Support hosting fully virtualized guest machines using hardware virtualization extensions. 
You will need a fairly recent diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 80e3fe184d17..aac51a5d2cae 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -9,10 +9,12 @@ endif include $(srctree)/virt/kvm/Makefile.kvm +VIRT_LIB = ../../../virt/lib + kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \ i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \ hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \ - mmu/spte.o + mmu/spte.o $(VIRT_LIB)/kvm_permissions.o ifdef CONFIG_HYPERV kvm-y += kvm_onhyperv.o diff --git a/include/linux/kvm_mem_attr.h b/include/linux/kvm_mem_attr.h new file mode 100644 index 000000000000..0a755025e553 --- /dev/null +++ b/include/linux/kvm_mem_attr.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * KVM guest page permissions - Definitions. + * + * Copyright © 2023 Microsoft Corporation. + */ +#ifndef __KVM_MEM_ATTR_H__ +#define __KVM_MEM_ATTR_H__ + +#include +#include + +/* clang-format off */ + +#define MEM_ATTR_READ BIT(0) +#define MEM_ATTR_WRITE BIT(1) +#define MEM_ATTR_EXEC BIT(2) +#define MEM_ATTR_IMMUTABLE BIT(3) + +#define MEM_ATTR_PROT ( \ + MEM_ATTR_READ | \ + MEM_ATTR_WRITE | \ + MEM_ATTR_EXEC | \ + MEM_ATTR_IMMUTABLE) + +/* clang-format on */ + +int kvm_permissions_set(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end, + unsigned long heki_attr); +unsigned long kvm_permissions_get(struct kvm *kvm, gfn_t gfn); + +#endif /* __KVM_MEM_ATTR_H__ */ diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 2477b4a16126..2b5b90216565 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -2319,6 +2319,11 @@ struct kvm_memory_attributes { #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) +#define KVM_MEMORY_ATTRIBUTE_HEKI_READ (1ULL << 4) +#define KVM_MEMORY_ATTRIBUTE_HEKI_WRITE (1ULL << 5) +#define KVM_MEMORY_ATTRIBUTE_HEKI_EXEC (1ULL << 6) +#define KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE (1ULL << 7) + #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct 
kvm_create_guest_memfd) struct kvm_create_guest_memfd { diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig index 5ea75b595667..75a784653e31 100644 --- a/virt/heki/Kconfig +++ b/virt/heki/Kconfig @@ -5,6 +5,7 @@ config HEKI bool "Hypervisor Enforced Kernel Integrity (Heki)" depends on ARCH_SUPPORTS_HEKI && HYPERVISOR_SUPPORTS_HEKI + select KVM_GENERIC_MEMORY_ATTRIBUTES help This feature enhances guest virtual machine security by taking advantage of security features provided by the hypervisor for guests. diff --git a/virt/lib/kvm_permissions.c b/virt/lib/kvm_permissions.c new file mode 100644 index 000000000000..9f4e8027d21c --- /dev/null +++ b/virt/lib/kvm_permissions.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * KVM guest page permissions - functions. + * + * Copyright © 2023 Microsoft Corporation. + */ +#include +#include + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) "kvm: heki: " fmt + +/* clang-format off */ + +static unsigned long kvm_default_permissions = + MEM_ATTR_READ | + MEM_ATTR_WRITE | + MEM_ATTR_EXEC; + +static unsigned long kvm_memory_attributes_heki = + KVM_MEMORY_ATTRIBUTE_HEKI_READ | + KVM_MEMORY_ATTRIBUTE_HEKI_WRITE | + KVM_MEMORY_ATTRIBUTE_HEKI_EXEC | + KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE; + +/* clang-format on */ + +static unsigned long heki_attr_to_kvm_attr(unsigned long heki_attr) +{ + unsigned long kvm_attr = 0; + + if (WARN_ON_ONCE((heki_attr | MEM_ATTR_PROT) != MEM_ATTR_PROT)) + return 0; + + if (heki_attr & MEM_ATTR_READ) + kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_READ; + if (heki_attr & MEM_ATTR_WRITE) + kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_WRITE; + if (heki_attr & MEM_ATTR_EXEC) + kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_EXEC; + if (heki_attr & MEM_ATTR_IMMUTABLE) + kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE; + return kvm_attr; +} + +static unsigned long kvm_attr_to_heki_attr(unsigned long kvm_attr) +{ + unsigned long heki_attr = 0; + + if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_READ) + heki_attr |= 
MEM_ATTR_READ; + if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_WRITE) + heki_attr |= MEM_ATTR_WRITE; + if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_EXEC) + heki_attr |= MEM_ATTR_EXEC; + if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE) + heki_attr |= MEM_ATTR_IMMUTABLE; + return heki_attr; +} + +unsigned long kvm_permissions_get(struct kvm *kvm, gfn_t gfn) +{ + unsigned long kvm_attr = 0; + + /* + * Retrieve the permissions for a guest page. If not present (i.e., no + * attribute), then return default permissions (RWX). This means + * setting permissions to 0 resets them to RWX. We might want to + * revisit that in a future version. + */ + kvm_attr = kvm_get_memory_attributes(kvm, gfn); + if (kvm_attr) + return kvm_attr_to_heki_attr(kvm_attr); + else + return kvm_default_permissions; +} +EXPORT_SYMBOL_GPL(kvm_permissions_get); + +int kvm_permissions_set(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end, + unsigned long heki_attr) +{ + if ((heki_attr | MEM_ATTR_PROT) != MEM_ATTR_PROT) + return -EINVAL; + + if (gfn_end <= gfn_start) + return -EINVAL; + + if (kvm_range_has_memory_attributes(kvm, gfn_start, gfn_end, + KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE, + false)) { + pr_warn_ratelimited( + "Guest tried to change immutable permission for GFNs %llx-%llx\n", + gfn_start, gfn_end); + return -EPERM; + } + + return kvm_vm_set_mem_attributes(kvm, gfn_start, gfn_end, + heki_attr_to_kvm_attr(heki_attr), + kvm_memory_attributes_heki); +} +EXPORT_SYMBOL_GPL(kvm_permissions_set); From patchwork Mon Nov 13 02:23:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453546 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 425D88C11 for ; Mon, 13 Nov 2023 02:31:55 +0000 (UTC) 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="Wqdj2+CU" X-Greylist: delayed 426 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Sun, 12 Nov 2023 18:31:51 PST Received: from smtp-1909.mail.infomaniak.ch (smtp-1909.mail.infomaniak.ch [185.125.25.9]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E2A2310C for ; Sun, 12 Nov 2023 18:31:51 -0800 (PST) Received: from smtp-3-0000.mail.infomaniak.ch (unknown [10.4.36.107]) by smtp-2-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCtW49lkzMq2Gn; Mon, 13 Nov 2023 02:24:35 +0000 (UTC) Received: from unknown by smtp-3-0000.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCtV157dz3W; Mon, 13 Nov 2023 03:24:34 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842275; bh=o9nqKbiugyVsavDfBV6aqCgZeoXT/7P+XJT3zGIQ/Fs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Wqdj2+CUV44d4HuZMwuSPZXcGHSanzUo3Ulx4D8YehMH9U5vcbFl7jLW7UcFqNia4 y2LVudO16hFYsUqLi3Nu+P7n0Bmi1mQVJg9RF5u2Z1azQ1wphiWgO5ISohde0JzGD5 Oin1vdzYOHYknvXN+6DvT1FtaqYxe8GF44LgGvKQ= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . 
Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 11/19] KVM: x86: Add new hypercall to set EPT permissions Date: Sun, 12 Nov 2023 21:23:18 -0500 Message-ID: <20231113022326.24388-12-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha
From: Madhavan T. Venkataraman

Add a new KVM_HC_PROTECT_MEMORY hypercall that enables a guest to set EPT permissions for guest pages.

Until now, all guest pages (except page-tracked pages) have been given RWX permissions in the EPT. In Heki, we want to restrict the permissions to what is strictly needed. For instance, a text page only needs R_X. A read-only data page only needs R__. A normal data page only needs RW_.

The guest will pass a page list to the hypercall. The page list is a list of one or more physical pages, each of which contains an array of guest ranges and attributes. Currently, the attributes only contain permissions. In the future, other attributes may be added.

The hypervisor will apply the specified permissions in the EPT.

When a guest tries to access its memory in a way that is not allowed, KVM creates a synthetic kernel page fault. This fault should be handled by the guest, which is not currently the case, so the guest retries the access again and again. Handling this fault will be part of a follow-up patch series.
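
The permission combinations named above (R_X, R__, RW_) can be sketched in userspace C as follows. The MEM_ATTR_* values are copied from include/linux/kvm_mem_attr.h in this series; the SKETCH_* macros and sketch_* helpers are hypothetical names used only for illustration, and the validity test is assumed to mirror the check done by the hypercall handler (at least one bit set, no bits outside MEM_ATTR_PROT).

```c
#include <assert.h>
#include <stdbool.h>

/* Permission bits, copied from include/linux/kvm_mem_attr.h in this series. */
#define MEM_ATTR_READ		(1UL << 0)
#define MEM_ATTR_WRITE		(1UL << 1)
#define MEM_ATTR_EXEC		(1UL << 2)
#define MEM_ATTR_IMMUTABLE	(1UL << 3)
#define MEM_ATTR_PROT		(MEM_ATTR_READ | MEM_ATTR_WRITE | \
				 MEM_ATTR_EXEC | MEM_ATTR_IMMUTABLE)

/* Hypothetical names (SKETCH_*, sketch_*): not part of the series. */
#define SKETCH_PERM_TEXT	(MEM_ATTR_READ | MEM_ATTR_EXEC)		/* R_X */
#define SKETCH_PERM_RODATA	(MEM_ATTR_READ)				/* R__ */
#define SKETCH_PERM_DATA	(MEM_ATTR_READ | MEM_ATTR_WRITE)	/* RW_ */

static_assert(SKETCH_PERM_TEXT == 0x5, "text is read + exec");

/* Accept a request only if it has at least one bit and no unknown bits. */
static bool sketch_perm_valid(unsigned long perm)
{
	return perm != 0 && (perm & ~MEM_ATTR_PROT) == 0;
}

/* Format the RWX bits in the "r_x" style used by the host log message. */
static void sketch_perm_str(unsigned long perm, char out[4])
{
	out[0] = (perm & MEM_ATTR_READ) ? 'r' : '_';
	out[1] = (perm & MEM_ATTR_WRITE) ? 'w' : '_';
	out[2] = (perm & MEM_ATTR_EXEC) ? 'x' : '_';
	out[3] = '\0';
}
```

Each permission is a separate bit, so a text range has to pass both MEM_ATTR_READ and MEM_ATTR_EXEC explicitly; MEM_ATTR_IMMUTABLE can be OR-ed into any of the three combinations.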
When enabled, KASAN reveals a bug in the memory attributes patches. We have not found the source of this issue yet.

Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Co-developed-by: Mickaël Salaün
Signed-off-by: Mickaël Salaün
Signed-off-by: Madhavan T. Venkataraman
---

Changes since v1:

The original hypercall contained support for statically defined sections (text, rodata, etc.). It has been redesigned like this:

- The previous version accepted an array of physically contiguous ranges. This is appropriate for statically defined sections, which are loaded in contiguous memory. For other cases, such as module loading, the pages may be discontiguous. The current version of the hypercall accepts a page list to fix this.
- The previous version passed permission combinations. E.g., HEKI_MEM_ATTR_EXEC would imply R_X. The current version passes permissions as memory attributes, and each permission must be specified separately. E.g., for text, (MEM_ATTR_READ | MEM_ATTR_EXEC) must be passed.
- The previous version locked down the permissions for guest pages so that once the permissions were set, they could not be changed. In this version, permissions can be changed dynamically, except when MEM_ATTR_IMMUTABLE is set. So, the hypercall has been renamed from KVM_HC_LOCK_MEM_PAGE_RANGES to KVM_HC_PROTECT_MEMORY.

The dynamic setting of permissions is needed by the following features (probably not a complete list):

- Kprobes and Optprobes
- Static call optimization
- Jump Label optimization
- Ftrace and Livepatch
- Module loading and unloading
- eBPF JIT
- Kexec
- Kgdb

Examples:

- A text page can be made writable very briefly to install a probe or a trace.
- eBPF JIT can populate a writable page with code and make it read-execute.
- Module load can load read-only data into a writable page and make the page read-only.
- When pages are unmapped, their permissions in the EPT must revert to read-write. --- Documentation/virt/kvm/x86/hypercalls.rst | 14 +++ arch/x86/kvm/mmu/mmu.c | 77 +++++++++++++ arch/x86/kvm/mmu/paging_tmpl.h | 3 + arch/x86/kvm/mmu/spte.c | 15 ++- arch/x86/kvm/x86.c | 130 ++++++++++++++++++++++ include/linux/heki.h | 29 +++++ include/uapi/linux/kvm_para.h | 1 + 7 files changed, 267 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/x86/hypercalls.rst b/Documentation/virt/kvm/x86/hypercalls.rst index 3178576f4c47..28865d111773 100644 --- a/Documentation/virt/kvm/x86/hypercalls.rst +++ b/Documentation/virt/kvm/x86/hypercalls.rst @@ -207,3 +207,17 @@ The hypercall lets a guest request control register flags to be pinned for itself. Returns 0 on success or a KVM error code otherwise. + +10. KVM_HC_PROTECT_MEMORY +------------------------- + +:Architecture: x86 +:Status: active +:Purpose: Request permissions to be set in EPT + +- a0: physical address of a struct heki_page_list + +The hypercall lets a guest request memory permissions to be set for a list +of physical pages. + +Returns 0 on success or a KVM error code otherwise. 
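
The a0 argument documented above points to a struct heki_page_list that occupies one page. As a rough sketch of what that implies, the userspace C below computes how many range entries fit in one list page. The struct layouts are copied from include/linux/heki.h in this series; SKETCH_PAGE_SIZE (4 KiB), the gpa_t typedef and the sketch_* helpers are assumptions made for illustration. With 8-byte pointers and unsigned long, the header and each entry are both 24 bytes, so (4096 - 24) / 24 gives 169 entries per page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

typedef uint64_t gpa_t;		/* guest physical address, as in KVM */

/* Layouts copied from include/linux/heki.h in this series. */
struct heki_pages {
	gpa_t pa;			/* start of the guest physical range */
	gpa_t epa;			/* end of the guest physical range */
	unsigned long permissions;	/* MEM_ATTR_* bits */
};

struct heki_page_list {
	struct heki_page_list *next;	/* link used by whoever walks the list */
	gpa_t next_pa;			/* physical address of the next list page, 0 ends */
	unsigned long npages;		/* number of valid entries in pages[] */
	struct heki_pages pages[];
};

static_assert(sizeof(struct heki_page_list) <= SKETCH_PAGE_SIZE,
	      "list header must fit in one page");

/* Hypothetical helper: number of range entries that fit in one list page. */
static size_t sketch_max_ranges(void)
{
	return (SKETCH_PAGE_SIZE - offsetof(struct heki_page_list, pages)) /
	       sizeof(struct heki_pages);
}

/* Hypothetical helper: does a supplied entry count stay within the page? */
static int sketch_npages_fits(unsigned long npages)
{
	return npages <= sketch_max_ranges();
}
```

A reader of the list could compare a guest-supplied npages against this bound before copying the entries out of guest memory.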
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 2024ff21d036..2d09bcc35462 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -47,9 +47,11 @@ #include #include #include +#include #include #include #include +#include #include #include @@ -4446,6 +4448,75 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu, mmu_invalidate_retry_gfn(vcpu->kvm, fault->mmu_seq, fault->gfn); } +static bool mem_attr_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) +{ + unsigned long perm; + bool noexec, nowrite; + + if (unlikely(fault->rsvd)) + return false; + + if (!fault->present) + return false; + + perm = kvm_permissions_get(vcpu->kvm, fault->gfn); + noexec = !(perm & MEM_ATTR_EXEC); + nowrite = !(perm & MEM_ATTR_WRITE); + + if (fault->exec && noexec) { + struct x86_exception exception = { + .vector = PF_VECTOR, + .error_code_valid = true, + .error_code = fault->error_code, + .nested_page_fault = false, + /* + * TODO: This kind of kernel page fault needs to be + * handled by the guest, which is not currently the + * case, making it try again and again. + * + * You may want to test with cr2_or_gva to see the page + * fault caught by the guest kernel (thinking it is a + * user space fault). + */ + .address = static_call(kvm_x86_fault_gva)(vcpu), + .async_page_fault = false, + }; + + pr_warn_ratelimited( + "heki: Creating fetch #PF at 0x%016llx GFN=%llx\n", + exception.address, fault->gfn); + kvm_inject_page_fault(vcpu, &exception); + return true; + } + + if (fault->write && nowrite) { + struct x86_exception exception = { + .vector = PF_VECTOR, + .error_code_valid = true, + .error_code = fault->error_code, + .nested_page_fault = false, + /* + * TODO: This kind of kernel page fault needs to be + * handled by the guest, which is not currently the + * case, making it try again and again. + * + * You may want to test with cr2_or_gva to see the page + * fault caught by the guest kernel (thinking it is a + * user space fault). 
+ */ + .address = static_call(kvm_x86_fault_gva)(vcpu), + .async_page_fault = false, + }; + + pr_warn_ratelimited( + "heki: Creating write #PF at 0x%016llx GFN=%llx\n", + exception.address, fault->gfn); + kvm_inject_page_fault(vcpu, &exception); + return true; + } + return false; +} + static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { int r; @@ -4457,6 +4528,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault if (page_fault_handle_page_track(vcpu, fault)) return RET_PF_EMULATE; + if (mem_attr_fault(vcpu, fault)) + return RET_PF_RETRY; + r = fast_page_fault(vcpu, fault); if (r != RET_PF_INVALID) return r; @@ -4537,6 +4611,9 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, if (page_fault_handle_page_track(vcpu, fault)) return RET_PF_EMULATE; + if (mem_attr_fault(vcpu, fault)) + return RET_PF_RETRY; + r = fast_page_fault(vcpu, fault); if (r != RET_PF_INVALID) return r; diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 08f0c8d28245..49e8295d62dd 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -820,6 +820,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault return RET_PF_EMULATE; } + if (mem_attr_fault(vcpu, fault)) + return RET_PF_RETRY; + r = mmu_topup_memory_caches(vcpu, true); if (r) return r; diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 386cc1e8aab9..d72dc149424c 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -10,6 +10,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include +#include #include "mmu.h" #include "mmu_internal.h" #include "x86.h" @@ -143,6 +144,11 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int level = sp->role.level; u64 spte = SPTE_MMU_PRESENT_MASK; bool wrprot = false; + unsigned long perm; + + perm = kvm_permissions_get(vcpu->kvm, gfn); + if (!(perm & MEM_ATTR_WRITE)) + pte_access &= ~ACC_WRITE_MASK; 
WARN_ON_ONCE(!pte_access && !shadow_present_mask); @@ -178,10 +184,15 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, pte_access &= ~ACC_EXEC_MASK; } - if (pte_access & ACC_EXEC_MASK) + if (pte_access & ACC_EXEC_MASK) { spte |= shadow_x_mask; - else +#ifdef CONFIG_HEKI + if (enable_mbec && !(perm & MEM_ATTR_EXEC)) + spte &= ~VMX_EPT_EXECUTABLE_MASK; +#endif + } else { spte |= shadow_nx_mask; + } if (pte_access & ACC_USER_MASK) spte |= shadow_user_mask; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 43c28a6953bf..44f94b75ff16 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -62,6 +62,8 @@ #include #include #include +#include +#include #include #include @@ -9983,6 +9985,131 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id) return; } +#ifdef CONFIG_HEKI + +static int heki_protect_memory(struct kvm *const kvm, gpa_t list_pa) +{ + struct heki_page_list *list, *head; + struct heki_pages *pages; + size_t size; + int i, npages, err = 0; + + /* Read in the page list. */ + head = NULL; + npages = 0; + while (list_pa) { + list = kmalloc(PAGE_SIZE, GFP_KERNEL); + if (!list) { + /* For want of a better error number. */ + err = -KVM_E2BIG; + goto free; + } + + err = kvm_read_guest(kvm, list_pa, list, sizeof(*list)); + if (err) { + pr_warn("heki: Can't read list %llx\n", list_pa); + err = -KVM_EFAULT; + goto free; + } + list_pa += sizeof(*list); + + size = list->npages * sizeof(*pages); + pages = list->pages; + err = kvm_read_guest(kvm, list_pa, pages, size); + if (err) { + pr_warn("heki: Can't read pages %llx\n", list_pa); + err = -KVM_EFAULT; + goto free; + } + + list->next = head; + head = list; + npages += list->npages; + list_pa = list->next_pa; + } + + /* For kvm_permissions_set() -> kvm_vm_set_mem_attributes() */ + mutex_lock(&kvm->slots_arch_lock); + + /* + * Walk the page list, apply the permissions for each guest page and + * zap the EPT entry of each page. 
The pages will be faulted in on + * demand and the correct permissions will be applied at the correct + * level for the pages. + */ + for (list = head; list; list = list->next) { + pages = list->pages; + + for (i = 0; i < list->npages; i++) { + gfn_t gfn_start, gfn_end; + unsigned long permissions; + + if (!PAGE_ALIGNED(pages[i].pa)) { + pr_warn("heki: GPA not aligned: %llx\n", + pages[i].pa); + err = -KVM_EINVAL; + goto unlock; + } + if (!PAGE_ALIGNED(pages[i].epa)) { + pr_warn("heki: GPA not aligned: %llx\n", + pages[i].epa); + err = -KVM_EINVAL; + goto unlock; + } + + gfn_start = gpa_to_gfn(pages[i].pa); + gfn_end = gpa_to_gfn(pages[i].epa); + permissions = pages[i].permissions; + + if (!permissions || (permissions & ~MEM_ATTR_PROT)) { + err = -KVM_EINVAL; + goto unlock; + } + + if (!(permissions & MEM_ATTR_EXEC) && !enable_mbec) { + /* + * Guests can check for MBEC support to avoid + * this error message. We will continue + * applying restrictions partially. + */ + pr_warn("heki: Clearing kernel exec " + "depends on MBEC, which is disabled."); + permissions |= MEM_ATTR_EXEC; + } + + pr_warn("heki: Request to protect GFNs %llx-%llx" + " with %s permissions=%s%s%s\n", + gfn_start, gfn_end, + (permissions & MEM_ATTR_IMMUTABLE) ? + "immutable" : + "mutable", + (permissions & MEM_ATTR_READ) ? "r" : "_", + (permissions & MEM_ATTR_WRITE) ? "w" : "_", + (permissions & MEM_ATTR_EXEC) ? 
"x" : "_"); + + err = kvm_permissions_set(kvm, gfn_start, gfn_end, + permissions); + if (err) { + pr_warn("heki: Failed to set permissions\n"); + goto unlock; + } + } + } + +unlock: + mutex_unlock(&kvm->slots_arch_lock); + +free: + while (head) { + list = head; + head = head->next; + kfree(list); + } + return err; +} + +#endif /* CONFIG_HEKI */ + static int complete_hypercall_exit(struct kvm_vcpu *vcpu) { u64 ret = vcpu->run->hypercall.ret; @@ -10097,6 +10224,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) return ret; } break; + case KVM_HC_PROTECT_MEMORY: + ret = heki_protect_memory(vcpu->kvm, a0); + break; #endif /* CONFIG_HEKI */ default: ret = -KVM_ENOSYS; diff --git a/include/linux/heki.h b/include/linux/heki.h index 96ccb17657e5..89cc9273a968 100644 --- a/include/linux/heki.h +++ b/include/linux/heki.h @@ -8,6 +8,7 @@ #ifndef __HEKI_H__ #define __HEKI_H__ +#include #include #include #include @@ -17,6 +18,32 @@ #ifdef CONFIG_HEKI +/* + * This structure contains a guest physical range and its permissions (RWX). + */ +struct heki_pages { + gpa_t pa; + gpa_t epa; + unsigned long permissions; +}; + +/* + * Guest ranges are passed to the VMM or hypervisor so they can be authenticated + * and their permissions can be set in the host page table. When an array of + * these is passed to the Hypervisor or VMM, the array must be in physically + * contiguous memory. + * + * This struct occupies one page. In each page, an array of guest ranges can + * be passed. A guest request to the VMM/Hypervisor may contain a list of + * these structs (linked by "next_pa"). + */ +struct heki_page_list { + struct heki_page_list *next; + gpa_t next_pa; + unsigned long npages; + struct heki_pages pages[]; +}; + /* * A hypervisor that supports Heki will instantiate this structure to * provide hypervisor specific functions for Heki. 
@@ -36,6 +63,8 @@ struct heki { extern struct heki heki; extern bool heki_enabled; +extern bool __read_mostly enable_mbec; + void heki_early_init(void); void heki_late_init(void); diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h index 2ed418704603..938c9006e354 100644 --- a/include/uapi/linux/kvm_para.h +++ b/include/uapi/linux/kvm_para.h @@ -31,6 +31,7 @@ #define KVM_HC_SCHED_YIELD 11 #define KVM_HC_MAP_GPA_RANGE 12 #define KVM_HC_LOCK_CR_UPDATE 13 +#define KVM_HC_PROTECT_MEMORY 14 /* * hypercalls use architecture specific From patchwork Mon Nov 13 02:23:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453567 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 111BC18634 for ; Mon, 13 Nov 2023 02:32:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="JKmMqdli" X-Greylist: delayed 501 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Sun, 12 Nov 2023 18:32:28 PST Received: from smtp-8fac.mail.infomaniak.ch (smtp-8fac.mail.infomaniak.ch [83.166.143.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B401310D for ; Sun, 12 Nov 2023 18:32:28 -0800 (PST) Received: from smtp-3-0000.mail.infomaniak.ch (unknown [10.4.36.107]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCtb03xVzMpvcL; Mon, 13 Nov 2023 02:24:39 +0000 (UTC) Received: from unknown by smtp-3-0000.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCtY6j2Kz3W; Mon, 13 Nov 2023 03:24:37 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842278; bh=udX3yMbRw+y6U4YMfanKu8XE/62XvcB7w1fSqqDyDOI=; 
h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=JKmMqdlijgI0WHD2CFxV/y7DWfB+LUKrwQo0xOX8EfXAZsvA9YZ7OO5gDS9iDz+rl OhhkExnEBcdos98SON70W/UL9RJeu0hGR5koDZT8lkh5am5LEoQnacmgelAtWgQqxy GoKDTdCaRbpksDhhbdbuDq2zFJFNL69+pG5malPA= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 12/19] x86: Implement the Memory Table feature to store arbitrary per-page data Date: Sun, 12 Nov 2023 21:23:19 -0500 Message-ID: <20231113022326.24388-13-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha From: Madhavan T. Venkataraman This feature can be used by a consumer to associate any arbitrary pointer with a physical page. The feature implements a page table format that mirrors the hardware page table. A leaf entry in the table points to consumer data for that page. The page table format has these advantages: - The format allows for a sparse representation. 
This is useful since the physical address space can be large and is typically sparsely populated in a system.
- A consumer of this feature can choose to populate data just for the pages it is interested in.
- Information can be stored for large pages, if a consumer wishes.

For instance, for Heki, the guest kernel uses this to create permissions counters for each guest physical page. The permissions counters reflect the collective permissions for a guest physical page across all mappings to that page. This allows the guest to ask the hypervisor to set only the necessary permissions for a guest physical page in the EPT (instead of RWX).

This feature could also be used to improve KVM's memory attributes and write page tracking.

We will support large page entries in mem_table in a future version by adding merge() and split() operations to mem_table_ops.

Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Madhavan T. Venkataraman
Cc: Mickaël Salaün
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Signed-off-by: Madhavan T.
Venkataraman --- Changes since v1: * New patch and new file: kernel/mem_table.c --- arch/x86/kernel/setup.c | 2 + include/linux/heki.h | 1 + include/linux/mem_table.h | 55 ++++++++++ kernel/Makefile | 2 + kernel/mem_table.c | 219 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 279 insertions(+) create mode 100644 include/linux/mem_table.h create mode 100644 kernel/mem_table.c diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index b098b1fa2470..e7ae46953ae4 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -25,6 +25,7 @@ #include #include #include +#include #include @@ -1315,6 +1316,7 @@ void __init setup_arch(char **cmdline_p) #endif unwind_init(); + mem_table_init(PG_LEVEL_4K); } #ifdef CONFIG_X86_32 diff --git a/include/linux/heki.h b/include/linux/heki.h index 89cc9273a968..9b0c966c50d1 100644 --- a/include/linux/heki.h +++ b/include/linux/heki.h @@ -15,6 +15,7 @@ #include #include #include +#include #ifdef CONFIG_HEKI diff --git a/include/linux/mem_table.h b/include/linux/mem_table.h new file mode 100644 index 000000000000..738bf12309f3 --- /dev/null +++ b/include/linux/mem_table.h @@ -0,0 +1,55 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Memory table feature - Definitions. + * + * Copyright © 2023 Microsoft Corporation. + */ + +#ifndef __MEM_TABLE_H__ +#define __MEM_TABLE_H__ + +/* clang-format off */ + +/* + * The MEM_TABLE bit is set on entries that point to an intermediate table. + * So, this bit is reserved. This means that pointers to consumer data must + * be at least two-byte aligned (so the MEM_TABLE bit is 0). + */ +#define MEM_TABLE BIT(0) +#define IS_LEAF(entry) !((uintptr_t)entry & MEM_TABLE) + +/* clang-format on */ + +/* + * A memory table is arranged exactly like a page table. The memory table + * configuration reflects the hardware page table configuration. + */ + +/* Parameters at each level of the memory table hierarchy. 
*/ +struct mem_table_level { + unsigned int number; + unsigned int nentries; + unsigned int shift; + unsigned int mask; +}; + +struct mem_table { + struct mem_table_level *level; + struct mem_table_ops *ops; + bool changed; + void *entries[]; +}; + +/* Operations that need to be supplied by a consumer of memory tables. */ +struct mem_table_ops { + void (*free)(void *buf); +}; + +void mem_table_init(unsigned int base_level); +struct mem_table *mem_table_alloc(struct mem_table_ops *ops); +void mem_table_free(struct mem_table *table); +void **mem_table_create(struct mem_table *table, phys_addr_t pa); +void **mem_table_find(struct mem_table *table, phys_addr_t pa, + unsigned int *level_num); + +#endif /* __MEM_TABLE_H__ */ diff --git a/kernel/Makefile b/kernel/Makefile index 3947122d618b..dcef03ec5c54 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -131,6 +131,8 @@ obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o obj-$(CONFIG_RESOURCE_KUNIT_TEST) += resource_kunit.o obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o +obj-$(CONFIG_SPARSEMEM) += mem_table.o + CFLAGS_stackleak.o += $(DISABLE_STACKLEAK_PLUGIN) obj-$(CONFIG_GCC_PLUGIN_STACKLEAK) += stackleak.o KASAN_SANITIZE_stackleak.o := n diff --git a/kernel/mem_table.c b/kernel/mem_table.c new file mode 100644 index 000000000000..280a1b5ddde0 --- /dev/null +++ b/kernel/mem_table.c @@ -0,0 +1,219 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Memory table feature. + * + * This feature can be used by a consumer to associate any arbitrary pointer + * with a physical page. The feature implements a page table format that + * mirrors the hardware page table. A leaf entry in the table points to + * consumer data for that page. + * + * The page table format has these advantages: + * + * - The format allows for a sparse representation. This is useful since + * the physical address space can be large and is typically sparsely + * populated in a system. 
+ * + * - A consumer of this feature can choose to populate data just for + * the pages it is interested in. + * + * - Information can be stored for large pages, if a consumer wishes. + * + * For instance, for Heki, the guest kernel uses this to create permissions + * counters for each guest physical page. The permissions counters reflect the + * collective permissions for a guest physical page across all mappings to that + * page. This allows the guest to request the hypervisor to set only the + * necessary permissions for a guest physical page in the EPT (instead of RWX). + * + * Copyright © 2023 Microsoft Corporation. + */ + +/* + * Memory table functions use recursion for simplicity. The recursion is bounded + * by the number of hardware page table levels. + * + * Locking is left to the caller of these functions. + */ +#include +#include +#include + +#define TABLE(entry) ((void *)((uintptr_t)entry & ~MEM_TABLE)) +#define ENTRY(table) ((void *)((uintptr_t)table | MEM_TABLE)) + +/* + * Within this feature, the table levels start from 0. On X86, the base level + * is not 0. + */ +unsigned int mem_table_base_level __ro_after_init; +unsigned int mem_table_nlevels __ro_after_init; +struct mem_table_level mem_table_levels[CONFIG_PGTABLE_LEVELS] __ro_after_init; + +void __init mem_table_init(unsigned int base_level) +{ + struct mem_table_level *level; + unsigned long shift, delta_shift; + int physmem_bits; + int i, max_levels; + + /* + * Compute the actual number of levels present. Compute the parameters + * for each level.
+ */ + shift = ilog2(PAGE_SIZE); + physmem_bits = PAGE_SHIFT; + max_levels = CONFIG_PGTABLE_LEVELS; + + for (i = 0; i < max_levels && physmem_bits < MAX_PHYSMEM_BITS; i++) { + level = &mem_table_levels[i]; + + switch (i) { + case 0: + level->nentries = PTRS_PER_PTE; + break; + case 1: + level->nentries = PTRS_PER_PMD; + break; + case 2: + level->nentries = PTRS_PER_PUD; + break; + case 3: + level->nentries = PTRS_PER_P4D; + break; + case 4: + level->nentries = PTRS_PER_PGD; + break; + } + level->number = i; + level->shift = shift; + level->mask = level->nentries - 1; + + delta_shift = ilog2(level->nentries); + shift += delta_shift; + physmem_bits += delta_shift; + } + mem_table_nlevels = i; + mem_table_base_level = base_level; +} + +struct mem_table *mem_table_alloc(struct mem_table_ops *ops) +{ + struct mem_table_level *level; + struct mem_table *table; + + level = &mem_table_levels[mem_table_nlevels - 1]; + + table = kzalloc(struct_size(table, entries, level->nentries), + GFP_KERNEL); + if (table) { + table->level = level; + table->ops = ops; + return table; + } + return NULL; +} +EXPORT_SYMBOL_GPL(mem_table_alloc); + +static void _mem_table_free(struct mem_table *table) +{ + struct mem_table_level *level = table->level; + void **entries = table->entries; + struct mem_table_ops *ops = table->ops; + int i; + + for (i = 0; i < level->nentries; i++) { + if (!entries[i]) + continue; + if (IS_LEAF(entries[i])) { + /* The consumer frees the pointer. 
*/ + ops->free(entries[i]); + continue; + } + _mem_table_free(TABLE(entries[i])); + } + kfree(table); +} + +void mem_table_free(struct mem_table *table) +{ + _mem_table_free(table); +} +EXPORT_SYMBOL_GPL(mem_table_free); + +static void **_mem_table_find(struct mem_table *table, phys_addr_t pa, + unsigned int *level_number) +{ + struct mem_table_level *level = table->level; + void **entries = table->entries; + unsigned long i; + + i = (pa >> level->shift) & level->mask; + + *level_number = level->number; + if (!entries[i]) + return NULL; + + if (IS_LEAF(entries[i])) + return &entries[i]; + + return _mem_table_find(TABLE(entries[i]), pa, level_number); +} + +void **mem_table_find(struct mem_table *table, phys_addr_t pa, + unsigned int *level_number) +{ + void **entry; + + entry = _mem_table_find(table, pa, level_number); + *level_number += mem_table_base_level; + + return entry; +} +EXPORT_SYMBOL_GPL(mem_table_find); + +static void **_mem_table_create(struct mem_table *table, phys_addr_t pa) +{ + struct mem_table_level *level = table->level; + void **entries = table->entries; + unsigned long i; + + table->changed = true; + i = (pa >> level->shift) & level->mask; + + if (!level->number) { + /* + * Reached the lowest level. Return a pointer to the entry + * so that the consumer can populate it. + */ + return &entries[i]; + } + + /* + * If the entry is NULL, then create a lower level table and make the + * entry point to it. Or, if the entry is a leaf, then we need to + * split the entry. In this case as well, create a lower level table + * to split the entry. + */ + if (!entries[i] || IS_LEAF(entries[i])) { + struct mem_table *next; + + /* Create next level table.
*/ + level--; + next = kzalloc(struct_size(table, entries, level->nentries), + GFP_KERNEL); + if (!next) + return NULL; + + next->level = level; + next->ops = table->ops; + next->changed = true; + entries[i] = ENTRY(next); + } + + return _mem_table_create(TABLE(entries[i]), pa); +} + +void **mem_table_create(struct mem_table *table, phys_addr_t pa) +{ + return _mem_table_create(table, pa); +} +EXPORT_SYMBOL_GPL(mem_table_create);
From patchwork Mon Nov 13 02:23:20 2023
X-Patchwork-Submitter: Mickaël Salaün
X-Patchwork-Id: 13453566
From: Mickaël Salaün
Subject: [RFC PATCH v2 13/19] heki: Implement a kernel page table walker
Date: Sun, 12 Nov 2023 21:23:20 -0500
Message-ID: <20231113022326.24388-14-mic@digikod.net>
In-Reply-To: <20231113022326.24388-1-mic@digikod.net>
References: <20231113022326.24388-1-mic@digikod.net>
X-Mailing-List: kvm@vger.kernel.org

From: Madhavan T. Venkataraman

The Heki feature needs to do the following:

- Find kernel mappings.
- Determine the permissions associated with each mapping.
- Determine the collective permissions for a guest physical page across all of its mappings.

This way, a guest physical page can reflect only the required permissions in the EPT thanks to the KVM_HC_PROTECT_MEMORY hypercall.

Implement a kernel page table walker that walks all of the kernel mappings and calls a callback function for each mapping.

Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Madhavan T.
Venkataraman Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Co-developed-by: Mickaël Salaün Signed-off-by: Mickaël Salaün Signed-off-by: Madhavan T. Venkataraman --- Change since v1: * New patch and new file: virt/heki/walk.c --- include/linux/heki.h | 16 +++++ virt/heki/Makefile | 1 + virt/heki/walk.c | 140 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 157 insertions(+) create mode 100644 virt/heki/walk.c diff --git a/include/linux/heki.h b/include/linux/heki.h index 9b0c966c50d1..a7ae0b387dfe 100644 --- a/include/linux/heki.h +++ b/include/linux/heki.h @@ -61,6 +61,22 @@ struct heki { struct heki_hypervisor *hypervisor; }; +/* + * The kernel page table is walked to locate kernel mappings. For each + * mapping, a callback function is called. The table walker passes information + * about the mapping to the callback using this structure. + */ +struct heki_args { + /* Information passed by the table walker to the callback. */ + unsigned long va; + phys_addr_t pa; + size_t size; + unsigned long flags; +}; + +/* Callback function called by the table walker. */ +typedef void (*heki_func_t)(struct heki_args *args); + extern struct heki heki; extern bool heki_enabled; diff --git a/virt/heki/Makefile b/virt/heki/Makefile index 354e567df71c..a5daa4ff7a4f 100644 --- a/virt/heki/Makefile +++ b/virt/heki/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0-only obj-y += main.o +obj-y += walk.o diff --git a/virt/heki/walk.c b/virt/heki/walk.c new file mode 100644 index 000000000000..e10b54226fcc --- /dev/null +++ b/virt/heki/walk.c @@ -0,0 +1,140 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Hypervisor Enforced Kernel Integrity (Heki) - Kernel page table walker. + * + * Copyright © 2023 Microsoft Corporation + * + * Cf. 
arch/x86/mm/init_64.c + */ + +#include +#include + +static void heki_walk_pte(pmd_t *pmd, unsigned long va, unsigned long va_end, + heki_func_t func, struct heki_args *args) +{ + pte_t *pte; + unsigned long next_va; + + for (pte = pte_offset_kernel(pmd, va); va < va_end; + va = next_va, pte++) { + next_va = (va + PAGE_SIZE) & PAGE_MASK; + + if (next_va > va_end) + next_va = va_end; + + if (!pte_present(*pte)) + continue; + + args->va = va; + args->pa = pte_pfn(*pte) << PAGE_SHIFT; + args->size = PAGE_SIZE; + args->flags = pte_flags(*pte); + + func(args); + } +} + +static void heki_walk_pmd(pud_t *pud, unsigned long va, unsigned long va_end, + heki_func_t func, struct heki_args *args) +{ + pmd_t *pmd; + unsigned long next_va; + + for (pmd = pmd_offset(pud, va); va < va_end; va = next_va, pmd++) { + next_va = pmd_addr_end(va, va_end); + + if (!pmd_present(*pmd)) + continue; + + if (pmd_large(*pmd)) { + args->va = va; + args->pa = pmd_pfn(*pmd) << PAGE_SHIFT; + args->pa += va & (PMD_SIZE - 1); + args->size = next_va - va; + args->flags = pmd_flags(*pmd); + + func(args); + } else { + heki_walk_pte(pmd, va, next_va, func, args); + } + } +} + +static void heki_walk_pud(p4d_t *p4d, unsigned long va, unsigned long va_end, + heki_func_t func, struct heki_args *args) +{ + pud_t *pud; + unsigned long next_va; + + for (pud = pud_offset(p4d, va); va < va_end; va = next_va, pud++) { + next_va = pud_addr_end(va, va_end); + + if (!pud_present(*pud)) + continue; + + if (pud_large(*pud)) { + args->va = va; + args->pa = pud_pfn(*pud) << PAGE_SHIFT; + args->pa += va & (PUD_SIZE - 1); + args->size = next_va - va; + args->flags = pud_flags(*pud); + + func(args); + } else { + heki_walk_pmd(pud, va, next_va, func, args); + } + } +} + +static void heki_walk_p4d(pgd_t *pgd, unsigned long va, unsigned long va_end, + heki_func_t func, struct heki_args *args) +{ + p4d_t *p4d; + unsigned long next_va; + + for (p4d = p4d_offset(pgd, va); va < va_end; va = next_va, p4d++) { + next_va = 
p4d_addr_end(va, va_end); + + if (!p4d_present(*p4d)) + continue; + + if (p4d_large(*p4d)) { + args->va = va; + args->pa = p4d_pfn(*p4d) << PAGE_SHIFT; + args->pa += va & (P4D_SIZE - 1); + args->size = next_va - va; + args->flags = p4d_flags(*p4d); + + func(args); + } else { + heki_walk_pud(p4d, va, next_va, func, args); + } + } +} + +void heki_walk(unsigned long va, unsigned long va_end, heki_func_t func, + struct heki_args *args) +{ + pgd_t *pgd; + unsigned long next_va; + + for (pgd = pgd_offset_k(va); va < va_end; va = next_va, pgd++) { + next_va = pgd_addr_end(va, va_end); + + if (!pgd_present(*pgd)) + continue; + + if (pgd_large(*pgd)) { + args->va = va; + args->pa = pgd_pfn(*pgd) << PAGE_SHIFT; + args->pa += va & (PGDIR_SIZE - 1); + args->size = next_va - va; + args->flags = pgd_flags(*pgd); + + func(args); + } else { + heki_walk_p4d(pgd, va, next_va, func, args); + } + } +}
From patchwork Mon Nov 13 02:23:21 2023
X-Patchwork-Submitter: Mickaël Salaün
X-Patchwork-Id: 13453576
From: Mickaël Salaün
Subject: [RFC PATCH v2 14/19] heki: x86: Initialize permissions counters for pages mapped into KVA
Date: Sun, 12 Nov 2023 21:23:21 -0500
Message-ID: <20231113022326.24388-15-mic@digikod.net>
In-Reply-To: <20231113022326.24388-1-mic@digikod.net>
References: <20231113022326.24388-1-mic@digikod.net>
X-Mailing-List: kvm@vger.kernel.org

From: Madhavan T.
Venkataraman Define a permissions counters structure that contains one counter each for read, write and execute. Each mapped guest page will be allocated a permissions counters structure. During kernel boot, walk the kernel address space, locate all the mappings, create permissions counters for each mapped guest page and update the counters to reflect the collective permissions for each page across all of its mappings. The collective permissions will be applied in the EPT in a subsequent commit. We might want to move these counters to a safer place (e.g., KVM) to protect them from tampering by the guest kernel itself. Note that walking through all mappings might be slow if KASAN is enabled. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. Venkataraman Cc: Mickaël Salaün Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Suggested-by: Mickaël Salaün Signed-off-by: Madhavan T. Venkataraman --- Changes since v1: * New patch and new files: arch/x86/mm/heki.c and virt/heki/counters.c --- arch/x86/mm/Makefile | 2 + arch/x86/mm/heki.c | 56 +++++++++++++++++ include/linux/heki.h | 32 ++++++++++ virt/heki/Kconfig | 2 + virt/heki/Makefile | 1 + virt/heki/counters.c | 147 +++++++++++++++++++++++++++++++++++++++++++ virt/heki/main.c | 13 ++++ 7 files changed, 253 insertions(+) create mode 100644 arch/x86/mm/heki.c create mode 100644 virt/heki/counters.c diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index c80febc44cd2..2998eaac0dbb 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -67,3 +67,5 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_amd.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o + +obj-$(CONFIG_HEKI) += heki.o diff --git a/arch/x86/mm/heki.c b/arch/x86/mm/heki.c new file mode 100644 index 000000000000..c495df0d8772 --- /dev/null +++ b/arch/x86/mm/heki.c @@ -0,0 +1,56 @@ +//
SPDX-License-Identifier: GPL-2.0 +/* + * Hypervisor Enforced Kernel Integrity (Heki) - Arch specific. + * + * Copyright © 2023 Microsoft Corporation + */ + +#include +#include + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) "heki-guest: " fmt + +static unsigned long kernel_va; +static unsigned long kernel_end; +static unsigned long direct_map_va; +static unsigned long direct_map_end; + +__init void heki_arch_early_init(void) +{ + /* Kernel virtual address space range, not yet compatible with KASLR. */ + if (pgtable_l5_enabled()) { + kernel_va = 0xff00000000000000UL; + kernel_end = 0xffffffffffe00000UL; + direct_map_va = 0xff11000000000000UL; + direct_map_end = 0xff91000000000000UL; + } else { + kernel_va = 0xffff800000000000UL; + kernel_end = 0xffffffffffe00000UL; + direct_map_va = 0xffff888000000000UL; + direct_map_end = 0xffffc88000000000UL; + } + + /* + * Initialize the counters for all existing kernel mappings except + * for direct map. + */ + heki_map(kernel_va, direct_map_va); + heki_map(direct_map_end, kernel_end); +} + +unsigned long heki_flags_to_permissions(unsigned long flags) +{ + unsigned long permissions; + + permissions = MEM_ATTR_READ | MEM_ATTR_EXEC; + if (flags & _PAGE_RW) + permissions |= MEM_ATTR_WRITE; + if (flags & _PAGE_NX) + permissions &= ~MEM_ATTR_EXEC; + + return permissions; +} diff --git a/include/linux/heki.h b/include/linux/heki.h index a7ae0b387dfe..86c787d121e0 100644 --- a/include/linux/heki.h +++ b/include/linux/heki.h @@ -19,6 +19,16 @@ #ifdef CONFIG_HEKI +/* + * This structure keeps track of the collective permissions for a guest page + * across all of its mappings. + */ +struct heki_counters { + int read; + int write; + int execute; +}; + /* * This structure contains a guest physical range and its permissions (RWX). */ @@ -56,9 +66,17 @@ struct heki_hypervisor { /* * If the active hypervisor supports Heki, it will plug its heki_hypervisor * pointer into this heki structure. 
+ * + * During guest kernel boot, permissions counters for each guest page are + * initialized based on the page's current permissions. */ struct heki { struct heki_hypervisor *hypervisor; + struct mem_table *counters; +}; + +enum heki_cmd { + HEKI_MAP, }; /* @@ -72,6 +90,9 @@ struct heki_args { phys_addr_t pa; size_t size; unsigned long flags; + + /* Command passed by caller. */ + enum heki_cmd cmd; }; /* Callback function called by the table walker. */ @@ -84,6 +105,14 @@ extern bool __read_mostly enable_mbec; void heki_early_init(void); void heki_late_init(void); +void heki_counters_init(void); +void heki_walk(unsigned long va, unsigned long va_end, heki_func_t func, + struct heki_args *args); +void heki_map(unsigned long va, unsigned long end); + +/* Arch-specific functions. */ +void heki_arch_early_init(void); +unsigned long heki_flags_to_permissions(unsigned long flags); #else /* !CONFIG_HEKI */ @@ -93,6 +122,9 @@ static inline void heki_early_init(void) static inline void heki_late_init(void) { } +static inline void heki_map(unsigned long va, unsigned long end) +{ +} #endif /* CONFIG_HEKI */ diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig index 75a784653e31..6d956eb9d04b 100644 --- a/virt/heki/Kconfig +++ b/virt/heki/Kconfig @@ -6,6 +6,8 @@ config HEKI bool "Hypervisor Enforced Kernel Integrity (Heki)" depends on ARCH_SUPPORTS_HEKI && HYPERVISOR_SUPPORTS_HEKI select KVM_GENERIC_MEMORY_ATTRIBUTES + depends on !X86_16BIT + select SPARSEMEM help This feature enhances guest virtual machine security by taking advantage of security features provided by the hypervisor for guests. 
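Reviewer aside, not part of the patch: the flag translation done by heki_flags_to_permissions() and the per-page struct heki_counters can be sketched in user space roughly as below. This is only an illustration under stated assumptions. All DEMO_* constants and demo_* names are hypothetical, the bit values are placeholders rather than the kernel's _PAGE_* or MEM_ATTR_* definitions, and the rule that a permission stays needed while its counter is non-zero is an assumption about how the collective permissions are derived.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bit values chosen for this sketch only (not the kernel's). */
#define DEMO_PAGE_RW	(1UL << 0)
#define DEMO_PAGE_NX	(1UL << 1)

#define DEMO_ATTR_READ	(1UL << 0)
#define DEMO_ATTR_WRITE	(1UL << 1)
#define DEMO_ATTR_EXEC	(1UL << 2)

/* Mirrors struct heki_counters: one counter per permission. */
struct demo_counters {
	int read;
	int write;
	int execute;
};

/* Same rule as heki_flags_to_permissions(): R+X unless the flags say otherwise. */
unsigned long demo_flags_to_permissions(unsigned long flags)
{
	unsigned long permissions = DEMO_ATTR_READ | DEMO_ATTR_EXEC;

	if (flags & DEMO_PAGE_RW)
		permissions |= DEMO_ATTR_WRITE;
	if (flags & DEMO_PAGE_NX)
		permissions &= ~DEMO_ATTR_EXEC;

	return permissions;
}

/* Add (delta = 1) or drop (delta = -1) one mapping of the page. */
void demo_account(struct demo_counters *c, unsigned long flags, int delta)
{
	unsigned long permissions = demo_flags_to_permissions(flags);

	assert(c != NULL);
	if (permissions & DEMO_ATTR_READ)
		c->read += delta;
	if (permissions & DEMO_ATTR_WRITE)
		c->write += delta;
	if (permissions & DEMO_ATTR_EXEC)
		c->execute += delta;
}

/* Assumption: a permission stays needed while its counter is non-zero. */
unsigned long demo_collective(const struct demo_counters *c)
{
	unsigned long permissions = 0;

	if (c->read > 0)
		permissions |= DEMO_ATTR_READ;
	if (c->write > 0)
		permissions |= DEMO_ATTR_WRITE;
	if (c->execute > 0)
		permissions |= DEMO_ATTR_EXEC;

	return permissions;
}
```

With these placeholder values, a page that is mapped once with DEMO_PAGE_RW | DEMO_PAGE_NX and once with no flags ends up with read = 2, write = 1 and execute = 1, so under the stated assumption its collective permissions would be read, write and execute.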
diff --git a/virt/heki/Makefile b/virt/heki/Makefile index a5daa4ff7a4f..564f92faa9d8 100644 --- a/virt/heki/Makefile +++ b/virt/heki/Makefile @@ -2,3 +2,4 @@ obj-y += main.o obj-y += walk.o +obj-y += counters.o diff --git a/virt/heki/counters.c b/virt/heki/counters.c new file mode 100644 index 000000000000..7067449cabca --- /dev/null +++ b/virt/heki/counters.c @@ -0,0 +1,147 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Hypervisor Enforced Kernel Integrity (Heki) - Permissions counters. + * + * Copyright © 2023 Microsoft Corporation + */ + +#include +#include +#include + +#include "common.h" + +DEFINE_MUTEX(heki_lock); + +static void heki_update_counters(struct heki_counters *counters, + unsigned long perm, unsigned long set, + unsigned long clear) +{ + if (WARN_ON_ONCE(!counters)) + return; + + if ((clear & MEM_ATTR_READ) && (perm & MEM_ATTR_READ)) + counters->read--; + if ((clear & MEM_ATTR_WRITE) && (perm & MEM_ATTR_WRITE)) + counters->write--; + if ((clear & MEM_ATTR_EXEC) && (perm & MEM_ATTR_EXEC)) + counters->execute--; + + if ((set & MEM_ATTR_READ) && !(perm & MEM_ATTR_READ)) + counters->read++; + if ((set & MEM_ATTR_WRITE) && !(perm & MEM_ATTR_WRITE)) + counters->write++; + if ((set & MEM_ATTR_EXEC) && !(perm & MEM_ATTR_EXEC)) + counters->execute++; +} + +static struct heki_counters *heki_create_counters(struct mem_table *table, + phys_addr_t pa) +{ + struct heki_counters *counters; + void **entry; + + entry = mem_table_create(table, pa); + if (WARN_ON(!entry)) + return NULL; + + counters = kzalloc(sizeof(*counters), GFP_KERNEL); + if (WARN_ON(!counters)) + return NULL; + + *entry = counters; + return counters; +} + +void heki_callback(struct heki_args *args) +{ + /* The VA is only for debug. It is not really used in this function. 
*/ + unsigned long va; + phys_addr_t pa, pa_end; + unsigned long permissions; + void **entry; + struct heki_counters *counters; + unsigned int ignore; + + if (!pfn_valid(args->pa >> PAGE_SHIFT)) + return; + + permissions = heki_flags_to_permissions(args->flags); + + /* + * Handle counters for a leaf entry in the kernel page table. + */ + pa_end = args->pa + args->size; + for (pa = args->pa, va = args->va; pa < pa_end; + pa += PAGE_SIZE, va += PAGE_SIZE) { + entry = mem_table_find(heki.counters, pa, &ignore); + if (entry) + counters = *entry; + else + counters = NULL; + + switch (args->cmd) { + case HEKI_MAP: + if (!counters) + counters = + heki_create_counters(heki.counters, pa); + heki_update_counters(counters, 0, permissions, 0); + break; + + default: + WARN_ON_ONCE(1); + break; + } + } +} + +static void heki_func(unsigned long va, unsigned long end, + struct heki_args *args) +{ + if (!heki.counters || va >= end) + return; + + va = ALIGN_DOWN(va, PAGE_SIZE); + end = ALIGN(end, PAGE_SIZE); + + mutex_lock(&heki_lock); + + heki_walk(va, end, heki_callback, args); + + mutex_unlock(&heki_lock); +} + +/* + * Find the mappings in the given range and initialize permission counters for + * them. + */ +void heki_map(unsigned long va, unsigned long end) +{ + struct heki_args args = { + .cmd = HEKI_MAP, + }; + + heki_func(va, end, &args); +} + +/* + * Permissions counters are associated with each guest page using the + * Memory Table feature. Initialize the permissions counters here. + * Note that we don't support large page entries for counters because + * it is difficult to merge/split counters for large pages. 
+ */ + +static void heki_counters_free(void *counters) +{ + kfree(counters); +} + +static struct mem_table_ops heki_counters_ops = { + .free = heki_counters_free, +}; + +__init void heki_counters_init(void) +{ + heki.counters = mem_table_alloc(&heki_counters_ops); + WARN_ON(!heki.counters); +} diff --git a/virt/heki/main.c b/virt/heki/main.c index ff1937e1c946..0ab7de659e6f 100644 --- a/virt/heki/main.c +++ b/virt/heki/main.c @@ -21,6 +21,16 @@ __init void heki_early_init(void) pr_warn("Heki is not enabled\n"); return; } + + /* + * Static addresses (see heki_arch_early_init) are not compatible with + * KASLR. This will be handled in a next patch series. + */ + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) { + pr_warn("Heki is disabled because KASLR is not supported yet\n"); + return; + } + pr_warn("Heki is enabled\n"); if (!heki.hypervisor) { @@ -29,6 +39,9 @@ __init void heki_early_init(void) return; } pr_warn("Heki is supported by the active Hypervisor\n"); + + heki_counters_init(); + heki_arch_early_init(); } /*
From patchwork Mon Nov 13 02:23:22 2023
X-Patchwork-Submitter: Mickaël Salaün
X-Patchwork-Id: 13453569
From: Mickaël Salaün
Subject: [RFC PATCH v2 15/19] heki: x86: Initialize permissions counters for pages in vmap()/vunmap()
Date: Sun, 12 Nov 2023 21:23:22 -0500
Message-ID: <20231113022326.24388-16-mic@digikod.net>
In-Reply-To: <20231113022326.24388-1-mic@digikod.net>
References: <20231113022326.24388-1-mic@digikod.net>
X-Mailing-List: kvm@vger.kernel.org

From: Madhavan T.
Venkataraman When a page gets mapped, create permissions counters for it and initialize them based on the specified permissions. When a page gets unmapped, update the counters appropriately. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Mickaël Salaün Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Signed-off-by: Madhavan T. Venkataraman --- Changes since v1: * New patch --- include/linux/heki.h | 11 ++++++++++- mm/vmalloc.c | 7 +++++++ virt/heki/counters.c | 20 ++++++++++++++++++++ 3 files changed, 37 insertions(+), 1 deletion(-) diff --git a/include/linux/heki.h b/include/linux/heki.h index 86c787d121e0..d660994d34d0 100644 --- a/include/linux/heki.h +++ b/include/linux/heki.h @@ -68,7 +68,11 @@ struct heki_hypervisor { * pointer into this heki structure. * * During guest kernel boot, permissions counters for each guest page are - * initialized based on the page's current permissions. + * initialized based on the page's current permissions. Beyond this point, + * the counters are updated whenever: + * + * - a page is mapped into the kernel address space + * - a page is unmapped from the kernel address space */ struct heki { struct heki_hypervisor *hypervisor; @@ -77,6 +81,7 @@ struct heki { enum heki_cmd { HEKI_MAP, + HEKI_UNMAP, }; /* @@ -109,6 +114,7 @@ void heki_counters_init(void); void heki_walk(unsigned long va, unsigned long va_end, heki_func_t func, struct heki_args *args); void heki_map(unsigned long va, unsigned long end); +void heki_unmap(unsigned long va, unsigned long end); /* Arch-specific functions. 
*/ void heki_arch_early_init(void); @@ -125,6 +131,9 @@ static inline void heki_late_init(void) static inline void heki_map(unsigned long va, unsigned long end) { } +static inline void heki_unmap(unsigned long va, unsigned long end) +{ +} #endif /* CONFIG_HEKI */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index a3fedb3ee0db..d9096502e571 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -40,6 +40,7 @@ #include #include #include +#include #include #include @@ -301,6 +302,8 @@ static int vmap_range_noflush(unsigned long addr, unsigned long end, if (mask & ARCH_PAGE_TABLE_SYNC_MASK) arch_sync_kernel_mappings(start, end); + heki_map(start, end); + return err; } @@ -419,6 +422,8 @@ void __vunmap_range_noflush(unsigned long start, unsigned long end) pgtbl_mod_mask mask = 0; BUG_ON(addr >= end); + heki_unmap(start, end); + pgd = pgd_offset_k(addr); do { next = pgd_addr_end(addr, end); @@ -564,6 +569,8 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end, if (mask & ARCH_PAGE_TABLE_SYNC_MASK) arch_sync_kernel_mappings(start, end); + heki_map(start, end); + return 0; } diff --git a/virt/heki/counters.c b/virt/heki/counters.c index 7067449cabca..adc8d566b8a9 100644 --- a/virt/heki/counters.c +++ b/virt/heki/counters.c @@ -88,6 +88,13 @@ void heki_callback(struct heki_args *args) heki_update_counters(counters, 0, permissions, 0); break; + case HEKI_UNMAP: + if (WARN_ON_ONCE(!counters)) + break; + heki_update_counters(counters, permissions, 0, + permissions); + break; + default: WARN_ON_ONCE(1); break; @@ -124,6 +131,19 @@ void heki_map(unsigned long va, unsigned long end) heki_func(va, end, &args); } +/* + * Find the mappings in the given range and revert the permission counters for + * them. + */ +void heki_unmap(unsigned long va, unsigned long end) +{ + struct heki_args args = { + .cmd = HEKI_UNMAP, + }; + + heki_func(va, end, &args); +} + /* * Permissions counters are associated with each guest page using the * Memory Table feature. 
Initialize the permissions counters here.
From patchwork Mon Nov 13 02:23:23 2023
X-Patchwork-Submitter: Mickaël Salaün
X-Patchwork-Id: 13453575
From: Mickaël Salaün
Subject: [RFC PATCH v2 16/19] heki: x86: Update permissions counters when guest page permissions change
Date: Sun, 12 Nov 2023 21:23:23 -0500
Message-ID: <20231113022326.24388-17-mic@digikod.net>
In-Reply-To: <20231113022326.24388-1-mic@digikod.net>
References: <20231113022326.24388-1-mic@digikod.net>
X-Mailing-List: kvm@vger.kernel.org

From: Madhavan T. Venkataraman

When permissions are changed on an existing mapping, update the permissions counters.

Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Madhavan T. Venkataraman
Cc: Mickaël Salaün
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Signed-off-by: Madhavan T.
Venkataraman --- Changes since v1: * New patch --- arch/x86/mm/heki.c | 9 +++++++ arch/x86/mm/pat/set_memory.c | 51 ++++++++++++++++++++++++++++++++++++ include/linux/heki.h | 14 ++++++++++ virt/heki/counters.c | 23 ++++++++++++++++ 4 files changed, 97 insertions(+) diff --git a/arch/x86/mm/heki.c b/arch/x86/mm/heki.c index c495df0d8772..c0eace9e343f 100644 --- a/arch/x86/mm/heki.c +++ b/arch/x86/mm/heki.c @@ -54,3 +54,12 @@ unsigned long heki_flags_to_permissions(unsigned long flags) return permissions; } + +void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set, + unsigned long *clear) +{ + if (pgprot_val(prot) & _PAGE_RW) + *set |= MEM_ATTR_WRITE; + if (pgprot_val(prot) & _PAGE_NX) + *clear |= MEM_ATTR_EXEC; +} diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index bda9f129835e..6aaa1ce5692c 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include @@ -2056,11 +2057,56 @@ int clear_mce_nospec(unsigned long pfn) EXPORT_SYMBOL_GPL(clear_mce_nospec); #endif /* CONFIG_X86_64 */ +#ifdef CONFIG_HEKI + +static void heki_change_page_attr_set(unsigned long va, int numpages, + pgprot_t set) +{ + unsigned long va_end; + unsigned long set_permissions = 0, clear_permissions = 0; + + heki_pgprot_to_permissions(set, &set_permissions, &clear_permissions); + if (!(set_permissions | clear_permissions)) + return; + + va_end = va + (numpages << PAGE_SHIFT); + heki_update(va, va_end, set_permissions, clear_permissions); +} + +static void heki_change_page_attr_clear(unsigned long va, int numpages, + pgprot_t clear) +{ + unsigned long va_end; + unsigned long set_permissions = 0, clear_permissions = 0; + + heki_pgprot_to_permissions(clear, &clear_permissions, &set_permissions); + if (!(set_permissions | clear_permissions)) + return; + + va_end = va + (numpages << PAGE_SHIFT); + heki_update(va, va_end, set_permissions, clear_permissions); +} + +#else /* 
!CONFIG_HEKI */ + +static void heki_change_page_attr_set(unsigned long va, int numpages, + pgprot_t set) +{ +} + +static void heki_change_page_attr_clear(unsigned long va, int numpages, + pgprot_t clear) +{ +} + +#endif /* CONFIG_HEKI */ + int set_memory_x(unsigned long addr, int numpages) { if (!(__supported_pte_mask & _PAGE_NX)) return 0; + heki_change_page_attr_clear(addr, numpages, __pgprot(_PAGE_NX)); return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_NX), 0); } @@ -2069,11 +2115,14 @@ int set_memory_nx(unsigned long addr, int numpages) if (!(__supported_pte_mask & _PAGE_NX)) return 0; + heki_change_page_attr_set(addr, numpages, __pgprot(_PAGE_NX)); return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_NX), 0); } int set_memory_ro(unsigned long addr, int numpages) { + // TODO: What about _PAGE_DIRTY? + heki_change_page_attr_clear(addr, numpages, __pgprot(_PAGE_RW)); return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0); } @@ -2084,11 +2133,13 @@ int set_memory_rox(unsigned long addr, int numpages) if (__supported_pte_mask & _PAGE_NX) clr.pgprot |= _PAGE_NX; + heki_change_page_attr_clear(addr, numpages, clr); return change_page_attr_clear(&addr, numpages, clr, 0); } int set_memory_rw(unsigned long addr, int numpages) { + heki_change_page_attr_set(addr, numpages, __pgprot(_PAGE_RW)); return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_RW), 0); } diff --git a/include/linux/heki.h b/include/linux/heki.h index d660994d34d0..079b34af07f0 100644 --- a/include/linux/heki.h +++ b/include/linux/heki.h @@ -73,6 +73,7 @@ struct heki_hypervisor { * * - a page is mapped into the kernel address space * - a page is unmapped from the kernel address space + * - permissions are changed for a mapped page */ struct heki { struct heki_hypervisor *hypervisor; @@ -81,6 +82,7 @@ struct heki { enum heki_cmd { HEKI_MAP, + HEKI_UPDATE, HEKI_UNMAP, }; @@ -98,6 +100,10 @@ struct heki_args { /* Command passed by caller. 
*/ enum heki_cmd cmd; + + /* Permissions passed by heki_update(). */ + unsigned long set; + unsigned long clear; }; /* Callback function called by the table walker. */ @@ -114,11 +120,15 @@ void heki_counters_init(void); void heki_walk(unsigned long va, unsigned long va_end, heki_func_t func, struct heki_args *args); void heki_map(unsigned long va, unsigned long end); +void heki_update(unsigned long va, unsigned long end, unsigned long set, + unsigned long clear); void heki_unmap(unsigned long va, unsigned long end); /* Arch-specific functions. */ void heki_arch_early_init(void); unsigned long heki_flags_to_permissions(unsigned long flags); +void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set, + unsigned long *clear); #else /* !CONFIG_HEKI */ @@ -131,6 +141,10 @@ static inline void heki_late_init(void) static inline void heki_map(unsigned long va, unsigned long end) { } +static inline void heki_update(unsigned long va, unsigned long end, + unsigned long set, unsigned long clear) +{ +} static inline void heki_unmap(unsigned long va, unsigned long end) { } diff --git a/virt/heki/counters.c b/virt/heki/counters.c index adc8d566b8a9..d0f830b0775a 100644 --- a/virt/heki/counters.c +++ b/virt/heki/counters.c @@ -88,6 +88,13 @@ void heki_callback(struct heki_args *args) heki_update_counters(counters, 0, permissions, 0); break; + case HEKI_UPDATE: + if (!counters) + continue; + heki_update_counters(counters, permissions, args->set, + args->clear); + break; + case HEKI_UNMAP: if (WARN_ON_ONCE(!counters)) break; @@ -131,6 +138,22 @@ void heki_map(unsigned long va, unsigned long end) heki_func(va, end, &args); } +/* + * Find the mappings in the given range and update permission counters for + * them. Apply permissions in the host page table. 
+ */ +void heki_update(unsigned long va, unsigned long end, unsigned long set, + unsigned long clear) +{ + struct heki_args args = { + .cmd = HEKI_UPDATE, + .set = set, + .clear = clear, + }; + + heki_func(va, end, &args); +} + /* * Find the mappings in the given range and revert the permission counters for * them. From patchwork Mon Nov 13 02:23:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453498 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 030508472 for ; Mon, 13 Nov 2023 02:25:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="dJfAiYA2" Received: from smtp-1909.mail.infomaniak.ch (smtp-1909.mail.infomaniak.ch [IPv6:2001:1600:3:17::1909]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C55F6385D for ; Sun, 12 Nov 2023 18:25:06 -0800 (PST) Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-2-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCtx4lfyzMq2H9; Mon, 13 Nov 2023 02:24:57 +0000 (UTC) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCtw1z09zMpnPr; Mon, 13 Nov 2023 03:24:56 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842297; bh=x8fg6SD1g74kkGJ23TK2HH1loleby/1wIvw5whB02rE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dJfAiYA2HYA3jqI/ZQKjz5gzrQbNbqyogck83ckqbDcyi+QeXvv9eLzd5ebcYVOvo cLtZ+H/07iEl7T5kTrk2eFqbGuc3VumW1ct0jF8/XyG299t2fBuDo8YODWnquBbTNY ymQ6InNw/ozaj7PTIf8MX7lr4GmSX3s7IhzlnmOM= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . 
Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching Date: Sun, 12 Nov 2023 21:23:24 -0500 Message-ID: <20231113022326.24388-18-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha From: Madhavan T. Venkataraman X86 uses a function called __text_poke() to modify executable code. This patching function is used by many features such as KProbes and FTrace. Update the permissions counters for the text page so that write permissions can be temporarily established in the EPT to modify the instructions in that page. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. Venkataraman Cc: Mickaël Salaün Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Signed-off-by: Madhavan T. 
Venkataraman --- Changes since v1: * New patch --- arch/x86/kernel/alternative.c | 5 ++++ arch/x86/mm/heki.c | 49 +++++++++++++++++++++++++++++++++++ include/linux/heki.h | 14 ++++++++++ 3 files changed, 68 insertions(+) diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 517ee01503be..64fd8757ba5c 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -1801,6 +1802,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l */ pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL); + heki_text_poke_start(pages, cross_page_boundary ? 2 : 1, pgprot); /* * The lock is not really needed, but this allows to avoid open-coding. */ @@ -1865,7 +1867,10 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l } local_irq_restore(flags); + pte_unmap_unlock(ptep, ptl); + heki_text_poke_end(pages, cross_page_boundary ? 
2 : 1, pgprot); + return addr; } diff --git a/arch/x86/mm/heki.c b/arch/x86/mm/heki.c index c0eace9e343f..e4c60d8b4f2d 100644 --- a/arch/x86/mm/heki.c +++ b/arch/x86/mm/heki.c @@ -5,8 +5,11 @@ * Copyright © 2023 Microsoft Corporation */ +#include +#include #include #include +#include #ifdef pr_fmt #undef pr_fmt @@ -63,3 +66,49 @@ void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set, if (pgprot_val(prot) & _PAGE_NX) *clear |= MEM_ATTR_EXEC; } + +static unsigned long heki_pgprot_to_flags(pgprot_t prot) +{ + unsigned long flags = 0; + + if (pgprot_val(prot) & _PAGE_RW) + flags |= _PAGE_RW; + if (pgprot_val(prot) & _PAGE_NX) + flags |= _PAGE_NX; + return flags; +} + +static void heki_text_poke_common(struct page **pages, int npages, + pgprot_t prot, enum heki_cmd cmd) +{ + struct heki_args args = { + .cmd = cmd, + }; + unsigned long va = poking_addr; + int i; + + if (!heki.counters) + return; + + mutex_lock(&heki_lock); + + for (i = 0; i < npages; i++, va += PAGE_SIZE) { + args.va = va; + args.pa = page_to_pfn(pages[i]) << PAGE_SHIFT; + args.size = PAGE_SIZE; + args.flags = heki_pgprot_to_flags(prot); + heki_callback(&args); + } + + mutex_unlock(&heki_lock); +} + +void heki_text_poke_start(struct page **pages, int npages, pgprot_t prot) +{ + heki_text_poke_common(pages, npages, prot, HEKI_MAP); +} + +void heki_text_poke_end(struct page **pages, int npages, pgprot_t prot) +{ + heki_text_poke_common(pages, npages, prot, HEKI_UNMAP); +} diff --git a/include/linux/heki.h b/include/linux/heki.h index 079b34af07f0..6f2cfddc6dac 100644 --- a/include/linux/heki.h +++ b/include/linux/heki.h @@ -111,6 +111,7 @@ typedef void (*heki_func_t)(struct heki_args *args); extern struct heki heki; extern bool heki_enabled; +extern struct mutex heki_lock; extern bool __read_mostly enable_mbec; @@ -123,12 +124,15 @@ void heki_map(unsigned long va, unsigned long end); void heki_update(unsigned long va, unsigned long end, unsigned long set, unsigned long clear); void 
heki_unmap(unsigned long va, unsigned long end); +void heki_callback(struct heki_args *args); /* Arch-specific functions. */ void heki_arch_early_init(void); unsigned long heki_flags_to_permissions(unsigned long flags); void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set, unsigned long *clear); +void heki_text_poke_start(struct page **pages, int npages, pgprot_t prot); +void heki_text_poke_end(struct page **pages, int npages, pgprot_t prot); #else /* !CONFIG_HEKI */ @@ -149,6 +153,16 @@ static inline void heki_unmap(unsigned long va, unsigned long end) { } +/* Arch-specific functions. */ +static inline void heki_text_poke_start(struct page **pages, int npages, + pgprot_t prot) +{ +} +static inline void heki_text_poke_end(struct page **pages, int npages, + pgprot_t prot) +{ +} + #endif /* CONFIG_HEKI */ #endif /* __HEKI_H__ */ From patchwork Mon Nov 13 02:23:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453497 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C44D79EA for ; Mon, 13 Nov 2023 02:25:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="0mhnZA1x" X-Greylist: delayed 65 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Sun, 12 Nov 2023 18:25:02 PST Received: from smtp-bc09.mail.infomaniak.ch (smtp-bc09.mail.infomaniak.ch [IPv6:2001:1600:3:17::bc09]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4942830D5 for ; Sun, 12 Nov 2023 18:25:02 -0800 (PST) Received: from smtp-3-0000.mail.infomaniak.ch (unknown [10.4.36.107]) by smtp-2-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCv05yshzMq2HC; Mon, 13 Nov 2023 02:25:00 
+0000 (UTC) Received: from unknown by smtp-3-0000.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCtz4hvPz3W; Mon, 13 Nov 2023 03:24:59 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842300; bh=PATv6XqDzwsqZ2hzsduhnRZLYujicEUOyMqcIKoFSMY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=0mhnZA1xfB5IXGthCsMhNYtbubItWy18yziuYfeR6AWnv19fq1a5ISoUh4unXyOIh SC9+Jk+sOe/3QbX5C7M0oDyImgy94uTLkrpYq/BH7ntImAVOjNoVR0spQpk+hAJBca WdT/RtauYzVOLQlQ7Hw86Ab44ZapL/N2SdK3uwOw= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor Date: Sun, 12 Nov 2023 21:23:25 -0500 Message-ID: <20231113022326.24388-19-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha From: Madhavan T. 
Venkataraman Implement a hypervisor function, kvm_protect_memory(), that calls the KVM_HC_PROTECT_MEMORY hypercall to ask the KVM hypervisor to set the specified permissions on a list of guest pages. Using the protect_memory() function, set proper EPT permissions for all guest pages. Use the MEM_ATTR_IMMUTABLE property to protect the kernel static sections and the boot-time read-only sections. This ensures that a compromised guest cannot change the permissions of its main physical memory pages. However, this also disables any feature that may change the kernel's text section (e.g., ftrace, Kprobes), although such features can still be used on kernel modules. Module loading/unloading and eBPF JIT are allowed without restrictions for now, but we'll need a way to authenticate these code changes to really improve the guests' security. We plan to use module signatures, but there is no solution yet to authenticate eBPF programs. Being able to use ftrace and Kprobes in a secure way is a challenge that is not solved yet. We're looking for ideas to make this work. Likewise, the JUMP_LABEL feature cannot work because the kernel's text section is read-only. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. Venkataraman Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Co-developed-by: Mickaël Salaün Signed-off-by: Mickaël Salaün Signed-off-by: Madhavan T.
Venkataraman --- Changes since v1: * New patch --- arch/x86/kernel/kvm.c | 11 ++++++ arch/x86/kvm/mmu/mmu.c | 2 +- arch/x86/mm/heki.c | 21 ++++++++++ include/linux/heki.h | 26 ++++++++++++ virt/heki/Kconfig | 1 + virt/heki/counters.c | 90 ++++++++++++++++++++++++++++++++++++++++-- virt/heki/main.c | 83 +++++++++++++++++++++++++++++++++++++- 7 files changed, 229 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 8349f4ad3bbd..343615b0e3bf 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -1021,8 +1021,19 @@ static int kvm_lock_crs(void) return err; } +static int kvm_protect_memory(gpa_t pa) +{ + long err; + + WARN_ON_ONCE(in_interrupt()); + + err = kvm_hypercall1(KVM_HC_PROTECT_MEMORY, pa); + return err; +} + static struct heki_hypervisor kvm_heki_hypervisor = { .lock_crs = kvm_lock_crs, + .protect_memory = kvm_protect_memory, }; static void kvm_init_heki(void) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 2d09bcc35462..13be05e9ccf1 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -7374,7 +7374,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm, int level; lockdep_assert_held_write(&kvm->mmu_lock); - lockdep_assert_held(&kvm->slots_lock); + lockdep_assert_held(&kvm->slots_arch_lock); /* * The sequence matters here: upper levels consume the result of lower diff --git a/arch/x86/mm/heki.c b/arch/x86/mm/heki.c index e4c60d8b4f2d..6c3fa9defada 100644 --- a/arch/x86/mm/heki.c +++ b/arch/x86/mm/heki.c @@ -45,6 +45,19 @@ __init void heki_arch_early_init(void) heki_map(direct_map_end, kernel_end); } +void heki_arch_late_init(void) +{ + /* + * The permission counters for all existing kernel mappings have + * already been updated. Now, walk all the pages, compute their + * permissions from the counters and apply the permissions in the + * host page table. To accomplish this, we walk the direct map + * range. 
+ */ + heki_protect(direct_map_va, direct_map_end); + pr_warn("Guest memory protected\n"); +} + unsigned long heki_flags_to_permissions(unsigned long flags) { unsigned long permissions; @@ -67,6 +80,11 @@ void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set, *clear |= MEM_ATTR_EXEC; } +unsigned long heki_default_permissions(void) +{ + return MEM_ATTR_READ | MEM_ATTR_WRITE; +} + static unsigned long heki_pgprot_to_flags(pgprot_t prot) { unsigned long flags = 0; @@ -100,6 +118,9 @@ static void heki_text_poke_common(struct page **pages, int npages, heki_callback(&args); } + if (args.head) + heki_apply_permissions(&args); + mutex_unlock(&heki_lock); } diff --git a/include/linux/heki.h b/include/linux/heki.h index 6f2cfddc6dac..306bcec7ae92 100644 --- a/include/linux/heki.h +++ b/include/linux/heki.h @@ -15,6 +15,8 @@ #include #include #include +#include +#include #include #ifdef CONFIG_HEKI @@ -61,6 +63,7 @@ struct heki_page_list { */ struct heki_hypervisor { int (*lock_crs)(void); /* Lock control registers. */ + int (*protect_memory)(gpa_t pa); /* Protect guest memory */ }; /* @@ -74,16 +77,28 @@ struct heki_hypervisor { * - a page is mapped into the kernel address space * - a page is unmapped from the kernel address space * - permissions are changed for a mapped page + * + * At the end of kernel boot (before kicking off the init process), the + * permissions for guest pages are applied to the host page table. + * + * Beyond that point, the counters and host page table permissions are updated + * whenever: + * + * - a guest page is mapped into the kernel address space + * - a guest page is unmapped from the kernel address space + * - permissions are changed for a mapped guest page */ struct heki { struct heki_hypervisor *hypervisor; struct mem_table *counters; + bool protect_memory; }; enum heki_cmd { HEKI_MAP, HEKI_UPDATE, HEKI_UNMAP, + HEKI_PROTECT_MEMORY, }; /* @@ -103,7 +118,12 @@ struct heki_args { /* Permissions passed by heki_update(). 
*/ unsigned long set; + unsigned long set_global; unsigned long clear; + + /* Page list is built by the callback. */ + struct heki_page_list *head; + phys_addr_t head_pa; }; /* Callback function called by the table walker. */ @@ -125,14 +145,20 @@ void heki_update(unsigned long va, unsigned long end, unsigned long set, unsigned long clear); void heki_unmap(unsigned long va, unsigned long end); void heki_callback(struct heki_args *args); +void heki_protect(unsigned long va, unsigned long end); +void heki_add_pa(struct heki_args *args, phys_addr_t pa, + unsigned long permissions); +void heki_apply_permissions(struct heki_args *args); /* Arch-specific functions. */ void heki_arch_early_init(void); +void heki_arch_late_init(void); unsigned long heki_flags_to_permissions(unsigned long flags); void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set, unsigned long *clear); void heki_text_poke_start(struct page **pages, int npages, pgprot_t prot); void heki_text_poke_end(struct page **pages, int npages, pgprot_t prot); +unsigned long heki_default_permissions(void); #else /* !CONFIG_HEKI */ diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig index 6d956eb9d04b..9bde84cd759e 100644 --- a/virt/heki/Kconfig +++ b/virt/heki/Kconfig @@ -8,6 +8,7 @@ config HEKI select KVM_GENERIC_MEMORY_ATTRIBUTES depends on !X86_16BIT select SPARSEMEM + depends on !JUMP_LABEL help This feature enhances guest virtual machine security by taking advantage of security features provided by the hypervisor for guests. 
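Editorial sketch, not part of the patch: the counters.c changes that follow derive a page's effective permissions from its counters, OR in the global attributes requested by heki_protect(), and drop the immutable attribute whenever the page remains writable. The user-space model below mirrors that logic under stated assumptions: the MEM_ATTR_* bit values are placeholders (the real definitions are not shown in this part of the series), and struct counters_model, permissions_from() and effective_permissions() are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Placeholder bit values (assumption); the real MEM_ATTR_* values are defined elsewhere. */
#define MEM_ATTR_READ      (1UL << 0)
#define MEM_ATTR_WRITE     (1UL << 1)
#define MEM_ATTR_EXEC      (1UL << 2)
#define MEM_ATTR_IMMUTABLE (1UL << 3)

/* Hypothetical stand-in for the per-page counters used by the patch. */
struct counters_model {
	unsigned int read;
	unsigned int write;
	unsigned int execute;
};

/* Same default as heki_default_permissions() in the x86 code: read + write. */
static unsigned long default_permissions(void)
{
	return MEM_ATTR_READ | MEM_ATTR_WRITE;
}

/* A permission is granted as long as at least one mapping still needs it. */
static unsigned long permissions_from(const struct counters_model *c)
{
	unsigned long permissions = 0;

	if (!c)
		return default_permissions();
	if (c->read)
		permissions |= MEM_ATTR_READ;
	if (c->write)
		permissions |= MEM_ATTR_WRITE;
	if (c->execute)
		permissions |= MEM_ATTR_EXEC;
	/* A page without any counted permission falls back to the default. */
	return permissions ? permissions : default_permissions();
}

/* Add the global attributes, then drop "immutable" for writable pages. */
static unsigned long effective_permissions(const struct counters_model *c,
					   unsigned long set_global)
{
	unsigned long new = permissions_from(c) | set_global;

	if (new & MEM_ATTR_WRITE)
		new &= ~MEM_ATTR_IMMUTABLE;
	return new;
}
```

In this model, a text page mapped read + execute keeps the immutable attribute, while a page that is still writable somewhere (or has no counters at all) does not, which matches the intent stated in the commit message of leaving memory pools usable for later module loading.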
diff --git a/virt/heki/counters.c b/virt/heki/counters.c index d0f830b0775a..302113bee6ff 100644 --- a/virt/heki/counters.c +++ b/virt/heki/counters.c @@ -13,6 +13,25 @@ DEFINE_MUTEX(heki_lock); +static inline unsigned long heki_permissions(struct heki_counters *counters) +{ + unsigned long permissions; + + if (!counters) + return heki_default_permissions(); + + permissions = 0; + if (counters->read) + permissions |= MEM_ATTR_READ; + if (counters->write) + permissions |= MEM_ATTR_WRITE; + if (counters->execute) + permissions |= MEM_ATTR_EXEC; + if (!permissions) + permissions = heki_default_permissions(); + return permissions; +} + static void heki_update_counters(struct heki_counters *counters, unsigned long perm, unsigned long set, unsigned long clear) @@ -53,20 +72,38 @@ static struct heki_counters *heki_create_counters(struct mem_table *table, return counters; } +static void heki_check_counters(struct heki_counters *counters, + unsigned long permissions) +{ + /* + * If a permission has been added to a PTE directly, it will not be + * reflected in the counters. Adjust for that. This is a bit of a + * hack, really. + */ + if ((permissions & MEM_ATTR_READ) && !counters->read) + counters->read++; + if ((permissions & MEM_ATTR_WRITE) && !counters->write) + counters->write++; + if ((permissions & MEM_ATTR_EXEC) && !counters->execute) + counters->execute++; +} + void heki_callback(struct heki_args *args) { /* The VA is only for debug. It is not really used in this function. */ unsigned long va; phys_addr_t pa, pa_end; - unsigned long permissions; + unsigned long permissions, existing, new; void **entry; struct heki_counters *counters; unsigned int ignore; + bool protect_memory; if (!pfn_valid(args->pa >> PAGE_SHIFT)) return; permissions = heki_flags_to_permissions(args->flags); + protect_memory = heki.protect_memory; /* * Handle counters for a leaf entry in the kernel page table. 
@@ -80,6 +117,8 @@ void heki_callback(struct heki_args *args) else counters = NULL; + existing = heki_permissions(counters); + switch (args->cmd) { case HEKI_MAP: if (!counters) @@ -102,10 +141,30 @@ void heki_callback(struct heki_args *args) permissions); break; + case HEKI_PROTECT_MEMORY: + if (counters) + heki_check_counters(counters, permissions); + existing = 0; + break; + default: WARN_ON_ONCE(1); break; } + + new = heki_permissions(counters) | args->set_global; + + /* + * To be able to use a pool of allocated memory for new + * executable or read-only mappings (e.g., kernel module + * loading), ignores immutable attribute if memory can be + * changed. + */ + if (new & MEM_ATTR_WRITE) + new &= ~MEM_ATTR_IMMUTABLE; + + if (protect_memory && existing != new) + heki_add_pa(args, pa, new); } } @@ -120,14 +179,20 @@ static void heki_func(unsigned long va, unsigned long end, mutex_lock(&heki_lock); + if (args->cmd == HEKI_PROTECT_MEMORY) + heki.protect_memory = true; + heki_walk(va, end, heki_callback, args); + if (args->head) + heki_apply_permissions(args); + mutex_unlock(&heki_lock); } /* * Find the mappings in the given range and initialize permission counters for - * them. + * them. Apply permissions in the host page table. */ void heki_map(unsigned long va, unsigned long end) { @@ -138,6 +203,25 @@ void heki_map(unsigned long va, unsigned long end) heki_func(va, end, &args); } +/* + * The architecture calls this to protect all guest pages at the end of + * kernel init. Up to this point, only the counters for guest pages have been + * updated. No permissions have been applied on the host page table. + * Now, the permissions will be applied. + * + * Beyond this point, the host page table permissions will always be updated + * whenever the counters are updated. 
+ */ +void heki_protect(unsigned long va, unsigned long end) +{ + struct heki_args args = { + .cmd = HEKI_PROTECT_MEMORY, + .set_global = MEM_ATTR_IMMUTABLE, + }; + + heki_func(va, end, &args); +} + /* * Find the mappings in the given range and update permission counters for * them. Apply permissions in the host page table. @@ -156,7 +240,7 @@ void heki_update(unsigned long va, unsigned long end, unsigned long set, /* * Find the mappings in the given range and revert the permission counters for - * them. + * them. Apply permissions in the host page table. */ void heki_unmap(unsigned long va, unsigned long end) { diff --git a/virt/heki/main.c b/virt/heki/main.c index 0ab7de659e6f..5629334112e7 100644 --- a/virt/heki/main.c +++ b/virt/heki/main.c @@ -51,7 +51,7 @@ void heki_late_init(void) { struct heki_hypervisor *hypervisor = heki.hypervisor; - if (!heki_enabled || !heki.hypervisor) + if (!heki.counters) return; /* Locks control registers so a compromised guest cannot change them. */ @@ -59,6 +59,87 @@ void heki_late_init(void) return; pr_warn("Control registers locked\n"); + + heki_arch_late_init(); +} + +/* + * Build a list of guest pages with their permissions. This list will be + * passed to the VMM/Hypervisor to set these permissions in the host page + * table. 
+ */ +void heki_add_pa(struct heki_args *args, phys_addr_t pa, + unsigned long permissions) +{ + struct heki_page_list *list = args->head; + struct heki_pages *hpage; + u64 max_pages; + struct page *page; + bool new = false; + + max_pages = (PAGE_SIZE - sizeof(*list)) / sizeof(*hpage); +again: + if (!list || list->npages == max_pages) { + page = alloc_page(GFP_KERNEL); + if (WARN_ON_ONCE(!page)) + return; + + list = page_address(page); + list->npages = 0; + list->next_pa = args->head_pa; + list->next = args->head; + + args->head = list; + args->head_pa = page_to_pfn(page) << PAGE_SHIFT; + new = true; + } + + hpage = &list->pages[list->npages]; + if (new) { + hpage->pa = pa; + hpage->epa = pa + PAGE_SIZE; + hpage->permissions = permissions; + return; + } + + if (pa == hpage->epa && permissions == hpage->permissions) { + hpage->epa += PAGE_SIZE; + return; + } + + list->npages++; + new = true; + goto again; +} + +void heki_apply_permissions(struct heki_args *args) +{ + struct heki_hypervisor *hypervisor = heki.hypervisor; + struct heki_page_list *list = args->head; + phys_addr_t list_pa = args->head_pa; + struct page *page; + int ret; + + if (!list) + return; + + /* The very last one must be included. */ + list->npages++; + + /* Protect guest memory in the host page table. */ + ret = hypervisor->protect_memory(list_pa); + if (ret) { + pr_warn("Failed to set memory permission\n"); + return; + } + + /* Free all the pages in the page list. 
*/ + while (list) { + page = pfn_to_page(list_pa >> PAGE_SHIFT); + list_pa = list->next_pa; + list = list->next; + __free_pages(page, 0); + } } static int __init heki_parse_config(char *str) From patchwork Mon Nov 13 02:23:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13453499 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 863D6846B; Mon, 13 Nov 2023 02:25:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="HbDL//+3" Received: from smtp-8fab.mail.infomaniak.ch (smtp-8fab.mail.infomaniak.ch [IPv6:2001:1600:3:17::8fab]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F6F049DF; Sun, 12 Nov 2023 18:25:08 -0800 (PST) Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-2-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4STCv440z4zMq2H7; Mon, 13 Nov 2023 02:25:04 +0000 (UTC) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4STCv31cKszMppt7; Mon, 13 Nov 2023 03:25:03 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1699842304; bh=qU7rpwZb8Bl+xuVDwtq6MKJ75S7Uo4UAc0PUDjw5xlk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=HbDL//+3ATCwPmW6FeNdrIH7tZIwaMnw+QS1qdZf+B2mnT/MKQm4aP+mx66UUcjaD hsK1zTShrWzEU7FyzuuXAFeXbZ2X6WXdUpzO09fScu+mFKm7CXVRlh5HLUvFpTFOab J5+cWZjh90MaulzQheCaiQUYvIq2/sNhJR0yOd08= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . 
Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Chao Peng , "Edgecombe, Rick P" , Forrest Yuan Yu , James Gowans , James Morris , John Andersen , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Thara Gopinath , Trilok Soni , Wei Liu , Will Deacon , Yu Zhang , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [RFC PATCH v2 19/19] virt: Add Heki KUnit tests Date: Sun, 12 Nov 2023 21:23:26 -0500 Message-ID: <20231113022326.24388-20-mic@digikod.net> In-Reply-To: <20231113022326.24388-1-mic@digikod.net> References: <20231113022326.24388-1-mic@digikod.net> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Infomaniak-Routing: alpha This adds a new CONFIG_HEKI_TEST option to run tests at boot. Because we use some symbols that are not exported to modules (e.g., kernel_set_to_readonly), these tests cannot be built as modules. To run these tests, we need to boot the kernel with the heki_test=N boot argument, with N selecting a specific test: 1. heki_test_cr_disable_smep: Check CR pinning and try to disable SMEP. 2. heki_test_write_to_const: Check .rodata (const) protection. 3. heki_test_write_to_ro_after_init: Check __ro_after_init protection. 4. heki_test_exec: Check non-executable kernel memory. This way of selecting tests should no longer be required once the kernel properly handles the triggered synthetic page faults. For now, these page faults make the kernel loop.
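Editorial sketch, not part of the patch: the heki_test=N selection described above can be modelled in user space as follows. The four test names come from the list in this commit message; parse_heki_test(), heki_test_name() and the lookup table are hypothetical helpers written only for illustration, and the in-kernel code may well parse the parameter differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Test names as listed in the commit message; the table itself is illustrative. */
static const char *const heki_test_names[] = {
	"heki_test_cr_disable_smep",
	"heki_test_write_to_const",
	"heki_test_write_to_ro_after_init",
	"heki_test_exec",
};

#define HEKI_TEST_COUNT \
	(sizeof(heki_test_names) / sizeof(heki_test_names[0]))

/*
 * Hypothetical helper: parse a "heki_test=N" argument.
 * Returns N (1 to 4), or 0 if the argument is missing, malformed, or out
 * of range.
 */
static int parse_heki_test(const char *arg)
{
	static const char prefix[] = "heki_test=";
	char *end;
	long n;

	if (!arg || strncmp(arg, prefix, sizeof(prefix) - 1) != 0)
		return 0;

	arg += sizeof(prefix) - 1;
	n = strtol(arg, &end, 10);
	/* Reject an empty value, trailing characters, and out-of-range numbers. */
	if (end == arg || *end != '\0' || n < 1 || n > (long)HEKI_TEST_COUNT)
		return 0;
	return (int)n;
}

/* Map a validated test number to the name of the selected test. */
static const char *heki_test_name(int n)
{
	assert(n >= 1 && n <= (int)HEKI_TEST_COUNT);
	return heki_test_names[n - 1];
}
```

For instance, parse_heki_test("heki_test=2") selects the second entry, heki_test_write_to_const, while an out-of-range value such as heki_test=5 selects nothing. Running a single test per boot matters here because, as noted above, the synthetic page faults currently make the kernel loop.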
All these tests temporarily disable the related kernel self-protections and should then fail if Heki doesn't protect the kernel. They are verbose to make it easier to understand what is going on. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. Venkataraman Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Signed-off-by: Mickaël Salaün --- Changes since v1: * Move all tests to virt/heki/tests.c --- include/linux/heki.h | 1 + virt/heki/Kconfig | 12 +++ virt/heki/Makefile | 1 + virt/heki/main.c | 6 +- virt/heki/tests.c | 207 +++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 226 insertions(+), 1 deletion(-) create mode 100644 virt/heki/tests.c diff --git a/include/linux/heki.h b/include/linux/heki.h index 306bcec7ae92..9e2cf0051ab0 100644 --- a/include/linux/heki.h +++ b/include/linux/heki.h @@ -149,6 +149,7 @@ void heki_protect(unsigned long va, unsigned long end); void heki_add_pa(struct heki_args *args, phys_addr_t pa, unsigned long permissions); void heki_apply_permissions(struct heki_args *args); +void heki_run_test(void); /* Arch-specific functions. */ void heki_arch_early_init(void); diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig index 9bde84cd759e..fa814a921bb0 100644 --- a/virt/heki/Kconfig +++ b/virt/heki/Kconfig @@ -28,3 +28,15 @@ config HYPERVISOR_SUPPORTS_HEKI A hypervisor should select this when it can successfully build and run with CONFIG_HEKI. That is, it should provide all of the hypervisor support required for the Heki feature. + +config HEKI_TEST + bool "Tests for Heki" if !KUNIT_ALL_TESTS + depends on HEKI && KUNIT=y + default KUNIT_ALL_TESTS + help + Run Heki tests at runtime according to the heki_test=N boot + parameter, with N identifying the test to run (between 1 and 4). + + Before launching the init process, the system might not respond + because of unhandled kernel page fault. This will be fixed in a + next patch series.
diff --git a/virt/heki/Makefile b/virt/heki/Makefile
index 564f92faa9d8..a66cd0ba140b 100644
--- a/virt/heki/Makefile
+++ b/virt/heki/Makefile
@@ -3,3 +3,4 @@
 obj-y += main.o
 obj-y += walk.o
 obj-y += counters.o
+obj-y += tests.o
diff --git a/virt/heki/main.c b/virt/heki/main.c
index 5629334112e7..ce9984231996 100644
--- a/virt/heki/main.c
+++ b/virt/heki/main.c
@@ -51,8 +51,10 @@ void heki_late_init(void)
 {
 	struct heki_hypervisor *hypervisor = heki.hypervisor;
 
-	if (!heki.counters)
+	if (!heki.counters) {
+		heki_run_test();
 		return;
+	}
 
 	/* Locks control registers so a compromised guest cannot change them. */
 	if (WARN_ON(hypervisor->lock_crs()))
@@ -61,6 +63,8 @@ void heki_late_init(void)
 	pr_warn("Control registers locked\n");
 
 	heki_arch_late_init();
+
+	heki_run_test();
 }
 
 /*
diff --git a/virt/heki/tests.c b/virt/heki/tests.c
new file mode 100644
index 000000000000..6e6542b257f1
--- /dev/null
+++ b/virt/heki/tests.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Common code
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "common.h"
+
+#ifdef CONFIG_HEKI_TEST
+
+/* Heki test data */
+
+/* Takes two pages to not change permission of other read-only pages. */
+const char heki_test_const_buf[PAGE_SIZE * 2] = {};
+char heki_test_ro_after_init_buf[PAGE_SIZE * 2] __ro_after_init = {};
+
+long heki_test_exec_data(long);
+void _test_exec_data_end(void);
+
+/* Used to test ROP execution against the .rodata section. */
+/* clang-format off */
+asm(
+".pushsection .rodata;" // NOT .text section
+".global heki_test_exec_data;"
+".type heki_test_exec_data, @function;"
+"heki_test_exec_data:"
+ASM_ENDBR
+"movq %rdi, %rax;"
+"inc %rax;"
+ASM_RET
+".size heki_test_exec_data, .-heki_test_exec_data;"
+"_test_exec_data_end:"
+".popsection");
+/* clang-format on */
+
+static void heki_test_cr_disable_smep(struct kunit *test)
+{
+	unsigned long cr4;
+
+	/* SMEP should be initially enabled. */
+	KUNIT_ASSERT_TRUE(test, __read_cr4() & X86_CR4_SMEP);
+
+	kunit_warn(test,
+		   "Starting control register pinning tests with SMEP check\n");
+
+	/*
+	 * Trying to disable SMEP, bypassing kernel self-protection by not
+	 * using cr4_clear_bits(X86_CR4_SMEP).
+	 */
+	cr4 = __read_cr4() & ~X86_CR4_SMEP;
+	asm volatile("mov %0,%%cr4" : "+r"(cr4) : : "memory");
+
+	/* SMEP should still be enabled. */
+	KUNIT_ASSERT_TRUE(test, __read_cr4() & X86_CR4_SMEP);
+}
+
+static inline void print_addr(struct kunit *test, const char *const buf_name,
+			      void *const buf)
+{
+	const pte_t pte = *virt_to_kpte((unsigned long)buf);
+	const phys_addr_t paddr = slow_virt_to_phys(buf);
+	bool present = pte_flags(pte) & (_PAGE_PRESENT);
+	bool accessible = pte_accessible(&init_mm, pte);
+
+	kunit_warn(
+		test,
+		"%s vaddr:%llx paddr:%llx exec:%d write:%d present:%d accessible:%d\n",
+		buf_name, (unsigned long long)buf, paddr, !!pte_exec(pte),
+		!!pte_write(pte), present, accessible);
+}
+
+extern int kernel_set_to_readonly;
+
+static void heki_test_write_to_rodata(struct kunit *test,
+				      const char *const buf_name,
+				      char *const ro_buf)
+{
+	print_addr(test, buf_name, (void *)ro_buf);
+	KUNIT_EXPECT_EQ(test, 0, *ro_buf);
+
+	kunit_warn(
+		test,
+		"Bypassing kernel self-protection: mark memory as writable\n");
+	kernel_set_to_readonly = 0;
+	/*
+	 * Removes execute permission that might be set by bugdoor-exec,
+	 * because change_page_attr_clear() is not used by set_memory_rw().
+	 * This is required since commit 652c5bf380ad ("x86/mm: Refuse W^X
+	 * violations").
+	 */
+	KUNIT_ASSERT_FALSE(test, set_memory_nx((unsigned long)PTR_ALIGN_DOWN(
+						       ro_buf, PAGE_SIZE),
+					       1));
+	KUNIT_ASSERT_FALSE(test, set_memory_rw((unsigned long)PTR_ALIGN_DOWN(
+						       ro_buf, PAGE_SIZE),
+					       1));
+	kernel_set_to_readonly = 1;
+
+	kunit_warn(test, "Trying memory write\n");
+	*ro_buf = 0x11;
+	KUNIT_EXPECT_EQ(test, 0, *ro_buf);
+	kunit_warn(test, "New content: 0x%02x\n", *ro_buf);
+}
+
+static void heki_test_write_to_const(struct kunit *test)
+{
+	heki_test_write_to_rodata(test, "const_buf",
+				  (void *)heki_test_const_buf);
+}
+
+static void heki_test_write_to_ro_after_init(struct kunit *test)
+{
+	heki_test_write_to_rodata(test, "ro_after_init_buf",
+				  (void *)heki_test_ro_after_init_buf);
+}
+
+typedef long test_exec_t(long);
+
+static void heki_test_exec(struct kunit *test)
+{
+	const size_t exec_size = 7;
+	unsigned long nx_page_start = (unsigned long)PTR_ALIGN_DOWN(
+		(const void *const)heki_test_exec_data, PAGE_SIZE);
+	unsigned long nx_page_end = (unsigned long)PTR_ALIGN(
+		(const void *const)heki_test_exec_data + exec_size, PAGE_SIZE);
+	test_exec_t *exec = (test_exec_t *)heki_test_exec_data;
+	long ret;
+
+	/* Starting non-executable memory tests. */
+	print_addr(test, "test_exec_data", heki_test_exec_data);
+
+	kunit_warn(
+		test,
+		"Bypassing kernel self-protection: mark memory as executable\n");
+	kernel_set_to_readonly = 0;
+	KUNIT_ASSERT_FALSE(test,
+			   set_memory_rox(nx_page_start,
+					  PFN_UP(nx_page_end - nx_page_start)));
+	kernel_set_to_readonly = 1;
+
+	kunit_warn(
+		test,
+		"Trying to execute data (ROP) in (initially) non-executable memory\n");
+	ret = exec(3);
+
+	/* This should not be reached because of the uncaught page fault. */
+	KUNIT_EXPECT_EQ(test, 3, ret);
+	kunit_warn(test, "Result of execution: 3 + 1 = %ld\n", ret);
+}
+
+const struct kunit_case heki_test_cases[] = {
+	KUNIT_CASE(heki_test_cr_disable_smep),
+	KUNIT_CASE(heki_test_write_to_const),
+	KUNIT_CASE(heki_test_write_to_ro_after_init),
+	KUNIT_CASE(heki_test_exec),
+	{}
+};
+
+static unsigned long heki_test __ro_after_init;
+
+static int __init parse_heki_test_config(char *str)
+{
+	if (kstrtoul(str, 10, &heki_test) ||
+	    heki_test > (ARRAY_SIZE(heki_test_cases) - 1))
+		pr_warn("Invalid option string for heki_test: '%s'\n", str);
+	return 1;
+}
+
+__setup("heki_test=", parse_heki_test_config);
+
+void heki_run_test(void)
+{
+	struct kunit_case heki_test_case[2] = {};
+	struct kunit_suite heki_test_suite = {
+		.name = "heki",
+		.test_cases = heki_test_case,
+	};
+	struct kunit_suite *const test_suite = &heki_test_suite;
+
+	if (!kunit_enabled() || heki_test == 0 ||
+	    heki_test >= ARRAY_SIZE(heki_test_cases))
+		return;
+
+	pr_warn("Running test #%lu\n", heki_test);
+	heki_test_case[0] = heki_test_cases[heki_test - 1];
+	__kunit_test_suites_init(&test_suite, 1);
+}
+
+#else /* CONFIG_HEKI_TEST */
+
+void heki_run_test(void)
+{
+}
+
+#endif /* CONFIG_HEKI_TEST */
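
For reference, the heki_test=N selection done by parse_heki_test_config()
and heki_run_test() can be sketched in userspace C. This is only an
illustration under stated assumptions and not part of the patch:
select_test() and test_names[] are hypothetical names, strtoul() stands in
for the kernel's kstrtoul() (whose parsing rules differ slightly), and the
trailing NULL entry mirrors the {} sentinel of heki_test_cases[].

```c
/* Userspace sketch only; not kernel code and not part of the patch. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Test names copied from the patch. The trailing NULL plays the role of
 * the empty {} sentinel that terminates heki_test_cases[].
 */
static const char *const test_names[] = {
	"heki_test_cr_disable_smep",
	"heki_test_write_to_const",
	"heki_test_write_to_ro_after_init",
	"heki_test_exec",
	NULL,
};

/*
 * Hypothetical helper: maps the string given to heki_test= to a test
 * name. Returns NULL when no test would run (0, out of range, or not a
 * plain decimal number).
 */
static const char *select_test(const char *arg)
{
	char *end;
	unsigned long n;

	assert(arg != NULL);

	errno = 0;
	n = strtoul(arg, &end, 10);
	if (errno || end == arg || *end != '\0')
		return NULL;

	/* Same bounds as heki_run_test(): 0 disables, N is 1-based. */
	if (n == 0 || n >= ARRAY_SIZE(test_names))
		return NULL;

	return test_names[n - 1];
}
```

With this table, select_test("1") yields heki_test_cr_disable_smep and
select_test("4") yields heki_test_exec, while "0", "5", and non-numeric
strings select nothing, which matches the bounds checks in heki_run_test().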