From patchwork Fri May 5 15:20:38 2023
X-Patchwork-Id: 13232763
From: Mickaël Salaün <mic@digikod.net>
Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Rick Edgecombe , Thara Gopinath , Will Deacon , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v1 1/9] KVM: x86: Add kvm_x86_ops.fault_gva() Date: Fri, 5 May 2023 17:20:38 +0200 Message-Id: <20230505152046.6575-2-mic@digikod.net> In-Reply-To: <20230505152046.6575-1-mic@digikod.net> References: <20230505152046.6575-1-mic@digikod.net> MIME-Version: 1.0 X-Infomaniak-Routing: alpha Received-SPF: pass client-ip=2001:1600:3:17::42ab; envelope-from=mic@digikod.net; helo=smtp-42ab.mail.infomaniak.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Fri, 05 May 2023 11:31:05 -0400 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org This function is needed for kvm_mmu_page_fault() to create synthetic page faults. Code originally written by Mihai Donțu and Nicușor Cîțu: https://lore.kernel.org/r/20211006173113.26445-18-alazar@bitdefender.com Renamed fault_gla() to fault_gva() and use the new EPT_VIOLATION_GVA_IS_VALID. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. 
Cc: Madhavan T. Venkataraman
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Co-developed-by: Mihai Donțu
Signed-off-by: Mihai Donțu
Co-developed-by: Nicușor Cîțu
Signed-off-by: Nicușor Cîțu
Signed-off-by: Mickaël Salaün
Link: https://lore.kernel.org/r/20230505152046.6575-2-mic@digikod.net
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/svm/svm.c             |  9 +++++++++
 arch/x86/kvm/vmx/vmx.c             | 10 ++++++++++
 4 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index abccd51dcfca..b761182a9444 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -131,6 +131,7 @@ KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP(fault_gva)

 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6aaae18f1854..f319bcdeb8bd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1706,6 +1706,8 @@ struct kvm_x86_ops {
	 * Returns vCPU specific APICv inhibit reasons
	 */
	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+	u64 (*fault_gva)(struct kvm_vcpu *vcpu);
 };

 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9a194aa1a75a..8b47b38aaf7f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4700,6 +4700,13 @@ static int svm_vm_init(struct kvm *kvm)
	return 0;
 }

+static u64 svm_fault_gva(struct kvm_vcpu *vcpu)
+{
+	const struct vcpu_svm *svm = to_svm(vcpu);
+
+	return svm->vcpu.arch.cr2 ? svm->vcpu.arch.cr2 : ~0ull;
+}
+
 static struct kvm_x86_ops svm_x86_ops __initdata = {
	.name = "kvm_amd",

@@ -4826,6 +4833,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
+
+	.fault_gva = svm_fault_gva,
 };

 /*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7eec0226d56a..9870db887a62 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8067,6 +8067,14 @@ static void vmx_vm_destroy(struct kvm *kvm)
	free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
 }

+static u64 vmx_fault_gva(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.exit_qualification & EPT_VIOLATION_GVA_IS_VALID)
+		return vmcs_readl(GUEST_LINEAR_ADDRESS);
+
+	return ~0ull;
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
	.name = "kvm_intel",

@@ -8204,6 +8212,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
	.complete_emulated_msr = kvm_complete_insn_gp,

	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
+
+	.fault_gva = vmx_fault_gva,
 };

 static unsigned int vmx_handle_intel_pt_intr(void)
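As an illustration of how the new hook is meant to be consumed, here is a
minimal sketch. The static call name comes from the KVM_X86_OP() machinery
above; the wrapper function itself is hypothetical:

	/*
	 * Hypothetical helper: ask the vendor module (VMX or SVM) for the
	 * guest virtual address of the current fault, or ~0ull if unknown.
	 */
	static u64 example_fault_gva(struct kvm_vcpu *vcpu)
	{
		return static_call(kvm_x86_fault_gva)(vcpu);
	}

The next commit uses exactly this call to fill the .address field of a
synthetic page fault.
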
From patchwork Fri May 5 15:20:39 2023
X-Patchwork-Id: 13232765
From: Mickaël Salaün <mic@digikod.net>
Subject: [PATCH v1 2/9] KVM: x86/mmu: Add support for prewrite page tracking
Date: Fri, 5 May 2023 17:20:39 +0200
Message-Id: <20230505152046.6575-3-mic@digikod.net>
In-Reply-To: <20230505152046.6575-1-mic@digikod.net>

Add a new page tracking mode to deny a page update and throw a page
fault to the guest. This is useful for KVM to be able to make some
pages non-writable (not read-only, since "non-writable" does not imply
any execution restriction); see the next Heki commits.
This kind of synthetic kernel page fault needs to be handled by the
guest, which is not currently the case: for now the guest retries the
faulting instruction again and again. Guest support will be part of a
follow-up patch series.

Update emulator_read_write_onepage() to handle X86EMUL_CONTINUE and
X86EMUL_PROPAGATE_FAULT.

Update page_fault_handle_page_track() to call
kvm_slot_page_track_is_active() for both KVM_PAGE_TRACK_PREWRITE and
KVM_PAGE_TRACK_WRITE, even if one tracker already returned true.

Invert the return code semantics of read_emulate() and write_emulate():
- from 1=Ok, 0=Error
- to the X86EMUL_* return codes (e.g. X86EMUL_CONTINUE == 0)

Imported the prewrite page tracking support originally written by Mihai
Donțu, Marian Rotariu, and Ștefan Șicleru:
https://lore.kernel.org/r/20211006173113.26445-27-alazar@bitdefender.com
https://lore.kernel.org/r/20211006173113.26445-28-alazar@bitdefender.com

Removed the GVA changes for page tracking, the X86EMUL_RETRY_INSTR case,
and some of the emulation parts for now.

Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Madhavan T. Venkataraman
Cc: Marian Rotariu
Cc: Mihai Donțu
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Cc: Ștefan Șicleru
Signed-off-by: Mickaël Salaün
Link: https://lore.kernel.org/r/20230505152046.6575-3-mic@digikod.net
---
 arch/x86/include/asm/kvm_page_track.h | 12 +++++
 arch/x86/kvm/mmu/mmu.c                | 64 ++++++++++++++++++++++-----
 arch/x86/kvm/mmu/page_track.c         | 33 +++++++++++++-
 arch/x86/kvm/mmu/spte.c               |  6 +++
 arch/x86/kvm/x86.c                    | 27 +++++++----
 5 files changed, 122 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a..a7fb4ff888e6 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -3,6 +3,7 @@
 #define _ASM_X86_KVM_PAGE_TRACK_H

 enum kvm_page_track_mode {
+	KVM_PAGE_TRACK_PREWRITE,
	KVM_PAGE_TRACK_WRITE,
	KVM_PAGE_TRACK_MAX,
 };
@@ -22,6 +23,16 @@ struct kvm_page_track_notifier_head {
 struct kvm_page_track_notifier_node {
	struct hlist_node node;

+	/*
+	 * It is called when the guest is writing the write-tracked page
+	 * and the write emulation hasn't happened yet.
+	 *
+	 * @vcpu: the vcpu where the write access happened
+	 * @gpa: the physical address written by guest
+	 * @node: this node
+	 */
+	bool (*track_prewrite)(struct kvm_vcpu *vcpu, gpa_t gpa,
+			       struct kvm_page_track_notifier_node *node);
	/*
	 * It is called when guest is writing the write-tracked page
	 * and write emulation is finished at that time.
@@ -73,6 +84,7 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void kvm_page_track_unregister_notifier(struct kvm *kvm,
					struct kvm_page_track_notifier_node *n);

+bool kvm_page_track_prewrite(struct kvm_vcpu *vcpu, gpa_t gpa);
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
			  int bytes);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 835426254e76..e5d1e241ff0f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -793,9 +793,13 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
	slot = __gfn_to_memslot(slots, gfn);

	/* the non-leaf shadow pages are keeping readonly. */
-	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_add_page(kvm, slot, gfn,
-						    KVM_PAGE_TRACK_WRITE);
+	if (sp->role.level > PG_LEVEL_4K) {
+		kvm_slot_page_track_add_page(kvm, slot, gfn,
+					     KVM_PAGE_TRACK_PREWRITE);
+		kvm_slot_page_track_add_page(kvm, slot, gfn,
+					     KVM_PAGE_TRACK_WRITE);
+		return;
+	}

	kvm_mmu_gfn_disallow_lpage(slot, gfn);

@@ -840,9 +844,13 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
	gfn = sp->gfn;
	slots = kvm_memslots_for_spte_role(kvm, sp->role);
	slot = __gfn_to_memslot(slots, gfn);
-	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_remove_page(kvm, slot, gfn,
-						       KVM_PAGE_TRACK_WRITE);
+	if (sp->role.level > PG_LEVEL_4K) {
+		kvm_slot_page_track_remove_page(kvm, slot, gfn,
+						KVM_PAGE_TRACK_PREWRITE);
+		kvm_slot_page_track_remove_page(kvm, slot, gfn,
+						KVM_PAGE_TRACK_WRITE);
+		return;
+	}

	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
@@ -2714,7 +2722,10 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
	 * track machinery is used to write-protect upper-level shadow pages,
	 * i.e. this guards the role.level == 4K assertion below!
	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(kvm, slot, gfn,
+					  KVM_PAGE_TRACK_PREWRITE) ||
+	    kvm_slot_page_track_is_active(kvm, slot, gfn,
+					  KVM_PAGE_TRACK_WRITE))
		return -EPERM;

	/*
@@ -4103,6 +4114,8 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
					 struct kvm_page_fault *fault)
 {
+	bool ret = false;
+
	if (unlikely(fault->rsvd))
		return false;

@@ -4113,10 +4126,14 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
	 * guest is writing the page which is write tracked which can
	 * not be fixed by page fault handler.
	 */
-	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
-		return true;
+	ret = kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn,
+					    KVM_PAGE_TRACK_PREWRITE) ||
+	      ret;
+	ret = kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn,
+					    KVM_PAGE_TRACK_WRITE) ||
+	      ret;

-	return false;
+	return ret;
 }

 static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
@@ -5600,6 +5617,33 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
	if (r != RET_PF_EMULATE)
		return 1;

+	if ((error_code & PFERR_WRITE_MASK) &&
+	    !kvm_page_track_prewrite(vcpu, cr2_or_gpa)) {
+		struct x86_exception fault = {
+			.vector = PF_VECTOR,
+			.error_code_valid = true,
+			.error_code = error_code,
+			.nested_page_fault = false,
+			/*
+			 * TODO: This kind of kernel page fault needs to be handled by
+			 * the guest, which is not currently the case, making it try
+			 * again and again.
+			 *
+			 * You may want to test with cr2_or_gva to see the page
+			 * fault caught by the guest kernel (thinking it is a
+			 * user space fault).
+			 */
+			.address = static_call(kvm_x86_fault_gva)(vcpu),
+			.async_page_fault = false,
+		};
+
+		pr_warn_ratelimited(
+			"heki-kvm: Creating write #PF at 0x%016llx\n",
+			fault.address);
+		kvm_inject_page_fault(vcpu, &fault);
+		return RET_PF_INVALID;
+	}
+
	/*
	 * Before emulating the instruction, check if the error code
	 * was due to a RO violation while translating the guest page.
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2e09d1b6249f..2454887cd48b 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -131,9 +131,10 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
	 */
	kvm_mmu_gfn_disallow_lpage(slot, gfn);

-	if (mode == KVM_PAGE_TRACK_WRITE)
+	if (mode == KVM_PAGE_TRACK_PREWRITE || mode == KVM_PAGE_TRACK_WRITE) {
		if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
			kvm_flush_remote_tlbs(kvm);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);

@@ -248,6 +249,36 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);

+/*
+ * Notify the nodes that a write access is about to happen. Returning false
+ * doesn't stop the other nodes from being called, but it will stop
+ * the emulation.
+ *
+ * Each node should figure out by itself whether the written page is one
+ * that the node is interested in.
+ */
+bool kvm_page_track_prewrite(struct kvm_vcpu *vcpu, gpa_t gpa)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+	bool ret = true;
+
+	head = &vcpu->kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return ret;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
+				  srcu_read_lock_held(&head->track_srcu))
+		if (n->track_prewrite)
+			if (!n->track_prewrite(vcpu, gpa, n))
+				ret = false;
+	srcu_read_unlock(&head->track_srcu, idx);
+	return ret;
+}
+
 /*
  * Notify the node that write access is intercepted and write emulation is
  * finished at this time.
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index c0fd7e049b4e..639f220a1ed5 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -144,6 +144,12 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
	u64 spte = SPTE_MMU_PRESENT_MASK;
	bool wrprot = false;

+	if (kvm_slot_page_track_is_active(vcpu->kvm, slot, gfn,
+					  KVM_PAGE_TRACK_PREWRITE) ||
+	    kvm_slot_page_track_is_active(vcpu->kvm, slot, gfn,
+					  KVM_PAGE_TRACK_WRITE))
+		pte_access &= ~ACC_WRITE_MASK;
+
	WARN_ON_ONCE(!pte_access && !shadow_present_mask);

	if (sp->role.ad_disabled)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a2c299d47e69..fd05f42c9913 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7325,6 +7325,7 @@ static int kvm_write_guest_virt_helper(gva_t addr, void *val, unsigned int bytes
			r = X86EMUL_IO_NEEDED;
			goto out;
		}
+		kvm_page_track_write(vcpu, gpa, data, towrite);

		bytes -= towrite;
		data += towrite;
@@ -7441,13 +7442,12 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
 int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
			const void *val, int bytes)
 {
-	int ret;
-
-	ret = kvm_vcpu_write_guest(vcpu, gpa, val, bytes);
-	if (ret < 0)
-		return 0;
+	if (!kvm_page_track_prewrite(vcpu, gpa))
+		return X86EMUL_PROPAGATE_FAULT;
+	if (kvm_vcpu_write_guest(vcpu, gpa, val, bytes))
+		return X86EMUL_UNHANDLEABLE;
	kvm_page_track_write(vcpu, gpa, val, bytes);
-	return 1;
+	return X86EMUL_CONTINUE;
 }

 struct read_write_emulator_ops {
@@ -7477,7 +7477,9 @@ static int read_prepare(struct kvm_vcpu *vcpu, void *val, int bytes)
 static int read_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
			void *val, int bytes)
 {
-	return !kvm_vcpu_read_guest(vcpu, gpa, val, bytes);
+	if (kvm_vcpu_read_guest(vcpu, gpa, val, bytes))
+		return X86EMUL_UNHANDLEABLE;
+	return X86EMUL_CONTINUE;
 }

 static int write_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -7551,8 +7553,12 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
		return X86EMUL_PROPAGATE_FAULT;
	}

-	if (!ret && ops->read_write_emulate(vcpu, gpa, val, bytes))
-		return X86EMUL_CONTINUE;
+	if (!ret) {
+		ret = ops->read_write_emulate(vcpu, gpa, val, bytes);
+		if (ret != X86EMUL_UNHANDLEABLE)
+			/* Handles X86EMUL_CONTINUE and X86EMUL_PROPAGATE_FAULT. */
+			return ret;
+	}

	/*
	 * Is this MMIO handled locally?
@@ -7689,6 +7695,9 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
	if (kvm_is_error_hva(hva))
		goto emul_write;

+	if (!kvm_page_track_prewrite(vcpu, gpa))
+		return X86EMUL_PROPAGATE_FAULT;
+
	hva += offset_in_page(gpa);

	switch (bytes) {
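To illustrate the new notifier from the consumer side, here is a minimal
sketch of registering a prewrite handler, modeled on what a later commit
in this series does; the demo_* names and the is_write_protected_gpa()
helper are made up:

	static bool demo_track_prewrite(struct kvm_vcpu *vcpu, gpa_t gpa,
					struct kvm_page_track_notifier_node *node)
	{
		/* Returning false denies the write and stops the emulation. */
		return !is_write_protected_gpa(vcpu->kvm, gpa);
	}

	static int demo_register(struct kvm *kvm)
	{
		struct kvm_page_track_notifier_node *node =
			kzalloc(sizeof(*node), GFP_KERNEL);

		if (!node)
			return -ENOMEM;

		node->track_prewrite = demo_track_prewrite;
		kvm_page_track_register_notifier(kvm, node);
		return 0;
	}
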
Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Rick Edgecombe , Thara Gopinath , Will Deacon , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v1 3/9] virt: Implement Heki common code Date: Fri, 5 May 2023 17:20:40 +0200 Message-Id: <20230505152046.6575-4-mic@digikod.net> In-Reply-To: <20230505152046.6575-1-mic@digikod.net> References: <20230505152046.6575-1-mic@digikod.net> MIME-Version: 1.0 X-Infomaniak-Routing: alpha Received-SPF: pass client-ip=2001:1600:4:17::bc0e; envelope-from=mic@digikod.net; helo=smtp-bc0e.mail.infomaniak.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, T_SPF_HELO_TEMPERROR=0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Fri, 05 May 2023 11:31:05 -0400 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Madhavan T. Venkataraman Hypervisor Enforced Kernel Integrity (Heki) is a feature that will use the hypervisor to enhance guest virtual machine security. Configuration ============= Define the config variables for the feature. This feature depends on support from the architecture as well as the hypervisor. Enabling HEKI ============= Define a kernel command line parameter "heki" to turn the feature on or off. By default, Heki is on. Feature initialization ====================== The linker script, vmlinux.lds.S, defines a number of sections that are loaded in kernel memory. Each of these sections has its own permissions. For instance, .text has HEKI_ATTR_MEM_EXEC | HEKI_ATTR_MEM_NOWRITE, and .rodata has HEKI_ATTR_MEM_NOWRITE. Define an architecture specific init function, heki_arch_init(). In this function, collect the ranges of all of the sections. These sections will be protected in the host page table with their respective permissions so that even if the guest kernel is compromised, their permissions cannot be changed. Define heki_early_init() to initialize the feature. For now, this function just checks if the feature is enabled and calls heki_arch_init(). Define heki_late_init() that protects the sections in the host page table. This needs hypervisor support which will be introduced in the future. This function is called at the end of kernel init. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. Venkataraman Cc: Mickaël Salaün Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Signed-off-by: Madhavan T. 
Signed-off-by: Madhavan T. Venkataraman
Link: https://lore.kernel.org/r/20230505152046.6575-4-mic@digikod.net
---
 Kconfig                         |   2 +
 arch/x86/Kconfig                |   1 +
 arch/x86/include/asm/sections.h |   4 +
 arch/x86/kernel/setup.c         |  49 ++++++++++++
 include/linux/heki.h            |  90 +++++++++++++++++++++
 init/main.c                     |   3 +
 virt/Makefile                   |   1 +
 virt/heki/Kconfig               |  22 ++++++
 virt/heki/Makefile              |   3 +
 virt/heki/heki.c                | 135 ++++++++++++++++++++++++++++++++
 10 files changed, 310 insertions(+)
 create mode 100644 include/linux/heki.h
 create mode 100644 virt/heki/Kconfig
 create mode 100644 virt/heki/Makefile
 create mode 100644 virt/heki/heki.c

diff --git a/Kconfig b/Kconfig
index 745bc773f567..0c844d9bcb03 100644
--- a/Kconfig
+++ b/Kconfig
@@ -29,4 +29,6 @@ source "lib/Kconfig"

 source "lib/Kconfig.debug"

+source "virt/heki/Kconfig"
+
 source "Documentation/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3604074a878b..5cf5a7a97811 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -297,6 +297,7 @@ config X86
	select FUNCTION_ALIGNMENT_4B
	imply IMA_SECURE_AND_OR_TRUSTED_BOOT	if EFI
	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
+	select ARCH_SUPPORTS_HEKI		if X86_64

 config INSTRUCTION_DECODER
	def_bool y
diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index a6e8373a5170..42ef1e33b8a5 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -18,6 +18,10 @@ extern char __end_of_kernel_reserve[];

 extern unsigned long _brk_start, _brk_end;

+extern int __start_orc_unwind_ip[], __stop_orc_unwind_ip[];
+extern struct orc_entry __start_orc_unwind[], __stop_orc_unwind[];
+extern unsigned int orc_lookup[], orc_lookup_end[];
+
 static inline bool arch_is_kernel_initmem_freed(unsigned long addr)
 {
	/*
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 88188549647c..f0ddaf24ab63 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -850,6 +851,54 @@ static void __init x86_report_nx(void)
	}
 }

+#ifdef CONFIG_HEKI
+
+/*
+ * Gather all of the statically defined sections so heki_late_init() can
+ * protect these sections in the host page table.
+ *
+ * The sections are defined under "SECTIONS" in vmlinux.lds.S
+ * Keep this array in sync with SECTIONS.
+ */
+struct heki_va_range __initdata heki_va_ranges[] = {
+	{
+		.va_start = _stext,
+		.va_end = _etext,
+		.attributes = HEKI_ATTR_MEM_NOWRITE | HEKI_ATTR_MEM_EXEC,
+	},
+	{
+		.va_start = __start_rodata,
+		.va_end = __end_rodata,
+		.attributes = HEKI_ATTR_MEM_NOWRITE,
+	},
+#ifdef CONFIG_UNWINDER_ORC
+	{
+		.va_start = __start_orc_unwind_ip,
+		.va_end = __stop_orc_unwind_ip,
+		.attributes = HEKI_ATTR_MEM_NOWRITE,
+	},
+	{
+		.va_start = __start_orc_unwind,
+		.va_end = __stop_orc_unwind,
+		.attributes = HEKI_ATTR_MEM_NOWRITE,
+	},
+	{
+		.va_start = orc_lookup,
+		.va_end = orc_lookup_end,
+		.attributes = HEKI_ATTR_MEM_NOWRITE,
+	},
+#endif /* CONFIG_UNWINDER_ORC */
+};
+
+void __init heki_arch_init(void)
+{
+	heki.num_static_ranges = ARRAY_SIZE(heki_va_ranges);
+	heki.static_ranges =
+		heki_alloc_pa_ranges(heki_va_ranges, heki.num_static_ranges);
+}
+
+#endif /* CONFIG_HEKI */
+
 /*
  * Determine if we were loaded by an EFI loader. If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
diff --git a/include/linux/heki.h b/include/linux/heki.h
new file mode 100644
index 000000000000..e4a3192ba687
--- /dev/null
+++ b/include/linux/heki.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Headers
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#ifndef __HEKI_H__
+#define __HEKI_H__
+
+#ifdef CONFIG_HEKI
+
+#include
+
+/* Heki attributes for memory pages. */
+/* clang-format off */
+#define HEKI_ATTR_MEM_NOWRITE	(1ULL << 0)
+#define HEKI_ATTR_MEM_EXEC	(1ULL << 1)
+/* clang-format on */
+
+/*
+ * heki_va_range is used to specify a virtual address range within the kernel
+ * address space along with its attributes.
+ */
+struct heki_va_range {
+	void *va_start;
+	void *va_end;
+	u64 attributes;
+};
+
+/*
+ * heki_pa_range is passed to the VMM or hypervisor so it can be processed by
+ * the VMM or the hypervisor based on range attributes. Examples of ranges:
+ *
+ * - a range whose permissions need to be set in the host page table
+ * - a range that contains information needed for authentication
+ *
+ * When an array of these is passed to the hypervisor or VMM, the array
+ * must be in physically contiguous memory.
+ */
+struct heki_pa_range {
+	gfn_t gfn_start;
+	gfn_t gfn_end;
+	u64 attributes;
+};
+
+/*
+ * A hypervisor that supports Heki will instantiate this structure to
+ * provide hypervisor specific functions for Heki.
+ */
+struct heki_hypervisor {
+	int (*protect_ranges)(struct heki_pa_range *ranges, int num_ranges);
+	int (*lock_crs)(void);
+};
+
+/*
+ * If the architecture supports Heki, it will initialize static_ranges in
+ * early boot.
+ *
+ * If the active hypervisor supports Heki, it will plug its heki_hypervisor
+ * pointer into this heki structure.
+ */
+struct heki {
+	struct heki_pa_range *static_ranges;
+	int num_static_ranges;
+	struct heki_hypervisor *hypervisor;
+};
+
+extern struct heki heki;
+
+void heki_early_init(void);
+void heki_arch_init(void);
+void heki_late_init(void);
+
+struct heki_pa_range *heki_alloc_pa_ranges(struct heki_va_range *va_ranges,
+					   int num_ranges);
+void heki_free_pa_ranges(struct heki_pa_range *pa_ranges, int num_ranges);
+
+#else /* !CONFIG_HEKI */
+
+static inline void heki_early_init(void)
+{
+}
+static inline void heki_late_init(void)
+{
+}
+
+#endif /* CONFIG_HEKI */
+
+#endif /* __HEKI_H__ */
diff --git a/init/main.c b/init/main.c
index e1c3911d7c70..8649dbb07f18 100644
--- a/init/main.c
+++ b/init/main.c
@@ -102,6 +102,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -999,6 +1000,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
	sort_main_extable();
	trap_init();
	mm_init();
+	heki_early_init();
	poking_init();
	ftrace_init();

@@ -1530,6 +1532,7 @@ static int __ref kernel_init(void *unused)
	exit_boot_config();
	free_initmem();
	mark_readonly();
+	heki_late_init();

	/*
	 * Kernel mappings are now finalized - update the userspace page-table
diff --git a/virt/Makefile b/virt/Makefile
index 1cfea9436af9..4550dc624466 100644
--- a/virt/Makefile
+++ b/virt/Makefile
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y += lib/
+obj-$(CONFIG_HEKI) += heki/
diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig
new file mode 100644
index 000000000000..9858a827fe17
--- /dev/null
+++ b/virt/heki/Kconfig
@@ -0,0 +1,22 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Hypervisor Enforced Kernel Integrity (HEKI)
+#
+
+config HEKI
+	bool "Hypervisor Enforced Kernel Integrity (Heki)"
+	default y
+	depends on !JUMP_LABEL && ARCH_SUPPORTS_HEKI
+	select KVM_EXTERNAL_WRITE_TRACKING if KVM
+	help
+	  This feature enhances guest virtual machine security by taking
+	  advantage of security features provided by the hypervisor for guests.
+	  This feature is helpful in maintaining guest virtual machine security
+	  even after the guest kernel has been compromised.
+
+config ARCH_SUPPORTS_HEKI
+	bool "Architecture support for HEKI"
+	help
+	  An architecture should select this when it can successfully build
+	  and run with CONFIG_HEKI. That is, it should provide all of the
+	  architecture support required for the HEKI feature.
diff --git a/virt/heki/Makefile b/virt/heki/Makefile
new file mode 100644
index 000000000000..2bc2061c9dfc
--- /dev/null
+++ b/virt/heki/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-y += heki.o
diff --git a/virt/heki/heki.c b/virt/heki/heki.c
new file mode 100644
index 000000000000..c8cb1b84cceb
--- /dev/null
+++ b/virt/heki/heki.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Common code
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) "heki-guest: " fmt
+
+static bool heki_enabled __ro_after_init = true;
+
+struct heki heki = {};
+
+struct heki_pa_range *heki_alloc_pa_ranges(struct heki_va_range *va_ranges,
+					   int num_ranges)
+{
+	struct heki_pa_range *pa_ranges, *pa_range;
+	struct heki_va_range *va_range;
+	u64 attributes;
+	size_t size;
+	int i;
+
+	size = PAGE_ALIGN(sizeof(struct heki_pa_range) * num_ranges);
+	pa_ranges = alloc_pages_exact(size, GFP_KERNEL);
+	if (!pa_ranges)
+		return NULL;
+
+	for (i = 0; i < num_ranges; i++) {
+		va_range = &va_ranges[i];
+		pa_range = &pa_ranges[i];
+
+		pa_range->gfn_start = PFN_DOWN(__pa_symbol(va_range->va_start));
+		pa_range->gfn_end = PFN_UP(__pa_symbol(va_range->va_end)) - 1;
+		pa_range->attributes = va_range->attributes;
+
+		/*
+		 * WARNING:
+		 * Leaks addresses, should only be kept for development.
+		 */
+		attributes = pa_range->attributes;
+		pr_warn("Configuring GFN 0x%llx-0x%llx with %s\n",
+			pa_range->gfn_start, pa_range->gfn_end,
+			(attributes & HEKI_ATTR_MEM_NOWRITE) ? "[nowrite]" :
+							       "");
+	}
+
+	return pa_ranges;
+}
+
+void heki_free_pa_ranges(struct heki_pa_range *pa_ranges, int num_ranges)
+{
+	size_t size;
+
+	size = PAGE_ALIGN(sizeof(struct heki_pa_range) * num_ranges);
+	free_pages_exact(pa_ranges, size);
+}
+
+void __init heki_early_init(void)
+{
+	if (!heki_enabled) {
+		pr_warn("Disabled\n");
+		return;
+	}
+	pr_warn("Enabled\n");
+
+	heki_arch_init();
+}
+
+void heki_late_init(void)
+{
+	struct heki_hypervisor *hypervisor = heki.hypervisor;
+	int ret;
+
+	if (!heki_enabled)
+		return;
+
+	if (!heki.static_ranges) {
+		pr_warn("Architecture did not initialize static ranges\n");
+		return;
+	}
+
+	/*
+	 * Hypervisor support will be added in the future. When it is, the
+	 * hypervisor will be used to protect guest kernel memory and
+	 * control registers.
+	 */
+
+	if (!hypervisor) {
+		/* This happens for kernels running on bare metal as well. */
+		pr_warn("No hypervisor support\n");
+		goto out;
+	}
+
+	/* Protects statically defined sections in the host page table. */
+	ret = hypervisor->protect_ranges(heki.static_ranges,
+					 heki.num_static_ranges);
+	if (WARN(ret, "Failed to protect static sections: %d\n", ret))
+		goto out;
+	pr_warn("Static sections protected\n");
+
+	/*
+	 * Locks control registers so a compromised guest cannot change
+	 * them.
+	 */
+	ret = hypervisor->lock_crs();
+	if (WARN(ret, "Failed to lock control registers: %d\n", ret))
+		goto out;
+	pr_warn("Control registers locked\n");
+
+out:
+	heki_free_pa_ranges(heki.static_ranges, heki.num_static_ranges);
+	heki.static_ranges = NULL;
+	heki.num_static_ranges = 0;
+}
+
+static int __init heki_parse_config(char *str)
+{
+	if (strtobool(str, &heki_enabled))
+		pr_warn("Invalid option string for heki: '%s'\n", str);
+	return 1;
+}
+
+__setup("heki=", heki_parse_config);
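To make the plumbing concrete, here is a sketch of how a hypervisor back
end could plug into this common code. The structure and the heki.hypervisor
pointer are defined above; the kvm_* function names, and their use of the
hypercalls introduced by the following commits, are assumptions:

	static int kvm_protect_ranges(struct heki_pa_range *ranges,
				      int num_ranges)
	{
		/* Pass the physically contiguous array to the host. */
		return kvm_hypercall3(KVM_HC_LOCK_MEM_PAGE_RANGES,
				      __pa(ranges),
				      num_ranges * sizeof(*ranges), 0);
	}

	static int kvm_lock_crs(void)
	{
		long err;

		err = kvm_hypercall2(KVM_HC_LOCK_CR_UPDATE, 4,
				     X86_CR4_SMEP | X86_CR4_SMAP);
		if (err)
			return err;
		return kvm_hypercall2(KVM_HC_LOCK_CR_UPDATE, 0, X86_CR0_WP);
	}

	static struct heki_hypervisor kvm_heki_hypervisor = {
		.protect_ranges = kvm_protect_ranges,
		.lock_crs = kvm_lock_crs,
	};

	static void __init kvm_init_heki(void)
	{
		heki.hypervisor = &kvm_heki_hypervisor;
	}
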
Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Rick Edgecombe , Thara Gopinath , Will Deacon , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v1 4/9] KVM: x86: Add new hypercall to set EPT permissions Date: Fri, 5 May 2023 17:20:41 +0200 Message-Id: <20230505152046.6575-5-mic@digikod.net> In-Reply-To: <20230505152046.6575-1-mic@digikod.net> References: <20230505152046.6575-1-mic@digikod.net> MIME-Version: 1.0 X-Infomaniak-Routing: alpha Received-SPF: pass client-ip=2001:1600:4:17::bc0c; envelope-from=mic@digikod.net; helo=smtp-bc0c.mail.infomaniak.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Fri, 05 May 2023 11:31:05 -0400 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Add a new KVM_HC_LOCK_MEM_PAGE_RANGES hypercall that enables a guest to set EPT permissions on a set of page ranges. This hypercall takes three arguments. The first contains the GPA pointing to an array of struct heki_pa_range. The second argument is the size of the array, not the number of elements. The third argument is for future proofness and is designed to contains optional flags (e.g. to change the array type), but must be zero for now. The struct heki_pa_range contains a GFN that starts the range and another that is the indicate the last (included) page. A bit field of attributes are tied to this range. The HEKI_ATTR_MEM_NOWRITE attribute is interpreted as a removal of the EPT write permission to deny any write access from the guest through its lifetime. We choose "nowrite" because "read-only" exclude execution, it follows a deny-list approach, and most importantly because it is an incremental addition to the status quo (i.e., everything is allowed from the TDP point of view). This is implemented thanks to the KVM_PAGE_TRACK_PREWRITE mode previously introduced. The page ranges recording is currently implemented with a static array of 16 elements to make it simple, but this mechanism will be dynamic in a follow-up. Define a kernel command line parameter "heki" to turn the feature on or off. By default, Heki is turned on. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. 
Cc: Madhavan T. Venkataraman
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Signed-off-by: Mickaël Salaün
Link: https://lore.kernel.org/r/20230505152046.6575-5-mic@digikod.net
---
 Documentation/virt/kvm/x86/hypercalls.rst |  17 +++
 arch/x86/kvm/x86.c                        | 169 ++++++++++++++++++++++
 include/linux/kvm_host.h                  |  13 ++
 include/uapi/linux/kvm_para.h             |   1 +
 virt/kvm/kvm_main.c                       |   4 +
 5 files changed, 204 insertions(+)

diff --git a/Documentation/virt/kvm/x86/hypercalls.rst b/Documentation/virt/kvm/x86/hypercalls.rst
index 10db7924720f..0ec79cc77f53 100644
--- a/Documentation/virt/kvm/x86/hypercalls.rst
+++ b/Documentation/virt/kvm/x86/hypercalls.rst
@@ -190,3 +190,20 @@ the KVM_CAP_EXIT_HYPERCALL capability. Userspace must enable that capability
 before advertising KVM_FEATURE_HC_MAP_GPA_RANGE in the guest CPUID.  In
 addition, if the guest supports KVM_FEATURE_MIGRATION_CONTROL, userspace
 must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL.
+
+9. KVM_HC_LOCK_MEM_PAGE_RANGES
+------------------------------
+
+:Architecture: x86
+:Status: active
+:Purpose: Request memory page ranges to be restricted.
+
+- a0: physical address of a struct heki_pa_range array
+- a1: size of the array
+- a2: optional flags, must be 0 for now
+
+The hypercall lets a guest request memory permissions to be removed for itself,
+identified with a set of physical page ranges (GFNs). The HEKI_ATTR_MEM_NOWRITE
+memory page range attribute forbids related modification to the guest.
+
+Returns 0 on success or a KVM error code otherwise.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd05f42c9913..ffab64d08de3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -59,6 +59,7 @@
 #include
 #include
 #include
+#include

 #include

@@ -9596,6 +9597,161 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
	return;
 }

+#ifdef CONFIG_HEKI
+
+static int heki_page_track_add(struct kvm *const kvm, const gfn_t gfn,
+			       const enum kvm_page_track_mode mode)
+{
+	struct kvm_memory_slot *slot;
+	int idx;
+
+	BUILD_BUG_ON(!IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING));
+
+	idx = srcu_read_lock(&kvm->srcu);
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
+	kvm_slot_page_track_add_page(kvm, slot, gfn, mode);
+	write_unlock(&kvm->mmu_lock);
+	srcu_read_unlock(&kvm->srcu, idx);
+	return 0;
+}
+
+static bool
+heki_page_track_prewrite(struct kvm_vcpu *const vcpu, const gpa_t gpa,
+			 struct kvm_page_track_notifier_node *const node)
+{
+	const gfn_t gfn = gpa_to_gfn(gpa);
+	const struct kvm *const kvm = vcpu->kvm;
+	size_t i;
+
+	/* Checks if it is our own tracked pages, or those of someone else. */
+	for (i = 0; i < HEKI_GFN_MAX; i++) {
+		if (gfn >= kvm->heki_gfn_no_write[i].start &&
+		    gfn <= kvm->heki_gfn_no_write[i].end)
+			return false;
+	}
+
+	return true;
+}
+
+static int kvm_heki_init_vm(struct kvm *const kvm)
+{
+	struct kvm_page_track_notifier_node *const node =
+		kzalloc(sizeof(*node), GFP_KERNEL);
+
+	if (!node)
+		return -ENOMEM;
+
+	node->track_prewrite = heki_page_track_prewrite;
+	kvm_page_track_register_notifier(kvm, node);
+	return 0;
+}
+
+static bool is_gfn_overflow(unsigned long val)
+{
+	const gfn_t gfn_mask = gpa_to_gfn(~0);
+
+	return (val | gfn_mask) != gfn_mask;
+}
+
+#define HEKI_PA_RANGE_MAX_SIZE (sizeof(struct heki_pa_range) * HEKI_GFN_MAX)
+
+static int heki_lock_mem_page_ranges(struct kvm *const kvm, gpa_t mem_ranges,
+				     unsigned long mem_ranges_size)
+{
+	int err;
+	size_t i, ranges_num;
+	struct heki_pa_range *ranges;
+
+	if (mem_ranges_size > HEKI_PA_RANGE_MAX_SIZE)
+		return -KVM_E2BIG;
+
+	if ((mem_ranges_size % sizeof(struct heki_pa_range)) != 0)
+		return -KVM_EINVAL;
+
+	ranges = kzalloc(mem_ranges_size, GFP_KERNEL);
+	if (!ranges)
+		return -KVM_E2BIG;
+
+	err = kvm_read_guest(kvm, mem_ranges, ranges, mem_ranges_size);
+	if (err) {
+		err = -KVM_EFAULT;
+		goto out_free_ranges;
+	}
+
+	ranges_num = mem_ranges_size / sizeof(struct heki_pa_range);
+	for (i = 0; i < ranges_num; i++) {
+		const u64 attributes_mask = HEKI_ATTR_MEM_NOWRITE;
+		const gfn_t gfn_start = ranges[i].gfn_start;
+		const gfn_t gfn_end = ranges[i].gfn_end;
+		const u64 attributes = ranges[i].attributes;
+
+		if (is_gfn_overflow(ranges[i].gfn_start)) {
+			err = -KVM_EINVAL;
+			goto out_free_ranges;
+		}
+		if (is_gfn_overflow(ranges[i].gfn_end)) {
+			err = -KVM_EINVAL;
+			goto out_free_ranges;
+		}
+		if (ranges[i].gfn_start > ranges[i].gfn_end) {
+			err = -KVM_EINVAL;
+			goto out_free_ranges;
+		}
+		if (!ranges[i].attributes) {
+			err = -KVM_EINVAL;
+			goto out_free_ranges;
+		}
+		if ((ranges[i].attributes | attributes_mask) !=
+		    attributes_mask) {
+			err = -KVM_EINVAL;
+			goto out_free_ranges;
+		}
+
+		if (attributes & HEKI_ATTR_MEM_NOWRITE) {
+			unsigned long gfn;
+			size_t gfn_i;
+
+			gfn_i = atomic_dec_if_positive(
+				&kvm->heki_gfn_no_write_num);
+			if (gfn_i == 0) {
+				err = -KVM_E2BIG;
+				goto out_free_ranges;
+			}
+
+			gfn_i--;
+			kvm->heki_gfn_no_write[gfn_i].start = gfn_start;
+			kvm->heki_gfn_no_write[gfn_i].end = gfn_end;
+
+			for (gfn = gfn_start; gfn <= gfn_end; gfn++)
+				WARN_ON_ONCE(heki_page_track_add(
+					kvm, gfn, KVM_PAGE_TRACK_PREWRITE));
+		}
+
+		pr_warn("heki-kvm: Locking GFN 0x%llx-0x%llx with %s\n",
+			gfn_start, gfn_end,
+			(attributes & HEKI_ATTR_MEM_NOWRITE) ? "[nowrite]" : "");
+	}
+
+out_free_ranges:
+	kfree(ranges);
+	return err;
+}
+
+#else /* CONFIG_HEKI */
+
+static int kvm_heki_init_vm(struct kvm *const kvm)
+{
+	return 0;
+}
+
+#endif /* CONFIG_HEKI */
+
 static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
 {
	u64 ret = vcpu->run->hypercall.ret;
@@ -9694,6 +9850,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
		return 0;
	}
+#ifdef CONFIG_HEKI
+	case KVM_HC_LOCK_MEM_PAGE_RANGES:
+		/* No flags for now. */
+		if (a2)
+			ret = -KVM_EINVAL;
+		else
+			ret = heki_lock_mem_page_ranges(vcpu->kvm, a0, a1);
+		break;
+#endif /* CONFIG_HEKI */
	default:
		ret = -KVM_ENOSYS;
		break;
@@ -12126,6 +12291,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
	if (ret)
		goto out_page_track;

+	ret = kvm_heki_init_vm(kvm);
+	if (ret)
+		goto out_page_track;
+
	ret = static_call(kvm_x86_vm_init)(kvm);
	if (ret)
		goto out_uninit_mmu;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4f26b244f6d0..39a1bdc2ba42 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -699,6 +699,13 @@ struct kvm_memslots {
	int node_idx;
 };

+#ifdef CONFIG_HEKI
+struct heki_gfn_range {
+	gfn_t start;
+	gfn_t end;
+};
+#endif /* CONFIG_HEKI */
+
 struct kvm {
 #ifdef KVM_HAVE_MMU_RWLOCK
	rwlock_t mmu_lock;
@@ -801,6 +808,12 @@ struct kvm {
	bool vm_bugged;
	bool vm_dead;

+#ifdef CONFIG_HEKI
+#define HEKI_GFN_MAX 16
+	atomic_t heki_gfn_no_write_num;
+	struct heki_gfn_range heki_gfn_no_write[HEKI_GFN_MAX];
+#endif /* CONFIG_HEKI */
+
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
	struct notifier_block pm_notifier;
 #endif
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 960c7e93d1a9..d7512a10880e 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -30,6 +30,7 @@
 #define KVM_HC_SEND_IPI		10
 #define KVM_HC_SCHED_YIELD	11
 #define KVM_HC_MAP_GPA_RANGE	12
+#define KVM_HC_LOCK_MEM_PAGE_RANGES	13

 /*
  * hypercalls use architecture specific
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9c60384b5ae0..4aea936dfe73 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1230,6 +1230,10 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
	list_add(&kvm->vm_list, &vm_list);
	mutex_unlock(&kvm_lock);

+#ifdef CONFIG_HEKI
+	atomic_set(&kvm->heki_gfn_no_write_num, HEKI_GFN_MAX + 1);
+#endif /* CONFIG_HEKI */
+
	preempt_notifier_inc();
	kvm_init_pm_notifier(kvm);
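For clarity, here is a guest-side sketch of invoking this hypercall,
following the ABI documented above (a0 = GPA of the array, a1 = size in
bytes, a2 = 0); the lock_ranges() wrapper itself is hypothetical:

	static long lock_ranges(struct heki_pa_range *ranges, int num_ranges)
	{
		/* The array must live in physically contiguous memory. */
		return kvm_hypercall3(KVM_HC_LOCK_MEM_PAGE_RANGES,
				      __pa(ranges),
				      num_ranges * sizeof(*ranges), 0);
	}
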
From patchwork Fri May 5 15:20:42 2023
X-Patchwork-Id: 13232764
From: Mickaël Salaün <mic@digikod.net>
Subject: [PATCH v1 5/9] KVM: x86: Add new hypercall to lock control registers
Date: Fri, 5 May 2023 17:20:42 +0200
Message-Id: <20230505152046.6575-6-mic@digikod.net>
In-Reply-To: <20230505152046.6575-1-mic@digikod.net>

This enables guests to lock their CR0 and CR4 registers with a subset
of the X86_CR0_WP, X86_CR4_SMEP, X86_CR4_SMAP, X86_CR4_UMIP,
X86_CR4_FSGSBASE and X86_CR4_CET flags.

The new KVM_HC_LOCK_CR_UPDATE hypercall takes two arguments. The first
identifies the control register, and the second is a bit mask of flags
to pin (i.e. mark as read-only). These register flags should already be
pinned by Linux guests, but once the guest is compromised, that
self-protection mechanism can be disabled, which is not the case with
this dedicated hypercall.

Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Kees Cook
Cc: Madhavan T. Venkataraman
Cc: Paolo Bonzini
Cc: Sean Christopherson
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Wanpeng Li
Signed-off-by: Mickaël Salaün
Link: https://lore.kernel.org/r/20230505152046.6575-6-mic@digikod.net
---
 Documentation/virt/kvm/x86/hypercalls.rst | 15 +++++
 arch/x86/kernel/cpu/common.c              |  2 +-
 arch/x86/kvm/vmx/vmx.c                    | 10 ++++
 arch/x86/kvm/x86.c                        | 72 +++++++++++++++++++++++
 arch/x86/kvm/x86.h                        | 16 +++++
 include/linux/kvm_host.h                  |  3 +
 include/uapi/linux/kvm_para.h             |  1 +
 7 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/hypercalls.rst b/Documentation/virt/kvm/x86/hypercalls.rst
index 0ec79cc77f53..8aa5d28986e3 100644
--- a/Documentation/virt/kvm/x86/hypercalls.rst
+++ b/Documentation/virt/kvm/x86/hypercalls.rst
@@ -207,3 +207,18 @@ identified with a set of physical page ranges (GFNs). The HEKI_ATTR_MEM_NOWRITE
 memory page range attribute forbids related modification to the guest.

 Returns 0 on success or a KVM error code otherwise.
+
+10. KVM_HC_LOCK_CR_UPDATE
+-------------------------
+
+:Architecture: x86
+:Status: active
+:Purpose: Request some control registers to be restricted.
+
+- a0: identify a control register
+- a1: bit mask to make some flags read-only
+
+The hypercall lets a guest request control register flags to be pinned for
+itself.
+
+Returns 0 on success or a KVM error code otherwise.
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f3cc7699e1e1..dd89379fe5ac 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -413,7 +413,7 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 }

 /* These bits should not change their value after CPU init is finished. */
-static const unsigned long cr4_pinned_mask =
+const unsigned long cr4_pinned_mask =
	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP | X86_CR4_FSGSBASE | X86_CR4_CET;

 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9870db887a62..931688edc8eb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3162,6 +3162,11 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	unsigned long hw_cr0, old_cr0_pg;
	u32 tmp;
+	int res;
+
+	res = heki_check_cr(vcpu->kvm, 0, cr0);
+	if (res)
+		return;

	old_cr0_pg = kvm_read_cr0_bits(vcpu, X86_CR0_PG);

@@ -3323,6 +3328,11 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
	 * this bit, even if host CR4.MCE == 0.
	 */
	unsigned long hw_cr4;
+	int res;
+
+	res = heki_check_cr(vcpu->kvm, 4, cr4);
+	if (res)
+		return;

	hw_cr4 = (cr4_read_shadow() & X86_CR4_MCE) | (cr4 & ~X86_CR4_MCE);
	if (is_unrestricted_guest(vcpu))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ffab64d08de3..a529455359ac 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7927,11 +7927,77 @@ static unsigned long emulator_get_cr(struct x86_emulate_ctxt *ctxt, int cr)
	return value;
 }

+#ifdef CONFIG_HEKI
+
+extern unsigned long cr4_pinned_mask;
+
+static int heki_lock_cr(struct kvm *const kvm, const unsigned long cr,
+			unsigned long pin)
+{
+	if (!pin)
+		return -KVM_EINVAL;
+
+	switch (cr) {
+	case 0:
+		/* Cf. arch/x86/kernel/cpu/common.c */
+		if (!(pin & X86_CR0_WP))
+			return -KVM_EINVAL;
+
+		if ((read_cr0() & pin) != pin)
+			return -KVM_EINVAL;
+
+		atomic_long_or(pin, &kvm->heki_pinned_cr0);
+		return 0;
+	case 4:
+		/* Checks for irrelevant bits. */
+		if ((pin & cr4_pinned_mask) != pin)
+			return -KVM_EINVAL;
+
+		/* Ignores bits not present in host. */
*/ + pin &= __read_cr4(); + atomic_long_or(pin, &kvm->heki_pinned_cr4); + return 0; + } + return -KVM_EINVAL; +} + +int heki_check_cr(const struct kvm *const kvm, const unsigned long cr, + const unsigned long val) +{ + unsigned long pinned; + + switch (cr) { + case 0: + pinned = atomic_long_read(&kvm->heki_pinned_cr0); + if ((val & pinned) != pinned) { + pr_warn_ratelimited( + "heki-kvm: Blocked CR0 update: 0x%lx\n", val); + return -KVM_EPERM; + } + return 0; + case 4: + pinned = atomic_long_read(&kvm->heki_pinned_cr4); + if ((val & pinned) != pinned) { + pr_warn_ratelimited( + "heki-kvm: Blocked CR4 update: 0x%lx\n", val); + return -KVM_EPERM; + } + return 0; + } + return 0; +} + +#endif /* CONFIG_HEKI */ + static int emulator_set_cr(struct x86_emulate_ctxt *ctxt, int cr, ulong val) { struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); int res = 0; + res = heki_check_cr(vcpu->kvm, cr, val); + if (res) + return res; + switch (cr) { case 0: res = kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val)); @@ -9858,6 +9924,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) else ret = heki_lock_mem_page_ranges(vcpu->kvm, a0, a1); break; + case KVM_HC_LOCK_CR_UPDATE: + if (a0 > U32_MAX) + ret = -KVM_EINVAL; + else + ret = heki_lock_cr(vcpu->kvm, a0, a1); + break; #endif /* CONFIG_HEKI */ default: ret = -KVM_ENOSYS; diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 9de72586f406..3e80a60ecbd8 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -276,6 +276,22 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk) return !(kvm->arch.disabled_quirks & quirk); } +#ifdef CONFIG_HEKI + +int heki_check_cr(const struct kvm *kvm, unsigned long cr, unsigned long val); + +bool kvm_heki_is_exec_allowed(struct kvm_vcpu *vcpu, gpa_t gpa); + +#else /* CONFIG_HEKI */ + +static inline int heki_check_cr(const struct kvm *const kvm, + const unsigned long cr, const unsigned long val) +{ + return 0; +} + +#endif /* CONFIG_HEKI */ + void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip); u64 get_kvmclock_ns(struct kvm *kvm); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 39a1bdc2ba42..ab9dc723bc89 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -812,6 +812,9 @@ struct kvm { #define HEKI_GFN_MAX 16 atomic_t heki_gfn_no_write_num; struct heki_gfn_range heki_gfn_no_write[HEKI_GFN_MAX]; + + atomic_long_t heki_pinned_cr0; + atomic_long_t heki_pinned_cr4; #endif /* CONFIG_HEKI */ #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h index d7512a10880e..9f68d4ba646b 100644 --- a/include/uapi/linux/kvm_para.h +++ b/include/uapi/linux/kvm_para.h @@ -31,6 +31,7 @@ #define KVM_HC_SCHED_YIELD 11 #define KVM_HC_MAP_GPA_RANGE 12 #define KVM_HC_LOCK_MEM_PAGE_RANGES 13 +#define KVM_HC_LOCK_CR_UPDATE 14 /* * hypercalls use architecture specific From patchwork Fri May 5 15:20:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13232759 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4CF28C77B7F for ; Fri, 5 May 2023 15:32:06 +0000 (UTC) Received: from localhost ([::1] 
helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1puxOu-0007rf-TN; Fri, 05 May 2023 11:31:17 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1puxG5-0003F0-IG for qemu-devel@nongnu.org; Fri, 05 May 2023 11:22:09 -0400 Received: from smtp-bc08.mail.infomaniak.ch ([2001:1600:4:17::bc08]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1puxG2-0003jP-4D for qemu-devel@nongnu.org; Fri, 05 May 2023 11:22:08 -0400 Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4QCZDC54NKzMqP0t; Fri, 5 May 2023 17:22:03 +0200 (CEST) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4QCZDB71wWzMpt9w; Fri, 5 May 2023 17:22:02 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1683300123; bh=DM23zXDS+uuGwMhWqh31C3FvKal04OOOYj6BCpV8eTU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=xWSKWq8GRjQntbZQyjyc/hcLfD/4e9Eo9VEQ3TyJ15GAj81rIlqRsji0rYWtV6o9s DjYkgLxPq6tOCUL7E0ENshFSZ0cXEvowTkLwi08Dmqi/WhYCFJPWohunvkPKWdyymr yIWmw5cSKPerfUXYOOtVgs7cuuhvsEWWph9Se7ZE= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Forrest Yuan Yu , James Morris , John Andersen , Liran Alon , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Rick Edgecombe , Thara Gopinath , Will Deacon , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v1 6/9] KVM: x86: Add Heki hypervisor support Date: Fri, 5 May 2023 17:20:43 +0200 Message-Id: <20230505152046.6575-7-mic@digikod.net> In-Reply-To: <20230505152046.6575-1-mic@digikod.net> References: <20230505152046.6575-1-mic@digikod.net> MIME-Version: 1.0 X-Infomaniak-Routing: alpha Received-SPF: pass client-ip=2001:1600:4:17::bc08; envelope-from=mic@digikod.net; helo=smtp-bc08.mail.infomaniak.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Fri, 05 May 2023 11:31:07 -0400 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Madhavan T. Venkataraman Each supported hypervisor in x86 implements a struct x86_hyper_init to define the init functions for the hypervisor. Define a new init_heki() entry point in struct x86_hyper_init. Hypervisors that support Heki must define this init_heki() function. 
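For illustration, the new hook simply extends the existing x86_init pattern. A minimal sketch, showing only the relevant field; hypervisors without Heki support keep the no-op default installed by x86_init.c (the complete change is in the arch/x86/include/asm/x86_init.h diff below):

/* Sketch: the new entry point added to struct x86_hyper_init. */
struct x86_hyper_init {
	void (*init_platform)(void);
	/* ... other existing hooks elided ... */
	void (*init_heki)(void);	/* new: Heki initialization, may be a no-op */
};
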
Call init_heki() of the chosen hypervisor in init_hypervisor_platform(). Create a heki_hypervisor structure that each hypervisor can fill with its data and functions. This will allow the Heki feature to work in a hypervisor agnostic way. Declare and initialize a "heki_hypervisor" structure for KVM so KVM can support Heki. Define the init_heki() function for KVM. In init_heki(), set the hypervisor field in the generic "heki" structure to the KVM "heki_hypervisor". After this point, generic Heki code can access the KVM Heki data and functions. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Co-developed-by: Mickaël Salaün Signed-off-by: Mickaël Salaün Signed-off-by: Madhavan T. Venkataraman Link: https://lore.kernel.org/r/20230505152046.6575-7-mic@digikod.net --- arch/x86/include/asm/x86_init.h | 2 + arch/x86/kernel/cpu/hypervisor.c | 1 + arch/x86/kernel/kvm.c | 72 ++++++++++++++++++++++++++++++++ arch/x86/kernel/x86_init.c | 1 + arch/x86/kvm/Kconfig | 1 + virt/heki/Kconfig | 9 +++- virt/heki/heki.c | 6 --- 7 files changed, 85 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index c1c8c581759d..0fc5041a66c6 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -119,6 +119,7 @@ struct x86_init_pci { * @msi_ext_dest_id: MSI supports 15-bit APIC IDs * @init_mem_mapping: setup early mappings during init_mem_mapping() * @init_after_bootmem: guest init after boot allocator is finished + * @init_heki: Hypervisor enforced kernel integrity */ struct x86_hyper_init { void (*init_platform)(void); @@ -127,6 +128,7 @@ struct x86_hyper_init { bool (*msi_ext_dest_id)(void); void (*init_mem_mapping)(void); void (*init_after_bootmem)(void); + void (*init_heki)(void); }; /** diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c index 553bfbfc3a1b..6085c8129e0c 100644 --- a/arch/x86/kernel/cpu/hypervisor.c +++ b/arch/x86/kernel/cpu/hypervisor.c @@ -106,4 +106,5 @@ void __init init_hypervisor_platform(void) x86_hyper_type = h->type; x86_init.hyper.init_platform(); + x86_init.hyper.init_heki(); } diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 1cceac5984da..e53cebdcf3ac 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -866,6 +867,45 @@ static void __init kvm_guest_init(void) hardlockup_detector_disable(); } +#ifdef CONFIG_HEKI + +static int kvm_protect_ranges(struct heki_pa_range *ranges, int num_ranges) +{ + size_t size; + long err; + + WARN_ON(in_interrupt()); + + size = sizeof(ranges[0]) * num_ranges; + err = kvm_hypercall3(KVM_HC_LOCK_MEM_PAGE_RANGES, __pa(ranges), size, 0); + if (WARN(err, "Failed to enforce memory protection: %ld\n", err)) + return err; + + return 0; +} + +extern unsigned long cr4_pinned_mask; + +/* + * TODO: Check SMP policy consistency, e.g. 
with + * this_cpu_read(cpu_tlbstate.cr4) + */ +static int kvm_lock_crs(void) +{ + unsigned long cr4; + int err; + + err = kvm_hypercall2(KVM_HC_LOCK_CR_UPDATE, 0, X86_CR0_WP); + if (err) + return err; + + cr4 = __read_cr4(); + err = kvm_hypercall2(KVM_HC_LOCK_CR_UPDATE, 4, cr4 & cr4_pinned_mask); + return err; +} + +#endif /* CONFIG_HEKI */ + static noinline uint32_t __kvm_cpuid_base(void) { if (boot_cpu_data.cpuid_level < 0) @@ -999,6 +1039,37 @@ static bool kvm_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs) } #endif +#ifdef CONFIG_HEKI + +static struct heki_hypervisor kvm_heki_hypervisor = { + .protect_ranges = kvm_protect_ranges, + .lock_crs = kvm_lock_crs, +}; + +static void kvm_init_heki(void) +{ + long err; + + if (!kvm_para_available()) + /* Cannot make KVM hypercalls. */ + return; + + err = kvm_hypercall3(KVM_HC_LOCK_MEM_PAGE_RANGES, -1, -1, -1); + if (err == -KVM_ENOSYS) + /* Ignores host. */ + return; + + heki.hypervisor = &kvm_heki_hypervisor; +} + +#else /* CONFIG_HEKI */ + +static void kvm_init_heki(void) +{ +} + +#endif /* CONFIG_HEKI */ + const __initconst struct hypervisor_x86 x86_hyper_kvm = { .name = "KVM", .detect = kvm_detect, @@ -1007,6 +1078,7 @@ const __initconst struct hypervisor_x86 x86_hyper_kvm = { .init.x2apic_available = kvm_para_available, .init.msi_ext_dest_id = kvm_msi_ext_dest_id, .init.init_platform = kvm_init_platform, + .init.init_heki = kvm_init_heki, #if defined(CONFIG_AMD_MEM_ENCRYPT) .runtime.sev_es_hcall_prepare = kvm_sev_es_hcall_prepare, .runtime.sev_es_hcall_finish = kvm_sev_es_hcall_finish, diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c index ef80d361b463..0a023c24fcdb 100644 --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -114,6 +114,7 @@ struct x86_init_ops x86_init __initdata = { .msi_ext_dest_id = bool_x86_init_noop, .init_mem_mapping = x86_init_noop, .init_after_bootmem = x86_init_noop, + .init_heki = x86_init_noop, }, .acpi = { diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index fbeaa9ddef59..ba355171ceeb 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -49,6 +49,7 @@ config KVM select SRCU select INTERVAL_TREE select HAVE_KVM_PM_NOTIFIER if PM + select HYPERVISOR_SUPPORTS_HEKI help Support hosting fully virtualized guest machines using hardware virtualization extensions. You will need a fairly recent diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig index 9858a827fe17..96f18ce03013 100644 --- a/virt/heki/Kconfig +++ b/virt/heki/Kconfig @@ -6,7 +6,7 @@ config HEKI bool "Hypervisor Enforced Kernel Integrity (Heki)" default y - depends on !JUMP_LABEL && ARCH_SUPPORTS_HEKI + depends on !JUMP_LABEL && ARCH_SUPPORTS_HEKI && HYPERVISOR_SUPPORTS_HEKI select KVM_EXTERNAL_WRITE_TRACKING if KVM help This feature enhances guest virtual machine security by taking @@ -20,3 +20,10 @@ config ARCH_SUPPORTS_HEKI An architecture should select this when it can successfully build and run with CONFIG_HEKI. That is, it should provide all of the architecture support required for the HEKI feature. + +config HYPERVISOR_SUPPORTS_HEKI + bool "Hypervisor support for Heki" + help + A hypervisor should select this when it can successfully build + and run with CONFIG_HEKI. That is, it should provide all of the + hypervisor support required for the Heki feature. 
diff --git a/virt/heki/heki.c b/virt/heki/heki.c index c8cb1b84cceb..142b5dc98a2f 100644 --- a/virt/heki/heki.c +++ b/virt/heki/heki.c @@ -91,12 +91,6 @@ void heki_late_init(void) return; } - /* - * Hypervisor support will be added in the future. When it is, the - * hypervisor will be used to protect guest kernel memory and - * control registers. - */ - if (!hypervisor) { /* This happens for kernels running on bare metal as well. */ pr_warn("No hypervisor support\n"); From patchwork Fri May 5 15:20:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13232758 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 623BBC7EE22 for ; Fri, 5 May 2023 15:32:04 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1puxOw-000833-AH; Fri, 05 May 2023 11:31:18 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1puxGF-0003Gu-BS for qemu-devel@nongnu.org; Fri, 05 May 2023 11:22:19 -0400 Received: from smtp-8fac.mail.infomaniak.ch ([2001:1600:4:17::8fac]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1puxG3-0003ja-0T for qemu-devel@nongnu.org; Fri, 05 May 2023 11:22:19 -0400 Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4QCZDD5yVJzMqDN2; Fri, 5 May 2023 17:22:04 +0200 (CEST) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4QCZDD0z6dzMpxBc; Fri, 5 May 2023 17:22:04 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1683300124; bh=numFKFwerBzb1GFmju5PzfUy3LvlXvoHGRlXH+KNT+c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=aY8puWEd6eYwOhiBC5f3v2W8STDFR1ws5t4KGxnuMkV/veHF5dk9ikWYLtZACCzKL ZZHQs/vJJkdeZe4WGYydF7qoNq+EgiNGsn5zsfWYHRSGNlObVxnWCaLzSzlH/rsRqO 0NTmO3IcYiupEfrP7vmngHe/c3dwNtOvqClQ6aug= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Forrest Yuan Yu , James Morris , John Andersen , Liran Alon , "Madhavan T . 
Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Rick Edgecombe , Thara Gopinath , Will Deacon , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v1 7/9] KVM: VMX: Add MBEC support Date: Fri, 5 May 2023 17:20:44 +0200 Message-Id: <20230505152046.6575-8-mic@digikod.net> In-Reply-To: <20230505152046.6575-1-mic@digikod.net> References: <20230505152046.6575-1-mic@digikod.net> MIME-Version: 1.0 X-Infomaniak-Routing: alpha Received-SPF: pass client-ip=2001:1600:4:17::8fac; envelope-from=mic@digikod.net; helo=smtp-8fac.mail.infomaniak.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Fri, 05 May 2023 11:31:07 -0400 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org This changes add support for VMX_FEATURE_MODE_BASED_EPT_EXEC (named ept_mode_based_exec in /proc/cpuinfo and MBEC elsewhere), which enables to separate EPT execution bits for supervisor vs. user. It transforms the semantic of VMX_EPT_EXECUTABLE_MASK from a global execution to a kernel execution, and use the VMX_EPT_USER_EXECUTABLE_MASK bit to identify user execution. The main use case is to be able to restrict kernel execution while ignoring user space execution from the hypervisor point of view. Indeed, user space execution can already be restricted by the guest kernel. This change enables MBEC but doesn't change the default configuration, which is to allow execution for all guest memory. However, the next commit levages MBEC to restrict kernel memory pages. MBEC can be configured with the new "enable_mbec" module parameter, set to true by default. However, MBEC is disable for L1 and L2 for now. Replace EPT_VIOLATION_RWX_MASK (3 bits) with 4 dedicated EPT_VIOLATION_READ, EPT_VIOLATION_WRITE, EPT_VIOLATION_KERNEL_INSTR, and EPT_VIOLATION_USER_INSTR bits. From the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3C (System Programming Guide), Part 3: SECONDARY_EXEC_MODE_BASED_EPT_EXEC (bit 22): If either the "unrestricted guest" VM-execution control or the "mode-based execute control for EPT" VM-execution control is 1, the "enable EPT" VM-execution control must also be 1. EPT_VIOLATION_KERNEL_INSTR_BIT (bit 5): The logical-AND of bit 2 in the EPT paging-structure entries used to translate the guest-physical address of the access causing the EPT violation. If the "mode-based execute control for EPT" VM-execution control is 0, this indicates whether the guest-physical address was executable. If that control is 1, this indicates whether the guest-physical address was executable for supervisor-mode linear addresses. 
EPT_VIOLATION_USER_INSTR_BIT (bit 6): If the "mode-based execute control" VM-execution control is 0, the value of this bit is undefined. If that control is 1, this bit is the logical-AND of bit 10 in the EPT paging-structures entries used to translate the guest-physical address of the access causing the EPT violation. In this case, it indicates whether the guest-physical address was executable for user-mode linear addresses. PT_USER_EXEC_MASK (bit 10): Execute access for user-mode linear addresses. If the "mode-based execute control for EPT" VM-execution control is 1, indicates whether instruction fetches are allowed from user-mode linear addresses in the 512-GByte region controlled by this entry. If that control is 0, this bit is ignored. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. Venkataraman Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Signed-off-by: Mickaël Salaün Link: https://lore.kernel.org/r/20230505152046.6575-8-mic@digikod.net --- arch/x86/include/asm/vmx.h | 11 +++++++++-- arch/x86/kvm/mmu.h | 3 ++- arch/x86/kvm/mmu/mmu.c | 6 +++++- arch/x86/kvm/mmu/paging_tmpl.h | 16 ++++++++++++++-- arch/x86/kvm/mmu/spte.c | 4 +++- arch/x86/kvm/vmx/capabilities.h | 7 +++++++ arch/x86/kvm/vmx/nested.c | 7 +++++++ arch/x86/kvm/vmx/vmx.c | 28 +++++++++++++++++++++++++--- arch/x86/kvm/vmx/vmx.h | 1 + 9 files changed, 73 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 498dc600bd5c..452e7d153832 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -511,6 +511,7 @@ enum vmcs_field { #define VMX_EPT_IPAT_BIT (1ull << 6) #define VMX_EPT_ACCESS_BIT (1ull << 8) #define VMX_EPT_DIRTY_BIT (1ull << 9) +#define VMX_EPT_USER_EXECUTABLE_MASK (1ull << 10) #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | \ VMX_EPT_WRITABLE_MASK | \ VMX_EPT_EXECUTABLE_MASK) @@ -556,13 +557,19 @@ enum vm_entry_failure_code { #define EPT_VIOLATION_ACC_READ_BIT 0 #define EPT_VIOLATION_ACC_WRITE_BIT 1 #define EPT_VIOLATION_ACC_INSTR_BIT 2 -#define EPT_VIOLATION_RWX_SHIFT 3 +#define EPT_VIOLATION_READ_BIT 3 +#define EPT_VIOLATION_WRITE_BIT 4 +#define EPT_VIOLATION_KERNEL_INSTR_BIT 5 +#define EPT_VIOLATION_USER_INSTR_BIT 6 #define EPT_VIOLATION_GVA_IS_VALID_BIT 7 #define EPT_VIOLATION_GVA_TRANSLATED_BIT 8 #define EPT_VIOLATION_ACC_READ (1 << EPT_VIOLATION_ACC_READ_BIT) #define EPT_VIOLATION_ACC_WRITE (1 << EPT_VIOLATION_ACC_WRITE_BIT) #define EPT_VIOLATION_ACC_INSTR (1 << EPT_VIOLATION_ACC_INSTR_BIT) -#define EPT_VIOLATION_RWX_MASK (VMX_EPT_RWX_MASK << EPT_VIOLATION_RWX_SHIFT) +#define EPT_VIOLATION_READ (1 << EPT_VIOLATION_READ_BIT) +#define EPT_VIOLATION_WRITE (1 << EPT_VIOLATION_WRITE_BIT) +#define EPT_VIOLATION_KERNEL_INSTR (1 << EPT_VIOLATION_KERNEL_INSTR_BIT) +#define EPT_VIOLATION_USER_INSTR (1 << EPT_VIOLATION_USER_INSTR_BIT) #define EPT_VIOLATION_GVA_IS_VALID (1 << EPT_VIOLATION_GVA_IS_VALID_BIT) #define EPT_VIOLATION_GVA_TRANSLATED (1 << EPT_VIOLATION_GVA_TRANSLATED_BIT) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 6bdaacb6faa0..3c4fd4618cc1 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -24,6 +24,7 @@ extern bool __read_mostly enable_mmio_caching; #define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT) #define PT_PAT_MASK (1ULL << 7) #define PT_GLOBAL_MASK (1ULL << 8) +#define PT_USER_EXEC_MASK (1ULL << 10) #define PT64_NX_SHIFT 63 #define PT64_NX_MASK (1ULL << PT64_NX_SHIFT) @@ -102,7 +103,7 @@ static inline 
u8 kvm_get_shadow_phys_bits(void) void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask); void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask); -void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only); +void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only, bool has_mbec); void kvm_init_mmu(struct kvm_vcpu *vcpu); void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0, diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index e5d1e241ff0f..a47e63217eb8 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -27,6 +27,9 @@ #include "cpuid.h" #include "spte.h" +/* Required by paging_tmpl.h for enable_mbec */ +#include "../vmx/capabilities.h" + #include #include #include @@ -3763,7 +3766,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) */ pm_mask = PT_PRESENT_MASK | shadow_me_value; if (mmu->root_role.level >= PT64_ROOT_4LEVEL) { - pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK; + pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK | + PT_USER_EXEC_MASK; if (WARN_ON_ONCE(!mmu->pml4_root)) { r = -EIO; diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 0f6455072055..12119d519c77 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -498,8 +498,20 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, * Note, pte_access holds the raw RWX bits from the EPTE, not * ACC_*_MASK flags! */ - vcpu->arch.exit_qualification |= (pte_access & VMX_EPT_RWX_MASK) << - EPT_VIOLATION_RWX_SHIFT; + vcpu->arch.exit_qualification |= + !!(pte_access & VMX_EPT_READABLE_MASK) + << EPT_VIOLATION_READ_BIT; + vcpu->arch.exit_qualification |= + !!(pte_access & VMX_EPT_WRITABLE_MASK) + << EPT_VIOLATION_WRITE_BIT; + vcpu->arch.exit_qualification |= + !!(pte_access & VMX_EPT_EXECUTABLE_MASK) + << EPT_VIOLATION_KERNEL_INSTR_BIT; + if (enable_mbec) { + vcpu->arch.exit_qualification |= + !!(pte_access & VMX_EPT_USER_EXECUTABLE_MASK) + << EPT_VIOLATION_USER_INSTR_BIT; + } } #endif walker->fault.address = addr; diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 639f220a1ed5..f1e2e3cad878 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -430,13 +430,15 @@ void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask) } EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask); -void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only) +void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only, bool has_mbec) { shadow_user_mask = VMX_EPT_READABLE_MASK; shadow_accessed_mask = has_ad_bits ? VMX_EPT_ACCESS_BIT : 0ull; shadow_dirty_mask = has_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull; shadow_nx_mask = 0ull; shadow_x_mask = VMX_EPT_EXECUTABLE_MASK; + if (has_mbec) + shadow_x_mask |= VMX_EPT_USER_EXECUTABLE_MASK; shadow_present_mask = has_exec_only ? 
0ull : VMX_EPT_READABLE_MASK; /* * EPT overrides the host MTRRs, and so KVM must program the desired diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index cd2ac9536c99..2cc5d7d20144 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -13,6 +13,7 @@ extern bool __read_mostly enable_vpid; extern bool __read_mostly flexpriority_enabled; extern bool __read_mostly enable_ept; extern bool __read_mostly enable_unrestricted_guest; +extern bool __read_mostly enable_mbec; extern bool __read_mostly enable_ept_ad_bits; extern bool __read_mostly enable_pml; extern bool __read_mostly enable_ipiv; @@ -255,6 +256,12 @@ static inline bool cpu_has_vmx_xsaves(void) SECONDARY_EXEC_XSAVES; } +static inline bool cpu_has_vmx_mbec(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl & + SECONDARY_EXEC_MODE_BASED_EPT_EXEC; +} + static inline bool cpu_has_vmx_waitpkg(void) { return vmcs_config.cpu_based_2nd_exec_ctrl & diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index d93c715cda6a..3c381c75e2a9 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2317,6 +2317,9 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0 /* VMCS shadowing for L2 is emulated for now */ exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS; + /* MBEC is currently only handled for L0. */ + exec_control &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC; + /* * Preset *DT exiting when emulating UMIP, so that vmx_set_cr4() * will not have to rewrite the controls just for this bit. @@ -6870,6 +6873,10 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps) */ msrs->secondary_ctls_low = 0; + /* + * Currently, SECONDARY_EXEC_MODE_BASED_EPT_EXEC is only handled for + * L0 and doesn't need to be exposed to L1 nor L2. + */ msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl; msrs->secondary_ctls_high &= SECONDARY_EXEC_DESC | diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 931688edc8eb..004fd4e5e057 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -94,6 +94,9 @@ bool __read_mostly enable_unrestricted_guest = 1; module_param_named(unrestricted_guest, enable_unrestricted_guest, bool, S_IRUGO); +bool __read_mostly enable_mbec = true; +module_param_named(mbec, enable_mbec, bool, 0444); + bool __read_mostly enable_ept_ad_bits = 1; module_param_named(eptad, enable_ept_ad_bits, bool, S_IRUGO); @@ -4518,10 +4521,21 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) exec_control &= ~SECONDARY_EXEC_ENABLE_VPID; if (!enable_ept) { exec_control &= ~SECONDARY_EXEC_ENABLE_EPT; + /* + * From Intel's SDM: + * If either the "unrestricted guest" VM-execution control or + * the "mode-based execute control for EPT" VM-execution + * control is 1, the "enable EPT" VM-execution control must + * also be 1. 
+ */ enable_unrestricted_guest = 0; + enable_mbec = false; } if (!enable_unrestricted_guest) exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST; + if (!enable_mbec) + exec_control &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC; + if (kvm_pause_in_guest(vmx->vcpu.kvm)) exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; if (!kvm_vcpu_apicv_active(vcpu)) @@ -5658,7 +5672,7 @@ static int handle_task_switch(struct kvm_vcpu *vcpu) static int handle_ept_violation(struct kvm_vcpu *vcpu) { - unsigned long exit_qualification; + unsigned long exit_qualification, rwx_mask; gpa_t gpa; u64 error_code; @@ -5688,7 +5702,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR) ? PFERR_FETCH_MASK : 0; /* ept page table entry is present? */ - error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK) + rwx_mask = EPT_VIOLATION_READ | EPT_VIOLATION_WRITE | + EPT_VIOLATION_KERNEL_INSTR; + if (enable_mbec) + rwx_mask |= EPT_VIOLATION_USER_INSTR; + error_code |= (exit_qualification & rwx_mask) ? PFERR_PRESENT_MASK : 0; error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) != 0 ? @@ -8345,6 +8363,9 @@ static __init int hardware_setup(void) if (!cpu_has_vmx_unrestricted_guest() || !enable_ept) enable_unrestricted_guest = 0; + if (!cpu_has_vmx_mbec() || !enable_ept) + enable_mbec = false; + if (!cpu_has_vmx_flexpriority()) flexpriority_enabled = 0; @@ -8404,7 +8425,8 @@ static __init int hardware_setup(void) if (enable_ept) kvm_mmu_set_ept_masks(enable_ept_ad_bits, - cpu_has_vmx_ept_execute_only()); + cpu_has_vmx_ept_execute_only(), + enable_mbec); /* * Setup shadow_me_value/shadow_me_mask to include MKTME KeyID diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index a3da84f4ea45..815db44cd51e 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -585,6 +585,7 @@ static inline u8 vmx_get_rvi(void) SECONDARY_EXEC_ENABLE_VMFUNC | \ SECONDARY_EXEC_BUS_LOCK_DETECTION | \ SECONDARY_EXEC_NOTIFY_VM_EXITING | \ + SECONDARY_EXEC_MODE_BASED_EPT_EXEC | \ SECONDARY_EXEC_ENCLS_EXITING) #define KVM_REQUIRED_VMX_TERTIARY_VM_EXEC_CONTROL 0 From patchwork Fri May 5 15:20:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13232761 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A6A49C7EE22 for ; Fri, 5 May 2023 15:32:31 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1puxP0-0008Bj-0h; Fri, 05 May 2023 11:31:22 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1puxG9-0003Fp-IA for qemu-devel@nongnu.org; Fri, 05 May 2023 11:22:13 -0400 Received: from smtp-42aa.mail.infomaniak.ch ([2001:1600:4:17::42aa]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1puxG5-0003jn-8M for qemu-devel@nongnu.org; Fri, 05 May 2023 11:22:13 -0400 Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4QCZDF6ppRzMq81r; Fri, 5 May 2023 17:22:05 
+0200 (CEST) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4QCZDF1ydFzMptBL; Fri, 5 May 2023 17:22:05 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1683300125; bh=PyAeGY6RXqTYQefM0sl5cf8gwLNcu9YypouNBS2PSvg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=gx2MJRBQRK/Q7zeNftCJx4FHowj9XptOmH9PCuCsUSu0Bp7v+0qbMxJPxRE/O0jXB coJOo2nnquoJ+LYtPiXbpt1gGK6521yg3b0SbAlgonaVc1WJSFYSTnWf6gVnc0qPj3 BlSHXvPRMTfLaM2oFX0fm3FNVa7xLedl9XNnFLhY= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Forrest Yuan Yu , James Morris , John Andersen , Liran Alon , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Rick Edgecombe , Thara Gopinath , Will Deacon , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v1 8/9] KVM: x86/mmu: Enable guests to lock themselves thanks to MBEC Date: Fri, 5 May 2023 17:20:45 +0200 Message-Id: <20230505152046.6575-9-mic@digikod.net> In-Reply-To: <20230505152046.6575-1-mic@digikod.net> References: <20230505152046.6575-1-mic@digikod.net> MIME-Version: 1.0 X-Infomaniak-Routing: alpha Received-SPF: pass client-ip=2001:1600:4:17::42aa; envelope-from=mic@digikod.net; helo=smtp-42aa.mail.infomaniak.ch X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Fri, 05 May 2023 11:31:07 -0400 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org This change enables guests to enforce a deny-by-default execution security policy for their kernel, leveraged by the Heki implementation. Create synthetic page faults when an access is denied by Heki. This kind of kernel page fault needs to be handled by the guest, which is not currently the case: the guest retries the faulting instruction again and again. We are working on teaching guests how to handle such page faults. The MMU tracepoints are updated to reflect the difference between kernel and user space executions. kvm_heki_fix_all_ept_exec_perm() walks through all guest memory pages to set the configured default execution permissions (i.e. only allow the configured executable memory pages). The struct heki_pa_range's attributes field now understands HEKI_ATTR_MEM_EXEC, which allows the related kernel sections to be executable, and denies any other kernel memory from being executable for the whole lifetime of the guest.
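As an example of the intended guest-side use, a guest could allow execution only for its text section with something like the following sketch. This is hypothetical: the real ranges are built by the arch-specific Heki code, and _stext/_etext (from <asm/sections.h>) merely stand in for the actual executable sections:

/* Sketch: request that only the kernel text section stays executable. */
struct heki_pa_range range = {
	.gfn_start = PHYS_PFN(__pa_symbol(_stext)),
	.gfn_end = PHYS_PFN(__pa_symbol(_etext)) - 1,	/* inclusive end */
	.attributes = HEKI_ATTR_MEM_EXEC,
};
long err = kvm_hypercall3(KVM_HC_LOCK_MEM_PAGE_RANGES, __pa(&range),
			  sizeof(range), 0);

Every executable page outside such a range then loses its EPT execute permission when kvm_heki_fix_all_ept_exec_perm() runs on the host.
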
This obviously can only work with static kernels and we are exploring ways to handle authenticated and dynamic kernel memory permission updates. If the host doesn't have MBEC enabled, the KVM_HC_LOCK_MEM_PAGE_RANGES hypercall will return -KVM_EOPNOTSUPP and might only apply the previous ranges, if any. This is useful to develop this RFC and make sure execution restrictions are enforced (and not silently ignored), but this behavior might change in a future patch series. Guest kernels could check for MBEC support and avoid using the HEKI_ATTR_MEM_EXEC attribute when it is missing. The number of configurable memory ranges per guest is 16 for now. This will change with a follow-up. There are currently some pr_warn() calls to make it easy to test this code. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T. Venkataraman Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Signed-off-by: Mickaël Salaün Link: https://lore.kernel.org/r/20230505152046.6575-9-mic@digikod.net --- Documentation/virt/kvm/x86/hypercalls.rst | 4 +- arch/x86/kvm/mmu/mmu.c | 35 ++++++++- arch/x86/kvm/mmu/mmutrace.h | 11 ++- arch/x86/kvm/mmu/spte.c | 19 ++++- arch/x86/kvm/mmu/spte.h | 15 +++- arch/x86/kvm/mmu/tdp_mmu.c | 73 ++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 4 + arch/x86/kvm/x86.c | 90 ++++++++++++++++++++++- arch/x86/kvm/x86.h | 7 ++ include/linux/kvm_host.h | 4 + virt/kvm/kvm_main.c | 1 + 11 files changed, 250 insertions(+), 13 deletions(-) diff --git a/Documentation/virt/kvm/x86/hypercalls.rst b/Documentation/virt/kvm/x86/hypercalls.rst index 8aa5d28986e3..5accf5f6de13 100644 --- a/Documentation/virt/kvm/x86/hypercalls.rst +++ b/Documentation/virt/kvm/x86/hypercalls.rst @@ -204,7 +204,9 @@ must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL. The hypercall lets a guest request memory permissions to be removed for itself, identified with set of physical page ranges (GFNs). The HEKI_ATTR_MEM_NOWRITE -memory page range attribute forbids related modification to the guest. +memory page range attribute forbids related modification to the guest. The +HEKI_ATTR_MEM_EXEC attribute allows execution on the specified pages while +removing it for all the others. Returns 0 on success or a KVM error code otherwise. diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index a47e63217eb8..56a8bcac1b82 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3313,7 +3313,7 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte) { if (fault->exec) - return is_executable_pte(spte); + return is_executable_pte(spte, !fault->user); if (fault->write) return is_writable_pte(spte); @@ -5602,6 +5602,39 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa))) return RET_PF_RETRY; + /* Skips real page faults if not needed. */ + if ((error_code & PFERR_FETCH_MASK) && + !kvm_heki_is_exec_allowed(vcpu, cr2_or_gpa)) { + /* + * TODO: To avoid kvm_heki_is_exec_allowed() call, check + * enable_mbec and EPT_VIOLATION_KERNEL_INSTR, see + * handle_ept_violation(). + */ + struct x86_exception fault = { + .vector = PF_VECTOR, + .error_code_valid = true, + .error_code = error_code, + .nested_page_fault = false, + /* + * TODO: This kind of kernel page fault needs to be handled by + * the guest, which is not currently the case, making it try + * again and again.
+ * + * You may want to test with cr2_or_gva to see the page + * fault caught by the guest kernel (thinking it is a + * user space fault). + */ + .address = static_call(kvm_x86_fault_gva)(vcpu), + .async_page_fault = false, + }; + + pr_warn_ratelimited( + "heki-kvm: Creating fetch #PF at 0x%016llx\n", + fault.address); + kvm_inject_page_fault(vcpu, &fault); + return RET_PF_INVALID; + } + r = RET_PF_INVALID; if (unlikely(error_code & PFERR_RSVD_MASK)) { r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct); diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h index ae86820cef69..cb7df95aec25 100644 --- a/arch/x86/kvm/mmu/mmutrace.h +++ b/arch/x86/kvm/mmu/mmutrace.h @@ -342,7 +342,8 @@ TRACE_EVENT( __field(u8, level) /* These depend on page entry type, so compute them now. */ __field(bool, r) - __field(bool, x) + __field(bool, kx) + __field(bool, ux) __field(signed char, u) ), @@ -352,15 +353,17 @@ TRACE_EVENT( __entry->sptep = virt_to_phys(sptep); __entry->level = level; __entry->r = shadow_present_mask || (__entry->spte & PT_PRESENT_MASK); - __entry->x = is_executable_pte(__entry->spte); + __entry->kx = is_executable_pte(__entry->spte, true); + __entry->ux = is_executable_pte(__entry->spte, false); __entry->u = shadow_user_mask ? !!(__entry->spte & shadow_user_mask) : -1; ), - TP_printk("gfn %llx spte %llx (%s%s%s%s) level %d at %llx", + TP_printk("gfn %llx spte %llx (%s%s%s%s%s) level %d at %llx", __entry->gfn, __entry->spte, __entry->r ? "r" : "-", __entry->spte & PT_WRITABLE_MASK ? "w" : "-", - __entry->x ? "x" : "-", + __entry->kx ? "X" : "-", + __entry->ux ? "x" : "-", __entry->u == -1 ? "" : (__entry->u ? "u" : "-"), __entry->level, __entry->sptep ) diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index f1e2e3cad878..c9fabb3c9cb2 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -184,10 +184,25 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, pte_access &= ~ACC_EXEC_MASK; } - if (pte_access & ACC_EXEC_MASK) + if (pte_access & ACC_EXEC_MASK) { spte |= shadow_x_mask; - else +#ifdef CONFIG_HEKI + /* + * FIXME: Race condition (at boot) if no + * lockdep_assert_held_write(vcpu->kvm->mmu_lock); + */ + if (READ_ONCE(vcpu->kvm->heki_kernel_exec_locked)) { + if (!heki_exec_is_allowed(vcpu->kvm, gfn)) + spte &= ~VMX_EPT_EXECUTABLE_MASK; + else + pr_warn("heki-kvm: Allowing kernel execution " + "for GFN 0x%llx\n", + gfn); + } +#endif /* CONFIG_HEKI */ + } else { spte |= shadow_nx_mask; + } if (pte_access & ACC_USER_MASK) spte |= shadow_user_mask; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 6f54dc9409c9..30b250d03132 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -3,7 +3,10 @@ #ifndef KVM_X86_MMU_SPTE_H #define KVM_X86_MMU_SPTE_H +#include + #include "mmu_internal.h" +#include "../vmx/vmx.h" /* * A MMU present SPTE is backed by actual memory and may or may not be present @@ -307,9 +310,17 @@ static inline bool is_last_spte(u64 pte, int level) return (level == PG_LEVEL_4K) || is_large_pte(pte); } -static inline bool is_executable_pte(u64 spte) +static inline bool is_executable_pte(u64 spte, bool for_kernel_mode) { - return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask; + u64 x_mask = shadow_x_mask; + + if (enable_mbec) { + if (for_kernel_mode) + x_mask &= ~VMX_EPT_USER_EXECUTABLE_MASK; + else + x_mask &= ~VMX_EPT_EXECUTABLE_MASK; + } + return (spte & (x_mask | shadow_nx_mask)) == x_mask; } static inline kvm_pfn_t spte_to_pfn(u64 pte) diff --git 
a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index d6df38d371a0..0be34a9e90c0 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -7,7 +7,10 @@ #include "tdp_mmu.h" #include "spte.h" +#include "../x86.h" + #include +#include #include static bool __read_mostly tdp_mmu_enabled = true; @@ -1021,6 +1024,76 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm) } } +#ifdef CONFIG_HEKI + +/* TODO: Handle flush? */ +void kvm_heki_fix_all_ept_exec_perm(struct kvm *const kvm) +{ + int i; + struct kvm_mmu_page *root; + const gfn_t start = 0; + const gfn_t end = tdp_mmu_max_gfn_exclusive(); + + if (WARN_ON_ONCE(!is_tdp_mmu_enabled(kvm))) + return; + + if (WARN_ON_ONCE(!enable_mbec)) + return; + + write_lock(&kvm->mmu_lock); + + /* + * Because heki_exec_locked is only set with this code, it cannot be + * unlocked. This is protected against race condition thanks to + * mmu_lock. + */ + WRITE_ONCE(kvm->heki_kernel_exec_locked, true); + + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { + for_each_tdp_mmu_root(kvm, root, i) { + struct tdp_iter iter; + + WARN_ON_ONCE(!refcount_read(&root->tdp_mmu_root_count)); + + /* + * TODO: Make sure + * !is_shadow_present_pte()/SPTE_MMU_PRESENT_MASK are + * well handled when they are present. + */ + + rcu_read_lock(); + tdp_root_for_each_leaf_pte(iter, root, start, end) { + u64 new_spte; + + if (heki_exec_is_allowed(kvm, iter.gfn)) { + pr_warn("heki-kvm: Allowing kernel " + "execution for GFN 0x%llx\n", + iter.gfn); + continue; + } + pr_warn("heki-kvm: Denying kernel execution " + "for GFN 0x%llx\n", + iter.gfn); + +retry: + new_spte = iter.old_spte & + ~VMX_EPT_EXECUTABLE_MASK; + if (new_spte == iter.old_spte) + continue; + + if (tdp_mmu_set_spte_atomic(kvm, &iter, + new_spte)) + goto retry; + } + rcu_read_unlock(); + } + } + write_unlock(&kvm->mmu_lock); + pr_warn("heki-kvm: Locked executable kernel memory\n"); +} + +#endif /* CONFIG_HEKI */ + /* * Zap all invalidated roots to ensure all SPTEs are dropped before the "fast * zap" completes. diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index d3714200b932..8b70b6af68d4 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -24,6 +24,10 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm); void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm); void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm); +#ifdef CONFIG_HEKI +void kvm_heki_fix_all_ept_exec_perm(struct kvm *const kvm); +#endif /* CONFIG_HEKI */ + int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a529455359ac..7ac8d9fabc18 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -20,6 +20,7 @@ #include "irq.h" #include "ioapic.h" #include "mmu.h" +#include "mmu/tdp_mmu.h" #include "i8254.h" #include "tss.h" #include "kvm_cache_regs.h" @@ -31,6 +32,7 @@ #include "lapic.h" #include "xen.h" #include "smm.h" +#include "vmx/capabilities.h" #include #include @@ -9705,6 +9707,45 @@ heki_page_track_prewrite(struct kvm_vcpu *const vcpu, const gpa_t gpa, return true; } +bool heki_exec_is_allowed(const struct kvm *const kvm, const gfn_t gfn) +{ + unsigned int gfn_last; + + if (!READ_ONCE(kvm->heki_kernel_exec_locked)) + return true; + + /* + * heki_gfn_exec_last is initialized with (HEKI_GFN_MAX + 1), + * and 0 means that heki_gfn_exec_last is full. 
+ */ + for (gfn_last = atomic_read(&kvm->heki_gfn_exec_last); + gfn_last > 0 && gfn_last <= HEKI_GFN_MAX;) { + gfn_last--; + + /* Ignores unused slots. */ + if (kvm->heki_gfn_exec[gfn_last].end == 0) + break; + + if (gfn >= kvm->heki_gfn_exec[gfn_last].start && + gfn <= kvm->heki_gfn_exec[gfn_last].end) { + /* TODO: Opportunistically shrink heki_gfn_exec. */ + return true; + } + } + return false; +} + +bool kvm_heki_is_exec_allowed(struct kvm_vcpu *vcpu, gpa_t gpa) +{ + const gfn_t gfn = gpa_to_gfn(gpa); + const struct kvm *const kvm = vcpu->kvm; + + if (heki_exec_is_allowed(kvm, gfn)) + return true; + + return false; +} + static int kvm_heki_init_vm(struct kvm *const kvm) { struct kvm_page_track_notifier_node *const node = @@ -9733,6 +9774,7 @@ static int heki_lock_mem_page_ranges(struct kvm *const kvm, gpa_t mem_ranges, int err; size_t i, ranges_num; struct heki_pa_range *ranges; + bool has_exec_restriction = false; if (mem_ranges_size > HEKI_PA_RANGE_MAX_SIZE) return -KVM_E2BIG; @@ -9752,7 +9794,8 @@ static int heki_lock_mem_page_ranges(struct kvm *const kvm, gpa_t mem_ranges, ranges_num = mem_ranges_size / sizeof(struct heki_pa_range); for (i = 0; i < ranges_num; i++) { - const u64 attributes_mask = HEKI_ATTR_MEM_NOWRITE; + const u64 attributes_mask = HEKI_ATTR_MEM_NOWRITE | + HEKI_ATTR_MEM_EXEC; const gfn_t gfn_start = ranges[i].gfn_start; const gfn_t gfn_end = ranges[i].gfn_end; const u64 attributes = ranges[i].attributes; @@ -9799,11 +9842,52 @@ static int heki_lock_mem_page_ranges(struct kvm *const kvm, gpa_t mem_ranges, kvm, gfn, KVM_PAGE_TRACK_PREWRITE)); } - pr_warn("heki-kvm: Locking GFN 0x%llx-0x%llx with %s\n", + /* + * Allow-list for execute permission, + * see kvm_heki_fix_all_ept_exec_perm(). + */ + if (attributes & HEKI_ATTR_MEM_EXEC) { + size_t gfn_i; + + if (!enable_mbec) { + /* + * Guests can check for MBEC support to avoid + * such error by not using HEKI_ATTR_MEM_EXEC. + */ + err = -KVM_EOPNOTSUPP; + pr_warn("heki-kvm: HEKI_ATTR_MEM_EXEC " + "depends on MBEC, which is disabled."); + /* + * We should continue partially applying + * restrictions, but it is useful for this RFC + * to exit early in case of missing MBEC + * support. + */ + goto out_free_ranges; + } + + has_exec_restriction = true; + gfn_i = atomic_dec_if_positive( + &kvm->heki_gfn_exec_last); + if (gfn_i == 0) { + err = -KVM_E2BIG; + goto out_free_ranges; + } + + gfn_i--; + kvm->heki_gfn_exec[gfn_i].start = gfn_start; + kvm->heki_gfn_exec[gfn_i].end = gfn_end; + } + + pr_warn("heki-kvm: Locking GFN 0x%llx-0x%llx with %s%s\n", gfn_start, gfn_end, - (attributes & HEKI_ATTR_MEM_NOWRITE) ? "[nowrite]" : ""); + (attributes & HEKI_ATTR_MEM_NOWRITE) ? "[nowrite]" : "", + (attributes & HEKI_ATTR_MEM_EXEC) ? 
"[exec]" : ""); } + if (has_exec_restriction) + kvm_heki_fix_all_ept_exec_perm(kvm); + out_free_ranges: kfree(ranges); return err; diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 3e80a60ecbd8..2127e551202d 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -282,6 +282,8 @@ int heki_check_cr(const struct kvm *kvm, unsigned long cr, unsigned long val); bool kvm_heki_is_exec_allowed(struct kvm_vcpu *vcpu, gpa_t gpa); +bool heki_exec_is_allowed(const struct kvm *const kvm, const gfn_t gfn); + #else /* CONFIG_HEKI */ static inline int heki_check_cr(const struct kvm *const kvm, @@ -290,6 +292,11 @@ static inline int heki_check_cr(const struct kvm *const kvm, return 0; } +static inline bool kvm_heki_is_exec_allowed(struct kvm_vcpu *vcpu, gpa_t gpa) +{ + return true; +} + #endif /* CONFIG_HEKI */ void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ab9dc723bc89..82c7b02cbcc3 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -812,9 +812,13 @@ struct kvm { #define HEKI_GFN_MAX 16 atomic_t heki_gfn_no_write_num; struct heki_gfn_range heki_gfn_no_write[HEKI_GFN_MAX]; + atomic_t heki_gfn_exec_last; + struct heki_gfn_range heki_gfn_exec[HEKI_GFN_MAX]; atomic_long_t heki_pinned_cr0; atomic_long_t heki_pinned_cr4; + + bool heki_kernel_exec_locked; #endif /* CONFIG_HEKI */ #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4aea936dfe73..a177f8ff5123 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1232,6 +1232,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname) #ifdef CONFIG_HEKI atomic_set(&kvm->heki_gfn_no_write_num, HEKI_GFN_MAX + 1); + atomic_set(&kvm->heki_gfn_exec_last, HEKI_GFN_MAX + 1); #endif /* CONFIG_HEKI */ preempt_notifier_inc(); From patchwork Fri May 5 15:20:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= X-Patchwork-Id: 13232760 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DA8B5C7EE26 for ; Fri, 5 May 2023 15:32:07 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1puxOy-0008AK-TS; Fri, 05 May 2023 11:31:21 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1puxG8-0003FT-4k for qemu-devel@nongnu.org; Fri, 05 May 2023 11:22:12 -0400 Received: from smtp-bc0d.mail.infomaniak.ch ([45.157.188.13]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1puxG5-0003k2-8q for qemu-devel@nongnu.org; Fri, 05 May 2023 11:22:11 -0400 Received: from smtp-3-0001.mail.infomaniak.ch (unknown [10.4.36.108]) by smtp-2-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4QCZDH0FZXzMqc7P; Fri, 5 May 2023 17:22:07 +0200 (CEST) Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4QCZDG2bp6zMpxBc; Fri, 5 May 2023 17:22:06 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=digikod.net; s=20191114; t=1683300126; 
bh=6hEUXV1Yu+QqE06q6YV92ksgcTE4SSVMGFp/YRlUSqM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=cCom6GYQLGlk2nCbcclR8AZKUkW0AJpZFC6wR92gff0rbAnNjJlFfzBd5J+3y++84 WAx+LLa8CAjewu98goOyt+2vh84ogydmCURfKSQ02P+jr2VYlfAmiT9Knsjn08r/LT Tng2aFdb73PL8bp/dTy6vPMFTznPlq63+NzYf8sI= From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= To: Borislav Petkov , Dave Hansen , "H . Peter Anvin" , Ingo Molnar , Kees Cook , Paolo Bonzini , Sean Christopherson , Thomas Gleixner , Vitaly Kuznetsov , Wanpeng Li Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= , Alexander Graf , Forrest Yuan Yu , James Morris , John Andersen , Liran Alon , "Madhavan T . Venkataraman" , Marian Rotariu , =?utf-8?q?Mihai_Don=C8=9Bu?= , =?utf-8?b?TmljdciZ?= =?utf-8?b?b3IgQ8OuyJt1?= , Rick Edgecombe , Thara Gopinath , Will Deacon , Zahra Tarkhani , =?utf-8?q?=C8=98tefan_=C8=98icler?= =?utf-8?q?u?= , dev@lists.cloudhypervisor.org, kvm@vger.kernel.org, linux-hardening@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, x86@kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v1 9/9] virt: Add Heki KUnit tests Date: Fri, 5 May 2023 17:20:46 +0200 Message-Id: <20230505152046.6575-10-mic@digikod.net> In-Reply-To: <20230505152046.6575-1-mic@digikod.net> References: <20230505152046.6575-1-mic@digikod.net> MIME-Version: 1.0 X-Infomaniak-Routing: alpha Received-SPF: pass client-ip=45.157.188.13; envelope-from=mic@digikod.net; helo=smtp-bc0d.mail.infomaniak.ch X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Fri, 05 May 2023 11:31:06 -0400 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org This adds a new CONFIG_HEKI_TEST option to run tests at boot. Indeed, because this patch series forbids the loading of kernel modules after boot, the tests need to be built in. Furthermore, because some of the symbols used are not exported to modules (e.g., kernel_set_to_readonly), these tests could not work as modules anyway. To run these tests, we need to boot the kernel with the heki_test=N boot argument, with N selecting a specific test: 1. heki_test_cr_disable_smep: Check CR pinning and try to disable SMEP. 2. heki_test_write_to_const: Check .rodata (const) protection. 3. heki_test_write_to_ro_after_init: Check __ro_after_init protection. 4. heki_test_exec: Check non-executable kernel memory. This test-selection mechanism should no longer be required once the kernel properly handles the triggered synthetic page faults. For now, these page faults make the kernel loop. All these tests temporarily disable the related kernel self-protections and should then fail if Heki doesn't protect the kernel. They are verbose to make it easier to understand what is going on. Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Kees Cook Cc: Madhavan T.
Venkataraman Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Thomas Gleixner Cc: Vitaly Kuznetsov Cc: Wanpeng Li Signed-off-by: Mickaël Salaün Link: https://lore.kernel.org/r/20230505152046.6575-10-mic@digikod.net --- virt/heki/Kconfig | 12 +++ virt/heki/heki.c | 194 +++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 205 insertions(+), 1 deletion(-) diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig index 96f18ce03013..806981f2b22d 100644 --- a/virt/heki/Kconfig +++ b/virt/heki/Kconfig @@ -27,3 +27,15 @@ config HYPERVISOR_SUPPORTS_HEKI A hypervisor should select this when it can successfully build and run with CONFIG_HEKI. That is, it should provide all of the hypervisor support required for the Heki feature. + +config HEKI_TEST + bool "Tests for Heki" if !KUNIT_ALL_TESTS + depends on HEKI && KUNIT=y + default KUNIT_ALL_TESTS + help + Run Heki tests at runtime according to the heki_test=N boot + parameter, with N identifying the test to run (between 1 and 4). + + Before launching the init process, the system might not respond + because of unhandled kernel page fault. This will be fixed in a + next patch series. diff --git a/virt/heki/heki.c b/virt/heki/heki.c index 142b5dc98a2f..361e7734e950 100644 --- a/virt/heki/heki.c +++ b/virt/heki/heki.c @@ -5,11 +5,13 @@ * Copyright © 2023 Microsoft Corporation */ +#include #include #include #include #include #include +#include #include #include @@ -78,13 +80,201 @@ void __init heki_early_init(void) heki_arch_init(); } +#ifdef CONFIG_HEKI_TEST + +/* Heki test data */ + +/* Takes two pages to not change permission of other read-only pages. */ +const char heki_test_const_buf[PAGE_SIZE * 2] = {}; +char heki_test_ro_after_init_buf[PAGE_SIZE * 2] __ro_after_init = {}; + +long heki_test_exec_data(long); +void _test_exec_data_end(void); + +/* Used to test ROP execution against the .rodata section. */ +/* clang-format off */ +asm( +".pushsection .rodata;" // NOT .text section +".global heki_test_exec_data;" +".type heki_test_exec_data, @function;" +"heki_test_exec_data:" +ASM_ENDBR +"movq %rdi, %rax;" +"inc %rax;" +ASM_RET +".size heki_test_exec_data, .-heki_test_exec_data;" +"_test_exec_data_end:" +".popsection"); +/* clang-format on */ + +static void heki_test_cr_disable_smep(struct kunit *test) +{ + unsigned long cr4; + + /* SMEP should be initially enabled. */ + KUNIT_ASSERT_TRUE(test, __read_cr4() & X86_CR4_SMEP); + + kunit_warn(test, + "Starting control register pinning tests with SMEP check\n"); + + /* + * Trying to disable SMEP, bypassing kernel self-protection by not + * using cr4_clear_bits(X86_CR4_SMEP). + */ + cr4 = __read_cr4() & ~X86_CR4_SMEP; + asm volatile("mov %0,%%cr4" : "+r"(cr4) : : "memory"); + + /* SMEP should still be enabled. 
+	KUNIT_ASSERT_TRUE(test, __read_cr4() & X86_CR4_SMEP);
+}
+
+static inline void print_addr(struct kunit *test, const char *const buf_name,
+			      void *const buf)
+{
+	const pte_t pte = *virt_to_kpte((unsigned long)buf);
+	const phys_addr_t paddr = slow_virt_to_phys(buf);
+	bool present = pte_flags(pte) & (_PAGE_PRESENT);
+	bool accessible = pte_accessible(&init_mm, pte);
+
+	kunit_warn(
+		test,
+		"%s vaddr:%llx paddr:%llx exec:%d write:%d present:%d accessible:%d\n",
+		buf_name, (unsigned long long)buf, paddr, !!pte_exec(pte),
+		!!pte_write(pte), present, accessible);
+}
+
+extern int kernel_set_to_readonly;
+
+static void heki_test_write_to_rodata(struct kunit *test,
+				      const char *const buf_name,
+				      char *const ro_buf)
+{
+	print_addr(test, buf_name, (void *)ro_buf);
+	KUNIT_EXPECT_EQ(test, 0, *ro_buf);
+
+	kunit_warn(
+		test,
+		"Bypassing kernel self-protection: mark memory as writable\n");
+	kernel_set_to_readonly = 0;
+	/*
+	 * Remove the execute permission that might be set by bugdoor-exec,
+	 * because change_page_attr_clear() is not used by set_memory_rw().
+	 * This is required since commit 652c5bf380ad ("x86/mm: Refuse W^X
+	 * violations").
+	 */
+	KUNIT_ASSERT_FALSE(test, set_memory_nx((unsigned long)PTR_ALIGN_DOWN(
+						       ro_buf, PAGE_SIZE),
+					       1));
+	KUNIT_ASSERT_FALSE(test, set_memory_rw((unsigned long)PTR_ALIGN_DOWN(
+						       ro_buf, PAGE_SIZE),
+					       1));
+	kernel_set_to_readonly = 1;
+
+	kunit_warn(test, "Trying memory write\n");
+	*ro_buf = 0x11;
+	KUNIT_EXPECT_EQ(test, 0, *ro_buf);
+	kunit_warn(test, "New content: 0x%02x\n", *ro_buf);
+}
+
+static void heki_test_write_to_const(struct kunit *test)
+{
+	heki_test_write_to_rodata(test, "const_buf",
+				  (void *)heki_test_const_buf);
+}
+
+static void heki_test_write_to_ro_after_init(struct kunit *test)
+{
+	heki_test_write_to_rodata(test, "ro_after_init_buf",
+				  (void *)heki_test_ro_after_init_buf);
+}
+
+typedef long test_exec_t(long);
+
+static void heki_test_exec(struct kunit *test)
+{
+	const size_t exec_size = 7;
+	unsigned long nx_page_start = (unsigned long)PTR_ALIGN_DOWN(
+		(const void *const)heki_test_exec_data, PAGE_SIZE);
+	unsigned long nx_page_end = (unsigned long)PTR_ALIGN(
+		(const void *const)heki_test_exec_data + exec_size, PAGE_SIZE);
+	test_exec_t *exec = (test_exec_t *)heki_test_exec_data;
+	long ret;
+
+	/* Starting non-executable memory tests. */
+	print_addr(test, "test_exec_data", heki_test_exec_data);
+
+	kunit_warn(
+		test,
+		"Bypassing kernel self-protection: mark memory as executable\n");
+	kernel_set_to_readonly = 0;
+	KUNIT_ASSERT_FALSE(test,
+			   set_memory_rox(nx_page_start,
+					  PFN_UP(nx_page_end - nx_page_start)));
+	kernel_set_to_readonly = 1;
+
+	kunit_warn(
+		test,
+		"Trying to execute data (ROP) in (initially) non-executable memory\n");
+	ret = exec(3);
+
+	/* This should not be reached because of the uncaught page fault. */
+	KUNIT_EXPECT_EQ(test, 3, ret);
+	kunit_warn(test, "Result of execution: 3 + 1 = %ld\n", ret);
+}
+
+const struct kunit_case heki_test_cases[] = {
+	KUNIT_CASE(heki_test_cr_disable_smep),
+	KUNIT_CASE(heki_test_write_to_const),
+	KUNIT_CASE(heki_test_write_to_ro_after_init),
+	KUNIT_CASE(heki_test_exec),
+	{}
+};
+
+static unsigned long heki_test __ro_after_init;
+
+static int __init parse_heki_test_config(char *str)
+{
+	if (kstrtoul(str, 10, &heki_test) ||
+	    heki_test > (ARRAY_SIZE(heki_test_cases) - 1))
+		pr_warn("Invalid option string for heki_test: '%s'\n", str);
+	return 1;
+}
+
+__setup("heki_test=", parse_heki_test_config);
+
+static void heki_run_test(void)
+{
+	struct kunit_case heki_test_case[2] = {};
+	struct kunit_suite heki_test_suite = {
+		.name = "heki",
+		.test_cases = heki_test_case,
+	};
+	struct kunit_suite *const test_suite = &heki_test_suite;
+
+	if (!kunit_enabled() || heki_test == 0 ||
+	    heki_test >= ARRAY_SIZE(heki_test_cases))
+		return;
+
+	pr_warn("Running test #%lu\n", heki_test);
+	heki_test_case[0] = heki_test_cases[heki_test - 1];
+	__kunit_test_suites_init(&test_suite, 1);
+}
+
+#else /* CONFIG_HEKI_TEST */
+
+static inline void heki_run_test(void)
+{
+}
+
+#endif /* CONFIG_HEKI_TEST */
+
 void heki_late_init(void)
 {
	struct heki_hypervisor *hypervisor = heki.hypervisor;
	int ret;

	if (!heki_enabled)
-		return;
+		return heki_run_test();

	if (!heki.static_ranges) {
		pr_warn("Architecture did not initialize static ranges\n");
@@ -113,6 +303,8 @@ void heki_late_init(void)
		goto out;

	pr_warn("Control registers locked\n");
+	heki_run_test();
+
 out:
	heki_free_pa_ranges(heki.static_ranges, heki.num_static_ranges);
	heki.static_ranges = NULL;