[v10,29/81] KVM: x86: page_track: add support for preread, prewrite and preexec

Message ID	20201125093600.2766-30-alazar@bitdefender.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> From: =?utf-8?q?Adalbert_Laz=C4=83r?= <alazar@bitdefender.com> To: kvm@vger.kernel.org Cc: virtualization@lists.linux-foundation.org, Paolo Bonzini <pbonzini@redhat.com>, =?utf-8?q?Mihai_Don=C8=9Bu?= <mdontu@bitdefender.com>, =?utf-8?q?Adalbert_L?= =?utf-8?q?az=C4=83r?= <alazar@bitdefender.com> Subject: [PATCH v10 29/81] KVM: x86: page_track: add support for preread, prewrite and preexec Date: Wed, 25 Nov 2020 11:35:08 +0200 Message-Id: <20201125093600.2766-30-alazar@bitdefender.com> In-Reply-To: <20201125093600.2766-1-alazar@bitdefender.com> References: <20201125093600.2766-1-alazar@bitdefender.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	VM introspection \| expand [v10,00/81] VM introspection [v10,01/81] KVM: UAPI: add error codes used by the VM introspection code [v10,02/81] KVM: add kvm_vcpu_kick_and_wait() [v10,03/81] KVM: add kvm_get_max_gfn() [v10,04/81] KVM: doc: fix the hypercalls numbering [v10,05/81] KVM: x86: add kvm_arch_vcpu_get_regs() and kvm_arch_vcpu_get_sregs() [v10,06/81] KVM: x86: add kvm_arch_vcpu_set_regs() [v10,07/81] KVM: x86: avoid injecting #PF when emulate the VMCALL instruction [v10,08/81] KVM: x86: add kvm_x86_ops.bp_intercepted() [v10,09/81] KVM: x86: add kvm_x86_ops.control_cr3_intercept() [v10,10/81] KVM: x86: add kvm_x86_ops.cr3_write_intercepted() [v10,11/81] KVM: x86: add kvm_x86_ops.desc_ctrl_supported() [v10,12/81] KVM: svm: add support for descriptor-table VM-exits [v10,13/81] KVM: x86: add kvm_x86_ops.control_desc_intercept() [v10,14/81] KVM: x86: add kvm_x86_ops.desc_intercepted() [v10,15/81] KVM: x86: add kvm_x86_ops.msr_write_intercepted() [v10,16/81] KVM: x86: svm: use the vmx convention to control the MSR interception [v10,17/81] KVM: x86: add kvm_x86_ops.control_msr_intercept() [v10,18/81] KVM: x86: vmx: use a symbolic constant when checking the exit qualifications [v10,19/81] KVM: x86: save the error code during EPT/NPF exits handling [v10,20/81] KVM: x86: add kvm_x86_ops.fault_gla() [v10,21/81] KVM: x86: add kvm_x86_ops.control_singlestep() [v10,22/81] KVM: x86: export kvm_arch_vcpu_set_guest_debug() [v10,23/81] KVM: x86: extend kvm_mmu_gva_to_gpa_system() with the 'access' parameter [v10,24/81] KVM: x86: export kvm_inject_pending_exception() [v10,25/81] KVM: x86: export kvm_vcpu_ioctl_x86_get_xsave() [v10,26/81] KVM: x86: export kvm_vcpu_ioctl_x86_set_xsave() [v10,27/81] KVM: x86: page track: provide all callbacks with the guest virtual address [v10,28/81] KVM: x86: page track: add track_create_slot() callback [v10,29/81] KVM: x86: page_track: add support for preread, prewrite and preexec [v10,30/81] KVM: x86: wire in the preread/prewrite/preexec page trackers [v10,31/81] KVM: x86: disable gpa_available optimization for fetch and page-walk SPT violations [v10,32/81] KVM: introduce VM introspection [v10,33/81] KVM: introspection: add hook/unhook ioctls [v10,34/81] KVM: introspection: add permission access ioctls [v10,35/81] KVM: introspection: add the read/dispatch message function [v10,36/81] KVM: introspection: add KVMI_GET_VERSION [v10,37/81] KVM: introspection: add KVMI_VM_CHECK_COMMAND and KVMI_VM_CHECK_EVENT [v10,38/81] KVM: introspection: add KVMI_VM_GET_INFO [v10,39/81] KVM: introspection: add KVM_INTROSPECTION_PREUNHOOK [v10,40/81] KVM: introspection: add KVMI_VM_EVENT_UNHOOK [v10,41/81] KVM: introspection: add KVMI_VM_CONTROL_EVENTS [v10,42/81] KVM: introspection: add KVMI_VM_READ_PHYSICAL/KVMI_VM_WRITE_PHYSICAL [v10,43/81] KVM: introspection: add vCPU related data [v10,44/81] KVM: introspection: add a jobs list to every introspected vCPU [v10,45/81] KVM: introspection: handle vCPU introspection requests [v10,46/81] KVM: introspection: handle vCPU commands [v10,47/81] KVM: introspection: add KVMI_VCPU_GET_INFO [v10,48/81] KVM: introspection: add KVMI_VM_PAUSE_VCPU [v10,49/81] KVM: introspection: add support for vCPU events [v10,50/81] KVM: introspection: add KVMI_VCPU_EVENT_PAUSE [v10,51/81] KVM: introspection: add the crash action handling on the event reply [v10,52/81] KVM: introspection: add KVMI_VCPU_CONTROL_EVENTS [v10,53/81] KVM: introspection: add KVMI_VCPU_GET_REGISTERS [v10,54/81] KVM: introspection: add KVMI_VCPU_SET_REGISTERS [v10,55/81] KVM: introspection: add KVMI_VCPU_GET_CPUID [v10,56/81] KVM: introspection: add KVMI_VCPU_EVENT_HYPERCALL [v10,57/81] KVM: introspection: add KVMI_VCPU_EVENT_BREAKPOINT [v10,58/81] KVM: introspection: add cleanup support for vCPUs [v10,59/81] KVM: introspection: restore the state of #BP interception on unhook [v10,60/81] KVM: introspection: add KVMI_VM_CONTROL_CLEANUP [v10,61/81] KVM: introspection: add KVMI_VCPU_CONTROL_CR and KVMI_VCPU_EVENT_CR [v10,62/81] KVM: introspection: restore the state of CR3 interception on unhook [v10,63/81] KVM: introspection: add KVMI_VCPU_INJECT_EXCEPTION + KVMI_VCPU_EVENT_TRAP [v10,64/81] KVM: introspection: add KVMI_VM_GET_MAX_GFN [v10,65/81] KVM: introspection: add KVMI_VCPU_EVENT_XSETBV [v10,66/81] KVM: introspection: add KVMI_VCPU_GET_XCR [v10,67/81] KVM: introspection: add KVMI_VCPU_GET_XSAVE [v10,68/81] KVM: introspection: add KVMI_VCPU_SET_XSAVE [v10,69/81] KVM: introspection: add KVMI_VCPU_GET_MTRR_TYPE [v10,70/81] KVM: introspection: add KVMI_VCPU_EVENT_DESCRIPTOR [v10,71/81] KVM: introspection: restore the state of descriptor-table register interception on unho… [v10,72/81] KVM: introspection: add KVMI_VCPU_CONTROL_MSR and KVMI_VCPU_EVENT_MSR [v10,73/81] KVM: introspection: restore the state of MSR interception on unhook [v10,74/81] KVM: introspection: add KVMI_VM_SET_PAGE_ACCESS [v10,75/81] KVM: introspection: add KVMI_VCPU_EVENT_PF [v10,76/81] KVM: introspection: extend KVMI_GET_VERSION with struct kvmi_features [v10,77/81] KVM: introspection: add KVMI_VCPU_CONTROL_SINGLESTEP [v10,78/81] KVM: introspection: add KVMI_VCPU_EVENT_SINGLESTEP [v10,79/81] KVM: introspection: add KVMI_VCPU_TRANSLATE_GVA [v10,80/81] KVM: introspection: emulate a guest page table walk on SPT violations due to A/D bit up… [v10,81/81] KVM: x86: call the page tracking code on emulation failure

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h index 00a66c4d4d3c..c10f0f65c77a 100644 --- a/arch/x86/include/asm/kvm_page_track.h +++ b/arch/x86/include/asm/kvm_page_track.h @@ -3,7 +3,10 @@ #define _ASM_X86_KVM_PAGE_TRACK_H enum kvm_page_track_mode { + KVM_PAGE_TRACK_PREREAD, + KVM_PAGE_TRACK_PREWRITE, KVM_PAGE_TRACK_WRITE, + KVM_PAGE_TRACK_PREEXEC, KVM_PAGE_TRACK_MAX, }; @@ -22,6 +25,33 @@ struct kvm_page_track_notifier_head { struct kvm_page_track_notifier_node { struct hlist_node node; + /* + * It is called when guest is reading the read-tracked page + * and the read emulation is about to happen. + * + * @vcpu: the vcpu where the read access happened. + * @gpa: the physical address read by guest. + * @gva: the virtual address read by guest. + * @bytes: the read length. + * @node: this node. + */ + bool (*track_preread)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva, + int bytes, + struct kvm_page_track_notifier_node *node); + /* + * It is called when guest is writing the write-tracked page + * and the write emulation didn't happened yet. + * + * @vcpu: the vcpu where the write access happened. + * @gpa: the physical address written by guest. + * @gva: the virtual address written by guest. + * @new: the data was written to the address. + * @bytes: the written length. + * @node: this node + */ + bool (*track_prewrite)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva, + const u8 *new, int bytes, + struct kvm_page_track_notifier_node *node); /* * It is called when guest is writing the write-tracked page * and write emulation is finished at that time. @@ -36,6 +66,17 @@ struct kvm_page_track_notifier_node { void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva, const u8 *new, int bytes, struct kvm_page_track_notifier_node *node); + /* + * It is called when guest is fetching from a exec-tracked page + * and the fetch emulation is about to happen. + * + * @vcpu: the vcpu where the fetch access happened. + * @gpa: the physical address fetched by guest. + * @gva: the virtual address fetched by guest. + * @node: this node. + */ + bool (*track_preexec)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva, + struct kvm_page_track_notifier_node *node); /* * It is called when memory slot is being created * @@ -49,7 +90,7 @@ struct kvm_page_track_notifier_node { struct kvm_page_track_notifier_node *node); /* * It is called when memory slot is being moved or removed - * users can drop write-protection for the pages in that memory slot + * users can drop active protection for the pages in that memory slot * * @kvm: the kvm where memory slot being moved or removed * @slot: the memory slot being moved or removed @@ -81,7 +122,12 @@ kvm_page_track_register_notifier(struct kvm *kvm, void kvm_page_track_unregister_notifier(struct kvm *kvm, struct kvm_page_track_notifier_node *n); +bool kvm_page_track_preread(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva, + int bytes); +bool kvm_page_track_prewrite(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva, + const u8 *new, int bytes); void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva, const u8 *new, int bytes); +bool kvm_page_track_preexec(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva); void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot); #endif diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 1631e2367085..36add6fb712f 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1124,6 +1124,31 @@ static bool spte_write_protect(u64 *sptep, bool pt_protect) return mmu_spte_update(sptep, spte); } +static bool spte_read_protect(u64 *sptep) +{ + u64 spte = *sptep; + bool exec_only_supported = (shadow_present_mask == 0ull); + + rmap_printk("rmap_read_protect: spte %p %llx\n", sptep, *sptep); + + WARN_ON_ONCE(!exec_only_supported); + + spte = spte & ~(PT_WRITABLE_MASK | PT_PRESENT_MASK); + + return mmu_spte_update(sptep, spte); +} + +static bool spte_exec_protect(u64 *sptep) +{ + u64 spte = *sptep; + + rmap_printk("rmap_exec_protect: spte %p %llx\n", sptep, *sptep); + + spte = spte & ~PT_USER_MASK; + + return mmu_spte_update(sptep, spte); +} + static bool __rmap_write_protect(struct kvm *kvm, struct kvm_rmap_head *rmap_head, bool pt_protect) @@ -1138,6 +1163,32 @@ static bool __rmap_write_protect(struct kvm *kvm, return flush; } +static bool __rmap_read_protect(struct kvm *kvm, + struct kvm_rmap_head *rmap_head) +{ + struct rmap_iterator iter; + bool flush = false; + u64 *sptep; + + for_each_rmap_spte(rmap_head, &iter, sptep) + flush |= spte_read_protect(sptep); + + return flush; +} + +static bool __rmap_exec_protect(struct kvm *kvm, + struct kvm_rmap_head *rmap_head) +{ + struct rmap_iterator iter; + bool flush = false; + u64 *sptep; + + for_each_rmap_spte(rmap_head, &iter, sptep) + flush |= spte_exec_protect(sptep); + + return flush; +} + static bool spte_clear_dirty(u64 *sptep) { u64 spte = *sptep; @@ -1316,6 +1367,36 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, return write_protected; } +bool kvm_mmu_slot_gfn_read_protect(struct kvm *kvm, + struct kvm_memory_slot *slot, u64 gfn) +{ + struct kvm_rmap_head *rmap_head; + bool read_protected = false; + int i; + + for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) { + rmap_head = __gfn_to_rmap(gfn, i, slot); + read_protected |= __rmap_read_protect(kvm, rmap_head); + } + + return read_protected; +} + +bool kvm_mmu_slot_gfn_exec_protect(struct kvm *kvm, + struct kvm_memory_slot *slot, u64 gfn) +{ + struct kvm_rmap_head *rmap_head; + bool exec_protected = false; + int i; + + for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) { + rmap_head = __gfn_to_rmap(gfn, i, slot); + exec_protected |= __rmap_exec_protect(kvm, rmap_head); + } + + return exec_protected; +} + static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn) { struct kvm_memory_slot *slot; diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index bfc6389edc28..6fa15728f6ca 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -92,6 +92,10 @@ void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn); void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn); bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, struct kvm_memory_slot *slot, u64 gfn); +bool kvm_mmu_slot_gfn_read_protect(struct kvm *kvm, + struct kvm_memory_slot *slot, u64 gfn); +bool kvm_mmu_slot_gfn_exec_protect(struct kvm *kvm, + struct kvm_memory_slot *slot, u64 gfn); void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn, u64 pages); diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c index 27a8d1a02e84..ebbfd99a69da 100644 --- a/arch/x86/kvm/mmu/page_track.c +++ b/arch/x86/kvm/mmu/page_track.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * Support KVM gust page tracking + * Support KVM guest page tracking * * This feature allows us to track page access in guest. Currently, only * write access is tracked. @@ -96,7 +96,7 @@ static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn, * @kvm: the guest instance we are interested in. * @slot: the @gfn belongs to. * @gfn: the guest page. - * @mode: tracking mode, currently only write track is supported. + * @mode: tracking mode. */ void kvm_slot_page_track_add_page(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn, @@ -114,9 +114,16 @@ void kvm_slot_page_track_add_page(struct kvm *kvm, */ kvm_mmu_gfn_disallow_lpage(slot, gfn); - if (mode == KVM_PAGE_TRACK_WRITE) + if (mode == KVM_PAGE_TRACK_PREWRITE || mode == KVM_PAGE_TRACK_WRITE) { if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn)) kvm_flush_remote_tlbs(kvm); + } else if (mode == KVM_PAGE_TRACK_PREREAD) { + if (kvm_mmu_slot_gfn_read_protect(kvm, slot, gfn)) + kvm_flush_remote_tlbs(kvm); + } else if (mode == KVM_PAGE_TRACK_PREEXEC) { + if (kvm_mmu_slot_gfn_exec_protect(kvm, slot, gfn)) + kvm_flush_remote_tlbs(kvm); + } } EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page); @@ -131,7 +138,7 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page); * @kvm: the guest instance we are interested in. * @slot: the @gfn belongs to. * @gfn: the guest page. - * @mode: tracking mode, currently only write track is supported. + * @mode: tracking mode. */ void kvm_slot_page_track_remove_page(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn, @@ -224,12 +231,80 @@ kvm_page_track_unregister_notifier(struct kvm *kvm, } EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier); +/* + * Notify the node that a read access is about to happen. Returning false + * doesn't stop the other nodes from being called, but it will stop + * the emulation. + * + * The node should figure out if the read page is the one that the node + * is interested in by itself. + * + * The nodes will always be in conflict if they track the same page: + * - accepting a read won't guarantee that the next node will not override + * the data (filling new/bytes and setting data_ready) + * - filling new/bytes with custom data won't guarantee that the next node + * will not override that + */ +bool kvm_page_track_preread(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva, + int bytes) +{ + struct kvm_page_track_notifier_head *head; + struct kvm_page_track_notifier_node *n; + int idx; + bool ret = true; + + head = &vcpu->kvm->arch.track_notifier_head; + + if (hlist_empty(&head->track_notifier_list)) + return ret; + + idx = srcu_read_lock(&head->track_srcu); + hlist_for_each_entry_srcu(n, &head->track_notifier_list, node, + srcu_read_lock_held(&head->track_srcu)) + if (n->track_preread) + if (!n->track_preread(vcpu, gpa, gva, bytes, n)) + ret = false; + srcu_read_unlock(&head->track_srcu, idx); + return ret; +} + +/* + * Notify the node that a write access is about to happen. Returning false + * doesn't stop the other nodes from being called, but it will stop + * the emulation. + * + * The node should figure out if the written page is the one that the node + * is interested in by itself. + */ +bool kvm_page_track_prewrite(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva, + const u8 *new, int bytes) +{ + struct kvm_page_track_notifier_head *head; + struct kvm_page_track_notifier_node *n; + int idx; + bool ret = true; + + head = &vcpu->kvm->arch.track_notifier_head; + + if (hlist_empty(&head->track_notifier_list)) + return ret; + + idx = srcu_read_lock(&head->track_srcu); + hlist_for_each_entry_srcu(n, &head->track_notifier_list, node, + srcu_read_lock_held(&head->track_srcu)) + if (n->track_prewrite) + if (!n->track_prewrite(vcpu, gpa, gva, new, bytes, n)) + ret = false; + srcu_read_unlock(&head->track_srcu, idx); + return ret; +} + /* * Notify the node that write access is intercepted and write emulation is * finished at this time. * - * The node should figure out if the written page is the one that node is - * interested in by itself. + * The node should figure out if the written page is the one that the node + * is interested in by itself. */ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva, const u8 *new, int bytes) @@ -251,12 +326,42 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva, srcu_read_unlock(&head->track_srcu, idx); } +/* + * Notify the node that an instruction is about to be executed. + * Returning false doesn't stop the other nodes from being called, + * but it will stop the emulation with X86EMUL_RETRY_INSTR. + * + * The node should figure out if the page is the one that the node + * is interested in by itself. + */ +bool kvm_page_track_preexec(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva) +{ + struct kvm_page_track_notifier_head *head; + struct kvm_page_track_notifier_node *n; + int idx; + bool ret = true; + + head = &vcpu->kvm->arch.track_notifier_head; + + if (hlist_empty(&head->track_notifier_list)) + return ret; + + idx = srcu_read_lock(&head->track_srcu); + hlist_for_each_entry_srcu(n, &head->track_notifier_list, node, + srcu_read_lock_held(&head->track_srcu)) + if (n->track_preexec) + if (!n->track_preexec(vcpu, gpa, gva, n)) + ret = false; + srcu_read_unlock(&head->track_srcu, idx); + return ret; +} + /* * Notify the node that memory slot is being removed or moved so that it can - * drop write-protection for the pages in the memory slot. + * drop active protection for the pages in the memory slot. * - * The node should figure out it has any write-protected pages in this slot - * by itself. + * The node should figure out if the page is the one that the node + * is interested in by itself. */ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot) {

[v10,29/81] KVM: x86: page_track: add support for preread, prewrite and preexec

Commit Message

Patch