[v8,34/81] KVM: x86: page_track: add support for preread, prewrite and preexec

Message ID	20200330101308.21702-35-alazar@bitdefender.com (mailing list archive)
State	New, archived
Headers	Return-Path: <SRS0=cHma=5P=vger.kernel.org=kvm-owner@kernel.org>
	From: Adalbert Lazăr <alazar@bitdefender.com>
	To: kvm@vger.kernel.org
	Cc: virtualization@lists.linux-foundation.org, Paolo Bonzini <pbonzini@redhat.com>, Mihai Donțu <mdontu@bitdefender.com>, Adalbert Lazăr <alazar@bitdefender.com>
	Subject: [PATCH v8 34/81] KVM: x86: page_track: add support for preread, prewrite and preexec
	Date: Mon, 30 Mar 2020 13:12:21 +0300
	Message-Id: <20200330101308.21702-35-alazar@bitdefender.com>
	In-Reply-To: <20200330101308.21702-1-alazar@bitdefender.com>
	References: <20200330101308.21702-1-alazar@bitdefender.com>
	MIME-Version: 1.0
	Content-Type: text/plain; charset=UTF-8
	Content-Transfer-Encoding: 8bit
	Sender: kvm-owner@vger.kernel.org
	Precedence: bulk

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index dc528c6f2eb0..646cbfa07676 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -3,7 +3,10 @@
 #define _ASM_X86_KVM_PAGE_TRACK_H
 
 enum kvm_page_track_mode {
+	KVM_PAGE_TRACK_PREREAD,
+	KVM_PAGE_TRACK_PREWRITE,
 	KVM_PAGE_TRACK_WRITE,
+	KVM_PAGE_TRACK_PREEXEC,
 	KVM_PAGE_TRACK_MAX,
 };
 
@@ -22,6 +25,33 @@ struct kvm_page_track_notifier_head {
 struct kvm_page_track_notifier_node {
 	struct hlist_node node;
 
+	/*
+	 * It is called when guest is reading the read-tracked page
+	 * and the read emulation is about to happen.
+	 *
+	 * @vcpu: the vcpu where the read access happened.
+	 * @gpa: the physical address read by guest.
+	 * @gva: the virtual address read by guest.
+	 * @bytes: the read length.
+	 * @node: this node.
+	 */
+	bool (*track_preread)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			      int bytes,
+			      struct kvm_page_track_notifier_node *node);
+	/*
+	 * It is called when guest is writing the write-tracked page
+	 * and the write emulation didn't happened yet.
+	 *
+	 * @vcpu: the vcpu where the write access happened.
+	 * @gpa: the physical address written by guest.
+	 * @gva: the virtual address written by guest.
+	 * @new: the data was written to the address.
+	 * @bytes: the written length.
+	 * @node: this node
+	 */
+	bool (*track_prewrite)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			       const u8 *new, int bytes,
+			       struct kvm_page_track_notifier_node *node);
 	/*
 	 * It is called when guest is writing the write-tracked page
 	 * and write emulation is finished at that time.
@@ -36,6 +66,17 @@ struct kvm_page_track_notifier_node {
 	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
 			    const u8 *new, int bytes,
 			    struct kvm_page_track_notifier_node *node);
+	/*
+	 * It is called when guest is fetching from a exec-tracked page
+	 * and the fetch emulation is about to happen.
+	 *
+	 * @vcpu: the vcpu where the fetch access happened.
+	 * @gpa: the physical address fetched by guest.
+	 * @gva: the virtual address fetched by guest.
+	 * @node: this node.
+	 */
+	bool (*track_preexec)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			      struct kvm_page_track_notifier_node *node);
 	/*
 	 * It is called when memory slot is being created
 	 *
@@ -49,7 +90,7 @@ struct kvm_page_track_notifier_node {
 				  struct kvm_page_track_notifier_node *node);
 	/*
 	 * It is called when memory slot is being moved or removed
-	 * users can drop write-protection for the pages in that memory slot
+	 * users can drop active protection for the pages in that memory slot
 	 *
 	 * @kvm: the kvm where memory slot being moved or removed
 	 * @slot: the memory slot being moved or removed
@@ -82,7 +123,12 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
+bool kvm_page_track_preread(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			    int bytes);
+bool kvm_page_track_prewrite(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			     const u8 *new, int bytes);
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
 			  const u8 *new, int bytes);
+bool kvm_page_track_preexec(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 #endif
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index a647601c9e1c..2b5a7163ff39 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -222,6 +222,10 @@ void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn);
+bool kvm_mmu_slot_gfn_read_protect(struct kvm *kvm,
+				   struct kvm_memory_slot *slot, u64 gfn);
+bool kvm_mmu_slot_gfn_exec_protect(struct kvm *kvm,
+				   struct kvm_memory_slot *slot, u64 gfn);
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4edeb3e275bc..bd82cf1fb6a2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1587,6 +1587,31 @@ static bool spte_write_protect(u64 *sptep, bool pt_protect)
 	return mmu_spte_update(sptep, spte);
 }
 
+static bool spte_read_protect(u64 *sptep)
+{
+	u64 spte = *sptep;
+	bool exec_only_supported = (shadow_present_mask == 0ull);
+
+	rmap_printk("rmap_read_protect: spte %p %llx\n", sptep, *sptep);
+
+	WARN_ON_ONCE(!exec_only_supported);
+
+	spte = spte & ~(PT_WRITABLE_MASK | PT_PRESENT_MASK);
+
+	return mmu_spte_update(sptep, spte);
+}
+
+static bool spte_exec_protect(u64 *sptep)
+{
+	u64 spte = *sptep;
+
+	rmap_printk("rmap_exec_protect: spte %p %llx\n", sptep, *sptep);
+
+	spte = spte & ~PT_USER_MASK;
+
+	return mmu_spte_update(sptep, spte);
+}
+
 static bool __rmap_write_protect(struct kvm *kvm,
 				 struct kvm_rmap_head *rmap_head,
 				 bool pt_protect)
@@ -1601,6 +1626,32 @@ static bool __rmap_write_protect(struct kvm *kvm,
 	return flush;
 }
 
+static bool __rmap_read_protect(struct kvm *kvm,
+				struct kvm_rmap_head *rmap_head)
+{
+	struct rmap_iterator iter;
+	bool flush = false;
+	u64 *sptep;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep)
+		flush |= spte_read_protect(sptep);
+
+	return flush;
+}
+
+static bool __rmap_exec_protect(struct kvm *kvm,
+				struct kvm_rmap_head *rmap_head)
+{
+	struct rmap_iterator iter;
+	bool flush = false;
+	u64 *sptep;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep)
+		flush |= spte_exec_protect(sptep);
+
+	return flush;
+}
+
 static bool spte_clear_dirty(u64 *sptep)
 {
 	u64 spte = *sptep;
@@ -1776,6 +1827,36 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 	return write_protected;
 }
 
+bool kvm_mmu_slot_gfn_read_protect(struct kvm *kvm,
+				   struct kvm_memory_slot *slot, u64 gfn)
+{
+	struct kvm_rmap_head *rmap_head;
+	bool read_protected = false;
+	int i;
+
+	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
+		rmap_head = __gfn_to_rmap(gfn, i, slot);
+		read_protected |= __rmap_read_protect(kvm, rmap_head);
+	}
+
+	return read_protected;
+}
+
+bool kvm_mmu_slot_gfn_exec_protect(struct kvm *kvm,
+				   struct kvm_memory_slot *slot, u64 gfn)
+{
+	struct kvm_rmap_head *rmap_head;
+	bool exec_protected = false;
+	int i;
+
+	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
+		rmap_head = __gfn_to_rmap(gfn, i, slot);
+		exec_protected |= __rmap_exec_protect(kvm, rmap_head);
+	}
+
+	return exec_protected;
+}
+
 static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
 {
 	struct kvm_memory_slot *slot;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index f36e74430ad2..cc3eb2cc7e38 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Support KVM gust page tracking
+ * Support KVM guest page tracking
  *
  * This feature allows us to track page access in guest. Currently, only
  * write access is tracked.
@@ -99,7 +99,7 @@ static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
+ * @mode: tracking mode.
  */
 void kvm_slot_page_track_add_page(struct kvm *kvm,
 				  struct kvm_memory_slot *slot, gfn_t gfn,
@@ -117,9 +117,16 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
-	if (mode == KVM_PAGE_TRACK_WRITE)
+	if (mode == KVM_PAGE_TRACK_PREWRITE || mode == KVM_PAGE_TRACK_WRITE) {
 		if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
 			kvm_flush_remote_tlbs(kvm);
+	} else if (mode == KVM_PAGE_TRACK_PREREAD) {
+		if (kvm_mmu_slot_gfn_read_protect(kvm, slot, gfn))
+			kvm_flush_remote_tlbs(kvm);
+	} else if (mode == KVM_PAGE_TRACK_PREEXEC) {
+		if (kvm_mmu_slot_gfn_exec_protect(kvm, slot, gfn))
+			kvm_flush_remote_tlbs(kvm);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
 
@@ -134,7 +141,7 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
+ * @mode: tracking mode.
  */
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     struct kvm_memory_slot *slot, gfn_t gfn,
@@ -227,12 +234,78 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
 
+/*
+ * Notify the node that a read access is about to happen. Returning false
+ * doesn't stop the other nodes from being called, but it will stop
+ * the emulation.
+ *
+ * The node should figure out if the read page is the one that the node
+ * is interested in by itself.
+ *
+ * The nodes will always be in conflict if they track the same page:
+ * - accepting a read won't guarantee that the next node will not override
+ *   the data (filling new/bytes and setting data_ready)
+ * - filling new/bytes with custom data won't guarantee that the next node
+ *   will not override that
+ */
+bool kvm_page_track_preread(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			    int bytes)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+	bool ret = true;
+
+	head = &vcpu->kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return ret;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
+		if (n->track_preread)
+			if (!n->track_preread(vcpu, gpa, gva, bytes, n))
+				ret = false;
+	srcu_read_unlock(&head->track_srcu, idx);
+	return ret;
+}
+
+/*
+ * Notify the node that a write access is about to happen. Returning false
+ * doesn't stop the other nodes from being called, but it will stop
+ * the emulation.
+ *
+ * The node should figure out if the written page is the one that the node
+ * is interested in by itself.
+ */
+bool kvm_page_track_prewrite(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			     const u8 *new, int bytes)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+	bool ret = true;
+
+	head = &vcpu->kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return ret;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
+		if (n->track_prewrite)
+			if (!n->track_prewrite(vcpu, gpa, gva, new, bytes, n))
+				ret = false;
+	srcu_read_unlock(&head->track_srcu, idx);
+	return ret;
+}
+
 /*
  * Notify the node that write access is intercepted and write emulation is
  * finished at this time.
  *
- * The node should figure out if the written page is the one that node is
- * interested in by itself.
+ * The node should figure out if the written page is the one that the node
+ * is interested in by itself.
  */
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
 			  const u8 *new, int bytes)
@@ -253,12 +326,41 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
 	srcu_read_unlock(&head->track_srcu, idx);
 }
 
+/*
+ * Notify the node that an instruction is about to be executed.
+ * Returning false doesn't stop the other nodes from being called,
+ * but it will stop the emulation with X86EMUL_RETRY_INSTR.
+ *
+ * The node should figure out if the page is the one that the node
+ * is interested in by itself.
+ */
+bool kvm_page_track_preexec(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+	bool ret = true;
+
+	head = &vcpu->kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return ret;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
+		if (n->track_preexec)
+			if (!n->track_preexec(vcpu, gpa, gva, n))
+				ret = false;
+	srcu_read_unlock(&head->track_srcu, idx);
+	return ret;
+}
+
 /*
  * Notify the node that memory slot is being removed or moved so that it can
- * drop write-protection for the pages in the memory slot.
+ * drop active protection for the pages in the memory slot.
  *
- * The node should figure out it has any write-protected pages in this slot
- * by itself.
+ * The node should figure out if the page is the one that the node
+ * is interested in by itself.
  */
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
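
The comments on kvm_page_track_preread(), kvm_page_track_prewrite() and kvm_page_track_preexec() say that a callback returning false does not stop the other nodes from being called, but the aggregated result stops the emulation. A minimal userspace sketch of that aggregation rule follows. It is not kernel code and makes several assumptions: model_node, model_page_track_preread(), model_allow() and model_veto() are hypothetical names, a plain singly linked list stands in for the SRCU-protected hlist, and the vcpu argument is omitted.

```c
/* Userspace model only: none of the model_* names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_node {
	struct model_node *next;
	/* Optional callback; NULL means the node ignores preread events. */
	bool (*track_preread)(uint64_t gpa, uint64_t gva, int bytes,
			      struct model_node *node);
	int calls;	/* number of times the callback has run */
};

/*
 * Walk every node. A single "false" makes the overall result false,
 * but the walk continues so that later nodes still see the access.
 */
static bool model_page_track_preread(struct model_node *head, uint64_t gpa,
				     uint64_t gva, int bytes)
{
	struct model_node *n;
	bool ret = true;

	for (n = head; n != NULL; n = n->next)
		if (n->track_preread)
			if (!n->track_preread(gpa, gva, bytes, n))
				ret = false;
	return ret;
}

/* Example callback that accepts the access. */
static bool model_allow(uint64_t gpa, uint64_t gva, int bytes,
			struct model_node *node)
{
	(void)gpa; (void)gva; (void)bytes;
	assert(node != NULL);
	node->calls++;
	return true;
}

/* Example callback that vetoes the access. */
static bool model_veto(uint64_t gpa, uint64_t gva, int bytes,
		       struct model_node *node)
{
	(void)gpa; (void)gva; (void)bytes;
	assert(node != NULL);
	node->calls++;
	return false;
}
```

With a list whose first node uses model_veto and whose last node uses model_allow, the walk returns false while both callbacks still run once. An empty list returns true, which mirrors the hlist_empty() early return in the patch.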
