[RFC,v6,23/92] kvm: page track: add support for preread, prewrite and preexec

Message ID	20190809160047.8319-24-alazar@bitdefender.com (mailing list archive)
State	New, archived
Headers
Return-Path: <kvm-owner@kernel.org>
From: Adalbert Lazăr <alazar@bitdefender.com>
To: kvm@vger.kernel.org
Cc: linux-mm@kvack.org, virtualization@lists.linux-foundation.org,
 Paolo Bonzini <pbonzini@redhat.com>,
 Radim Krčmář <rkrcmar@redhat.com>,
 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
 Tamas K Lengyel <tamas@tklengyel.com>,
 Mathieu Tarral <mathieu.tarral@protonmail.com>,
 Samuel Laurén <samuel.lauren@iki.fi>,
 Patrick Colp <patrick.colp@oracle.com>,
 Jan Kiszka <jan.kiszka@siemens.com>,
 Stefan Hajnoczi <stefanha@redhat.com>,
 Weijiang Yang <weijiang.yang@intel.com>,
 Zhang@vger.kernel.org, Yu C <yu.c.zhang@intel.com>,
 Mihai Donțu <mdontu@bitdefender.com>,
 Adalbert Lazăr <alazar@bitdefender.com>,
 Xiao Guangrong <guangrong.xiao@gmail.com>,
 Sean Christopherson <sean.j.christopherson@intel.com>
Subject: [RFC PATCH v6 23/92] kvm: page track: add support for preread, prewrite and preexec
Date: Fri, 9 Aug 2019 18:59:38 +0300
Message-Id: <20190809160047.8319-24-alazar@bitdefender.com>
In-Reply-To: <20190809160047.8319-1-alazar@bitdefender.com>
References: <20190809160047.8319-1-alazar@bitdefender.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
Series	VM introspection

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 0492a85f3a44..a431e5e1e5cb 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -3,7 +3,10 @@
 #define _ASM_X86_KVM_PAGE_TRACK_H
 
 enum kvm_page_track_mode {
+	KVM_PAGE_TRACK_PREREAD,
+	KVM_PAGE_TRACK_PREWRITE,
 	KVM_PAGE_TRACK_WRITE,
+	KVM_PAGE_TRACK_PREEXEC,
 	KVM_PAGE_TRACK_MAX,
 };
 
@@ -22,6 +25,13 @@ struct kvm_page_track_notifier_head {
 struct kvm_page_track_notifier_node {
 	struct hlist_node node;
 
+	bool (*track_preread)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			      u8 *new, int bytes,
+			      struct kvm_page_track_notifier_node *node,
+			      bool *data_ready);
+	bool (*track_prewrite)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			       const u8 *new, int bytes,
+			       struct kvm_page_track_notifier_node *node);
 	/*
 	 * It is called when guest is writing the write-tracked page
 	 * and write emulation is finished at that time.
@@ -35,12 +45,14 @@ struct kvm_page_track_notifier_node {
 	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
 			    const u8 *new, int bytes,
 			    struct kvm_page_track_notifier_node *node);
+	bool (*track_preexec)(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			      struct kvm_page_track_notifier_node *node);
 	void (*track_create_slot)(struct kvm *kvm, struct kvm_memory_slot *slot,
 				  unsigned long npages,
 				  struct kvm_page_track_notifier_node *node);
 	/*
 	 * It is called when memory slot is being moved or removed
-	 * users can drop write-protection for the pages in that memory slot
+	 * users can drop active protection for the pages in that memory slot
 	 *
 	 * @kvm: the kvm where memory slot being moved or removed
 	 * @slot: the memory slot being moved or removed
@@ -73,7 +85,12 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
+bool kvm_page_track_preread(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			    u8 *new, int bytes, bool *data_ready);
+bool kvm_page_track_prewrite(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			     const u8 *new, int bytes);
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
 			  const u8 *new, int bytes);
+bool kvm_page_track_preexec(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 #endif
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9898d863b6b6..a86b165cf6dd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1523,6 +1523,31 @@ static bool spte_write_protect(u64 *sptep, bool pt_protect)
 	return mmu_spte_update(sptep, spte);
 }
 
+static bool spte_read_protect(u64 *sptep)
+{
+	u64 spte = *sptep;
+	bool exec_only_supported = (shadow_present_mask == 0ull);
+
+	rmap_printk("rmap_read_protect: spte %p %llx\n", sptep, *sptep);
+
+	WARN_ON_ONCE(!exec_only_supported);
+
+	spte = spte & ~(PT_WRITABLE_MASK | PT_PRESENT_MASK);
+
+	return mmu_spte_update(sptep, spte);
+}
+
+static bool spte_exec_protect(u64 *sptep)
+{
+	u64 spte = *sptep;
+
+	rmap_printk("rmap_exec_protect: spte %p %llx\n", sptep, *sptep);
+
+	spte = spte & ~PT_USER_MASK;
+
+	return mmu_spte_update(sptep, spte);
+}
+
 static bool __rmap_write_protect(struct kvm *kvm,
 				 struct kvm_rmap_head *rmap_head,
 				 bool pt_protect)
@@ -1537,6 +1562,32 @@ static bool __rmap_write_protect(struct kvm *kvm,
 	return flush;
 }
 
+static bool __rmap_read_protect(struct kvm *kvm,
+				struct kvm_rmap_head *rmap_head)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+	bool flush = false;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep)
+		flush |= spte_read_protect(sptep);
+
+	return flush;
+}
+
+static bool __rmap_exec_protect(struct kvm *kvm,
+				struct kvm_rmap_head *rmap_head)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+	bool flush = false;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep)
+		flush |= spte_exec_protect(sptep);
+
+	return flush;
+}
+
 static bool spte_clear_dirty(u64 *sptep)
 {
 	u64 spte = *sptep;
@@ -1707,6 +1758,36 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 	return write_protected;
 }
 
+bool kvm_mmu_slot_gfn_read_protect(struct kvm *kvm,
+				   struct kvm_memory_slot *slot, u64 gfn)
+{
+	struct kvm_rmap_head *rmap_head;
+	int i;
+	bool read_protected = false;
+
+	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
+		rmap_head = __gfn_to_rmap(gfn, i, slot);
+		read_protected |= __rmap_read_protect(kvm, rmap_head);
+	}
+
+	return read_protected;
+}
+
+bool kvm_mmu_slot_gfn_exec_protect(struct kvm *kvm,
+				   struct kvm_memory_slot *slot, u64 gfn)
+{
+	struct kvm_rmap_head *rmap_head;
+	int i;
+	bool exec_protected = false;
+
+	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
+		rmap_head = __gfn_to_rmap(gfn, i, slot);
+		exec_protected |= __rmap_exec_protect(kvm, rmap_head);
+	}
+
+	return exec_protected;
+}
+
 static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
 {
 	struct kvm_memory_slot *slot;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index c7b333147c4a..45948dabe0b6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -210,5 +210,9 @@ void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn);
+bool kvm_mmu_slot_gfn_read_protect(struct kvm *kvm,
+				   struct kvm_memory_slot *slot, u64 gfn);
+bool kvm_mmu_slot_gfn_exec_protect(struct kvm *kvm,
+				   struct kvm_memory_slot *slot, u64 gfn);
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index ff7defb4a1d2..fc792939a05c 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -1,5 +1,5 @@
 /*
- * Support KVM gust page tracking
+ * Support KVM guest page tracking
  *
  * This feature allows us to track page access in guest. Currently, only
  * write access is tracked.
@@ -101,7 +101,7 @@ static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
+ * @mode: tracking mode.
  */
 void kvm_slot_page_track_add_page(struct kvm *kvm,
 				  struct kvm_memory_slot *slot, gfn_t gfn,
@@ -119,9 +119,16 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
-	if (mode == KVM_PAGE_TRACK_WRITE)
+	if (mode == KVM_PAGE_TRACK_PREWRITE || mode == KVM_PAGE_TRACK_WRITE) {
 		if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
 			kvm_flush_remote_tlbs(kvm);
+	} else if (mode == KVM_PAGE_TRACK_PREREAD) {
+		if (kvm_mmu_slot_gfn_read_protect(kvm, slot, gfn))
+			kvm_flush_remote_tlbs(kvm);
+	} else if (mode == KVM_PAGE_TRACK_PREEXEC) {
+		if (kvm_mmu_slot_gfn_exec_protect(kvm, slot, gfn))
+			kvm_flush_remote_tlbs(kvm);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
 
@@ -136,7 +143,7 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
+ * @mode: tracking mode.
  */
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     struct kvm_memory_slot *slot, gfn_t gfn,
@@ -229,12 +236,81 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
 
+/*
+ * Notify the node that a read access is about to happen. Returning false
+ * doesn't stop the other nodes from being called, but it will stop
+ * the emulation.
+ *
+ * The node should figure out if the written page is the one that the node
+ * is interested in by itself.
+ *
+ * The nodes will always be in conflict if they track the same page:
+ * - accepting a read won't guarantee that the next node will not override
+ *   the data (filling new/bytes and setting data_ready)
+ * - filling new/bytes with custom data won't guarantee that the next node
+ *   will not override that
+ */
+bool kvm_page_track_preread(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			    u8 *new, int bytes, bool *data_ready)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+	bool ret = true;
+
+	*data_ready = false;
+
+	head = &vcpu->kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return ret;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
+		if (n->track_preread)
+			if (!n->track_preread(vcpu, gpa, gva, new, bytes, n,
+					       data_ready))
+				ret = false;
+	srcu_read_unlock(&head->track_srcu, idx);
+	return ret;
+}
+
+/*
+ * Notify the node that a write access is about to happen. Returning false
+ * doesn't stop the other nodes from being called, but it will stop
+ * the emulation.
+ *
+ * The node should figure out if the written page is the one that the node
+ * is interested in by itself.
+ */
+bool kvm_page_track_prewrite(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
+			     const u8 *new, int bytes)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+	bool ret = true;
+
+	head = &vcpu->kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return ret;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
+		if (n->track_prewrite)
+			if (!n->track_prewrite(vcpu, gpa, gva, new, bytes, n))
+				ret = false;
+	srcu_read_unlock(&head->track_srcu, idx);
+	return ret;
+}
+
 /*
  * Notify the node that write access is intercepted and write emulation is
  * finished at this time.
  *
- * The node should figure out if the written page is the one that node is
- * interested in by itself.
+ * The node should figure out if the written page is the one that the node
+ * is interested in by itself.
  */
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
 			  const u8 *new, int bytes)
@@ -255,12 +331,41 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva,
 	srcu_read_unlock(&head->track_srcu, idx);
 }
 
+/*
+ * Notify the node that an instruction is about to be executed.
+ * Returning false doesn't stop the other nodes from being called,
+ * but it will stop the emulation with X86EMUL_RETRY_INSTR.
+ *
+ * The node should figure out if the written page is the one that the node
+ * is interested in by itself.
+ */
+bool kvm_page_track_preexec(struct kvm_vcpu *vcpu, gpa_t gpa, gva_t gva)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+	bool ret = true;
+
+	head = &vcpu->kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return ret;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
+		if (n->track_preexec)
+			if (!n->track_preexec(vcpu, gpa, gva, n))
+				ret = false;
+	srcu_read_unlock(&head->track_srcu, idx);
+	return ret;
+}
+
 /*
  * Notify the node that memory slot is being removed or moved so that it can
- * drop write-protection for the pages in the memory slot.
+ * drop active protection for the pages in the memory slot.
  *
- * The node should figure out it has any write-protected pages in this slot
- * by itself.
+ * The node should figure out if the written page is the one that the node
+ * is interested in by itself.
  */
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
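
The dispatch rule described in the comments above (every node with a hook is called, and a single false return makes the whole call return false) may be easier to follow in isolation. The following is a minimal user-space sketch of that rule for the preread case, not kernel code: struct toy_node, toy_preread(), toy_deny() and toy_fill() are hypothetical names invented for illustration, a plain singly linked list stands in for the hlist, and the SRCU locking used by the patch is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for kvm_page_track_notifier_node. */
struct toy_node {
	struct toy_node *next;
	/* May be NULL, like an unset track_preread hook. */
	bool (*preread)(unsigned long gpa, unsigned char *buf, int bytes,
			struct toy_node *node, bool *data_ready);
	int calls;	/* how many times this node's hook ran */
};

/*
 * Same dispatch shape as kvm_page_track_preread() in the patch:
 * clear *data_ready, call every node that has a hook, and return
 * false if any hook returned false. No locking here (assumption:
 * single-threaded toy; the patch walks the list under SRCU).
 */
static bool toy_preread(struct toy_node *head, unsigned long gpa,
			unsigned char *buf, int bytes, bool *data_ready)
{
	struct toy_node *n;
	bool ret = true;

	*data_ready = false;

	for (n = head; n; n = n->next)
		if (n->preread &&
		    !n->preread(gpa, buf, bytes, n, data_ready))
			ret = false;

	return ret;
}

/* Example hook: rejects the access and leaves the buffer alone. */
static bool toy_deny(unsigned long gpa, unsigned char *buf, int bytes,
		     struct toy_node *node, bool *data_ready)
{
	(void)gpa; (void)buf; (void)bytes; (void)data_ready;
	node->calls++;
	return false;
}

/* Example hook: accepts the access and supplies its own data. */
static bool toy_fill(unsigned long gpa, unsigned char *buf, int bytes,
		     struct toy_node *node, bool *data_ready)
{
	(void)gpa;
	node->calls++;
	memset(buf, 0xAA, (size_t)bytes);
	*data_ready = true;
	return true;
}
```

With a list holding toy_deny followed by toy_fill, toy_preread() returns false while still running both hooks, and the buffer ends up filled by the second node. This is the conflict the patch comment warns about when several nodes track the same page.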
