From patchwork Fri Dec 6 23:57:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277193 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 20863159A for ; Fri, 6 Dec 2019 23:58:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E9C6821835 for ; Fri, 6 Dec 2019 23:58:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726646AbfLFX6v (ORCPT ); Fri, 6 Dec 2019 18:58:51 -0500 Received: from mga07.intel.com ([134.134.136.100]:55584 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726377AbfLFX5h (ORCPT ); Fri, 6 Dec 2019 18:57:37 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:35 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530323" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:34 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 01/16] KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM Date: Fri, 6 Dec 2019 15:57:14 -0800 Message-Id: <20191206235729.29263-2-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Convert a plethora of parameters and variables in the MMU and page fault flows from type gva_t to gpa_t to properly handle TDP on 32-bit KVM. Thanks to PSE and PAE paging, 32-bit kernels can access 64-bit physical addresses. When TDP is enabled, the fault address is a guest physical address and thus can be a 64-bit value, even when both KVM and its guest are using 32-bit virtual addressing, e.g. VMX's VMCS.GUEST_PHYSICAL is a 64-bit field, not a natural width field. Using a gva_t for the fault address means KVM will incorrectly drop the upper 32-bits of the GPA. Ditto for gva_to_gpa() when it is used to translate L2 GPAs to L1 GPAs. Opportunistically rename variables and parameters to better reflect the dual address modes, e.g. use "cr2_or_gpa" for fault addresses and plain "addr" instead of "vaddr" when the address may be either a GVA or an L2 GPA. Similarly, use "gpa" in the nonpaging_page_fault() flows to avoid a confusing "gpa_t gva" declaration; this also sets the stage for a future patch to combing nonpaging_page_fault() and tdp_page_fault() with minimal churn. Sprinkle in a few comments to document flows where an address is known to be a GVA and thus can be safely truncated to a 32-bit value. Add WARNs in kvm_handle_page_fault() and FNAME(gva_to_gpa_nested)() to help document such cases and detect bugs. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 8 ++-- arch/x86/kvm/mmu/mmu.c | 69 +++++++++++++++++++-------------- arch/x86/kvm/mmu/paging_tmpl.h | 25 +++++++----- arch/x86/kvm/mmutrace.h | 12 +++--- arch/x86/kvm/x86.c | 40 +++++++++---------- arch/x86/kvm/x86.h | 2 +- include/linux/kvm_host.h | 6 +-- virt/kvm/async_pf.c | 10 ++--- 8 files changed, 94 insertions(+), 78 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b79cd6aa4075..8233ffa9db6a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -378,12 +378,12 @@ struct kvm_mmu { void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root); unsigned long (*get_cr3)(struct kvm_vcpu *vcpu); u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index); - int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err, + int (*page_fault)(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u32 err, bool prefault); void (*inject_page_fault)(struct kvm_vcpu *vcpu, struct x86_exception *fault); - gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access, - struct x86_exception *exception); + gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gpa_t gva_or_gpa, + u32 access, struct x86_exception *exception); gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access, struct x86_exception *exception); int (*sync_page)(struct kvm_vcpu *vcpu, @@ -1468,7 +1468,7 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu); int kvm_emulate_hypercall(struct kvm_vcpu *vcpu); -int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u64 error_code, +int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, void *insn, int insn_len); void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva); void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6f92b40d798c..3977c2b7e8e5 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3528,7 +3528,7 @@ static bool is_access_allowed(u32 fault_err_code, u64 spte) * - true: let the vcpu to access on the same address again. * - false: let the real page fault path to fix it. */ -static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level, +static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, int level, u32 error_code) { struct kvm_shadow_walk_iterator iterator; @@ -3548,7 +3548,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level, do { u64 new_spte; - for_each_shadow_entry_lockless(vcpu, gva, iterator, spte) + for_each_shadow_entry_lockless(vcpu, cr2_or_gpa, iterator, spte) if (!is_shadow_present_pte(spte) || iterator.level < level) break; @@ -3626,7 +3626,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level, } while (true); - trace_fast_page_fault(vcpu, gva, error_code, iterator.sptep, + trace_fast_page_fault(vcpu, cr2_or_gpa, error_code, iterator.sptep, spte, fault_handled); walk_shadow_page_lockless_end(vcpu); @@ -3634,10 +3634,11 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level, } static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, - gva_t gva, kvm_pfn_t *pfn, bool write, bool *writable); + gpa_t cr2_or_gpa, kvm_pfn_t *pfn, bool write, + bool *writable); static int make_mmu_pages_available(struct kvm_vcpu *vcpu); -static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code, +static int nonpaging_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, gfn_t gfn, bool prefault) { int r; @@ -3663,16 +3664,16 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code, gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); } - if (fast_page_fault(vcpu, v, level, error_code)) + if (fast_page_fault(vcpu, gpa, level, error_code)) return RET_PF_RETRY; mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, prefault, gfn, v, &pfn, write, &map_writable)) + if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) return RET_PF_RETRY; - if (handle_abnormal_pfn(vcpu, v, gfn, pfn, ACC_ALL, &r)) + if (handle_abnormal_pfn(vcpu, gpa, gfn, pfn, ACC_ALL, &r)) return r; r = RET_PF_RETRY; @@ -3683,7 +3684,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code, goto out_unlock; if (likely(!force_pt_level)) transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); - r = __direct_map(vcpu, v, write, map_writable, level, pfn, + r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, prefault, false); out_unlock: spin_unlock(&vcpu->kvm->mmu_lock); @@ -3981,7 +3982,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_mmu_sync_roots); -static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr, +static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gpa_t vaddr, u32 access, struct x86_exception *exception) { if (exception) @@ -3989,7 +3990,7 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr, return vaddr; } -static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr, +static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gpa_t vaddr, u32 access, struct x86_exception *exception) { @@ -4149,13 +4150,14 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr) walk_shadow_page_lockless_end(vcpu); } -static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, +static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, bool prefault) { - gfn_t gfn = gva >> PAGE_SHIFT; + gfn_t gfn = gpa >> PAGE_SHIFT; int r; - pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code); + /* Note, paging is disabled, ergo gva == gpa. */ + pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code); if (page_fault_handle_page_track(vcpu, error_code, gfn)) return RET_PF_EMULATE; @@ -4167,11 +4169,12 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); - return nonpaging_map(vcpu, gva & PAGE_MASK, + return nonpaging_map(vcpu, gpa & PAGE_MASK, error_code, gfn, prefault); } -static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn) +static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, + gfn_t gfn) { struct kvm_arch_async_pf arch; @@ -4180,11 +4183,13 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn) arch.direct_map = vcpu->arch.mmu->direct_map; arch.cr3 = vcpu->arch.mmu->get_cr3(vcpu); - return kvm_setup_async_pf(vcpu, gva, kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch); + return kvm_setup_async_pf(vcpu, cr2_or_gpa, + kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch); } static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, - gva_t gva, kvm_pfn_t *pfn, bool write, bool *writable) + gpa_t cr2_or_gpa, kvm_pfn_t *pfn, bool write, + bool *writable) { struct kvm_memory_slot *slot; bool async; @@ -4204,12 +4209,12 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, return false; /* *pfn has correct page already */ if (!prefault && kvm_can_do_async_pf(vcpu)) { - trace_kvm_try_async_get_page(gva, gfn); + trace_kvm_try_async_get_page(cr2_or_gpa, gfn); if (kvm_find_async_pf_gfn(vcpu, gfn)) { - trace_kvm_async_pf_doublefault(gva, gfn); + trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn); kvm_make_request(KVM_REQ_APF_HALT, vcpu); return true; - } else if (kvm_arch_setup_async_pf(vcpu, gva, gfn)) + } else if (kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn)) return true; } @@ -4222,6 +4227,12 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code, { int r = 1; +#ifndef CONFIG_X86_64 + /* A 64-bit CR2 should be impossible on 32-bit KVM. */ + if (WARN_ON_ONCE(fault_address >> 32)) + return -EFAULT; +#endif + vcpu->arch.l1tf_flush_l1d = true; switch (vcpu->arch.apf.host_apf_reason) { default: @@ -4259,7 +4270,7 @@ check_hugepage_cache_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, int level) return kvm_mtrr_check_gfn_range_consistency(vcpu, gfn, page_num); } -static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, +static int tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, bool prefault) { kvm_pfn_t pfn; @@ -5516,7 +5527,7 @@ static int make_mmu_pages_available(struct kvm_vcpu *vcpu) return 0; } -int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, +int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, void *insn, int insn_len) { int r, emulation_type = 0; @@ -5525,18 +5536,18 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, /* With shadow page tables, fault_address contains a GVA or nGPA. */ if (vcpu->arch.mmu->direct_map) { vcpu->arch.gpa_available = true; - vcpu->arch.gpa_val = cr2; + vcpu->arch.gpa_val = cr2_or_gpa; } r = RET_PF_INVALID; if (unlikely(error_code & PFERR_RSVD_MASK)) { - r = handle_mmio_page_fault(vcpu, cr2, direct); + r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct); if (r == RET_PF_EMULATE) goto emulate; } if (r == RET_PF_INVALID) { - r = vcpu->arch.mmu->page_fault(vcpu, cr2, + r = vcpu->arch.mmu->page_fault(vcpu, cr2_or_gpa, lower_32_bits(error_code), false); WARN_ON(r == RET_PF_INVALID); @@ -5556,7 +5567,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, */ if (vcpu->arch.mmu->direct_map && (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) { - kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2)); + kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)); return 1; } @@ -5571,7 +5582,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, * explicitly shadowing L1's page tables, i.e. unprotecting something * for L1 isn't going to magically fix whatever issue cause L2 to fail. */ - if (!mmio_info_in_cache(vcpu, cr2, direct) && !is_guest_mode(vcpu)) + if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu)) emulation_type = EMULTYPE_ALLOW_RETRY; emulate: /* @@ -5586,7 +5597,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code, return 1; } - return x86_emulate_instruction(vcpu, cr2, emulation_type, insn, + return x86_emulate_instruction(vcpu, cr2_or_gpa, emulation_type, insn, insn_len); } EXPORT_SYMBOL_GPL(kvm_mmu_page_fault); diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 97b21e7fd013..c1d7b866a03f 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -291,11 +291,11 @@ static inline unsigned FNAME(gpte_pkeys)(struct kvm_vcpu *vcpu, u64 gpte) } /* - * Fetch a guest pte for a guest virtual address + * Fetch a guest pte for a guest virtual address, or for an L2's GPA. */ static int FNAME(walk_addr_generic)(struct guest_walker *walker, struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, - gva_t addr, u32 access) + gpa_t addr, u32 access) { int ret; pt_element_t pte; @@ -496,7 +496,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, } static int FNAME(walk_addr)(struct guest_walker *walker, - struct kvm_vcpu *vcpu, gva_t addr, u32 access) + struct kvm_vcpu *vcpu, gpa_t addr, u32 access) { return FNAME(walk_addr_generic)(walker, vcpu, vcpu->arch.mmu, addr, access); @@ -611,7 +611,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw, * If the guest tries to write a write-protected page, we need to * emulate this operation, return 1 to indicate this case. */ -static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, +static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr, struct guest_walker *gw, int write_fault, int hlevel, kvm_pfn_t pfn, bool map_writable, bool prefault, @@ -765,7 +765,7 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu, * Returns: 1 if we need to emulate the instruction, 0 otherwise, or * a negative value on error. */ -static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, +static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code, bool prefault) { int write_fault = error_code & PFERR_WRITE_MASK; @@ -945,18 +945,19 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa) spin_unlock(&vcpu->kvm->mmu_lock); } -static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access, +/* Note, @addr is a GPA when gva_to_gpa() translates an L2 GPA to an L1 GPA. */ +static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gpa_t addr, u32 access, struct x86_exception *exception) { struct guest_walker walker; gpa_t gpa = UNMAPPED_GVA; int r; - r = FNAME(walk_addr)(&walker, vcpu, vaddr, access); + r = FNAME(walk_addr)(&walker, vcpu, addr, access); if (r) { gpa = gfn_to_gpa(walker.gfn); - gpa |= vaddr & ~PAGE_MASK; + gpa |= addr & ~PAGE_MASK; } else if (exception) *exception = walker.fault; @@ -964,7 +965,8 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access, } #if PTTYPE != PTTYPE_EPT -static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr, +/* Note, gva_to_gpa_nested() is only used to translate L2 GVAs. */ +static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gpa_t vaddr, u32 access, struct x86_exception *exception) { @@ -972,6 +974,11 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr, gpa_t gpa = UNMAPPED_GVA; int r; +#ifndef CONFIG_X86_64 + /* A 64-bit GVA should be impossible on 32-bit KVM. */ + WARN_ON_ONCE(vaddr >> 32); +#endif + r = FNAME(walk_addr_nested)(&walker, vcpu, vaddr, access); if (r) { diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h index 7ca8831c7d1a..3c6522b84ff1 100644 --- a/arch/x86/kvm/mmutrace.h +++ b/arch/x86/kvm/mmutrace.h @@ -249,13 +249,13 @@ TRACE_EVENT( TRACE_EVENT( fast_page_fault, - TP_PROTO(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code, + TP_PROTO(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u32 error_code, u64 *sptep, u64 old_spte, bool retry), - TP_ARGS(vcpu, gva, error_code, sptep, old_spte, retry), + TP_ARGS(vcpu, cr2_or_gpa, error_code, sptep, old_spte, retry), TP_STRUCT__entry( __field(int, vcpu_id) - __field(gva_t, gva) + __field(gpa_t, cr2_or_gpa) __field(u32, error_code) __field(u64 *, sptep) __field(u64, old_spte) @@ -265,7 +265,7 @@ TRACE_EVENT( TP_fast_assign( __entry->vcpu_id = vcpu->vcpu_id; - __entry->gva = gva; + __entry->cr2_or_gpa = cr2_or_gpa; __entry->error_code = error_code; __entry->sptep = sptep; __entry->old_spte = old_spte; @@ -273,9 +273,9 @@ TRACE_EVENT( __entry->retry = retry; ), - TP_printk("vcpu %d gva %lx error_code %s sptep %p old %#llx" + TP_printk("vcpu %d gva %llx error_code %s sptep %p old %#llx" " new %llx spurious %d fixed %d", __entry->vcpu_id, - __entry->gva, __print_flags(__entry->error_code, "|", + __entry->cr2_or_gpa, __print_flags(__entry->error_code, "|", kvm_mmu_trace_pferr_flags), __entry->sptep, __entry->old_spte, __entry->new_spte, __spte_satisfied(old_spte), __spte_satisfied(new_spte) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3ed167e039e5..fa46fbed6001 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6377,11 +6377,11 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type) return 1; } -static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2, +static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, bool write_fault_to_shadow_pgtable, int emulation_type) { - gpa_t gpa = cr2; + gpa_t gpa = cr2_or_gpa; kvm_pfn_t pfn; if (!(emulation_type & EMULTYPE_ALLOW_RETRY)) @@ -6395,7 +6395,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2, * Write permission should be allowed since only * write access need to be emulated. */ - gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL); + gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL); /* * If the mapping is invalid in guest, let cpu retry @@ -6452,10 +6452,10 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2, } static bool retry_instruction(struct x86_emulate_ctxt *ctxt, - unsigned long cr2, int emulation_type) + gpa_t cr2_or_gpa, int emulation_type) { struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); - unsigned long last_retry_eip, last_retry_addr, gpa = cr2; + unsigned long last_retry_eip, last_retry_addr, gpa = cr2_or_gpa; last_retry_eip = vcpu->arch.last_retry_eip; last_retry_addr = vcpu->arch.last_retry_addr; @@ -6484,14 +6484,14 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt, if (x86_page_table_writing_insn(ctxt)) return false; - if (ctxt->eip == last_retry_eip && last_retry_addr == cr2) + if (ctxt->eip == last_retry_eip && last_retry_addr == cr2_or_gpa) return false; vcpu->arch.last_retry_eip = ctxt->eip; - vcpu->arch.last_retry_addr = cr2; + vcpu->arch.last_retry_addr = cr2_or_gpa; if (!vcpu->arch.mmu->direct_map) - gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL); + gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL); kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)); @@ -6637,11 +6637,8 @@ static bool is_vmware_backdoor_opcode(struct x86_emulate_ctxt *ctxt) return false; } -int x86_emulate_instruction(struct kvm_vcpu *vcpu, - unsigned long cr2, - int emulation_type, - void *insn, - int insn_len) +int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, + int emulation_type, void *insn, int insn_len) { int r; struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt; @@ -6687,8 +6684,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, kvm_queue_exception(vcpu, UD_VECTOR); return 1; } - if (reexecute_instruction(vcpu, cr2, write_fault_to_spt, - emulation_type)) + if (reexecute_instruction(vcpu, cr2_or_gpa, + write_fault_to_spt, + emulation_type)) return 1; if (ctxt->have_exception) { /* @@ -6722,7 +6720,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, return 1; } - if (retry_instruction(ctxt, cr2, emulation_type)) + if (retry_instruction(ctxt, cr2_or_gpa, emulation_type)) return 1; /* this is needed for vmware backdoor interface to work since it @@ -6734,7 +6732,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, restart: /* Save the faulting GPA (cr2) in the address field */ - ctxt->exception.address = cr2; + ctxt->exception.address = cr2_or_gpa; r = x86_emulate_insn(ctxt); @@ -6742,7 +6740,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, return 1; if (r == EMULATION_FAILED) { - if (reexecute_instruction(vcpu, cr2, write_fault_to_spt, + if (reexecute_instruction(vcpu, cr2_or_gpa, write_fault_to_spt, emulation_type)) return 1; @@ -10012,7 +10010,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) work->arch.cr3 != vcpu->arch.mmu->get_cr3(vcpu)) return; - vcpu->arch.mmu->page_fault(vcpu, work->gva, 0, true); + vcpu->arch.mmu->page_fault(vcpu, work->cr2_or_gpa, 0, true); } static inline u32 kvm_async_pf_hash_fn(gfn_t gfn) @@ -10125,7 +10123,7 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu, { struct x86_exception fault; - trace_kvm_async_pf_not_present(work->arch.token, work->gva); + trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa); kvm_add_async_pf_gfn(vcpu, work->arch.gfn); if (kvm_can_deliver_async_pf(vcpu) && @@ -10160,7 +10158,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu, work->arch.token = ~0; /* broadcast wakeup */ else kvm_del_async_pf_gfn(vcpu, work->arch.gfn); - trace_kvm_async_pf_ready(work->arch.token, work->gva); + trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa); if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED && !apf_get_user(vcpu, &val)) { diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 29391af8871d..cab5e71f0f0f 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -289,7 +289,7 @@ int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata); bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, int page_num); bool kvm_vector_hashing_enabled(void); -int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2, +int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, int emulation_type, void *insn, int insn_len); #define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \ diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 7ed1e2f8641e..0b564a928774 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -204,7 +204,7 @@ struct kvm_async_pf { struct list_head queue; struct kvm_vcpu *vcpu; struct mm_struct *mm; - gva_t gva; + gpa_t cr2_or_gpa; unsigned long addr; struct kvm_arch_async_pf arch; bool wakeup_all; @@ -212,8 +212,8 @@ struct kvm_async_pf { void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu); void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu); -int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva, - struct kvm_arch_async_pf *arch); +int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, + unsigned long hva, struct kvm_arch_async_pf *arch); int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #endif diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c index 35305d6e68cc..d8ef708a2ef6 100644 --- a/virt/kvm/async_pf.c +++ b/virt/kvm/async_pf.c @@ -64,7 +64,7 @@ static void async_pf_execute(struct work_struct *work) struct mm_struct *mm = apf->mm; struct kvm_vcpu *vcpu = apf->vcpu; unsigned long addr = apf->addr; - gva_t gva = apf->gva; + gpa_t cr2_or_gpa = apf->cr2_or_gpa; int locked = 1; might_sleep(); @@ -92,7 +92,7 @@ static void async_pf_execute(struct work_struct *work) * this point */ - trace_kvm_async_pf_completed(addr, gva); + trace_kvm_async_pf_completed(addr, cr2_or_gpa); if (swq_has_sleeper(&vcpu->wq)) swake_up_one(&vcpu->wq); @@ -165,8 +165,8 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) } } -int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva, - struct kvm_arch_async_pf *arch) +int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, + unsigned long hva, struct kvm_arch_async_pf *arch) { struct kvm_async_pf *work; @@ -185,7 +185,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva, work->wakeup_all = false; work->vcpu = vcpu; - work->gva = gva; + work->cr2_or_gpa = cr2_or_gpa; work->addr = hva; work->arch = *arch; work->mm = current->mm; From patchwork Fri Dec 6 23:57:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277191 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D537A930 for ; Fri, 6 Dec 2019 23:58:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BE09924670 for ; Fri, 6 Dec 2019 23:58:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726480AbfLFX5h (ORCPT ); Fri, 6 Dec 2019 18:57:37 -0500 Received: from mga07.intel.com ([134.134.136.100]:55585 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726473AbfLFX5h (ORCPT ); Fri, 6 Dec 2019 18:57:37 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:35 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530326" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:35 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 02/16] KVM: x86/mmu: Move definition of make_mmu_pages_available() up Date: Fri, 6 Dec 2019 15:57:15 -0800 Message-Id: <20191206235729.29263-3-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Move make_mmu_pages_available() above its first user to put it closer to related code and eliminate a forward declaration. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 41 ++++++++++++++++++++--------------------- 1 file changed, 20 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 3977c2b7e8e5..e2792305ce32 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2899,6 +2899,26 @@ static bool prepare_zap_oldest_mmu_page(struct kvm *kvm, return kvm_mmu_prepare_zap_page(kvm, sp, invalid_list); } +static int make_mmu_pages_available(struct kvm_vcpu *vcpu) +{ + LIST_HEAD(invalid_list); + + if (likely(kvm_mmu_available_pages(vcpu->kvm) >= KVM_MIN_FREE_MMU_PAGES)) + return 0; + + while (kvm_mmu_available_pages(vcpu->kvm) < KVM_REFILL_PAGES) { + if (!prepare_zap_oldest_mmu_page(vcpu->kvm, &invalid_list)) + break; + + ++vcpu->kvm->stat.mmu_recycled; + } + kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); + + if (!kvm_mmu_available_pages(vcpu->kvm)) + return -ENOSPC; + return 0; +} + /* * Changing the number of mmu pages allocated to the vm * Note: if goal_nr_mmu_pages is too small, you will get dead lock @@ -3636,7 +3656,6 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, int level, static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, gpa_t cr2_or_gpa, kvm_pfn_t *pfn, bool write, bool *writable); -static int make_mmu_pages_available(struct kvm_vcpu *vcpu); static int nonpaging_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, gfn_t gfn, bool prefault) @@ -5507,26 +5526,6 @@ int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva) } EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt); -static int make_mmu_pages_available(struct kvm_vcpu *vcpu) -{ - LIST_HEAD(invalid_list); - - if (likely(kvm_mmu_available_pages(vcpu->kvm) >= KVM_MIN_FREE_MMU_PAGES)) - return 0; - - while (kvm_mmu_available_pages(vcpu->kvm) < KVM_REFILL_PAGES) { - if (!prepare_zap_oldest_mmu_page(vcpu->kvm, &invalid_list)) - break; - - ++vcpu->kvm->stat.mmu_recycled; - } - kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); - - if (!kvm_mmu_available_pages(vcpu->kvm)) - return -ENOSPC; - return 0; -} - int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, void *insn, int insn_len) { From patchwork Fri Dec 6 23:57:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277163 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8EDC414B7 for ; Fri, 6 Dec 2019 23:57:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6FB2C217F4 for ; Fri, 6 Dec 2019 23:57:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726473AbfLFX5j (ORCPT ); Fri, 6 Dec 2019 18:57:39 -0500 Received: from mga07.intel.com ([134.134.136.100]:55585 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726374AbfLFX5h (ORCPT ); Fri, 6 Dec 2019 18:57:37 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:36 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530329" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:35 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 03/16] KVM: x86/mmu: Fold nonpaging_map() into nonpaging_page_fault() Date: Fri, 6 Dec 2019 15:57:16 -0800 Message-Id: <20191206235729.29263-4-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Fold nonpaging_map() into its sole caller, nonpaging_page_fault(), in preparation for combining the bulk of nonpaging_page_fault() and tdp_page_fault() into a common helper. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 106 +++++++++++++++++++---------------------- 1 file changed, 49 insertions(+), 57 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index e2792305ce32..8217cdd6beac 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3657,60 +3657,6 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, gpa_t cr2_or_gpa, kvm_pfn_t *pfn, bool write, bool *writable); -static int nonpaging_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, - gfn_t gfn, bool prefault) -{ - int r; - int level; - bool force_pt_level; - kvm_pfn_t pfn; - unsigned long mmu_seq; - bool map_writable, write = error_code & PFERR_WRITE_MASK; - bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) && - is_nx_huge_page_enabled(); - - force_pt_level = lpage_disallowed; - level = mapping_level(vcpu, gfn, &force_pt_level); - if (likely(!force_pt_level)) { - /* - * This path builds a PAE pagetable - so we can map - * 2mb pages at maximum. Therefore check if the level - * is larger than that. - */ - if (level > PT_DIRECTORY_LEVEL) - level = PT_DIRECTORY_LEVEL; - - gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); - } - - if (fast_page_fault(vcpu, gpa, level, error_code)) - return RET_PF_RETRY; - - mmu_seq = vcpu->kvm->mmu_notifier_seq; - smp_rmb(); - - if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) - return RET_PF_RETRY; - - if (handle_abnormal_pfn(vcpu, gpa, gfn, pfn, ACC_ALL, &r)) - return r; - - r = RET_PF_RETRY; - spin_lock(&vcpu->kvm->mmu_lock); - if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) - goto out_unlock; - if (make_mmu_pages_available(vcpu) < 0) - goto out_unlock; - if (likely(!force_pt_level)) - transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); - r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, - prefault, false); -out_unlock: - spin_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(pfn); - return r; -} - static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa, struct list_head *invalid_list) { @@ -4172,12 +4118,21 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr) static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, bool prefault) { - gfn_t gfn = gpa >> PAGE_SHIFT; int r; + int level; + kvm_pfn_t pfn; + unsigned long mmu_seq; + gfn_t gfn = gpa >> PAGE_SHIFT; + bool write = error_code & PFERR_WRITE_MASK; + bool force_pt_level, map_writable; + bool exec = error_code & PFERR_FETCH_MASK; + bool lpage_disallowed = exec && is_nx_huge_page_enabled(); /* Note, paging is disabled, ergo gva == gpa. */ pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code); + gpa &= PAGE_MASK; + if (page_fault_handle_page_track(vcpu, error_code, gfn)) return RET_PF_EMULATE; @@ -4187,9 +4142,46 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); + force_pt_level = lpage_disallowed; + level = mapping_level(vcpu, gfn, &force_pt_level); + if (likely(!force_pt_level)) { + /* + * This path builds a PAE pagetable - so we can map + * 2mb pages at maximum. Therefore check if the level + * is larger than that. + */ + if (level > PT_DIRECTORY_LEVEL) + level = PT_DIRECTORY_LEVEL; - return nonpaging_map(vcpu, gpa & PAGE_MASK, - error_code, gfn, prefault); + gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); + } + + if (fast_page_fault(vcpu, gpa, level, error_code)) + return RET_PF_RETRY; + + mmu_seq = vcpu->kvm->mmu_notifier_seq; + smp_rmb(); + + if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) + return RET_PF_RETRY; + + if (handle_abnormal_pfn(vcpu, gpa, gfn, pfn, ACC_ALL, &r)) + return r; + + r = RET_PF_RETRY; + spin_lock(&vcpu->kvm->mmu_lock); + if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) + goto out_unlock; + if (make_mmu_pages_available(vcpu) < 0) + goto out_unlock; + if (likely(!force_pt_level)) + transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); + r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, + prefault, false); +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return r; } static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, From patchwork Fri Dec 6 23:57:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277179 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B4C29159A for ; Fri, 6 Dec 2019 23:58:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9201820659 for ; Fri, 6 Dec 2019 23:58:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726578AbfLFX6T (ORCPT ); Fri, 6 Dec 2019 18:58:19 -0500 Received: from mga07.intel.com ([134.134.136.100]:55584 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726492AbfLFX5i (ORCPT ); Fri, 6 Dec 2019 18:57:38 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:36 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530332" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:36 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 04/16] KVM: x86/mmu: Move nonpaging_page_fault() below try_async_pf() Date: Fri, 6 Dec 2019 15:57:17 -0800 Message-Id: <20191206235729.29263-5-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Move nonpaging_page_fault() below try_async_pf() to eliminate the forward declaration of try_async_pf() and to prepare for combining the bulk of nonpaging_page_fault() and tdp_page_fault() into a common helper. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 142 ++++++++++++++++++++--------------------- 1 file changed, 69 insertions(+), 73 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 8217cdd6beac..ef1b7663f6ea 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3653,10 +3653,6 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, int level, return fault_handled; } -static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, - gpa_t cr2_or_gpa, kvm_pfn_t *pfn, bool write, - bool *writable); - static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa, struct list_head *invalid_list) { @@ -4115,75 +4111,6 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr) walk_shadow_page_lockless_end(vcpu); } -static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, - u32 error_code, bool prefault) -{ - int r; - int level; - kvm_pfn_t pfn; - unsigned long mmu_seq; - gfn_t gfn = gpa >> PAGE_SHIFT; - bool write = error_code & PFERR_WRITE_MASK; - bool force_pt_level, map_writable; - bool exec = error_code & PFERR_FETCH_MASK; - bool lpage_disallowed = exec && is_nx_huge_page_enabled(); - - /* Note, paging is disabled, ergo gva == gpa. */ - pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code); - - gpa &= PAGE_MASK; - - if (page_fault_handle_page_track(vcpu, error_code, gfn)) - return RET_PF_EMULATE; - - r = mmu_topup_memory_caches(vcpu); - if (r) - return r; - - MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); - - force_pt_level = lpage_disallowed; - level = mapping_level(vcpu, gfn, &force_pt_level); - if (likely(!force_pt_level)) { - /* - * This path builds a PAE pagetable - so we can map - * 2mb pages at maximum. Therefore check if the level - * is larger than that. - */ - if (level > PT_DIRECTORY_LEVEL) - level = PT_DIRECTORY_LEVEL; - - gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); - } - - if (fast_page_fault(vcpu, gpa, level, error_code)) - return RET_PF_RETRY; - - mmu_seq = vcpu->kvm->mmu_notifier_seq; - smp_rmb(); - - if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) - return RET_PF_RETRY; - - if (handle_abnormal_pfn(vcpu, gpa, gfn, pfn, ACC_ALL, &r)) - return r; - - r = RET_PF_RETRY; - spin_lock(&vcpu->kvm->mmu_lock); - if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) - goto out_unlock; - if (make_mmu_pages_available(vcpu) < 0) - goto out_unlock; - if (likely(!force_pt_level)) - transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); - r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, - prefault, false); -out_unlock: - spin_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(pfn); - return r; -} - static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, gfn_t gfn) { @@ -4233,6 +4160,75 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, return false; } +static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, bool prefault) +{ + int r; + int level; + kvm_pfn_t pfn; + unsigned long mmu_seq; + gfn_t gfn = gpa >> PAGE_SHIFT; + bool write = error_code & PFERR_WRITE_MASK; + bool force_pt_level, map_writable; + bool exec = error_code & PFERR_FETCH_MASK; + bool lpage_disallowed = exec && is_nx_huge_page_enabled(); + + /* Note, paging is disabled, ergo gva == gpa. */ + pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code); + + gpa &= PAGE_MASK; + + if (page_fault_handle_page_track(vcpu, error_code, gfn)) + return RET_PF_EMULATE; + + r = mmu_topup_memory_caches(vcpu); + if (r) + return r; + + MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); + + force_pt_level = lpage_disallowed; + level = mapping_level(vcpu, gfn, &force_pt_level); + if (likely(!force_pt_level)) { + /* + * This path builds a PAE pagetable - so we can map + * 2mb pages at maximum. Therefore check if the level + * is larger than that. + */ + if (level > PT_DIRECTORY_LEVEL) + level = PT_DIRECTORY_LEVEL; + + gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); + } + + if (fast_page_fault(vcpu, gpa, level, error_code)) + return RET_PF_RETRY; + + mmu_seq = vcpu->kvm->mmu_notifier_seq; + smp_rmb(); + + if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) + return RET_PF_RETRY; + + if (handle_abnormal_pfn(vcpu, gpa, gfn, pfn, ACC_ALL, &r)) + return r; + + r = RET_PF_RETRY; + spin_lock(&vcpu->kvm->mmu_lock); + if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) + goto out_unlock; + if (make_mmu_pages_available(vcpu) < 0) + goto out_unlock; + if (likely(!force_pt_level)) + transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); + r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, + prefault, false); +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return r; +} + int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code, u64 fault_address, char *insn, int insn_len) { From patchwork Fri Dec 6 23:57:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277187 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7132C159A for ; Fri, 6 Dec 2019 23:58:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5906C20659 for ; Fri, 6 Dec 2019 23:58:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727016AbfLFX6j (ORCPT ); Fri, 6 Dec 2019 18:58:39 -0500 Received: from mga07.intel.com ([134.134.136.100]:55586 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726506AbfLFX5i (ORCPT ); Fri, 6 Dec 2019 18:57:38 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:36 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530337" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:36 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 05/16] KVM: x86/mmu: Refactor handling of cache consistency with TDP Date: Fri, 6 Dec 2019 15:57:18 -0800 Message-Id: <20191206235729.29263-6-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Pre-calculate the max level for a TDP page with respect to MTRR cache consistency in preparation of replacing force_pt_level with max_level, and eventually combining the bulk of nonpaging_page_fault() and tdp_page_fault() into a common helper. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index ef1b7663f6ea..e4a96236aa14 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4267,16 +4267,6 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code, } EXPORT_SYMBOL_GPL(kvm_handle_page_fault); -static bool -check_hugepage_cache_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, int level) -{ - int page_num = KVM_PAGES_PER_HPAGE(level); - - gfn &= ~(page_num - 1); - - return kvm_mtrr_check_gfn_range_consistency(vcpu, gfn, page_num); -} - static int tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, bool prefault) { @@ -4290,6 +4280,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, bool map_writable; bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) && is_nx_huge_page_enabled(); + int max_level; MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); @@ -4300,14 +4291,21 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, if (r) return r; - force_pt_level = - lpage_disallowed || - !check_hugepage_cache_consistency(vcpu, gfn, PT_DIRECTORY_LEVEL); + for (max_level = PT_MAX_HUGEPAGE_LEVEL; + max_level > PT_PAGE_TABLE_LEVEL; + max_level--) { + int page_num = KVM_PAGES_PER_HPAGE(max_level); + gfn_t base = gfn & ~(page_num - 1); + + if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num)) + break; + } + + force_pt_level = lpage_disallowed || max_level == PT_PAGE_TABLE_LEVEL; level = mapping_level(vcpu, gfn, &force_pt_level); if (likely(!force_pt_level)) { - if (level > PT_DIRECTORY_LEVEL && - !check_hugepage_cache_consistency(vcpu, gfn, level)) - level = PT_DIRECTORY_LEVEL; + if (level > max_level) + level = max_level; gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); } From patchwork Fri Dec 6 23:57:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277189 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1671C159A for ; Fri, 6 Dec 2019 23:58:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E912924670 for ; Fri, 6 Dec 2019 23:58:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726951AbfLFX6j (ORCPT ); Fri, 6 Dec 2019 18:58:39 -0500 Received: from mga07.intel.com ([134.134.136.100]:55585 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726509AbfLFX5i (ORCPT ); Fri, 6 Dec 2019 18:57:38 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530340" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:36 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 06/16] KVM: x86/mmu: Refactor the per-slot level calculation in mapping_level() Date: Fri, 6 Dec 2019 15:57:19 -0800 Message-Id: <20191206235729.29263-7-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Invert the loop which adjusts the allowed page level based on what's compatible with the associated memslot to use a largest-to-smallest page size walk. This paves the way for passing around a "max level" variable instead of having redundant checks and/or multiple booleans. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index e4a96236aa14..bd1711201181 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1326,7 +1326,7 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn, static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn, bool *force_pt_level) { - int host_level, level, max_level; + int host_level, max_level; struct kvm_memory_slot *slot; if (unlikely(*force_pt_level)) @@ -1343,12 +1343,12 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn, return host_level; max_level = min(kvm_x86_ops->get_lpage_level(), host_level); - - for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level) - if (__mmu_gfn_lpage_is_disallowed(large_gfn, level, slot)) + for ( ; max_level > PT_PAGE_TABLE_LEVEL; max_level--) { + if (!__mmu_gfn_lpage_is_disallowed(large_gfn, max_level, slot)) break; + } - return level - 1; + return max_level; } /* From patchwork Fri Dec 6 23:57:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277181 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EAF0D159A for ; Fri, 6 Dec 2019 23:58:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C91BE2245C for ; Fri, 6 Dec 2019 23:58:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726915AbfLFX6T (ORCPT ); Fri, 6 Dec 2019 18:58:19 -0500 Received: from mga07.intel.com ([134.134.136.100]:55586 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726552AbfLFX5j (ORCPT ); Fri, 6 Dec 2019 18:57:39 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530343" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:37 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 07/16] KVM: x86/mmu: Refactor handling of forced 4k pages in page faults Date: Fri, 6 Dec 2019 15:57:20 -0800 Message-Id: <20191206235729.29263-8-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Refactor the page fault handlers and mapping_level() to track the max allowed page level instead of only tracking if a 4k page is mandatory due to one restriction or another. This paves the way for cleanly consolidating tdp_page_fault() and nonpaging_page_fault(), and for eliminating a redundant check on mmu_gfn_lpage_is_disallowed(). No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 45 ++++++++++++++-------------------- arch/x86/kvm/mmu/paging_tmpl.h | 16 +++++++----- 2 files changed, 29 insertions(+), 32 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index bd1711201181..877924cbb75b 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1324,18 +1324,19 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn, } static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn, - bool *force_pt_level) + int *max_levelp) { - int host_level, max_level; + int host_level, max_level = *max_levelp; struct kvm_memory_slot *slot; - if (unlikely(*force_pt_level)) + if (unlikely(max_level == PT_PAGE_TABLE_LEVEL)) return PT_PAGE_TABLE_LEVEL; slot = kvm_vcpu_gfn_to_memslot(vcpu, large_gfn); - *force_pt_level = !memslot_valid_for_gpte(slot, true); - if (unlikely(*force_pt_level)) + if (!memslot_valid_for_gpte(slot, true)) { + *max_levelp = PT_PAGE_TABLE_LEVEL; return PT_PAGE_TABLE_LEVEL; + } host_level = host_mapping_level(vcpu->kvm, large_gfn); @@ -4169,9 +4170,10 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long mmu_seq; gfn_t gfn = gpa >> PAGE_SHIFT; bool write = error_code & PFERR_WRITE_MASK; - bool force_pt_level, map_writable; + bool map_writable; bool exec = error_code & PFERR_FETCH_MASK; bool lpage_disallowed = exec && is_nx_huge_page_enabled(); + int max_level; /* Note, paging is disabled, ergo gva == gpa. */ pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code); @@ -4187,19 +4189,12 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); - force_pt_level = lpage_disallowed; - level = mapping_level(vcpu, gfn, &force_pt_level); - if (likely(!force_pt_level)) { - /* - * This path builds a PAE pagetable - so we can map - * 2mb pages at maximum. Therefore check if the level - * is larger than that. - */ - if (level > PT_DIRECTORY_LEVEL) - level = PT_DIRECTORY_LEVEL; + /* This path builds a PAE pagetable, we can map 2mb pages at maximum. */ + max_level = lpage_disallowed ? PT_PAGE_TABLE_LEVEL : PT_DIRECTORY_LEVEL; + level = mapping_level(vcpu, gfn, &max_level); + if (level > PT_PAGE_TABLE_LEVEL) gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); - } if (fast_page_fault(vcpu, gpa, level, error_code)) return RET_PF_RETRY; @@ -4219,7 +4214,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, goto out_unlock; if (make_mmu_pages_available(vcpu) < 0) goto out_unlock; - if (likely(!force_pt_level)) + if (likely(max_level > PT_PAGE_TABLE_LEVEL)) transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, prefault, false); @@ -4273,7 +4268,6 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, kvm_pfn_t pfn; int r; int level; - bool force_pt_level; gfn_t gfn = gpa >> PAGE_SHIFT; unsigned long mmu_seq; int write = error_code & PFERR_WRITE_MASK; @@ -4301,13 +4295,12 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, break; } - force_pt_level = lpage_disallowed || max_level == PT_PAGE_TABLE_LEVEL; - level = mapping_level(vcpu, gfn, &force_pt_level); - if (likely(!force_pt_level)) { - if (level > max_level) - level = max_level; + if (lpage_disallowed) + max_level = PT_PAGE_TABLE_LEVEL; + + level = mapping_level(vcpu, gfn, &max_level); + if (level > PT_PAGE_TABLE_LEVEL) gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); - } if (fast_page_fault(vcpu, gpa, level, error_code)) return RET_PF_RETRY; @@ -4327,7 +4320,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, goto out_unlock; if (make_mmu_pages_available(vcpu) < 0) goto out_unlock; - if (likely(!force_pt_level)) + if (likely(max_level > PT_PAGE_TABLE_LEVEL)) transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, prefault, lpage_disallowed); diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index c1d7b866a03f..1938a6e4e631 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -778,7 +778,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code, bool map_writable, is_self_change_mapping; bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) && is_nx_huge_page_enabled(); - bool force_pt_level = lpage_disallowed; + int max_level; pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code); @@ -818,14 +818,18 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code, is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu, &walker, user_fault, &vcpu->arch.write_fault_to_shadow_pgtable); + max_level = lpage_disallowed ? PT_PAGE_TABLE_LEVEL : + PT_MAX_HUGEPAGE_LEVEL; + if (walker.level >= PT_DIRECTORY_LEVEL && !is_self_change_mapping) { - level = mapping_level(vcpu, walker.gfn, &force_pt_level); - if (likely(!force_pt_level)) { + level = mapping_level(vcpu, walker.gfn, &max_level); + if (likely(max_level > PT_DIRECTORY_LEVEL)) { level = min(walker.level, level); walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1); } - } else - force_pt_level = true; + } else { + max_level = PT_PAGE_TABLE_LEVEL; + } mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); @@ -865,7 +869,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code, kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT); if (make_mmu_pages_available(vcpu) < 0) goto out_unlock; - if (!force_pt_level) + if (max_level > PT_PAGE_TABLE_LEVEL) transparent_hugepage_adjust(vcpu, walker.gfn, &pfn, &level); r = FNAME(fetch)(vcpu, addr, &walker, write_fault, level, pfn, map_writable, prefault, lpage_disallowed); From patchwork Fri Dec 6 23:57:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277173 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EE3CA159A for ; Fri, 6 Dec 2019 23:58:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CB649217F4 for ; Fri, 6 Dec 2019 23:58:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726597AbfLFX5l (ORCPT ); Fri, 6 Dec 2019 18:57:41 -0500 Received: from mga07.intel.com ([134.134.136.100]:55585 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726538AbfLFX5j (ORCPT ); Fri, 6 Dec 2019 18:57:39 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530346" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:37 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 08/16] KVM: x86/mmu: Incorporate guest's page level into max level for shadow MMU Date: Fri, 6 Dec 2019 15:57:21 -0800 Message-Id: <20191206235729.29263-9-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Restrict the max level for a shadow page based on the guest's level instead of capping the level after the fact for host-mapped huge pages, e.g. hugetlbfs pages. Explicitly capping the max level using the guest mapping level also eliminates FNAME(page_fault)'s subtle dependency on THP only supporting 2mb pages. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/paging_tmpl.h | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 1938a6e4e631..7d57ec576df0 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -773,7 +773,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code, struct guest_walker walker; int r; kvm_pfn_t pfn; - int level = PT_PAGE_TABLE_LEVEL; + int level; unsigned long mmu_seq; bool map_writable, is_self_change_mapping; bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) && @@ -818,18 +818,14 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code, is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu, &walker, user_fault, &vcpu->arch.write_fault_to_shadow_pgtable); - max_level = lpage_disallowed ? PT_PAGE_TABLE_LEVEL : - PT_MAX_HUGEPAGE_LEVEL; - - if (walker.level >= PT_DIRECTORY_LEVEL && !is_self_change_mapping) { - level = mapping_level(vcpu, walker.gfn, &max_level); - if (likely(max_level > PT_DIRECTORY_LEVEL)) { - level = min(walker.level, level); - walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1); - } - } else { + if (lpage_disallowed || is_self_change_mapping) max_level = PT_PAGE_TABLE_LEVEL; - } + else + max_level = walker.level; + + level = mapping_level(vcpu, walker.gfn, &max_level); + if (level > PT_PAGE_TABLE_LEVEL) + walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1); mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); From patchwork Fri Dec 6 23:57:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277183 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B3889930 for ; Fri, 6 Dec 2019 23:58:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9ADD520659 for ; Fri, 6 Dec 2019 23:58:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726898AbfLFX6T (ORCPT ); Fri, 6 Dec 2019 18:58:19 -0500 Received: from mga07.intel.com ([134.134.136.100]:55584 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726551AbfLFX5k (ORCPT ); Fri, 6 Dec 2019 18:57:40 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:38 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530351" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:37 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 09/16] KVM: x86/mmu: Persist gfn_lpage_is_disallowed() to max_level Date: Fri, 6 Dec 2019 15:57:22 -0800 Message-Id: <20191206235729.29263-10-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Persist the max page level calculated via gfn_lpage_is_disallowed() to the max level "returned" by mapping_level() so that its naturally taken into account by the max level check that conditions calling transparent_hugepage_adjust(). Drop the gfn_lpage_is_disallowed() check in thp_adjust() as it's now handled by mapping_level() and its callers. Add a comment to document the behavior of host_mapping_level() and its interaction with max level and transparent huge pages. Note, transferring the gfn_lpage_is_disallowed() from thp_adjust() to mapping_level() superficially affects how changes to a memslot's disallow_lpage count will be handled due to thp_adjust() being run while holding mmu_lock. In the more common case where a different vCPU increments the count via account_shadowed(), gfn_lpage_is_disallowed() is rechecked by set_spte() to ensure a writable large page isn't created. In the less common case where the count is decremented to zero due to all shadow pages in the memslot being zapped, THP behavior now matches hugetlbfs behavior in the sense that a small page will be created when a large page could be used if the count reaches zero in the miniscule window between mapping_level() and acquiring mmu_lock. Lastly, the new THP behavior also follows hugetlbfs behavior in the absurdly unlikely scenario of a memslot being moved such that the memslot's compatibility with respect to large pages changes, but without changing the validity of the gpf->pfn walk. I.e. if a memslot is moved between mapping_level() and snapshotting mmu_seq, it's theoretically possible to consume a stale disallow_lpage count. But, since KVM zaps all shadow pages when moving a memslot and forces all vCPUs to reload a new MMU, the inserted spte will always be thrown away prior to completing the memslot move, i.e. whether or not the spte accurately reflects disallow_lpage is irrelevant. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 877924cbb75b..8782a70abe78 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1326,7 +1326,7 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn, static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn, int *max_levelp) { - int host_level, max_level = *max_levelp; + int max_level = *max_levelp; struct kvm_memory_slot *slot; if (unlikely(max_level == PT_PAGE_TABLE_LEVEL)) @@ -1338,18 +1338,27 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn, return PT_PAGE_TABLE_LEVEL; } - host_level = host_mapping_level(vcpu->kvm, large_gfn); - - if (host_level == PT_PAGE_TABLE_LEVEL) - return host_level; - - max_level = min(kvm_x86_ops->get_lpage_level(), host_level); + max_level = min(max_level, kvm_x86_ops->get_lpage_level()); for ( ; max_level > PT_PAGE_TABLE_LEVEL; max_level--) { if (!__mmu_gfn_lpage_is_disallowed(large_gfn, max_level, slot)) break; } - return max_level; + *max_levelp = max_level; + + if (max_level == PT_PAGE_TABLE_LEVEL) + return PT_PAGE_TABLE_LEVEL; + + /* + * Note, host_mapping_level() does *not* handle transparent huge pages. + * As suggested by "mapping", it reflects the page size established by + * the associated vma, if there is one, i.e. host_mapping_level() will + * return a huge page level if and only if a vma exists and the backing + * implementation for the vma uses huge pages, e.g. hugetlbfs and dax. + * So, do not propagate host_mapping_level() to max_level as KVM can + * still promote the guest mapping to a huge page in the THP case. + */ + return host_mapping_level(vcpu->kvm, large_gfn); } /* @@ -3420,8 +3429,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu, */ if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn) && level == PT_PAGE_TABLE_LEVEL && - PageTransCompoundMap(pfn_to_page(pfn)) && - !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) { + PageTransCompoundMap(pfn_to_page(pfn))) { unsigned long mask; /* * mmu_notifier_retry was successful and we hold the From patchwork Fri Dec 6 23:57:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277185 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B766B930 for ; Fri, 6 Dec 2019 23:58:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 96B032245C for ; Fri, 6 Dec 2019 23:58:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726864AbfLFX6S (ORCPT ); Fri, 6 Dec 2019 18:58:18 -0500 Received: from mga07.intel.com ([134.134.136.100]:55585 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726578AbfLFX5k (ORCPT ); Fri, 6 Dec 2019 18:57:40 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:38 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530355" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:38 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 10/16] KVM: x86/mmu: Rename lpage_disallowed to account_disallowed_nx_lpage Date: Fri, 6 Dec 2019 15:57:23 -0800 Message-Id: <20191206235729.29263-11-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Rename __direct_map()'s param that controls whether or not a disallowed NX large page should be accounted to match what it actually does. The nonpaging_page_fault() case unconditionally passes %false for the param even though it locally sets lpage_disallowed. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 8782a70abe78..a09a3bc0b6ae 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3349,7 +3349,7 @@ static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it, static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write, int map_writable, int level, kvm_pfn_t pfn, - bool prefault, bool lpage_disallowed) + bool prefault, bool account_disallowed_nx_lpage) { struct kvm_shadow_walk_iterator it; struct kvm_mmu_page *sp; @@ -3378,7 +3378,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write, it.level - 1, true, ACC_ALL); link_shadow_page(vcpu, it.sptep, sp); - if (lpage_disallowed) + if (account_disallowed_nx_lpage) account_huge_nx_page(vcpu->kvm, sp); } } From patchwork Fri Dec 6 23:57:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277177 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 40D0B930 for ; Fri, 6 Dec 2019 23:58:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1F600217F4 for ; Fri, 6 Dec 2019 23:58:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726717AbfLFX6J (ORCPT ); Fri, 6 Dec 2019 18:58:09 -0500 Received: from mga07.intel.com ([134.134.136.100]:55586 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726575AbfLFX5k (ORCPT ); Fri, 6 Dec 2019 18:57:40 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:38 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530359" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:38 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 11/16] KVM: x86/mmu: Consolidate tdp_page_fault() and nonpaging_page_fault() Date: Fri, 6 Dec 2019 15:57:24 -0800 Message-Id: <20191206235729.29263-12-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Consolidate the direct MMU page fault handlers into a common helper, direct_page_fault(). Except for unique max level conditions, the tdp and nonpaging fault handlers are functionally identical. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 166 +++++++++++++++-------------------------- 1 file changed, 61 insertions(+), 105 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index a09a3bc0b6ae..929fca02b075 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4169,67 +4169,72 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, return false; } -static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, - u32 error_code, bool prefault) +static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, + bool prefault, int max_level, bool is_tdp) { - int r; - int level; - kvm_pfn_t pfn; - unsigned long mmu_seq; - gfn_t gfn = gpa >> PAGE_SHIFT; bool write = error_code & PFERR_WRITE_MASK; - bool map_writable; bool exec = error_code & PFERR_FETCH_MASK; bool lpage_disallowed = exec && is_nx_huge_page_enabled(); - int max_level; + bool map_writable; - /* Note, paging is disabled, ergo gva == gpa. */ + gfn_t gfn = gpa >> PAGE_SHIFT; + unsigned long mmu_seq; + kvm_pfn_t pfn; + int level, r; + + MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); + + if (page_fault_handle_page_track(vcpu, error_code, gfn)) + return RET_PF_EMULATE; + + r = mmu_topup_memory_caches(vcpu); + if (r) + return r; + + if (lpage_disallowed) + max_level = PT_PAGE_TABLE_LEVEL; + + level = mapping_level(vcpu, gfn, &max_level); + if (level > PT_PAGE_TABLE_LEVEL) + gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); + + if (fast_page_fault(vcpu, gpa, level, error_code)) + return RET_PF_RETRY; + + mmu_seq = vcpu->kvm->mmu_notifier_seq; + smp_rmb(); + + if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) + return RET_PF_RETRY; + + if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r)) + return r; + + r = RET_PF_RETRY; + spin_lock(&vcpu->kvm->mmu_lock); + if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) + goto out_unlock; + if (make_mmu_pages_available(vcpu) < 0) + goto out_unlock; + if (likely(max_level > PT_PAGE_TABLE_LEVEL)) + transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); + r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, prefault, + is_tdp && lpage_disallowed); + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return r; +} + +static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, bool prefault) +{ pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code); - gpa &= PAGE_MASK; - - if (page_fault_handle_page_track(vcpu, error_code, gfn)) - return RET_PF_EMULATE; - - r = mmu_topup_memory_caches(vcpu); - if (r) - return r; - - MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); - /* This path builds a PAE pagetable, we can map 2mb pages at maximum. */ - max_level = lpage_disallowed ? PT_PAGE_TABLE_LEVEL : PT_DIRECTORY_LEVEL; - - level = mapping_level(vcpu, gfn, &max_level); - if (level > PT_PAGE_TABLE_LEVEL) - gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); - - if (fast_page_fault(vcpu, gpa, level, error_code)) - return RET_PF_RETRY; - - mmu_seq = vcpu->kvm->mmu_notifier_seq; - smp_rmb(); - - if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) - return RET_PF_RETRY; - - if (handle_abnormal_pfn(vcpu, gpa, gfn, pfn, ACC_ALL, &r)) - return r; - - r = RET_PF_RETRY; - spin_lock(&vcpu->kvm->mmu_lock); - if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) - goto out_unlock; - if (make_mmu_pages_available(vcpu) < 0) - goto out_unlock; - if (likely(max_level > PT_PAGE_TABLE_LEVEL)) - transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); - r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, - prefault, false); -out_unlock: - spin_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(pfn); - return r; + return direct_page_fault(vcpu, gpa & PAGE_MASK, error_code, prefault, + PT_DIRECTORY_LEVEL, false); } int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code, @@ -4273,69 +4278,20 @@ EXPORT_SYMBOL_GPL(kvm_handle_page_fault); static int tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, bool prefault) { - kvm_pfn_t pfn; - int r; - int level; - gfn_t gfn = gpa >> PAGE_SHIFT; - unsigned long mmu_seq; - int write = error_code & PFERR_WRITE_MASK; - bool map_writable; - bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) && - is_nx_huge_page_enabled(); int max_level; - MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); - - if (page_fault_handle_page_track(vcpu, error_code, gfn)) - return RET_PF_EMULATE; - - r = mmu_topup_memory_caches(vcpu); - if (r) - return r; - for (max_level = PT_MAX_HUGEPAGE_LEVEL; max_level > PT_PAGE_TABLE_LEVEL; max_level--) { int page_num = KVM_PAGES_PER_HPAGE(max_level); - gfn_t base = gfn & ~(page_num - 1); + gfn_t base = (gpa >> PAGE_SHIFT) & ~(page_num - 1); if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num)) break; } - if (lpage_disallowed) - max_level = PT_PAGE_TABLE_LEVEL; - - level = mapping_level(vcpu, gfn, &max_level); - if (level > PT_PAGE_TABLE_LEVEL) - gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1); - - if (fast_page_fault(vcpu, gpa, level, error_code)) - return RET_PF_RETRY; - - mmu_seq = vcpu->kvm->mmu_notifier_seq; - smp_rmb(); - - if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) - return RET_PF_RETRY; - - if (handle_abnormal_pfn(vcpu, 0, gfn, pfn, ACC_ALL, &r)) - return r; - - r = RET_PF_RETRY; - spin_lock(&vcpu->kvm->mmu_lock); - if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) - goto out_unlock; - if (make_mmu_pages_available(vcpu) < 0) - goto out_unlock; - if (likely(max_level > PT_PAGE_TABLE_LEVEL)) - transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); - r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, - prefault, lpage_disallowed); -out_unlock: - spin_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(pfn); - return r; + return direct_page_fault(vcpu, gpa, error_code, prefault, + max_level, true); } static void nonpaging_init_context(struct kvm_vcpu *vcpu, From patchwork Fri Dec 6 23:57:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277175 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EE8D8159A for ; Fri, 6 Dec 2019 23:58:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D5F7720659 for ; Fri, 6 Dec 2019 23:58:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726828AbfLFX6K (ORCPT ); Fri, 6 Dec 2019 18:58:10 -0500 Received: from mga07.intel.com ([134.134.136.100]:55584 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726584AbfLFX5k (ORCPT ); Fri, 6 Dec 2019 18:57:40 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:39 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530361" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:38 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 12/16] KVM: x86/mmu: Move transparent_hugepage_adjust() above __direct_map() Date: Fri, 6 Dec 2019 15:57:25 -0800 Message-Id: <20191206235729.29263-13-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Move thp_adjust() above __direct_map() in preparation of calling thp_adjust() from __direct_map() and FNAME(fetch). No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 76 +++++++++++++++++++++--------------------- 1 file changed, 38 insertions(+), 38 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 929fca02b075..49e5d48e7327 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3324,6 +3324,44 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) __direct_pte_prefetch(vcpu, sp, sptep); } +static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu, + gfn_t gfn, kvm_pfn_t *pfnp, + int *levelp) +{ + kvm_pfn_t pfn = *pfnp; + int level = *levelp; + + /* + * Check if it's a transparent hugepage. If this would be an + * hugetlbfs page, level wouldn't be set to + * PT_PAGE_TABLE_LEVEL and there would be no adjustment done + * here. + */ + if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) && + !kvm_is_zone_device_pfn(pfn) && level == PT_PAGE_TABLE_LEVEL && + PageTransCompoundMap(pfn_to_page(pfn))) { + unsigned long mask; + /* + * mmu_notifier_retry was successful and we hold the + * mmu_lock here, so the pmd can't become splitting + * from under us, and in turn + * __split_huge_page_refcount() can't run from under + * us and we can safely transfer the refcount from + * PG_tail to PG_head as we switch the pfn to tail to + * head. + */ + *levelp = level = PT_DIRECTORY_LEVEL; + mask = KVM_PAGES_PER_HPAGE(level) - 1; + VM_BUG_ON((gfn & mask) != (pfn & mask)); + if (pfn & mask) { + kvm_release_pfn_clean(pfn); + pfn &= ~mask; + kvm_get_pfn(pfn); + *pfnp = pfn; + } + } +} + static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it, gfn_t gfn, kvm_pfn_t *pfnp, int *levelp) { @@ -3414,44 +3452,6 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn) return -EFAULT; } -static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu, - gfn_t gfn, kvm_pfn_t *pfnp, - int *levelp) -{ - kvm_pfn_t pfn = *pfnp; - int level = *levelp; - - /* - * Check if it's a transparent hugepage. If this would be an - * hugetlbfs page, level wouldn't be set to - * PT_PAGE_TABLE_LEVEL and there would be no adjustment done - * here. - */ - if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) && - !kvm_is_zone_device_pfn(pfn) && level == PT_PAGE_TABLE_LEVEL && - PageTransCompoundMap(pfn_to_page(pfn))) { - unsigned long mask; - /* - * mmu_notifier_retry was successful and we hold the - * mmu_lock here, so the pmd can't become splitting - * from under us, and in turn - * __split_huge_page_refcount() can't run from under - * us and we can safely transfer the refcount from - * PG_tail to PG_head as we switch the pfn to tail to - * head. - */ - *levelp = level = PT_DIRECTORY_LEVEL; - mask = KVM_PAGES_PER_HPAGE(level) - 1; - VM_BUG_ON((gfn & mask) != (pfn & mask)); - if (pfn & mask) { - kvm_release_pfn_clean(pfn); - pfn &= ~mask; - kvm_get_pfn(pfn); - *pfnp = pfn; - } - } -} - static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn, kvm_pfn_t pfn, unsigned access, int *ret_val) { From patchwork Fri Dec 6 23:57:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277169 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9756517EF for ; Fri, 6 Dec 2019 23:57:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7F083217F4 for ; Fri, 6 Dec 2019 23:57:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726743AbfLFX5y (ORCPT ); Fri, 6 Dec 2019 18:57:54 -0500 Received: from mga07.intel.com ([134.134.136.100]:55585 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726607AbfLFX5m (ORCPT ); Fri, 6 Dec 2019 18:57:42 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530365" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:39 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 13/16] KVM: x86/mmu: Move calls to thp_adjust() down a level Date: Fri, 6 Dec 2019 15:57:26 -0800 Message-Id: <20191206235729.29263-14-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Move the calls to thp_adjust() down a level from the page fault handlers to the map/fetch helpers and remove the page count shuffling done in thp_adjust(). Despite holding a reference to the underlying page while processing a page fault, the page fault flows don't actually rely on holding a reference to the page when thp_adjust() is called. At that point, the fault handlers hold mmu_lock, which prevents mmu_notifier from completing any invalidations, and have verified no invalidations from mmu_notifier have occurred since the page reference was acquired (which is done prior to taking mmu_lock). The kvm_release_pfn_clean()/kvm_get_pfn() dance in thp_adjust() is a quirk that is necessitated because thp_adjust() modifies the pfn that is consumed by its caller. Because the page fault handlers call kvm_release_pfn_clean() on said pfn, thp_adjust() needs to transfer the reference to the correct pfn purely for correctness when the pfn is released. Calling thp_adjust() from __direct_map() and FNAME(fetch) means the pfn adjustment doesn't change the pfn as seen by the page fault handlers, i.e. the pfn released by the page fault handlers is the same pfn that was returned by gfn_to_pfn(). Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 31 ++++++++++++------------------- arch/x86/kvm/mmu/paging_tmpl.h | 11 ++++++----- 2 files changed, 18 insertions(+), 24 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 49e5d48e7327..904fb466dd24 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3341,24 +3341,15 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu, !kvm_is_zone_device_pfn(pfn) && level == PT_PAGE_TABLE_LEVEL && PageTransCompoundMap(pfn_to_page(pfn))) { unsigned long mask; + /* - * mmu_notifier_retry was successful and we hold the - * mmu_lock here, so the pmd can't become splitting - * from under us, and in turn - * __split_huge_page_refcount() can't run from under - * us and we can safely transfer the refcount from - * PG_tail to PG_head as we switch the pfn to tail to - * head. + * mmu_notifier_retry() was successful and mmu_lock is held, so + * the pmd can't be split from under us. */ *levelp = level = PT_DIRECTORY_LEVEL; mask = KVM_PAGES_PER_HPAGE(level) - 1; VM_BUG_ON((gfn & mask) != (pfn & mask)); - if (pfn & mask) { - kvm_release_pfn_clean(pfn); - pfn &= ~mask; - kvm_get_pfn(pfn); - *pfnp = pfn; - } + *pfnp = pfn & ~mask; } } @@ -3386,8 +3377,9 @@ static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it, } static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write, - int map_writable, int level, kvm_pfn_t pfn, - bool prefault, bool account_disallowed_nx_lpage) + int map_writable, int level, int max_level, + kvm_pfn_t pfn, bool prefault, + bool account_disallowed_nx_lpage) { struct kvm_shadow_walk_iterator it; struct kvm_mmu_page *sp; @@ -3398,6 +3390,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write, if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) return RET_PF_RETRY; + if (likely(max_level > PT_PAGE_TABLE_LEVEL)) + transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); + trace_kvm_mmu_spte_requested(gpa, level, pfn); for_each_shadow_entry(vcpu, gpa, it) { /* @@ -4216,10 +4211,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, goto out_unlock; if (make_mmu_pages_available(vcpu) < 0) goto out_unlock; - if (likely(max_level > PT_PAGE_TABLE_LEVEL)) - transparent_hugepage_adjust(vcpu, gfn, &pfn, &level); - r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, prefault, - is_tdp && lpage_disallowed); + r = __direct_map(vcpu, gpa, write, map_writable, level, max_level, pfn, + prefault, is_tdp && lpage_disallowed); out_unlock: spin_unlock(&vcpu->kvm->mmu_lock); diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 7d57ec576df0..3b0ba2a77e28 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -613,7 +613,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw, */ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr, struct guest_walker *gw, - int write_fault, int hlevel, + int write_fault, int hlevel, int max_level, kvm_pfn_t pfn, bool map_writable, bool prefault, bool lpage_disallowed) { @@ -673,6 +673,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr, gfn = gw->gfn | ((addr & PT_LVL_OFFSET_MASK(gw->level)) >> PAGE_SHIFT); base_gfn = gfn; + if (max_level > PT_PAGE_TABLE_LEVEL) + transparent_hugepage_adjust(vcpu, gw->gfn, &pfn, &hlevel); + trace_kvm_mmu_spte_requested(addr, gw->level, pfn); for (; shadow_walk_okay(&it); shadow_walk_next(&it)) { @@ -865,10 +868,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code, kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT); if (make_mmu_pages_available(vcpu) < 0) goto out_unlock; - if (max_level > PT_PAGE_TABLE_LEVEL) - transparent_hugepage_adjust(vcpu, walker.gfn, &pfn, &level); - r = FNAME(fetch)(vcpu, addr, &walker, write_fault, - level, pfn, map_writable, prefault, lpage_disallowed); + r = FNAME(fetch)(vcpu, addr, &walker, write_fault, level, max_level, + pfn, map_writable, prefault, lpage_disallowed); kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT); out_unlock: From patchwork Fri Dec 6 23:57:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277171 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9C3F017EF for ; Fri, 6 Dec 2019 23:58:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8336E24670 for ; Fri, 6 Dec 2019 23:58:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726791AbfLFX5z (ORCPT ); Fri, 6 Dec 2019 18:57:55 -0500 Received: from mga07.intel.com ([134.134.136.100]:55586 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726605AbfLFX5m (ORCPT ); Fri, 6 Dec 2019 18:57:42 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530368" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:40 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 14/16] KVM: x86/mmu: Move root_hpa validity checks to top of page fault handler Date: Fri, 6 Dec 2019 15:57:27 -0800 Message-Id: <20191206235729.29263-15-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add a check on root_hpa at the beginning of the page fault handler to consolidate several checks on root_hpa that are scattered throughout the page fault code. This is a preparatory step towards eventually removing such checks altogether, or at the very least WARNing if an invalid root is encountered. Remove only the checks that can be easily audited to confirm that root_hpa cannot be invalidated between their current location and the new check in kvm_mmu_page_fault(), and aren't currently protected by mmu_lock, i.e. keep the checks in __direct_map() and FNAME(fetch) for the time being. The root_hpa checks that are consolidate were all added by commit 37f6a4e237303 ("KVM: x86: handle invalid root_hpa everywhere") which was a follow up to a bug fix for __direct_map(), commit 989c6b34f6a94 ("KVM: MMU: handle invalid root_hpa at __direct_map") At the time, nested VMX had, in hindsight, crazy handling of nested interrupts and would trigger a nested VM-Exit in ->interrupt_allowed(), and thus unexpectedly reset the MMU in flows such as can_do_async_pf(). Now that the wonky nested VM-Exit behavior is gone, the root_hpa checks are bogus and confusing, e.g. it's not at all obvious what they actually protect against, and at first glance they appear to be broken since many of them run without holding mmu_lock. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 904fb466dd24..5e8666d25053 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3561,9 +3561,6 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, int level, u64 spte = 0ull; uint retry_count = 0; - if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) - return false; - if (!page_fault_can_be_fast(error_code)) return false; @@ -4007,9 +4004,6 @@ walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep) int root, leaf; bool reserved = false; - if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) - goto exit; - walk_shadow_page_lockless_begin(vcpu); for (shadow_walk_init(&iterator, vcpu, addr), @@ -4039,7 +4033,7 @@ walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep) root--; } } -exit: + *sptep = spte; return reserved; } @@ -4103,9 +4097,6 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr) struct kvm_shadow_walk_iterator iterator; u64 spte; - if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) - return; - walk_shadow_page_lockless_begin(vcpu); for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) { clear_sp_write_flooding_count(iterator.sptep); @@ -5468,6 +5459,9 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, int r, emulation_type = 0; bool direct = vcpu->arch.mmu->direct_map; + if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) + return RET_PF_RETRY; + /* With shadow page tables, fault_address contains a GVA or nGPA. */ if (vcpu->arch.mmu->direct_map) { vcpu->arch.gpa_available = true; From patchwork Fri Dec 6 23:57:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277167 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8F0C417EF for ; Fri, 6 Dec 2019 23:57:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7677220659 for ; Fri, 6 Dec 2019 23:57:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726720AbfLFX5x (ORCPT ); Fri, 6 Dec 2019 18:57:53 -0500 Received: from mga07.intel.com ([134.134.136.100]:55584 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726608AbfLFX5m (ORCPT ); Fri, 6 Dec 2019 18:57:42 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530371" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:40 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 15/16] KVM: x86/mmu: WARN on an invalid root_hpa Date: Fri, 6 Dec 2019 15:57:28 -0800 Message-Id: <20191206235729.29263-16-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org WARN on the existing invalid root_hpa checks in __direct_map() and FNAME(fetch). The "legitimate" path that invalidated root_hpa in the middle of a page fault is long since gone, i.e. it should no longer be impossible to invalidate in the middle of a page fault[*]. The root_hpa checks were added by two related commits 989c6b34f6a94 ("KVM: MMU: handle invalid root_hpa at __direct_map") 37f6a4e237303 ("KVM: x86: handle invalid root_hpa everywhere") to fix a bug where nested_vmx_vmexit() could be called *in the middle* of a page fault. At the time, vmx_interrupt_allowed(), which was and still is used by kvm_can_do_async_pf() via ->interrupt_allowed(), directly invoked nested_vmx_vmexit() to switch from L2 to L1 to emulate a VM-Exit on a pending interrupt. Emulating the nested VM-Exit resulted in root_hpa being invalidated by kvm_mmu_reset_context() without explicitly terminating the page fault. Now that root_hpa is checked for validity by kvm_mmu_page_fault(), WARN on an invalid root_hpa to detect any flows that reset the MMU while handling a page fault. The broken vmx_interrupt_allowed() behavior has long since been fixed and resetting the MMU during a page fault should not be considered legal behavior. [*] It's actually technically possible in FNAME(page_fault)() because it calls inject_page_fault() when the guest translation is invalid, but in that case the page fault handling is immediately terminated. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 2 +- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 5e8666d25053..88fd1022731f 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3387,7 +3387,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write, gfn_t gfn = gpa >> PAGE_SHIFT; gfn_t base_gfn = gfn; - if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) + if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) return RET_PF_RETRY; if (likely(max_level > PT_PAGE_TABLE_LEVEL)) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 3b0ba2a77e28..b53bed3c901c 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -637,7 +637,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr, if (FNAME(gpte_changed)(vcpu, gw, top_level)) goto out_gpte_changed; - if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) + if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) goto out_gpte_changed; for (shadow_walk_init(&it, vcpu, addr); From patchwork Fri Dec 6 23:57:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 11277165 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2600F14B7 for ; Fri, 6 Dec 2019 23:57:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0E746217F4 for ; Fri, 6 Dec 2019 23:57:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726673AbfLFX5q (ORCPT ); Fri, 6 Dec 2019 18:57:46 -0500 Received: from mga07.intel.com ([134.134.136.100]:55585 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726626AbfLFX5m (ORCPT ); Fri, 6 Dec 2019 18:57:42 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 15:57:41 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,286,1571727600"; d="scan'208";a="219530375" Received: from sjchrist-coffee.jf.intel.com ([10.54.74.202]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 15:57:40 -0800 From: Sean Christopherson To: Paolo Bonzini , =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?= Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 16/16] KVM: x86/mmu: WARN if root_hpa is invalid when handling a page fault Date: Fri, 6 Dec 2019 15:57:29 -0800 Message-Id: <20191206235729.29263-17-sean.j.christopherson@intel.com> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com> References: <20191206235729.29263-1-sean.j.christopherson@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org WARN if root_hpa is invalid when handling a page fault. The check on root_hpa exists for historical reasons that no longer apply to the current KVM code base. Remove an equivalent debug-only warning in direct_page_fault(), whose existence more or less confirms that root_hpa should always be valid when handling a page fault. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 88fd1022731f..32534ea1efd0 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4168,8 +4168,6 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, kvm_pfn_t pfn; int level, r; - MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)); - if (page_fault_handle_page_track(vcpu, error_code, gfn)) return RET_PF_EMULATE; @@ -5459,7 +5457,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, int r, emulation_type = 0; bool direct = vcpu->arch.mmu->direct_map; - if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) + if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) return RET_PF_RETRY; /* With shadow page tables, fault_address contains a GVA or nGPA. */