[v2,00/17] KVM: x86/mmu: Page fault and MMIO cleanups

Message ID	20240507155817.3951344-1-pbonzini@redhat.com (mailing list archive)
Headers	show Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 20D27165FD4 for <kvm@vger.kernel.org>; Tue, 7 May 2024 15:58:20 +0000 (UTC) From: Paolo Bonzini <pbonzini@redhat.com> To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Date: Tue, 7 May 2024 11:58:00 -0400 Message-ID: <20240507155817.3951344-1-pbonzini@redhat.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
Series	KVM: x86/mmu: Page fault and MMIO cleanups \| expand [v2,00/17] KVM: x86/mmu: Page fault and MMIO cleanups [01/17] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation [02/17] KVM: x86: Remove separate "bit" defines for page fault error code masks [03/17] KVM: x86: Define more SEV+ page fault error bits/flags for #NPF [04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler [05/17] KVM: x86/mmu: Pass full 64-bit error code when handling page faults [06/17] KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are non-zero [07/17] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults [08/17] KVM: x86/mmu: check for invalid async page faults involving private memory [09/17] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults [10/17] KVM: x86/mmu: Move private vs. shared check above slot validity checks [11/17] KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC internal slots [12/17] KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO [13/17] KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn() [14/17] KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn() [15/17] KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot" faults [16/17] KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values [17/17] KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns

This is an updated version of the series at https://patchew.org/linux/20240228024147.41573-1-seanjc@google.com/ which has been used for SEV-SNP development, taking into account all review comments. Patch 8 is the only completely new patch (and even then it had been posted together with the various TDX/SNP prep series). Here is an annotated git range-diff: ============================================================================== KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation - tweak commit message, add comment @@ Commit message Exit to userspace with -EFAULT / KVM_EXIT_MEMORY_FAULT if a private fault triggers emulation of any kind, as KVM doesn't currently support emulating access to guest private memory. Practically speaking, private faults and - emulation are already mutually exclusive, but there are edge cases upon - edge cases where KVM can return RET_PF_EMULATE, and adding one last check - to harden against weird, unexpected combinations is inexpensive. + emulation are already mutually exclusive, but there are many flow that + can result in KVM returning RET_PF_EMULATE, and adding one last check + to harden against weird, unexpected combinations and/or KVM bugs is + inexpensive. Suggested-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> @@ arch/x86/kvm/mmu/mmu_internal.h: static inline int kvm_mmu_do_page_fault(struct else r = vcpu->arch.mmu->page_fault(vcpu, &fault); ++ /* ++ * Not sure what's happening, but punt to userspace and hope that ++ * they can fix it by changing memory to shared, or they can ++ * provide a better error. ++ */ + if (r == RET_PF_EMULATE && fault.is_private) { ++ pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n"); + kvm_mmu_prepare_memory_fault_exit(vcpu, &fault); + return -EFAULT; + } ============================================================================== KVM: x86: Remove separate "bit" defines for page fault error code masks - do not use ilog2 @@ Commit message just to see which flag corresponds to which bit is quite annoying, as is having to define two macros just to add recognition of a new flag. - Use ilog2() to derive the bit in permission_fault(), the one function that - actually needs the bit number (it does clever shifting to manipulate flags - in order to avoid conditional branches). + Use ternary operator to derive the bit in permission_fault(), the one + function that actually needs the bit number as part of clever shifting + to avoid conditional branches. Generally the compiler is able to turn + it into a conditional move, and if not it's not really a big deal. No functional change intended. @@ arch/x86/kvm/mmu.h: static inline u8 permission_fault(struct kvm_vcpu *vcpu, str u64 implicit_access = access & PFERR_IMPLICIT_ACCESS; bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC; - int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1; -+ int index = (pfec + (not_smap << ilog2(PFERR_RSVD_MASK))) >> 1; ++ int index = (pfec | (not_smap ? PFERR_RSVD_MASK : 0)) >> 1; u32 errcode = PFERR_PRESENT_MASK; bool fault; @@ arch/x86/kvm/mmu.h: static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, + pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3; /* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */ - offset = (pfec & ~1) + +- offset = (pfec & ~1) + - ((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT)); -+ ((pte_access & PT_USER_MASK) << (ilog2(PFERR_RSVD_MASK) - PT_USER_SHIFT)); ++ offset = (pfec & ~1) | ((pte_access & PT_USER_MASK) ? PFERR_RSVD_MASK : 0); pkru_bits &= mmu->pkru_mask >> offset; errcode |= -pkru_bits & PFERR_PK_MASK; ============================================================================== KVM: x86: Define more SEV+ page fault error bits/flags for #NPF - match commit message to bits defined in header file @@ Commit message Define more #NPF error code flags that are relevant to SEV+ (mostly SNP) guests, as specified by the APM: + * Bit 31 (RMP): Set to 1 if the fault was caused due to an RMP check or a + VMPL check failure, 0 otherwise. * Bit 34 (ENC): Set to 1 if the guest’s effective C-bit was 1, 0 otherwise. * Bit 35 (SIZEM): Set to 1 if the fault was caused by a size mismatch between PVALIDATE or RMPADJUST and the RMP, 0 otherwise. * Bit 36 (VMPL): Set to 1 if the fault was caused by a VMPL permission check failure, 0 otherwise. - * Bit 37 (SSS): Set to VMPL permission mask SSS (bit 4) value if VmplSSS is - enabled. Note, the APM is *extremely* misleading, and strongly implies that the above flags can _only_ be set for #NPF exits from SNP guests. That is a ============================================================================== KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler - add a description of PFERR_IMPLICIT_ACCESS @@ Commit message Add a compile-time assert in the legacy #PF handler to make sure that KVM- define flags are covered by its existing sanity check on the upper bits. + Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we + are removing the comment that defined it. + Signed-off-by: Sean Christopherson <seanjc@google.com> + Reviewed-by: Kai Huang <kai.huang@intel.com> + Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Message-ID: <20240228024147.41573-8-seanjc@google.com> + Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> + + ## arch/x86/include/asm/kvm_host.h ## +@@ arch/x86/include/asm/kvm_host.h: enum x86_intercept_stage; + #define PFERR_GUEST_ENC_MASK BIT_ULL(34) + #define PFERR_GUEST_SIZEM_MASK BIT_ULL(35) + #define PFERR_GUEST_VMPL_MASK BIT_ULL(36) ++ ++/* ++ * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks ++ * when emulating instructions that triggers implicit access. ++ */ + #define PFERR_IMPLICIT_ACCESS BIT_ULL(48) ++#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS) + + #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \ + PFERR_WRITE_MASK | \ ## arch/x86/kvm/mmu/mmu.c ## @@ arch/x86/kvm/mmu/mmu.c: int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code, - if (WARN_ON_ONCE(error_code >> 32)) - error_code = lower_32_bits(error_code); + return -EFAULT; + #endif + /* Ensure the above sanity check also covers KVM-defined flags. */ + BUILD_BUG_ON(lower_32_bits(PFERR_SYNTHETIC_MASK)); - move in front of the other synthetic page fault error code patches ============================================================================== KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are non-zero commit message copy editing @@ Commit message and even more explicitly in the SDM as VMCS.VM_EXIT_INTR_ERROR_CODE is a 32-bit field. - Simply drop the upper bits of hardware provides garbage, as spurious + Simply drop the upper bits if hardware provides garbage, as spurious information should do no harm (though in all likelihood hardware is buggy and the kernel is doomed). @@ Commit message which in turn will allow deriving PFERR_PRIVATE_ACCESS from AMD's PFERR_GUEST_ENC_MASK without running afoul of the sanity check. - Note, this also why Intel uses bit 15 for SGX (highest bit on Intel CPUs) + Note, this is also why Intel uses bit 15 for SGX (highest bit on Intel CPUs) and AMD uses bit 31 for RMP (highest bit on AMD CPUs); using the highest bit minimizes the probability of a collision with the "other" vendor, without needing to plumb more bits through microcode. ============================================================================== KVM: x86/mmu: Use synthetic page fault error code to indicate private faults - patch reordering, no other changes ============================================================================== KVM: x86/mmu: check for invalid async page faults involving private memory - new patch coming from TDX/SNP prep series; test PFERR_PRIVATE_ACCESS, set arch.error_code @@ arch/x86/kvm/mmu/mmu.c: static u32 alloc_apf_token(struct kvm_vcpu *vcpu) arch.token = alloc_apf_token(vcpu); - arch.gfn = gfn; + arch.gfn = fault->gfn; ++ arch.error_code = fault->error_code; arch.direct_map = vcpu->arch.mmu->root_role.direct; arch.cr3 = kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu); @@ arch/x86/kvm/mmu/mmu.c: static u32 alloc_apf_token(struct kvm_vcpu *vcpu) { int r; -+ if (WARN_ON_ONCE(work->arch.error_code & PFERR_GUEST_ENC_MASK)) ++ if (WARN_ON_ONCE(work->arch.error_code & PFERR_PRIVATE_ACCESS)) + return; + if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) || ============================================================================== KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults - exit to userspace if the wrong case happens, test PFERR_PRIVATE_ACCESS @@ Commit message ## arch/x86/kvm/mmu/mmu.c ## @@ arch/x86/kvm/mmu/mmu.c: int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err - error_code |= PFERR_PRIVATE_ACCESS; r = RET_PF_INVALID; -- if (unlikely(error_code & PFERR_RSVD_MASK)) { -+ if (unlikely((error_code & PFERR_RSVD_MASK) && -+ !WARN_ON_ONCE(error_code & PFERR_GUEST_ENC_MASK))) { + if (unlikely(error_code & PFERR_RSVD_MASK)) { ++ if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS)) ++ return -EFAULT; ++ r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct); if (r == RET_PF_EMULATE) goto emulate; ============================================================================== KVM: x86/mmu: Move private vs. shared check above slot validity checks - add comment about use of mmu_invalidate_seq @@ arch/x86/kvm/mmu/mmu.c: static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, stru return kvm_faultin_pfn_private(vcpu, fault); @@ arch/x86/kvm/mmu/mmu.c: static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, + { + int ret; + ++ /* ++ * Note that the mmu_invalidate_seq also serves to detect a concurrent ++ * change in attributes. is_page_fault_stale() will detect an ++ * invalidation relate to fault->fn and resume the guest without ++ * installing a mapping in the page tables. ++ */ fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq; smp_rmb(); + /* -+ * Check for a private vs. shared mismatch *after* taking a snapshot of -+ * mmu_invalidate_seq, as changes to gfn attributes are guarded by the -+ * invalidation notifier. ++ * Now that we have a snapshot of mmu_invalidate_seq we can check for a ++ * private vs. shared mismatch. + */ + if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) { + kvm_mmu_prepare_memory_fault_exit(vcpu, fault); ============================================================================== KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn() - differences in moved code, range-diff is unreadable ============================================================================== KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn() - remove unnecessary change @@ arch/x86/kvm/mmu/mmu.c: static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct return kvm_handle_noslot_fault(vcpu, fault, access); /* - - ## arch/x86/kvm/mmu/mmu_internal.h ## -@@ arch/x86/kvm/mmu/mmu_internal.h: struct kvm_page_fault { - /* The memslot containing gfn. May be NULL. */ - struct kvm_memory_slot *slot; - -- /* Outputs of kvm_faultin_pfn. */ -+ /* Outputs of kvm_faultin_pfn. */ - unsigned long mmu_seq; - kvm_pfn_t pfn; - hva_t hva; Isaku Yamahata (1): KVM: x86/mmu: Pass full 64-bit error code when handling page faults Paolo Bonzini (1): KVM: x86/mmu: check for invalid async page faults involving private memory Sean Christopherson (15): KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation KVM: x86: Remove separate "bit" defines for page fault error code masks KVM: x86: Define more SEV+ page fault error bits/flags for #NPF KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are non-zero KVM: x86/mmu: Use synthetic page fault error code to indicate private faults KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults KVM: x86/mmu: Move private vs. shared check above slot validity checks KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC internal slots KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn() KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn() KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot" faults KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns arch/x86/include/asm/kvm_host.h | 46 ++++---- arch/x86/kvm/mmu.h | 5 +- arch/x86/kvm/mmu/mmu.c | 182 ++++++++++++++++++++------------ arch/x86/kvm/mmu/mmu_internal.h | 28 ++++- arch/x86/kvm/mmu/mmutrace.h | 2 +- arch/x86/kvm/svm/svm.c | 9 ++ 6 files changed, 174 insertions(+), 98 deletions(-)