
[v3,01/37] KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush

Message ID 20200320212833.3507-2-sean.j.christopherson@intel.com (mailing list archive)
State New, archived
Series KVM: x86: TLB flushing fixes and enhancements

Commit Message

Sean Christopherson March 20, 2020, 9:27 p.m. UTC
Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by
a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH.  Remote TLB
flushes require all contexts to be invalidated, not just the active
contexts, e.g. all mappings in all contexts for a given HVA need to be
invalidated on an mmu_notifier invalidation.  Similarly, the instigator
of the deferred TLB flush may be expecting all contexts to be flushed,
e.g. vmx_vcpu_load_vmcs().
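
For reference, "triggered by KVM_REQ_TLB_FLUSH" means the generic code has
already broadcast the request to every vCPU, so the responder has no idea
which context the instigator cared about.  A simplified sketch of the
remote-flush path, abridged from virt/kvm/kvm_main.c of this era (the arch
hook and some barriers are elided):

void kvm_flush_remote_tlbs(struct kvm *kvm)
{
	/*
	 * Read tlbs_dirty before broadcasting the request so that a
	 * flush pended by an in-flight shadow page table update is
	 * not lost.
	 */
	long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);

	/* Kick all vCPUs; each flushes before re-entering its guest. */
	if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
		++kvm->stat.remote_tlb_flush;

	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
}

Because the request carries no context information, the only safe response
on the VMX side is a full flush.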

Without nested VMX, flushing only the current EPTP/VPID context isn't
problematic because KVM uses a constant VPID for each vCPU, and
mmu_alloc_direct_roots() all but guarantees KVM will use a single EPTP
for L1.  In the rare case where a different EPTP is created or reused,
KVM (currently) unconditionally flushes the new EPTP context prior to
entering the guest.
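
The "constant VPID" is claimed at vCPU creation and held for the vCPU's
lifetime; a sketch of the allocator, abridged from arch/x86/kvm/vmx/vmx.c
of this era:

static int allocate_vpid(void)
{
	int vpid;

	if (!enable_vpid)
		return 0;

	spin_lock(&vmx_vpid_lock);
	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
	if (vpid < VMX_NR_VPIDS)
		__set_bit(vpid, vmx_vpid_bitmap);
	else
		vpid = 0;	/* VPID 0 == flush on every VM-Enter */
	spin_unlock(&vmx_vpid_lock);

	return vpid;
}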

With nested VMX, KVM conditionally uses a different VPID for L2, and
unconditionally uses a different EPTP for L2.  Because KVM doesn't
_intentionally_ guarantee L2's EPTP/VPID context is flushed on nested
VM-Enter, it'd be possible for a malicious L1 to attack the host and/or
different VMs by exploiting the lack of flushing for L2.

  1) Launch nested guest from malicious L1.

  2) Nested VM-Enter to L2.

  3) Access target GPA 'g'.  CPU inserts TLB entry tagged with L2's ASID
     mapping 'g' to host PFN 'x'.

  4) Nested VM-Exit to L1.

  5) L1 triggers kernel samepage merging (KSM) by duplicating/zeroing
     the page for PFN 'x'.

  6) Host kernel merges PFN 'x' with PFN 'y', i.e. unmaps PFN 'x' and
     remaps the page to PFN 'y'.  mmu_notifier sends invalidate command,
     KVM flushes TLB only for L1's ASID.

  7) Host kernel reallocates PFN 'x' to some other task/guest.

  8) Nested VM-Enter to L2.  KVM does not invalidate L2's EPTP or VPID.

  9) L2 accesses GPA 'g' and gains read/write access to PFN 'x' via its
     stale TLB entry.

However, current KVM unconditionally flushes L1's EPTP/VPID context on
nested VM-Exit.  But that behavior is mostly unintentional; KVM doesn't
go out of its way to flush EPTP/VPID on nested VM-Enter/VM-Exit, rather
a TLB flush is guaranteed to occur prior to re-entering L1 due to
__kvm_mmu_new_cr3() always being called with skip_tlb_flush=false.  On
nested VM-Enter, this happens via kvm_init_shadow_ept_mmu() (nested EPT
enabled) or in nested_vmx_load_cr3() (nested EPT disabled).  On nested
VM-Exit it occurs via nested_vmx_load_cr3().
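
A sketch of that guarantee, abridged from the MMU code of this era (the
helpers have since been reworked into kvm_mmu_new_pgd() and friends, so
treat this as illustrative rather than current):

static void __kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3,
			      union kvm_mmu_page_role new_role,
			      bool skip_tlb_flush)
{
	/*
	 * With skip_tlb_flush == false, the fast-switch path raises
	 * KVM_REQ_TLB_FLUSH; if the fast switch fails, the current
	 * root is freed and its replacement is flushed when loaded.
	 */
	if (!fast_cr3_switch(vcpu, new_cr3, new_role, skip_tlb_flush))
		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu,
				   KVM_MMU_ROOT_CURRENT);
}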

This also fixes a bug where a deferred TLB flush in the context of L2,
with EPT disabled, would flush L1's VPID instead of L2's VPID, as
vmx_flush_tlb() flushes L1's VPID regardless of is_guest_mode().
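
For reference, the pre-patch helpers (reconstructed from vmx.h of this
era) show that bug directly: the deferred-flush path always passes L1's
vpid, ignoring nested.vpid02.

static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
				   bool invalidate_gpa)
{
	if (enable_ept && (invalidate_gpa || !enable_vpid)) {
		if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
			return;
		ept_sync_context(construct_eptp(vcpu,
						vcpu->arch.mmu->root_hpa));
	} else {
		/* With EPT disabled, only @vpid is flushed ... */
		vpid_sync_context(vpid);
	}
}

static inline void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
{
	/* ... and @vpid is always L1's, even with L2 active. */
	__vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid, invalidate_gpa);
}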

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Junaid Shahid <junaids@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: John Haxby <john.haxby@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Fixes: efebf0aaec3d ("KVM: nVMX: Do not flush TLB on L1<->L2 transitions if L1 uses VPID and EPT")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.h | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

Comments

Lai Jiangshan Aug. 3, 2021, 1:45 a.m. UTC | #1
(I'm replying to a very old email, so many CCs are dropped.)

On Sat, Mar 21, 2020 at 5:33 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by
> a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH.  Remote TLB
> flushes require all contexts to be invalidated, not just the active
> contexts, e.g. all mappings in all contexts for a given HVA need to be
> invalidated on an mmu_notifier invalidation.  Similarly, the instigator
> of the deferred TLB flush may be expecting all contexts to be flushed,
> e.g. vmx_vcpu_load_vmcs().
>
> Without nested VMX, flushing only the current EPTP/VPID context isn't
> problematic because KVM uses a constant VPID for each vCPU, and

Hello, Sean

Is the patch optimized for cases where nested VMX is active?
I think the non-nested cases are the normal cases.

Although the related code has been changed, the logic of the patch
still applies; would it be better to restore the optimization for the
normal (non-nested) cases?

Thanks
Lai

[...]
Sean Christopherson Aug. 3, 2021, 3:39 p.m. UTC | #2
On Tue, Aug 03, 2021, Lai Jiangshan wrote:
> (I'm replying to a very old email, so many CCs are dropped.)
> 
> On Sat, Mar 21, 2020 at 5:33 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > [...]
> 
> Hello, Sean
> 
> Is the patch optimized for cases where nested VMX is active?

Well, this patch isn't, but KVM has since been optimized to do full EPT/VPID
flushes only when "necessary".  Necessary in quotes because the two uses can
technically be further optimized, but doing so would incur significant complexity.

Use #1 is remote flushes from the MMU, which don't strictly require a global flush,
but KVM would need to propagate more information (mmu_role?) in order for responding
vCPUs to determine which contexts need to be flushed.  And practically speaking,
for MMU flushes there's no meaningful difference when using TDP without nested
guests as the common case will be that each vCPU has a single active EPTP and
that EPTP will be affected by the MMU changes, i.e. needs to be flushed.

Use #2 is in VMX's pCPU migration path.  Again, not strictly necessary as KVM could
theoretically track which pCPUs have run a particular vCPU and when that pCPU last
flushed EPT contexts, but fully solving the problem would be quite complex.  Since
pCPU migration is always going to be a slow path, the extra complexity would be
very difficult to justify.
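
For the curious, a sketch of use #2, abridged from vmx_vcpu_load_vmcs()
(the VMCS and descriptor-table bookkeeping is elided):

void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	bool already_loaded = vmx->loaded_vmcs->cpu == cpu;

	if (!already_loaded) {
		/* ... clear/load the VMCS on the new pCPU ... */

		/*
		 * Flush all EPTP/VPID contexts; the new pCPU may have
		 * stale TLB entries from this vCPU's last run on it.
		 */
		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
	}
}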

> I think the non-nested cases are the normal cases.
> 
> Although the related code has been changed, the logic of the patch
> still applies; would it be better to restore the optimization for the
> normal (non-nested) cases?

As above, vmx_flush_tlb_all() hasn't changed, but the callers have.
Lai Jiangshan Aug. 4, 2021, 3:11 a.m. UTC | #3
On Tue, Aug 3, 2021 at 11:39 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Aug 03, 2021, Lai Jiangshan wrote:
> > (I'm replying to a very old email, so many CCs are dropped.)
> >
> > On Sat, Mar 21, 2020 at 5:33 AM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > [...]
> >
> > Hello, Sean
> >
> > Is the patch optimized for cases where nested VMX is active?
>
> Well, this patch isn't, but KVM has since been optimized to do full EPT/VPID
> flushes only when "necessary".  Necessary in quotes because the two uses can
> technically be further optimized, but doing so would incur significant complexity.

Hello, thanks for your reply.

I know there might be a lot of possible optimizations to be considered, many of
which are too complicated to be implemented.

The optimization I considered yesterday is "ept_sync_global() vs.
ept_sync_context(this_vcpu's)" in the case where the VM is using EPT and
doesn't allow nested VMs.  (I failed to express this clearly yesterday.)

In this case, the vCPU uses only a single root_hpa, and I think an EPT
sync of that single context is enough for both cases you listed below.

Once that context is flushed, the vCPU's TLB is clean.

If KVM changes mmu->root_hpa, it is KVM's responsibility to request
another flush, which is already implemented.

In other words, KVM_REQ_TLB_FLUSH == KVM_REQ_TLB_FLUSH_CURRENT in this case.
And before this patch, KVM flushed only the single context rather than
doing a global flush.
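
Roughly, the single-context flush Lai is describing is what later became
vmx_flush_tlb_current(); a simplified sketch, noting that
construct_eptp()'s signature varies by kernel version and the nested
(vpid02) case is elided:

static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
{
	u64 root_hpa = vcpu->arch.mmu->root_hpa;

	/* No flush required if the current context is invalid. */
	if (!VALID_PAGE(root_hpa))
		return;

	if (enable_ept)
		ept_sync_context(construct_eptp(vcpu, root_hpa));
	else
		vpid_sync_context(to_vmx(vcpu)->vpid);
}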

>
> Use #1 is remote flushes from the MMU, which don't strictly require a global flush,
> but KVM would need to propagate more information (mmu_role?) in order for responding
> vCPUs to determine which contexts need to be flushed.  And practically speaking,
> for MMU flushes there's no meaningful difference when using TDP without nested
> guests as the common case will be that each vCPU has a single active EPTP and
> that EPTP will be affected by the MMU changes, i.e. needs to be flushed.

I don't see when we would need "to determine which contexts" since the
vCPU is using only one context in this case (which is the assumption in
my mind); please correct me if I'm wrong.

Thanks,
Lai.

Sean Christopherson Aug. 4, 2021, 3:33 p.m. UTC | #4
On Wed, Aug 04, 2021, Lai Jiangshan wrote:
> The optimization I considered yesterday is "ept_sync_global() vs.
> ept_sync_context(this_vcpu's)" in the case where the VM is using EPT and
> doesn't allow nested VMs.  (I failed to express this clearly yesterday.)
> 
> In this case, the vCPU uses only a single root_hpa,

This is not strictly guaranteed.  kvm_mmu_page_role tracks efer.NX, cr0.wp, and
cr4.SMEP/SMAP (if cr0.wp=0), which means that KVM will create a different root
if the guest toggles any of those bits.  I'm pretty sure that can be changed and
will look into doing so in the near future[*], but even that wouldn't guarantee
a single root.

SMM is also incorporated in the page role and will result in different roots
for SMM vs. non-SMM.  This is mandatory because SMM has its own memslot view.

A CPUID.MAXPHYADDR change can also change the role, but in this case zapping all
roots will always be the correct/desired behavior.

[*] https://lkml.kernel.org/r/YQGj8gj7fpWDdLg5@google.com
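
For reference, an abridged sketch of kvm_mmu_page_role from this era
(field names have since been churned, e.g. nxe is now efer_nx; the
remaining fields are elided).  Any difference in these bits yields a
different root:

union kvm_mmu_page_role {
	u32 word;
	struct {
		unsigned level:4;
		unsigned direct:1;
		unsigned nxe:1;			/* efer.NX */
		unsigned cr0_wp:1;		/* cr0.WP */
		unsigned smep_andnot_wp:1;	/* cr4.SMEP && !cr0.WP */
		unsigned smap_andnot_wp:1;	/* cr4.SMAP && !cr0.WP */
		unsigned guest_mode:1;
		unsigned smm:8;			/* SMM memslot view */
		/* ... remaining fields elided ... */
	};
};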

> and I think an EPT sync of that single context is enough for both cases you listed below.
> 
> Once that context is flushed, the vCPU's TLB is clean.
> 
> If KVM changes mmu->root_hpa, it is KVM's responsibility to request
> another flush, which is already implemented.

KVM needs to flush when it allocates a new root, largely because it has no way
of knowing if some other entity previously created a CR3/EPTP at that HPA, but
KVM isn't strictly required to flush when switching to a previous/cached root.

Currently this is a moot point because kvm_post_set_cr0(), kvm_post_set_cr4(),
set_efer(), and kvm_smm_changed() all do kvm_mmu_reset_context() instead of
attempting a fast PGD switch, but I am hoping to change this as well, at least
for the non-SMM cases.

> In other words, KVM_REQ_TLB_FLUSH == KVM_REQ_TLB_FLUSH_CURRENT in this case.
> And before this patch, KVM flushed only the single context rather than
> doing a global flush.
> 
> >
> > Use #1 is remote flushes from the MMU, which don't strictly require a global flush,
> > but KVM would need to propagate more information (mmu_role?) in order for responding
> > vCPUs to determine which contexts need to be flushed.  And practically speaking,
> > for MMU flushes there's no meaningful difference when using TDP without nested
> > guests as the common case will be that each vCPU has a single active EPTP and
> > that EPTP will be affected by the MMU changes, i.e. needs to be flushed.
> 
> I don't see when we would need "to determine which contexts" since the
> vCPU is using only one context in this case (which is the assumption in
> my mind); please correct me if I'm wrong.

As it exists today, I believe you're correct that KVM will only ever have a
single reachable TDP root, but only because of overzealous kvm_mmu_reset_context()
usage.  The SMM case in particular could be optimized to not zap all roots (whether
or not it's worth optimizing is another question).

All that said, the easiest way to query the number of reachable roots would be to
check the previous/cached root.

But, even if we can guarantee there's exactly one reachable root, I would be
surprised if doing INVEPT.context instead of INVEPT.global actually provided any
meaningful performance benefit.  Using INVEPT.context is safe if and only if there
are no other TLB entries for this vCPU, and KVM must invalidate on pCPU migration,
so there can't be collateral damage in that sense.

That leaves the latency of INVEPT as the only possible performance delta, and that
will be uarch specific.  It's entirely possible INVEPT.global is slower, but again
I would be surprised if it is so much slower than INVEPT.context that it actually
impacts guest performance given that its use is limited to slow paths.
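
For reference, the two INVEPT flavors under discussion, abridged from
arch/x86/kvm/vmx/ops.h (the VMX-fault error handling around the asm and
the unused second descriptor argument are elided):

static inline void __invept(unsigned long ext, u64 eptp)
{
	struct {
		u64 eptp, rsvd;	/* 128-bit INVEPT descriptor */
	} operand = { eptp, 0 };

	asm volatile("invept %0, %1"
		     : : "m"(operand), "r"(ext) : "cc", "memory");
}

static inline void ept_sync_global(void)
{
	__invept(VMX_EPT_EXTENT_GLOBAL, 0);
}

static inline void ept_sync_context(u64 eptp)
{
	if (cpu_has_vmx_invept_context())
		__invept(VMX_EPT_EXTENT_CONTEXT, eptp);
	else
		ept_sync_global();
}

Either flavor is a single instruction; the latency delta between the two
extents is, as noted above, entirely uarch-specific.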

Patch

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index be93d597306c..d6d67b816ebe 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -518,7 +518,33 @@ static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
 
 static inline void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 {
-	__vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid, invalidate_gpa);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	/*
+	 * Flush all EPTP/VPID contexts if the TLB flush _may_ have been
+	 * invoked via kvm_flush_remote_tlbs(), which always passes %true for
+	 * @invalidate_gpa.  Flushing remote TLBs requires all contexts to be
+	 * flushed, not just the active context.
+	 *
+	 * Note, this also ensures a deferred TLB flush with VPID enabled and
+	 * EPT disabled invalidates the "correct" VPID, by nuking both L1 and
+	 * L2's VPIDs.
+	 */
+	if (invalidate_gpa) {
+		if (enable_ept) {
+			ept_sync_global();
+		} else if (enable_vpid) {
+			if (cpu_has_vmx_invvpid_global()) {
+				vpid_sync_vcpu_global();
+			} else {
+				WARN_ON_ONCE(!cpu_has_vmx_invvpid_single());
+				vpid_sync_vcpu_single(vmx->vpid);
+				vpid_sync_vcpu_single(vmx->nested.vpid02);
+			}
+		}
+	} else {
+		__vmx_flush_tlb(vcpu, vmx->vpid, false);
+	}
 }
 
 static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx)