Message ID | 20200320212833.3507-5-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: x86: TLB flushing fixes and enhancements |
Sean Christopherson <sean.j.christopherson@intel.com> writes:

> From: Junaid Shahid <junaids@google.com>
>
> Free all roots when emulating INVVPID for L1 and EPT is disabled, as
> outstanding changes to the page tables managed by L1 need to be
> recognized. Because L1 and L2 share an MMU when EPT is disabled, and
> because VPID is not tracked by the MMU role, all roots in the current
> MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
> VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
> stale SPTEs.
>
> Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
> Signed-off-by: Junaid Shahid <junaids@google.com>
> [sean: ported to upstream KVM, reworded the comment and changelog]
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 9624cea4ed9f..bc74fbbf33c6 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>  		return kvm_skip_emulated_instruction(vcpu);
>  	}
>
> +	/*
> +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
> +	 * linear mappings for L2 (tagged with L2's VPID). Free all roots as
> +	 * VPIDs are not tracked in the MMU role.
> +	 *
> +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
> +	 * an MMU when EPT is disabled.
> +	 *
> +	 * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR.
> +	 */
> +	if (!enable_ept)
> +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
> +				   KVM_MMU_ROOTS_ALL);
> +

This is related to my remark on the previous patch; the comment above
makes me think I'm missing something obvious, enlighten me please)

My understanding is that L1 and L2 will share arch.root_mmu not only
when EPT is globally disabled, we seem to switch between
root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12) but different L2
guests may be different on this. Do we need to handle this somehow?

>  	return nested_vmx_succeed(vcpu);
>  }
On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>
> > From: Junaid Shahid <junaids@google.com>
> >
> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
> > outstanding changes to the page tables managed by L1 need to be
> > recognized. Because L1 and L2 share an MMU when EPT is disabled, and
> > because VPID is not tracked by the MMU role, all roots in the current
> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
> > stale SPTEs.
> >
> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
> > Signed-off-by: Junaid Shahid <junaids@google.com>
> > [sean: ported to upstream KVM, reworded the comment and changelog]
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 9624cea4ed9f..bc74fbbf33c6 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
> >  		return kvm_skip_emulated_instruction(vcpu);
> >  	}
> >
> > +	/*
> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
> > +	 * linear mappings for L2 (tagged with L2's VPID). Free all roots as
> > +	 * VPIDs are not tracked in the MMU role.
> > +	 *
> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
> > +	 * an MMU when EPT is disabled.
> > +	 *
> > +	 * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR.
> > +	 */
> > +	if (!enable_ept)
> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
> > +				   KVM_MMU_ROOTS_ALL);
> > +
>
> This is related to my remark on the previous patch; the comment above
> makes me think I'm missing something obvious, enlighten me please)
>
> My understanding is that L1 and L2 will share arch.root_mmu not only
> when EPT is globally disabled, we seem to switch between
> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12) but different L2
> guests may be different on this. Do we need to handle this somehow?

guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
enable_ept is global and cannot be changed without reloading kvm_intel.

This most definitely over-invalidates, e.g. it blasts away L1's page
tables. But, fixing that requires tracking VPID in mmu_role and/or adding
support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
disabled. Assuming the vast majority of nested deployments enable EPT in
L0, the cost of both options likely outweighs the benefits.

> >  	return nested_vmx_succeed(vcpu);
> >  }
>
> --
> Vitaly
>
Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
>> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>>
>> > From: Junaid Shahid <junaids@google.com>
>> >
>> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
>> > outstanding changes to the page tables managed by L1 need to be
>> > recognized. Because L1 and L2 share an MMU when EPT is disabled, and
>> > because VPID is not tracked by the MMU role, all roots in the current
>> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
>> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
>> > stale SPTEs.
>> >
>> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
>> > Signed-off-by: Junaid Shahid <junaids@google.com>
>> > [sean: ported to upstream KVM, reworded the comment and changelog]
>> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> > ---
>> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
>> >  1 file changed, 14 insertions(+)
>> >
>> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> > index 9624cea4ed9f..bc74fbbf33c6 100644
>> > --- a/arch/x86/kvm/vmx/nested.c
>> > +++ b/arch/x86/kvm/vmx/nested.c
>> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>> >  		return kvm_skip_emulated_instruction(vcpu);
>> >  	}
>> >
>> > +	/*
>> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
>> > +	 * linear mappings for L2 (tagged with L2's VPID). Free all roots as
>> > +	 * VPIDs are not tracked in the MMU role.
>> > +	 *
>> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
>> > +	 * an MMU when EPT is disabled.
>> > +	 *
>> > +	 * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR.
>> > +	 */
>> > +	if (!enable_ept)
>> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
>> > +				   KVM_MMU_ROOTS_ALL);
>> > +
>>
>> This is related to my remark on the previous patch; the comment above
>> makes me think I'm missing something obvious, enlighten me please)
>>
>> My understanding is that L1 and L2 will share arch.root_mmu not only
>> when EPT is globally disabled, we seem to switch between
>> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12) but different L2
>> guests may be different on this. Do we need to handle this somehow?
>
> guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
> enable_ept is global and cannot be changed without reloading kvm_intel.
>
> This most definitely over-invalidates, e.g. it blasts away L1's page
> tables. But, fixing that requires tracking VPID in mmu_role and/or adding
> support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
> disabled. Assuming the vast majority of nested deployments enable EPT in
> L0, the cost of both options likely outweighs the benefits.
>

Yes but my question rather was: what if global 'enable_ept' is true but
nested EPT is not being used by L1, don't we still need to do
kvm_mmu_free_roots(&vcpu->arch.root_mmu) here?
On Mon, Mar 23, 2020 at 05:33:08PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>
> > On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
> >> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> >>
> >> > From: Junaid Shahid <junaids@google.com>
> >> >
> >> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
> >> > outstanding changes to the page tables managed by L1 need to be
> >> > recognized. Because L1 and L2 share an MMU when EPT is disabled, and
> >> > because VPID is not tracked by the MMU role, all roots in the current
> >> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
> >> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
> >> > stale SPTEs.
> >> >
> >> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
> >> > Signed-off-by: Junaid Shahid <junaids@google.com>
> >> > [sean: ported to upstream KVM, reworded the comment and changelog]
> >> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >> > ---
> >> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
> >> >  1 file changed, 14 insertions(+)
> >> >
> >> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> >> > index 9624cea4ed9f..bc74fbbf33c6 100644
> >> > --- a/arch/x86/kvm/vmx/nested.c
> >> > +++ b/arch/x86/kvm/vmx/nested.c
> >> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
> >> >  		return kvm_skip_emulated_instruction(vcpu);
> >> >  	}
> >> >
> >> > +	/*
> >> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
> >> > +	 * linear mappings for L2 (tagged with L2's VPID). Free all roots as
> >> > +	 * VPIDs are not tracked in the MMU role.
> >> > +	 *
> >> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
> >> > +	 * an MMU when EPT is disabled.
> >> > +	 *
> >> > +	 * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR.
> >> > +	 */
> >> > +	if (!enable_ept)
> >> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
> >> > +				   KVM_MMU_ROOTS_ALL);
> >> > +
> >>
> >> This is related to my remark on the previous patch; the comment above
> >> makes me think I'm missing something obvious, enlighten me please)
> >>
> >> My understanding is that L1 and L2 will share arch.root_mmu not only
> >> when EPT is globally disabled, we seem to switch between
> >> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12) but different L2
> >> guests may be different on this. Do we need to handle this somehow?
> >
> > guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
> > enable_ept is global and cannot be changed without reloading kvm_intel.
> >
> > This most definitely over-invalidates, e.g. it blasts away L1's page
> > tables. But, fixing that requires tracking VPID in mmu_role and/or adding
> > support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
> > disabled. Assuming the vast majority of nested deployments enable EPT in
> > L0, the cost of both options likely outweighs the benefits.
> >
>
> Yes but my question rather was: what if global 'enable_ept' is true but
> nested EPT is not being used by L1, don't we still need to do
> kvm_mmu_free_roots(&vcpu->arch.root_mmu) here?

No, because L0 isn't shadowing the L1->L2 page tables, i.e. there can't be
unsync'd SPTEs for L2. The vpid_sync_*() above flushes the TLB for L2's
effective VPID, which is all that's required.
Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Mon, Mar 23, 2020 at 05:33:08PM +0100, Vitaly Kuznetsov wrote:
>> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>>
>> > On Mon, Mar 23, 2020 at 04:34:17PM +0100, Vitaly Kuznetsov wrote:
>> >> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>> >>
>> >> > From: Junaid Shahid <junaids@google.com>
>> >> >
>> >> > Free all roots when emulating INVVPID for L1 and EPT is disabled, as
>> >> > outstanding changes to the page tables managed by L1 need to be
>> >> > recognized. Because L1 and L2 share an MMU when EPT is disabled, and
>> >> > because VPID is not tracked by the MMU role, all roots in the current
>> >> > MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
>> >> > VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
>> >> > stale SPTEs.
>> >> >
>> >> > Fixes: 5c614b3583e7b ("KVM: nVMX: nested VPID emulation")
>> >> > Signed-off-by: Junaid Shahid <junaids@google.com>
>> >> > [sean: ported to upstream KVM, reworded the comment and changelog]
>> >> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> >> > ---
>> >> >  arch/x86/kvm/vmx/nested.c | 14 ++++++++++++++
>> >> >  1 file changed, 14 insertions(+)
>> >> >
>> >> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> >> > index 9624cea4ed9f..bc74fbbf33c6 100644
>> >> > --- a/arch/x86/kvm/vmx/nested.c
>> >> > +++ b/arch/x86/kvm/vmx/nested.c
>> >> > @@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>> >> >  		return kvm_skip_emulated_instruction(vcpu);
>> >> >  	}
>> >> >
>> >> > +	/*
>> >> > +	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
>> >> > +	 * linear mappings for L2 (tagged with L2's VPID). Free all roots as
>> >> > +	 * VPIDs are not tracked in the MMU role.
>> >> > +	 *
>> >> > +	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
>> >> > +	 * an MMU when EPT is disabled.
>> >> > +	 *
>> >> > +	 * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR.
>> >> > +	 */
>> >> > +	if (!enable_ept)
>> >> > +		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
>> >> > +				   KVM_MMU_ROOTS_ALL);
>> >> > +
>> >>
>> >> This is related to my remark on the previous patch; the comment above
>> >> makes me think I'm missing something obvious, enlighten me please)
>> >>
>> >> My understanding is that L1 and L2 will share arch.root_mmu not only
>> >> when EPT is globally disabled, we seem to switch between
>> >> root_mmu/guest_mmu only when nested_cpu_has_ept(vmcs12) but different L2
>> >> guests may be different on this. Do we need to handle this somehow?
>> >
>> > guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1.
>> > enable_ept is global and cannot be changed without reloading kvm_intel.
>> >
>> > This most definitely over-invalidates, e.g. it blasts away L1's page
>> > tables. But, fixing that requires tracking VPID in mmu_role and/or adding
>> > support for using guest_mmu when L1 isn't using TDP, i.e. nested EPT is
>> > disabled. Assuming the vast majority of nested deployments enable EPT in
>> > L0, the cost of both options likely outweighs the benefits.
>> >
>>
>> Yes but my question rather was: what if global 'enable_ept' is true but
>> nested EPT is not being used by L1, don't we still need to do
>> kvm_mmu_free_roots(&vcpu->arch.root_mmu) here?
>
> No, because L0 isn't shadowing the L1->L2 page tables, i.e. there can't be
> unsync'd SPTEs for L2. The vpid_sync_*() above flushes the TLB for L2's
> effective VPID, which is all that's required.

Ah, stupid me, it's actually EPT and not nested EPT which we care about
here. Thank you for the clarification!

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
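For readers following the exchange, the conclusion can be condensed into a small toy model. This is only a sketch under stated assumptions: the function and enum names below are hypothetical and do not exist in KVM; the logic simply restates what was said in the thread (guest_mmu is used iff nested EPT is enabled, which requires enable_ept=1, and roots only need to be freed on INVVPID when L0 shadows the guest's linear page tables, i.e. when EPT is disabled).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names are KVM symbols. */
enum toy_mmu { TOY_ROOT_MMU, TOY_GUEST_MMU };

/*
 * Which MMU L0 uses while running L2.  Per the thread, guest_mmu is used
 * iff nested EPT is enabled, and nested EPT requires enable_ept=1.
 */
enum toy_mmu toy_l2_mmu(bool enable_ept, bool nested_ept)
{
	assert(!nested_ept || enable_ept);
	return nested_ept ? TOY_GUEST_MMU : TOY_ROOT_MMU;
}

/*
 * L0 shadows guest linear page tables only when EPT is disabled in L0;
 * only then can shadow pages for L2's linear mappings be out of sync.
 */
bool toy_l0_shadows_linear_page_tables(bool enable_ept)
{
	return !enable_ept;
}

/*
 * Emulated INVVPID must free the roots only in the shadow-paging case;
 * otherwise flushing L2's effective VPID is sufficient.
 */
bool toy_invvpid_must_free_roots(bool enable_ept)
{
	return toy_l0_shadows_linear_page_tables(enable_ept);
}
```

Note that toy_invvpid_must_free_roots() takes no nested_ept argument: with enable_ept=1 and nested EPT unused, L1 and L2 still share root_mmu, but L0 is not shadowing L2's linear page tables, so nothing needs to be freed.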
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9624cea4ed9f..bc74fbbf33c6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5250,6 +5250,20 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 		return kvm_skip_emulated_instruction(vcpu);
 	}
 
+	/*
+	 * Sync the shadow page tables if EPT is disabled, L1 is invalidating
+	 * linear mappings for L2 (tagged with L2's VPID). Free all roots as
+	 * VPIDs are not tracked in the MMU role.
+	 *
+	 * Note, this operates on root_mmu, not guest_mmu, as L1 and L2 share
+	 * an MMU when EPT is disabled.
+	 *
+	 * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR.
+	 */
+	if (!enable_ept)
+		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
+				   KVM_MMU_ROOTS_ALL);
+
 	return nested_vmx_succeed(vcpu);
 }