
[v4,03/32] KVM: SVM: Flush the "current" TLB when activating AVIC

Message ID 20221001005915.2041642-4-seanjc@google.com (mailing list archive)
State New, archived
Series KVM: x86: AVIC and local APIC fixes+cleanups

Commit Message

Sean Christopherson Oct. 1, 2022, 12:58 a.m. UTC
Flush the TLB when activating AVIC as the CPU can insert into the TLB
while AVIC is "locally" disabled.  KVM doesn't treat "APIC hardware
disabled" as VM-wide AVIC inhibition, and so when a vCPU has its APIC
hardware disabled, AVIC is not guaranteed to be inhibited.  As a result,
KVM may create a valid NPT mapping for the APIC base, which the CPU can
cache as a non-AVIC translation.

Note, Intel handles this in vmx_set_virtual_apic_mode().

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 6 ++++++
 1 file changed, 6 insertions(+)
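For context, the fix relies on KVM's deferred-request pattern: rather than flushing the TLB inline, avic_activate_vmcb() sets a request bit that the vCPU run loop services before the next VM entry. A minimal sketch of that pattern in plain C follows (a toy model: the function names mirror KVM's kvm_make_request()/kvm_check_request(), but the struct and bit value here are illustrative, not the kernel's actual definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit index; the real request numbers live in kvm_host.h. */
#define KVM_REQ_TLB_FLUSH_CURRENT 0

/* Toy stand-in for the per-vCPU pending-request bitmask. */
struct toy_vcpu {
    uint64_t requests;
};

/* Record a request to be serviced before the next VM entry. */
static void kvm_make_request(int req, struct toy_vcpu *vcpu)
{
    vcpu->requests |= UINT64_C(1) << req;
}

/* Consume a pending request: returns nonzero and clears the bit if set. */
static int kvm_check_request(int req, struct toy_vcpu *vcpu)
{
    if (vcpu->requests & (UINT64_C(1) << req)) {
        vcpu->requests &= ~(UINT64_C(1) << req);
        return 1;
    }
    return 0;
}
```

In the real kernel, the vCPU run loop checks for a pending KVM_REQ_TLB_FLUSH_CURRENT and invokes the vendor flush hook before entering the guest, so any stale non-AVIC translation of the APIC base is gone by the time AVIC is active.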

Comments

Maxim Levitsky Dec. 8, 2022, 9:52 p.m. UTC | #1
On Wed, 2022-12-07 at 18:02 +0200, Maxim Levitsky wrote:
On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Flush the TLB when activating AVIC as the CPU can insert into the TLB
> while AVIC is "locally" disabled.  KVM doesn't treat "APIC hardware
> disabled" as VM-wide AVIC inhibition, and so when a vCPU has its APIC
> hardware disabled, AVIC is not guaranteed to be inhibited.  As a result,
> KVM may create a valid NPT mapping for the APIC base, which the CPU can
> cache as a non-AVIC translation.
> 
> Note, Intel handles this in vmx_set_virtual_apic_mode().
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 6919dee69f18..712330b80891 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
>                 /* Disabling MSR intercept for x2APIC registers */
>                 svm_set_x2apic_msr_interception(svm, false);
>         } else {
> +               /*
> +                * Flush the TLB, the guest may have inserted a non-APIC
> +                * mapping into the TLB while AVIC was disabled.
> +                */
> +               kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
> +
>                 /* For xAVIC and hybrid-xAVIC modes */
>                 vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
>                 /* Enabling MSR intercept for x2APIC registers */


I agree that if the guest disables the APIC on a vCPU, this will lead to a call to kvm_apic_update_apicv which will
disable AVIC on that vCPU, but if other vCPUs don't disable it, AVIC's private memslot will still be mapped and the
guest could read/write it from this vCPU, so its TLB mapping needs to be invalidated if/when the APIC is re-enabled.
 
However, I think this adds an unnecessary (at least in the future) performance penalty to AVIC/nesting coexistence:
 
L1's AVIC is inhibited on each nested VM entry, and uninhibited on each nested VM exit, but while nested the guest
can't really access it as it has its own NPT.
 
With this patch KVM will invalidate L1's TLB on each nested VM exit. KVM sadly already does this, but that can be fixed
(it's another thing on my TODO list).
 
Note that APICv doesn't have this issue: it is not inhibited on nested VM entry/exit, so this code is not performance
sensitive for APICv.
 
 
As I said before, I again somewhat vote for disabling the APICv/AVIC memslot if any of the vCPUs have APICv/AVIC hardware
disabled, because that is also more correct from an x86 perspective. I do wonder how common it is to have "extra" CPUs
that are left unused, and thus have their APIC in the disabled state.
KVM does support adding new vCPUs on the fly, so this shouldn't be needed, and the APICv inhibit in this case is just a perf regression.

Or at least do this only when the APIC comes back from the hardware-disabled state to enabled.

Best regards,
	Maxim Levitsky
Sean Christopherson Dec. 9, 2022, 12:40 a.m. UTC | #3
On Thu, Dec 08, 2022, Maxim Levitsky wrote:
> On Wed, 2022-12-07 at 18:02 +0200, Maxim Levitsky wrote:
> On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
> >                 /* Disabling MSR intercept for x2APIC registers */
> >                 svm_set_x2apic_msr_interception(svm, false);
> >         } else {
> > +               /*
> > +                * Flush the TLB, the guest may have inserted a non-APIC
> > +                * mapping into the TLB while AVIC was disabled.
> > +                */
> > +               kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
> > +
> >                 /* For xAVIC and hybrid-xAVIC modes */
> >                 vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
> >                 /* Enabling MSR intercept for x2APIC registers */
> 
> 
> I agree that if the guest disables the APIC on a vCPU, this will lead to a
> call to kvm_apic_update_apicv which will disable AVIC on that vCPU, but if
> other vCPUs don't disable it, AVIC's private memslot will still be mapped
> and the guest could read/write it from this vCPU, so its TLB mapping needs
> to be invalidated if/when the APIC is re-enabled.
>  
> However, I think this adds an unnecessary (at least in the future)
> performance penalty to AVIC/nesting coexistence:
>  
> L1's AVIC is inhibited on each nested VM entry, and uninhibited on each
> nested VM exit, but while nested the guest can't really access it as it has
> its own NPT.
>  
> With this patch KVM will invalidate L1's TLB on each nested VM exit. KVM
> sadly already does this, but that can be fixed (it's another thing on my
> TODO list).
>  
> Note that APICv doesn't have this issue: it is not inhibited on nested VM
> entry/exit, so this code is not performance sensitive for APICv.
>  
>  
> As I said before, I again somewhat vote for disabling the APICv/AVIC
> memslot if any of the vCPUs have APICv/AVIC hardware disabled, because that
> is also more correct from an x86 perspective. I do wonder how common it is
> to have "extra" CPUs that are left unused, and thus have their APIC in the
> disabled state.

There are legitimate scenarios where a kernel might want to disable the APIC on
select CPUs, e.g. to offline SMT siblings in BIOS.  Whether or not doing that in
a VM makes sense is debatable, but we really have no way of knowing if there are
existing guests that selectively disable APICs.

> KVM does support adding new vCPUs on the fly, so this shouldn't be needed,
> and APICv inhibit in this case is just a perf regression.

Heh, "just" a perf regression.  Inhibiting APICv would definitely be a perf regression
that people care about, e.g. see the very recent bug fixes:

https://lore.kernel.org/all/20221116205123.18737-1-gedwards@ddn.com
https://lore.kernel.org/all/1669984574-32692-1-git-send-email-yuanzhaoxiong@baidu.com

Conceptually, I like the idea of inhibiting the APICv memslot if a vCPU has its
APIC hardware disabled.  But practically speaking, because KVM has allowed that
scenario for years, I don't think we should introduce such an inhibit and risk
regressing guests.

> Or at least do this only when the APIC comes back from the hardware-disabled
> state to enabled.

I have no objection to fine tuning this in follow-up, but for this bug fix I'd
much prefer to go with this minimal change.  The nested SVM TLB flushing issue
extends far beyond this one case, i.e. needs a non-trivial overhaul and an audit
of pretty much every piece of SVM code that can interact with TLBs.

Patch

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 6919dee69f18..712330b80891 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -86,6 +86,12 @@  static void avic_activate_vmcb(struct vcpu_svm *svm)
 		/* Disabling MSR intercept for x2APIC registers */
 		svm_set_x2apic_msr_interception(svm, false);
 	} else {
+		/*
+		 * Flush the TLB, the guest may have inserted a non-APIC
+		 * mapping into the TLB while AVIC was disabled.
+		 */
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
+
 		/* For xAVIC and hybrid-xAVIC modes */
 		vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
 		/* Enabling MSR intercept for x2APIC registers */