Message ID: 20221001005915.2041642-4-seanjc@google.com (mailing list archive)
State: New, archived
Series: KVM: x86: AVIC and local APIC fixes+cleanups
On Wed, 2022-12-07 at 18:02 +0200, Maxim Levitsky wrote:
On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> Flush the TLB when activating AVIC as the CPU can insert into the TLB
> while AVIC is "locally" disabled.  KVM doesn't treat "APIC hardware
> disabled" as VM-wide AVIC inhibition, and so when a vCPU has its APIC
> hardware disabled, AVIC is not guaranteed to be inhibited.  As a result,
> KVM may create a valid NPT mapping for the APIC base, which the CPU can
> cache as a non-AVIC translation.
>
> Note, Intel handles this in vmx_set_virtual_apic_mode().
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 6919dee69f18..712330b80891 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
> 		/* Disabling MSR intercept for x2APIC registers */
> 		svm_set_x2apic_msr_interception(svm, false);
> 	} else {
> +		/*
> +		 * Flush the TLB, the guest may have inserted a non-APIC
> +		 * mapping into the TLB while AVIC was disabled.
> +		 */
> +		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
> +
> 		/* For xAVIC and hybrid-xAVIC modes */
> 		vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
> 		/* Enabling MSR intercept for x2APIC registers */

I agree that if the guest disables the APIC on a vCPU, this leads to a call to
kvm_apic_update_apicv, which will disable AVIC on that vCPU. But if the other
vCPUs don't disable it, AVIC's private memslot will still be mapped and the
guest could read/write it from this vCPU, so its TLB mapping needs to be
invalidated if/when the APIC is re-enabled.
However, I think that this adds an unnecessary (at least in the future)
performance penalty to AVIC/nesting coexistence:

L1's AVIC is inhibited on each nested VM entry and uninhibited on each nested
VM exit, but while nested the guest can't really access it, as it has its own
NPT.

With this patch KVM will invalidate L1's TLB on each nested VM exit. KVM sadly
already does this, but that can be fixed (it's another thing on my TODO list).

Note that APICv doesn't have this issue: it is not inhibited on nested VM
entry/exit, thus this code is not performance sensitive for APICv.

I again vote, as I said before, for disabling the APICv/AVIC memslot if any of
the vCPUs have APICv/AVIC hardware disabled, because it is also more correct
from an x86 perspective. I do wonder how common it is to have "extra" CPUs but
not use them, leaving their APICs in the disabled state. KVM does support
adding new vCPUs on the fly, so this shouldn't be needed, and an APICv inhibit
in this case is just a perf regression.

Or at least do the flush only when the APIC comes back from the
hardware-disabled state to enabled.

Best regards,
	Maxim Levitsky
On Thu, Dec 08, 2022, Maxim Levitsky wrote:
> On Wed, 2022-12-07 at 18:02 +0200, Maxim Levitsky wrote:
> On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
> > 		/* Disabling MSR intercept for x2APIC registers */
> > 		svm_set_x2apic_msr_interception(svm, false);
> > 	} else {
> > +		/*
> > +		 * Flush the TLB, the guest may have inserted a non-APIC
> > +		 * mapping into the TLB while AVIC was disabled.
> > +		 */
> > +		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
> > +
> > 		/* For xAVIC and hybrid-xAVIC modes */
> > 		vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
> > 		/* Enabling MSR intercept for x2APIC registers */
>
> I agree, that if guest disables APIC on a vCPU, this will lead to call to
> kvm_apic_update_apicv which will disable AVIC, but if other vCPUs don't
> disable it, the AVIC's private memslot will still be mapped and guest could
> read/write it from this vCPU, and its TLB mapping needs to be invalidated
> if/when APIC is re-enabled.
>
> However I think that this adds an unnecessarily (at least in the future)
> performance penalty to AVIC nesting coexistence:
>
> L1's AVIC is inhibited on each nested VM entry, and uninhibited on each
> nested VM exit, but while nested the guest can't really access it as it has
> its own NPT.
>
> With this patch KVM will invalidate L1's TLB on each nested VM exit. KVM
> sadly already does this but this can be fixed (its another thing on my TODO
> list)
>
> Note that APICv doesn't have this issue, it is not inhibited on nested VM
> entry/exit, thus this code is not performance sensitive for APICv.
>
> I somewhat vote again, as I said before to disable the APICv/AVIC memslot, if
> any of vCPUs have APICv/AVIC hardware disabled, because it is also more
> correct from an x86 perspective.
> I do wonder how often is the usage of having "extra" cpus but not using
> them, and thus having their APIC in disabled state.

There are legitimate scenarios where a kernel might want to disable the APIC
on select CPUs, e.g. to offline SMT siblings in BIOS. Whether or not doing
that in a VM makes sense is debatable, but we really have no way of knowing if
there are existing guests that selectively disable APICs.

> KVM does support adding new vCPUs on the fly, so this shouldn't be needed,
> and APICv inhibit in this case is just a perf regression.

Heh, "just" a perf regression. Inhibiting APICv would definitely be a perf
regression that people care about, e.g. see the very recent bug fixes:

https://lore.kernel.org/all/20221116205123.18737-1-gedwards@ddn.com
https://lore.kernel.org/all/1669984574-32692-1-git-send-email-yuanzhaoxiong@baidu.com

Conceptually, I like the idea of inhibiting the APICv memslot if a vCPU has
its APIC hardware disabled. But practically speaking, because KVM has allowed
that scenario for years, I don't think we should introduce such an inhibit and
risk regressing guests.

> Or at least do this only when APIC does back from hardware disabled state to
> enabled.

I have no objection to fine-tuning this in a follow-up, but for this bug fix
I'd much prefer to go with this minimal change. The nested SVM TLB flushing
issue extends far beyond this one case, i.e. it needs a non-trivial overhaul
and an audit of pretty much every piece of SVM code that can interact with
TLBs.
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 6919dee69f18..712330b80891 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
 		/* Disabling MSR intercept for x2APIC registers */
 		svm_set_x2apic_msr_interception(svm, false);
 	} else {
+		/*
+		 * Flush the TLB, the guest may have inserted a non-APIC
+		 * mapping into the TLB while AVIC was disabled.
+		 */
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
+
 		/* For xAVIC and hybrid-xAVIC modes */
 		vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
 		/* Enabling MSR intercept for x2APIC registers */