Message ID | 1572648072-84536-14-git-send-email-suravee.suthikulpanit@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | kvm: x86: Support AMD SVM AVIC w/ in-kernel irqchip mode | expand |
On 01/11/19 23:41, Suthikulpanit, Suravee wrote: > + /* > + * AMD SVM AVIC accelerates EOI write and does not trap. > + * This cause in-kernel PIT re-inject mode to fail > + * since it checks ps->irq_ack before kvm_set_irq() > + * and relies on the ack notifier to timely queue > + * the pt->worker work iterm and reinject the missed tick. > + * So, deactivate APICv when PIT is in reinject mode. > + */ > if (reinject) { > + kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ); > /* The initial state is preserved while ps->reinject == 0. */ > kvm_pit_reset_reinject(pit); > kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier); > kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier); > } else { > + kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ); > kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier); > kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier); This is not too nice for Intel which does support (through the EOI exit mask) APICv even if PIT reinjection active. We can work around it by adding a global mask of inhibit reasons that apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c. Then kvm_request_apicv_update can ignore reasons that the vendor doesn't care about. Paolo
Paolo, Thanks for quick response. On 11/2/19 4:57 AM, Paolo Bonzini wrote: > On 01/11/19 23:41, Suthikulpanit, Suravee wrote: >> + /* >> + * AMD SVM AVIC accelerates EOI write and does not trap. >> + * This cause in-kernel PIT re-inject mode to fail >> + * since it checks ps->irq_ack before kvm_set_irq() >> + * and relies on the ack notifier to timely queue >> + * the pt->worker work iterm and reinject the missed tick. >> + * So, deactivate APICv when PIT is in reinject mode. >> + */ >> if (reinject) { >> + kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ); >> /* The initial state is preserved while ps->reinject == 0. */ >> kvm_pit_reset_reinject(pit); >> kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier); >> kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier); >> } else { >> + kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ); >> kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier); >> kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier); > > This is not too nice for Intel which does support (through the EOI exit > mask) APICv even if PIT reinjection active. I see you point. > We can work around it by adding a global mask of inhibit reasons that > apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c. > > Then kvm_request_apicv_update can ignore reasons that the vendor doesn't > care about. > > Paolo > What about we enhance the pre_update_apivc_exec_ctrl() to also return success/fail. In here, the vendor specific code can decide to update APICv state or not. Thanks, Suravee
On 04/11/19 19:54, Suthikulpanit, Suravee wrote: > I see you point. > >> We can work around it by adding a global mask of inhibit reasons that >> apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c. >> >> Then kvm_request_apicv_update can ignore reasons that the vendor doesn't >> care about. > > What about we enhance the pre_update_apivc_exec_ctrl() to also return > success/fail. In here, the vendor specific code can decide to update > APICv state or not. That works for me, too. Something like return false for deactivate and true for activate. Paolo
On Sat, Nov 02, 2019 at 10:57:47AM +0100, Paolo Bonzini wrote: > On 01/11/19 23:41, Suthikulpanit, Suravee wrote: > > + /* > > + * AMD SVM AVIC accelerates EOI write and does not trap. > > + * This cause in-kernel PIT re-inject mode to fail > > + * since it checks ps->irq_ack before kvm_set_irq() > > + * and relies on the ack notifier to timely queue > > + * the pt->worker work iterm and reinject the missed tick. > > + * So, deactivate APICv when PIT is in reinject mode. > > + */ > > if (reinject) { > > + kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ); > > /* The initial state is preserved while ps->reinject == 0. */ > > kvm_pit_reset_reinject(pit); > > kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier); > > kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier); > > } else { > > + kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ); > > kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier); > > kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier); > > This is not too nice for Intel which does support (through the EOI exit > mask) APICv even if PIT reinjection active. Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when given a non-empty eoi_exit_bitmap, and enable it back on a clear eoi_exit_bitmap. This may remove the need to add special treatment to PIT etc. Thanks, Roman.
> Am 04.11.2019 um 22:50 schrieb Paolo Bonzini <pbonzini@redhat.com>: > > On 04/11/19 19:54, Suthikulpanit, Suravee wrote: >> I see you point. >> >>> We can work around it by adding a global mask of inhibit reasons that >>> apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c. >>> >>> Then kvm_request_apicv_update can ignore reasons that the vendor doesn't >>> care about. >> >> What about we enhance the pre_update_apivc_exec_ctrl() to also return >> success/fail. In here, the vendor specific code can decide to update >> APICv state or not. > > That works for me, too. Something like return false for deactivate and > true for activate. I'm trying to wrap my head around how that will work with live migration. Do we also need to save/restore the deactivate reasons? Alex > > Paolo Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
On Tue, Nov 05, 2019 at 07:05:57AM +0000, Graf (AWS), Alexander wrote: > > > > Am 04.11.2019 um 22:50 schrieb Paolo Bonzini <pbonzini@redhat.com>: > > > > On 04/11/19 19:54, Suthikulpanit, Suravee wrote: > >> I see you point. > >> > >>> We can work around it by adding a global mask of inhibit reasons that > >>> apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c. > >>> > >>> Then kvm_request_apicv_update can ignore reasons that the vendor doesn't > >>> care about. > >> > >> What about we enhance the pre_update_apivc_exec_ctrl() to also return > >> success/fail. In here, the vendor specific code can decide to update > >> APICv state or not. > > > > That works for me, too. Something like return false for deactivate and > > true for activate. > > I'm trying to wrap my head around how that will work with live > migration. Do we also need to save/restore the deactivate reasons? Nope, this is all invisible to userland. The target will deduce the deactivation reasons on its own from the user-visible setup like PIT configuration, Hyper-V SynIC, etc. Roman.
On 05/11/19 00:17, Roman Kagan wrote: >> This is not too nice for Intel which does support (through the EOI exit >> mask) APICv even if PIT reinjection active. > Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when > given a non-empty eoi_exit_bitmap, and enable it back on a clear > eoi_exit_bitmap. This may remove the need to add special treatment to > PIT etc. That is a very nice idea---we can make that a single disable reason, like APICV_DEACTIVATE_REASON_EOI, and Intel can simply never use it. Paolo
Roman/Paolo On 11/5/2019 4:47 PM, Paolo Bonzini wrote: > On 05/11/19 00:17, Roman Kagan wrote: >>> This is not too nice for Intel which does support (through the EOI exit >>> mask) APICv even if PIT reinjection active. >> Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when >> given a non-empty eoi_exit_bitmap, and enable it back on a clear >> eoi_exit_bitmap. This may remove the need to add special treatment to >> PIT etc. > > That is a very nice idea---we can make that a single disable reason, > like APICV_DEACTIVATE_REASON_EOI, and Intel can simply never use it. I took at look at the svm_load_eoi_exitmap() and it is called via: kvm_make_scan_ioapic_request() -> KVM_REQ_SCAN_IOAPIC -> vcpu_scan_ioapic() -> KVM_REQ_LOAD_EOI_EXITMAP -> vcpu_load_eoi_exitmap() The kvm_make_scan_ioapic_request() is called from multiple places: arch/x86/kvm/irq_comm.c: * kvm_arch_post_irq_routing_update() : Called from kvm_set_irq_routing() arch/x86/kvm/ioapic.c: * kvm_arch_post_irq_ack_notifier_list_update() : (Un)registering irq ack notifier * kvm_set_ioapic() : Setting ioapic irqchip * ioapic_mmio_write() -> ioapic_write_indirect() arch/x86/kvm/lapic.c: * recalculate_apic_map() Most calls would be from ioapic_mmio_write()->ioapic_write_indirect(). In case of AMD AVIC, the svm_load_e::vsoi_exitmap() is called several times, and requesting APICV (de)activate from here when the eoi_exit_bitmap is set/clear would introduce large overhead especially with SMP machine. So, for now, let's just disable APICv when in-kernel PIT is in reinject (delay) mode. I'll also add the logic to avoid unnecessary overhead for Intel. Thanks, Suravee
On Mon, Nov 11, 2019 at 11:37:53AM -0600, Suravee Suthikulpanit wrote: > On 11/5/2019 4:47 PM, Paolo Bonzini wrote: > > On 05/11/19 00:17, Roman Kagan wrote: > > > > This is not too nice for Intel which does support (through the EOI exit > > > > mask) APICv even if PIT reinjection active. > > > Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when > > > given a non-empty eoi_exit_bitmap, and enable it back on a clear > > > eoi_exit_bitmap. This may remove the need to add special treatment to > > > PIT etc. > > > > That is a very nice idea---we can make that a single disable reason, > > like APICV_DEACTIVATE_REASON_EOI, and Intel can simply never use it. > > I took at look at the svm_load_eoi_exitmap() and it is called via: > kvm_make_scan_ioapic_request() -> > KVM_REQ_SCAN_IOAPIC -> vcpu_scan_ioapic() -> > KVM_REQ_LOAD_EOI_EXITMAP -> vcpu_load_eoi_exitmap() > > The kvm_make_scan_ioapic_request() is called from multiple places: > > arch/x86/kvm/irq_comm.c: > * kvm_arch_post_irq_routing_update() : Called from kvm_set_irq_routing() > > arch/x86/kvm/ioapic.c: > * kvm_arch_post_irq_ack_notifier_list_update() : (Un)registering irq ack notifier > * kvm_set_ioapic() : Setting ioapic irqchip > * ioapic_mmio_write() -> ioapic_write_indirect() > > arch/x86/kvm/lapic.c: > * recalculate_apic_map() > > Most calls would be from ioapic_mmio_write()->ioapic_write_indirect(). > > In case of AMD AVIC, the svm_load_e::vsoi_exitmap() is called several times, and requesting > APICV (de)activate from here when the eoi_exit_bitmap is set/clear would introduce > large overhead especially with SMP machine. This doesn't look like a hot path, so I'm not sure it needs to be optimized for performance. Especially so since kvm_make_scan_ioapic_request does kvm_make_all_cpus_request which isn't particularly fast by definition, and I guess the extra overhead there won't be noticable. OTOH introducing extra code paths has its maintenance costs, so sticking the simple logic in svm_load_eoi_exitmap looks attractive. Just my 2c, Roman.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index fe61269..460f7a4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -858,6 +858,7 @@ enum kvm_irqchip_mode { #define APICV_DEACT_BIT_HYPERV 1 #define APICV_DEACT_BIT_NESTED 2 #define APICV_DEACT_BIT_IRQWIN 3 +#define APICV_DEACT_BIT_PIT_REINJ 4 struct kvm_arch { unsigned long n_used_mmu_pages; diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index 4a6dc54..3f77fda 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -295,12 +295,22 @@ void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject) if (atomic_read(&ps->reinject) == reinject) return; + /* + * AMD SVM AVIC accelerates EOI write and does not trap. + * This cause in-kernel PIT re-inject mode to fail + * since it checks ps->irq_ack before kvm_set_irq() + * and relies on the ack notifier to timely queue + * the pt->worker work iterm and reinject the missed tick. + * So, deactivate APICv when PIT is in reinject mode. + */ if (reinject) { + kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ); /* The initial state is preserved while ps->reinject == 0. */ kvm_pit_reset_reinject(pit); kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier); kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier); } else { + kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ); kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier); kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier); } diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 0e7ff04..9812feb 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1679,7 +1679,13 @@ static int avic_update_access_page(struct kvm *kvm, bool activate) int ret = 0; mutex_lock(&kvm->slots_lock); - if (kvm->arch.apic_access_page_done == activate) + /* + * During kvm_destroy_vm(), kvm_pit_set_reinject() could trigger + * APICv mode change, which update APIC_ACCESS_PAGE_PRIVATE_MEMSLOT + * memory region. So, we need to ensure that kvm->mm == current->mm. + */ + if ((kvm->arch.apic_access_page_done == activate) || + (kvm->mm != current->mm)) goto out; ret = __x86_set_memory_region(kvm,
AMD SVM AVIC accelerates EOI write and does not trap. This causes in-kernel PIT re-injection mode to fail since it relies on irq-ack notifier mechanism. So, APICv is activated only when in-kernel PIT is in discard mode e.g. w/ qemu option: -global kvm-pit.lost_tick_policy=discard Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/i8254.c | 10 ++++++++++ arch/x86/kvm/svm.c | 8 +++++++- 3 files changed, 18 insertions(+), 1 deletion(-)