Message ID | 20240215160136.1256084-4-alejandro.j.jimenez@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Export APICv-related state via binary stats interface | expand |
>diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c >index 4b74ea91f4e6..853cafe4a9af 100644 >--- a/arch/x86/kvm/svm/avic.c >+++ b/arch/x86/kvm/svm/avic.c >@@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag) > * bit in the vAPIC backing page. So, we just need to schedule > * in the vcpu. > */ >- if (vcpu) >+ if (vcpu) { > kvm_vcpu_wake_up(vcpu); >+ ++vcpu->stat.ga_log_event; >+ } > I am not sure why this is added for SVM only. it looks to me GALog events are similar to Intel IOMMU's wakeup events. Can we have a general name? maybe iommu_wakeup_event and increase the counter after the kvm_vcpu_wake_up() call in pi_wakeup_handler().
On 4/9/24 02:45, Chao Gao wrote: >> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c >> index 4b74ea91f4e6..853cafe4a9af 100644 >> --- a/arch/x86/kvm/svm/avic.c >> +++ b/arch/x86/kvm/svm/avic.c >> @@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag) >> * bit in the vAPIC backing page. So, we just need to schedule >> * in the vcpu. >> */ >> - if (vcpu) >> + if (vcpu) { >> kvm_vcpu_wake_up(vcpu); >> + ++vcpu->stat.ga_log_event; >> + } >> > > I am not sure why this is added for SVM only. I am mostly familiar with AVIC, and much less so with VMX's PI, so this is why I am likely missing potential stats that could be useful to expose from the VMX side. I'll be glad to implement any other suggestions you have. it looks to me GALog events are > similar to Intel IOMMU's wakeup events. Can we have a general name? maybe > iommu_wakeup_event I believe that after: d588bb9be1da ("KVM: VMX: enable IPI virtualization") both the VT-d PI and the virtualized IPIs code paths will use POSTED_INTR_WAKEUP_VECTOR for interrupts targeting a blocked vCPU. So on Intel hosts enabling IPI virtualization, a counter incremented in pi_wakeup_handler() would record interrupts from both virtualized IPIs and VT-d sources. I don't think it is correct to generalize this counter since AMD's implementation is different; when a blocked vCPU is targeted: - by device interrupts, it uses the GA Log mechanism - by an IPI, it generates an AVIC_INCOMPLETE_IPI #VMEXIT If the reasoning above is correct, we can add a VMX specific counter (vmx_pi_wakeup_event?) that is increased in pi_wakeup_handler() as you suggest, and document the difference in behavior so that is not confused as equivalent with the ga_log_event counter. An alternative if we'd like to have a common 'iommu_wakeup_event' is to add filtering on pi_wakeup_handler() and only increment the common counter if IPI virtualization is not enabled (i.e. !vmx_can_use_ipiv()), in which case we'd only handle device interrupts and it becomes the parallel case to GA Log events. That leaves us with a VMX-specific counter (vmx_pi_wakeup_event) which provides no definition between interrupt sources when IPI virtualization is enabled, or when disabled we have a common/generic counter (iommu_wakeup_event) that applies to both vendors. Please let me know if you agree with this approach or have other suggestions. Thank you, Alejandro > > and increase the counter after the kvm_vcpu_wake_up() call in pi_wakeup_handler().
On Tue, Apr 09, 2024 at 09:31:45PM -0400, Alejandro Jimenez wrote: > >On 4/9/24 02:45, Chao Gao wrote: >> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c >> > index 4b74ea91f4e6..853cafe4a9af 100644 >> > --- a/arch/x86/kvm/svm/avic.c >> > +++ b/arch/x86/kvm/svm/avic.c >> > @@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag) >> > * bit in the vAPIC backing page. So, we just need to schedule >> > * in the vcpu. >> > */ >> > - if (vcpu) >> > + if (vcpu) { >> > kvm_vcpu_wake_up(vcpu); >> > + ++vcpu->stat.ga_log_event; >> > + } >> > >> >> I am not sure why this is added for SVM only. > >I am mostly familiar with AVIC, and much less so with VMX's PI, so this is >why I am likely missing potential stats that could be useful to expose from >the VMX side. I'll be glad to implement any other suggestions you have. > > >it looks to me GALog events are >> similar to Intel IOMMU's wakeup events. Can we have a general name? maybe >> iommu_wakeup_event > >I believe that after: >d588bb9be1da ("KVM: VMX: enable IPI virtualization") > >both the VT-d PI and the virtualized IPIs code paths will use POSTED_INTR_WAKEUP_VECTOR >for interrupts targeting a blocked vCPU. So on Intel hosts enabling IPI virtualization, >a counter incremented in pi_wakeup_handler() would record interrupts from both virtualized >IPIs and VT-d sources. > >I don't think it is correct to generalize this counter since AMD's implementation is >different; when a blocked vCPU is targeted: > >- by device interrupts, it uses the GA Log mechanism >- by an IPI, it generates an AVIC_INCOMPLETE_IPI #VMEXIT > >If the reasoning above is correct, we can add a VMX specific counter (vmx_pi_wakeup_event?) >that is increased in pi_wakeup_handler() as you suggest, and document the difference >in behavior so that is not confused as equivalent with the ga_log_event counter. Correct. If we cannot generalize the counter, I think it is ok to add the counter for SVM only. Thank you for the clarification.
On Fri, Apr 12, 2024, Chao Gao wrote: > On Tue, Apr 09, 2024 at 09:31:45PM -0400, Alejandro Jimenez wrote: > > > >On 4/9/24 02:45, Chao Gao wrote: > >> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > >> > index 4b74ea91f4e6..853cafe4a9af 100644 > >> > --- a/arch/x86/kvm/svm/avic.c > >> > +++ b/arch/x86/kvm/svm/avic.c > >> > @@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag) > >> > * bit in the vAPIC backing page. So, we just need to schedule > >> > * in the vcpu. > >> > */ > >> > - if (vcpu) > >> > + if (vcpu) { > >> > kvm_vcpu_wake_up(vcpu); > >> > + ++vcpu->stat.ga_log_event; > >> > + } > >> > > >> > >> I am not sure why this is added for SVM only. > > > >I am mostly familiar with AVIC, and much less so with VMX's PI, so this is > >why I am likely missing potential stats that could be useful to expose from > >the VMX side. I'll be glad to implement any other suggestions you have. > > > > > >it looks to me GALog events are > >> similar to Intel IOMMU's wakeup events. Can we have a general name? maybe > >> iommu_wakeup_event > > > >I believe that after: > >d588bb9be1da ("KVM: VMX: enable IPI virtualization") > > > >both the VT-d PI and the virtualized IPIs code paths will use POSTED_INTR_WAKEUP_VECTOR > >for interrupts targeting a blocked vCPU. So on Intel hosts enabling IPI virtualization, > >a counter incremented in pi_wakeup_handler() would record interrupts from both virtualized > >IPIs and VT-d sources. > > > >I don't think it is correct to generalize this counter since AMD's implementation is > >different; when a blocked vCPU is targeted: > > > >- by device interrupts, it uses the GA Log mechanism > >- by an IPI, it generates an AVIC_INCOMPLETE_IPI #VMEXIT > > > >If the reasoning above is correct, we can add a VMX specific counter (vmx_pi_wakeup_event?) > >that is increased in pi_wakeup_handler() as you suggest, and document the difference > >in behavior so that is not confused as equivalent with the ga_log_event counter. > > Correct. If we cannot generalize the counter, I think it is ok to > add the counter for SVM only. Thank you for the clarification. There's already a generic stat, halt_wakeup, that more or less covers this case. And despite what the comment says, avic_ga_log_notifier() does NOT schedule in the task, kvm_vcpu_wake_up() only wakes up blocking vCPUs, no more, no less. I'm also not at all convinced that KVM needs to differentiate between IPIs and device interrupts that arrive when the vCPU isn't in the guest. E.g. this can kinda sorta be used to confirm IRQ affinity, but if the vCPU is happily running in the guest, such a heuristic will get false negatives. And for confirming that GA logging is working, that's more or less covered by the proposed APICv stat. If AVIC is enabled, the VM has assigned devices, and GA logging *isn't* working, then you'll probably find out quite quickly because the VM will have a lot of missed interrupts, e.g. vCPUs will get stuck in HLT.
On 4/16/24 14:35, Sean Christopherson wrote: > On Fri, Apr 12, 2024, Chao Gao wrote: >> On Tue, Apr 09, 2024 at 09:31:45PM -0400, Alejandro Jimenez wrote: >>> >>> On 4/9/24 02:45, Chao Gao wrote: >>>>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c >>>>> index 4b74ea91f4e6..853cafe4a9af 100644 >>>>> --- a/arch/x86/kvm/svm/avic.c >>>>> +++ b/arch/x86/kvm/svm/avic.c >>>>> @@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag) >>>>> * bit in the vAPIC backing page. So, we just need to schedule >>>>> * in the vcpu. >>>>> */ >>>>> - if (vcpu) >>>>> + if (vcpu) { >>>>> kvm_vcpu_wake_up(vcpu); >>>>> + ++vcpu->stat.ga_log_event; >>>>> + } >>>>> >>>> >>>> I am not sure why this is added for SVM only. >>> >>> I am mostly familiar with AVIC, and much less so with VMX's PI, so this is >>> why I am likely missing potential stats that could be useful to expose from >>> the VMX side. I'll be glad to implement any other suggestions you have. >>> >>> >>> it looks to me GALog events are >>>> similar to Intel IOMMU's wakeup events. Can we have a general name? maybe >>>> iommu_wakeup_event >>> >>> I believe that after: >>> d588bb9be1da ("KVM: VMX: enable IPI virtualization") >>> >>> both the VT-d PI and the virtualized IPIs code paths will use POSTED_INTR_WAKEUP_VECTOR >>> for interrupts targeting a blocked vCPU. So on Intel hosts enabling IPI virtualization, >>> a counter incremented in pi_wakeup_handler() would record interrupts from both virtualized >>> IPIs and VT-d sources. >>> >>> I don't think it is correct to generalize this counter since AMD's implementation is >>> different; when a blocked vCPU is targeted: >>> >>> - by device interrupts, it uses the GA Log mechanism >>> - by an IPI, it generates an AVIC_INCOMPLETE_IPI #VMEXIT >>> >>> If the reasoning above is correct, we can add a VMX specific counter (vmx_pi_wakeup_event?) >>> that is increased in pi_wakeup_handler() as you suggest, and document the difference >>> in behavior so that is not confused as equivalent with the ga_log_event counter. >> >> Correct. If we cannot generalize the counter, I think it is ok to >> add the counter for SVM only. Thank you for the clarification. > > There's already a generic stat, halt_wakeup, that more or less covers this case. I don't think we can extrapolate PI-originated wake ups from halt_wakeup, since it can/will also be triggered with APICv/AVIC disabled. > And despite what the comment says, avic_ga_log_notifier() does NOT schedule in > the task, kvm_vcpu_wake_up() only wakes up blocking vCPUs, no more, no less. True, both the GA log and the PI wake up handler just call kvm_vcpu_wake_up(). > > I'm also not at all convinced that KVM needs to differentiate between IPIs and > device interrupts that arrive when the vCPU isn't in the guest. E.g. this can > kinda sorta be used to confirm IRQ affinity, but if the vCPU is happily running > in the guest, such a heuristic will get false negatives. > > And for confirming that GA logging is working, that's more or less covered by the > proposed APICv stat. If AVIC is enabled, the VM has assigned devices, and GA logging > *isn't* working, then you'll probably find out quite quickly because the VM will > have a lot of missed interrupts, e.g. vCPUs will get stuck in HLT. ACK, if the device interrupts are not being handled correctly there will be lots of complaints during device initialization as we have seen before. There is one scenario in which you can have APICv/AVIC enabled but only doing the IPI acceleration, while device interrupts are still using the legacy path. It requires booting the host kernel with 'amd_iommu_intr=legacy'(AMD) or with 'intremap=nopost'(Intel), so that is a special case since you must explicitly request the behavior. In short, I typically use the GA Log tracepoint to confirm IOMMU AVIC is working as expected, so I wanted to provide the equivalent via the stats. If we want to have a common stat, we could have a pi_wakeup stat that is incremented both in ga_log and vmx_pi_wakeup_event, but I do understand that is not strictly necessary, specially if we want to be conservative with the number of stats. Thank you, Alejandro
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b6f18084d504..74e08b57f2e0 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1565,6 +1565,7 @@ struct kvm_vcpu_stat { u64 guest_mode; u64 notify_window_exits; u64 apicv_accept_irq; + u64 ga_log_event; }; struct x86_instruction_info; diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c index 4b74ea91f4e6..853cafe4a9af 100644 --- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag) * bit in the vAPIC backing page. So, we just need to schedule * in the vcpu. */ - if (vcpu) + if (vcpu) { kvm_vcpu_wake_up(vcpu); + ++vcpu->stat.ga_log_event; + } return 0; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2ad70cf6e52c..6a1df29ae650 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -305,6 +305,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = { STATS_DESC_IBOOLEAN(VCPU, guest_mode), STATS_DESC_COUNTER(VCPU, notify_window_exits), STATS_DESC_COUNTER(VCPU, apicv_accept_irq), + STATS_DESC_COUNTER(VCPU, ga_log_event), }; const struct kvm_stats_header kvm_vcpu_stats_header = {
Export a per-vCPU binary stat counting GALog events received. Such events are specific to IOMMU AVIC, and their presence can be used to confirm that this capability is active. Also, exporting this statistic is useful to assert that device interrupts are being sent to specific vCPUs, confirming IRQ affinity settings. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/svm/avic.c | 4 +++- arch/x86/kvm/x86.c | 1 + 3 files changed, 5 insertions(+), 1 deletion(-)