diff mbox

KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table

Message ID 1406736728-8516-1-git-send-email-pbonzini@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paolo Bonzini July 30, 2014, 4:12 p.m. UTC
Currently, the EOI exit bitmap (used for APICv) does not include
interrupts that are masked.  However, this can cause a bug that manifests
as an interrupt storm inside the guest.  Alex Williamson reported the
bug and is the one who really debugged this; I only wrote the patch. :)

The scenario involves a multi-function PCI device with OHCI and EHCI
USB functions and an audio function, all assigned to the guest, where
both USB functions use legacy INTx interrupts.

As soon as the guest boots, interrupts for these devices turn into an
interrupt storm in the guest; the host does not see the interrupt storm.
Basically the EOI path does not work, and the guest continues to see the
interrupt over and over, even after it attempts to mask it at the APIC.
The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
with not many changes in the area of APIC/IOAPIC handling).

Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
on in the eoi_exit_bitmap and TMR, and things then work.  What happens
is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
It does not have set bit 59 because the RTE was masked, so the IOAPIC
never sees the EOI and the interrupt continues to fire in the guest.

Probably, the guest is masking the interrupt in the redirection table in
the interrupt routine, i.e. while the interrupt is set in a LAPIC's ISR.
The simplest fix is to ignore the masking state, we would rather have
an unnecessary exit rather than a missed IRQ ACK and anyway IOAPIC
interrupts are not as performance-sensitive as for example MSIs.

[Thanks to Alex for his precise description of the problem
 and initial debugging effort.  A lot of the text above is
 based on emails exchanged with him.]

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 virt/kvm/ioapic.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Alex Williamson July 30, 2014, 4:51 p.m. UTC | #1
On Wed, 2014-07-30 at 18:12 +0200, Paolo Bonzini wrote:
> Currently, the EOI exit bitmap (used for APICv) does not include
> interrupts that are masked.  However, this can cause a bug that manifests
> as an interrupt storm inside the guest.  Alex Williamson reported the
> bug and is the one who really debugged this; I only wrote the patch. :)
> 
> The scenario involves a multi-function PCI device with OHCI and EHCI
> USB functions and an audio function, all assigned to the guest, where
> both USB functions use legacy INTx interrupts.
> 
> As soon as the guest boots, interrupts for these devices turn into an
> interrupt storm in the guest; the host does not see the interrupt storm.
> Basically the EOI path does not work, and the guest continues to see the
> interrupt over and over, even after it attempts to mask it at the APIC.
> The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
> with not many changes in the area of APIC/IOAPIC handling).
> 
> Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
> on in the eoi_exit_bitmap and TMR, and things then work.  What happens
> is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
> It does not have set bit 59 because the RTE was masked, so the IOAPIC
> never sees the EOI and the interrupt continues to fire in the guest.
> 
> Probably, the guest is masking the interrupt in the redirection table in
> the interrupt routine, i.e. while the interrupt is set in a LAPIC's ISR.
> The simplest fix is to ignore the masking state, we would rather have
> an unnecessary exit rather than a missed IRQ ACK and anyway IOAPIC
> interrupts are not as performance-sensitive as for example MSIs.
> 
> [Thanks to Alex for his precise description of the problem
>  and initial debugging effort.  A lot of the text above is
>  based on emails exchanged with him.]
> 
> Reported-by: Alex Williamson <alex.williamson@redhat.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks Paolo

Tested-by: Alex Williamson <alex.williamson@redhat.com>

> ---
>  virt/kvm/ioapic.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> index 2458a1dc2ba9..e8ce34c9db32 100644
> --- a/virt/kvm/ioapic.c
> +++ b/virt/kvm/ioapic.c
> @@ -254,10 +254,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
>  	spin_lock(&ioapic->lock);
>  	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
>  		e = &ioapic->redirtbl[index];
> -		if (!e->fields.mask &&
> -			(e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
> -			 kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
> -				 index) || index == RTC_GSI)) {
> +		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
> +		    kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
> +		    index == RTC_GSI) {
>  			if (kvm_apic_match_dest(vcpu, NULL, 0,
>  				e->fields.dest_id, e->fields.dest_mode)) {
>  				__set_bit(e->fields.vector,



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhang, Yang Z Aug. 6, 2014, 2:03 p.m. UTC | #2
Paolo Bonzini wrote on 2014-07-31:
> Currently, the EOI exit bitmap (used for APICv) does not include
> interrupts that are masked.  However, this can cause a bug that manifests
> as an interrupt storm inside the guest.  Alex Williamson reported the
> bug and is the one who really debugged this; I only wrote the patch. :)
> 
> The scenario involves a multi-function PCI device with OHCI and EHCI
> USB functions and an audio function, all assigned to the guest, where
> both USB functions use legacy INTx interrupts.
> 
> As soon as the guest boots, interrupts for these devices turn into an
> interrupt storm in the guest; the host does not see the interrupt storm.
> Basically the EOI path does not work, and the guest continues to see the
> interrupt over and over, even after it attempts to mask it at the APIC.
> The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
> with not many changes in the area of APIC/IOAPIC handling).
> 
> Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
> on in the eoi_exit_bitmap and TMR, and things then work.  What happens
> is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
> It does not have set bit 59 because the RTE was masked, so the IOAPIC
> never sees the EOI and the interrupt continues to fire in the guest.
> 
> Probably, the guest is masking the interrupt in the redirection table in
> the interrupt routine, i.e. while the interrupt is set in a LAPIC's ISR.
> The simplest fix is to ignore the masking state, we would rather have
> an unnecessary exit rather than a missed IRQ ACK and anyway IOAPIC
> interrupts are not as performance-sensitive as for example MSIs.

I feel this fixing may hurt performance in some cases. If the mask bit is set, this means the vector in this entry may be used by other devices(like a assigned device). But here you set it in eoi exit bitmap and this will cause vmexit on each EOI which should not happen.

> 
> [Thanks to Alex for his precise description of the problem
>  and initial debugging effort.  A lot of the text above is
>  based on emails exchanged with him.]
> 
> Reported-by: Alex Williamson <alex.williamson@redhat.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  virt/kvm/ioapic.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> index 2458a1dc2ba9..e8ce34c9db32 100644
> --- a/virt/kvm/ioapic.c
> +++ b/virt/kvm/ioapic.c
> @@ -254,10 +254,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
> u64 *eoi_exit_bitmap,
>  	spin_lock(&ioapic->lock);
>  	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
>  		e = &ioapic->redirtbl[index];
> -		if (!e->fields.mask &&
> -			(e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
> -			 kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
> -				 index) || index == RTC_GSI)) {
> +		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
> +		    kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index)
> ||
> +		    index == RTC_GSI) {
>  			if (kvm_apic_match_dest(vcpu, NULL, 0,
>  				e->fields.dest_id, e->fields.dest_mode)) {
>  				__set_bit(e->fields.vector,

Best regards,
Yang


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Aug. 6, 2014, 2:08 p.m. UTC | #3
Il 06/08/2014 16:03, Zhang, Yang Z ha scritto:
> Paolo Bonzini wrote on 2014-07-31:
>> Probably, the guest is masking the interrupt in the redirection table in
>> the interrupt routine, i.e. while the interrupt is set in a LAPIC's ISR.
>> The simplest fix is to ignore the masking state, we would rather have
>> an unnecessary exit rather than a missed IRQ ACK and anyway IOAPIC
>> interrupts are not as performance-sensitive as for example MSIs.
> 
> I feel this fixing may hurt performance in some cases. If the mask
> bit is set, this means the vector in this entry may be used by other
> devices(like a assigned device). But here you set it in eoi exit bitmap
> and this will cause vmexit on each EOI which should not happen.

Note that this *was* reported on an assigned device.

IOAPIC should not be a performance-sensitive path.  High-performance
assigned devices should be using MSIs.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhang, Yang Z Aug. 7, 2014, 1:31 a.m. UTC | #4
Paolo Bonzini wrote on 2014-08-06:
> Il 06/08/2014 16:03, Zhang, Yang Z ha scritto:
>> Paolo Bonzini wrote on 2014-07-31:
>>> Probably, the guest is masking the interrupt in the redirection
>>> table in the interrupt routine, i.e. while the interrupt is set in a LAPIC's ISR.
>>> The simplest fix is to ignore the masking state, we would rather
>>> have an unnecessary exit rather than a missed IRQ ACK and anyway
>>> IOAPIC interrupts are not as performance-sensitive as for example MSIs.
>> 
>> I feel this fixing may hurt performance in some cases. If the mask
>> bit is set, this means the vector in this entry may be used by other
>> devices(like a assigned device). But here you set it in eoi exit
>> bitmap and this will cause vmexit on each EOI which should not happen.
> 
> Note that this *was* reported on an assigned device.
> 
> IOAPIC should not be a performance-sensitive path.  High-performance
> assigned devices should be using MSIs.

Let me give an example to see whether my concern is a real problem:
Guest allocates a vector and set it in IOAPIC entry to deliver interrupt. Later it masks the IOAPIC entry(means stop the corresponding device) and assign this vector to a MSI device. With this patch, even the vector is not used by IOAPIC, but it still set eoi exit bitmap unconditionally. The subsequent EOIs to MSI device will force vmexit. Could this happen?

I think the right fixing is to check the ISR plus TMR to construct the eoi exit bitmap.

> 
> Paolo


Best regards,
Yang

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Aug. 7, 2014, 6:13 a.m. UTC | #5
Il 07/08/2014 03:31, Zhang, Yang Z ha scritto:
> Let me give an example to see whether my concern is a real problem: 
> Guest allocates a vector and set it in IOAPIC entry to deliver
> interrupt. Later it masks the IOAPIC entry(means stop the
> corresponding device) and assign this vector to a MSI device. With
> this patch, even the vector is not used by IOAPIC, but it still set
> eoi exit bitmap unconditionally. The subsequent EOIs to MSI device
> will force vmexit. Could this happen?

Yes, I guess it could.  I'm not sure whether it could on Linux or Windows.

> I think the right fixing is to check the ISR plus TMR to construct
> the eoi exit bitmap.

Do you care enough to propose a patch? :)

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhang, Yang Z Aug. 7, 2014, 6:17 a.m. UTC | #6
Paolo Bonzini wrote on 2014-08-07:
> Il 07/08/2014 03:31, Zhang, Yang Z ha scritto:
>> Let me give an example to see whether my concern is a real problem:
>> Guest allocates a vector and set it in IOAPIC entry to deliver
>> interrupt. Later it masks the IOAPIC entry(means stop the
>> corresponding device) and assign this vector to a MSI device. With
>> this patch, even the vector is not used by IOAPIC, but it still set
>> eoi exit bitmap unconditionally. The subsequent EOIs to MSI device
>> will force vmexit. Could this happen?
> 
> Yes, I guess it could.  I'm not sure whether it could on Linux or Windows.
> 
>> I think the right fixing is to check the ISR plus TMR to construct
>> the eoi exit bitmap.
> 
> Do you care enough to propose a patch? :)
> 

Sure. I will do it.

> Paolo


Best regards,
Yang

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 2458a1dc2ba9..e8ce34c9db32 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -254,10 +254,9 @@  void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
 	spin_lock(&ioapic->lock);
 	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
 		e = &ioapic->redirtbl[index];
-		if (!e->fields.mask &&
-			(e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
-			 kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
-				 index) || index == RTC_GSI)) {
+		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
+		    kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
+		    index == RTC_GSI) {
 			if (kvm_apic_match_dest(vcpu, NULL, 0,
 				e->fields.dest_id, e->fields.dest_mode)) {
 				__set_bit(e->fields.vector,