diff mbox

[1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race

Message ID 20171105135233.34572-2-nikita.leshchenko@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Nikita Leshenko Nov. 5, 2017, 1:52 p.m. UTC
KVM uses ioapic_handled_vectors to track vectors that need to notify the
IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an
interrupt with old configuration is pending or running and
ioapic_handled_vectors only remembers the newest configuration;
thus EOI from the old interrupt is not delievered to the IOAPIC.

A previous commit db2bdcbbbd32
("KVM: x86: fix edge EOI and IOAPIC reconfig race")
addressed this issue by adding pending edge-triggered interrupts to
ioapic_handled_vectors, fixing this race for edge-triggered interrupts.
The commit explicitly ignored level-triggered interrupts,
but this race applies to them as well:

1) IOAPIC sends a level triggered interrupt vector to VCPU0
2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC
   to route the vector to VCPU1. The reconfiguration rewrites only the
   upper 32 bits of the IOREDTBLn register. (Causes KVM to update
   ioapic_handled_vectors for VCPU0 and it no longer includes the vector.)
3) VCPU0 sends EOI for the vector, but it's not delievered to the
   IOAPIC because the ioapic_handled_vectors doesn't include the vector.
4) New interrupts are not delievered to VCPU1 because remote_irr bit
   is set forever.

Therefore, the correct behavior is to add all pending and running
interrupts to ioapic_handled_vectors.

This commit introduces a slight performance hit similar to
commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
for the rare case that the vector is reused by a non-IOAPIC source on
VCPU0. We prefer to keep solution simple and not handle this case just
as the original commit does.

Fixes: db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")

Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/ioapic.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Wanpeng Li Nov. 6, 2017, 2 a.m. UTC | #1
2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
> KVM uses ioapic_handled_vectors to track vectors that need to notify the
> IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an
> interrupt with old configuration is pending or running and
> ioapic_handled_vectors only remembers the newest configuration;
> thus EOI from the old interrupt is not delievered to the IOAPIC.
>
> A previous commit db2bdcbbbd32
> ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
> addressed this issue by adding pending edge-triggered interrupts to
> ioapic_handled_vectors, fixing this race for edge-triggered interrupts.
> The commit explicitly ignored level-triggered interrupts,
> but this race applies to them as well:
>
> 1) IOAPIC sends a level triggered interrupt vector to VCPU0
> 2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC
>    to route the vector to VCPU1. The reconfiguration rewrites only the
>    upper 32 bits of the IOREDTBLn register. (Causes KVM to update
>    ioapic_handled_vectors for VCPU0 and it no longer includes the vector.)

Refer to __ioapic_write_entry() in linux guest kernel codes, both
upper 32 bits and lower 32 bits are written to IOREDTBLx, so when the
scenario which you mentioned will occur?

Regards,
Wanpeng Li

> 3) VCPU0 sends EOI for the vector, but it's not delievered to the
>    IOAPIC because the ioapic_handled_vectors doesn't include the vector.
> 4) New interrupts are not delievered to VCPU1 because remote_irr bit
>    is set forever.
>
> Therefore, the correct behavior is to add all pending and running
> interrupts to ioapic_handled_vectors.
>
> This commit introduces a slight performance hit similar to
> commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
> for the rare case that the vector is reused by a non-IOAPIC source on
> VCPU0. We prefer to keep solution simple and not handle this case just
> as the original commit does.
>
> Fixes: db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
>
> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/kvm/ioapic.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index bdff437acbcb..ae0a7dc318b2 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -257,8 +257,7 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors)
>                     index == RTC_GSI) {
>                         if (kvm_apic_match_dest(vcpu, NULL, 0,
>                                      e->fields.dest_id, e->fields.dest_mode) ||
> -                           (e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
> -                            kvm_apic_pending_eoi(vcpu, e->fields.vector)))
> +                           kvm_apic_pending_eoi(vcpu, e->fields.vector))
>                                 __set_bit(e->fields.vector,
>                                           ioapic_handled_vectors);
>                 }
> --
> 2.13.3
>
Nikita Leshenko Nov. 6, 2017, 1:53 p.m. UTC | #2
> On 6 Nov 2017, at 4:00, Wanpeng Li <kernellwp@gmail.com> wrote:
> 
> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>> KVM uses ioapic_handled_vectors to track vectors that need to notify the
>> IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an
>> interrupt with old configuration is pending or running and
>> ioapic_handled_vectors only remembers the newest configuration;
>> thus EOI from the old interrupt is not delievered to the IOAPIC.
>> 
>> A previous commit db2bdcbbbd32
>> ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
>> addressed this issue by adding pending edge-triggered interrupts to
>> ioapic_handled_vectors, fixing this race for edge-triggered interrupts.
>> The commit explicitly ignored level-triggered interrupts,
>> but this race applies to them as well:
>> 
>> 1) IOAPIC sends a level triggered interrupt vector to VCPU0
>> 2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC
>>   to route the vector to VCPU1. The reconfiguration rewrites only the
>>   upper 32 bits of the IOREDTBLn register. (Causes KVM to update
>>   ioapic_handled_vectors for VCPU0 and it no longer includes the vector.)
> 
> Refer to __ioapic_write_entry() in linux guest kernel codes, both
> upper 32 bits and lower 32 bits are written to IOREDTBLx, so when the
> scenario which you mentioned will occur?

You’re correct regarding Linux, but other operating systems may update
the redirection entry partially.

For example, ESXi 5.5 has a kernel function IOAPICSteerVector that
modifies the Destination Field in the upper bits without affecting the
lower bits of IOREDTBLx. We have observed that without this patch ESXi
would loose network connectivity because it would get stuck with the
Remote IRR bit set forever.

Nikita

> 
> Regards,
> Wanpeng Li
> 
>> 3) VCPU0 sends EOI for the vector, but it's not delievered to the
>>   IOAPIC because the ioapic_handled_vectors doesn't include the vector.
>> 4) New interrupts are not delievered to VCPU1 because remote_irr bit
>>   is set forever.
>> 
>> Therefore, the correct behavior is to add all pending and running
>> interrupts to ioapic_handled_vectors.
>> 
>> This commit introduces a slight performance hit similar to
>> commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
>> for the rare case that the vector is reused by a non-IOAPIC source on
>> VCPU0. We prefer to keep solution simple and not handle this case just
>> as the original commit does.
>> 
>> Fixes: db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
>> 
>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>> arch/x86/kvm/ioapic.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index bdff437acbcb..ae0a7dc318b2 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -257,8 +257,7 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors)
>>                    index == RTC_GSI) {
>>                        if (kvm_apic_match_dest(vcpu, NULL, 0,
>>                                     e->fields.dest_id, e->fields.dest_mode) ||
>> -                           (e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
>> -                            kvm_apic_pending_eoi(vcpu, e->fields.vector)))
>> +                           kvm_apic_pending_eoi(vcpu, e->fields.vector))
>>                                __set_bit(e->fields.vector,
>>                                          ioapic_handled_vectors);
>>                }
>> --
>> 2.13.3
Paolo Bonzini Nov. 6, 2017, 2:05 p.m. UTC | #3
On 06/11/2017 14:53, Nikita Leshchenko wrote:
> 
>> On 6 Nov 2017, at 4:00, Wanpeng Li <kernellwp@gmail.com> wrote:
>>
>> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>>> KVM uses ioapic_handled_vectors to track vectors that need to notify the
>>> IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an
>>> interrupt with old configuration is pending or running and
>>> ioapic_handled_vectors only remembers the newest configuration;
>>> thus EOI from the old interrupt is not delievered to the IOAPIC.
>>>
>>> A previous commit db2bdcbbbd32
>>> ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
>>> addressed this issue by adding pending edge-triggered interrupts to
>>> ioapic_handled_vectors, fixing this race for edge-triggered interrupts.
>>> The commit explicitly ignored level-triggered interrupts,
>>> but this race applies to them as well:
>>>
>>> 1) IOAPIC sends a level triggered interrupt vector to VCPU0
>>> 2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC
>>>   to route the vector to VCPU1. The reconfiguration rewrites only the
>>>   upper 32 bits of the IOREDTBLn register. (Causes KVM to update
>>>   ioapic_handled_vectors for VCPU0 and it no longer includes the vector.)
>>
>> Refer to __ioapic_write_entry() in linux guest kernel codes, both
>> upper 32 bits and lower 32 bits are written to IOREDTBLx, so when the
>> scenario which you mentioned will occur?
> 
> You’re correct regarding Linux, but other operating systems may update
> the redirection entry partially.
> 
> For example, ESXi 5.5 has a kernel function IOAPICSteerVector that
> modifies the Destination Field in the upper bits without affecting the
> lower bits of IOREDTBLx. We have observed that without this patch ESXi
> would loose network connectivity because it would get stuck with the
> Remote IRR bit set forever.

Thanks, this is helpful.

Can you write testcases for kvm-unit-tests for some or all of these
scenarios?

Thanks,

Paolo
Nikita Leshenko Nov. 7, 2017, 8:13 a.m. UTC | #4
> On 6 Nov 2017, at 16:05, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> Can you write testcases for kvm-unit-tests for some or all of these
> scenarios?
Submitted patch.

Nikita
Paolo Bonzini Nov. 7, 2017, 10:18 a.m. UTC | #5
On 07/11/2017 09:13, Nikita Leshchenko wrote:
> 
>> On 6 Nov 2017, at 16:05, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> Can you write testcases for kvm-unit-tests for some or all of these
>> scenarios?
> Submitted patch.

For 2-4-5 would be nice too (especially 2, the others are easier).

Paolo
Steve Rutherford Nov. 7, 2017, 11:09 p.m. UTC | #6
Thanks for doing this! This looks fine from the split irqchip
perspective as well.

On Tue, Nov 7, 2017 at 2:18 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 07/11/2017 09:13, Nikita Leshchenko wrote:
>>
>>> On 6 Nov 2017, at 16:05, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>> Can you write testcases for kvm-unit-tests for some or all of these
>>> scenarios?
>> Submitted patch.
>
> For 2-4-5 would be nice too (especially 2, the others are easier).
>
> Paolo
diff mbox

Patch

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index bdff437acbcb..ae0a7dc318b2 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -257,8 +257,7 @@  void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors)
 		    index == RTC_GSI) {
 			if (kvm_apic_match_dest(vcpu, NULL, 0,
 			             e->fields.dest_id, e->fields.dest_mode) ||
-			    (e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
-			     kvm_apic_pending_eoi(vcpu, e->fields.vector)))
+			    kvm_apic_pending_eoi(vcpu, e->fields.vector))
 				__set_bit(e->fields.vector,
 					  ioapic_handled_vectors);
 		}