diff mbox series

kvm/x86: Fix simultaneous ExtINT and lapic interrupt handling with APICv

Message ID 6a8897917188a3a23710199f8da3f5f33670b80f.camel@infradead.org (mailing list archive)
State New, archived
Headers show
Series kvm/x86: Fix simultaneous ExtINT and lapic interrupt handling with APICv | expand

Commit Message

David Woodhouse Nov. 26, 2020, 12:05 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

Booting a guest with 'noapic' with split irqchip and APICv enabled leads
to a livelock in vcpu_run().

When the userspace PIC has an IRQ to deliver, the VMM sets
kvm_run->request_interrupt_window and vcpu_enter_guest() duly enables the
corresponding VMexit, which happens immediately.

However, if an interrupt also exists in the local APIC, that causes
kvm_cpu_has_interrupt() to be true and thus causes
kvm_vcpu_ready_for_interrupt_injection() to return false; the intent
being that the kernel will use this interrupt window to inject its own
interrupt from the LAPIC. So vcpu_run() doesn't exit all the way to
userspace, and just loops.

However, when APICv is in use there is no interrupt to be injected since
that happens automatically. So the guest vCPU is launched again, exits
again immediately, and the loop repeats ad infinitum without making any
progress.

It looks like this was introduced in commit 782d422bcaee, when
dm_request_for_irq_injection() started returning true based purely
on the fact that userspace had requested the interrupt window, without
heed to kvm_cpu_has_interrupt() also being true.

Fix it by allowing userspace to use the interrupt window with priority
over the interrupt that is already in the LAPIC, for both APICv and
legacy injection alike. Instead of '!kvm_cpu_has_interrupt()', the
condition becomes '!(lapic_in_kernel(vcpu) && kvm_cpu_has_extint(vcpu))'

Fixes: 782d422bcaee ("KVM: x86: split kvm_vcpu_ready_for_interrupt_injection out of dm_request_for_irq_injection")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/irq.c              | 2 +-
 arch/x86/kvm/x86.c              | 6 ++++--
 3 files changed, 6 insertions(+), 3 deletions(-)

Comments

Paolo Bonzini Nov. 26, 2020, 6 p.m. UTC | #1
On 26/11/20 13:05, David Woodhouse wrote:
> |It looks like this was introduced in commit 782d422bcaee, when 
> dm_request_for_irq_injection() started returning true based purely on 
> the fact that userspace had requested the interrupt window, without heed 
> to kvm_cpu_has_interrupt() also being true. |

That patch had no semantic change, because 
dm_request_for_irq_injection() was split in two and the problematic bit 
was only split to kvm_vcpu_ready_for_interrupt_injection().

Even pre-patch there was a

	if (kvm_cpu_has_interrupt(vcpu))
		return false;

in dm_request_for_irq_injection() which your patch would have changed to

	if (lapic_in_kernel(vcpu) && kvm_cpu_has_extint(vcpu))
		return false;

Your patch certainly works, but _what_ does

		!(lapic_in_kernel(vcpu) && kvm_cpu_has_extint(vcpu)) &&
  		kvm_cpu_accept_dm_intr(vcpu)

mean in terms of the vcpu's state?  I have no idea, in fact at this 
point I barely have an idea of what 
kvm_vcpu_ready_for_interrupt_injection does.  Let's figure it out.


First act
~~~~~~~~~

First of all let's take a step back from your patch.  Let's just look at 
kvm_cpu_has_interrupt(vcpu) and trivially remove the APIC case from 
kvm_cpu_has_interrupt:

+static bool xxx(struct kvm_vcpu *vcpu)
+{
+	WARN_ON(pic_in_kernel(vcpu->kvm));
+	if (!lapic_in_kernel(vcpu))
+		return vcpu->arch.interrupt.injected;
+	else
+		return kvm_cpu_has_extint(vcpu);
+}

  	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
  		!kvm_event_needs_reinjection(vcpu) &&
+		!xxx(vcpu) &&
  		kvm_cpu_accept_dm_intr(vcpu);

Again, no idea does "xxx" do, much less its combination with 
kvm_cpu_accept_dm_intr.  We need to dive further down.


Second act
~~~~~~~~~~

kvm_cpu_accept_dm_intr can be rewritten like this:

         if (!lapic_in_kernel(vcpu))
		return true;
	else
                 return kvm_apic_accept_pic_intr(vcpu));

Therefore, we can commonize the "if"s in our xxx function with those 
from kvm_cpu_accept_dm_intr.  Remembering that the first act used the 
negation of xxx, the patch now takes this shape

+static int yyy(struct kvm_vcpu *vcpu)
+{
+	WARN_ON(pic_in_kernel(vcpu->kvm));
+	if (!lapic_in_kernel(vcpu))
+		return !vcpu->arch.interrupt.injected;
+	else
+		return (!kvm_cpu_has_extint(vcpu) &&
+			kvm_apic_accept_pic_intr(vcpu));
+}

  	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
  		!kvm_event_needs_reinjection(vcpu) &&
- 		kvm_cpu_accept_dm_intr(vcpu);
+		yyy(vcpu);

This doesn't seem like progress, but we're not done...


Third act
~~~~~~~~~

Let's look at the arms of yyy's "if" statement one by one.

If !lapic_in_kernel, the return statement will always be true because 
the function is called under !kvm_event_needs_reinjection(vcpu).  So 
we're already at

static int yyy(struct kvm_vcpu *vcpu)
{
	WARN_ON(pic_in_kernel(vcpu->kvm));
	if (!lapic_in_kernel(vcpu))
		return true;
	
	return (!kvm_cpu_has_extint(vcpu) &&
		kvm_apic_accept_pic_intr(vcpu));
}

As to the "else" branch, irqchip_split is true so 
kvm_cpu_has_extint(vcpu) is "kvm_apic_accept_pic_intr(v) && 
pending_userspace_extint(v)".  More simplifications ahead!

	!(A && B) && A
     =>  (!A || !B) && A
     =>  A && !B

that is:

static int yyy(struct kvm_vcpu *vcpu)
{
	WARN_ON(pic_in_kernel(vcpu->kvm));
	if (!lapic_in_kernel(vcpu))
		return true;
	
	return (kvm_apic_accept_pic_intr(vcpu) &&
		!pending_userspace_extint(vcpu));
}

which makes sense: focusing on ExtINT and ignoring event reinjection 
(which is handled by the caller), the vCPU is ready for interrupt 
injection if:

- there is no LAPIC (so ExtINT injection is in the hands of userspace), or

- PIC interrupts are being accepted, and userspace's last ExtINT isn't 
still pending.

Thus, the final patch is:

  static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
  {
+	WARN_ON(pic_in_kernel(vcpu->kvm));
+
  	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
  		!kvm_event_needs_reinjection(vcpu) &&
-		kvm_cpu_accept_dm_intr(vcpu);
+		(!lapic_in_kernel(vcpu)
+		 || (kvm_apic_accept_pic_intr(vcpu)
+		     && !pending_userspace_extint(v));
  }

I'm wondering if this one fails as well...

Paolo
David Woodhouse Nov. 26, 2020, 7:07 p.m. UTC | #2
On Thu, 2020-11-26 at 19:00 +0100, Paolo Bonzini wrote:
>   static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu
> *vcpu)
>   {
> +       WARN_ON(pic_in_kernel(vcpu->kvm));
> +

But what if I *want* to inject Xen event channel interrupts while the
actual PIC is in the kernel? :)

Not that I'll have to once the kernel is fixed and I can enable my
shiny new userspace PIC/PIT/IOAPIC code, I suppose.... 

>         return kvm_arch_interrupt_allowed(vcpu) &&
> -               !kvm_cpu_has_interrupt(vcpu) &&
>                 !kvm_event_needs_reinjection(vcpu) &&
> -               kvm_cpu_accept_dm_intr(vcpu);
> +               (!lapic_in_kernel(vcpu)
> +                || (kvm_apic_accept_pic_intr(vcpu)
> +                    && !pending_userspace_extint(v));
>   }
> 

I'll give this version a spin...


static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
{
	return kvm_arch_interrupt_allowed(vcpu) &&
		!kvm_event_needs_reinjection(vcpu) &&
		(!lapic_in_kernel(vcpu)
		 || (kvm_apic_accept_pic_intr(vcpu)
		     && vcpu->arch.pending_external_vector == -1));
}
diff mbox series

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f002cdb13a0b..e85e2f1c661c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1656,6 +1656,7 @@  int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end,
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_cpu_has_extint(struct kvm_vcpu *v);
 int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 99d118ffc67d..9c4ef1682b81 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -40,7 +40,7 @@  static int pending_userspace_extint(struct kvm_vcpu *v)
  * check if there is pending interrupt from
  * non-APIC source without intack.
  */
-static int kvm_cpu_has_extint(struct kvm_vcpu *v)
+int kvm_cpu_has_extint(struct kvm_vcpu *v)
 {
 	u8 accept = kvm_apic_accept_pic_intr(v);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a3fdc16cfd6f..b50ae8ba66e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4080,12 +4080,14 @@  static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
  * if userspace requested an interrupt window, check that the
  * interrupt window is open.
  *
- * No need to exit to userspace if we already have an interrupt queued.
+ * If there is already an event which needs reinjection or a
+ * pending ExtINT, allow that to be processed by the kernel
+ * before letting userspace have the opportunity.
  */
 static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
 {
 	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
+		!(lapic_in_kernel(vcpu) && kvm_cpu_has_extint(vcpu)) &&
 		!kvm_event_needs_reinjection(vcpu) &&
 		kvm_cpu_accept_dm_intr(vcpu);
 }