diff mbox

KVM: x86: implement IOAPIC_REG_EOI for directed EOI support

Message ID 20170413140929.GD23556@potion (mailing list archive)
State New, archived
Headers show

Commit Message

Radim Krčmář April 13, 2017, 2:09 p.m. UTC
2017-04-13 16:32+0800, Peter Xu:
> On Wed, Apr 12, 2017 at 02:20:12PM +0200, Ladi Prosek wrote:
> > On Wed, Apr 12, 2017 at 1:57 PM, Peter Xu <peterx@redhat.com> wrote:
> > > On Wed, Apr 12, 2017 at 12:28:25PM +0200, Ladi Prosek wrote:
> > >> On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek <lprosek@redhat.com> wrote:
> > >> > On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@redhat.com> wrote:
> > >> >>  void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
> > >> >>  {
> > >> >>         struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
> > >> >>
> > >> >> +       /* If we'll be using direct EOI, skip broadcast */
> > >> >> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> > >> >> +               return;
> > >> >> +
> > >> >
> > >> > I've only seen the direct EOI sent for level irqs so I'm afraid that
> > >> > __kvm_ioapic_update_eoi needs to run for edge-triggered even if the
> > >> > APIC_SPIV_DIRECTED_EOI flag is set.
> > >
> > > Yes, if without your patch, it is problematic. But if with your patch,
> > > __kvm_ioapic_update_eoi() will be called in ioapic mmio write then.
> > 
> > No, there's no ioapic mmio write for edge-triggered interrupts, at
> > least Windows don't do any. Edge interrupts must still be handled on
> > the lapic EOI path.
> 
> I see. Yes it may break that. Though I don't know in what case someone
> would register to this IOAPIC eoi message if it's edge-triggered...

I don't know of any sane case ...

KVM PIT (arch/x86/kvm/i8254.c) uses it to keep track of lost edge
interrupts so it could re-inject them.
Windows needed it for reasonable timeflow, IIRC.

The userspace probably uses it for similar reasons.

We have to keep the edge notifier working. :(

> > >> This is what __eoi_ioapic_pin does for version<0x20 and
> > >> on the host side we reset the remote_irr in ioapic_write_indirect if
> > >> I'm reading the code correctly. Wouldn't we want to deliver the
> > >> notification via kvm_notify_acked_irq in this case also?
> > >
> > > I think in that case (EOI sent via "direct EOI" of ioapic mmio
> > > register) the remote_irr is not cleaned in ioapic_write_indirect(),
> > > but it's cleaned in __kvm_ioapic_update_eoi() as well. It has the
> > > line:
> > >
> > >                 ent->fields.remote_irr = 0;
> > >
> > > Right?
> > 
> > Right, but I meant the other case, a guest that sets
> > APIC_SPIV_DIRECTED_EOI but doesn't write to IOAPIC_REG_EOI. With your
> > patch there would be no kvm_notify_acked_irq and that doesn't look
> > correct.
> 
> For level triggered irq, if it sets APIC_SPIV_DIRECTED_EOI but doesn't
> write IOAPIC_REG_EOI, it should be a guest bug, right?

Yes.

> Thinking about the migration issue that Radim has mentioned, I agree
> maybe we can consider just revert the suppress eoi broadcast. Actually
> now I start to wonder why direct EOI is introduced. I think it should
> be for performance's sake on systems has lots of ioapics? But looks
> like that's normally not the case, at least for kvm and most general
> machines, which only has one ioapic.

That is what I thought as well.

I am wondering why Windows uses it -- maybe they remap vector numbers
and therefore need to EOI a different vector?

The feature might have been introduced because some OS assumed presence
of the feature if x2APIC was available ...
Right, reverting it is pretty safe, because it was broken before. :)
(And we ignore the SPIV completely when forwarding EOI to userspace.)

Does Windows 2016 work with this patch?

---8<---

Comments

Ladi Prosek April 13, 2017, 3:11 p.m. UTC | #1
On Thu, Apr 13, 2017 at 4:09 PM, Radim Krcmar <rkrcmar@redhat.com> wrote:
> 2017-04-13 16:32+0800, Peter Xu:
>> On Wed, Apr 12, 2017 at 02:20:12PM +0200, Ladi Prosek wrote:
>> > On Wed, Apr 12, 2017 at 1:57 PM, Peter Xu <peterx@redhat.com> wrote:
>> > > On Wed, Apr 12, 2017 at 12:28:25PM +0200, Ladi Prosek wrote:
>> > >> On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek <lprosek@redhat.com> wrote:
>> > >> > On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@redhat.com> wrote:
>> > >> >>  void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
>> > >> >>  {
>> > >> >>         struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
>> > >> >>
>> > >> >> +       /* If we'll be using direct EOI, skip broadcast */
>> > >> >> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> > >> >> +               return;
>> > >> >> +
>> > >> >
>> > >> > I've only seen the direct EOI sent for level irqs so I'm afraid that
>> > >> > __kvm_ioapic_update_eoi needs to run for edge-triggered even if the
>> > >> > APIC_SPIV_DIRECTED_EOI flag is set.
>> > >
>> > > Yes, if without your patch, it is problematic. But if with your patch,
>> > > __kvm_ioapic_update_eoi() will be called in ioapic mmio write then.
>> >
>> > No, there's no ioapic mmio write for edge-triggered interrupts, at
>> > least Windows don't do any. Edge interrupts must still be handled on
>> > the lapic EOI path.
>>
>> I see. Yes it may break that. Though I don't know in what case someone
>> would register to this IOAPIC eoi message if it's edge-triggered...
>
> I don't know of any sane case ...
>
> KVM PIT (arch/x86/kvm/i8254.c) uses it to keep track of lost edge
> interrupts so it could re-inject them.
> Windows needed it for reasonable timeflow, IIRC.
>
> The userspace probably uses it for similar reasons.
>
> We have to keep the edge notifier working. :(
>
>> > >> This is what __eoi_ioapic_pin does for version<0x20 and
>> > >> on the host side we reset the remote_irr in ioapic_write_indirect if
>> > >> I'm reading the code correctly. Wouldn't we want to deliver the
>> > >> notification via kvm_notify_acked_irq in this case also?
>> > >
>> > > I think in that case (EOI sent via "direct EOI" of ioapic mmio
>> > > register) the remote_irr is not cleaned in ioapic_write_indirect(),
>> > > but it's cleaned in __kvm_ioapic_update_eoi() as well. It has the
>> > > line:
>> > >
>> > >                 ent->fields.remote_irr = 0;
>> > >
>> > > Right?
>> >
>> > Right, but I meant the other case, a guest that sets
>> > APIC_SPIV_DIRECTED_EOI but doesn't write to IOAPIC_REG_EOI. With your
>> > patch there would be no kvm_notify_acked_irq and that doesn't look
>> > correct.
>>
>> For level triggered irq, if it sets APIC_SPIV_DIRECTED_EOI but doesn't
>> write IOAPIC_REG_EOI, it should be a guest bug, right?
>
> Yes.
>
>> Thinking about the migration issue that Radim has mentioned, I agree
>> maybe we can consider just revert the suppress eoi broadcast. Actually
>> now I start to wonder why direct EOI is introduced. I think it should
>> be for performance's sake on systems has lots of ioapics? But looks
>> like that's normally not the case, at least for kvm and most general
>> machines, which only has one ioapic.
>
> That is what I thought as well.
>
> I am wondering why Windows uses it -- maybe they remap vector numbers
> and therefore need to EOI a different vector?

Maybe it has something to do with how they assign devices to the root
Hyper-V partition. I've seen the issue only with NICs, if it helps.

> The feature might have been introduced because some OS assumed presence
> of the feature if x2APIC was available ...
> Right, reverting it is pretty safe, because it was broken before. :)
> (And we ignore the SPIV completely when forwarding EOI to userspace.)
>
> Does Windows 2016 work with this patch?

Yes, this patch works.

> ---8<---
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index bdff437acbcb..6a1a94d63c52 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -442,8 +442,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>                 kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>                 spin_lock(&ioapic->lock);
>
> -               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> -                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> +               if (trigger_mode != IOAPIC_LEVEL_TRIG)
>                         continue;
>
>                 ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index bad6a25067bc..20fdd93d7dd4 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -319,8 +319,6 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
>                 return;
>
>         feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0);
> -       if (feat && (feat->ecx & (1 << (X86_FEATURE_X2APIC & 31))))
> -               v |= APIC_LVR_DIRECTED_EOI;
>         kvm_lapic_set_reg(apic, APIC_LVR, v);
>  }
>
> @@ -1636,8 +1634,6 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>
>         case APIC_SPIV: {
>                 u32 mask = 0x3ff;
> -               if (kvm_lapic_get_reg(apic, APIC_LVR) & APIC_LVR_DIRECTED_EOI)
> -                       mask |= APIC_SPIV_DIRECTED_EOI;
>                 apic_set_spiv(apic, val & mask);
>                 if (!(val & APIC_SPIV_APIC_ENABLED)) {
>                         int i;
Paolo Bonzini April 14, 2017, 5:28 a.m. UTC | #2
On 13/04/2017 22:09, Radim Krcmar wrote:
> KVM PIT (arch/x86/kvm/i8254.c) uses it to keep track of lost edge
> interrupts so it could re-inject them.
> Windows needed it for reasonable timeflow, IIRC.

This is for old Linux.

Windows needs the RTC tracking, which also uses the irq ack notifier.

> The feature might have been introduced because some OS assumed presence
> of the feature if x2APIC was available ...
> Right, reverting it is pretty safe, because it was broken before. 
> (And we ignore the SPIV completely when forwarding EOI to userspace.)

Yeah, that's the important part.  I'm a bit worried of the consequences
for migration, but AIUI you think it's fine?  (That is, too many EOIs
are not a problem, only too few are).

Paolo
diff mbox

Patch

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index bdff437acbcb..6a1a94d63c52 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -442,8 +442,7 @@  static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
 		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
 		spin_lock(&ioapic->lock);
 
-		if (trigger_mode != IOAPIC_LEVEL_TRIG ||
-		    kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+		if (trigger_mode != IOAPIC_LEVEL_TRIG)
 			continue;
 
 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bad6a25067bc..20fdd93d7dd4 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -319,8 +319,6 @@  void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 		return;
 
 	feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0);
-	if (feat && (feat->ecx & (1 << (X86_FEATURE_X2APIC & 31))))
-		v |= APIC_LVR_DIRECTED_EOI;
 	kvm_lapic_set_reg(apic, APIC_LVR, v);
 }
 
@@ -1636,8 +1634,6 @@  int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 	case APIC_SPIV: {
 		u32 mask = 0x3ff;
-		if (kvm_lapic_get_reg(apic, APIC_LVR) & APIC_LVR_DIRECTED_EOI)
-			mask |= APIC_SPIV_DIRECTED_EOI;
 		apic_set_spiv(apic, val & mask);
 		if (!(val & APIC_SPIV_APIC_ENABLED)) {
 			int i;