[v4,00/16] IOMMU: Enable interrupt remapping for Intel IOMMU

Message ID 20160426073426.GD28545@pxdev.xzpeter.org (mailing list archive)
State New, archived

Commit Message

Peter Xu April 26, 2016, 7:34 a.m. UTC
On Mon, Apr 25, 2016 at 09:24:12AM +0200, Jan Kiszka wrote:
> On 2016-04-25 09:18, Peter Xu wrote:
> > On Mon, Apr 25, 2016 at 07:16:19AM +0200, Jan Kiszka wrote:
> >> On 2016-04-19 10:38, Peter Xu wrote:
> > 
> > [...]
> > 
> >>> By default, IR is disabled to be better compatible with current
> >>> QEMU. To enable IR, we can using the following command to boot a
> >>> IR-supported VM with virtio-net device with vhost (still do not
> >>> support kvm-ioapic, so we need to specify kernel-irqchip={split|off}
> >>> here):
> >>>
> >>> $ qemu-system-x86_64 -M q35,iommu=on,intr=on,kernel-irqchip=split \
> >>
> >> "intr" sounds a bit too much like "interrupt", not "interrupt
> >> remapping". Why not use the kernel's form, "intremap"?
> > 
> > Sure. It sounds nice to be aligned with the kernel one. Let me take
> > it in v5.
> > 
> >>
> >>>      -enable-kvm -m 1024 \
> >>> 	 -netdev tap,id=net0,vhost=on \
> > >>>      -device virtio-net-pci,netdev=net0 \
> >>>      -monitor telnet::3333,server,nowait \
> >>> 	 /var/lib/libvirt/images/vm1.qcow2
> >>>
> >>> When guest boots, we can verify whether IR enabled by grepping the
> >>> dmesg like:
> >>>
> >>> [root@localhost ~]# journalctl -k | grep "DMAR-IR"
> >>> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: IOAPIC id 0 under DRHD base  0xfed90000 IOMMU 0
> >>> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: Enabled IRQ remapping in xapic mode
> >>>
> >>> Currently supported devices:
> >>>
> >>> - Emulated/Splitted irqchip
> >>> - Generic PCI Devices
> >>> - vhost devices
> >>> - pass through device support? Not tested, but suppose it should work.
> >>
> >> I've tested this series against my Jailhouse setup, and it works pretty
> >> well! Actually considering to move my test setup over this branch.
> > 
> > This is really encouraging feedback! Btw, thanks for all kinds of
> > help on this patchset. :-)
> > 
> >>
> >> However, split irqchip still has some issues: When I boot a q35 machine
> >> with Linux, the e1000 network adapter only gets a single IRQ delivered.
> >> Interestingly, other IOAPIC IRQs like the keyboard work all the time. I
> >> didn't debug this in details yet.
> > 
> > I reproduced this problem. It seems that it fails even with
> > kernel-irqchip=off. Will try to dig it out.
> 
> Very good. Hope it can be easily fixed.

Hi, Jan,

The above issue is most likely caused by a missing EOI for
level-triggered interrupts. Until now I had only tested with
edge-triggered interrupts, so I never hit this one. Could you please
try the patch below? It can be applied directly on top of the series
and should solve the issue (it works on my test VM, and I'll include
it in v5 as well if it also works for you):

-------------------------


------------------------

I am still looking into the guest-side code. Although the above patch
should solve the issue, there are still problems in the guest code
when IR is enabled:

- mismatched "vector" in the IOAPIC entry and the IRTE entry (matching
  vectors are required by VT-d spec 5.1.5.1, and I guess they are
  needed to deliver the EOI broadcast correctly). See
  intel_irq_remapping_prepare_irte():

        ...
        /*
         * IO-APIC RTE will be configured with virtual vector.
         * irq handler will do the explicit EOI to the io-apic.
         */
        entry->vector   = info->ioapic_pin;
        ...

- I found that level-triggered entries in the IOAPIC are marked as
  edge-triggered interrupts in the APIC (which is strange)... This
  also affects correct delivery of the EOI broadcast. I still need
  time to figure out why.

If EOI broadcast worked, the e1000 issue would be solved as well,
even without the above patch.
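
To illustrate why clearing Remote IRR matters, here is a minimal,
self-contained sketch (not QEMU code) of the guest's mask+edge /
unmask+level trick run against a toy redirection table entry. All
model_* names are hypothetical. The bit positions follow the 82093AA
entry layout, and the model assumes that Remote IRR is read-only for
software and is dropped whenever the entry is programmed as
edge-triggered.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the 82093AA redirection table entry (low dword). */
#define RTE_REMOTE_IRR   (1ULL << 14)
#define RTE_TRIGGER_MODE (1ULL << 15)   /* 1 = level, 0 = edge */
#define RTE_MASKED       (1ULL << 16)

/* Hypothetical toy model of one entry; this is not QEMU's data type. */
struct model_rte {
    uint64_t value;
};

/*
 * Emulated guest write to the low 32 bits of an entry.
 * Assumptions of this model: Remote IRR cannot be set by software, and
 * an entry that ends up edge-triggered never keeps Remote IRR set.
 */
static void model_rte_write_low(struct model_rte *rte, uint32_t val)
{
    uint64_t old = rte->value;

    rte->value = (old & ~0xffffffffULL)      /* keep the high dword   */
               | (val & ~RTE_REMOTE_IRR)     /* guest-writable bits   */
               | (old & RTE_REMOTE_IRR);     /* read-only Remote IRR  */

    if (!(rte->value & RTE_TRIGGER_MODE)) {
        rte->value &= ~RTE_REMOTE_IRR;       /* edge: drop Remote IRR */
    }
}

/* The guest-side trick: mask + edge, then restore the saved level entry. */
static void model_guest_eoi_trick(struct model_rte *rte)
{
    uint32_t saved = (uint32_t)rte->value;

    assert(saved & RTE_TRIGGER_MODE);        /* only valid for level entries */
    model_rte_write_low(rte,
        (uint32_t)((saved | RTE_MASKED) & ~RTE_TRIGGER_MODE));
    model_rte_write_low(rte, saved);
}
```

Starting from a level-triggered entry with Remote IRR set, calling
model_guest_eoi_trick() leaves the entry level-triggered, unmasked and
with Remote IRR clear, which is what the guest expects from hardware.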

[...]

> > 
> >>
> >>> - IR fault reporting
> >>
> >> Would be welcome! I found a "test case" yesterday: misconfigured IOAPIC
> >> ID blocked its IRQs under Jailhouse, and I first had to enable tracing
> >> to realize it ;).
> > 
> > Yes, it sounds nice to have guest side feedback on IR faults. Will
> > do more reading, and see whether I can add one more patch in v5 to
> > do this.
> 
> It's not a must-have for getting things merged. In fact, any additional
> feature that could now delay the merge of what you have should rather
> wait. Stabilizing, addressing style and structure comments is more
> important IMO.

Okay, then let me add this to my todo list; I will pick it up when I
get time.

Thanks,

-- peterx

Comments

Jan Kiszka April 26, 2016, 7:57 a.m. UTC | #1
On 2016-04-26 09:34, Peter Xu wrote:
> On Mon, Apr 25, 2016 at 09:24:12AM +0200, Jan Kiszka wrote:
>> On 2016-04-25 09:18, Peter Xu wrote:
>>> On Mon, Apr 25, 2016 at 07:16:19AM +0200, Jan Kiszka wrote:
>>>> On 2016-04-19 10:38, Peter Xu wrote:
>>>
>>> [...]
>>>
>>>>> By default, IR is disabled to be better compatible with current
>>>>> QEMU. To enable IR, we can using the following command to boot a
>>>>> IR-supported VM with virtio-net device with vhost (still do not
>>>>> support kvm-ioapic, so we need to specify kernel-irqchip={split|off}
>>>>> here):
>>>>>
>>>>> $ qemu-system-x86_64 -M q35,iommu=on,intr=on,kernel-irqchip=split \
>>>>
>>>> "intr" sounds a bit too much like "interrupt", not "interrupt
>>>> remapping". Why not use the kernel's form, "intremap"?
>>>
>>> Sure. It sounds nice to be aligned with the kernel one. Let me take
>>> it in v5.
>>>
>>>>
>>>>>      -enable-kvm -m 1024 \
>>>>> 	 -netdev tap,id=net0,vhost=on \
>>>>>      -device virtio-net-pci,netdev=net0 \
>>>>>      -monitor telnet::3333,server,nowait \
>>>>> 	 /var/lib/libvirt/images/vm1.qcow2
>>>>>
>>>>> When guest boots, we can verify whether IR enabled by grepping the
>>>>> dmesg like:
>>>>>
>>>>> [root@localhost ~]# journalctl -k | grep "DMAR-IR"
>>>>> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: IOAPIC id 0 under DRHD base  0xfed90000 IOMMU 0
>>>>> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: Enabled IRQ remapping in xapic mode
>>>>>
>>>>> Currently supported devices:
>>>>>
>>>>> - Emulated/Splitted irqchip
>>>>> - Generic PCI Devices
>>>>> - vhost devices
>>>>> - pass through device support? Not tested, but suppose it should work.
>>>>
>>>> I've tested this series against my Jailhouse setup, and it works pretty
>>>> well! Actually considering to move my test setup over this branch.
>>>
>>> This is really encouraging feedback! Btw, thanks for all kinds of
>>> help on this patchset. :-)
>>>
>>>>
>>>> However, split irqchip still has some issues: When I boot a q35 machine
>>>> with Linux, the e1000 network adapter only gets a single IRQ delivered.
>>>> Interestingly, other IOAPIC IRQs like the keyboard work all the time. I
>>>> didn't debug this in details yet.
>>>
>>> I reproduced this problem. It seems that it fails even with
>>> kernel-irqchip=off. Will try to dig it out.
>>
>> Very good. Hope it can be easily fixed.
> 
> Hi, Jan,
> 
> The above issue should be caused by EOI missing of level-triggered
> interrupts. Before that, I was always using edge-triggered
> interrupts for test, so didn't encounter this one. Would you please
> help try below patch? It can be applied directly onto the series,
> and should solve the issue (it works on my test vm, and I'll take it
> in v5 as well if it also works for you):
> 

Works here as well. I even got EIM working with some hack, though
Jailhouse spits out strange warnings even though it works fine (x2apic
mode, split irqchip).

> -------------------------
> 
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> index b41ab89..de6a8cf 100644
> --- a/hw/intc/ioapic.c
> +++ b/hw/intc/ioapic.c
> @@ -281,6 +281,36 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size)
>      return val;
>  }
> 
> +/*
> + * This is to satisfy the hack in Linux kernel. One hack of it is to
> + * simulate clearing the Remote IRR bit of IOAPIC entry using the
> + * following:
> + *
> + * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
> + * Otherwise, we simulate the EOI message manually by changing the trigger
> + * mode to edge and then back to level, with RTE being masked during
> + * this."
> + *
> + * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
> + *
> + * This is based on the assumption that, Remote IRR bit will be
> + * cleared by IOAPIC hardware for edge-triggered interrupts (I
> + * believe that's what the IOAPIC version 0x1X hardware does). So
> + * if we are emulating it, we'd better do it the same here, so that
> + * the guest kernel hack will work as well on QEMU.
> + *
> + * Without this, level-triggered interrupts in IR mode might fail to
> + * work correctly.
> + */
> +static inline void
> +ioapic_fix_edge_remote_irr(uint64_t *entry)
> +{
> +    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
> +        /* Level triggered interrupts, make sure remote IRR is zero */
> +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
> +    }
> +}
> +
>  static void
>  ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
>                   unsigned int size)
> @@ -314,6 +344,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
>                      s->ioredtbl[index] &= ~0xffffffffULL;
>                      s->ioredtbl[index] |= val;
>                  }
> +                ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
>                  ioapic_service(s);
>              }
>          }
> 
> ------------------------
> 
> I am still looking into guest part codes. Although the above patch
> should solve the issue, there are still issues in guest codes when
> IR is enabled:
> 
> - mismatched "vector" in IOAPIC entry and IRTE entry (this is
>   required in vt-d spec 5.1.5.1, and required to correctly deliver
>   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():
> 
>         ...
>         /*
>          * IO-APIC RTE will be configured with virtual vector.
>          * irq handler will do the explicit EOI to the io-apic.
>          */
>         entry->vector   = info->ioapic_pin;
>         ...
> 
> - I encountered that level-triggered entries in IOAPIC is marked as
>   edge-triggered interrupt in APIC (which is strange)... This will
>   also affect correct delivery of EOI broadcast. I still need time
>   to figure out why.
> 
> If EOI broadcast can work, e1000 issue would be solved as
> well even without above patch.
> 
> [...]

I don't remember the details in this area, but maybe it's worth looking
at how my hacks dealt with these cases (or made Linux not create such
weird configurations).

Jan
Jan Kiszka April 26, 2016, 8:15 a.m. UTC | #2
On 2016-04-26 09:57, Jan Kiszka wrote:
> On 2016-04-26 09:34, Peter Xu wrote:
>> On Mon, Apr 25, 2016 at 09:24:12AM +0200, Jan Kiszka wrote:
>>> On 2016-04-25 09:18, Peter Xu wrote:
>>>> On Mon, Apr 25, 2016 at 07:16:19AM +0200, Jan Kiszka wrote:
>>>>> On 2016-04-19 10:38, Peter Xu wrote:
>>>>
>>>> [...]
>>>>
>>>>>> By default, IR is disabled to be better compatible with current
>>>>>> QEMU. To enable IR, we can using the following command to boot a
>>>>>> IR-supported VM with virtio-net device with vhost (still do not
>>>>>> support kvm-ioapic, so we need to specify kernel-irqchip={split|off}
>>>>>> here):
>>>>>>
>>>>>> $ qemu-system-x86_64 -M q35,iommu=on,intr=on,kernel-irqchip=split \
>>>>>
>>>>> "intr" sounds a bit too much like "interrupt", not "interrupt
>>>>> remapping". Why not use the kernel's form, "intremap"?
>>>>
>>>> Sure. It sounds nice to be aligned with the kernel one. Let me take
>>>> it in v5.
>>>>
>>>>>
>>>>>>      -enable-kvm -m 1024 \
>>>>>> 	 -netdev tap,id=net0,vhost=on \
>>>>>>      -device virtio-net-pci,netdev=net0 \
>>>>>>      -monitor telnet::3333,server,nowait \
>>>>>> 	 /var/lib/libvirt/images/vm1.qcow2
>>>>>>
>>>>>> When guest boots, we can verify whether IR enabled by grepping the
>>>>>> dmesg like:
>>>>>>
>>>>>> [root@localhost ~]# journalctl -k | grep "DMAR-IR"
>>>>>> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: IOAPIC id 0 under DRHD base  0xfed90000 IOMMU 0
>>>>>> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: Enabled IRQ remapping in xapic mode
>>>>>>
>>>>>> Currently supported devices:
>>>>>>
>>>>>> - Emulated/Splitted irqchip
>>>>>> - Generic PCI Devices
>>>>>> - vhost devices
>>>>>> - pass through device support? Not tested, but suppose it should work.
>>>>>
>>>>> I've tested this series against my Jailhouse setup, and it works pretty
>>>>> well! Actually considering to move my test setup over this branch.
>>>>
>>>> This is really encouraging feedback! Btw, thanks for all kinds of
>>>> help on this patchset. :-)
>>>>
>>>>>
>>>>> However, split irqchip still has some issues: When I boot a q35 machine
>>>>> with Linux, the e1000 network adapter only gets a single IRQ delivered.
>>>>> Interestingly, other IOAPIC IRQs like the keyboard work all the time. I
>>>>> didn't debug this in details yet.
>>>>
>>>> I reproduced this problem. It seems that it fails even with
>>>> kernel-irqchip=off. Will try to dig it out.
>>>
>>> Very good. Hope it can be easily fixed.
>>
>> Hi, Jan,
>>
>> The above issue should be caused by EOI missing of level-triggered
>> interrupts. Before that, I was always using edge-triggered
>> interrupts for test, so didn't encounter this one. Would you please
>> help try below patch? It can be applied directly onto the series,
>> and should solve the issue (it works on my test vm, and I'll take it
>> in v5 as well if it also works for you):
>>
> 
> Works here as well. I even made EIM working with some hack, though
> Jailhouse spits out strange warnings, despite it works fine (x2apic
> mode, split irqchip).

Correction: the warnings are issued by qemu, not Jailhouse, e.g.

qemu-system-x86_64: VT-d Failed to remap interrupt for gsi 22.

I suspect that comes from the hand-over phase of Jailhouse, when it
mutes all interrupts in the system while reconfiguring IR and IOAPIC.

Please convert this error (in kvm_arch_fixup_msi_route) into a trace
point. It shall not annoy the host. Also check if you have more of such
guest-triggerable error messages.

Jan
Peter Xu April 26, 2016, 10:38 a.m. UTC | #3
On Tue, Apr 26, 2016 at 10:15:46AM +0200, Jan Kiszka wrote:
> On 2016-04-26 09:57, Jan Kiszka wrote:
> > On 2016-04-26 09:34, Peter Xu wrote:
> >> On Mon, Apr 25, 2016 at 09:24:12AM +0200, Jan Kiszka wrote:
> >>> On 2016-04-25 09:18, Peter Xu wrote:
> >>>> On Mon, Apr 25, 2016 at 07:16:19AM +0200, Jan Kiszka wrote:
> >>>>> On 2016-04-19 10:38, Peter Xu wrote:
> >>>>
> >>>> [...]
> >>>>
> >>>>>> By default, IR is disabled to be better compatible with current
> >>>>>> QEMU. To enable IR, we can using the following command to boot a
> >>>>>> IR-supported VM with virtio-net device with vhost (still do not
> >>>>>> support kvm-ioapic, so we need to specify kernel-irqchip={split|off}
> >>>>>> here):
> >>>>>>
> >>>>>> $ qemu-system-x86_64 -M q35,iommu=on,intr=on,kernel-irqchip=split \
> >>>>>
> >>>>> "intr" sounds a bit too much like "interrupt", not "interrupt
> >>>>> remapping". Why not use the kernel's form, "intremap"?
> >>>>
> >>>> Sure. It sounds nice to be aligned with the kernel one. Let me take
> >>>> it in v5.
> >>>>
> >>>>>
> >>>>>>      -enable-kvm -m 1024 \
> >>>>>> 	 -netdev tap,id=net0,vhost=on \
> >>>>>>      -device virtio-net-pci,netdev=net0 \
> >>>>>>      -monitor telnet::3333,server,nowait \
> >>>>>> 	 /var/lib/libvirt/images/vm1.qcow2
> >>>>>>
> >>>>>> When guest boots, we can verify whether IR enabled by grepping the
> >>>>>> dmesg like:
> >>>>>>
> >>>>>> [root@localhost ~]# journalctl -k | grep "DMAR-IR"
> >>>>>> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: IOAPIC id 0 under DRHD base  0xfed90000 IOMMU 0
> >>>>>> Feb 19 11:21:23 localhost.localdomain kernel: DMAR-IR: Enabled IRQ remapping in xapic mode
> >>>>>>
> >>>>>> Currently supported devices:
> >>>>>>
> >>>>>> - Emulated/Splitted irqchip
> >>>>>> - Generic PCI Devices
> >>>>>> - vhost devices
> >>>>>> - pass through device support? Not tested, but suppose it should work.
> >>>>>
> >>>>> I've tested this series against my Jailhouse setup, and it works pretty
> >>>>> well! Actually considering to move my test setup over this branch.
> >>>>
> >>>> This is really encouraging feedback! Btw, thanks for all kinds of
> >>>> help on this patchset. :-)
> >>>>
> >>>>>
> >>>>> However, split irqchip still has some issues: When I boot a q35 machine
> >>>>> with Linux, the e1000 network adapter only gets a single IRQ delivered.
> >>>>> Interestingly, other IOAPIC IRQs like the keyboard work all the time. I
> >>>>> didn't debug this in details yet.
> >>>>
> >>>> I reproduced this problem. It seems that it fails even with
> >>>> kernel-irqchip=off. Will try to dig it out.
> >>>
> >>> Very good. Hope it can be easily fixed.
> >>
> >> Hi, Jan,
> >>
> >> The above issue should be caused by EOI missing of level-triggered
> >> interrupts. Before that, I was always using edge-triggered
> >> interrupts for test, so didn't encounter this one. Would you please
> >> help try below patch? It can be applied directly onto the series,
> >> and should solve the issue (it works on my test vm, and I'll take it
> >> in v5 as well if it also works for you):
> >>
> > 
> > Works here as well. I even made EIM working with some hack, though
> > Jailhouse spits out strange warnings, despite it works fine (x2apic
> > mode, split irqchip).
> 
> Corrections: the warnings are issued by qemu, not Jailhouse, e.g.
> 
> qemu-system-x86_64: VT-d Failed to remap interrupt for gsi 22.
> 
> I suspect that comes from the hand-over phase of Jailhouse, when it
> mutes all interrupts in the system while reconfiguring IR and IOAPIC.
> 
> Please convert this error (in kvm_arch_fixup_msi_route) into a trace
> point. It shall not annoy the host. Also check if you have more of such
> guest-triggerable error messages.

Okay. This should be the only one. I can use trace instead.

Meanwhile, I still think we should not see it, even with
error_report(). Would this happen when booting, e.g., generic kernels?

-- peterx
Radim Krčmář April 26, 2016, 2:19 p.m. UTC | #4
2016-04-26 15:34+0800, Peter Xu:
> Hi, Jan,
> 
> The above issue should be caused by EOI missing of level-triggered
> interrupts. Before that, I was always using edge-triggered
> interrupts for test, so didn't encounter this one. Would you please
> help try below patch? It can be applied directly onto the series,
> and should solve the issue (it works on my test vm, and I'll take it
> in v5 as well if it also works for you):
> 
> -------------------------
> 
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> @@ -281,6 +281,36 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size)
> +/*
> + * This is to satisfy the hack in Linux kernel. One hack of it is to
> + * simulate clearing the Remote IRR bit of IOAPIC entry using the
> + * following:
> + *
> + * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
> + * Otherwise, we simulate the EOI message manually by changing the trigger
> + * mode to edge and then back to level, with RTE being masked during
> + * this."
> + *
> + * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
> + *
> + * This is based on the assumption that, Remote IRR bit will be
> + * cleared by IOAPIC hardware for edge-triggered interrupts (I
> + * believe that's what the IOAPIC version 0x1X hardware does).

I thought that Linux doesn't use explicit "EOI" to IO-APIC, but relies
on EOI broadcast from LAPIC -- does that change with IR?

> + *                                                             So
> + * if we are emulating it, we'd better do it the same here, so that
> + * the guest kernel hack will work as well on QEMU.

Totally.

> + * Without this, level-triggered interrupts in IR mode might fail to
> + * work correctly.

(I don't really understand why it worked before.)

> + */
> +static inline void
> +ioapic_fix_edge_remote_irr(uint64_t *entry)
> +{
> +    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
> +        /* Level triggered interrupts, make sure remote IRR is zero */
> +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);

(You can just unconditionally zero it, edge doesn't care.)

> +    }
> +}
> +
> @@ -314,6 +344,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
>                      s->ioredtbl[index] &= ~0xffffffffULL;
>                      s->ioredtbl[index] |= val;
>                  }
> +                ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);

I think this can be done only in the else branch of (s->ioregsel & 1).

(If the guest kernel does level->edge->level, then remote_irr probably
 should be cleared only on edge->level transition and not on
 level->level, but I haven't seen that in the spec ...)

>                  ioapic_service(s);
> ------------------------
> 
> I am still looking into guest part codes. Although the above patch
> should solve the issue, there are still issues in guest codes when
> IR is enabled:
> 
> - mismatched "vector" in IOAPIC entry and IRTE entry (this is
>   required in vt-d spec 5.1.5.1, and required to correctly deliver
>   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():

"required" is a way of saying that the opposite is undefined.
No need to think about it in IOMMU.

> - I encountered that level-triggered entries in IOAPIC is marked as
>   edge-triggered interrupt in APIC (which is strange)...

What/where do you mean?
(The only difference I know of is that level triggered vectors in LAPIC
 have their respective TMR bit set while edge do not.)

Thanks.
Peter Xu April 27, 2016, 7:29 a.m. UTC | #5
On Tue, Apr 26, 2016 at 04:19:00PM +0200, Radim Krčmář wrote:
> 2016-04-26 15:34+0800, Peter Xu:
> > Hi, Jan,
> > 
> > The above issue should be caused by EOI missing of level-triggered
> > interrupts. Before that, I was always using edge-triggered
> > interrupts for test, so didn't encounter this one. Would you please
> > help try below patch? It can be applied directly onto the series,
> > and should solve the issue (it works on my test vm, and I'll take it
> > in v5 as well if it also works for you):
> > 
> > -------------------------
> > 
> > diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> > @@ -281,6 +281,36 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size)
> > +/*
> > + * This is to satisfy the hack in Linux kernel. One hack of it is to
> > + * simulate clearing the Remote IRR bit of IOAPIC entry using the
> > + * following:
> > + *
> > + * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
> > + * Otherwise, we simulate the EOI message manually by changing the trigger
> > + * mode to edge and then back to level, with RTE being masked during
> > + * this."
> > + *
> > + * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
> > + *
> > + * This is based on the assumption that, Remote IRR bit will be
> > + * cleared by IOAPIC hardware for edge-triggered interrupts (I
> > + * believe that's what the IOAPIC version 0x1X hardware does).
> 
> I thought that Linux doesn't use explicit "EOI" to IO-APIC, but relies
> on EOI broadcast from LAPIC -- does that change with IR?

IIUC, ioapic_ack_level() is the one that handles EOI when IR is
disabled, and the EOI broadcast happens at:

	ack_APIC_irq();

After that, however, there are a few more lines:

	/*
	 * Tail end of clearing remote IRR bit (either by delivering the EOI
	 * message via io-apic EOI register write or simulating it using
	 * mask+edge followed by unnask+level logic) manually when the
	 * level triggered interrupt is seen as the edge triggered interrupt
	 * at the cpu.
	 */
	if (!(v & (1 << (i & 0x1f)))) {
		atomic_inc(&irq_mis_count);
		eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
	}

My understanding of the above is: first of all, we do the EOI
broadcast. However, if we find that a level-triggered interrupt was
treated as an edge-triggered one (which is exactly what I have
encountered below), we do one more explicit EOI in eoi_ioapic_pin(),
which plays the mask+edge/unmask+level trick for IOAPICs with
version 0x1X.

In the IR-enabled case, we just do both without checking (see
ioapic_ir_ack_level()).

So that's why I think this should not happen if either way works. In
other words, without this patch, neither the "EOI broadcast" nor the
"explicit EOI (hacky version)" works in the IR case. I am still
looking for the reason for the former (this patch fixes the latter).
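
The check "v & (1 << (i & 0x1f))" quoted above looks up the vector's
bit in the local APIC Trigger Mode Register. As a hedged sketch (the
model_* names are made up for illustration, and this is not kernel
code), the lookup can be written like this, assuming the usual layout
of eight 32-bit TMR registers starting at offset 0x180 and spaced 0x10
apart:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_APIC_TMR_BASE 0x180u   /* offset of the first TMR register */

/* Offset of the 32-bit TMR register that holds the bit for `vector`. */
static uint32_t model_tmr_reg_offset(uint8_t vector)
{
    /* (vector / 32) registers further on, each 0x10 bytes apart. */
    return MODEL_APIC_TMR_BASE + ((vector & ~0x1fu) >> 1);
}

/* True if the local APIC recorded `vector` as level-triggered. */
static bool model_vector_is_level(const uint32_t tmr[8], uint8_t vector)
{
    assert(tmr != NULL);
    return (tmr[vector >> 5] >> (vector & 0x1f)) & 1u;
}

/*
 * Mirrors the condition in the quoted snippet: the extra explicit EOI
 * is only needed when a level-triggered IOAPIC pin was seen as
 * edge-triggered at the CPU, i.e. when its TMR bit is clear.
 */
static bool model_needs_explicit_eoi(const uint32_t tmr[8], uint8_t vector)
{
    return !model_vector_is_level(tmr, vector);
}
```

For vector 0x31, for instance, the bit lives in the second TMR register
(offset 0x190) at bit position 0x11.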

> 
> > + *                                                             So
> > + * if we are emulating it, we'd better do it the same here, so that
> > + * the guest kernel hack will work as well on QEMU.
> 
> Totally.
> 
> > + * Without this, level-triggered interrupts in IR mode might fail to
> > + * work correctly.
> 
> (I don't really understand why it worked before.)

Yes. What I actually want to try is to take a machine with real IOMMU
hardware, plug an e1000 (I mean real hardware) into it, and see
whether the current Linux kernel IOMMU driver copes well with
level-triggered devices (I suppose this scenario is rarely used, since
level-triggered interrupts are mostly legacy, IIUC).

> 
> > + */
> > +static inline void
> > +ioapic_fix_edge_remote_irr(uint64_t *entry)
> > +{
> > +    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
> > +        /* Level triggered interrupts, make sure remote IRR is zero */
> > +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
> 
> (You can just unconditionally zero it, edge doesn't care.)

Ah! I made a mistake. I suppose what I really want is:

+    if (!(*entry & IOAPIC_LVT_TRIGGER_MODE)) {
+        /* Edge-triggered interrupts, make sure remote IRR is zero */
+        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
+    }

Though both should do the trick, I will use this new one in v5.

> 
> > +    }
> > +}
> > +
> > @@ -314,6 +344,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
> >                      s->ioredtbl[index] &= ~0xffffffffULL;
> >                      s->ioredtbl[index] |= val;
> >                  }
> > +                ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
> 
> I think this can be done only in the else branch of (s->ioregsel & 1).

Yes. I can move it there, but then there is a hidden assumption (or
rather, a fact...) that these magic bits live in entry bits 31-0, and
readers who do not know that might be confused. IMHO, for better
readability, I would still prefer to put it here (it means "we need
to make sure the entry satisfies some rule, but we do not need to
know what the rule is"). If you still insist, I'd take your advice
though. :)

> 
> (If the guest kernel does level->edge->level, then remote_irr probably
>  should be cleared only on edge->level transition and not on
>  level->level, but I haven't seen that in the spec ...)

Agreed. That's what my diff above is trying to fix. Thanks for pointing it out.

> 
> >                  ioapic_service(s);
> > ------------------------
> > 
> > I am still looking into guest part codes. Although the above patch
> > should solve the issue, there are still issues in guest codes when
> > IR is enabled:
> > 
> > - mismatched "vector" in IOAPIC entry and IRTE entry (this is
> >   required in vt-d spec 5.1.5.1, and required to correctly deliver
> >   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():
> 
> "required" is a way of saying that the opposite is undefined.
> No need to think about it in IOMMU.

Why? Without correct vector information, the IOAPIC cannot know
which entry's Remote IRR bit to clear (please check
ioapic_eoi_broadcast()).
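
To make the concern concrete, here is a simplified, hypothetical model
of an EOI broadcast handler. It is only a sketch of the idea and not
QEMU's ioapic_eoi_broadcast(): it assumes a 24-pin table and that
entries are matched purely by the vector field in bits 7-0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NUM_PINS   24            /* assumed table size */
#define RTE_VECTOR_MASK  0xffULL
#define RTE_REMOTE_IRR   (1ULL << 14)

/*
 * Clear Remote IRR on every entry whose vector field matches the
 * broadcast vector. Returns the number of entries that were cleared.
 */
static int model_eoi_broadcast(uint64_t ioredtbl[MODEL_NUM_PINS],
                               uint8_t vector)
{
    int cleared = 0;

    assert(ioredtbl != NULL);
    for (int pin = 0; pin < MODEL_NUM_PINS; pin++) {
        if ((ioredtbl[pin] & RTE_VECTOR_MASK) != vector) {
            continue;                  /* different vector: not ours */
        }
        if (ioredtbl[pin] & RTE_REMOTE_IRR) {
            ioredtbl[pin] &= ~RTE_REMOTE_IRR;
            cleared++;
        }
    }
    return cleared;
}
```

If the entry for pin 22 carries the pin number as its vector while the
CPU acknowledges the IRTE's vector, the broadcast matches nothing in
this model and Remote IRR stays set.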

> 
> > - I encountered that level-triggered entries in IOAPIC is marked as
> >   edge-triggered interrupt in APIC (which is strange)...
> 
> What/where do you mean?
> (The only difference I know of is that level triggered vectors in LAPIC
>  have their respective TMR bit set while edge do not.)

Exactly. Here is what I mean:

static void apic_eoi(APICCommonState *s)
{
    int isrv;
    isrv = get_highest_priority_int(s->isr);
    if (isrv < 0)
        return;
    apic_reset_bit(s->isr, isrv);
    if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && apic_get_bit(s->tmr, isrv)) {
        ioapic_eoi_broadcast(isrv);
    }
    apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
    apic_update_irq(s);
}

The APIC notifies the IOAPIC only if the TMR bit for the corresponding
vector is set (see "apic_get_bit(s->tmr, isrv)"), that is, only if the
interrupt is level-triggered according to the APIC registers. What I
have traced is that the EOI broadcast is missing because this bit is
cleared in the APIC TMR when it should be set. I need some more tests
to double-check this though, in case I made a mistake.

(P.S. I saw some similar comments in the kernel code nearby; please
check the long comment in ioapic_ack_level(). Not sure whether these
are related.)

Thanks!

-- peterx
Radim Krčmář April 27, 2016, 2:31 p.m. UTC | #6
2016-04-27 15:29+0800, Peter Xu:
> On Tue, Apr 26, 2016 at 04:19:00PM +0200, Radim Krčmář wrote:
>> 2016-04-26 15:34+0800, Peter Xu:
>> > diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
>> > @@ -281,6 +281,36 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size)
>> > +/*
>> > + * This is to satisfy the hack in Linux kernel. One hack of it is to
>> > + * simulate clearing the Remote IRR bit of IOAPIC entry using the
>> > + * following:
>> > + *
>> > + * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
>> > + * Otherwise, we simulate the EOI message manually by changing the trigger
>> > + * mode to edge and then back to level, with RTE being masked during
>> > + * this."
>> > + *
>> > + * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
>> > + *
>> > + * This is based on the assumption that, Remote IRR bit will be
>> > + * cleared by IOAPIC hardware for edge-triggered interrupts (I
>> > + * believe that's what the IOAPIC version 0x1X hardware does).
>> 
>> I thought that Linux doesn't use explicit "EOI" to IO-APIC, but relies
>> on EOI broadcast from LAPIC -- does that change with IR?
> 
> IIUC, ioapic_ack_level() should be the one to handle EOI when IR is
> disabled. And, the EOI broadcast should be happening at:
> 
> 	ack_APIC_irq();
> 
> While, after that, we can see some more lines:
> 
> 	/*
> 	 * Tail end of clearing remote IRR bit (either by delivering the EOI
> 	 * message via io-apic EOI register write or simulating it using
> 	 * mask+edge followed by unnask+level logic) manually when the
> 	 * level triggered interrupt is seen as the edge triggered interrupt
> 	 * at the cpu.
> 	 */
> 	if (!(v & (1 << (i & 0x1f)))) {
> 		atomic_inc(&irq_mis_count);
> 		eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
> 	}
> 
> What I understand the above is that: first of all, we will do EOI
> broadcast. However, if we found that one level-triggered interrupt
> is treated as edge-triggered interrupt (that is exactly what I have
> encountered below), we will do one more explicit EOI in
> eoi_ioapic_pin(), in which we played the edge-mask/level-unmask
> trick for IOAPIC with version 0x1X.

Indeed, thanks for the explanation.

> For IR enabled case, we just do both without checking (see
> ioapic_ir_ack_level()).

(IR with IO-APIC below version 0x20 probably does not exist in the wild.
 I don't see any reason why the interaction would be buggy, though.)

>> > + */
>> > +static inline void
>> > +ioapic_fix_edge_remote_irr(uint64_t *entry)
>> > +{
>> > +    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
>> > +        /* Level triggered interrupts, make sure remote IRR is zero */
>> > +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
>> 
>> (You can just unconditionally zero it, edge doesn't care.)
> 
> Ah! I made a mistake. I suppose what I really want is:
> 
> +    if (!(*entry & IOAPIC_LVT_TRIGGER_MODE)) {
> +        /* Edge-triggered interrupts, make sure remote IRR is zero */
> +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
> +    }
> 
> Though both should help do the trick, I should be using this new
> one in v5.

(You'd need to look at the old value for this to work.)

>> > @@ -314,6 +344,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
>> >                      s->ioredtbl[index] &= ~0xffffffffULL;
>> >                      s->ioredtbl[index] |= val;
>> >                  }
>> > +                ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
>> 
>> I think this can be done only in the else branch of (s->ioregsel & 1).
> 
> Yes. I can move it there, but then there is a hidden assumption (or
> rather, a fact...) that these magic bits are inside entry bits 31-0,
> and people might be confused if they do not know that.  IMHO, for
> better readability of the code, I would still prefer to put it here
> (it means "we need to make sure the entry satisfies some kind of
> rule, but we do not need to know further what the rule is"). If you
> still insist, I'd like to take your advice though. :)

I don't.  If you clear it only on edge->level transition, then those two
also behave the same.
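
A minimal sketch of the edge->level variant, assuming the usual
redirection entry layout (remote IRR in bit 14, trigger mode in bit 15).
The macro and helper names are hypothetical and not taken from the
patch; the point is only that the old value has to be passed in:

```c
#include <stdint.h>

/* Assumed bit positions in the low 32 bits of a redirection entry. */
#define SKETCH_REMOTE_IRR   (1ULL << 14)
#define SKETCH_TRIGGER_MODE (1ULL << 15)   /* 1 = level, 0 = edge */

/*
 * Hypothetical helper: given the entry before and after a guest write,
 * return the value to store. Remote IRR is cleared only when the entry
 * moves from edge-triggered to level-triggered.
 */
static inline uint64_t
sketch_fix_remote_irr(uint64_t old_entry, uint64_t new_entry)
{
    int was_edge = !(old_entry & SKETCH_TRIGGER_MODE);
    int is_level = !!(new_entry & SKETCH_TRIGGER_MODE);

    if (was_edge && is_level) {
        new_entry &= ~SKETCH_REMOTE_IRR;
    }
    return new_entry;
}
```

With this shape the upper-half write never changes the trigger mode, so
it does not matter whether the helper is called for both halves or only
for the lower one.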

>> > I am still looking into guest part codes. Although the above patch
>> > should solve the issue, there are still issues in guest codes when
>> > IR is enabled:
>> > 
>> > - mismatched "vector" in IOAPIC entry and IRTE entry (this is
>> >   required in vt-d spec 5.1.5.1, and required to correctly deliver
>> >   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():
>> 
>> "required" is a way of saying that the opposite is undefined.
>> No need to think about it in IOMMU.
> 
> Why? Without correct vector information, IOAPIC will not be able to
> know which entry to clear the Remote IRR bit (please check
> ioapic_eoi_broadcast())?

IOAPIC won't get correct EOI and Intel made it into an OS bug, because
there was no good action that the hardware could take.  (We have a lot
more freedom, but I think that partially fixing something that doesn't
work on real hardware is a wasted effort.)

Or did you mean that mismatched vector is a possible source of the fixed
bug?  (I originally dismissed it, because real hardware works.)

>> > - I encountered that level-triggered entries in IOAPIC is marked as
>> >   edge-triggered interrupt in APIC (which is strange)...
>> 
>> What/where do you mean?
>> (The only difference I know of is that level triggered vectors in LAPIC
>>  have their respective TMR bit set while edge do not.)
> 
> Exactly. Here is what I mean:
> 
> static void apic_eoi(APICCommonState *s)
> {
>     int isrv;
>     isrv = get_highest_priority_int(s->isr);
>     if (isrv < 0)
>         return;
>     apic_reset_bit(s->isr, isrv);
>     if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && apic_get_bit(s->tmr, isrv)) {
>         ioapic_eoi_broadcast(isrv);
>     }
>     apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
>     apic_update_irq(s);
> }
> 
> APIC will notify IOAPIC only if the corresponding vector in TMR bit
> is set (in "apic_get_bit(s->tmr, isrv)", or say, it's a
> level-triggered interrupt in APIC registers). What I have traced is
> that, the EOI broadcast is missing because this bit is cleared in
> APIC TMR while it should be set. I need some more tests to double
> confirm this though, in case I made any mistake.

(There are two "legal" situations where TMR can be 0 and IOAPIC sets
 remote IRR -- if edge and level interrupts are assigned to the same
 vector and if IOAPIC is level while IR and OS edge, both would bug on
 real hardware too ...)

Does QEMU bug with TCG?

> (P.S. Actually I saw some similar comments in kernel codes around,
> please check the long comments in ioapic_ack_level().  Not sure
> whether these are related.)

I hope we didn't emulate the hardware bug. :)
Peter Xu April 28, 2016, 5:27 a.m. UTC | #7
On Wed, Apr 27, 2016 at 04:31:13PM +0200, Radim Krčmář wrote:

[...]

> >> > I am still looking into guest part codes. Although the above patch
> >> > should solve the issue, there are still issues in guest codes when
> >> > IR is enabled:
> >> > 
> >> > - mismatched "vector" in IOAPIC entry and IRTE entry (this is
> >> >   required in vt-d spec 5.1.5.1, and required to correctly deliver
> >> >   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():
> >> 
> >> "required" is a way of saying that the opposite is undefined.
> >> No need to think about it in IOMMU.
> > 
> > Why? Without correct vector information, IOAPIC will not be able to
> > know which entry to clear the Remote IRR bit (please check
> > ioapic_eoi_broadcast())?
> 
> IOAPIC won't get correct EOI and Intel made it into an OS bug, because
> there was no good action that the hardware could take.  (We have a lot
> more freedom, but I think that partially fixing something that doesn't
> work on real hardware is a wasted effort.)

To make sure I understand this correctly... Do you mean that real
IOAPIC hardware will not handle this EOI broadcast correctly even if
we fill in matched vector in the IOAPIC entry with IRTE one (when IR
is enabled)?

I'd appreciate if there is any link or anything that can provide me
more background on this matter.. TIA.

> 
> Or did you mean that mismatched vector is a possible source of the fixed
> bug?  (I originally dismissed it, because real hardware works.)

Nope. The above patch fixes the hack for "explicit IOAPIC EOI", and I
suppose the mismatched vector issue will cause an "EOI broadcast"
problem. But IIUC from your comment above, we can skip this "issue"
for now, if it won't work on real hardware even when the vectors are
matched.

Anyway, as long as the explicit EOI works, we can survive. And this
gives me the reason to send v5 first.

> 
> >> > - I encountered that level-triggered entries in IOAPIC is marked as
> >> >   edge-triggered interrupt in APIC (which is strange)...
> >> 
> >> What/where do you mean?
> >> (The only difference I know of is that level triggered vectors in LAPIC
> >>  have their respective TMR bit set while edge do not.)
> > 
> > Exactly. Here is what I mean:
> > 
> > static void apic_eoi(APICCommonState *s)
> > {
> >     int isrv;
> >     isrv = get_highest_priority_int(s->isr);
> >     if (isrv < 0)
> >         return;
> >     apic_reset_bit(s->isr, isrv);
> >     if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && apic_get_bit(s->tmr, isrv)) {
> >         ioapic_eoi_broadcast(isrv);
> >     }
> >     apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
> >     apic_update_irq(s);
> > }
> > 
> > APIC will notify IOAPIC only if the corresponding vector in TMR bit
> > is set (in "apic_get_bit(s->tmr, isrv)", or say, it's a
> > level-triggered interrupt in APIC registers). What I have traced is
> > that, the EOI broadcast is missing because this bit is cleared in
> > APIC TMR while it should be set. I need some more tests to double
> > confirm this though, in case I made any mistake.
> 
> (There are two "legal" situations where TMR can be 0 and IOAPIC sets
>  remote IRR -- if edge and level interrupts are assigned to the same
>  vector and if IOAPIC is level while IR and OS edge, both would bug on
>  real hardware too ...)
> 
> Does QEMU bug with TCG?

Gave it a shot today. It happens as well.

Thanks,

-- peterx
Peter Xu April 28, 2016, 6:06 a.m. UTC | #8
On Wed, Apr 27, 2016 at 04:31:13PM +0200, Radim Krčmář wrote:
> >> > + */
> >> > +static inline void
> >> > +ioapic_fix_edge_remote_irr(uint64_t *entry)
> >> > +{
> >> > +    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
> >> > +        /* Level triggered interrupts, make sure remote IRR is zero */
> >> > +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
> >> 
> >> (You can just unconditionally zero it, edge doesn't care.)
> > 
> > Ah! I made a mistake. I suppose what I really want is:
> > 
> > +    if (!(*entry & IOAPIC_LVT_TRIGGER_MODE)) {
> > +        /* Edge-triggered interrupts, make sure remote IRR is zero */
> > +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
> > +    }
> > 
> > Though both should help do the trick, I should be using this new
> > one in v5.
> 
> (You'd need to look at the old value for this to work.)

Yes, you are right. The problem is that we actually have RW
permission for the remote IRR bit in the emulated IOAPIC. If so, I'd
rather take the original version, and unconditionally zero it, as you
have advised (also, will fix up the comments to get them aligned).

-- peterx
Peter Xu April 28, 2016, 6:44 a.m. UTC | #9
On Thu, Apr 28, 2016 at 02:06:17PM +0800, Peter Xu wrote:
> On Wed, Apr 27, 2016 at 04:31:13PM +0200, Radim Krčmář wrote:
> > >> > + */
> > >> > +static inline void
> > >> > +ioapic_fix_edge_remote_irr(uint64_t *entry)
> > >> > +{
> > >> > +    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
> > >> > +        /* Level triggered interrupts, make sure remote IRR is zero */
> > >> > +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
> > >> 
> > >> (You can just unconditionally zero it, edge doesn't care.)
> > > 
> > > Ah! I made a mistake. I suppose what I really want is:
> > > 
> > > +    if (!(*entry & IOAPIC_LVT_TRIGGER_MODE)) {
> > > +        /* Edge-triggered interrupts, make sure remote IRR is zero */
> > > +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
> > > +    }
> > > 
> > > Though both should help do the trick, I should be using this new
> > > one in v5.
> > 
> > (You'd need to look at the old value for this to work.)
> 
> Yes, you are right. The problem is that we actually have RW
> permission for the remote IRR bit in the emulated IOAPIC. If so, I'd
> rather take the original version, and unconditionally zero it, as you
> have advised (also, will fix up the comments to get them aligned).

On second thought, a better idea (though it may need several more
lines of code) is to make sure the RO bits in the IOAPIC entry are
read-only (I mean, "real" read-only) before the above hack. I
suppose this matches real hardware behavior more closely.
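
A rough sketch of that idea, assuming delivery status (bit 12) and
remote IRR (bit 14) are the read-only bits, and reading "the hack" as
clearing remote IRR for edge-triggered entries. The macro and helper
names are made up for illustration and may well differ from what v5
ends up using:

```c
#include <stdint.h>

/* Assumed bit positions in the low 32 bits of a redirection entry. */
#define SKETCH_DELIV_STATUS (1ULL << 12)
#define SKETCH_REMOTE_IRR   (1ULL << 14)
#define SKETCH_TRIGGER_MODE (1ULL << 15)   /* 1 = level, 0 = edge */
#define SKETCH_RO_BITS      (SKETCH_DELIV_STATUS | SKETCH_REMOTE_IRR)

/*
 * Hypothetical helper modelling a guest write to the low 32 bits of an
 * entry. The read-only bits keep their previous value no matter what
 * the guest wrote; after that, remote IRR is dropped if the entry is
 * edge-triggered.
 */
static inline uint64_t
sketch_entry_write_low(uint64_t old_entry, uint32_t val)
{
    uint64_t ro_saved = old_entry & SKETCH_RO_BITS;
    uint64_t entry = (old_entry & ~0xffffffffULL) | val;

    /* Restore the read-only bits from the old value. */
    entry = (entry & ~SKETCH_RO_BITS) | ro_saved;

    /* Edge-triggered entries never keep remote IRR set. */
    if (!(entry & SKETCH_TRIGGER_MODE)) {
        entry &= ~SKETCH_REMOTE_IRR;
    }
    return entry;
}
```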

Let me send v5 directly so you can see the code.

Thanks,

-- peterx
Radim Krčmář April 28, 2016, 4:24 p.m. UTC | #10
2016-04-28 13:27+0800, Peter Xu:
> On Wed, Apr 27, 2016 at 04:31:13PM +0200, Radim Krčmář wrote:
> 
> [...]
> 
>> >> > I am still looking into guest part codes. Although the above patch
>> >> > should solve the issue, there are still issues in guest codes when
>> >> > IR is enabled:
>> >> > 
>> >> > - mismatched "vector" in IOAPIC entry and IRTE entry (this is
>> >> >   required in vt-d spec 5.1.5.1, and required to correctly deliver
>> >> >   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():
>> >> 
>> >> "required" is a way of saying that the opposite is undefined.
>> >> No need to think about it in IOMMU.
>> > 
>> > Why? Without correct vector information, IOAPIC will not be able to
>> > know which entry to clear the Remote IRR bit (please check
>> > ioapic_eoi_broadcast())?
>> 
>> IOAPIC won't get correct EOI and Intel made it into an OS bug, because
>> there was no good action that the hardware could take.  (We have a lot
>> more freedom, but I think that partially fixing something that doesn't
>> work on real hardware is a wasted effort.)
> 
> To make sure I understand this correctly... Do you mean that real
> IOAPIC hardware will not handle this EOI broadcast correctly even if
> we fill in matched vector in the IOAPIC entry with IRTE one (when IR
> is enabled)?

No, if the OS configures same vector in IR and IOAPIC, then EOI
broadcast will work just fine.

My point was that the OS *must* do it that way.  If the OS doesn't, then
hardware's behavior is undefined = everything that happens is correct.
QEMU/KVM just shouldn't bug.  I think that QEMU even behaves pretty much
like real hardware here, so doing nothing now is the best choice.

> I'd appreciate if there is any link or anything that can provide me
> more background on this matter.. TIA.

Hm, I only read the specs ...

LAPIC EOI broadcast doesn't distinguish whether IOAPIC or IR injected
the interrupt and notifies IOAPICs with the vector in ISR.  The vector
doesn't provide enough information for a unique mapping between IOAPIC
and IR entries, so IOAPIC just clears Remote IRR bits of the vector.
There is no nice solution if you allow different vectors, so the
hardware doesn't.
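
To illustrate, here is a simplified model of the IOAPIC side of an EOI
broadcast. It is not QEMU's ioapic_eoi_broadcast(); the names and the
bit layout (vector in bits 7-0, remote IRR in bit 14) are assumptions
made for the sketch:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VECTOR_MASK  0xffULL
#define SKETCH_REMOTE_IRR   (1ULL << 14)
#define SKETCH_TRIGGER_MODE (1ULL << 15)   /* 1 = level, 0 = edge */

/*
 * Hypothetical model: the only information carried by the broadcast is
 * the vector, so every entry programmed with that vector gets its
 * remote IRR cleared. Returns how many entries were cleared.
 */
static int
sketch_eoi_broadcast(uint64_t *redtbl, size_t n, uint8_t vector)
{
    int cleared = 0;

    for (size_t i = 0; i < n; i++) {
        if ((redtbl[i] & SKETCH_VECTOR_MASK) != vector) {
            continue;   /* entry uses another vector, not ours */
        }
        if (redtbl[i] & SKETCH_REMOTE_IRR) {
            redtbl[i] &= ~SKETCH_REMOTE_IRR;
            cleared++;
        }
    }
    return cleared;
}
```

If the IOAPIC entry holds one vector and the IRTE delivers another, the
broadcast carries the IRTE's vector, the comparison never matches, and
remote IRR stays set.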

>> Or did you mean that mismatched vector is a possible source of the fixed
>> bug?  (I originally dismissed it, because real hardware works.)
> 
> Nope. The above patch fixes the hack for "explicit IOAPIC EOI", and I
> suppose the mismatched vector issue will cause an "EOI broadcast"
> problem. But IIUC from your comment above, we can skip this "issue"
> for now, if it won't work on real hardware even when the vectors are
> matched.
> 
> Anyway, as long as the explicit EOI works, we can survive. And this
> gives me the reason to send v5 first.

Yep.  EOI broadcast has to work in some cases, though, I'm sorry if I
said the opposite.

Patch

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index b41ab89..de6a8cf 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -281,6 +281,36 @@  ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size)
     return val;
 }

+/*
+ * This is to satisfy the hack in Linux kernel. One hack of it is to
+ * simulate clearing the Remote IRR bit of IOAPIC entry using the
+ * following:
+ *
+ * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
+ * Otherwise, we simulate the EOI message manually by changing the trigger
+ * mode to edge and then back to level, with RTE being masked during
+ * this."
+ *
+ * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
+ *
+ * This is based on the assumption that, Remote IRR bit will be
+ * cleared by IOAPIC hardware for edge-triggered interrupts (I
+ * believe that's what the IOAPIC version 0x1X hardware does). So
+ * if we are emulating it, we'd better do it the same here, so that
+ * the guest kernel hack will work as well on QEMU.
+ *
+ * Without this, level-triggered interrupts in IR mode might fail to
+ * work correctly.
+ */
+static inline void
+ioapic_fix_edge_remote_irr(uint64_t *entry)
+{
+    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
+        /* Level triggered interrupts, make sure remote IRR is zero */
+        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
+    }
+}
+
 static void
 ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                  unsigned int size)
@@ -314,6 +344,7 @@  ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                     s->ioredtbl[index] &= ~0xffffffffULL;
                     s->ioredtbl[index] |= val;
                 }
+                ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
                 ioapic_service(s);
             }
         }