
[PULL,22/55] q35: ioapic: add support for emulated IOAPIC IR

Message ID 20160719014441-mutt-send-email-mst@redhat.com
State New, archived

Commit Message

Michael S. Tsirkin July 18, 2016, 10:44 p.m. UTC
From: Peter Xu <peterx@redhat.com>

This patch translates all IOAPIC interrupts into MSIs. A pseudo IOAPIC
address space is added to carry the MSI messages. By default, it is the
system memory address space; when interrupt remapping (IR) is enabled,
it is the IOMMU address space.

Currently, only the emulated IOAPIC is supported.

Idea suggested by Jan Kiszka and Rita Sinha in the following patch:

https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg01933.html

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/hw/i386/apic-msidef.h     |  1 +
 include/hw/i386/ioapic_internal.h |  1 +
 include/hw/i386/pc.h              |  4 ++++
 hw/i386/intel_iommu.c             |  6 +++++-
 hw/i386/pc.c                      |  3 +++
 hw/intc/ioapic.c                  | 28 ++++++++++++++++++++++++----
 6 files changed, 38 insertions(+), 5 deletions(-)
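To make the bit shuffling in the patched ioapic_service() (hw/intc/ioapic.c hunk of the patch) easier to follow, here is a standalone sketch of the same redirection-entry-to-MSI translation. It is an approximation outside QEMU, not QEMU code: IOAPIC_LVT_DEST_IDX_SHIFT and MSI_ADDR_DEST_IDX_SHIFT come from this patch, while the other shift and mask values and APIC_DEFAULT_ADDRESS (0xfee00000) are assumed to match QEMU's headers of that time. struct msi_msg and rte_to_msi() are hypothetical names; the sketch returns the (addr, data) pair instead of writing it with stl_le_phys().

```c
#include <assert.h>
#include <stdint.h>

/* Values added by this patch. */
#define IOAPIC_LVT_DEST_IDX_SHIFT      48
#define MSI_ADDR_DEST_IDX_SHIFT        4

/* Assumed to match QEMU's ioapic_internal.h, apic-msidef.h and cpu.h. */
#define IOAPIC_LVT_TRIGGER_MODE_SHIFT  15
#define IOAPIC_LVT_DEST_MODE_SHIFT     11
#define IOAPIC_LVT_DELIV_MODE_SHIFT    8
#define IOAPIC_DM_MASK                 0x7
#define IOAPIC_VECTOR_MASK             0xff
#define MSI_ADDR_DEST_MODE_SHIFT       2
#define MSI_DATA_VECTOR_SHIFT          0
#define MSI_DATA_DELIVERY_MODE_SHIFT   8
#define MSI_DATA_TRIGGER_SHIFT         15
#define APIC_DEFAULT_ADDRESS           0xfee00000u

/* Hypothetical container for the address/data pair that would be written. */
struct msi_msg {
    uint32_t addr;
    uint32_t data;
};

/* Translate one unmasked IOAPIC redirection entry into an MSI, following
 * the arithmetic of the patched ioapic_service(). */
static struct msi_msg rte_to_msi(uint64_t entry)
{
    /* Bits 48..63: dest_id[8] + reserved[8], or, with IR, the
     * interrupt format bit (48) + interrupt_index[14:0] (49..63). */
    uint32_t dest_idx = (uint16_t)(entry >> IOAPIC_LVT_DEST_IDX_SHIFT);
    uint32_t dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
    uint32_t trig_mode = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
    uint32_t delivery_mode =
        (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
    uint32_t vector = entry & IOAPIC_VECTOR_MASK;
    struct msi_msg msg;

    msg.addr = APIC_DEFAULT_ADDRESS |
               (dest_idx << MSI_ADDR_DEST_IDX_SHIFT) |
               (dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
    msg.data = (vector << MSI_DATA_VECTOR_SHIFT) |
               (trig_mode << MSI_DATA_TRIGGER_SHIFT) |
               (delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);

    /* A 16-bit index shifted left by 4 stays inside address bits 4..19. */
    assert((msg.addr & 0xfff00000u) == APIC_DEFAULT_ADDRESS);
    return msg;
}
```

With destination APIC ID 1 in bits 56-63 and vector 0x30, the sketch yields addr 0xfee01000 and data 0x30. In compatibility format bits 48-55 are reserved (zero), so shifting the 16-bit field left by 4 puts the APIC ID in address bits 12-19, exactly where MSI_ADDR_DEST_ID_SHIFT (12) expects it; with IR enabled, the same shift instead carries the interrupt format bit and the remapping index into the address.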

Comments

Emilio Cota Nov. 11, 2016, 5:18 p.m. UTC | #1
On Tue, Jul 19, 2016 at 01:44:41 +0300, Michael S. Tsirkin wrote:
> From: Peter Xu <peterx@redhat.com>
> 
> This patch translates all IOAPIC interrupts into MSIs. A pseudo IOAPIC
> address space is added to carry the MSI messages. By default, it is the
> system memory address space; when interrupt remapping (IR) is enabled,
> it is the IOMMU address space.
> 
> Currently, only the emulated IOAPIC is supported.
> 
> Idea suggested by Jan Kiszka and Rita Sinha in the following patch:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg01933.html
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/hw/i386/apic-msidef.h     |  1 +
>  include/hw/i386/ioapic_internal.h |  1 +
>  include/hw/i386/pc.h              |  4 ++++
>  hw/i386/intel_iommu.c             |  6 +++++-
>  hw/i386/pc.c                      |  3 +++
>  hw/intc/ioapic.c                  | 28 ++++++++++++++++++++++++----
>  6 files changed, 38 insertions(+), 5 deletions(-)

This commit (which sits between 2.6 and 2.7) doesn't let me boot a
buildroot-generated x86_64 image when QEMU is configured with
--with-coroutine=gthread (it deadlocks on the BQL shortly after
the framebuffer comes up.)

Is this something we should worry about? I see in the configure
script that --with-coroutine=gthread "is not functional enough to run
QEMU proper". My goal is to use thread sanitizer (tsan) to test
mttcg for x86-64. Unfortunately, tsan blows up with ucontext coroutines.

Thanks,

		Emilio
Emilio Cota Nov. 11, 2016, 7:50 p.m. UTC | #2
On Fri, Nov 11, 2016 at 12:18:04 -0500, Emilio G. Cota wrote:
> This commit (which sits between 2.6 and 2.7)

Forgot to add the commit id -- cb135f59b8059c3a3

		E.
Peter Xu Nov. 11, 2016, 11:17 p.m. UTC | #3
Hi, Emilio,

On Fri, Nov 11, 2016 at 12:18:04PM -0500, Emilio G. Cota wrote:
> This commit (which sits between 2.6 and 2.7) doesn't let me boot a
> buildroot-generated x86_64 image when QEMU is configured with
> --with-coroutine=gthread (it deadlocks on the BQL shortly after
> the framebuffer comes up.)
> 
> Is this something we should worry about? I see in the configure
> script that --with-coroutine=gthread "is not functional enough to run
> QEMU proper". My goal is to use thread sanitizer (tsan) to test
> mttcg for x86-64. Unfortunately, tsan blows up with ucontext coroutines.

I tried to build QEMU using:

  ../configure --target-list=x86_64-softmmu --with-coroutine=gthread

with the above commit. The QEMU binary boots fine with either KVM or TCG
(with no QEMU parameters, so only the BIOS comes up). However, if I provide
a disk image to the VM, the KVM version works but TCG doesn't.

Is this the same error you have encountered?

I also tested the commit preceding the above one (09cd058a2c,
"intel_iommu: get rid of {0} initializers") with exactly the same build
parameters, and it has the same problem (the TCG version cannot boot the
guest kernel if I provide a disk as a parameter).

Do we still support gthread as a coroutine backend? And to what extent
do we support it?

Thanks,

-- peterx
Emilio Cota Nov. 12, 2016, 2:04 a.m. UTC | #4
On Fri, Nov 11, 2016 at 18:17:05 -0500, Peter Xu wrote:
> > This commit (which sits between 2.6 and 2.7) doesn't let me boot a
> > buildroot-generated x86_64 image when QEMU is configured with
> > --with-coroutine=gthread (it deadlocks on the BQL shortly after
> > the framebuffer comes up.)
> > 
> > Is this something we should worry about? I see in the configure
> > script that --with-coroutine=gthread "is not functional enough to run
> > QEMU proper". My goal is to use thread sanitizer (tsan) to test
> > mttcg for x86-64. Unfortunately, tsan blows up with ucontext coroutines.
> 
> I tried to build QEMU using:
> 
>   ../configure --target-list=x86_64-softmmu --with-coroutine=gthread
> 
> with the above commit. The QEMU binary boots fine with either KVM or TCG
> (with no QEMU parameters, so only the BIOS comes up). However, if I provide
> a disk image to the VM, the KVM version works but TCG doesn't.
> 
> Is this the same error you have encountered?

KVM works fine for me in all cases.

With TCG, QEMU freezes while booting Linux, when the fb comes up (right
after the resolution changes). This is the last output I see from the kernel:
  http://imgur.com/YWHUM9x

I tried booting with -nographic but it still freezes.

I'm booting a buildroot-generated image with:

x86_64-softmmu/qemu-system-x86_64 -no-reboot -M pc \
	-kernel /path/to/buildroot/output/images/bzImage \
	-drive file=/path/to/buildroot/output/images/rootfs.ext2,if=virtio,format=raw \
	-append 'root=/dev/vda' -net nic,model=virtio -net user,hostfwd=tcp::10022-:22 \
	-smp 1 -m 4G

> I also tested the commit preceding the above one (09cd058a2c,
> "intel_iommu: get rid of {0} initializers") with exactly the same build
> parameters, and it has the same problem (the TCG version cannot boot the
> guest kernel if I provide a disk as a parameter).

Hmm that's interesting, on my end 09cd05^ works well 100% of the time
w/ TCG.

> Do we still support gthread as a coroutine backend? And to what extent
> do we support it?

I don't know :( That's why I sent the message.

I'll investigate further.

Thanks,

		Emilio
Alex Bennée Nov. 12, 2016, 11:01 a.m. UTC | #5
Emilio G. Cota <cota@braap.org> writes:

> On Fri, Nov 11, 2016 at 18:17:05 -0500, Peter Xu wrote:
<snip>
>> I also tested the commit preceding the above one (09cd058a2c,
>> "intel_iommu: get rid of {0} initializers") with exactly the same build
>> parameters, and it has the same problem (the TCG version cannot boot the
>> guest kernel if I provide a disk as a parameter).
>
> Hmm that's interesting, on my end 09cd05^ works well 100% of the time
> w/ TCG.
>
>> Do we still support gthread as a coroutine backend? And to what extent
>> do we support it?
>
> I don't know :( That's why I sent the message.
>
> I'll investigate further.

We would like to drop it, but tsan only works with it at the moment. We
are exploring what it would take to support the other setcontext/longjmp
backends in tsan; see:

  Message-ID: <871t8kwygw.fsf@linaro.org>

on the thread-sanitizer mailing list. If we can get that working, we
can drop QEMU's gthread backend.

--
Alex Bennée

Patch

diff --git a/include/hw/i386/apic-msidef.h b/include/hw/i386/apic-msidef.h
index 6e2eb71..8b4d4cc 100644
--- a/include/hw/i386/apic-msidef.h
+++ b/include/hw/i386/apic-msidef.h
@@ -25,6 +25,7 @@ 
 #define MSI_ADDR_REDIRECTION_SHIFT      3
 
 #define MSI_ADDR_DEST_ID_SHIFT          12
+#define MSI_ADDR_DEST_IDX_SHIFT         4
 #define  MSI_ADDR_DEST_ID_MASK          0x00ffff0
 
 #endif /* HW_APIC_MSIDEF_H */
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index 0542aa1..5c901ae 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -31,6 +31,7 @@ 
 #define IOAPIC_VERSION                  0x11
 
 #define IOAPIC_LVT_DEST_SHIFT           56
+#define IOAPIC_LVT_DEST_IDX_SHIFT       48
 #define IOAPIC_LVT_MASKED_SHIFT         16
 #define IOAPIC_LVT_TRIGGER_MODE_SHIFT   15
 #define IOAPIC_LVT_REMOTE_IRR_SHIFT     14
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index e38c95a..9811125 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -72,6 +72,10 @@  struct PCMachineState {
     /* NUMA information: */
     uint64_t numa_nodes;
     uint64_t *node_mem;
+
+    /* Address space used by IOAPIC device. All IOAPIC interrupts
+     * will be translated to MSI messages in the address space. */
+    AddressSpace *ioapic_as;
 };
 
 #define PC_MACHINE_ACPI_DEVICE_PROP "acpi-device"
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3d1b15d..feaf806 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -28,6 +28,7 @@ 
 #include "hw/i386/pc.h"
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
+#include "hw/pci-host/q35.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -2369,7 +2370,8 @@  static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
 static void vtd_realize(DeviceState *dev, Error **errp)
 {
-    PCIBus *bus = PC_MACHINE(qdev_get_machine())->bus;
+    PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
+    PCIBus *bus = pcms->bus;
     IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
 
     VTD_DPRINTF(GENERAL, "");
@@ -2385,6 +2387,8 @@  static void vtd_realize(DeviceState *dev, Error **errp)
     vtd_init(s);
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
     pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
+    /* Pseudo address space under root PCI bus. */
+    pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
 }
 
 static void vtd_class_init(ObjectClass *klass, void *data)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1b8baa8..66f584b 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1429,6 +1429,9 @@  void pc_memory_init(PCMachineState *pcms,
         rom_add_option(option_rom[i].name, option_rom[i].bootindex);
     }
     pcms->fw_cfg = fw_cfg;
+
+    /* Init default IOAPIC address space */
+    pcms->ioapic_as = &address_space_memory;
 }
 
 qemu_irq pc_allocate_cpu_irq(void)
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 273bb08..36dd42a 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -29,6 +29,8 @@ 
 #include "hw/i386/ioapic_internal.h"
 #include "include/hw/pci/msi.h"
 #include "sysemu/kvm.h"
+#include "target-i386/cpu.h"
+#include "hw/i386/apic-msidef.h"
 
 //#define DEBUG_IOAPIC
 
@@ -50,13 +52,15 @@  extern int ioapic_no;
 
 static void ioapic_service(IOAPICCommonState *s)
 {
+    AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
+    uint32_t addr, data;
     uint8_t i;
     uint8_t trig_mode;
     uint8_t vector;
     uint8_t delivery_mode;
     uint32_t mask;
     uint64_t entry;
-    uint8_t dest;
+    uint16_t dest_idx;
     uint8_t dest_mode;
 
     for (i = 0; i < IOAPIC_NUM_PINS; i++) {
@@ -67,7 +71,14 @@  static void ioapic_service(IOAPICCommonState *s)
             entry = s->ioredtbl[i];
             if (!(entry & IOAPIC_LVT_MASKED)) {
                 trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
-                dest = entry >> IOAPIC_LVT_DEST_SHIFT;
+                /*
+                 * By default, this would be dest_id[8] +
+                 * reserved[8]. When IR is enabled, this would be
+                 * interrupt_index[15] + interrupt_format[1]. This
+                 * field never means anything, but only used to
+                 * generate corresponding MSI.
+                 */
+                dest_idx = entry >> IOAPIC_LVT_DEST_IDX_SHIFT;
                 dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
                 delivery_mode =
                     (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
@@ -97,8 +108,17 @@  static void ioapic_service(IOAPICCommonState *s)
 #else
                 (void)coalesce;
 #endif
-                apic_deliver_irq(dest, dest_mode, delivery_mode, vector,
-                                 trig_mode);
+                /* No matter whether IR is enabled, we translate
+                 * the IOAPIC message into a MSI one, and its
+                 * address space will decide whether we need a
+                 * translation. */
+                addr = APIC_DEFAULT_ADDRESS | \
+                    (dest_idx << MSI_ADDR_DEST_IDX_SHIFT) |
+                    (dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
+                data = (vector << MSI_DATA_VECTOR_SHIFT) |
+                    (trig_mode << MSI_DATA_TRIGGER_SHIFT) |
+                    (delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
+                stl_le_phys(ioapic_as, addr, data);
             }
         }
     }
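The comment added to ioapic_service() says bits 48-63 of a redirection entry mean nothing to the IOAPIC itself and only feed the generated MSI. This works for IR because the Intel VT-d remappable formats line up: a remappable IOxAPIC entry holds Interrupt_Index[14:0] in bits 63:49, the Interrupt Format flag in bit 48 and Interrupt_Index[15] in bit 11 (the destination-mode position), while a remappable MSI address holds Handle[14:0] in bits 19:5, the format flag in bit 4 and Handle[15] in bit 2. The sketch that follows decodes an address the way a remapping unit might. decode_msi_addr() and struct ir_request are hypothetical names, not the intel_iommu.c API, and the bit layout is assumed from the VT-d specification rather than taken from this patch; SHV (bit 3) and the data word are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of an MSI address as an IR unit would see it. */
struct ir_request {
    bool remappable;   /* address bit 4: Interrupt Format */
    uint16_t handle;   /* remapping table index (remappable format only) */
    uint8_t dest_id;   /* APIC destination ID (compatibility format only) */
};

/* Decode an MSI address in the 0xfee00000 window. SHV (bit 3) and the
 * subhandle in the data word are ignored; the IOAPIC path never sets bit 3. */
static struct ir_request decode_msi_addr(uint32_t addr)
{
    struct ir_request req = { .remappable = false, .handle = 0, .dest_id = 0 };

    assert((addr & 0xfff00000u) == 0xfee00000u);
    req.remappable = (addr >> 4) & 1;
    if (req.remappable) {
        /* Handle[14:0] lives in bits 5..19, Handle[15] in bit 2. */
        req.handle = (uint16_t)(((addr >> 5) & 0x7fffu) |
                                (((addr >> 2) & 1u) << 15));
    } else {
        /* Compatibility format: destination ID in bits 12..19. */
        req.dest_id = (uint8_t)((addr >> 12) & 0xffu);
    }
    return req;
}
```

For instance, a remappable entry with index 0x8005 (bits 63:49 = 0x0005, bit 48 = 1, bit 11 = 1) makes the patched ioapic_service() write to 0xfee000b4, which decodes back to handle 0x8005; a compatibility-format address such as 0xfee01000 decodes to destination ID 1.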