PVH Dom0 Intel IOMMU issues

Message ID 20170418073445.tk7ukhnbl4ns7r33@dhcp-3-128.uk.xensource.com (mailing list archive)
State New, archived

Commit Message

Roger Pau Monné April 18, 2017, 7:34 a.m. UTC
On Tue, Apr 18, 2017 at 03:04:51AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> > Sent: Friday, April 14, 2017 11:35 PM
> > 
> > Hello,
> > 
> > Although PVHv2 Dom0 is not yet finished, I've been trying the current code on
> > different hardware, and found that with pre-Haswell Intel hardware PVHv2 Dom0
> > completely freezes the box when calling iommu_hwdom_init in
> > dom0_construct_pvh. OTOH the same doesn't happen when using a newer CPU (ie:
> > Haswell or newer).
> > 
> > I'm not able to debug that in any meaningful way because the box seems to lock
> > up completely, even the watchdog NMI stops working. Here is the boot log, up to
> > the point where it freezes:
> > 
> 
> I don't have any idea yet without seeing more meaningful debug messages.
> Maybe you'll have to add more fine-grained prints to capture some
> useful hints.

Hello, I've added the following debug patch:


And got this output:

(XEN) Xen version 4.9-rc (root@) (FreeBSD clang version 3.9.0 (tags/RELEASE_390/final 280324) (based on LLVM 3.9.0)) debug=y  Tue Apr 18 08:22:39 BST 2017
(XEN) Latest ChangeSet:
(XEN) Console output is synchronous.
(XEN) Bootloader: FreeBSD Loader
(XEN) Command line: dom0_mem=4096M dom0=pvh com1=115200,8n1 console=com1,vga guest_loglvl=all loglvl=all iommu=debug,verbose sync_console watchdog
(XEN) Xen image load base address: 0
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 2 MBR signatures
(XEN)  Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000008dc00 (usable)
(XEN)  000000000008dc00 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 0000000018ebb000 (usable)
(XEN)  0000000018ebb000 - 0000000018fe8000 (ACPI NVS)
(XEN)  0000000018fe8000 - 0000000018fe9000 (usable)
(XEN)  0000000018fe9000 - 0000000019000000 (ACPI NVS)
(XEN)  0000000019000000 - 000000001dffd000 (usable)
(XEN)  000000001dffd000 - 000000001e000000 (ACPI data)
(XEN)  000000001e000000 - 00000000ac784000 (usable)
(XEN)  00000000ac784000 - 00000000ac818000 (reserved)
(XEN)  00000000ac818000 - 00000000ad800000 (usable)
(XEN)  00000000b0000000 - 00000000b4000000 (reserved)
(XEN)  00000000fed20000 - 00000000fed40000 (reserved)
(XEN)  00000000fed50000 - 00000000fed90000 (reserved)
(XEN)  00000000ffa00000 - 00000000ffa40000 (reserved)
(XEN)  0000000100000000 - 0000000250000000 (usable)
(XEN) New Xen image base address: 0xad200000
(XEN) ACPI: RSDP 000FE300, 0024 (r2 DELL  )
(XEN) ACPI: XSDT 1DFFEE18, 0074 (r1 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: FACP 18FEFD98, 00F4 (r4 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: DSDT 18FA9018, 6373 (r1 DELL    CBX3           0 INTL 20091112)
(XEN) ACPI: FACS 18FF1F40, 0040
(XEN) ACPI: APIC 1DFFDC18, 0158 (r2 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: MCFG 18FFED18, 003C (r1 A M I  OEMMCFG.  6222004 MSFT       97)
(XEN) ACPI: TCPA 18FFEC98, 0032 (r2                        0             0)
(XEN) ACPI: SSDT 18FF0A98, 0306 (r1 DELLTP      TPM     3000 INTL 20091112)
(XEN) ACPI: HPET 18FFEC18, 0038 (r1 A M I   PCHHPET  6222004 AMI.        3)
(XEN) ACPI: BOOT 18FFEB98, 0028 (r1 DELL   CBX3      6222004 AMI     10013)
(XEN) ACPI: SSDT 18FB0018, 36FFE (r2  INTEL    CpuPm     4000 INTL 20091112)
(XEN) ACPI: SLIC 18FEEC18, 0176 (r3 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: DMAR 18FF1B18, 0094 (r1 A M I   OEMDMAR        1 INTL        1)
(XEN) System RAM: 8149MB (8345288kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-0000000250000000
(XEN) Domain heap initialised
(XEN) CPU Vendor: Intel, Family 6 (0x6), Model 45 (0x2d), Stepping 7 (raw 000206d7)
(XEN) found SMP MP-table at 000f1db0
(XEN) DMI 2.6 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x408 (32 bits)
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:404,1:0], pm1x_evt[1:400,1:0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT - 18ffdf40/0000000018ff1f40, using 32
(XEN) ACPI:             wakeup_vec[18ffdf4c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x06] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x09] lapic_id[0x08] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x09] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0a] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0b] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0c] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0d] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0e] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x10] lapic_id[0x0f] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x11] lapic_id[0x10] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x12] lapic_id[0x11] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x13] lapic_id[0x12] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x14] lapic_id[0x13] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x15] lapic_id[0x14] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x16] lapic_id[0x15] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x17] lapic_id[0x16] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x18] lapic_id[0x17] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x19] lapic_id[0x18] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x19] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x1a] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x1b] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x1c] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x1d] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x1e] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x20] lapic_id[0x1f] disabled)
(XEN) ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec3f000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 2, version 32, address 0xfec3f000, GSI 24-47
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 2 I/O APICs
(XEN) ACPI: HPET id: 0x8086a701 base: 0xfed00000
(XEN) [VT-D]Host address width 46
(XEN) [VT-D]found ACPI_DMAR_DRHD:
(XEN) [VT-D]  dmaru->address = fbffe000
(XEN) [VT-D]drhd->address = fbffe000 iommu->reg = ffff82c00021b000
(XEN) [VT-D]cap = d2078c106f0462 ecap = f020fa
(XEN) [VT-D] IOAPIC: 0000:00:1f.7
(XEN) [VT-D] IOAPIC: 0000:00:05.4
(XEN) [VT-D] MSI HPET: 0000:f0:0f.0
(XEN) [VT-D]  flags: INCLUDE_ALL
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:00:1d.0
(XEN) [VT-D] endpoint: 0000:00:1a.0
(XEN) [VT-D]dmar.c:638:   RMRR region: base_addr ac7cf000 end_addr ac7defff
(XEN) [VT-D]found ACPI_DMAR_RHSA:
(XEN) [VT-D]  rhsau->address: fbffe000 rhsau->proximity_domain: 0
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 32 CPUs (28 hotplug CPUs)
(XEN) IRQ limits: 48 GSI, 736 MSI/MSI-X
(XEN) Switched to APIC driver x2apic_cluster.
(XEN) xstate: size: 0x340 and states: 0x7
(XEN) mce_intel.c:732: MCA capability: firstbank 0, 0 ext MSRs, BCAST, SER, CMCI
(XEN) CPU0: Intel machine check reporting enabled
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Platform timer is 14.318MHz HPET
(XEN) Detected 2992.792 MHz processor.
(XEN) Initing memory sharing.
(XEN) alt table ffff82d0804181f0 -> ffff82d080418964
(XEN) PCI: MCFG configuration 0: base b0000000 segment 0000 buses 00 - 3f
(XEN) PCI: MCFG area at b0000000 reserved in E820
(XEN) PCI: Using MCFG for segment 0000 bus 00-3f
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) nr_sockets: 9
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) TSC deadline timer enabled
(XEN) Allocated console ring of 32 KiB.
(XEN) mwait-idle: MWAIT substates: 0x21120
(XEN) mwait-idle: v0.4.1 model 0x2d
(XEN) mwait-idle: lapic_timer_reliable_states 0xffffffff
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) Brought up 4 CPUs
(XEN) Testing NMI watchdog on all CPUs: ok
(XEN) Running stub recovery selftests...
(XEN) traps.c:3457: GPF (0000): ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d08034295e
(XEN) traps.c:813: Trap 12: ffff82d0bffff040 [ffff82d0bffff040] -> ffff82d08034295e
(XEN) traps.c:1215: Trap 3: ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d08034295e
(XEN) ACPI sleep modes: S3
(XEN) VPMU: disabled
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) Dom0 has maximum 624 PIRQs
(XEN) NX (Execute Disable) protection active
(XEN) ** Building a PVH Dom0 **
(XEN) [VT-D]d0:Hostbridge: skip 0000:00:00.0 map
(XEN) Masked UR signaling on 0000:00:00.0
(XEN) Masked UR signaling on 0000:00:01.0
(XEN) Masked UR signaling on 0000:00:01.1
(XEN) Masked UR signaling on 0000:00:02.0
(XEN) Masked UR signaling on 0000:00:03.0
(XEN) [VT-D]d0:PCIe: map 0000:00:05.0
(XEN) Masked VT-d error signaling on 0000:00:05.0
(XEN) [VT-D]d0:PCIe: map 0000:00:05.2
(XEN) [VT-D]d0:PCI: map 0000:00:05.4
(XEN) [VT-D]d0:PCI: map 0000:00:16.0
(XEN) [VT-D]d0:PCI: map 0000:00:19.0
(XEN) [VT-D]d0:PCI: map 0000:00:1a.0
(XEN) [VT-D]d0:PCIe: map 0000:00:1b.0
(XEN) [VT-D]d0:PCI: map 0000:00:1d.0
(XEN) [VT-D]d0:PCI: map 0000:00:1f.0
(XEN) [VT-D]d0:PCI: map 0000:00:1f.2
(XEN) [VT-D]d0:PCI: map 0000:00:1f.3
(XEN) [VT-D]d0:PCIe: map 0000:03:00.0
(XEN) [VT-D]d0:PCIe: map 0000:03:00.1
(XEN) [VT-D]d0:PCIe: map 0000:05:00.0
(XEN) [VT-D]d0:PCIe: map 0000:05:00.3
(XEN) [VT-D]d0:PCIe: map 0000:07:00.0
(XEN) [VT-D]d0:PCI: map 0000:3f:08.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:08.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:08.4
(XEN) [VT-D]d0:PCI: map 0000:3f:09.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:09.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:09.4
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.1
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.2
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.3
(XEN) [VT-D]d0:PCI: map 0000:3f:0b.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0b.3
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.1
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.6
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.7
(XEN) [VT-D]d0:PCI: map 0000:3f:0d.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0d.1
(XEN) [VT-D]d0:PCI: map 0000:3f:0d.6
(XEN) [VT-D]d0:PCI: map 0000:3f:0e.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0e.1
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.1
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.2
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.4
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.5
(XEN) [VT-D]d0:PCI: map 0000:3f:0f.6
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.1
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.2
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.4
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.5
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.6
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.7
(XEN) [VT-D]d0:PCI: map 0000:3f:11.0
(XEN) [VT-D]d0:PCI: map 0000:3f:13.0
(XEN) [VT-D]d0:PCI: map 0000:3f:13.1
(XEN) [VT-D]d0:PCI: map 0000:3f:13.4
(XEN) [VT-D]d0:PCI: map 0000:3f:13.5
(XEN) [VT-D]d0:PCI: map 0000:3f:13.6
(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
(XEN) Before DMA_GCMD_TE
(

The hang seems to happen when writing DMA_GCMD_TE to the global command
register, which enables the DMA remapping. After that the box is completely
unresponsive, not even the watchdog is working.

Thanks, Roger.
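
For readers less familiar with the sequence being discussed, here is a minimal sketch of the "set TE in the global command register, then wait for TES in the global status register" handshake. It is not the Xen code: the register window is simulated by a plain array, and fake_iommu, fake_readl, fake_writel, enable_translation and max_polls are hypothetical names made up for illustration. The offsets (0x18, 0x1c) and bit 31 are the values I believe the VT-d specification assigns to GCMD/GSTS and TE/TES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets and bits as (assumed) defined by the VT-d specification. */
#define DMAR_GCMD_REG 0x18u                 /* Global Command register */
#define DMAR_GSTS_REG 0x1cu                 /* Global Status register */
#define DMA_GCMD_TE   (UINT32_C(1) << 31)   /* Translation Enable */
#define DMA_GSTS_TES  (UINT32_C(1) << 31)   /* Translation Enable Status */

/* Hypothetical stand-in for the MMIO window of one remapping unit. */
struct fake_iommu {
    uint32_t regs[64];
    bool hw_acks;   /* does the simulated hardware ever report TES? */
};

static uint32_t fake_readl(const struct fake_iommu *iommu, unsigned int off)
{
    return iommu->regs[off / 4];
}

static void fake_writel(struct fake_iommu *iommu, unsigned int off,
                        uint32_t val)
{
    iommu->regs[off / 4] = val;

    /* Simulated hardware: mirror TE into TES if it "responds" at all. */
    if (off == DMAR_GCMD_REG && iommu->hw_acks) {
        if (val & DMA_GCMD_TE)
            iommu->regs[DMAR_GSTS_REG / 4] |= DMA_GSTS_TES;
        else
            iommu->regs[DMAR_GSTS_REG / 4] &= ~DMA_GSTS_TES;
    }
}

/* Returns true once TES is observed, false if max_polls reads expire. */
static bool enable_translation(struct fake_iommu *iommu,
                               unsigned int max_polls)
{
    uint32_t sts;

    assert(iommu != NULL);

    /* Start from the current status so already-enabled bits are kept. */
    sts = fake_readl(iommu, DMAR_GSTS_REG);
    fake_writel(iommu, DMAR_GCMD_REG, sts | DMA_GCMD_TE);

    while (max_polls--) {
        if (fake_readl(iommu, DMAR_GSTS_REG) & DMA_GSTS_TES)
            return true;
    }
    return false;
}
```

With hw_acks set the call returns true after the first poll; with it cleared the bounded loop gives up and returns false. Note that in the report above the whole box froze right after the write, so it is not obvious that any software-side timeout would get a chance to fire there.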

Comments

Jan Beulich April 18, 2017, 8:48 a.m. UTC | #1
>>> On 18.04.17 at 09:34, <roger.pau@citrix.com> wrote:
> (XEN) Before DMA_GCMD_TE
> (
> 
> The hang seems to happen when writing DMA_GCMD_TE to the global command
> register, which enables the DMA remapping. After that the box is completely
> unresponsive, not even the watchdog is working.

How sure are you that this is pre-Haswell specific vs e.g. chipset or
firmware (think of RMRRs [or their lack] for the latter) dependent?
Iirc Elena's command line specifiable RMRR patch series was
motivated by similar behavior she had observed on some system.
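
As a quick sanity check on the RMRR angle, the single RMRR in the boot log (ac7cf000 to ac7defff) can be compared against the reserved entries of the e820 map from the same log. The sketch below is only an illustration: struct range, e820_reserved and rmrr_is_reserved are hypothetical names, the table is copied by hand from the log, and the check says nothing about regions the firmware failed to list in the DMAR table at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range: [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/* "(reserved)" e820 entries, copied from the boot log in this thread. */
static const struct range e820_reserved[] = {
    { 0x0008dc00, 0x000a0000 },
    { 0x000e0000, 0x00100000 },
    { 0xac784000, 0xac818000 },
    { 0xb0000000, 0xb4000000 },
    { 0xfed20000, 0xfed40000 },
    { 0xfed50000, 0xfed90000 },
    { 0xffa00000, 0xffa40000 },
};

/*
 * Returns true if the RMRR [base, end_inclusive] lies entirely inside
 * one reserved e820 entry. RMRR end addresses in the log are inclusive,
 * e820 end addresses are exclusive.
 */
static bool rmrr_is_reserved(uint64_t base, uint64_t end_inclusive)
{
    size_t i;

    for (i = 0; i < sizeof(e820_reserved) / sizeof(e820_reserved[0]); i++) {
        const struct range *r = &e820_reserved[i];

        if (base >= r->start && end_inclusive < r->end)
            return true;
    }
    return false;
}
```

Calling rmrr_is_reserved(0xac7cf000, 0xac7defff) returns true, because the region sits inside the ac784000 - ac818000 reserved entry. So at least the one RMRR that the firmware does report is covered by a reservation on this box.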

Another odd aspect is - why would IOMMU enabling cause the hang
only when intending to use a PVH Dom0? The IOMMU is being
enabled in either case, which again might point at differences in use
of memory.

Jan

Roger Pau Monné April 18, 2017, 8:59 a.m. UTC | #2
On Tue, Apr 18, 2017 at 02:48:31AM -0600, Jan Beulich wrote:
> >>> On 18.04.17 at 09:34, <roger.pau@citrix.com> wrote:
> > (XEN) Before DMA_GCMD_TE
> > (
> > 
> > The hang seems to happen when writing DMA_GCMD_TE to the global command
> > register, which enables the DMA remapping. After that the box is completely
> > unresponsive, not even the watchdog is working.
> 
> How sure are you that this is pre-Haswell specific vs e.g. chipset or
> firmware (think of RMRRs [or their lack] for the latter) dependent?
> Iirc Elena's command line specifiable RMRR patch series was
> motivated by similar behavior she had observed on some system.

This is mostly from trial and error. I don't think it's strictly CPU related,
but rather chipset related (ie: chipsets that come with pre-Haswell CPUs).

Elena IIRC was at least getting IOMMU faults, which I don't even get in my
case, and I think that's the issue itself.

> Another odd aspect is - why would IOMMU enabling cause the hang
> only when intending to use a PVH Dom0? The IOMMU is being
> enabled in either case, which again might point at differences in use
> of memory.

Not sure, for PVH Dom0 the IOMMU is enabled quite early in the domain build
process (before populating the domain p2m), which seems to be fine on other
systems.

I've done that (initializing the IOMMU so early) to avoid having to iterate
over the list of domain pages afterwards when the IOMMU is initialized with the
p2m already populated.

FWIW, moving the iommu_hwdom_init call to the end of the PVH Dom0 build process
doesn't solve the issue. I've also tried with and without shared pt, and the
result is the same.

Roger.

Patch

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index a5c61c6e21..cb039d74e7 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -765,7 +765,9 @@  static void iommu_enable_translation(struct acpi_drhd_unit *drhd)
                iommu->reg);
     spin_lock_irqsave(&iommu->register_lock, flags);
     sts = dmar_readl(iommu->reg, DMAR_GSTS_REG);
+    printk("Before DMA_GCMD_TE\n");
     dmar_writel(iommu->reg, DMAR_GCMD_REG, sts | DMA_GCMD_TE);
+    printk("After DMA_GCMD_TE\n");
 
     /* Make sure hardware complete it */
     IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG, dmar_readl,