AMD/IOMMU: restore DTE fields in amd_iommu_setup_domain_device()

Message ID d3141a4d-b1b8-cc8b-3171-73fe0e6dd1c9@suse.com (mailing list archive)
State New, archived

Commit Message

Jan Beulich Nov. 13, 2019, 1:50 p.m. UTC
Commit 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs right after table
allocation") moved ourselves into a more secure default state, but
didn't take sufficient care to also undo the effects when handing a
previously disabled device back to a(nother) domain. Put the fields
that may have been changed elsewhere back to their intended values
(some fields amd_iommu_disable_domain_device() touches don't
currently get written anywhere else, and hence don't need modifying
here).

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
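
For readers following along, the disable/re-assign round trip described above can be sketched as a small standalone model. This is only an illustration under stated assumptions: `struct model_dte`, `struct model_ivrs_dev`, `model_disable()` and `model_setup()` are hypothetical stand-ins rather than Xen's real types or functions, the numeric constants are placeholders, and the real code also deals with page tables, ATS and write ordering, which are omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a device table entry. */
struct model_dte {
    bool v, tv;        /* entry valid / translation info valid */
    bool iv;           /* interrupt remapping info valid */
    bool ex;           /* exclusion range permitted */
    uint8_t int_ctl;   /* interrupt control */
    uint8_t sys_mgt;   /* system management message handling */
    uint64_t it_root;  /* interrupt remapping table address, 0 if none */
};

/* Hypothetical per-device settings (derived from the IVRS table in Xen). */
struct model_ivrs_dev {
    bool dte_allow_exclusion;
    uint8_t sys_mgt;
};

/* Placeholder encodings, assumed for this model only. */
enum { MODEL_INT_CTL_ABORTED = 0, MODEL_INT_CTL_TRANSLATED = 2 };

/* Models the "secure default" state a disabled device is put into. */
static void model_disable(struct model_dte *dte)
{
    dte->int_ctl = MODEL_INT_CTL_ABORTED;
    dte->iv = true;
    dte->tv = false;
    dte->ex = false;
    dte->sys_mgt = 0;
    dte->v = true;
}

/* Models re-assignment: restore the fields the disable path may have changed. */
static void model_setup(struct model_dte *dte,
                        const struct model_ivrs_dev *dev, bool intremap)
{
    if ( dte->v && dte->tv )
        return;                       /* already set up, nothing to restore */

    dte->v = dte->tv = true;          /* stands in for binding the page tables */

    if ( dte->it_root )
        dte->int_ctl = MODEL_INT_CTL_TRANSLATED;
    dte->iv = intremap;
    dte->ex = dev->dte_allow_exclusion;
    dte->sys_mgt = dev->sys_mgt;
}
```

In this model, calling `model_disable()` followed by `model_setup()` leaves `int_ctl`, `iv`, `ex` and `sys_mgt` at their per-device values again. Without the restore step they would keep the locked-down values from the disable path.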

Comments

Igor Druzhinin Nov. 14, 2019, 12:28 p.m. UTC | #1
On 13/11/2019 13:50, Jan Beulich wrote:
> Commit 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs right after table
> allocation") moved ourselves into a more secure default state, but
> didn't take sufficient care to also undo the effects when handing a
> previously disabled device back to a(nother) domain. Put the fields
> that may have been changed elsewhere back to their intended values
> (some fields amd_iommu_disable_domain_device() touches don't
> currently get written anywhere else, and hence don't need modifying
> here).
> 
> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -114,11 +114,21 @@ static void amd_iommu_setup_domain_devic
>  
>      if ( !dte->v || !dte->tv )
>      {
> +        const struct ivrs_mappings *ivrs_dev;
> +
>          /* bind DTE to domain page-tables */
>          amd_iommu_set_root_page_table(
>              dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
>              hd->arch.paging_mode, valid);
>  
> +        /* Undo what amd_iommu_disable_domain_device() may have done. */
> +        ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
> +        if ( dte->it_root )
> +            dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
> +        dte->iv = iommu_intremap;
> +        dte->ex = ivrs_dev->dte_allow_exclusion;
> +        dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
> +
>          if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
>               iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
>              dte->i = ats_enabled;
> 

Tested-by: Igor Druzhinin <igor.druzhinin@citrix.com>

Without this change we get stable TDRs at boot time with GPU passthrough
on AMD machines.

Igor

Andrew Cooper Nov. 15, 2019, 11:29 a.m. UTC | #2
On 13/11/2019 13:50, Jan Beulich wrote:
> Commit 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs right after table
> allocation") moved ourselves into a more secure default state, but
> didn't take sufficient care to also undo the effects when handing a
> previously disabled device back to a(nother) domain. Put the fields
> that may have been changed elsewhere back to their intended values
> (some fields amd_iommu_disable_domain_device() touches don't
> currently get written anywhere else, and hence don't need modifying
> here).
>
> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Jürgen Groß Nov. 15, 2019, 1:13 p.m. UTC | #3
On 13.11.19 14:50, Jan Beulich wrote:
> Commit 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs right after table
> allocation") moved ourselves into a more secure default state, but
> didn't take sufficient care to also undo the effects when handing a
> previously disabled device back to a(nother) domain. Put the fields
> that may have been changed elsewhere back to their intended values
> (some fields amd_iommu_disable_domain_device() touches don't
> currently get written anywhere else, and hence don't need modifying
> here).
> 
> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Release-acked-by: Juergen Gross <jgross@suse.com>


Juergen

Igor Druzhinin Nov. 25, 2019, 3:56 p.m. UTC | #4
On 14/11/2019 12:28, Igor Druzhinin wrote:
> On 13/11/2019 13:50, Jan Beulich wrote:
>> Commit 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs right after table
>> allocation") moved ourselves into a more secure default state, but
>> didn't take sufficient care to also undo the effects when handing a
>> previously disabled device back to a(nother) domain. Put the fields
>> that may have been changed elsewhere back to their intended values
>> (some fields amd_iommu_disable_domain_device() touches don't
>> currently get written anywhere else, and hence don't need modifying
>> here).
>>
>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -114,11 +114,21 @@ static void amd_iommu_setup_domain_devic
>>  
>>      if ( !dte->v || !dte->tv )
>>      {
>> +        const struct ivrs_mappings *ivrs_dev;
>> +
>>          /* bind DTE to domain page-tables */
>>          amd_iommu_set_root_page_table(
>>              dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
>>              hd->arch.paging_mode, valid);
>>  
>> +        /* Undo what amd_iommu_disable_domain_device() may have done. */
>> +        ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
>> +        if ( dte->it_root )
>> +            dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
>> +        dte->iv = iommu_intremap;
>> +        dte->ex = ivrs_dev->dte_allow_exclusion;
>> +        dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
>> +
>>          if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
>>               iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
>>              dte->i = ats_enabled;
>>


Jan,

Unfortunately, with 1b00c16bdf and this fix on top we're still getting
issues on some old AMD hardware: Lisbon core Opteron 4162.

(XEN) [   13.072921] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 0xbf695000, flags = 0x10
(XEN) [   13.072978] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 0xbf695040, flags = 0x10

Sometimes accompanied by an assertion failure later:

[2019-11-22 01:54:57 UTC] (XEN) [   13.074311] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1275
[2019-11-22 01:54:57 UTC] (XEN) [   13.074317] ----[ Xen-4.13.0-8.0.17-d  x86_64  debug=y   Not tainted ]----
[2019-11-22 01:54:57 UTC] (XEN) [   13.074321] CPU:    0
[2019-11-22 01:54:57 UTC] (XEN) [   13.074325] RIP:    e008:[<ffff82d08028a557>] do_IRQ+0x3fe/0x687
[2019-11-22 01:54:57 UTC] (XEN) [   13.074332] RFLAGS: 0000000000010046   CONTEXT: hypervisor
[2019-11-22 01:54:57 UTC] (XEN) [   13.074338] rax: 0000000000000001   rbx: ffff82d0805c74c0   rcx: 00000000000000a0
[2019-11-22 01:54:57 UTC] (XEN) [   13.074342] rdx: 0000000000000001   rsi: 0000000000000006   rdi: ffff82d0805c7300
[2019-11-22 01:54:57 UTC] (XEN) [   13.074347] rbp: ffff8300bf2bfdd8   rsp: ffff8300bf2bfd58   r8:  0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074386] r9:  ffff83043ffe85d8   r10: 0000000000000000   r11: 000000030bd0be8f
[2019-11-22 01:54:57 UTC] (XEN) [   13.074390] r12: ffff8304340e5100   r13: 00000000000000a0   r14: 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074395] r15: 0000000000000010   cr0: 000000008005003b   cr4: 00000000000006e0
[2019-11-22 01:54:57 UTC] (XEN) [   13.074399] cr3: 000000043a008000   cr2: 00007f5947408250
[2019-11-22 01:54:57 UTC] (XEN) [   13.074402] fsb: 0000000000000000   gsb: ffff8880a3600000   gss: 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074407] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
[2019-11-22 01:54:57 UTC] (XEN) [   13.074412] Xen code around <ffff82d08028a557> (do_IRQ+0x3fe/0x687):
[2019-11-22 01:54:57 UTC] (XEN) [   13.074415]  4c 8b ff 41 39 cd 77 02 <0f> 0b 3d be 00 00 00 7e 02 0f 0b 0f b6 c2 48 8d
[2019-11-22 01:54:57 UTC] (XEN) [   13.074430] Xen stack trace from rsp=ffff8300bf2bfd58:
[2019-11-22 01:54:57 UTC] (XEN) [   13.074432]    ffff82d080389851 ffff82d080389845 ffff82d080389851 ffff82d080389845
[2019-11-22 01:54:57 UTC] (XEN) [   13.074440]    ffff82d000000000 ffff83043fe01024 0000001080389851 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074447]    ffff82d080389851 ffff82d080389845 ffff82d080389851 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074453]    0000000000000000 0000000000000000 ffff8300bf2bffff 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074459]    00007cff40d401f7 ffff82d0803898ba 0000000000000000 ffff82d0805c7270
[2019-11-22 01:54:57 UTC] (XEN) [   13.074465]    0000000000000000 ffff82d0805cda80 ffff8300bf2bfea0 ffff8300bf2bffff
[2019-11-22 01:54:57 UTC] (XEN) [   13.074472]    000000035e998c9a 00000023e3551479 ffff82d080610700 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074478]    0000000000000000 0000000000000048 0000000000000000 ffff8300bf2bfef8
[2019-11-22 01:54:57 UTC] (XEN) [   13.074484]    0000000000000000 000000a000000000 ffff82d08027a6ee 000000000000e008
[2019-11-22 01:54:57 UTC] (XEN) [   13.074490]    0000000000000206 ffff8300bf2bfe90 000000000000e010 ffff82d0805c7270
[2019-11-22 01:54:57 UTC] (XEN) [   13.074496]    0000000000000000 ffff8300bf2bfef0 ffff82d08027a80c ffff82d080242855
[2019-11-22 01:54:57 UTC] (XEN) [   13.074503]    000000003ff4c000 ffff83043fe92000 ffff83043fe92000 ffff83043ff4c000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074510]    ffff83043fe93000 0000000000000000 ffff83043ff68000 ffff8300bf2bfd58
[2019-11-22 01:54:57 UTC] (XEN) [   13.074516]    ffffffff82011740 ffffffff82011740 0000000000000000 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074522]    0000000000000000 ffffffff82011740 0000000000000246 aaaaaaaaaaaaaaaa
[2019-11-22 01:54:57 UTC] (XEN) [   13.074528]    0000000000000000 000000009695f0e8 0000000000000000 ffffffff810013aa
[2019-11-22 01:54:57 UTC] (XEN) [   13.074534]    ffffffff8203d210 deadbeefdeadf00d deadbeefdeadf00d 0000010000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074541]    ffffffff810013aa 000000000000e033 0000000000000246 ffffffff82003e58
[2019-11-22 01:54:57 UTC] (XEN) [   13.074547]    000000000000e02b bf200a45bf2bffe0 bf200d7f0009cf7a bf200da300000001
[2019-11-22 01:54:57 UTC] (XEN) [   13.074554]    bf200952bf2bffe0 0000e01000000000 ffff83043fe92000 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074561] Xen call trace:
[2019-11-22 01:54:57 UTC] (XEN) [   13.074564]    [<ffff82d08028a557>] R do_IRQ+0x3fe/0x687
[2019-11-22 01:54:57 UTC] (XEN) [   13.074570]    [<ffff82d080389851>] S common_interrupt+0xa1/0x120
[2019-11-22 01:54:57 UTC] (XEN) [   13.074575]    [<ffff82d0803898ba>] F common_interrupt+0x10a/0x120
[2019-11-22 01:54:57 UTC] (XEN) [   13.074580]    [<ffff82d08027a6ee>] F domain.c#default_idle+0xc3/0xda
[2019-11-22 01:54:57 UTC] (XEN) [   13.074585]    [<ffff82d08027a80c>] F domain.c#idle_loop+0xaf/0xcb
[2019-11-22 01:54:57 UTC] (XEN) [   13.074588] 
[2019-11-22 01:54:57 UTC] (XEN) [   13.413638] 
[2019-11-22 01:54:57 UTC] (XEN) [   13.415630] ****************************************
[2019-11-22 01:54:57 UTC] (XEN) [   13.421089] Panic on CPU 0:
[2019-11-22 01:54:57 UTC] (XEN) [   13.424383] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1275
[2019-11-22 01:54:57 UTC] (XEN) [   13.432789] ****************************************
[2019-11-22 01:54:57 UTC] (XEN) [   13.438251] 

Worth noting that by default upstream doesn't enable IOMMU on that
particular core due to SP5100 erratum. But could the problem here be
related to it?

Igor

Igor Druzhinin Nov. 25, 2019, 4 p.m. UTC | #5
On 14/11/2019 12:28, Igor Druzhinin wrote:
> On 13/11/2019 13:50, Jan Beulich wrote:
>> Commit 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs right after table
>> allocation") moved ourselves into a more secure default state, but
>> didn't take sufficient care to also undo the effects when handing a
>> previously disabled device back to a(nother) domain. Put the fields
>> that may have been changed elsewhere back to their intended values
>> (some fields amd_iommu_disable_domain_device() touches don't
>> currently get written anywhere else, and hence don't need modifying
>> here).
>>
>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -114,11 +114,21 @@ static void amd_iommu_setup_domain_devic
>>  
>>      if ( !dte->v || !dte->tv )
>>      {
>> +        const struct ivrs_mappings *ivrs_dev;
>> +
>>          /* bind DTE to domain page-tables */
>>          amd_iommu_set_root_page_table(
>>              dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
>>              hd->arch.paging_mode, valid);
>>  
>> +        /* Undo what amd_iommu_disable_domain_device() may have done. */
>> +        ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
>> +        if ( dte->it_root )
>> +            dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
>> +        dte->iv = iommu_intremap;
>> +        dte->ex = ivrs_dev->dte_allow_exclusion;
>> +        dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
>> +
>>          if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
>>               iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
>>              dte->i = ats_enabled;
>>

Jan, 

Unfortunately, we're still seeing issues with the original 1b00c16bdf on
AMD Opteron 4162 Lisbon core. It manifests in IOMMU faults during boot:

(XEN) [   13.072921] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 0xbf695000, flags = 0x10
(XEN) [   13.072978] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 0xbf695040, flags = 0x10

... sometimes followed by an assertion failure in debug builds:

[2019-11-22 01:54:57 UTC] (XEN) [   13.074311] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1275
[2019-11-22 01:54:57 UTC] (XEN) [   13.074317] ----[ Xen-4.13.0-8.0.17-d  x86_64  debug=y   Not tainted ]----
[2019-11-22 01:54:57 UTC] (XEN) [   13.074321] CPU:    0
[2019-11-22 01:54:57 UTC] (XEN) [   13.074325] RIP:    e008:[<ffff82d08028a557>] do_IRQ+0x3fe/0x687
[2019-11-22 01:54:57 UTC] (XEN) [   13.074332] RFLAGS: 0000000000010046   CONTEXT: hypervisor
[2019-11-22 01:54:57 UTC] (XEN) [   13.074338] rax: 0000000000000001   rbx: ffff82d0805c74c0   rcx: 00000000000000a0
[2019-11-22 01:54:57 UTC] (XEN) [   13.074342] rdx: 0000000000000001   rsi: 0000000000000006   rdi: ffff82d0805c7300
[2019-11-22 01:54:57 UTC] (XEN) [   13.074347] rbp: ffff8300bf2bfdd8   rsp: ffff8300bf2bfd58   r8:  0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074386] r9:  ffff83043ffe85d8   r10: 0000000000000000   r11: 000000030bd0be8f
[2019-11-22 01:54:57 UTC] (XEN) [   13.074390] r12: ffff8304340e5100   r13: 00000000000000a0   r14: 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074395] r15: 0000000000000010   cr0: 000000008005003b   cr4: 00000000000006e0
[2019-11-22 01:54:57 UTC] (XEN) [   13.074399] cr3: 000000043a008000   cr2: 00007f5947408250
[2019-11-22 01:54:57 UTC] (XEN) [   13.074402] fsb: 0000000000000000   gsb: ffff8880a3600000   gss: 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074407] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
[2019-11-22 01:54:57 UTC] (XEN) [   13.074412] Xen code around <ffff82d08028a557> (do_IRQ+0x3fe/0x687):
[2019-11-22 01:54:57 UTC] (XEN) [   13.074415]  4c 8b ff 41 39 cd 77 02 <0f> 0b 3d be 00 00 00 7e 02 0f 0b 0f b6 c2 48 8d
[2019-11-22 01:54:57 UTC] (XEN) [   13.074430] Xen stack trace from rsp=ffff8300bf2bfd58:
[2019-11-22 01:54:57 UTC] (XEN) [   13.074432]    ffff82d080389851 ffff82d080389845 ffff82d080389851 ffff82d080389845
[2019-11-22 01:54:57 UTC] (XEN) [   13.074440]    ffff82d000000000 ffff83043fe01024 0000001080389851 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074447]    ffff82d080389851 ffff82d080389845 ffff82d080389851 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074453]    0000000000000000 0000000000000000 ffff8300bf2bffff 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074459]    00007cff40d401f7 ffff82d0803898ba 0000000000000000 ffff82d0805c7270
[2019-11-22 01:54:57 UTC] (XEN) [   13.074465]    0000000000000000 ffff82d0805cda80 ffff8300bf2bfea0 ffff8300bf2bffff
[2019-11-22 01:54:57 UTC] (XEN) [   13.074472]    000000035e998c9a 00000023e3551479 ffff82d080610700 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074478]    0000000000000000 0000000000000048 0000000000000000 ffff8300bf2bfef8
[2019-11-22 01:54:57 UTC] (XEN) [   13.074484]    0000000000000000 000000a000000000 ffff82d08027a6ee 000000000000e008
[2019-11-22 01:54:57 UTC] (XEN) [   13.074490]    0000000000000206 ffff8300bf2bfe90 000000000000e010 ffff82d0805c7270
[2019-11-22 01:54:57 UTC] (XEN) [   13.074496]    0000000000000000 ffff8300bf2bfef0 ffff82d08027a80c ffff82d080242855
[2019-11-22 01:54:57 UTC] (XEN) [   13.074503]    000000003ff4c000 ffff83043fe92000 ffff83043fe92000 ffff83043ff4c000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074510]    ffff83043fe93000 0000000000000000 ffff83043ff68000 ffff8300bf2bfd58
[2019-11-22 01:54:57 UTC] (XEN) [   13.074516]    ffffffff82011740 ffffffff82011740 0000000000000000 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074522]    0000000000000000 ffffffff82011740 0000000000000246 aaaaaaaaaaaaaaaa
[2019-11-22 01:54:57 UTC] (XEN) [   13.074528]    0000000000000000 000000009695f0e8 0000000000000000 ffffffff810013aa
[2019-11-22 01:54:57 UTC] (XEN) [   13.074534]    ffffffff8203d210 deadbeefdeadf00d deadbeefdeadf00d 0000010000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074541]    ffffffff810013aa 000000000000e033 0000000000000246 ffffffff82003e58
[2019-11-22 01:54:57 UTC] (XEN) [   13.074547]    000000000000e02b bf200a45bf2bffe0 bf200d7f0009cf7a bf200da300000001
[2019-11-22 01:54:57 UTC] (XEN) [   13.074554]    bf200952bf2bffe0 0000e01000000000 ffff83043fe92000 0000000000000000
[2019-11-22 01:54:57 UTC] (XEN) [   13.074561] Xen call trace:
[2019-11-22 01:54:57 UTC] (XEN) [   13.074564]    [<ffff82d08028a557>] R do_IRQ+0x3fe/0x687
[2019-11-22 01:54:57 UTC] (XEN) [   13.074570]    [<ffff82d080389851>] S common_interrupt+0xa1/0x120
[2019-11-22 01:54:57 UTC] (XEN) [   13.074575]    [<ffff82d0803898ba>] F common_interrupt+0x10a/0x120
[2019-11-22 01:54:57 UTC] (XEN) [   13.074580]    [<ffff82d08027a6ee>] F domain.c#default_idle+0xc3/0xda
[2019-11-22 01:54:57 UTC] (XEN) [   13.074585]    [<ffff82d08027a80c>] F domain.c#idle_loop+0xaf/0xcb
[2019-11-22 01:54:57 UTC] (XEN) [   13.074588] 
[2019-11-22 01:54:57 UTC] (XEN) [   13.413638] 
[2019-11-22 01:54:57 UTC] (XEN) [   13.415630] ****************************************
[2019-11-22 01:54:57 UTC] (XEN) [   13.421089] Panic on CPU 0:
[2019-11-22 01:54:57 UTC] (XEN) [   13.424383] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1275
[2019-11-22 01:54:57 UTC] (XEN) [   13.432789] ****************************************

It is worth noting that upstream has the IOMMU disabled on that type of core
due to the SP5100 erratum. But could the issue here be related to this?

Igor

Patch

--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -114,11 +114,21 @@  static void amd_iommu_setup_domain_devic
 
     if ( !dte->v || !dte->tv )
     {
+        const struct ivrs_mappings *ivrs_dev;
+
         /* bind DTE to domain page-tables */
         amd_iommu_set_root_page_table(
             dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
             hd->arch.paging_mode, valid);
 
+        /* Undo what amd_iommu_disable_domain_device() may have done. */
+        ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
+        if ( dte->it_root )
+            dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
+        dte->iv = iommu_intremap;
+        dte->ex = ivrs_dev->dte_allow_exclusion;
+        dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
+
         if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
              iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
             dte->i = ats_enabled;
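
The `MASK_EXTR()` use in the hunk above pulls a multi-bit field out of the IVHD device flags. A minimal standalone sketch of that extraction follows. The macro body is written from memory of how Xen defines it and should be treated as an assumption, and `MODEL_IVHD_SYSTEM_MGMT` (bits 4-5) is a hypothetical stand-in for `ACPI_IVHD_SYSTEM_MGMT`, not a value taken from this patch.

```c
#include <stdint.h>

/*
 * Extract the field selected by mask m from v, shifted down to bit 0.
 * (m & -m) isolates the lowest set bit of the mask, so dividing by it
 * is equivalent to shifting right by the mask's starting bit position.
 */
#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))

/* Assumed mask for the system management field: bits 4 and 5. */
#define MODEL_IVHD_SYSTEM_MGMT 0x30u

/* Hypothetical helper: returns the 2-bit SysMgt value from the flags byte. */
static unsigned int extract_sys_mgt(uint8_t device_flags)
{
    return MASK_EXTR(device_flags, MODEL_IVHD_SYSTEM_MGMT);
}
```

Under these assumptions, a flags byte of `0x20` yields 2, and bits outside the mask do not affect the result.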