diff mbox series

pci: cleanup MSI interrupts before removing device from IOMMU

Message ID 20201021081945.28425-1-roger.pau@citrix.com (mailing list archive)
State New
Headers show
Series pci: cleanup MSI interrupts before removing device from IOMMU | expand

Commit Message

Roger Pau Monné Oct. 21, 2020, 8:19 a.m. UTC
Doing the MSI cleanup after removing the device from the IOMMU leads
to the following panic on AMD hardware:

Assertion 'table.ptr && (index < intremap_table_entries(table.ptr, iommu))' failed at iommu_intr.c:172
----[ Xen-4.13.1-10.0.3-d  x86_64  debug=y   Not tainted ]----
CPU:    3
RIP:    e008:[<ffff82d08026ae3c>] drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
[...]
Xen call trace:
   [<ffff82d08026ae3c>] R drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
   [<ffff82d08026af25>] F drivers/passthrough/amd/iommu_intr.c#update_intremap_entry_from_msi_msg+0xc0/0x342
   [<ffff82d08026ba65>] F amd_iommu_msi_msg_update_ire+0x98/0x129
   [<ffff82d08025dd36>] F iommu_update_ire_from_msi+0x1e/0x21
   [<ffff82d080286862>] F msi_free_irq+0x55/0x1a0
   [<ffff82d080286f25>] F pci_cleanup_msi+0x8c/0xb0
   [<ffff82d08025cf52>] F pci_remove_device+0x1af/0x2da
   [<ffff82d0802a42d1>] F do_physdev_op+0xd18/0x1187
   [<ffff82d080383925>] F pv_hypercall+0x1f5/0x567
   [<ffff82d08038a432>] F lstar_enter+0x112/0x120

That's because the call to iommu_remove_device on AMD hardware will
remove the per-device interrupt remapping table, and hence the call to
pci_cleanup_msi done afterwards will find a null intremap table and
crash.

Reorder the calls so that MSI interrupts are torn down before removing
the device from the IOMMU.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
I've discussed the issue with Andrew and maybe we should try to avoid
removing the interrupt remapping table on device removal, but then the
tables would have to be sized to support the maximum amount of
interrupts instead of the maximum supported by the device currently
plugged in.

I think the currently proposed fix can be easily backported. We can
see about improving the resilience going ahead.
---
 xen/drivers/passthrough/pci.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Jan Beulich Oct. 21, 2020, 11:20 a.m. UTC | #1
On 21.10.2020 10:19, Roger Pau Monne wrote:
> Doing the MSI cleanup after removing the device from the IOMMU leads
> to the following panic on AMD hardware:
> 
> Assertion 'table.ptr && (index < intremap_table_entries(table.ptr, iommu))' failed at iommu_intr.c:172
> ----[ Xen-4.13.1-10.0.3-d  x86_64  debug=y   Not tainted ]----
> CPU:    3
> RIP:    e008:[<ffff82d08026ae3c>] drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
> [...]
> Xen call trace:
>    [<ffff82d08026ae3c>] R drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
>    [<ffff82d08026af25>] F drivers/passthrough/amd/iommu_intr.c#update_intremap_entry_from_msi_msg+0xc0/0x342
>    [<ffff82d08026ba65>] F amd_iommu_msi_msg_update_ire+0x98/0x129
>    [<ffff82d08025dd36>] F iommu_update_ire_from_msi+0x1e/0x21
>    [<ffff82d080286862>] F msi_free_irq+0x55/0x1a0
>    [<ffff82d080286f25>] F pci_cleanup_msi+0x8c/0xb0
>    [<ffff82d08025cf52>] F pci_remove_device+0x1af/0x2da
>    [<ffff82d0802a42d1>] F do_physdev_op+0xd18/0x1187
>    [<ffff82d080383925>] F pv_hypercall+0x1f5/0x567
>    [<ffff82d08038a432>] F lstar_enter+0x112/0x120
> 
> That's because the call to iommu_remove_device on AMD hardware will
> remove the per-device interrupt remapping table, and hence the call to
> pci_cleanup_msi done afterwards will find a null intremap table and
> crash.
> 
> Reorder the calls so that MSI interrupts are torn down before removing
> the device from the IOMMU.

I guess this wants

Fixes: d7cfeb7c13ed ("AMD/IOMMU: don't blindly allocate interrupt remapping tables")

?

> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> I've discussed the issue with Andrew and maybe we should try to avoid
> removing the interrupt remapping table on device removal, but then the
> tables would have to be sized to support the maximum amount of
> interrupts instead of the maximum supported by the device currently
> plugged in.

We've specifically limited allocation sizes not so long ago (the
commit above was the first of two steps in that direction). So
I'd rather not see us go back unless there's truly new information
available now.

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -834,10 +834,15 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>          if ( pdev->bus == bus && pdev->devfn == devfn )
>          {
> +            /*
> +             * Cleanup MSI interrupts before removing the device from the
> +             * IOMMU, or else the internal IOMMU data used to track the device
> +             * interrupts might be already gone.
> +             */
> +            pci_cleanup_msi(pdev);
>              ret = iommu_remove_device(pdev);
>              if ( pdev->domain )
>                  list_del(&pdev->domain_list);
> -            pci_cleanup_msi(pdev);

To be honest I'm not sure about the comment. It should really have
been this way from the very beginning, and VT-d not being affected
makes me wonder what possible improvements are there waiting to be
noticed and then carried out.

Jan
Roger Pau Monné Oct. 21, 2020, 11:34 a.m. UTC | #2
On Wed, Oct 21, 2020 at 01:20:27PM +0200, Jan Beulich wrote:
> On 21.10.2020 10:19, Roger Pau Monne wrote:
> > Doing the MSI cleanup after removing the device from the IOMMU leads
> > to the following panic on AMD hardware:
> > 
> > Assertion 'table.ptr && (index < intremap_table_entries(table.ptr, iommu))' failed at iommu_intr.c:172
> > ----[ Xen-4.13.1-10.0.3-d  x86_64  debug=y   Not tainted ]----
> > CPU:    3
> > RIP:    e008:[<ffff82d08026ae3c>] drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
> > [...]
> > Xen call trace:
> >    [<ffff82d08026ae3c>] R drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
> >    [<ffff82d08026af25>] F drivers/passthrough/amd/iommu_intr.c#update_intremap_entry_from_msi_msg+0xc0/0x342
> >    [<ffff82d08026ba65>] F amd_iommu_msi_msg_update_ire+0x98/0x129
> >    [<ffff82d08025dd36>] F iommu_update_ire_from_msi+0x1e/0x21
> >    [<ffff82d080286862>] F msi_free_irq+0x55/0x1a0
> >    [<ffff82d080286f25>] F pci_cleanup_msi+0x8c/0xb0
> >    [<ffff82d08025cf52>] F pci_remove_device+0x1af/0x2da
> >    [<ffff82d0802a42d1>] F do_physdev_op+0xd18/0x1187
> >    [<ffff82d080383925>] F pv_hypercall+0x1f5/0x567
> >    [<ffff82d08038a432>] F lstar_enter+0x112/0x120
> > 
> > That's because the call to iommu_remove_device on AMD hardware will
> > remove the per-device interrupt remapping table, and hence the call to
> > pci_cleanup_msi done afterwards will find a null intremap table and
> > crash.
> > 
> > Reorder the calls so that MSI interrupts are torn down before removing
> > the device from the IOMMU.
> 
> I guess this wants
> 
> Fixes: d7cfeb7c13ed ("AMD/IOMMU: don't blindly allocate interrupt remapping tables")
> 
> ?

Oh yes, I didn't git blame the file to figure out when such allocating
and freeing was added.

> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

> > --- a/xen/drivers/passthrough/pci.c
> > +++ b/xen/drivers/passthrough/pci.c
> > @@ -834,10 +834,15 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
> >      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
> >          if ( pdev->bus == bus && pdev->devfn == devfn )
> >          {
> > +            /*
> > +             * Cleanup MSI interrupts before removing the device from the
> > +             * IOMMU, or else the internal IOMMU data used to track the device
> > +             * interrupts might be already gone.
> > +             */
> > +            pci_cleanup_msi(pdev);
> >              ret = iommu_remove_device(pdev);
> >              if ( pdev->domain )
> >                  list_del(&pdev->domain_list);
> > -            pci_cleanup_msi(pdev);
> 
> To be honest I'm not sure about the comment. It should really have
> been this way from the very beginning, and VT-d not being affected
> makes me wonder what possible improvements are there waiting to be
> noticed and then carried out.

I'm fine with dropping the comment, I would also expect the normal
flow to be to cleanup any interrupt and then remove the device,
instead of the other way around.

Roger.
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index b035067975..64b8a77ce0 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -834,10 +834,15 @@  int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
         if ( pdev->bus == bus && pdev->devfn == devfn )
         {
+            /*
+             * Cleanup MSI interrupts before removing the device from the
+             * IOMMU, or else the internal IOMMU data used to track the device
+             * interrupts might be already gone.
+             */
+            pci_cleanup_msi(pdev);
             ret = iommu_remove_device(pdev);
             if ( pdev->domain )
                 list_del(&pdev->domain_list);
-            pci_cleanup_msi(pdev);
             printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
             free_pdev(pseg, pdev);
             break;