Message ID | 0-v1-ef00ffecea52+2cb-iommu_group_lifetime_jgg@nvidia.com (mailing list archive) |
---|---|
Series | Fix splats related to using the iommu_group after destroying devices |
On 9/8/22 2:44 PM, Jason Gunthorpe wrote:
> The basic issue is that the iommu_group is being used by VFIO after all
> the device drivers have been removed.
>
> In part this is caused by bad logic inside the iommu core that doesn't
> sequence removing the device from the group properly, and in another part
> this is bad logic in VFIO continuing to use device->iommu_group after all
> VFIO device drivers have been removed.
>
> Fix both situations. Either fix alone should fix the bug reported, but
> both together bring a nice robust design to this area.
>
> This is a followup from this thread:
>
> https://lore.kernel.org/kvm/20220831201236.77595-1-mjrosato@linux.ibm.com/
>
> Matthew confirmed an earlier version of the series solved the issue, it
> would be best if he would test this as well to confirm the various changes
> are still OK.

FYI I've been running this series (+ the incremental to patch 4 you mentioned) against my original repro scenario in a loop overnight, looks good.

>
> The iommu patch is independent of the other patches, it can go through the
> iommu rc tree.
>
> Jason Gunthorpe (4):
>   vfio: Simplify vfio_create_group()
>   vfio: Move the sanity check of the group to vfio_create_group()
>   vfio: Follow a strict lifetime for struct iommu_group *
>   iommu: Fix ordering of iommu_release_device()
>
>  drivers/iommu/iommu.c    |  36 ++++++--
>  drivers/vfio/vfio_main.c | 172 +++++++++++++++++++++------------------
>  2 files changed, 120 insertions(+), 88 deletions(-)
>
>
> base-commit: 245898eb9275ce31942cff95d0bdc7412ad3d589
On Fri, Sep 09, 2022 at 08:49:40AM -0400, Matthew Rosato wrote:
> On 9/8/22 2:44 PM, Jason Gunthorpe wrote:
> > The basic issue is that the iommu_group is being used by VFIO after all
> > the device drivers have been removed.
> >
> > In part this is caused by bad logic inside the iommu core that doesn't
> > sequence removing the device from the group properly, and in another part
> > this is bad logic in VFIO continuing to use device->iommu_group after all
> > VFIO device drivers have been removed.
> >
> > Fix both situations. Either fix alone should fix the bug reported, but
> > both together bring a nice robust design to this area.
> >
> > This is a followup from this thread:
> >
> > https://lore.kernel.org/kvm/20220831201236.77595-1-mjrosato@linux.ibm.com/
> >
> > Matthew confirmed an earlier version of the series solved the issue, it
> > would be best if he would test this as well to confirm the various changes
> > are still OK.
>
> FYI I've been running this series (+ the incremental to patch 4 you
> mentioned) against my original repro scenario in a loop overnight,
> looks good.

Thanks Matthew, looks like we need some more time on the last patch but I think the VFIO ones are OK if Alex wants to pick them before LPC is over.

Jason