Message ID | 0-v1-8612b9ef48da+333-iommu_group_locking2_jgg@nvidia.com (mailing list archive) |
---|---|
Headers | show |
Series | Fix device_lock deadlock on two probe() paths | expand |
On Tue, Aug 08, 2023 at 02:27:04PM -0300, Jason Gunthorpe wrote: > I missed two paths where __iommu_probe_device() can be called while > already holding the device_lock() for the device that is to be probed. > > This causes a deadlock because __iommu_probe_device() will attempt to > re-acquire the lock. > > Organize things so that these two paths can re-use the existing already > held device_lock by marking the call chains based on if the lock is held > or not. > > This is an incremental on top of Joerg's next, but it could also be handled by > respinning the last patch in that series. Please let me know. The issues this series fixes have been causing quite a bit of breakage in a range of CI systems (Arm's internal stuff, KernelCI and my personal CI). Both the KernelCI bot and my colleague Aishwarya (CCed) bisected which pointed to this series so I've tested them - I didn't cover every board but this does fix at least some boots so: Tested-by: Mark Brown <broonie@kernel.org> It'd be great to get these fixes into -next, thanks for getting the patches out so quickly.
On Tue, Aug 08, 2023 at 09:18:51PM +0100, Mark Brown wrote: > On Tue, Aug 08, 2023 at 02:27:04PM -0300, Jason Gunthorpe wrote: > > I missed two paths where __iommu_probe_device() can be called while > > already holding the device_lock() for the device that is to be probed. > > > > This causes a deadlock because __iommu_probe_device() will attempt to > > re-acquire the lock. > > > > Organize things so that these two paths can re-use the existing already > > held device_lock by marking the call chains based on if the lock is held > > or not. > > > > This is an incremental on top of Joerg's next, but it could also be handled by > > respinning the last patch in that series. Please let me know. > > The issues this series fixes have been causing quite a bit of breakage > in a range of CI systems (Arm's internal stuff, KernelCI and my personal > CI). Both the KernelCI bot and my colleague Aishwarya (CCed) bisected > which pointed to this series so I've tested them - I didn't cover every > board but this does fix at least some boots so: > > Tested-by: Mark Brown <broonie@kernel.org> > > It'd be great to get these fixes into -next, thanks for getting the > patches out so quickly. Yes, sorry about that, this series didn't get picked up by any arm testing before getting merged.. Thanks, Jason
On Wed, Aug 9, 2023 at 1:27 AM Jason Gunthorpe <jgg@nvidia.com> wrote: > > I missed two paths where __iommu_probe_device() can be called while > already holding the device_lock() for the device that is to be probed. > > This causes a deadlock because __iommu_probe_device() will attempt to > re-acquire the lock. > > Organize things so that these two paths can re-use the existing already > held device_lock by marking the call chains based on if the lock is held > or not. > > This is an incremental on top of Joerg's next, but it could also be handled by > respinning the last patch in that series. Please let me know. > > Jason Gunthorpe (3): > iommu: Provide iommu_probe_device_locked() > iommu: Pass in the iommu_device to probe for in bus_iommu_probe() > iommu: Do not attempt to re-lock the iommu device when probing Tested-by: Chen-Yu Tsai <wenst@chromium.org> Thank you for coming up with fixes so quickly.