mbox series

[0/3] Fix device_lock deadlock on two probe() paths

Message ID 0-v1-8612b9ef48da+333-iommu_group_locking2_jgg@nvidia.com (mailing list archive)
Headers show
Series Fix device_lock deadlock on two probe() paths | expand

Message

Jason Gunthorpe Aug. 8, 2023, 5:27 p.m. UTC
I missed two paths where __iommu_probe_device() can be called while
already holding the device_lock() for the device that is to be probed.

This causes a deadlock because __iommu_probe_device() will attempt to
re-acquire the lock.

Organize things so that these two paths can re-use the existing already
held device_lock by marking the call chains based on if the lock is held
or not.

This is an incremental on top of Joerg's next, but it could also be handled by
respinning the last patch in that series. Please let me know.

Jason Gunthorpe (3):
  iommu: Provide iommu_probe_device_locked()
  iommu: Pass in the iommu_device to probe for in bus_iommu_probe()
  iommu: Do not attempt to re-lock the iommu device when probing

 drivers/acpi/scan.c        |  5 ++--
 drivers/iommu/iommu.c      | 60 +++++++++++++++++++++++++++-----------
 drivers/iommu/of_iommu.c   |  2 +-
 drivers/iommu/omap-iommu.c | 12 ++++++--
 include/linux/iommu.h      |  6 +++-
 5 files changed, 61 insertions(+), 24 deletions(-)


base-commit: 8d3740021d5d461e1ec4c17fc5625b9b4237c2f8

Comments

Mark Brown Aug. 8, 2023, 8:18 p.m. UTC | #1
On Tue, Aug 08, 2023 at 02:27:04PM -0300, Jason Gunthorpe wrote:
> I missed two paths where __iommu_probe_device() can be called while
> already holding the device_lock() for the device that is to be probed.
> 
> This causes a deadlock because __iommu_probe_device() will attempt to
> re-acquire the lock.
> 
> Organize things so that these two paths can re-use the existing already
> held device_lock by marking the call chains based on if the lock is held
> or not.
> 
> This is an incremental on top of Joerg's next, but it could also be handled by
> respinning the last patch in that series. Please let me know.

The issues this series fixes have been causing quite a bit of breakage
in a range of CI systems (Arm's internal stuff, KernelCI and my personal
CI).  Both the KernelCI bot and my colleague Aishwarya (CCed) bisected
which pointed to this series so I've tested them - I didn't cover every
board but this does fix at least some boots so:

Tested-by: Mark Brown <broonie@kernel.org>

It'd be great to get these fixes into -next, thanks for getting the
patches out so quickly.
Jason Gunthorpe Aug. 8, 2023, 10:26 p.m. UTC | #2
On Tue, Aug 08, 2023 at 09:18:51PM +0100, Mark Brown wrote:
> On Tue, Aug 08, 2023 at 02:27:04PM -0300, Jason Gunthorpe wrote:
> > I missed two paths where __iommu_probe_device() can be called while
> > already holding the device_lock() for the device that is to be probed.
> > 
> > This causes a deadlock because __iommu_probe_device() will attempt to
> > re-acquire the lock.
> > 
> > Organize things so that these two paths can re-use the existing already
> > held device_lock by marking the call chains based on if the lock is held
> > or not.
> > 
> > This is an incremental on top of Joerg's next, but it could also be handled by
> > respinning the last patch in that series. Please let me know.
> 
> The issues this series fixes have been causing quite a bit of breakage
> in a range of CI systems (Arm's internal stuff, KernelCI and my personal
> CI).  Both the KernelCI bot and my colleague Aishwarya (CCed) bisected
> which pointed to this series so I've tested them - I didn't cover every
> board but this does fix at least some boots so:
> 
> Tested-by: Mark Brown <broonie@kernel.org>
> 
> It'd be great to get these fixes into -next, thanks for getting the
> patches out so quickly.

Yes, sorry about that, this series didn't get picked up by any arm testing
before getting merged..

Thanks,
Jason
Chen-Yu Tsai Aug. 9, 2023, 6:37 a.m. UTC | #3
On Wed, Aug 9, 2023 at 1:27 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> I missed two paths where __iommu_probe_device() can be called while
> already holding the device_lock() for the device that is to be probed.
>
> This causes a deadlock because __iommu_probe_device() will attempt to
> re-acquire the lock.
>
> Organize things so that these two paths can re-use the existing already
> held device_lock by marking the call chains based on if the lock is held
> or not.
>
> This is an incremental on top of Joerg's next, but it could also be handled by
> respinning the last patch in that series. Please let me know.
>
> Jason Gunthorpe (3):
>   iommu: Provide iommu_probe_device_locked()
>   iommu: Pass in the iommu_device to probe for in bus_iommu_probe()
>   iommu: Do not attempt to re-lock the iommu device when probing

Tested-by: Chen-Yu Tsai <wenst@chromium.org>

Thank you for coming up with fixes so quickly.