diff mbox series

[v6,63/64] drm/i915: Move gt_revoke() slightly

Message ID 20210105153558.134272-64-maarten.lankhorst@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915: Remove obj->mm.lock! | expand

Commit Message

Maarten Lankhorst Jan. 5, 2021, 3:35 p.m. UTC
We get a lockdep splat when the reset mutex is held, because it can be
taken from fence_wait. This conflicts with the mmu notifier we have,
because we recurse between reset mutex and mmap lock -> mmu notifier.

Remove this recursion by calling revoke_mmaps before taking the lock.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Thomas Hellstrom Jan. 18, 2021, 11:11 a.m. UTC | #1
On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
> We get a lockdep splat when the reset mutex is held, because it can be
> taken from fence_wait. This conflicts with the mmu notifier we have,
> because we recurse between reset mutex and mmap lock -> mmu notifier.
>
> Remove this recursion by calling revoke_mmaps before taking the lock.

Hmm. Is the mmap se taken from gt_revoke()?

If so, isn't the real problem that the mmap_sem is taken in the 
dma_fence critical path (where the reset code sits)?

/Thomas


>
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_reset.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 9d177297db79..3c0807d9a86e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -975,8 +975,6 @@ static int do_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
>   {
>   	int err, i;
>   
> -	gt_revoke(gt);
> -
>   	err = __intel_gt_reset(gt, ALL_ENGINES);
>   	for (i = 0; err && i < RESET_MAX_RETRIES; i++) {
>   		msleep(10 * (i + 1));
> @@ -1031,6 +1029,9 @@ void intel_gt_reset(struct intel_gt *gt,
>   
>   	might_sleep();
>   	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &gt->reset.flags));
> +
> +	gt_revoke(gt);
> +
>   	mutex_lock(&gt->reset.mutex);
>   
>   	/* Clear any previous failed attempts at recovery. Time to try again. */
Maarten Lankhorst Jan. 18, 2021, 12:01 p.m. UTC | #2
Op 18-01-2021 om 12:11 schreef Thomas Hellström:
>
> On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
>> We get a lockdep splat when the reset mutex is held, because it can be
>> taken from fence_wait. This conflicts with the mmu notifier we have,
>> because we recurse between reset mutex and mmap lock -> mmu notifier.
>>
>> Remove this recursion by calling revoke_mmaps before taking the lock.
>
> Hmm. Is the mmap se taken from gt_revoke()?
>
> If so, isn't the real problem that the mmap_sem is taken in the dma_fence critical path (where the reset code sits)? 

Hey,

The gpu reset code specifically needs to revoke all gtt mappings, and the fault handler uses intel_gt_reset_trylock(),

so this change should be ok since all those mappings are invalidated correctly and completed before this point.

The reset mutex isn't actually taken inside fence code, but used for lockdep validation, so this should be ok.

~Maarten
Thomas Hellstrom Jan. 18, 2021, 1:22 p.m. UTC | #3
On 1/18/21 1:01 PM, Maarten Lankhorst wrote:
> Op 18-01-2021 om 12:11 schreef Thomas Hellström:
>> On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
>>> We get a lockdep splat when the reset mutex is held, because it can be
>>> taken from fence_wait. This conflicts with the mmu notifier we have,
>>> because we recurse between reset mutex and mmap lock -> mmu notifier.
>>>
>>> Remove this recursion by calling revoke_mmaps before taking the lock.
>> Hmm. Is the mmap se taken from gt_revoke()?
>>
>> If so, isn't the real problem that the mmap_sem is taken in the dma_fence critical path (where the reset code sits)?
> Hey,
>
> The gpu reset code specifically needs to revoke all gtt mappings, and the fault handler uses intel_gt_reset_trylock(),
>
> so this change should be ok since all those mappings are invalidated correctly and completed before this point.
>
> The reset mutex isn't actually taken inside fence code, but used for lockdep validation, so this should be ok.
>
> ~Maarten

Hmm, OK but then we still have the following established locking order.

lock(fence_signaling)
lock(i_mmap_lock)

But in the notifier

lock(i_mmap_lock)
fence_signaling(within notifier)

So gt_revoke() is violating dma-fence rules.

BTW it looks to me like the reset mutex notation is actually doing much 
the same as the dma-fence annotations; While we can move gt_revoke() out 
of the reset mutex, that only gives us false hopes since it moves it out 
of the equivalent dma-fence annotation. I figure the reason this was not 
seen before the new code is that the reset mutex lockdep isn't taken 
when waiting for active. Only when waiting for dma-fence, but IMO the 
root problem is pre-existing.

/Thomas
Thomas Hellstrom Jan. 18, 2021, 1:28 p.m. UTC | #4
On 1/18/21 2:22 PM, Thomas Hellström wrote:
>
> On 1/18/21 1:01 PM, Maarten Lankhorst wrote:
>> Op 18-01-2021 om 12:11 schreef Thomas Hellström:
>>> On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
>>>> We get a lockdep splat when the reset mutex is held, because it can be
>>>> taken from fence_wait. This conflicts with the mmu notifier we have,
>>>> because we recurse between reset mutex and mmap lock -> mmu notifier.
>>>>
>>>> Remove this recursion by calling revoke_mmaps before taking the lock.
>>> Hmm. Is the mmap se taken from gt_revoke()?
>>>
>>> If so, isn't the real problem that the mmap_sem is taken in the 
>>> dma_fence critical path (where the reset code sits)?
>> Hey,
>>
>> The gpu reset code specifically needs to revoke all gtt mappings, and 
>> the fault handler uses intel_gt_reset_trylock(),
>>
>> so this change should be ok since all those mappings are invalidated 
>> correctly and completed before this point.
>>
>> The reset mutex isn't actually taken inside fence code, but used for 
>> lockdep validation, so this should be ok.
>>
>> ~Maarten
>
> Hmm, OK but then we still have the following established locking order.
>
> lock(fence_signaling)
> lock(i_mmap_lock)
>
> But in the notifier
>
> lock(i_mmap_lock)
> fence_signaling(within notifier)
>
> So gt_revoke() is violating dma-fence rules.
>
> BTW it looks to me like the reset mutex notation is actually doing 
> much the same as the dma-fence annotations; While we can move 
> gt_revoke() out of the reset mutex, that only gives us false hopes 
> since it moves it out of the equivalent dma-fence annotation. I figure 
> the reason this was not seen before the new code is that the reset 
> mutex lockdep isn't taken when waiting for active. Only when waiting 
> for dma-fence, but IMO the root problem is pre-existing.
>
> /Thomas
>
>
The interesting scenario is

thread 1:
take i_mmap_lock()
enter_mmu_notifier()
wait_fence()

thread 2:
need_to_reset_gpu_for_the_above_fence();
take i_mmap_lock()

Deadlock.

/Thomas
Maarten Lankhorst Jan. 18, 2021, 2:46 p.m. UTC | #5
Op 18-01-2021 om 14:28 schreef Thomas Hellström:
>
> On 1/18/21 2:22 PM, Thomas Hellström wrote:
>>
>> On 1/18/21 1:01 PM, Maarten Lankhorst wrote:
>>> Op 18-01-2021 om 12:11 schreef Thomas Hellström:
>>>> On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
>>>>> We get a lockdep splat when the reset mutex is held, because it can be
>>>>> taken from fence_wait. This conflicts with the mmu notifier we have,
>>>>> because we recurse between reset mutex and mmap lock -> mmu notifier.
>>>>>
>>>>> Remove this recursion by calling revoke_mmaps before taking the lock.
>>>> Hmm. Is the mmap se taken from gt_revoke()?
>>>>
>>>> If so, isn't the real problem that the mmap_sem is taken in the dma_fence critical path (where the reset code sits)?
>>> Hey,
>>>
>>> The gpu reset code specifically needs to revoke all gtt mappings, and the fault handler uses intel_gt_reset_trylock(),
>>>
>>> so this change should be ok since all those mappings are invalidated correctly and completed before this point.
>>>
>>> The reset mutex isn't actually taken inside fence code, but used for lockdep validation, so this should be ok.
>>>
>>> ~Maarten
>>
>> Hmm, OK but then we still have the following established locking order.
>>
>> lock(fence_signaling)
>> lock(i_mmap_lock)
>>
>> But in the notifier
>>
>> lock(i_mmap_lock)
>> fence_signaling(within notifier)
>>
>> So gt_revoke() is violating dma-fence rules.
>>
>> BTW it looks to me like the reset mutex notation is actually doing much the same as the dma-fence annotations; While we can move gt_revoke() out of the reset mutex, that only gives us false hopes since it moves it out of the equivalent dma-fence annotation. I figure the reason this was not seen before the new code is that the reset mutex lockdep isn't taken when waiting for active. Only when waiting for dma-fence, but IMO the root problem is pre-existing.
>>
>> /Thomas
>>
>>
> The interesting scenario is
>
> thread 1:
> take i_mmap_lock()
> enter_mmu_notifier()
> wait_fence()
>
> thread 2:
> need_to_reset_gpu_for_the_above_fence();
> take i_mmap_lock()
>
> Deadlock.
>
> /Thomas
>
>
Yeah, I think gpu reset isn't completely following lockdep rules yet. Thread 1 isn't doing anything wrong, gpu reset probably should stop revoking gt bindings, and allow some garbage during reset. I don't see another way out. :-/
Thomas Hellstrom Jan. 18, 2021, 3:05 p.m. UTC | #6
On 1/18/21 3:46 PM, Maarten Lankhorst wrote:
> Op 18-01-2021 om 14:28 schreef Thomas Hellström:
>> On 1/18/21 2:22 PM, Thomas Hellström wrote:
>>> On 1/18/21 1:01 PM, Maarten Lankhorst wrote:
>>>> Op 18-01-2021 om 12:11 schreef Thomas Hellström:
>>>>> On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
>>>>>> We get a lockdep splat when the reset mutex is held, because it can be
>>>>>> taken from fence_wait. This conflicts with the mmu notifier we have,
>>>>>> because we recurse between reset mutex and mmap lock -> mmu notifier.
>>>>>>
>>>>>> Remove this recursion by calling revoke_mmaps before taking the lock.
>>>>> Hmm. Is the mmap se taken from gt_revoke()?
>>>>>
>>>>> If so, isn't the real problem that the mmap_sem is taken in the dma_fence critical path (where the reset code sits)?
>>>> Hey,
>>>>
>>>> The gpu reset code specifically needs to revoke all gtt mappings, and the fault handler uses intel_gt_reset_trylock(),
>>>>
>>>> so this change should be ok since all those mappings are invalidated correctly and completed before this point.
>>>>
>>>> The reset mutex isn't actually taken inside fence code, but used for lockdep validation, so this should be ok.
>>>>
>>>> ~Maarten
>>> Hmm, OK but then we still have the following established locking order.
>>>
>>> lock(fence_signaling)
>>> lock(i_mmap_lock)
>>>
>>> But in the notifier
>>>
>>> lock(i_mmap_lock)
>>> fence_signaling(within notifier)
>>>
>>> So gt_revoke() is violating dma-fence rules.
>>>
>>> BTW it looks to me like the reset mutex notation is actually doing much the same as the dma-fence annotations; While we can move gt_revoke() out of the reset mutex, that only gives us false hopes since it moves it out of the equivalent dma-fence annotation. I figure the reason this was not seen before the new code is that the reset mutex lockdep isn't taken when waiting for active. Only when waiting for dma-fence, but IMO the root problem is pre-existing.
>>>
>>> /Thomas
>>>
>>>
>> The interesting scenario is
>>
>> thread 1:
>> take i_mmap_lock()
>> enter_mmu_notifier()
>> wait_fence()
>>
>> thread 2:
>> need_to_reset_gpu_for_the_above_fence();
>> take i_mmap_lock()
>>
>> Deadlock.
>>
>> /Thomas
>>
>>
> Yeah, I think gpu reset isn't completely following lockdep rules yet. Thread 1 isn't doing anything wrong, gpu reset probably should stop revoking gt bindings, and allow some garbage during reset. I don't see another way out. :-/

Me neither.

But to silence lockdep until dma_fence annotation is widely added:

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Maarten Lankhorst Jan. 18, 2021, 3:32 p.m. UTC | #7
Op 18-01-2021 om 16:05 schreef Thomas Hellström:
>
> On 1/18/21 3:46 PM, Maarten Lankhorst wrote:
>> Op 18-01-2021 om 14:28 schreef Thomas Hellström:
>>> On 1/18/21 2:22 PM, Thomas Hellström wrote:
>>>> On 1/18/21 1:01 PM, Maarten Lankhorst wrote:
>>>>> Op 18-01-2021 om 12:11 schreef Thomas Hellström:
>>>>>> On 1/5/21 4:35 PM, Maarten Lankhorst wrote:
>>>>>>> We get a lockdep splat when the reset mutex is held, because it can be
>>>>>>> taken from fence_wait. This conflicts with the mmu notifier we have,
>>>>>>> because we recurse between reset mutex and mmap lock -> mmu notifier.
>>>>>>>
>>>>>>> Remove this recursion by calling revoke_mmaps before taking the lock.
>>>>>> Hmm. Is the mmap se taken from gt_revoke()?
>>>>>>
>>>>>> If so, isn't the real problem that the mmap_sem is taken in the dma_fence critical path (where the reset code sits)?
>>>>> Hey,
>>>>>
>>>>> The gpu reset code specifically needs to revoke all gtt mappings, and the fault handler uses intel_gt_reset_trylock(),
>>>>>
>>>>> so this change should be ok since all those mappings are invalidated correctly and completed before this point.
>>>>>
>>>>> The reset mutex isn't actually taken inside fence code, but used for lockdep validation, so this should be ok.
>>>>>
>>>>> ~Maarten
>>>> Hmm, OK but then we still have the following established locking order.
>>>>
>>>> lock(fence_signaling)
>>>> lock(i_mmap_lock)
>>>>
>>>> But in the notifier
>>>>
>>>> lock(i_mmap_lock)
>>>> fence_signaling(within notifier)
>>>>
>>>> So gt_revoke() is violating dma-fence rules.
>>>>
>>>> BTW it looks to me like the reset mutex notation is actually doing much the same as the dma-fence annotations; While we can move gt_revoke() out of the reset mutex, that only gives us false hopes since it moves it out of the equivalent dma-fence annotation. I figure the reason this was not seen before the new code is that the reset mutex lockdep isn't taken when waiting for active. Only when waiting for dma-fence, but IMO the root problem is pre-existing.
>>>>
>>>> /Thomas
>>>>
>>>>
>>> The interesting scenario is
>>>
>>> thread 1:
>>> take i_mmap_lock()
>>> enter_mmu_notifier()
>>> wait_fence()
>>>
>>> thread 2:
>>> need_to_reset_gpu_for_the_above_fence();
>>> take i_mmap_lock()
>>>
>>> Deadlock.
>>>
>>> /Thomas
>>>
>>>
>> Yeah, I think gpu reset isn't completely following lockdep rules yet. Thread 1 isn't doing anything wrong, gpu reset probably should stop revoking gt bindings, and allow some garbage during reset. I don't see another way out. :-/
>
> Me neither.
>
> But to silence lockdep until dma_fence annotation is widely added:
>
> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>
>
Ideally we'll add fence signaling annotations to gpu reset, to exactly detect these kind of things. Hopefully in the future. :)
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 9d177297db79..3c0807d9a86e 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -975,8 +975,6 @@  static int do_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 {
 	int err, i;
 
-	gt_revoke(gt);
-
 	err = __intel_gt_reset(gt, ALL_ENGINES);
 	for (i = 0; err && i < RESET_MAX_RETRIES; i++) {
 		msleep(10 * (i + 1));
@@ -1031,6 +1029,9 @@  void intel_gt_reset(struct intel_gt *gt,
 
 	might_sleep();
 	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &gt->reset.flags));
+
+	gt_revoke(gt);
+
 	mutex_lock(&gt->reset.mutex);
 
 	/* Clear any previous failed attempts at recovery. Time to try again. */