drm/i915/selftests: Move test flush to outside vm->mutex

Message ID 20200612144451.9081-1-tvrtko.ursulin@linux.intel.com
State New

Commit Message

Tvrtko Ursulin June 12, 2020, 2:44 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As per our locking rules it is not allowed to wait on requests while
holding locks. In this case we were trying to idle the GPU while holding
the vm->mutex.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_gem_evict.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Chris Wilson June 12, 2020, 2:55 p.m. UTC | #1
Quoting Tvrtko Ursulin (2020-06-12 15:44:51)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> As per our locking rules it is not allowed to wait on requests while
> holding locks. In this case we were trying to idle the GPU while holding
> the vm->mutex.

Synchronous eviction would like to have a word.

> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/selftests/i915_gem_evict.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> index 028baae9631f..67f4497c8224 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> @@ -498,8 +498,6 @@ static int igt_evict_contexts(void *arg)
>  
>         mutex_lock(&ggtt->vm.mutex);
>  out_locked:
> -       if (igt_flush_test(i915))
> -               err = -EIO;
>         while (reserved) {
>                 struct reserved *next = reserved->next;
>  
> @@ -513,6 +511,9 @@ static int igt_evict_contexts(void *arg)
>         mutex_unlock(&ggtt->vm.mutex);
>         intel_runtime_pm_put(&i915->runtime_pm, wakeref);
>  
> +       if (igt_flush_test(i915))
> +               err = -EIO;

The patch is ok, since the manual drm_mm_node reservations are not used
by the GTT, but the reason is a bit specious.
-Chris

Tvrtko Ursulin June 12, 2020, 3:04 p.m. UTC | #2
On 12/06/2020 15:55, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-12 15:44:51)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> As per our locking rules it is not allowed to wait on requests while
>> holding locks. In this case we were trying to idle the GPU while holding
>> the vm->mutex.
> 
> Synchronous eviction would like to have a word.
> 
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/selftests/i915_gem_evict.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> index 028baae9631f..67f4497c8224 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> @@ -498,8 +498,6 @@ static int igt_evict_contexts(void *arg)
>>   
>>          mutex_lock(&ggtt->vm.mutex);
>>   out_locked:
>> -       if (igt_flush_test(i915))
>> -               err = -EIO;
>>          while (reserved) {
>>                  struct reserved *next = reserved->next;
>>   
>> @@ -513,6 +511,9 @@ static int igt_evict_contexts(void *arg)
>>          mutex_unlock(&ggtt->vm.mutex);
>>          intel_runtime_pm_put(&i915->runtime_pm, wakeref);
>>   
>> +       if (igt_flush_test(i915))
>> +               err = -EIO;
> 
> The patch is ok, since the manual drm_mm_node reservations are not used
> by the GTT, but the reason is a bit specious.

We have a comment in i915_request_wait which says:

	/*
	 * We must never wait on the GPU while holding a lock as we
	 * may need to perform a GPU reset. So while we don't need to
	 * serialise wait/reset with an explicit lock, we do want
	 * lockdep to detect potential dependency cycles.
	 */

And then there was a lockdep splat here:
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_6595/fi-skl-6700k2/igt@i915_selftest@live@evict.html
Although that run uses some extra lockdep annotation patches, it seemed to
connect the two:

<4> [258.014638] Chain exists of:
   &gt->reset.mutex --> fs_reclaim --> &vm->mutex
<4> [258.014640]  Possible unsafe locking scenario:
<4> [258.014641]        CPU0                    CPU1
<4> [258.014641]        ----                    ----
<4> [258.014642]   lock(&vm->mutex);
<4> [258.014642]                                lock(fs_reclaim);
<4> [258.014643]                                lock(&vm->mutex);
<4> [258.014644]   lock(&gt->reset.mutex);
<4> [258.014645]
  *** DEADLOCK ***
<4> [258.014646] 2 locks held by i915_selftest/5153:

I don't know why, despite the comment in i915_request_wait, lockdep does
not otherwise see this.

Regards,

Tvrtko
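
For readers following the splat above: lockdep reports a problem when a newly observed "held while acquiring" dependency would close a cycle among lock classes. The following is only a minimal user-space sketch of that idea, not lockdep's actual implementation. The enum names mirror the lock classes in the report, while `record_dependency()` and `reachable()` are hypothetical helpers invented for illustration. The sketch shows how the report reads; it says nothing about whether the reported chain reflects a real dependency.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock classes named after those in the report (illustrative only). */
enum lock_class { RESET_MUTEX, FS_RECLAIM, VM_MUTEX, NUM_CLASSES };

/* edge[a][b] is true once "a was held while acquiring b" has been recorded. */
static bool edge[NUM_CLASSES][NUM_CLASSES];

/* Depth-first search: can we get from 'from' to 'to' along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NUM_CLASSES; next++) {
		if (edge[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	}
	return false;
}

/*
 * Hypothetical helper: record that 'held' was held while 'taken' was
 * acquired. Returns false, without recording, if the new edge would
 * close a cycle in the dependency graph.
 */
static bool record_dependency(enum lock_class held, enum lock_class taken)
{
	bool seen[NUM_CLASSES] = { false };

	if (reachable(taken, held, seen))
		return false;
	edge[held][taken] = true;
	return true;
}
```

Recording reset.mutex --> fs_reclaim and fs_reclaim --> vm->mutex succeeds; a later attempt to record vm->mutex --> reset.mutex is rejected, which corresponds to the CPU0 column of the scenario printed above.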

Chris Wilson June 12, 2020, 3:11 p.m. UTC | #3
Quoting Tvrtko Ursulin (2020-06-12 16:04:15)
> 
> On 12/06/2020 15:55, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-06-12 15:44:51)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> As per our locking rules it is not allowed to wait on requests while
> >> holding locks. In this case we were trying to idle the GPU while holding
> >> the vm->mutex.
> > 
> > Synchronous eviction would like to have a word.
> > 
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/selftests/i915_gem_evict.c | 5 +++--
> >>   1 file changed, 3 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> >> index 028baae9631f..67f4497c8224 100644
> >> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> >> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> >> @@ -498,8 +498,6 @@ static int igt_evict_contexts(void *arg)
> >>   
> >>          mutex_lock(&ggtt->vm.mutex);
> >>   out_locked:
> >> -       if (igt_flush_test(i915))
> >> -               err = -EIO;
> >>          while (reserved) {
> >>                  struct reserved *next = reserved->next;
> >>   
> >> @@ -513,6 +511,9 @@ static int igt_evict_contexts(void *arg)
> >>          mutex_unlock(&ggtt->vm.mutex);
> >>          intel_runtime_pm_put(&i915->runtime_pm, wakeref);
> >>   
> >> +       if (igt_flush_test(i915))
> >> +               err = -EIO;
> > 
> > The patch is ok, since the manual drm_mm_node reservations are not used
> > by the GTT, but the reason is a bit specious.
> 
> We have a comment in i915_request_wait which says:
> 
>         /*
>          * We must never wait on the GPU while holding a lock as we
>          * may need to perform a GPU reset. So while we don't need to
>          * serialise wait/reset with an explicit lock, we do want
>          * lockdep to detect potential dependency cycles.
>          */

That's for a lock used by reset.

> And then there was a lockdep splat here:
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_6595/fi-skl-6700k2/igt@i915_selftest@live@evict.html
> Although that run uses some extra lockdep annotation patches, it seemed to
> connect the two:
> 
> <4> [258.014638] Chain exists of:
>    &gt->reset.mutex --> fs_reclaim --> &vm->mutex
> <4> [258.014640]  Possible unsafe locking scenario:
> <4> [258.014641]        CPU0                    CPU1
> <4> [258.014641]        ----                    ----
> <4> [258.014642]   lock(&vm->mutex);
> <4> [258.014642]                                lock(fs_reclaim);
> <4> [258.014643]                                lock(&vm->mutex);
> <4> [258.014644]   lock(&gt->reset.mutex);
> <4> [258.014645]
>   *** DEADLOCK ***
> <4> [258.014646] 2 locks held by i915_selftest/5153:

is false.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 028baae9631f..67f4497c8224 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -498,8 +498,6 @@  static int igt_evict_contexts(void *arg)
 
 	mutex_lock(&ggtt->vm.mutex);
 out_locked:
-	if (igt_flush_test(i915))
-		err = -EIO;
 	while (reserved) {
 		struct reserved *next = reserved->next;
 
@@ -513,6 +511,9 @@  static int igt_evict_contexts(void *arg)
 	mutex_unlock(&ggtt->vm.mutex);
 	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
 
+	if (igt_flush_test(i915))
+		err = -EIO;
+
 	return err;
 }
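
The reordering in the patch can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not the i915 code: `vm_mutex` is a plain pthread mutex standing in for ggtt->vm.mutex, `flush_test_stub()` is a hypothetical stand-in for igt_flush_test(), and `lock_held`, `make_list()` and `cleanup()` are invented for the example. The stub asserts that it is never entered with the mutex held, which is the property the patch establishes for the selftest cleanup path.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct reserved {
	struct reserved *next;
};

static pthread_mutex_t vm_mutex = PTHREAD_MUTEX_INITIALIZER;
static int lock_held; /* set while vm_mutex is held; single-threaded bookkeeping */

/* Hypothetical stand-in for a blocking flush; 'fail' forces an error. */
static int flush_test_stub(int fail)
{
	assert(!lock_held); /* must not wait while holding vm_mutex */
	return fail ? -1 : 0;
}

/* Build a list of up to n nodes; stops early if malloc() fails. */
static struct reserved *make_list(int n)
{
	struct reserved *head = NULL;

	while (n-- > 0) {
		struct reserved *r = malloc(sizeof(*r));

		if (!r)
			break;
		r->next = head;
		head = r;
	}
	return head;
}

/* Free the list under the lock, then flush only after unlocking. */
static int cleanup(struct reserved *reserved, int err, int flush_fails)
{
	pthread_mutex_lock(&vm_mutex);
	lock_held = 1;
	while (reserved) {
		struct reserved *next = reserved->next;

		free(reserved);
		reserved = next;
	}
	lock_held = 0;
	pthread_mutex_unlock(&vm_mutex);

	/* The wait happens with no lock held, as in the patched selftest. */
	if (flush_test_stub(flush_fails))
		err = -EIO;

	return err;
}
```

Calling the stub before `pthread_mutex_unlock()` instead would trip the assertion, mirroring the ordering the patch removes.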