Message ID | 20200612144451.9081-1-tvrtko.ursulin@linux.intel.com (mailing list archive)
---|---
State | New, archived
Series | drm/i915/selftests: Move test flush to outside vm->mutex
Quoting Tvrtko Ursulin (2020-06-12 15:44:51)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> As per our locking rules it is not allowed to wait on requests while
> holding locks. In this case we were trying to idle the GPU while holding
> the vm->mutex.

Synchronous eviction would like to have a word.

> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/selftests/i915_gem_evict.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> index 028baae9631f..67f4497c8224 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> @@ -498,8 +498,6 @@ static int igt_evict_contexts(void *arg)
>
>  	mutex_lock(&ggtt->vm.mutex);
> out_locked:
> -	if (igt_flush_test(i915))
> -		err = -EIO;
>  	while (reserved) {
>  		struct reserved *next = reserved->next;
>
> @@ -513,6 +511,9 @@ static int igt_evict_contexts(void *arg)
>  	mutex_unlock(&ggtt->vm.mutex);
>  	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
>
> +	if (igt_flush_test(i915))
> +		err = -EIO;

The patch is ok, since the manual drm_mm_node reservations are not used
by the GTT, but the reason is a bit specious.
-Chris
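[Editorial note] The locking rule quoted in the commit message is, at bottom, about waiting for something whose completion path may need the very lock the waiter is holding. A minimal stand-alone sketch of that failure mode, in plain userspace C with pthreads (every name below is a placeholder; none of this is i915 code), could look like the following; the "completer" stands in for whoever has to run (a retire or reset path) before the wait can finish.

/* Build with: cc -pthread -o wait-demo wait-demo.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t vm_mutex = PTHREAD_MUTEX_INITIALIZER;
static volatile bool work_done;

/* Stand-in for the path that has to run to complete the wait (request
 * retirement, or a GPU reset).  In this sketch it needs vm_mutex before
 * it can make progress. */
static void *completer(void *arg)
{
	(void)arg;
	sleep(1);
	if (pthread_mutex_trylock(&vm_mutex)) {
		/* The waiter still holds vm_mutex; a completer that had to
		 * block here would never run and the waiter would never be
		 * woken.  Report it and bail out so the demo terminates. */
		fprintf(stderr, "completer: vm_mutex held by the waiter, would deadlock\n");
		work_done = true;
		return NULL;
	}
	work_done = true;
	pthread_mutex_unlock(&vm_mutex);
	return NULL;
}

int main(void)
{
	pthread_t thread;

	pthread_mutex_lock(&vm_mutex);	/* the problematic pattern: hold the lock... */
	pthread_create(&thread, NULL, completer, NULL);
	while (!work_done)		/* ...while waiting for someone else to finish */
		usleep(1000);
	pthread_mutex_unlock(&vm_mutex);

	pthread_join(thread, NULL);
	return 0;
}

Whether that rule actually applies to this particular selftest is exactly what the rest of the thread debates.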
On 12/06/2020 15:55, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-12 15:44:51)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> As per our locking rules it is not allowed to wait on requests while
>> holding locks. In this case we were trying to idle the GPU while holding
>> the vm->mutex.
>
> Synchronous eviction would like to have a word.
>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>  drivers/gpu/drm/i915/selftests/i915_gem_evict.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> index 028baae9631f..67f4497c8224 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
>> @@ -498,8 +498,6 @@ static int igt_evict_contexts(void *arg)
>>
>>  	mutex_lock(&ggtt->vm.mutex);
>> out_locked:
>> -	if (igt_flush_test(i915))
>> -		err = -EIO;
>>  	while (reserved) {
>>  		struct reserved *next = reserved->next;
>>
>> @@ -513,6 +511,9 @@ static int igt_evict_contexts(void *arg)
>>  	mutex_unlock(&ggtt->vm.mutex);
>>  	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
>>
>> +	if (igt_flush_test(i915))
>> +		err = -EIO;
>
> The patch is ok, since the manual drm_mm_node reservations are not used
> by the GTT, but the reason is a bit specious.

We have a comment in i915_request_wait which says:

	/*
	 * We must never wait on the GPU while holding a lock as we
	 * may need to perform a GPU reset. So while we don't need to
	 * serialise wait/reset with an explicit lock, we do want
	 * lockdep to detect potential dependency cycles.
	 */

And then there was a lockdep splat here
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_6595/fi-skl-6700k2/igt@i915_selftest@live@evict.html,
which, although it uses some extra lockdep annotation patches, seemed to
connect the two:

<4> [258.014638] Chain exists of:
  &gt->reset.mutex --> fs_reclaim --> &vm->mutex
<4> [258.014640]  Possible unsafe locking scenario:
<4> [258.014641]        CPU0                    CPU1
<4> [258.014641]        ----                    ----
<4> [258.014642]   lock(&vm->mutex);
<4> [258.014642]                                lock(fs_reclaim);
<4> [258.014643]                                lock(&vm->mutex);
<4> [258.014644]   lock(&gt->reset.mutex);
<4> [258.014645]
 *** DEADLOCK ***
<4> [258.014646] 2 locks held by i915_selftest/5153:

Why, despite the comment in request wait, lockdep does not otherwise see
this, I don't know.

Regards,
Tvrtko
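[Editorial note] The splat quoted above is lockdep's dependency-cycle detection at work: each "A --> B" edge records that A was held while B was acquired, and idling the GPU under vm->mutex adds the edge that closes the loop back to the reset mutex. The toy program below redoes that check in miniature; the edge list is copied by hand from the chain quoted above, and nothing in it is i915 code or the real lockdep implementation.

/* Toy re-run of a lockdep-style cycle check over the quoted edges. */
#include <stdbool.h>
#include <stdio.h>

enum lock { RESET_MUTEX, FS_RECLAIM, VM_MUTEX, NLOCKS };

static const char *name[NLOCKS] = {
	"&gt->reset.mutex", "fs_reclaim", "&vm->mutex",
};

/* edge[a][b] means "a was held while b was acquired" */
static bool edge[NLOCKS][NLOCKS];
static bool visited[NLOCKS];

static bool reaches(int from, int to)
{
	if (from == to)
		return true;
	if (visited[from])
		return false;
	visited[from] = true;
	for (int next = 0; next < NLOCKS; next++)
		if (edge[from][next] && reaches(next, to))
			return true;
	return false;
}

static void add_edge(int held, int taken)
{
	/* Adding held -> taken closes a cycle iff taken already reaches held. */
	for (int i = 0; i < NLOCKS; i++)
		visited[i] = false;
	if (reaches(taken, held))
		printf("possible deadlock: %s --> %s closes a cycle\n",
		       name[held], name[taken]);
	edge[held][taken] = true;
}

int main(void)
{
	/* Dependencies already on record, per the quoted chain. */
	add_edge(RESET_MUTEX, FS_RECLAIM);
	add_edge(FS_RECLAIM, VM_MUTEX);

	/* What idling the GPU under vm->mutex adds: the wait may end up
	 * taking the reset mutex. */
	add_edge(VM_MUTEX, RESET_MUTEX);
	return 0;
}

Which of those recorded edges is real for this particular test is what the reply below disputes.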
Quoting Tvrtko Ursulin (2020-06-12 16:04:15)
>
> On 12/06/2020 15:55, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-06-12 15:44:51)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> As per our locking rules it is not allowed to wait on requests while
> >> holding locks. In this case we were trying to idle the GPU while holding
> >> the vm->mutex.
> >
> > Synchronous eviction would like to have a word.
> >
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/selftests/i915_gem_evict.c | 5 +++--
> >>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> >> index 028baae9631f..67f4497c8224 100644
> >> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> >> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> >> @@ -498,8 +498,6 @@ static int igt_evict_contexts(void *arg)
> >>
> >>  	mutex_lock(&ggtt->vm.mutex);
> >> out_locked:
> >> -	if (igt_flush_test(i915))
> >> -		err = -EIO;
> >>  	while (reserved) {
> >>  		struct reserved *next = reserved->next;
> >>
> >> @@ -513,6 +511,9 @@ static int igt_evict_contexts(void *arg)
> >>  	mutex_unlock(&ggtt->vm.mutex);
> >>  	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
> >>
> >> +	if (igt_flush_test(i915))
> >> +		err = -EIO;
> >
> > The patch is ok, since the manual drm_mm_node reservations are not used
> > by the GTT, but the reason is a bit specious.
>
> We have a comment in i915_request_wait which says:
>
> 	/*
> 	 * We must never wait on the GPU while holding a lock as we
> 	 * may need to perform a GPU reset. So while we don't need to
> 	 * serialise wait/reset with an explicit lock, we do want
> 	 * lockdep to detect potential dependency cycles.
> 	 */

That's for a lock used by reset.

> And then there was a lockdep splat here
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_6595/fi-skl-6700k2/igt@i915_selftest@live@evict.html,
> which, although it uses some extra lockdep annotation patches, seemed to
> connect the two:
>
> <4> [258.014638] Chain exists of:
>   &gt->reset.mutex --> fs_reclaim --> &vm->mutex
> <4> [258.014640]  Possible unsafe locking scenario:
> <4> [258.014641]        CPU0                    CPU1
> <4> [258.014641]        ----                    ----
> <4> [258.014642]   lock(&vm->mutex);
> <4> [258.014642]                                lock(fs_reclaim);
> <4> [258.014643]                                lock(&vm->mutex);
> <4> [258.014644]   lock(&gt->reset.mutex);
> <4> [258.014645]
>  *** DEADLOCK ***
> <4> [258.014646] 2 locks held by i915_selftest/5153:

is false.
-Chris
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 028baae9631f..67f4497c8224 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -498,8 +498,6 @@ static int igt_evict_contexts(void *arg)

 	mutex_lock(&ggtt->vm.mutex);
 out_locked:
-	if (igt_flush_test(i915))
-		err = -EIO;
 	while (reserved) {
 		struct reserved *next = reserved->next;

@@ -513,6 +511,9 @@ static int igt_evict_contexts(void *arg)
 	mutex_unlock(&ggtt->vm.mutex);
 	intel_runtime_pm_put(&i915->runtime_pm, wakeref);

+	if (igt_flush_test(i915))
+		err = -EIO;
+
 	return err;
 }
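[Editorial note] For reference, the ordering the patch ends up with, reduced to a stand-alone userspace sketch (placeholder names throughout, not the actual selftest): tear the reservation list down while the mutex protecting it is held, drop the mutex, and only then do the potentially long blocking flush.

/* Sketch of the resulting ordering; cc -pthread is enough to build it. */
#include <pthread.h>
#include <stddef.h>

struct reservation { struct reservation *next; };

static pthread_mutex_t vm_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct reservation *reserved;

/* Placeholder standing in for igt_flush_test(): may block waiting for
 * outstanding work, so it must not be called with vm_mutex held. */
static int flush_outstanding_work(void)
{
	return 0;
}

static void reservation_release(struct reservation *node)
{
	(void)node;	/* placeholder for node removal and freeing */
}

static int teardown(void)
{
	int err = 0;

	pthread_mutex_lock(&vm_mutex);
	/* List teardown stays under the lock that protects the list. */
	while (reserved) {
		struct reservation *next = reserved->next;

		reservation_release(reserved);
		reserved = next;
	}
	pthread_mutex_unlock(&vm_mutex);

	/* Blocking flush only after every lock has been dropped. */
	if (flush_outstanding_work())
		err = -1;

	return err;
}

int main(void)
{
	return teardown();
}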