Message ID | 20180525093206.1919-5-chris@chris-wilson.co.uk (mailing list archive)
---|---
State | New, archived
Chris Wilson <chris@chris-wilson.co.uk> writes:

> As we reset the GPU on suspend/resume, we also do need to reset the
> engine state tracking so call into the engine backends. This is
> especially important so that we can also sanitize the state tracking
> across resume.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7b5544efa0ba..5a7e0b388ad0 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
>
>  void i915_gem_sanitize(struct drm_i915_private *i915)
>  {
> +        struct intel_engine_cs *engine;
> +        enum intel_engine_id id;
> +
> +        GEM_TRACE("\n");
> +
>          mutex_lock(&i915->drm.struct_mutex);
> +
> +        intel_runtime_pm_get(i915);
> +        intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
> +
> +        /*
> +         * As we have just resumed the machine and woken the device up from
> +         * deep PCI sleep (presumably D3_cold), assume the HW has been reset
> +         * back to defaults, recovering from whatever wedged state we left it
> +         * in and so worth trying to use the device once more.
> +         */
>          if (i915_terminally_wedged(&i915->gpu_error))
>                  i915_gem_unset_wedged(i915);
>
> @@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
>          if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
>                  WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
>
> +        /* Reset the submission backend after resume as well as the GPU reset */
> +        for_each_engine(engine, i915, id) {
> +                if (engine->reset.reset)
> +                        engine->reset.reset(engine, NULL);
> +        }

The NULL guarantees that it won't try to do any funny things
with the incomplete state.

But what guarantees the timeline cleanup so that
we don't end up unwinding incomplete requests crap?

-Mika

> +
> +        intel_uncore_forcewake_put(i915, FORCEWAKE_ALL);
> +        intel_runtime_pm_put(i915);
> +
>          i915_gem_contexts_lost(i915);
>          mutex_unlock(&i915->drm.struct_mutex);
>  }
> --
> 2.17.0
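Mika's question hinges on what a backend does when handed a NULL request. As a rough sketch of the callback shape implied by the patch (the example_* helpers are hypothetical placeholders, not the actual execlists or ringbuffer implementation), a reset backend only touches request state when a request is actually supplied:

/* Sketch only: example_* names are hypothetical stand-ins. */
static void example_scrub_submission(struct intel_engine_cs *engine);
static void example_rewind_to_request(struct intel_engine_cs *engine,
                                      struct i915_request *rq);

static void example_engine_reset(struct intel_engine_cs *engine,
                                 struct i915_request *request)
{
        /* Unconditionally drop any stale backend bookkeeping. */
        example_scrub_submission(engine);

        /*
         * On the sanitize path (suspend/resume) there is no in-flight
         * request to unwind, so the caller passes NULL and we stop here.
         */
        if (!request)
                return;

        /* Hang-recovery path only: replay from the guilty request. */
        example_rewind_to_request(engine, request);
}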
Quoting Mika Kuoppala (2018-05-25 14:13:19)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > As we reset the GPU on suspend/resume, we also do need to reset the
> > engine state tracking so call into the engine backends. This is
> > especially important so that we can also sanitize the state tracking
> > across resume.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 7b5544efa0ba..5a7e0b388ad0 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
> >
> >  void i915_gem_sanitize(struct drm_i915_private *i915)
> >  {
> > +        struct intel_engine_cs *engine;
> > +        enum intel_engine_id id;
> > +
> > +        GEM_TRACE("\n");
> > +
> >          mutex_lock(&i915->drm.struct_mutex);
> > +
> > +        intel_runtime_pm_get(i915);
> > +        intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
> > +
> > +        /*
> > +         * As we have just resumed the machine and woken the device up from
> > +         * deep PCI sleep (presumably D3_cold), assume the HW has been reset
> > +         * back to defaults, recovering from whatever wedged state we left it
> > +         * in and so worth trying to use the device once more.
> > +         */
> >          if (i915_terminally_wedged(&i915->gpu_error))
> >                  i915_gem_unset_wedged(i915);
> >
> > @@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
> >          if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
> >                  WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
> >
> > +        /* Reset the submission backend after resume as well as the GPU reset */
> > +        for_each_engine(engine, i915, id) {
> > +                if (engine->reset.reset)
> > +                        engine->reset.reset(engine, NULL);
> > +        }
>
> The NULL guarantees that it won't try to do any funny things
> with the incomplete state.

The NULL is there because this gets called really, really early before
we've finished setting up the engines.

> But what guarantees the timeline cleanup so that
> we don't end up unwinding incomplete requests crap?

To get here we must have gone through at least the start of a suspend.
So we've already cleaned everything up; nicely or forcefully through a
wedge. Whatever is here is garbage, including any internal knowledge in
the backend about what state we left the machine in.
-Chris
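The ordering Chris describes can be summarised as a sketch; only i915_gem_sanitize() is a real entry point here, while the example_* functions are hypothetical stand-ins for the suspend/resume plumbing:

/* Sketch of the implied lifecycle; example_* names are hypothetical. */
static void example_suspend(struct drm_i915_private *i915)
{
        /*
         * Suspend drains all requests, nicely or forcefully through a
         * wedge, so no in-flight request state survives into D3cold.
         */
        example_quiesce_gpu(i915);
}

static void example_resume(struct drm_i915_private *i915)
{
        /*
         * The HW comes back at power-on defaults; whatever the engine
         * backends remember from before suspend is garbage, so scrub
         * it before the engines are fully re-initialised.
         */
        i915_gem_sanitize(i915);
        example_reinit_engines(i915);
}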
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-05-25 14:13:19)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > As we reset the GPU on suspend/resume, we also do need to reset the
>> > engine state tracking so call into the engine backends. This is
>> > especially important so that we can also sanitize the state tracking
>> > across resume.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > ---
>> >  drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
>> >  1 file changed, 24 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> > index 7b5544efa0ba..5a7e0b388ad0 100644
>> > --- a/drivers/gpu/drm/i915/i915_gem.c
>> > +++ b/drivers/gpu/drm/i915/i915_gem.c
>> > @@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
>> >
>> >  void i915_gem_sanitize(struct drm_i915_private *i915)
>> >  {
>> > +        struct intel_engine_cs *engine;
>> > +        enum intel_engine_id id;
>> > +
>> > +        GEM_TRACE("\n");
>> > +
>> >          mutex_lock(&i915->drm.struct_mutex);
>> > +
>> > +        intel_runtime_pm_get(i915);
>> > +        intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
>> > +
>> > +        /*
>> > +         * As we have just resumed the machine and woken the device up from
>> > +         * deep PCI sleep (presumably D3_cold), assume the HW has been reset
>> > +         * back to defaults, recovering from whatever wedged state we left it
>> > +         * in and so worth trying to use the device once more.
>> > +         */
>> >          if (i915_terminally_wedged(&i915->gpu_error))
>> >                  i915_gem_unset_wedged(i915);
>> >
>> > @@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
>> >          if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
>> >                  WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
>> >
>> > +        /* Reset the submission backend after resume as well as the GPU reset */
>> > +        for_each_engine(engine, i915, id) {
>> > +                if (engine->reset.reset)
>> > +                        engine->reset.reset(engine, NULL);
>> > +        }
>>
>> The NULL guarantees that it won't try to do any funny things
>> with the incomplete state.
>
> The NULL is there because this gets called really, really early before
> we've finished setting up the engines.
>
>> But what guarantees the timeline cleanup so that
>> we don't end up unwinding incomplete requests crap?
>
> To get here we must have gone through at least the start of a suspend.
> So we've already cleaned everything up; nicely or forcefully through a
> wedge. Whatever is here is garbage, including any internal knowledge in
> the backend about what state we left the machine in.

Fair enough,

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7b5544efa0ba..5a7e0b388ad0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)

 void i915_gem_sanitize(struct drm_i915_private *i915)
 {
+        struct intel_engine_cs *engine;
+        enum intel_engine_id id;
+
+        GEM_TRACE("\n");
+
         mutex_lock(&i915->drm.struct_mutex);
+
+        intel_runtime_pm_get(i915);
+        intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
+
+        /*
+         * As we have just resumed the machine and woken the device up from
+         * deep PCI sleep (presumably D3_cold), assume the HW has been reset
+         * back to defaults, recovering from whatever wedged state we left it
+         * in and so worth trying to use the device once more.
+         */
         if (i915_terminally_wedged(&i915->gpu_error))
                 i915_gem_unset_wedged(i915);

@@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
         if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
                 WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));

+        /* Reset the submission backend after resume as well as the GPU reset */
+        for_each_engine(engine, i915, id) {
+                if (engine->reset.reset)
+                        engine->reset.reset(engine, NULL);
+        }
+
+        intel_uncore_forcewake_put(i915, FORCEWAKE_ALL);
+        intel_runtime_pm_put(i915);
+
         i915_gem_contexts_lost(i915);
         mutex_unlock(&i915->drm.struct_mutex);
 }
As we reset the GPU on suspend/resume, we also do need to reset the
engine state tracking so call into the engine backends. This is
especially important so that we can also sanitize the state tracking
across resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)