[04/18] drm/i915: After reset on sanitization, reset the engine backends

Message ID 20180525093206.1919-5-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson May 25, 2018, 9:31 a.m. UTC
As we reset the GPU on suspend/resume, we also do need to reset the
engine state tracking so call into the engine backends. This is
especially important so that we can also sanitize the state tracking
across resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Comments

Mika Kuoppala May 25, 2018, 1:13 p.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> As we reset the GPU on suspend/resume, we also do need to reset the
> engine state tracking so call into the engine backends. This is
> especially important so that we can also sanitize the state tracking
> across resume.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7b5544efa0ba..5a7e0b388ad0 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
>  
>  void i915_gem_sanitize(struct drm_i915_private *i915)
>  {
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	GEM_TRACE("\n");
> +
>  	mutex_lock(&i915->drm.struct_mutex);
> +
> +	intel_runtime_pm_get(i915);
> +	intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
> +
> +	/*
> +	 * As we have just resumed the machine and woken the device up from
> +	 * deep PCI sleep (presumably D3_cold), assume the HW has been reset
> +	 * back to defaults, recovering from whatever wedged state we left it
> +	 * in and so worth trying to use the device once more.
> +	 */
>  	if (i915_terminally_wedged(&i915->gpu_error))
>  		i915_gem_unset_wedged(i915);
>  
> @@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
>  	if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
>  		WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
>  
> +	/* Reset the submission backend after resume as well as the GPU reset */
> +	for_each_engine(engine, i915, id) {
> +		if (engine->reset.reset)
> +			engine->reset.reset(engine, NULL);
> +	}

The NULL guarantees that it won't try to do any funny things
with the incomplete state.

But what guarantees the timeline cleanup, so that
we don't end up unwinding incomplete-request crap?
-Mika

> +
> +	intel_uncore_forcewake_put(i915, FORCEWAKE_ALL);
> +	intel_runtime_pm_put(i915);
> +
>  	i915_gem_contexts_lost(i915);
>  	mutex_unlock(&i915->drm.struct_mutex);
>  }
> -- 
> 2.17.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson May 25, 2018, 1:17 p.m. UTC | #2
Quoting Mika Kuoppala (2018-05-25 14:13:19)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > As we reset the GPU on suspend/resume, we also do need to reset the
> > engine state tracking so call into the engine backends. This is
> > especially important so that we can also sanitize the state tracking
> > across resume.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 7b5544efa0ba..5a7e0b388ad0 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
> >  
> >  void i915_gem_sanitize(struct drm_i915_private *i915)
> >  {
> > +     struct intel_engine_cs *engine;
> > +     enum intel_engine_id id;
> > +
> > +     GEM_TRACE("\n");
> > +
> >       mutex_lock(&i915->drm.struct_mutex);
> > +
> > +     intel_runtime_pm_get(i915);
> > +     intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
> > +
> > +     /*
> > +      * As we have just resumed the machine and woken the device up from
> > +      * deep PCI sleep (presumably D3_cold), assume the HW has been reset
> > +      * back to defaults, recovering from whatever wedged state we left it
> > +      * in and so worth trying to use the device once more.
> > +      */
> >       if (i915_terminally_wedged(&i915->gpu_error))
> >               i915_gem_unset_wedged(i915);
> >  
> > @@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
> >       if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
> >               WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
> >  
> > +     /* Reset the submission backend after resume as well as the GPU reset */
> > +     for_each_engine(engine, i915, id) {
> > +             if (engine->reset.reset)
> > +                     engine->reset.reset(engine, NULL);
> > +     }
> 
> The NULL guarantees that it won't try to do any funny things
> with the incomplete state.

The NULL is there because this gets called really, really early before
we've finished setting up the engines.

> But what guarantees the timeline cleanup, so that
> we don't end up unwinding incomplete-request crap?

To get here we must have gone through at least the start of a suspend.
So we've already cleaned everything up; nicely, or forcefully through a
wedge. Whatever is here is garbage, including any internal knowledge in
the backend about what state we left the machine in.
-Chris
Mika Kuoppala May 25, 2018, 1:25 p.m. UTC | #3
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-05-25 14:13:19)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > As we reset the GPU on suspend/resume, we also do need to reset the
>> > engine state tracking so call into the engine backends. This is
>> > especially important so that we can also sanitize the state tracking
>> > across resume.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > ---
>> >  drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
>> >  1 file changed, 24 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> > index 7b5544efa0ba..5a7e0b388ad0 100644
>> > --- a/drivers/gpu/drm/i915/i915_gem.c
>> > +++ b/drivers/gpu/drm/i915/i915_gem.c
>> > @@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
>> >  
>> >  void i915_gem_sanitize(struct drm_i915_private *i915)
>> >  {
>> > +     struct intel_engine_cs *engine;
>> > +     enum intel_engine_id id;
>> > +
>> > +     GEM_TRACE("\n");
>> > +
>> >       mutex_lock(&i915->drm.struct_mutex);
>> > +
>> > +     intel_runtime_pm_get(i915);
>> > +     intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
>> > +
>> > +     /*
>> > +      * As we have just resumed the machine and woken the device up from
>> > +      * deep PCI sleep (presumably D3_cold), assume the HW has been reset
>> > +      * back to defaults, recovering from whatever wedged state we left it
>> > +      * in and so worth trying to use the device once more.
>> > +      */
>> >       if (i915_terminally_wedged(&i915->gpu_error))
>> >               i915_gem_unset_wedged(i915);
>> >  
>> > @@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
>> >       if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
>> >               WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
>> >  
>> > +     /* Reset the submission backend after resume as well as the GPU reset */
>> > +     for_each_engine(engine, i915, id) {
>> > +             if (engine->reset.reset)
>> > +                     engine->reset.reset(engine, NULL);
>> > +     }
>> 
>> The NULL guarantees that it won't try to do any funny things
>> with the incomplete state.
>
> The NULL is there because this gets called really, really early before
> we've finished setting up the engines.
>
>> But what guarantees the timeline cleanup, so that
>> we don't end up unwinding incomplete-request crap?
>
> To get here we must have gone through at least the start of a suspend.
> So we've already cleaned everything up; nicely, or forcefully through a
> wedge. Whatever is here is garbage, including any internal knowledge in
> the backend about what state we left the machine in.

Fair enough,

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7b5544efa0ba..5a7e0b388ad0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4955,7 +4955,22 @@  static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 
 void i915_gem_sanitize(struct drm_i915_private *i915)
 {
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	GEM_TRACE("\n");
+
 	mutex_lock(&i915->drm.struct_mutex);
+
+	intel_runtime_pm_get(i915);
+	intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
+
+	/*
+	 * As we have just resumed the machine and woken the device up from
+	 * deep PCI sleep (presumably D3_cold), assume the HW has been reset
+	 * back to defaults, recovering from whatever wedged state we left it
+	 * in and so worth trying to use the device once more.
+	 */
 	if (i915_terminally_wedged(&i915->gpu_error))
 		i915_gem_unset_wedged(i915);
 
@@ -4970,6 +4985,15 @@  void i915_gem_sanitize(struct drm_i915_private *i915)
 	if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
 		WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
 
+	/* Reset the submission backend after resume as well as the GPU reset */
+	for_each_engine(engine, i915, id) {
+		if (engine->reset.reset)
+			engine->reset.reset(engine, NULL);
+	}
+
+	intel_uncore_forcewake_put(i915, FORCEWAKE_ALL);
+	intel_runtime_pm_put(i915);
+
 	i915_gem_contexts_lost(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 }