diff mbox

[4/4] drm/i915: Park before resetting the submission backend

Message ID 20180405110220.27008-4-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson April 5, 2018, 11:02 a.m. UTC
As different backends may have different park/unpark callbacks, we
should only ever switch backends (reset_default_submission on wedge
recovery, or on enabling the guc) while parked.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c             | 11 +++++++++++
 drivers/gpu/drm/i915/intel_engine_cs.c      |  3 +++
 drivers/gpu/drm/i915/intel_guc_submission.c |  1 +
 3 files changed, 15 insertions(+)

Comments

sagar.a.kamble@intel.com April 5, 2018, 11:54 a.m. UTC | #1
On 4/5/2018 4:32 PM, Chris Wilson wrote:
> As different backends may have different park/unpark callbacks, we
> should only ever switch backends (reset_default_submission on wedge
> recovery, or on enabling the guc) while parked.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c             | 11 +++++++++++
>   drivers/gpu/drm/i915/intel_engine_cs.c      |  3 +++
>   drivers/gpu/drm/i915/intel_guc_submission.c |  1 +
>   3 files changed, 15 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e148db310ea6..e2880de2fc7e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3380,6 +3380,17 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>   	i915_retire_requests(i915);
>   	GEM_BUG_ON(i915->gt.active_requests);
>   
> +	/*
> +	 * Park before disengaging the old submit mechanism as different
> +	 * backends may have different park/unpack callbacks.
> +	 *
> +	 * We are idle; the idle-worker will be queued, but we need to run
> +	 * it now. As we already hold the struct mutex, we can get park
> +	 * the GPU right away, letting the lazy worker see that we are
> +	 * already active again by the time it acquires the mutex.
> +	 */
> +	i915_gem_park(i915);
I think we should be calling this before gem_unset_wedged in i915_reset.
With GuC, hitting the GEM_BUG_ON(awake) in guc_submission_enable.
Also idle_work can execute just before reset so GEM_BUG_ON(!awake) in 
gem_park can be hit I think.
> +
>   	/*
>   	 * Undo nop_submit_request. We prevent all new i915 requests from
>   	 * being queued (by disallowing execbuf whilst wedged) so having
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 12486d8f534b..b4ea77a2896c 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1651,6 +1651,9 @@ void intel_engines_reset_default_submission(struct drm_i915_private *i915)
>   	struct intel_engine_cs *engine;
>   	enum intel_engine_id id;
>   
> +	/* Must be parked first! */
> +	GEM_BUG_ON(i915->gt.awake);
> +
>   	for_each_engine(engine, i915, id)
>   		engine->set_default_submission(engine);
>   }
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 97121230656c..225fa3927a02 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -1243,6 +1243,7 @@ int intel_guc_submission_enable(struct intel_guc *guc)
>   	/* Take over from manual control of ELSP (execlists) */
>   	guc_interrupts_capture(dev_priv);
>   
> +	GEM_BUG_ON(dev_priv->gt.awake); /* Must be idle switching park/unpark */
>   	for_each_engine(engine, dev_priv, id) {
>   		struct intel_engine_execlists * const execlists =
>   			&engine->execlists;
Chris Wilson April 5, 2018, 12:17 p.m. UTC | #2
Quoting Sagar Arun Kamble (2018-04-05 12:54:38)
> 
> 
> On 4/5/2018 4:32 PM, Chris Wilson wrote:
> > As different backends may have different park/unpark callbacks, we
> > should only ever switch backends (reset_default_submission on wedge
> > recovery, or on enabling the guc) while parked.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c             | 11 +++++++++++
> >   drivers/gpu/drm/i915/intel_engine_cs.c      |  3 +++
> >   drivers/gpu/drm/i915/intel_guc_submission.c |  1 +
> >   3 files changed, 15 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index e148db310ea6..e2880de2fc7e 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3380,6 +3380,17 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> >       i915_retire_requests(i915);
> >       GEM_BUG_ON(i915->gt.active_requests);
> >   
> > +     /*
> > +      * Park before disengaging the old submit mechanism as different
> > +      * backends may have different park/unpack callbacks.
> > +      *
> > +      * We are idle; the idle-worker will be queued, but we need to run
> > +      * it now. As we already hold the struct mutex, we can get park
> > +      * the GPU right away, letting the lazy worker see that we are
> > +      * already active again by the time it acquires the mutex.
> > +      */
> > +     i915_gem_park(i915);
> I think we should be calling this before gem_unset_wedged in i915_reset.

We can't, we aren't yet conclusively idle at that point. This is the
earliest in that sequence where we know we are.

> With GuC, hitting the GEM_BUG_ON(awake) in guc_submission_enable.

Hmm. We definitely need to keep park/unpark balanced. Brute force would
be to force idle at that point. Or we do if (awake) engine->unpark on
the takeover.

> Also idle_work can execute just before reset so GEM_BUG_ON(!awake) in 
> gem_park can be hit I think.

True, we could just ignore the whole wait-for-idle if already !gt.awake.
-Chris
Chris Wilson April 5, 2018, 12:36 p.m. UTC | #3
Quoting Chris Wilson (2018-04-05 13:17:24)
> Quoting Sagar Arun Kamble (2018-04-05 12:54:38)
> > With GuC, hitting the GEM_BUG_ON(awake) in guc_submission_enable.
> 
> Hmm. We definitely need to keep park/unpark balanced. Brute force would
> be to force idle at that point. Or we do if (awake) engine->unpark on
> the takeover.

I think since we are only gt.awake due to the early context copying, we
are actually idle here just haven't yet run the idle worker. Brute
forcing the park doesn't seem so bad.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e148db310ea6..e2880de2fc7e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3380,6 +3380,17 @@  bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	i915_retire_requests(i915);
 	GEM_BUG_ON(i915->gt.active_requests);
 
+	/*
+	 * Park before disengaging the old submit mechanism as different
+	 * backends may have different park/unpack callbacks.
+	 *
+	 * We are idle; the idle-worker will be queued, but we need to run
+	 * it now. As we already hold the struct mutex, we can get park
+	 * the GPU right away, letting the lazy worker see that we are
+	 * already active again by the time it acquires the mutex.
+	 */
+	i915_gem_park(i915);
+
 	/*
 	 * Undo nop_submit_request. We prevent all new i915 requests from
 	 * being queued (by disallowing execbuf whilst wedged) so having
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 12486d8f534b..b4ea77a2896c 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1651,6 +1651,9 @@  void intel_engines_reset_default_submission(struct drm_i915_private *i915)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
+	/* Must be parked first! */
+	GEM_BUG_ON(i915->gt.awake);
+
 	for_each_engine(engine, i915, id)
 		engine->set_default_submission(engine);
 }
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 97121230656c..225fa3927a02 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -1243,6 +1243,7 @@  int intel_guc_submission_enable(struct intel_guc *guc)
 	/* Take over from manual control of ELSP (execlists) */
 	guc_interrupts_capture(dev_priv);
 
+	GEM_BUG_ON(dev_priv->gt.awake); /* Must be idle switching park/unpark */
 	for_each_engine(engine, dev_priv, id) {
 		struct intel_engine_execlists * const execlists =
 			&engine->execlists;