diff mbox

[4/4] drm/i915: Park before resetting the submission backend

Message ID 152301754847.28413.2797600546019323260@mail.alporthouse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson April 6, 2018, 12:25 p.m. UTC
Quoting Sagar Arun Kamble (2018-04-05 12:54:38)
> 
> 
> On 4/5/2018 4:32 PM, Chris Wilson wrote:
> > As different backends may have different park/unpark callbacks, we
> > should only ever switch backends (reset_default_submission on wedge
> > recovery, or on enabling the guc) while parked.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c             | 11 +++++++++++
> >   drivers/gpu/drm/i915/intel_engine_cs.c      |  3 +++
> >   drivers/gpu/drm/i915/intel_guc_submission.c |  1 +
> >   3 files changed, 15 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index e148db310ea6..e2880de2fc7e 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3380,6 +3380,17 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> >       i915_retire_requests(i915);
> >       GEM_BUG_ON(i915->gt.active_requests);
> >   
> > +     /*
> > +      * Park before disengaging the old submit mechanism as different
> > +      * backends may have different park/unpack callbacks.
> > +      *
> > +      * We are idle; the idle-worker will be queued, but we need to run
> > +      * it now. As we already hold the struct mutex, we can get park
> > +      * the GPU right away, letting the lazy worker see that we are
> > +      * already active again by the time it acquires the mutex.
> > +      */
> > +     i915_gem_park(i915);
> I think we should be calling this before gem_unset_wedged in i915_reset.
> With GuC, hitting the GEM_BUG_ON(awake) in guc_submission_enable.

Right, we really do need to restore guc submission before restarting. So
how can we fit in something like

?
-Chris

Comments

Michal Wajdeczko April 6, 2018, 1:39 p.m. UTC | #1
On Fri, 06 Apr 2018 14:25:48 +0200, Chris Wilson  
<chris@chris-wilson.co.uk> wrote:

> Quoting Sagar Arun Kamble (2018-04-05 12:54:38)
>>
>>
>> On 4/5/2018 4:32 PM, Chris Wilson wrote:
>> > As different backends may have different park/unpark callbacks, we
>> > should only ever switch backends (reset_default_submission on wedge
>> > recovery, or on enabling the guc) while parked.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> > Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
>> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> > ---
>> >   drivers/gpu/drm/i915/i915_gem.c             | 11 +++++++++++
>> >   drivers/gpu/drm/i915/intel_engine_cs.c      |  3 +++
>> >   drivers/gpu/drm/i915/intel_guc_submission.c |  1 +
>> >   3 files changed, 15 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_gem.c  
>> b/drivers/gpu/drm/i915/i915_gem.c
>> > index e148db310ea6..e2880de2fc7e 100644
>> > --- a/drivers/gpu/drm/i915/i915_gem.c
>> > +++ b/drivers/gpu/drm/i915/i915_gem.c
>> > @@ -3380,6 +3380,17 @@ bool i915_gem_unset_wedged(struct  
>> drm_i915_private *i915)
>> >       i915_retire_requests(i915);
>> >       GEM_BUG_ON(i915->gt.active_requests);
>> >
>> > +     /*
>> > +      * Park before disengaging the old submit mechanism as different
>> > +      * backends may have different park/unpack callbacks.
>> > +      *
>> > +      * We are idle; the idle-worker will be queued, but we need to  
>> run
>> > +      * it now. As we already hold the struct mutex, we can get park
>> > +      * the GPU right away, letting the lazy worker see that we are
>> > +      * already active again by the time it acquires the mutex.
>> > +      */
>> > +     i915_gem_park(i915);
>> I think we should be calling this before gem_unset_wedged in i915_reset.
>> With GuC, hitting the GEM_BUG_ON(awake) in guc_submission_enable.
>
> Right, we really do need to restore guc submission before restarting. So
> how can we fit in something like
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c  
> b/drivers/gpu/drm/i915/i915_gem.c
> index 9650a7b10c5f..95fa30d9aec6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3132,6 +3132,9 @@ void i915_gem_reset(struct drm_i915_private  
> *dev_priv)
>        i915_retire_requests(dev_priv);
> +       if (USES_GUC_SUBMISSION(dev_priv))
> +               (void)intel_guc_submission_enable(guc);
> +
>         for_each_engine(engine, dev_priv, id) {
>                 struct i915_gem_context *ctx;
> ?

In series [1] I was trying to fix symmetry in calls to uc_init_hw/fini_hw
where we are enabling/disabling GuC submission. In particular, patch [2]
fixes reset path.

So maybe we should try to merge* both series ?

Michal

*) without my patch 6/12 [3]

[1] https://patchwork.freedesktop.org/series/41159/
[2] https://patchwork.freedesktop.org/patch/214967/
[3] https://patchwork.freedesktop.org/patch/214976/
Chris Wilson April 6, 2018, 2:09 p.m. UTC | #2
Quoting Michal Wajdeczko (2018-04-06 14:39:28)
> On Fri, 06 Apr 2018 14:25:48 +0200, Chris Wilson  
> <chris@chris-wilson.co.uk> wrote:
> 
> > Quoting Sagar Arun Kamble (2018-04-05 12:54:38)
> >>
> >>
> >> On 4/5/2018 4:32 PM, Chris Wilson wrote:
> >> > As different backends may have different park/unpark callbacks, we
> >> > should only ever switch backends (reset_default_submission on wedge
> >> > recovery, or on enabling the guc) while parked.
> >> >
> >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> > Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> >> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >> > ---
> >> >   drivers/gpu/drm/i915/i915_gem.c             | 11 +++++++++++
> >> >   drivers/gpu/drm/i915/intel_engine_cs.c      |  3 +++
> >> >   drivers/gpu/drm/i915/intel_guc_submission.c |  1 +
> >> >   3 files changed, 15 insertions(+)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/i915_gem.c  
> >> b/drivers/gpu/drm/i915/i915_gem.c
> >> > index e148db310ea6..e2880de2fc7e 100644
> >> > --- a/drivers/gpu/drm/i915/i915_gem.c
> >> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> >> > @@ -3380,6 +3380,17 @@ bool i915_gem_unset_wedged(struct  
> >> drm_i915_private *i915)
> >> >       i915_retire_requests(i915);
> >> >       GEM_BUG_ON(i915->gt.active_requests);
> >> >
> >> > +     /*
> >> > +      * Park before disengaging the old submit mechanism as different
> >> > +      * backends may have different park/unpack callbacks.
> >> > +      *
> >> > +      * We are idle; the idle-worker will be queued, but we need to  
> >> run
> >> > +      * it now. As we already hold the struct mutex, we can get park
> >> > +      * the GPU right away, letting the lazy worker see that we are
> >> > +      * already active again by the time it acquires the mutex.
> >> > +      */
> >> > +     i915_gem_park(i915);
> >> I think we should be calling this before gem_unset_wedged in i915_reset.
> >> With GuC, hitting the GEM_BUG_ON(awake) in guc_submission_enable.
> >
> > Right, we really do need to restore guc submission before restarting. So
> > how can we fit in something like
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c  
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index 9650a7b10c5f..95fa30d9aec6 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3132,6 +3132,9 @@ void i915_gem_reset(struct drm_i915_private  
> > *dev_priv)
> >        i915_retire_requests(dev_priv);
> > +       if (USES_GUC_SUBMISSION(dev_priv))
> > +               (void)intel_guc_submission_enable(guc);
> > +
> >         for_each_engine(engine, dev_priv, id) {
> >                 struct i915_gem_context *ctx;
> > ?
> 
> In series [1] I was trying to fix symmetry in calls to uc_init_hw/fini_hw
> where we are enabling/disabling GuC submission. In particular, patch [2]
> fixes reset path.

Tbh, I first misread that as i915_gem_fini(). Hmm, but we still should
be re-enabling guc prior to submitting our first requests here in
i915_gem_reset() prior to calling i915_gem_init_hw(). We want to start
from the repopulated request lists, so sticking the current init_hw() is
overkill.

I guess I have this sketch in mind:

i915_reset:

	i915_gem_reset_prepare();
	i915_gem_disable_hw(); // placeholder!

	i915_gem_fini_hw(); // revert to default state

	intel_gpu_reset();

	i915_gem_init_hw(); // just the HW setup

	i915_gem_reset();

	i915_gem_enable_hw(); // tell the HW to run
	i915_gem_reset_finish();

That i915_gem_reset() sticks out like a sore thumb. I think that's what
we need to break up and rearrange. Hmm, there's some of that in the RPS
series. Looks like we're too early to try and do this neatly.

> So maybe we should try to merge* both series ?

Hmm, looking over that series doesn't patch 5 hit the GEM_BUG_ON removed
in patch 6? Otherwise, merging the first 5 with Sagar's blessing would
be on the cards as they look sane otherwise. Maybe start with the first
4 then?
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9650a7b10c5f..95fa30d9aec6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3132,6 +3132,9 @@  void i915_gem_reset(struct drm_i915_private *dev_priv)
 
        i915_retire_requests(dev_priv);
 
+       if (USES_GUC_SUBMISSION(dev_priv))
+               (void)intel_guc_submission_enable(guc);
+
        for_each_engine(engine, dev_priv, id) {
                struct i915_gem_context *ctx;