drm/i915: Restore inhibiting the load of the default context

Message ID 87zixz4r0k.fsf@gaia.fi.intel.com (mailing list archive)
State New, archived
Commit Message

Mika Kuoppala Nov. 27, 2015, 11:32 a.m. UTC
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Following a GPU reset, we may leave the context in a poorly defined
> state, and reloading from that context will leave the GPU flummoxed. For
> secondary contexts, this will lead to that context being banned - but
> currently it is also causing the default context to become banned,
> leading to turmoil in the shared state.
>
> This is a regression from
>
> commit 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b [v4.1]
> Author: Ben Widawsky <benjamin.widawsky@intel.com>
> Date:   Mon Mar 16 16:00:58 2015 +0000
>
>     drm/i915: Initialize all contexts
>
> which quietly introduced the removal of the MI_RESTORE_INHIBIT on the
> default context.
>

As we never submit anything except driver initialization commands
for that context, what would cause this context to become corrupted?

Please consider:

To achieve the same effect and as a bonus, get the
same default context (with workarounds) as we
did in driver init.

I also think that we should zero the global
default context in here to gain similarity wrt
module init.

-Mika

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 43761c5bcaca..1041099d285a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -708,7 +708,7 @@ static int do_switch(struct drm_i915_gem_request *req)
>  	if (ret)
>  		goto unpin_out;
>  
> -	if (!to->legacy_hw_ctx.initialized) {
> +	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
>  		hw_flags |= MI_RESTORE_INHIBIT;
>  		/* NB: If we inhibit the restore, the context is not allowed to
>  		 * die because future work may end up depending on valid address
> -- 
> 2.6.2

Comments

Chris Wilson Nov. 27, 2015, 1:14 p.m. UTC | #1
On Fri, Nov 27, 2015 at 01:32:11PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Following a GPU reset, we may leave the context in a poorly defined
> > state, and reloading from that context will leave the GPU flummoxed. For
> > secondary contexts, this will lead to that context being banned - but
> > currently it is also causing the default context to become banned,
> > leading to turmoil in the shared state.
> >
> > This is a regression from
> >
> > commit 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b [v4.1]
> > Author: Ben Widawsky <benjamin.widawsky@intel.com>
> > Date:   Mon Mar 16 16:00:58 2015 +0000
> >
> >     drm/i915: Initialize all contexts
> >
> > which quietly introduced the removal of the MI_RESTORE_INHIBIT on the
> > default context.
> >
> 
> As we never submit anything except driver initialization commands
> for that context, what would cause this context to become corrupted?

I can only hazard that the act of resetting the GPU left it invalid. A
bisect pointed to that commit, and partially reverting each chunk left
me with the conclusion that the hang was a direct result of reloading
the context. Closer inspection may reveal something else suspect about the
context, but I object to this sneaky change.

> Please consider:
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index 43761c5..45b9a39 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -332,6 +332,7 @@ void i915_gem_context_reset(struct drm_device *dev)
>         for (i = 0; i < I915_NUM_RINGS; i++) {
>                 struct intel_engine_cs *ring = &dev_priv->ring[i];
>                 struct intel_context *lctx = ring->last_context;
> +               struct intel_context *dctx = ring->default_context;
>  
>                 if (lctx) {
>                         if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
> @@ -340,6 +341,9 @@ void i915_gem_context_reset(struct drm_device *dev)
>                         i915_gem_context_unreference(lctx);
>                         ring->last_context = NULL;
>                 }
> +
> +               if (dctx)
> +                       dctx->legacy_hw_ctx.initialized = false;
>         }
>  }
> 
> To achieve the same effect and as a bonus, get the
> same default context (with workarounds) as we
> did in driver init.

I considered it, and wondered why it wasn't already there. It is a
separate issue imo.
 
> I also think that we should zero the global
> default context in here to gain similarity wrt
> module init.

You mean reallocate it from scratch? We have avoided doing the
reallocations in the past, as they can fail at inopportune times.
-Chris
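
The two proposals discussed above differ in when the restore is inhibited for
the default context. The following is a minimal standalone model of that
difference, not driver code. The type `model_ctx`, the three helper functions,
and the flag value `MODEL_RESTORE_INHIBIT` are hypothetical names introduced
only for illustration. It also assumes that the switch path marks the context
as initialized once the switch has been emitted, which is not shown in the
hunks quoted in this thread.

```c
#include <stdbool.h>

/* Placeholder flag value for illustration; not taken from the driver headers. */
#define MODEL_RESTORE_INHIBIT 0x1u

/* Hypothetical, simplified stand-in for the per-context state in the thread. */
struct model_ctx {
	bool initialized;	/* models legacy_hw_ctx.initialized */
	bool is_default;	/* models i915_gem_context_is_default() */
};

/*
 * First proposal: the switch path inhibits the restore whenever the
 * target is the default context, regardless of its initialized flag.
 */
static unsigned int model_switch_inhibit_default(struct model_ctx *to)
{
	unsigned int hw_flags = 0;

	if (!to->initialized || to->is_default)
		hw_flags |= MODEL_RESTORE_INHIBIT;

	/* Assumption: the context counts as initialized after the switch. */
	to->initialized = true;
	return hw_flags;
}

/* Switch path as it stands without the first proposal's change. */
static unsigned int model_switch_unpatched(struct model_ctx *to)
{
	unsigned int hw_flags = 0;

	if (!to->initialized)
		hw_flags |= MODEL_RESTORE_INHIBIT;

	/* Assumption: the context counts as initialized after the switch. */
	to->initialized = true;
	return hw_flags;
}

/*
 * Second proposal: leave the switch path alone and instead mark the
 * default context as uninitialized when the GPU is reset.
 */
static void model_context_reset(struct model_ctx *dctx)
{
	if (dctx)
		dctx->initialized = false;
}
```

Under these assumptions, the first proposal inhibits the restore on every
switch to the default context, whereas the second inhibits it only for the
first switch after a reset and resumes normal restores afterwards.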
Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 43761c5..45b9a39 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -332,6 +332,7 @@  void i915_gem_context_reset(struct drm_device *dev)
        for (i = 0; i < I915_NUM_RINGS; i++) {
                struct intel_engine_cs *ring = &dev_priv->ring[i];
                struct intel_context *lctx = ring->last_context;
+               struct intel_context *dctx = ring->default_context;
 
                if (lctx) {
                        if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
@@ -340,6 +341,9 @@  void i915_gem_context_reset(struct drm_device *dev)
                        i915_gem_context_unreference(lctx);
                        ring->last_context = NULL;
                }
+
+               if (dctx)
+                       dctx->legacy_hw_ctx.initialized = false;
        }
 }