[1/3] drm/i915/execlists: Reset ring registers on rebinding contexts

Message ID 20180327210157.16896-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson March 27, 2018, 9:01 p.m. UTC
Tvrtko uncovered a fun issue with recovering from a wedged device. In his
tests, he wedged the driver by injecting an unrecoverable hang whilst a
batch was spinning. As we reset the GPU in the middle of the spinner,
when resumed it would continue on from the next instruction in the ring
and write its breadcrumb. However, on wedging we updated our
bookkeeping to indicate that the GPU had completed executing and would
restart from after the breadcrumb; so the emission of the stale
breadcrumb from before the reset came as a bit of a surprise.

A simple fix is, when rebinding the context into the GPU, to update
the ring register state in the context image to match our bookkeeping.
We already have to update the RING_START and RING_TAIL, so updating
RING_HEAD as well is trivial. This works because whenever we unbind the
context, we keep the bookkeeping in check; and on wedging we unbind all
contexts.

Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Mika Kuoppala March 28, 2018, 11:31 a.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Tvrtko uncovered a fun issue with recovering from a wedged device. In his
> tests, he wedged the driver by injecting an unrecoverable hang whilst a
> batch was spinning. As we reset the GPU in the middle of the spinner,
> when resumed it would continue on from the next instruction in the ring
> and write its breadcrumb. However, on wedging we updated our
> bookkeeping to indicate that the GPU had completed executing and would
> restart from after the breadcrumb; so the emission of the stale
> breadcrumb from before the reset came as a bit of a surprise.
>
> A simple fix is, when rebinding the context into the GPU, to update
> the ring register state in the context image to match our bookkeeping.
> We already have to update the RING_START and RING_TAIL, so updating
> RING_HEAD as well is trivial. This works because whenever we unbind the
> context, we keep the bookkeeping in check; and on wedging we unbind all
> contexts.

s/wedging/unwedging/. The context lost markup is on the unwedge side,
though it should not matter at which stage between wedge and unwedge
the unbind happens. So with this minor change to the commit message,

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>
> Testcase: igt/gem_eio
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ba7f7831f934..654634254b64 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1272,6 +1272,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
>  	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
>  	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
>  		i915_ggtt_offset(ce->ring->vma);
> +	ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
>  
>  	ce->state->obj->pin_global++;
>  	i915_gem_context_get(ctx);
> -- 
> 2.16.3

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ba7f7831f934..654634254b64 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1272,6 +1272,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
 	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
 	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
 		i915_ggtt_offset(ce->ring->vma);
+	ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
 
 	ce->state->obj->pin_global++;
 	i915_gem_context_get(ctx);
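
To make the effect of the one-line change easier to follow, here is a
minimal user-space sketch of the scenario described in the commit message.
It is not the driver code: every sketch_* name, the register indices and the
byte offsets are hypothetical, and wedging is simplified to "head jumps to
tail". The only thing taken from the patch is the idea that the value slot
for RING_HEAD in the context image is overwritten from the ring bookkeeping
when the context is pinned.

```c
#include <stdint.h>

/* Hypothetical indices: each register occupies an (offset, value) pair,
 * so the value lives at index + 1, mirroring CTX_RING_HEAD+1 in the patch. */
enum { CTX_RING_HEAD = 0, CTX_RING_TAIL = 2, CTX_REG_COUNT = 4 };

/* Simplified stand-in for the driver's ring bookkeeping. */
struct sketch_ring {
	uint32_t head;	/* where the driver believes execution resumes */
	uint32_t tail;	/* end of the commands emitted so far */
};

/* Simplified stand-in for a context and its saved register image. */
struct sketch_context {
	struct sketch_ring ring;
	uint32_t reg_state[CTX_REG_COUNT];
};

/* Model the hardware saving its RING_HEAD into the context image. */
void sketch_hw_save(struct sketch_context *ce, uint32_t hw_head)
{
	ce->reg_state[CTX_RING_HEAD + 1] = hw_head;
}

/* Model wedging (assumption): everything is declared complete, so the
 * bookkeeping head moves past the last breadcrumb, i.e. to the tail. */
void sketch_wedge(struct sketch_context *ce)
{
	ce->ring.head = ce->ring.tail;
}

/* Model pinning: with fix_head set, the image is brought in line with
 * the bookkeeping, as the added line in the patch does. */
void sketch_pin(struct sketch_context *ce, int fix_head)
{
	if (fix_head)
		ce->reg_state[CTX_RING_HEAD + 1] = ce->ring.head;
}

/* Bytes of old commands the hardware would still run after a rebind. */
uint32_t sketch_stale_bytes(const struct sketch_context *ce)
{
	return ce->ring.tail - ce->reg_state[CTX_RING_HEAD + 1];
}

/* Reset lands mid-spinner at offset 16 of 64, then wedge, then rebind. */
uint32_t sketch_scenario(int fix_head)
{
	struct sketch_context ce = { .ring = { .head = 0, .tail = 64 } };

	sketch_hw_save(&ce, 16);
	sketch_wedge(&ce);
	sketch_pin(&ce, fix_head);
	return sketch_stale_bytes(&ce);
}
```

In this model, sketch_scenario(0) leaves stale commands (including the old
breadcrumb write) between the saved head and the tail, while
sketch_scenario(1) leaves none because the image head has been replaced by
the bookkeeping value.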