[2/2] drm/i915/execlists: Reset RING registers upon resume

Message ID	20160916192318.14030-2-chris@chris-wilson.co.uk (mailing list archive)
State	New, archived

Headers

From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Fri, 16 Sep 2016 20:23:18 +0100
Message-Id: <20160916192318.14030-2-chris@chris-wilson.co.uk>
In-Reply-To: <20160916192318.14030-1-chris@chris-wilson.co.uk>
References: <20160916192318.14030-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 2/2] drm/i915/execlists: Reset RING registers
	upon resume
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Commit Message

Chris Wilson Sept. 16, 2016, 7:23 p.m. UTC

There is a disparity between the context image saved to disk and our own
bookkeeping: we presume that RING_HEAD and RING_TAIL match our stored
ce->ring->tail value. However, as we emit WA_TAIL_DWORDS into the ring
but may not tell the GPU about them, the GPU may be lagging behind our
bookkeeping. Upon hibernation we do not save stolen pages, presuming
that their contents are volatile. This means that although we start
writing into the ring at tail, the GPU starts executing from its HEAD,
and there may be some garbage in between, so the GPU promptly hangs
upon resume.

Testcase: igt/gem_exec_suspend/basic-S4
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 49 ++++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 19 deletions(-)
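
The failure mode described in the commit message can be sketched with a
toy model. This is only an illustration under stated assumptions: every
name below (toy_ring, toy_emit, TOY_WA_TAIL_DWORDS and so on) is
hypothetical and none of it is i915 code. It shows how padding that is
written into the ring but never reported to the GPU leaves a gap between
the saved HEAD and the driver's tail, and how zeroing both sides on
resume closes that gap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model; these names and sizes are not from i915. */
#define TOY_RING_DWORDS    16
#define TOY_WA_TAIL_DWORDS 2

struct toy_ring {
	uint32_t mem[TOY_RING_DWORDS]; /* ring contents, lost across hibernate */
	uint32_t sw_tail;              /* driver bookkeeping, in dwords */
	uint32_t hw_head;              /* HEAD as saved in the context image */
	uint32_t hw_tail;              /* TAIL as last reported to the GPU */
};

/* Emit a request of n dwords followed by workaround padding. Only the
 * request is reported to the GPU; the padding is not. No wraparound is
 * modelled, so the caller must stay within the ring. */
static void toy_emit(struct toy_ring *ring, uint32_t n)
{
	uint32_t i;

	assert(ring->sw_tail + n + TOY_WA_TAIL_DWORDS <= TOY_RING_DWORDS);
	for (i = 0; i < n + TOY_WA_TAIL_DWORDS; i++)
		ring->mem[ring->sw_tail++] = 0; /* 0 stands in for a no-op */

	ring->hw_tail = ring->sw_tail - TOY_WA_TAIL_DWORDS;
	ring->hw_head = ring->hw_tail; /* GPU consumed all it was told about */
}

/* Number of dwords the GPU would execute before reaching new commands. */
static uint32_t toy_gap(const struct toy_ring *ring)
{
	return ring->sw_tail - ring->hw_head;
}

/* Hibernation does not preserve the ring contents. */
static void toy_hibernate(struct toy_ring *ring)
{
	memset(ring->mem, 0xff, sizeof(ring->mem));
}

/* On resume, reset both the saved registers and the bookkeeping. */
static void toy_resume_reset(struct toy_ring *ring)
{
	ring->hw_head = 0;
	ring->hw_tail = 0;
	ring->sw_tail = 0;
}
```

In this model, after one toy_emit() the gap equals TOY_WA_TAIL_DWORDS;
after toy_hibernate() those dwords are garbage, and toy_resume_reset()
brings the gap back to zero so nothing stale would be executed.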

Comments

Joonas Lahtinen Sept. 19, 2016, 8:24 a.m. UTC | #1

On Fri, 2016-09-16 at 20:23 +0100, Chris Wilson wrote:
>  void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>  {
> -	struct i915_gem_context *ctx = dev_priv->kernel_context;
>  	struct intel_engine_cs *engine;
> +	struct i915_gem_context *ctx;
> +
> +	/* Because we emit WA_TAIL_DWORDS there may be a disparity
> +	 * between our bookkeeping in ce->ring->head and ce->ring->tail and
> +	 * that stored in context. As we only write new comamnds from

"new commands"

> +	 * ce->ring->tail onwards, everything before that is junk. If the GPU
> +	 * starts reading from its RING_HEAD from the context, it may try to
> +	 * execute that junk and die.
> +	 *
> +	 * So to avoid that we reset the context images upon resume. For
> +	 * simplicity, we just zero everything out.
> +	 */
> +	list_for_each_entry(ctx, &dev_priv->context_list, link) {
> +		for_each_engine(engine, dev_priv) {
> +			struct intel_context *ce = &ctx->engine[engine->id];
> +			u32 *reg_state;
>  
> -	for_each_engine(engine, dev_priv) {
> -		struct intel_context *ce = &ctx->engine[engine->id];
> -		void *vaddr;
> -		uint32_t *reg_state;
> -
> -		if (!ce->state)
> -			continue;
> -
> -		vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
> -		if (WARN_ON(IS_ERR(vaddr)))
> -			continue;
> +			if (!ce->state)
> +				continue;
>  
> -		reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
> +			reg_state = i915_gem_object_pin_map(ce->state->obj,
> +							    I915_MAP_WB);
> +			if (WARN_ON(IS_ERR(reg_state)))
> +				continue;
>  
> -		reg_state[CTX_RING_HEAD+1] = 0;
> -		reg_state[CTX_RING_TAIL+1] = 0;
> +			reg_state += LRC_STATE_PN * PAGE_SIZE / sizeof(u32);

/ sizeof(*reg_state) I presume.

Also add a newline here.

With those;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
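
The sizeof remark above concerns pointer-arithmetic scaling: the old
code added a byte offset to a void pointer, while the new code advances
a u32 pointer and so must divide the byte offset by the element size. A
minimal standalone sketch of that equivalence follows. The TOY_*
constants and both helper names are hypothetical stand-ins, not the
kernel's definitions, and unsigned char * replaces the GNU void *
arithmetic so the example stays within ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for LRC_STATE_PN and PAGE_SIZE. */
#define TOY_STATE_PN  1
#define TOY_PAGE_SIZE 4096

/* Byte-based offset, as with the old "void *vaddr" variant. */
static uint32_t *state_via_bytes(void *vaddr)
{
	unsigned char *bytes = vaddr;

	return (uint32_t *)(bytes + TOY_STATE_PN * TOY_PAGE_SIZE);
}

/* Element-based offset, as with the new "u32 *reg_state" variant.
 * Using sizeof(*reg_state) ties the divisor to the pointer's own
 * element type, so it stays right if that type ever changes. */
static uint32_t *state_via_elements(void *vaddr)
{
	uint32_t *reg_state = vaddr;

	/* The byte offset must be a whole number of elements. */
	assert(TOY_PAGE_SIZE % sizeof(*reg_state) == 0);

	reg_state += TOY_STATE_PN * TOY_PAGE_SIZE / sizeof(*reg_state);
	return reg_state;
}
```

Given a suitably aligned buffer of at least two toy pages, both helpers
return the same address, which is the start of the second page.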

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bc1786478f8e..ec2ad603be86 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2100,30 +2100,41 @@  error_deref_obj:
 
 void intel_lr_context_resume(struct drm_i915_private *dev_priv)
 {
-	struct i915_gem_context *ctx = dev_priv->kernel_context;
 	struct intel_engine_cs *engine;
+	struct i915_gem_context *ctx;
+
+	/* Because we emit WA_TAIL_DWORDS there may be a disparity
+	 * between our bookkeeping in ce->ring->head and ce->ring->tail and
+	 * that stored in context. As we only write new comamnds from
+	 * ce->ring->tail onwards, everything before that is junk. If the GPU
+	 * starts reading from its RING_HEAD from the context, it may try to
+	 * execute that junk and die.
+	 *
+	 * So to avoid that we reset the context images upon resume. For
+	 * simplicity, we just zero everything out.
+	 */
+	list_for_each_entry(ctx, &dev_priv->context_list, link) {
+		for_each_engine(engine, dev_priv) {
+			struct intel_context *ce = &ctx->engine[engine->id];
+			u32 *reg_state;
 
-	for_each_engine(engine, dev_priv) {
-		struct intel_context *ce = &ctx->engine[engine->id];
-		void *vaddr;
-		uint32_t *reg_state;
-
-		if (!ce->state)
-			continue;
-
-		vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
-		if (WARN_ON(IS_ERR(vaddr)))
-			continue;
+			if (!ce->state)
+				continue;
 
-		reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
+			reg_state = i915_gem_object_pin_map(ce->state->obj,
+							    I915_MAP_WB);
+			if (WARN_ON(IS_ERR(reg_state)))
+				continue;
 
-		reg_state[CTX_RING_HEAD+1] = 0;
-		reg_state[CTX_RING_TAIL+1] = 0;
+			reg_state += LRC_STATE_PN * PAGE_SIZE / sizeof(u32);
+			reg_state[CTX_RING_HEAD+1] = 0;
+			reg_state[CTX_RING_TAIL+1] = 0;
 
-		ce->state->obj->dirty = true;
-		i915_gem_object_unpin_map(ce->state->obj);
+			ce->state->obj->dirty = true;
+			i915_gem_object_unpin_map(ce->state->obj);
 
-		ce->ring->head = 0;
-		ce->ring->tail = 0;
+			ce->ring->head = ce->ring->tail = 0;
+			ce->ring->last_retired_head = -1;
+		}
 	}
 }
