
[6/7] drm/i915/execlists: Preemption!

Message ID F3B0350DF4CB6849A642218320DE483D7D6456D1@SHSMSX101.ccr.corp.intel.com (mailing list archive)
State New, archived

Commit Message

Wang, Zhi A Sept. 25, 2017, 6:31 p.m. UTC
Two ideas from me. :)

1) For GVT-g, can we have an execlist context status notification in the lrc irq handler? Then we can switch back those MMIO registers for the host in the handler if a GVT-g request is preempted.

2) We might need a flag in the GEM request to indicate whether a request is preemptible.

It looks to me like not every request can be preempted, since I previously saw some HW WAs that disable preemption for specific drawing topologies via an LRI to 0x2244 on some BDW steppings.

I'm not sure whether this kind of thing still exists on SKL or will appear again in the future. Also, some execlist contexts of OCL workloads need to be patched if they get preempted. So the UMD might be the one that knows whether a workload can be preempted.

A GVT-g request doesn't need to worry about this, since it's the guest's responsibility to place the right MI_ARB_ON_OFFs in its ring buffer.
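
As a rough illustration (using the MI_ARB_ON_OFF / MI_ARB_ENABLE / MI_ARB_DISABLE / MI_ARB_CHECK encodings that the patch below also emits; the helper name is purely hypothetical), such bracketing could look like:

/*
 * Hypothetical sketch: a ring/batch author fences off a section that must
 * not be preempted by disabling command arbitration around it.
 */
static u32 *emit_nonpreemptible_section(u32 *cs)
{
	*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;	/* no arbitration points from here on */

	/*
	 * ... emit the preemption-sensitive commands, e.g. the drawing
	 * topologies the BDW workaround (LRI to 0x2244) was protecting ...
	 */

	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;	/* re-enable arbitration */
	*cs++ = MI_ARB_CHECK;			/* explicit arbitration point */
	return cs;
}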

Thanks,
Zhi.

-----Original Message-----
From: Chris Wilson [mailto:chris@chris-wilson.co.uk] 
Sent: Monday, September 25, 2017 2:44 PM
To: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>; Winiarski, Michal <michal.winiarski@intel.com>; Ursulin, Tvrtko <tvrtko.ursulin@intel.com>; Hiler, Arkadiusz <arkadiusz.hiler@intel.com>; Kuoppala, Mika <mika.kuoppala@intel.com>; Widawsky, Benjamin <benjamin.widawsky@intel.com>; Zhenyu Wang <zhenyuw@linux.intel.com>; Wang, Zhi A <zhi.a.wang@intel.com>
Subject: [PATCH 6/7] drm/i915/execlists: Preemption!

When we write to ELSP, it triggers a context preemption at the earliest arbitration point (3DPRIMITIVE, some PIPECONTROLs, a few other operations and the explicit MI_ARB_CHECK). If this is to the same context, it triggers a LITE_RESTORE where the RING_TAIL is merely updated (used currently to chain requests from the same context together, avoiding bubbles). However, if it is to a different context, a full context-switch is performed and it will start to execute the new context, saving the image of the old for later execution.

Previously we avoided preemption by only submitting a new context when the old was idle. But now we wish to embrace it: if the new request has a higher priority than the currently executing request, we write to the ELSP regardless, thus triggering preemption, but we tell the GPU to switch to our special preemption context (not the target). In the context-switch interrupt handler, we know that the previous contexts have finished execution and so can unwind all the incomplete requests and compute the new highest priority request to execute.

It would be feasible to avoid the switch-to-idle intermediate by programming the ELSP with the target context. The difficulty is in tracking which request that should be whilst maintaining the dependency chain; the error comes in with coalesced requests. As we only track the most recent request and its priority, we may run into the issue of being tricked into preempting a high priority request that was followed by a low priority request from the same context (e.g. for PI); worse still, that earlier request may be our own dependency, and the ordering is then broken by preemption. By injecting the switch-to-idle and then recomputing the priority queue, we avoid the issue of tracking in-flight coalesced requests. Having tried the preempt-to-busy approach and failed to find a way around the coalesced priority issue, Michal's original proposal to inject an idle context (based on handling GuC preemption) succeeds.

The current heuristic for deciding when to preempt is that the new request must be of higher priority than the one currently executing, and must have a privileged priority greater than 0. Note that the scheduler remains unfair!
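
Concretely, the decision in execlists_dequeue() boils down to a check of roughly this shape (a paraphrase of the hunk below, where queue_priority stands for the priority of the first i915_priolist in the queue, not extra logic):

	/*
	 * Preempt only if the queued work is both higher priority than the
	 * request currently on HW and above the default (0) priority.
	 */
	if (queue_priority > max(last->priotree.priority, 0)) {
		inject_preempt_context(engine);
		execlists->preempt = true;
	}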

Suggested-by: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c         |   6 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 123 ++++++++++++++++++++++++--------
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 3 files changed, 98 insertions(+), 33 deletions(-)

Comments

Chris Wilson Sept. 25, 2017, 6:39 p.m. UTC | #1
Quoting Wang, Zhi A (2017-09-25 19:31:15)
> Two ideas from me. :)
> 
> 1) For GVT-g, can we have an execlist context status notification in the lrc irq handler? Then we can switch back those MMIO registers for the host in the handler if a GVT-g request is preempted.

GVT-g is hosed by this patch if it requires matching context-in,
context-out notifications. We broke that back in

commit 221ab9719bf33ad2984928d2afb20988d652a289
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Sep 16 21:44:14 2017 +0100

    drm/i915/execlists: Unwind incomplete requests on resets

The simple question is: is it ok to call context-out for the incomplete
requests when they are no longer being scheduled?

> 2) We might need a flag in the GEM request to indicate whether a request is preemptible.

A flag set by whom? The big problem with flags on requests is that we
coalesce requests into the context-switch. That makes tracking the
individuals more complicated -- before deciding upon preemption you would
have to walk the list of all executing requests rather than just being
able to check one. execlists_schedule() with its list walking for PI is
already one of the prime hotspots. :(

> It looks to me like not every request can be preempted, since I previously saw some HW WAs that disable preemption for specific drawing topologies via an LRI to 0x2244 on some BDW steppings.
> 
> I'm not sure whether this kind of thing still exists on SKL or will appear again in the future. Also, some execlist contexts of OCL workloads need to be patched if they get preempted. So the UMD might be the one that knows whether a workload can be preempted.
> 

I pulled in a couple of patches from Michal to set up the contexts for the w/a.
And Michal's suggestion is that we just skip enabling preemption for
bdw/bsw.
-Chris
Wang, Zhi A Sept. 26, 2017, 6:20 a.m. UTC | #2
Thanks for the explanation of 2). :)

I'm thinking about the rough design of preemption in GVT-g since the host is moving to support preemption.

1) Global MMIO save/restore, which is covered by the context status notifier.

2) Support host preemption. A GVT-g request is preempted by the host. Since this preemption is not expected by the guest, the GVT-g workload scheduler should keep the request and re-schedule it.

* GVT-g has to know whether a context has been scheduled out or preempted here; it would be better if the context notification could explicitly report INTEL_CONTEXT_PREEMPTED. Otherwise we have to get the information from the context or request data structure.
* It would also be better if the re-scheduling of a GVT-g request could be controlled by GVT-g rather than i915, since GVT-g has its own scheduling policies.

3) Guest preemption. The guest requests a preemption by writing vELSP. GVT-g has to find a way to stop the previous GVT-g request if it's active on HW. If it's not active on HW, the request should be cancelled.

* If the host could expose an API to cancel a request on HW (by preemption) or in the priotree for GVT-g, that would be nice. :)

Thanks,
Zhi.

Chris Wilson Sept. 26, 2017, 10:12 a.m. UTC | #3
Quoting Wang, Zhi A (2017-09-26 07:20:00)
> Thanks for the explanation of 2). :)
> 
> I'm thinking about the rough design of preemption in GVT-g since the host is moving to support preemption.
> 
> 1) Global MMIO save/restore, which is covered by the context status notifier.
> 
> 2) Support host preemption. A GVT-g request is preempted by the host. Since this preemption is not expected by the guest, the GVT-g workload scheduler should keep the request and re-schedule it.
>
> * GVT-g has to know whether a context has been scheduled out or preempted here; it would be better if the context notification could explicitly report INTEL_CONTEXT_PREEMPTED. Otherwise we have to get the information from the context or request data structure.
> * It would also be better if the re-scheduling of a GVT-g request could be controlled by GVT-g rather than i915, since GVT-g has its own scheduling policies.

You'll get the sequence
  CONTEXT_IN, CONTEXT_PREEMPTED, CONTEXT_IN..., CONTEXT_OUT
Happy?
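
If such a status were added, a GVT-g side handler might look roughly like this sketch (INTEL_CONTEXT_SCHEDULE_IN/OUT are the existing notifier values; INTEL_CONTEXT_PREEMPTED, the callback name and the actions are only illustrative):

#include <linux/notifier.h>

/*
 * Hypothetical sketch: how a GVT-g context status notifier could react if an
 * INTEL_CONTEXT_PREEMPTED event were reported alongside schedule-in/out.
 */
static int gvt_context_status_change(struct notifier_block *nb,
				     unsigned long action, void *data)
{
	switch (action) {
	case INTEL_CONTEXT_SCHEDULE_IN:
		/* switch the relevant MMIO state to the vGPU before it runs */
		break;
	case INTEL_CONTEXT_SCHEDULE_OUT:
		/* the context completed: save state and retire the workload */
		break;
	case INTEL_CONTEXT_PREEMPTED:	/* hypothetical new status */
		/*
		 * Save state but keep the workload queued, so the GVT-g
		 * scheduler can resubmit it according to its own policy.
		 */
		break;
	}
	return NOTIFY_OK;
}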

But it feels like we should be focusing on a single scheduler, rather
than ending up with two timeslicing CFS schedulers (and their different
modes). It's para-virt vs full-virt, and I thought GVT-g was more about
being a lightweight para-virt...

> 3) Guest preemption. The guest requests a preemption by writing vELSP. GVT-g has to find a way to stop the previous GVT-g request if it's active on HW. If it's not active on HW, the request should be cancelled.
> 
> * If the host could expose an API to cancel a request on HW (by preemption) or in the priotree for GVT-g, that would be nice. :)

Hmm, the interface exists for the purpose of native requests. A rough
approach would be to use a second gvt context that preempts the first;
then, when you know that has been scheduled, you can requeue the normal
gvt context. But I'd rather we get to the point where we all feed into
the same scheduler, rather than having callbacks from within it.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index af82bd721dbc..20c1fd905110 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1350,10 +1350,8 @@  gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir, int test_shift)
 	bool tasklet = false;
 
 	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift)) {
-		if (port_count(&execlists->port[0])) {
-			__set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-			tasklet = true;
-		}
+		__set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+		tasklet = true;
 	}
 
 	if (iir & (GT_RENDER_USER_INTERRUPT << test_shift)) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d7ea1f828b82..99f7b7535e74 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -208,8 +208,8 @@ 
 
 /* Typical size of the average request (2 pipecontrols and a MI_BB) */
 #define EXECLISTS_REQUEST_SIZE 64 /* bytes */
-
 #define WA_TAIL_DWORDS 2
+#define PREEMPT_ID 0x1
 
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine);
@@ -482,26 +482,39 @@ static void port_assign(struct execlist_port *port,
 	port_set(port, port_pack(i915_gem_request_get(rq), port_count(port)));
 }
 
+static void inject_preempt_context(struct intel_engine_cs *engine)
+{
+	struct intel_context *ce =
+		&engine->i915->preempt_context->engine[engine->id];
+	u32 __iomem *elsp =
+		engine->i915->regs + i915_mmio_reg_offset(RING_ELSP(engine));
+	unsigned int n;
+
+	GEM_BUG_ON(engine->i915->preempt_context->hw_id != PREEMPT_ID);
+
+	memset(ce->ring->vaddr + ce->ring->tail, 0, 8);
+	ce->ring->tail += 8;
+	ce->ring->tail &= (ce->ring->size - 1);
+	ce->lrc_reg_state[CTX_RING_TAIL+1] = ce->ring->tail;
+
+	for (n = execlists_num_ports(&engine->execlists); --n; ) {
+		writel(0, elsp);
+		writel(0, elsp);
+	}
+	writel(upper_32_bits(ce->lrc_desc), elsp);
+	writel(lower_32_bits(ce->lrc_desc), elsp);
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
-	struct drm_i915_gem_request *last;
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
 	const struct execlist_port * const last_port =
 		&execlists->port[execlists->port_mask];
+	struct drm_i915_gem_request *last = port_request(port);
 	struct rb_node *rb;
 	bool submit = false;
 
-	last = port_request(port);
-	if (last)
-		/* WaIdleLiteRestore:bdw,skl
-		 * Apply the wa NOOPs to prevent ring:HEAD == req:TAIL
-		 * as we resubmit the request. See gen8_emit_breadcrumb()
-		 * for where we prepare the padding after the end of the
-		 * request.
-		 */
-		last->tail = last->wa_tail;
-
 	/* Hardware submission is through 2 ports. Conceptually each port
 	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
 	 * static for a context, and unique to each, so we only execute
@@ -526,7 +539,44 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	spin_lock_irq(&engine->timeline->lock);
 	rb = execlists->first;
 	GEM_BUG_ON(rb_first(&execlists->queue) != rb);
-	while (rb) {
+	if (!rb)
+		goto unlock;
+
+	if (last) {
+		/*
+		 * Don't resubmit or switch until all outstanding
+		 * preemptions (lite-restore) are seen. Then we
+		 * know the next preemption status we see corresponds
+		 * to this ELSP update.
+		 */
+		if (port_count(&port[0]) > 1)
+			goto unlock;
+
+		if (rb_entry(rb, struct i915_priolist, node)->priority >
+		    max(last->priotree.priority, 0)) {
+			/*
+			 * Switch to our empty preempt context so
+			 * the state of the GPU is known (idle).
+			 */
+			inject_preempt_context(engine);
+			execlists->preempt = true;
+			goto unlock;
+		} else {
+			if (port_count(&port[1]))
+				goto unlock;
+
+			/* WaIdleLiteRestore:bdw,skl
+			 * Apply the wa NOOPs to prevent
+			 * ring:HEAD == req:TAIL as we resubmit the
+			 * request. See gen8_emit_breadcrumb() for
+			 * where we prepare the padding after the
+			 * end of the request.
+			 */
+			last->tail = last->wa_tail;
+		}
+	}
+
+	do {
 		struct i915_priolist *p = rb_entry(rb, typeof(*p), node);
 		struct drm_i915_gem_request *rq, *rn;
 
@@ -589,11 +639,12 @@  static void execlists_dequeue(struct intel_engine_cs *engine)
 		INIT_LIST_HEAD(&p->requests);
 		if (p->priority != I915_PRIORITY_NORMAL)
 			kmem_cache_free(engine->i915->priorities, p);
-	}
+	} while (rb);
 done:
 	execlists->first = rb;
 	if (submit)
 		port_assign(port, last);
+unlock:
 	spin_unlock_irq(&engine->timeline->lock);
 
 	if (submit)
@@ -670,13 +721,6 @@  static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	spin_unlock_irqrestore(&engine->timeline->lock, flags);
 }
 
-static bool execlists_elsp_ready(const struct intel_engine_cs *engine)
-{
-	const struct execlist_port *port = engine->execlists.port;
-
-	return port_count(&port[0]) + port_count(&port[1]) < 2;
-}
-
 /*
  * Check the unread Context Status Buffers and manage the submission of new
  * contexts to the ELSP accordingly.
@@ -685,7 +729,7 @@ static void intel_lrc_irq_handler(unsigned long data)
 {
 	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct execlist_port *port = execlists->port;
+	struct execlist_port * const port = execlists->port;
 	struct drm_i915_private *dev_priv = engine->i915;
 
 	/* We can skip acquiring intel_runtime_pm_get() here as it was taken
@@ -770,6 +814,23 @@ static void intel_lrc_irq_handler(unsigned long data)
 			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
 				continue;
 
+			if (status & GEN8_CTX_STATUS_ACTIVE_IDLE &&
+			    buf[2*head + 1] == PREEMPT_ID) {
+				execlist_cancel_port_requests(execlists);
+
+				spin_lock_irq(&engine->timeline->lock);
+				unwind_incomplete_requests(engine);
+				spin_unlock_irq(&engine->timeline->lock);
+
+				GEM_BUG_ON(!execlists->preempt);
+				execlists->preempt = false;
+				continue;
+			}
+
+			if (status & GEN8_CTX_STATUS_PREEMPTED &&
+			    execlists->preempt)
+				continue;
+
 			/* Check the context/desc id for this event matches */
 			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
 
@@ -801,7 +862,7 @@  static void intel_lrc_irq_handler(unsigned long data)
 		}
 	}
 
-	if (execlists_elsp_ready(engine))
+	if (!execlists->preempt)
 		execlists_dequeue(engine);
 
 	intel_uncore_forcewake_put(dev_priv, execlists->fw_domains);
@@ -814,7 +875,7 @@ static void insert_request(struct intel_engine_cs *engine,
 	struct i915_priolist *p = lookup_priolist(engine, pt, prio);
 
 	list_add_tail(&pt->link, &ptr_mask_bits(p, 1)->requests);
-	if (ptr_unmask_bits(p, 1) && execlists_elsp_ready(engine))
+	if (ptr_unmask_bits(p, 1))
 		tasklet_hi_schedule(&engine->execlists.irq_tasklet);
 }
 
@@ -942,8 +1003,6 @@  static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
 	}
 
 	spin_unlock_irq(&engine->timeline->lock);
-
-	/* XXX Do we need to preempt to make room for us and our deps? */
 }
 
 static struct intel_ring *
@@ -1139,6 +1198,8 @@  static u32 *gen8_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 				       i915_ggtt_offset(engine->scratch) +
 				       2 * CACHELINE_BYTES);
 
+	*batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
 	/* Pad to end of cacheline */
 	while ((unsigned long)batch % CACHELINE_BYTES)
 		*batch++ = MI_NOOP;
@@ -1154,6 +1215,8 @@  static u32 *gen8_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 
 static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 {
+	*batch++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+
 	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
 	batch = gen8_emit_flush_coherentl3_wa(engine, batch);
 
@@ -1199,6 +1262,8 @@  static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 		*batch++ = 0;
 	}
 
+	*batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
 	/* Pad to end of cacheline */
 	while ((unsigned long)batch % CACHELINE_BYTES)
 		*batch++ = MI_NOOP;
@@ -1352,6 +1417,7 @@  static int gen8_init_common_ring(struct intel_engine_cs *engine)
 		   GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift);
 	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 	execlists->csb_head = -1;
+	execlists->preempt = false;
 
 	/* After a GPU reset, we may have requests to replay */
 	if (!i915_modparams.enable_guc_submission && execlists->first)
@@ -1647,7 +1713,8 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
  */
 static void gen8_emit_wa_tail(struct drm_i915_gem_request *request, u32 *cs)
 {
-	*cs++ = MI_NOOP;
+	/* Ensure there's always at least one preemption point per-request. */
+	*cs++ = MI_ARB_CHECK;
 	*cs++ = MI_NOOP;
 	request->wa_tail = intel_ring_offset(request, cs);
 }
@@ -1668,7 +1735,6 @@ static void gen8_emit_breadcrumb(struct drm_i915_gem_request *request, u32 *cs)
 
 	gen8_emit_wa_tail(request, cs);
 }
-
 static const int gen8_emit_breadcrumb_sz = 6 + WA_TAIL_DWORDS;
 
 static void gen8_emit_breadcrumb_render(struct drm_i915_gem_request *request,
@@ -1696,7 +1762,6 @@ static void gen8_emit_breadcrumb_render(struct drm_i915_gem_request *request,
 
 	gen8_emit_wa_tail(request, cs);
 }
-
 static const int gen8_emit_breadcrumb_render_sz = 8 + WA_TAIL_DWORDS;
 
 static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 56d7ae9f298b..891d9555dce6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -238,6 +238,8 @@ struct intel_engine_execlists {
 #define EXECLIST_MAX_PORTS 2
 	} port[EXECLIST_MAX_PORTS];
 
+	bool preempt;
+
 	/**
 	 * @port_mask: number of execlist ports - 1
 	 */
--
2.14.1