
[2/2] drm/i915: Add WaKBLVECSSemaphoreWaitPoll

Message ID 20180530150206.7798-2-mika.kuoppala@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Mika Kuoppala May 30, 2018, 3:02 p.m. UTC
There is a problem with kbl up to rev E0 where heavy
memory traffic from adjacent engine(s) can cause an engine
reset to fail. This traffic can come from normal memory accesses
or from heavy polling on a semaphore wait.

To combat the normal traffic, we do our best to idle the adjacent
engines before we ask the engine to prepare for reset. For per
engine reset, this adds unwanted extra latency as we take this
blanket approach before every reset. We have noticed in the past
that idling an engine before reset improves our chances of
resetting it, but that only idles the engines we are about to
reset, not the adjacent ones.

We could take the approach of idling the adjacent engines
only if the first reset fails. But in this area, it is usually
best to get it right off the bat.

For the second issue, where an unbounded semaphore wait poll loop
generates the heavy memory traffic and prevents a reset,
we add a one microsecond poll interval to the semaphore wait to
guarantee bandwidth for the reset preparation. The side effect
is that semaphore completion latencies also become 1us longer.

References: VTHSD#2227190, HSDES#1604216706, BSID#0917
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h          |  1 +
 drivers/gpu/drm/i915/intel_engine_cs.c   | 30 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  1 +
 drivers/gpu/drm/i915/intel_uncore.c      | 25 ++++++++++++++++++++
 drivers/gpu/drm/i915/intel_workarounds.c |  9 +++++++
 5 files changed, 66 insertions(+)
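
In outline, gen8_reset_engines() below brackets the actual reset with
a stop/restart of every engine's command streamer on the affected
revisions; gen8_idle_engines()/gen8_unidle_engines() are thin wrappers
around intel_engine_stop_cs()/intel_engine_start_cs(). A simplified
view of the patch, error handling elided:

	/* WaKBLVECSSemaphoreWaitPoll:kbl - quiesce all engines so
	 * their memory traffic cannot starve the reset preparation.
	 */
	if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0))
		gen8_idle_engines(dev_priv);

	/* ... request the reset and wait for RESET_CTL_READY_TO_RESET ... */

	if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0))
		gen8_unidle_engines(dev_priv);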

Comments

Chris Wilson May 30, 2018, 8:19 p.m. UTC | #1
Quoting Mika Kuoppala (2018-05-30 16:02:06)
> There is a problem with kbl up to rev E0 where heavy
> memory traffic from adjacent engine(s) can cause an engine
> reset to fail. This traffic can come from normal memory accesses
> or from heavy polling on a semaphore wait.
> 
> To combat the normal traffic, we do our best to idle the adjacent
> engines before we ask the engine to prepare for reset. For per
> engine reset, this adds unwanted extra latency as we take this
> blanket approach before every reset. We have noticed in the past
> that idling an engine before reset improves our chances of
> resetting it, but that only idles the engines we are about to
> reset, not the adjacent ones.

Unfortunately we don't have a lock on the other engines, so we can't
prevent two resets running in parallel from clobbering each other's
state.

So what's stopping us from falling back to resetting all engines
at once if resetting one fails? Is it a catastrophic failure?
 
> We could take the approach of idling the adjacent engines
> only if the first reset fails. But in this area, it is usually
> best to get it right off the bat.
> 
> For the second issue, where an unbounded semaphore wait poll loop
> generates the heavy memory traffic and prevents a reset,
> we add a one microsecond poll interval to the semaphore wait to
> guarantee bandwidth for the reset preparation. The side effect
> is that semaphore completion latencies also become 1us longer.

You know the rule: second issue, second patch. That's odd; I would
have expected an MI_SEMA op to be an arbitration point (even inside
the busy-wait loop), so would have expected it to behave nicely with
STOP_RING.
-Chris
Mika Kuoppala June 5, 2018, 2:54 p.m. UTC | #2
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-05-30 16:02:06)
>> There is a problem with kbl up to rev E0 where heavy
>> memory traffic from adjacent engine(s) can cause an engine
>> reset to fail. This traffic can come from normal memory accesses
>> or from heavy polling on a semaphore wait.
>> 
>> To combat the normal traffic, we do our best to idle the adjacent
>> engines before we ask the engine to prepare for reset. For per
>> engine reset, this adds unwanted extra latency as we take this
>> blanket approach before every reset. We have noticed in the past
>> that idling an engine before reset improves our chances of
>> resetting it, but that only idles the engines we are about to
>> reset, not the adjacent ones.
>
> Unfortunately we don't have a lock on the other engines, so we can't
> prevent two resets running in parallel from clobbering each other's
> state.
>
> So what's stopping us from falling back to resetting all engines
> at once if resetting one fails? Is it a catastrophic failure?

Nothing that I can think of, and for this we should just let the full
reset and its explicit idling be our backup plan.

And if it ever shows itself to be frequent enough to warrant further
work, we should consider doing a full reset right from the start on
the affected hw.

-Mika
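
For reference, the backup plan described above lives in the error
handler rather than in this patch. A simplified sketch of the
escalation using the contemporary i915 helper names, not the literal
i915_handle_error() flow:

	/* Try per-engine resets first; any engine that fails stays
	 * in the mask and falls through to a full-chip reset, which
	 * idles all engines explicitly before resetting.
	 */
	for_each_engine_masked(engine, i915, engine_mask, tmp) {
		if (i915_reset_engine(engine, msg) == 0)
			engine_mask &= ~intel_engine_flag(engine);
	}

	if (engine_mask)
		i915_reset(i915, engine_mask, msg);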
Mika Kuoppala June 5, 2018, 4:01 p.m. UTC | #3
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-05-30 16:02:06)
>> 
>> For the second issue, where an unbounded semaphore wait poll loop
>> generates the heavy memory traffic and prevents a reset,
>> we add a one microsecond poll interval to the semaphore wait to
>> guarantee bandwidth for the reset preparation. The side effect
>> is that semaphore completion latencies also become 1us longer.
>
> You know the rule: second issue, second patch. That's odd; I would
> have expected an MI_SEMA op to be an arbitration point (even inside
> the busy-wait loop), so would have expected it to behave nicely with
> STOP_RING.

My interpretation was that the context save gets delayed/stuck.

-Mika

Patch

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index f238b7b33cd9..3c615a865cc5 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -2237,6 +2237,7 @@  enum i915_power_well_id {
 #define RING_RESET_CTL(base)	_MMIO((base)+0xd0)
 #define   RESET_CTL_REQUEST_RESET  (1 << 0)
 #define   RESET_CTL_READY_TO_RESET (1 << 1)
+#define RING_SEMA_WAIT_POLL(base) _MMIO((base)+0x24c)
 
 #define HSW_GTT_CACHE_EN	_MMIO(0x4024)
 #define   GTT_CACHE_EN_ALL	0xF0007FFF
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 13448ea76f57..4d30cbb2281e 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -808,6 +808,36 @@  int intel_engine_stop_cs(struct intel_engine_cs *engine)
 	return err;
 }
 
+int intel_engine_start_cs(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	const u32 base = engine->mmio_base;
+	const i915_reg_t mode = RING_MI_MODE(base);
+	int err;
+
+	if (INTEL_GEN(dev_priv) < 3)
+		return -ENODEV;
+
+	GEM_TRACE("%s\n", engine->name);
+
+	I915_WRITE_FW(mode, _MASKED_BIT_DISABLE(STOP_RING));
+
+	err = 0;
+	if (__intel_wait_for_register_fw(dev_priv,
+					 mode, MODE_IDLE, 0,
+					 1000, 0,
+					 NULL)) {
+		GEM_TRACE("%s: timed out on STOP_RING -> NOT IDLE\n",
+			  engine->name);
+		err = -ETIMEDOUT;
+	}
+
+	/* A final mmio read to let GPU writes be hopefully flushed to memory */
+	POSTING_READ_FW(mode);
+
+	return err;
+}
+
 const char *i915_cache_level_str(struct drm_i915_private *i915, int type)
 {
 	switch (type) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index acef385c4c80..5e2c59128fa9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -879,6 +879,7 @@  int intel_init_blt_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine);
 
 int intel_engine_stop_cs(struct intel_engine_cs *engine);
+int intel_engine_start_cs(struct intel_engine_cs *engine);
 
 u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
 u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 68fe4c16acfb..ffbae5c44b8c 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -2088,12 +2088,34 @@  static void gen8_reset_engine_cancel(struct intel_engine_cs *engine)
 		      _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
 }
 
+static void gen8_idle_engines(struct drm_i915_private *i915)
+{
+	struct intel_engine_cs *engine;
+	unsigned int tmp;
+
+	for_each_engine(engine, i915, tmp)
+		intel_engine_stop_cs(engine);
+}
+
+static void gen8_unidle_engines(struct drm_i915_private *i915)
+{
+	struct intel_engine_cs *engine;
+	unsigned int tmp;
+
+	for_each_engine(engine, i915, tmp)
+		intel_engine_start_cs(engine);
+}
+
 static int gen8_reset_engines(struct drm_i915_private *dev_priv,
 			      unsigned engine_mask)
 {
 	struct intel_engine_cs *engine;
 	unsigned int tmp, ret;
 
+	/* WaKBLVECSSemaphoreWaitPoll:kbl */
+	if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0))
+		gen8_idle_engines(dev_priv);
+
 	for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
 		if (gen8_reset_engine_start(engine)) {
 			ret = -EIO;
@@ -2111,6 +2133,9 @@  static int gen8_reset_engines(struct drm_i915_private *dev_priv,
 		for_each_engine_masked(engine, dev_priv, engine_mask, tmp)
 			gen8_reset_engine_cancel(engine);
 
+	if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0))
+		gen8_unidle_engines(dev_priv);
+
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_workarounds.c b/drivers/gpu/drm/i915/intel_workarounds.c
index b1ab56a1ec31..5655d39c65cb 100644
--- a/drivers/gpu/drm/i915/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/intel_workarounds.c
@@ -666,6 +666,15 @@  static void kbl_gt_workarounds_apply(struct drm_i915_private *dev_priv)
 	I915_WRITE(GEN9_GAMT_ECO_REG_RW_IA,
 		   I915_READ(GEN9_GAMT_ECO_REG_RW_IA) |
 		   GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
+
+	/* WaKBLVECSSemaphoreWaitPoll:kbl */
+	if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0)) {
+		struct intel_engine_cs *engine;
+		unsigned int tmp;
+
+		for_each_engine(engine, dev_priv, tmp)
+			I915_WRITE(RING_SEMA_WAIT_POLL(engine->mmio_base), 1);
+	}
 }
 
 static void glk_gt_workarounds_apply(struct drm_i915_private *dev_priv)