Message ID | 20180530150206.7798-2-mika.kuoppala@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Quoting Mika Kuoppala (2018-05-30 16:02:06)
> There is a problem with kbl up to rev E0 where heavy memory traffic
> from adjacent engine(s) can cause an engine reset to fail. This
> traffic can be from normal memory accesses or it can be from heavy
> polling on a semaphore wait.
>
> To combat the normal traffic, we do our best to idle the adjacent
> engines before we ask the engine to prepare for reset. For per
> engine reset, this will add an unwanted extra latency as we take the
> blanket approach before every reset. In the past we have already
> noticed that idling an engine before reset improves our chances of
> resetting it, but this only idles the engines we are about to reset,
> not the adjacent ones.

Unfortunately we don't have a lock on the other engines, so we can't
prevent two resets running in parallel from clobbering state on the
other.

So what's stopping the failure mode of falling back to resetting all
engines at once if resetting one fails? Is it a catastrophic failure?

> We could only take the approach of idling adjacent engines if the
> first reset fails. But in this area, it is usually best to get it
> right off the bat.
>
> For the second issue, where an unlimited semaphore wait poll loop is
> generating the heavy memory traffic and preventing a reset, we add a
> one microsecond poll interval to the semaphore wait to guarantee
> bandwidth for the reset preparation. The side effect is that we also
> make semaphore completion latencies 1us longer.

You know the rule: second issue, second patch.

That's odd, I would have expected an MI_SEMA op to be an arbitration
point (even inside the busy wait loop), so would have expected it to
behave nicely with STOP_RING.
-Chris
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-05-30 16:02:06)
>> There is a problem with kbl up to rev E0 where heavy memory traffic
>> from adjacent engine(s) can cause an engine reset to fail. This
>> traffic can be from normal memory accesses or it can be from heavy
>> polling on a semaphore wait.
>>
>> To combat the normal traffic, we do our best to idle the adjacent
>> engines before we ask the engine to prepare for reset. For per
>> engine reset, this will add an unwanted extra latency as we take
>> the blanket approach before every reset. In the past we have
>> already noticed that idling an engine before reset improves our
>> chances of resetting it, but this only idles the engines we are
>> about to reset, not the adjacent ones.
>
> Unfortunately we don't have a lock on the other engines, so we can't
> prevent two resets running in parallel from clobbering state on the
> other.
>
> So what's stopping the failure mode of falling back to resetting all
> engines at once if resetting one fails? Is it a catastrophic failure?

Nothing that I can think of, and for this we should just let the full
reset and its explicit idling be our backup plan. And if it ever shows
itself to be frequent enough to warrant further work, we should
consider doing a full reset right from the start on the affected hw.
-Mika
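[Editor's note: a minimal sketch of the backup plan described above — try the cheap per-engine reset first, and escalate to the full-device reset, whose path idles every engine explicitly, when it fails. This is an illustration, not code from the patch; `recover_from_hang` is a hypothetical helper, and `i915_reset_engine()`/`i915_reset()` are named after the 2018-era i915 reset entry points with their signatures simplified.]

```c
/*
 * Illustrative sketch only (not part of the patch): escalation from
 * per-engine reset to full GPU reset. recover_from_hang() is a
 * hypothetical helper; the reset entry points are simplified.
 */
static void recover_from_hang(struct drm_i915_private *i915,
			      struct intel_engine_cs *engine)
{
	/* Cheap path: reset just the hung engine. */
	if (i915_reset_engine(engine, "hang detected") == 0)
		return;

	/*
	 * If adjacent-engine traffic kept RESET_CTL_READY_TO_RESET from
	 * asserting, fall back to the full reset, which idles all
	 * engines before pulling the trigger.
	 */
	i915_reset(i915);
}
```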
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-05-30 16:02:06)
>>
>> For the second issue, where an unlimited semaphore wait poll loop is
>> generating the heavy memory traffic and preventing a reset, we add a
>> one microsecond poll interval to the semaphore wait to guarantee
>> bandwidth for the reset preparation. The side effect is that we also
>> make semaphore completion latencies 1us longer.
>
> You know the rule: second issue, second patch.
>
> That's odd, I would have expected an MI_SEMA op to be an arbitration
> point (even inside the busy wait loop), so would have expected it to
> behave nicely with STOP_RING.

My interpretation was that the context save gets delayed/stuck.
-Mika
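[Editor's note: the poll loop under discussion is the polling flavour of MI_SEMAPHORE_WAIT — the command streamer keeps re-reading the semaphore address until the condition is met, and it is that read traffic which RING_SEMA_WAIT_POLL throttles to one poll per microsecond. As a rough illustration (not from this patch), a gen8+ polling semaphore wait is emitted along these lines; `emit_poll_semaphore_wait` is a hypothetical helper, while the MI_SEMAPHORE_* flags are i915's command encodings.]

```c
/*
 * Illustrative sketch: a polling MI_SEMAPHORE_WAIT on gen8+. The
 * helper name is hypothetical; the MI_SEMAPHORE_* flags are the i915
 * command encodings. Four dwords: opcode+flags, the value to compare
 * against, then the 64-bit GGTT address that is re-read in a loop.
 */
static u32 *emit_poll_semaphore_wait(u32 *cs, u32 value, u64 gtt_addr)
{
	*cs++ = MI_SEMAPHORE_WAIT |
		MI_SEMAPHORE_GLOBAL_GTT |
		MI_SEMAPHORE_POLL |		/* keep polling, don't wait for a signal */
		MI_SEMAPHORE_SAD_GTE_SDD;	/* done when *addr >= value */
	*cs++ = value;
	*cs++ = lower_32_bits(gtt_addr);
	*cs++ = upper_32_bits(gtt_addr);

	return cs;
}
```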
```diff
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index f238b7b33cd9..3c615a865cc5 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -2237,6 +2237,7 @@ enum i915_power_well_id {
 #define RING_RESET_CTL(base)	_MMIO((base)+0xd0)
 #define RESET_CTL_REQUEST_RESET	(1 << 0)
 #define RESET_CTL_READY_TO_RESET	(1 << 1)
+#define RING_SEMA_WAIT_POLL(base)	_MMIO((base)+0x24c)
 
 #define HSW_GTT_CACHE_EN	_MMIO(0x4024)
 #define GTT_CACHE_EN_ALL	0xF0007FFF
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 13448ea76f57..4d30cbb2281e 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -808,6 +808,36 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine)
 	return err;
 }
 
+int intel_engine_start_cs(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	const u32 base = engine->mmio_base;
+	const i915_reg_t mode = RING_MI_MODE(base);
+	int err;
+
+	if (INTEL_GEN(dev_priv) < 3)
+		return -ENODEV;
+
+	GEM_TRACE("%s\n", engine->name);
+
+	I915_WRITE_FW(mode, _MASKED_BIT_DISABLE(STOP_RING));
+
+	err = 0;
+	if (__intel_wait_for_register_fw(dev_priv,
+					 mode, MODE_IDLE, 0,
+					 1000, 0,
+					 NULL)) {
+		GEM_TRACE("%s: timed out on STOP_RING -> NOT IDLE\n",
+			  engine->name);
+		err = -ETIMEDOUT;
+	}
+
+	/* A final mmio read to let GPU writes be hopefully flushed to memory */
+	POSTING_READ_FW(mode);
+
+	return err;
+}
+
 const char *i915_cache_level_str(struct drm_i915_private *i915, int type)
 {
 	switch (type) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index acef385c4c80..5e2c59128fa9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -879,6 +879,7 @@ int intel_init_blt_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine);
 
 int intel_engine_stop_cs(struct intel_engine_cs *engine);
+int intel_engine_start_cs(struct intel_engine_cs *engine);
 
 u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
 u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 68fe4c16acfb..ffbae5c44b8c 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -2088,12 +2088,34 @@ static void gen8_reset_engine_cancel(struct intel_engine_cs *engine)
 		      _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
 }
 
+static void gen8_idle_engines(struct drm_i915_private *i915)
+{
+	struct intel_engine_cs *engine;
+	unsigned int tmp;
+
+	for_each_engine(engine, i915, tmp)
+		intel_engine_stop_cs(engine);
+}
+
+static void gen8_unidle_engines(struct drm_i915_private *i915)
+{
+	struct intel_engine_cs *engine;
+	unsigned int tmp;
+
+	for_each_engine(engine, i915, tmp)
+		intel_engine_start_cs(engine);
+}
+
 static int gen8_reset_engines(struct drm_i915_private *dev_priv,
 			      unsigned engine_mask)
 {
 	struct intel_engine_cs *engine;
 	unsigned int tmp, ret;
 
+	/* WaKBLVECSSemaphoreWaitPoll:kbl */
+	if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0))
+		gen8_idle_engines(dev_priv);
+
 	for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
 		if (gen8_reset_engine_start(engine)) {
 			ret = -EIO;
@@ -2111,6 +2133,9 @@ static int gen8_reset_engines(struct drm_i915_private *dev_priv,
 	for_each_engine_masked(engine, dev_priv, engine_mask, tmp)
 		gen8_reset_engine_cancel(engine);
 
+	if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0))
+		gen8_unidle_engines(dev_priv);
+
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_workarounds.c b/drivers/gpu/drm/i915/intel_workarounds.c
index b1ab56a1ec31..5655d39c65cb 100644
--- a/drivers/gpu/drm/i915/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/intel_workarounds.c
@@ -666,6 +666,15 @@ static void kbl_gt_workarounds_apply(struct drm_i915_private *dev_priv)
 	I915_WRITE(GEN9_GAMT_ECO_REG_RW_IA,
 		   I915_READ(GEN9_GAMT_ECO_REG_RW_IA) |
 		   GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
+
+	/* WaKBLVECSSemaphoreWaitPoll:kbl */
+	if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0)) {
+		struct intel_engine_cs *engine;
+		unsigned int tmp;
+
+		for_each_engine(engine, dev_priv, tmp)
+			I915_WRITE(RING_SEMA_WAIT_POLL(engine->mmio_base), 1);
+	}
 }
 
 static void glk_gt_workarounds_apply(struct drm_i915_private *dev_priv)
```
There is a problem with kbl up to rev E0 where heavy memory traffic
from adjacent engine(s) can cause an engine reset to fail. This
traffic can be from normal memory accesses or it can be from heavy
polling on a semaphore wait.

To combat the normal traffic, we do our best to idle the adjacent
engines before we ask the engine to prepare for reset. For per engine
reset, this will add an unwanted extra latency as we take the blanket
approach before every reset. In the past we have already noticed that
idling an engine before reset improves our chances of resetting it,
but this only idles the engines we are about to reset, not the
adjacent ones.

We could only take the approach of idling adjacent engines if the
first reset fails. But in this area, it is usually best to get it
right off the bat.

For the second issue, where an unlimited semaphore wait poll loop is
generating the heavy memory traffic and preventing a reset, we add a
one microsecond poll interval to the semaphore wait to guarantee
bandwidth for the reset preparation. The side effect is that we also
make semaphore completion latencies 1us longer.

References: VTHSD#2227190, HSDES#1604216706, BSID#0917
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h          |  1 +
 drivers/gpu/drm/i915/intel_engine_cs.c   | 30 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  1 +
 drivers/gpu/drm/i915/intel_uncore.c      | 25 ++++++++++++++++++++
 drivers/gpu/drm/i915/intel_workarounds.c |  9 +++++++
 5 files changed, 66 insertions(+)