Message ID | 20180605160357.32591-2-mika.kuoppala@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Quoting Mika Kuoppala (2018-06-05 17:03:57)
> There is a problem with kbl up to rev E0 where a heavy
> memory/fabric traffic from adjacent engine(s) can cause an engine
> reset to fail.
[snip]
> +	/* WaKBLVECSSemaphoreWaitPoll:kbl */
> +	if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0)) {

Hmm, what revision was production? Just checking we need to ship this
w/a...
-Chris
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-06-05 17:03:57)
[snip]
>> +	if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0)) {
>
> Hmm, what revision was production? Just checking we need to ship this
> w/a...

The bspec list of revs seems outdated so can't trust that blindly
but already 0x1 is not preprod on that list. Also found nuc in lab
which is 0x02.
-Mika
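Revision 0x02, the stepping of the lab NUC Mika mentions, falls inside the A0..E0 window the patch tests, so that part would receive the workaround. A minimal userspace sketch of the inclusive range check, assuming the i915 numbering where KBL_REVID_A0 is 0x0 and KBL_REVID_E0 is 0x4; kbl_revid_in_range() and needs_sema_wait_poll_wa() are hypothetical helpers, not i915 API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed KBL revision IDs, modelled on the i915 KBL_REVID_* defines. */
enum kbl_revid {
	KBL_REVID_A0 = 0x0,
	KBL_REVID_B0 = 0x1,
	KBL_REVID_C0 = 0x2,
	KBL_REVID_D0 = 0x3,
	KBL_REVID_E0 = 0x4,
};

/* Inclusive range check in the spirit of IS_KBL_REVID(dev_priv, since, until). */
static bool kbl_revid_in_range(uint8_t revid, uint8_t since, uint8_t until)
{
	return revid >= since && revid <= until;
}

/* Hypothetical: true if WaKBLVECSSemaphoreWaitPoll applies to this stepping. */
static bool needs_sema_wait_poll_wa(uint8_t revid)
{
	return kbl_revid_in_range(revid, KBL_REVID_A0, KBL_REVID_E0);
}
```

Under these assumed values, needs_sema_wait_poll_wa(0x02) is true and any stepping above E0 (0x4) is excluded.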
Quoting Mika Kuoppala (2018-06-06 09:40:11)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > Hmm, what revision was production? Just checking we need to ship this
> > w/a...
>
> The bspec list of revs seems outdated so can't trust that blindly
> but already 0x1 is not preprod on that list. Also found nuc in lab
> which is 0x02.
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>

Care to update intel_detect_preproduction_hw()?
-Chris
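Chris's suggestion would amount to one more platform clause in intel_detect_preproduction_hw(). A hypothetical sketch of such a clause, assuming (from Mika's reading of the bspec list) that only revision 0x0 (A0) counts as pre-production; kbl_is_preproduction() and detect_preproduction() are illustrative names, not the i915 code Chris refers to.

```c
#include <stdbool.h>

#define KBL_REVID_A0 0x0 /* assumed value of the i915 define */

/*
 * Hypothetical KBL clause: treat only A0 as pre-production, since
 * revision 0x1 is reportedly not listed as preprod in bspec.
 */
static bool kbl_is_preproduction(unsigned int revid)
{
	return revid <= KBL_REVID_A0;
}

/* Accumulate per-platform pre-production checks into one flag. */
static bool detect_preproduction(bool is_kbl, unsigned int revid)
{
	bool pre = false;

	/* Checks for other platforms would be OR-ed in here as well. */
	pre |= is_kbl && kbl_is_preproduction(revid);
	return pre;
}
```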
Quoting Mika Kuoppala (2018-06-05 19:03:57)
> There is a problem with kbl up to rev E0 where a heavy
> memory/fabric traffic from adjacent engine(s) can cause an engine
> reset to fail.
[snip]
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Skip the RCS engine and this is:

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> writes:

> Quoting Mika Kuoppala (2018-06-05 19:03:57)
[snip]
> Skip the RCS engine and this is;
>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

RCS engine skipped in v2, and both patches pushed. Thank you both for
the review.
-Mika
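Joonas asked for the render engine (RCS) to be skipped, and Mika reports doing so in v2, while the diff archived on this page still writes the register on every engine via for_each_engine(). A userspace sketch of the RCS-skipping variant, under stated assumptions: the engine MMIO bases are assumed to match the gen9 values in i915_reg.h, the engines[] table and the mmio_log recorder are hypothetical stand-ins for for_each_engine() and I915_WRITE(), and the value 1 stands for the one-microsecond poll interval described in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed gen9 engine MMIO bases. */
#define RENDER_RING_BASE    0x02000
#define BLT_RING_BASE       0x22000
#define BSD_RING_BASE       0x12000
#define GEN8_BSD2_RING_BASE 0x1c000
#define VEBOX_RING_BASE     0x1a000

/* Same offset as the RING_SEMA_WAIT_POLL() macro added by the patch. */
#define RING_SEMA_WAIT_POLL(base) ((base) + 0x24c)

enum engine_id { RCS, BCS, VCS, VCS2, VECS, NUM_ENGINES };

struct engine {
	enum engine_id id;
	uint32_t mmio_base;
};

/* Hypothetical engine table standing in for for_each_engine(). */
static const struct engine engines[NUM_ENGINES] = {
	{ RCS, RENDER_RING_BASE },
	{ BCS, BLT_RING_BASE },
	{ VCS, BSD_RING_BASE },
	{ VCS2, GEN8_BSD2_RING_BASE },
	{ VECS, VEBOX_RING_BASE },
};

/* Hypothetical MMIO recorder: logs writes instead of touching hardware. */
struct mmio_log {
	uint32_t offset[NUM_ENGINES];
	uint32_t value[NUM_ENGINES];
	size_t count;
};

static void mmio_write(struct mmio_log *log, uint32_t offset, uint32_t value)
{
	log->offset[log->count] = offset;
	log->value[log->count] = value;
	log->count++;
}

/* Program the 1us semaphore poll interval on every engine except RCS. */
static void apply_sema_wait_poll_wa(struct mmio_log *log)
{
	for (size_t i = 0; i < NUM_ENGINES; i++) {
		if (engines[i].id == RCS)
			continue;
		mmio_write(log, RING_SEMA_WAIT_POLL(engines[i].mmio_base), 1);
	}
}
```

Recording writes instead of issuing them keeps the sketch testable: after one call the log holds four entries, the first being the blitter's 0x2224c, and none targets the render engine's 0x224c.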
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index f0317bde3aab..0e8c7896cd74 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -2242,6 +2242,7 @@ enum i915_power_well_id {
 #define RING_RESET_CTL(base) _MMIO((base)+0xd0)
 #define RESET_CTL_REQUEST_RESET (1 << 0)
 #define RESET_CTL_READY_TO_RESET (1 << 1)
+#define RING_SEMA_WAIT_POLL(base) _MMIO((base)+0x24c)
 
 #define HSW_GTT_CACHE_EN _MMIO(0x4024)
 #define GTT_CACHE_EN_ALL 0xF0007FFF
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index bb03f6d8b3d1..b892ca8396e8 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -2174,6 +2174,8 @@ int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
 	 * Thus assume it is best to stop engines on all gens
 	 * where we have a gpu reset.
 	 *
+	 * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES)
+	 *
 	 * WaMediaResetMainRingCleanup:ctg,elk (presumably)
 	 *
 	 * FIXME: Wa for more modern gens needs to be validated
diff --git a/drivers/gpu/drm/i915/intel_workarounds.c b/drivers/gpu/drm/i915/intel_workarounds.c
index b1ab56a1ec31..5655d39c65cb 100644
--- a/drivers/gpu/drm/i915/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/intel_workarounds.c
@@ -666,6 +666,15 @@ static void kbl_gt_workarounds_apply(struct drm_i915_private *dev_priv)
 	I915_WRITE(GEN9_GAMT_ECO_REG_RW_IA,
 		   I915_READ(GEN9_GAMT_ECO_REG_RW_IA) |
 		   GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
+
+	/* WaKBLVECSSemaphoreWaitPoll:kbl */
+	if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0)) {
+		struct intel_engine_cs *engine;
+		unsigned int tmp;
+
+		for_each_engine(engine, dev_priv, tmp)
+			I915_WRITE(RING_SEMA_WAIT_POLL(engine->mmio_base), 1);
+	}
 }
 
 static void glk_gt_workarounds_apply(struct drm_i915_private *dev_priv)
There is a problem with kbl up to rev E0 where heavy memory/fabric
traffic from adjacent engine(s) can cause an engine reset to fail.
This traffic can come from normal memory accesses or from heavy
polling on a semaphore wait.

For the case where engine hogging causes the failure, we already fall
back to a full reset, which effectively stops all engines; thus we
only add workaround documentation.

For the semaphore wait polling case, we add a one-microsecond poll
interval to semaphore waits to guarantee bandwidth for the reset
preparation. As a side effect, semaphore completion latencies also
become 1us longer.

v2: Let full reset handle the adjacent engine idling (Chris)

References: https://bugs.freedesktop.org/show_bug.cgi?id=106684
References: VTHSD#2227190, HSDES#1604216706, BSID#0917
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h          | 1 +
 drivers/gpu/drm/i915/intel_uncore.c      | 2 ++
 drivers/gpu/drm/i915/intel_workarounds.c | 9 +++++++++
 3 files changed, 12 insertions(+)
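The commit message relies on an existing recovery path for the hogging case: when a per-engine reset fails, a full reset stops every engine, so no adjacent traffic remains. A toy model of that fallback, under loud assumptions: engine_reset_ok(), reset_with_fallback() and the adjacent_traffic flag are hypothetical and not i915 functions, and ALL_ENGINES is only modelled on the mask named in the intel_uncore.c comment.

```c
#include <stdbool.h>

#define ALL_ENGINES (~0u) /* modelled on the mask named in intel_uncore.c */

/*
 * Toy model of the erratum: a per-engine reset succeeds only when no
 * adjacent engine is generating heavy memory/fabric traffic.
 */
static bool engine_reset_ok(unsigned int engine_mask, bool adjacent_traffic)
{
	(void)engine_mask; /* the toy model ignores which engine is reset */
	return !adjacent_traffic;
}

/*
 * Try a per-engine reset first; on failure fall back to a full reset,
 * which stops every engine and so removes the adjacent traffic.
 * Returns the mask of engines that ended up being reset.
 */
static unsigned int reset_with_fallback(unsigned int engine_mask,
					bool adjacent_traffic)
{
	if (engine_reset_ok(engine_mask, adjacent_traffic))
		return engine_mask;

	return ALL_ENGINES;
}
```

This is why only the semaphore polling case needs a register change: the hogging case already ends in a reset of all engines.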