[2/2] drm/i915: Add WaKBLVECSSemaphoreWaitPoll

Message ID	20180605160357.32591-2-mika.kuoppala@linux.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Mika Kuoppala <mika.kuoppala@linux.intel.com> To: intel-gfx@lists.freedesktop.org Date: Tue, 5 Jun 2018 19:03:57 +0300 Message-Id: <20180605160357.32591-2-mika.kuoppala@linux.intel.com> In-Reply-To: <20180605160357.32591-1-mika.kuoppala@linux.intel.com> References: <20180605160357.32591-1-mika.kuoppala@linux.intel.com> Subject: [Intel-gfx] [PATCH 2/2] drm/i915: Add WaKBLVECSSemaphoreWaitPoll Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Mika Kuoppala June 5, 2018, 4:03 p.m. UTC

There is a problem with kbl up to rev E0 where a heavy
memory/fabric traffic from adjacent engine(s) can cause an engine
reset to fail. This traffic can be from normal memory accesses
or it can be from heavy polling on a semaphore wait.

For engine hogging causing a fail, we already fallback to
full reset. Which effectively stops all engines and thus
we only add a workaround documentation.

For the semaphore wait loop poll case, we add one microsecond
poll interval to semaphore wait to guarantee bandwidth for
the reset preration. The side effect is that we make semaphore
completion latencies also 1us longer.

v2: Let full reset handle the adjacent engine idling (Chris)

References: https://bugs.freedesktop.org/show_bug.cgi?id=106684
References: VTHSD#2227190, HSDES#1604216706, BSID#0917
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h          | 1 +
 drivers/gpu/drm/i915/intel_uncore.c      | 2 ++
 drivers/gpu/drm/i915/intel_workarounds.c | 9 +++++++++
 3 files changed, 12 insertions(+)

Chris Wilson June 5, 2018, 4:12 p.m. UTC | #1

Quoting Mika Kuoppala (2018-06-05 17:03:57)
> There is a problem with kbl up to rev E0 where a heavy
> memory/fabric traffic from adjacent engine(s) can cause an engine
> reset to fail. This traffic can be from normal memory accesses
> or it can be from heavy polling on a semaphore wait.
> 
> For engine hogging causing a fail, we already fallback to
> full reset. Which effectively stops all engines and thus
> we only add a workaround documentation.
> 
> For the semaphore wait loop poll case, we add one microsecond
> poll interval to semaphore wait to guarantee bandwidth for
> the reset preration. The side effect is that we make semaphore
> completion latencies also 1us longer.
> 
> v2: Let full reset handle the adjacent engine idling (Chris)
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=106684
> References: VTHSD#2227190, HSDES#1604216706, BSID#0917
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> diff --git a/drivers/gpu/drm/i915/intel_workarounds.c b/drivers/gpu/drm/i915/intel_workarounds.c
> index b1ab56a1ec31..5655d39c65cb 100644
> --- a/drivers/gpu/drm/i915/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/intel_workarounds.c
> @@ -666,6 +666,15 @@ static void kbl_gt_workarounds_apply(struct drm_i915_private *dev_priv)
>         I915_WRITE(GEN9_GAMT_ECO_REG_RW_IA,
>                    I915_READ(GEN9_GAMT_ECO_REG_RW_IA) |
>                    GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
> +
> +       /* WaKBLVECSSemaphoreWaitPoll:kbl */
> +       if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0)) {

Hmm, what revision was production? Just checking we need to ship this
w/a...
-Chris

Mika Kuoppala June 6, 2018, 8:40 a.m. UTC | #2

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-06-05 17:03:57)
>> There is a problem with kbl up to rev E0 where a heavy
>> memory/fabric traffic from adjacent engine(s) can cause an engine
>> reset to fail. This traffic can be from normal memory accesses
>> or it can be from heavy polling on a semaphore wait.
>> 
>> For engine hogging causing a fail, we already fallback to
>> full reset. Which effectively stops all engines and thus
>> we only add a workaround documentation.
>> 
>> For the semaphore wait loop poll case, we add one microsecond
>> poll interval to semaphore wait to guarantee bandwidth for
>> the reset preration. The side effect is that we make semaphore
>> completion latencies also 1us longer.
>> 
>> v2: Let full reset handle the adjacent engine idling (Chris)
>> 
>> References: https://bugs.freedesktop.org/show_bug.cgi?id=106684
>> References: VTHSD#2227190, HSDES#1604216706, BSID#0917
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> ---
>> diff --git a/drivers/gpu/drm/i915/intel_workarounds.c b/drivers/gpu/drm/i915/intel_workarounds.c
>> index b1ab56a1ec31..5655d39c65cb 100644
>> --- a/drivers/gpu/drm/i915/intel_workarounds.c
>> +++ b/drivers/gpu/drm/i915/intel_workarounds.c
>> @@ -666,6 +666,15 @@ static void kbl_gt_workarounds_apply(struct drm_i915_private *dev_priv)
>>         I915_WRITE(GEN9_GAMT_ECO_REG_RW_IA,
>>                    I915_READ(GEN9_GAMT_ECO_REG_RW_IA) |
>>                    GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
>> +
>> +       /* WaKBLVECSSemaphoreWaitPoll:kbl */
>> +       if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0)) {
>
> Hmm, what revision was production? Just checking we need to ship this
> w/a...

The bspec list of revs seems outdated so can't trust that blindly
but already 0x1 is not preprod on that list. Also found nuc in lab
which is 0x02.

-Mika

Chris Wilson June 6, 2018, 8:47 a.m. UTC | #3

Quoting Mika Kuoppala (2018-06-06 09:40:11)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Quoting Mika Kuoppala (2018-06-05 17:03:57)
> >> There is a problem with kbl up to rev E0 where a heavy
> >> memory/fabric traffic from adjacent engine(s) can cause an engine
> >> reset to fail. This traffic can be from normal memory accesses
> >> or it can be from heavy polling on a semaphore wait.
> >> 
> >> For engine hogging causing a fail, we already fallback to
> >> full reset. Which effectively stops all engines and thus
> >> we only add a workaround documentation.
> >> 
> >> For the semaphore wait loop poll case, we add one microsecond
> >> poll interval to semaphore wait to guarantee bandwidth for
> >> the reset preration. The side effect is that we make semaphore
> >> completion latencies also 1us longer.
> >> 
> >> v2: Let full reset handle the adjacent engine idling (Chris)
> >> 
> >> References: https://bugs.freedesktop.org/show_bug.cgi?id=106684
> >> References: VTHSD#2227190, HSDES#1604216706, BSID#0917
> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >> ---
> >> diff --git a/drivers/gpu/drm/i915/intel_workarounds.c b/drivers/gpu/drm/i915/intel_workarounds.c
> >> index b1ab56a1ec31..5655d39c65cb 100644
> >> --- a/drivers/gpu/drm/i915/intel_workarounds.c
> >> +++ b/drivers/gpu/drm/i915/intel_workarounds.c
> >> @@ -666,6 +666,15 @@ static void kbl_gt_workarounds_apply(struct drm_i915_private *dev_priv)
> >>         I915_WRITE(GEN9_GAMT_ECO_REG_RW_IA,
> >>                    I915_READ(GEN9_GAMT_ECO_REG_RW_IA) |
> >>                    GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
> >> +
> >> +       /* WaKBLVECSSemaphoreWaitPoll:kbl */
> >> +       if (IS_KBL_REVID(dev_priv, KBL_REVID_A0, KBL_REVID_E0)) {
> >
> > Hmm, what revision was production? Just checking we need to ship this
> > w/a...
> 
> The bspec list of revs seems outdated so can't trust that blindly
> but already 0x1 is not preprod on that list. Also found nuc in lab
> which is 0x02.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>

Care to update intel_detect_preproduction_hw() ?
-Chris

Joonas Lahtinen June 7, 2018, 8:54 a.m. UTC | #4

Quoting Mika Kuoppala (2018-06-05 19:03:57)
> There is a problem with kbl up to rev E0 where a heavy
> memory/fabric traffic from adjacent engine(s) can cause an engine
> reset to fail. This traffic can be from normal memory accesses
> or it can be from heavy polling on a semaphore wait.
> 
> For engine hogging causing a fail, we already fallback to
> full reset. Which effectively stops all engines and thus
> we only add a workaround documentation.
> 
> For the semaphore wait loop poll case, we add one microsecond
> poll interval to semaphore wait to guarantee bandwidth for
> the reset preration. The side effect is that we make semaphore
> completion latencies also 1us longer.
> 
> v2: Let full reset handle the adjacent engine idling (Chris)
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=106684
> References: VTHSD#2227190, HSDES#1604216706, BSID#0917
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Skip the RCS engine and this is;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

Mika Kuoppala June 8, 2018, 9:31 a.m. UTC | #5

Joonas Lahtinen <joonas.lahtinen@linux.intel.com> writes:

> Quoting Mika Kuoppala (2018-06-05 19:03:57)
>> There is a problem with kbl up to rev E0 where a heavy
>> memory/fabric traffic from adjacent engine(s) can cause an engine
>> reset to fail. This traffic can be from normal memory accesses
>> or it can be from heavy polling on a semaphore wait.
>> 
>> For engine hogging causing a fail, we already fallback to
>> full reset. Which effectively stops all engines and thus
>> we only add a workaround documentation.
>> 
>> For the semaphore wait loop poll case, we add one microsecond
>> poll interval to semaphore wait to guarantee bandwidth for
>> the reset preration. The side effect is that we make semaphore
>> completion latencies also 1us longer.
>> 
>> v2: Let full reset handle the adjacent engine idling (Chris)
>> 
>> References: https://bugs.freedesktop.org/show_bug.cgi?id=106684
>> References: VTHSD#2227190, HSDES#1604216706, BSID#0917
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>
> Skip the RCS engine and this is;
>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

RCS engine skipped on v2, and both patches pushed.
Thank you both for review.
-Mika

[2/2] drm/i915: Add WaKBLVECSSemaphoreWaitPoll

Commit Message

Comments

Patch