drm/i915: Extend Haswell GT1 PSMI workaround to all

Message ID

20190917194746.26710-1-chris@chris-wilson.co.uk (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8366320862
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Tue, 17 Sep 2019 20:47:46 +0100
Message-Id: <20190917194746.26710-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH] drm/i915: Extend Haswell GT1 PSMI workaround to
 all
Precedence: list
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Series

drm/i915: Extend Haswell GT1 PSMI workaround to all

Commit Message

Chris Wilson Sept. 17, 2019, 7:47 p.m. UTC

A few times in CI, we have detected a GPU hang on our Haswell GT2
systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
was first introduced, it was applied to all Haswell, but later on we
found an erratum that supposedly restricted the issue to GT1 and so
constrained it to only be applied on GT1. That may have been a mistake...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111692
Fixes: 167bc759e823 ("drm/i915: Restrict PSMI context load w/a to Haswell GT1")
References: 2c550183476d ("drm/i915: Disable PSMI sleep messages on all rings around context switches")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Chris Wilson Sept. 17, 2019, 8:23 p.m. UTC | #1

Quoting Chris Wilson (2019-09-17 20:47:46)
> A few times in CI, we have detected a GPU hang on our Haswell GT2
> systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
> was first introduced, it was applied to all Haswell, but later on we
> found an erratum that supposedly restricted the issue to GT1 and so
> constrained it to only be applied on GT1. That may have been a mistake...

Something else to bear in mind about why this is showing up now is
the enabling of iommu on these machines. It's the last instruction in
the context image... Could we need to expand the context?
-Chris

Chris Wilson Sept. 17, 2019, 8:26 p.m. UTC | #2

Quoting Chris Wilson (2019-09-17 21:23:01)
> Quoting Chris Wilson (2019-09-17 20:47:46)
> > A few times in CI, we have detected a GPU hang on our Haswell GT2
> > systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
> > was first introduced, it was applied to all Haswell, but later on we
> > found an erratum that supposedly restricted the issue to GT1 and so
> > constrained it to only be applied on GT1. That may have been a mistake...
> 
> Something else to bear in mind about why this is showing up now is
> the enabling of iommu on these machines. It's the last instruction in
> the context image... Could we need to expand the context?

Fwiw, we say the maximum size for the Haswell context is 70270, which,
even expanding for prefetch, is well inside the next page boundary of
73728. Furthermore, there are no DMAR faults. The correlation may be a
timing artifact of the iommu indirection? Or just a mere coincidence.
-Chris
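
The page-boundary arithmetic in the reply above can be checked with a
small sketch. It assumes the two figures are byte counts and that pages
are 4096 bytes (73728 = 18 * 4096); the helper names and the
PAGE_SIZE_BYTES macro are made up for illustration and are not taken
from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; not a constant taken from the i915 driver. */
#define PAGE_SIZE_BYTES 4096u

/* Round size up to the next multiple of align (align must be a power of two). */
static uint32_t round_up_pow2(uint32_t size, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Bytes between the end of an object of this size and its page boundary. */
static uint32_t slack_to_page_end(uint32_t size)
{
	return round_up_pow2(size, PAGE_SIZE_BYTES) - size;
}
```

Under those assumptions, round_up_pow2(70270, PAGE_SIZE_BYTES) gives
73728 and slack_to_page_end(70270) gives 3458, so any prefetch past the
end of the context image would have to exceed 3458 bytes before it
crossed into the next page.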

Mika Kuoppala Sept. 18, 2019, 10:32 a.m. UTC | #3

Chris Wilson <chris@chris-wilson.co.uk> writes:

> A few times in CI, we have detected a GPU hang on our Haswell GT2
> systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
> was first introduced, it was applied to all Haswell, but later on we
> found an erratum that supposedly restricted the issue to GT1 and so
> constrained it to only be applied on GT1. That may have been a mistake...
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111692
> Fixes: 167bc759e823 ("drm/i915: Restrict PSMI context load w/a to Haswell GT1")
> References: 2c550183476d ("drm/i915: Disable PSMI sleep messages on all rings around context switches")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

I see no harm in extending the umbrella of disabling sleep,
so,

Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


> ---
>  drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> index a73296e6b13d..a25b84b12ef1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> @@ -1574,7 +1574,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>  	struct intel_engine_cs *engine = rq->engine;
>  	enum intel_engine_id id;
>  	const int num_engines =
> -		IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
> +		IS_HASWELL(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
>  	bool force_restore = false;
>  	int len;
>  	u32 *cs;
> -- 
> 2.23.0

diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index a73296e6b13d..a25b84b12ef1 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1574,7 +1574,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 	struct intel_engine_cs *engine = rq->engine;
 	enum intel_engine_id id;
 	const int num_engines =
-		IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
+		IS_HASWELL(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
 	bool force_restore = false;
 	int len;
 	u32 *cs;
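
To make the effect of the one-line change concrete, here is a minimal
standalone sketch of the selection logic before and after the patch. It
is not driver code: struct fake_device and both helper functions are
hypothetical stand-ins for IS_HSW_GT1(), IS_HASWELL() and
RUNTIME_INFO(i915)->num_engines, and it assumes that "num_engines - 1"
counts the engines other than the one performing the context switch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device information the driver consults. */
struct fake_device {
	bool is_haswell;
	int gt;          /* GT level: 1, 2 or 3 */
	int num_engines; /* total engines on the device */
};

/* Before the patch: the workaround covers other engines only on Haswell GT1. */
static int psmi_wa_engines_old(const struct fake_device *d)
{
	return (d->is_haswell && d->gt == 1) ? d->num_engines - 1 : 0;
}

/* After the patch: the workaround covers other engines on every Haswell. */
static int psmi_wa_engines_new(const struct fake_device *d)
{
	return d->is_haswell ? d->num_engines - 1 : 0;
}
```

For a hypothetical Haswell GT2 device with four engines, the old check
returns 0 (workaround skipped) while the new check returns 3, which is
the behavioural change the patch is after. Non-Haswell devices return 0
from both.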