
drm/i915: Extend Haswell GT1 PSMI workaround to all

Message ID 20190917194746.26710-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Series drm/i915: Extend Haswell GT1 PSMI workaround to all

Commit Message

Chris Wilson Sept. 17, 2019, 7:47 p.m. UTC
A few times in CI, we have detected a GPU hang on our Haswell GT2
systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
was first introduced, it was applied to all Haswell, but later on we
found an erratum that supposedly restricted the issue to GT1, and so
constrained it to be applied only on GT1. That may have been a mistake...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111692
Fixes: 167bc759e823 ("drm/i915: Restrict PSMI context load w/a to Haswell GT1")
References: 2c550183476d ("drm/i915: Disable PSMI sleep messages on all rings around context switches")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Chris Wilson Sept. 17, 2019, 8:23 p.m. UTC | #1
Quoting Chris Wilson (2019-09-17 20:47:46)
> A few times in CI, we have detected a GPU hang on our Haswell GT2
> systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
> was first introduced, it was applied to all Haswell, but later on we
> found an erratum that supposedly restricted the issue to GT1, and so
> constrained it to be applied only on GT1. That may have been a mistake...

Something else to bear in mind about why this is showing up now is the
enabling of the IOMMU on these machines. It's the last instruction in
the context image... Might we need to expand the context?
-Chris
Chris Wilson Sept. 17, 2019, 8:26 p.m. UTC | #2
Quoting Chris Wilson (2019-09-17 21:23:01)
> Quoting Chris Wilson (2019-09-17 20:47:46)
> > A few times in CI, we have detected a GPU hang on our Haswell GT2
> > systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
> > was first introduced, it was applied to all Haswell, but later on we
> > found an erratum that supposedly restricted the issue to GT1, and so
> > constrained it to be applied only on GT1. That may have been a mistake...
> 
> Something else to bear in mind about why this is showing up now is the
> enabling of the IOMMU on these machines. It's the last instruction in
> the context image... Might we need to expand the context?

FWIW, we say the maximum size of the Haswell context is 70270 bytes, which,
even expanded for prefetch, is well inside the next page boundary of
73728. Furthermore, there are no DMAR faults. Perhaps the coincidence is a
timing artifact of the IOMMU indirection? Or it is just a mere coincidence.
-Chris
Mika Kuoppala Sept. 18, 2019, 10:32 a.m. UTC | #3
Chris Wilson <chris@chris-wilson.co.uk> writes:

> A few times in CI, we have detected a GPU hang on our Haswell GT2
> systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
> was first introduced, it was applied to all Haswell, but later on we
> found an erratum that supposedly restricted the issue to GT1, and so
> constrained it to be applied only on GT1. That may have been a mistake...
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111692
> Fixes: 167bc759e823 ("drm/i915: Restrict PSMI context load w/a to Haswell GT1")
> References: 2c550183476d ("drm/i915: Disable PSMI sleep messages on all rings around context switches")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

I see no harm in extending the sleep-disabling umbrella,
so

Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


> ---
>  drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> index a73296e6b13d..a25b84b12ef1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> @@ -1574,7 +1574,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>  	struct intel_engine_cs *engine = rq->engine;
>  	enum intel_engine_id id;
>  	const int num_engines =
> -		IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
> +		IS_HASWELL(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
>  	bool force_restore = false;
>  	int len;
>  	u32 *cs;
> -- 
> 2.23.0

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index a73296e6b13d..a25b84b12ef1 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1574,7 +1574,7 @@  static inline int mi_set_context(struct i915_request *rq, u32 flags)
 	struct intel_engine_cs *engine = rq->engine;
 	enum intel_engine_id id;
 	const int num_engines =
-		IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
+		IS_HASWELL(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
 	bool force_restore = false;
 	int len;
 	u32 *cs;