Message ID | 20190917194746.26710-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915: Extend Haswell GT1 PSMI workaround to all |
Quoting Chris Wilson (2019-09-17 20:47:46)
> A few times in CI, we have detected a GPU hang on our Haswell GT2
> systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
> was first introduced, it was applied to all Haswell, but later on we
> found an erratum that supposedly restricted the issue to GT1 and so
> constrained it to only be applied on GT1. That may have been a mistake...

Something else to bear in mind about why this is showing up now is the
enabling of iommu on these machines.

It's the last instruction in the context image... Could we need to
expand the context?
-Chris
Quoting Chris Wilson (2019-09-17 21:23:01)
> Quoting Chris Wilson (2019-09-17 20:47:46)
> > A few times in CI, we have detected a GPU hang on our Haswell GT2
> > systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
> > was first introduced, it was applied to all Haswell, but later on we
> > found an erratum that supposedly restricted the issue to GT1 and so
> > constrained it to only be applied on GT1. That may have been a mistake...
>
> Something else to bear in mind about why this is showing up now is the
> enabling of iommu on these machines. It's the last instruction in
> the context image... Could we need to expand the context?

Fwiw, we say the maximum size for the Haswell context is 70270 bytes,
which, even allowing for prefetch, is well inside the next page boundary
at 73728. Furthermore, no DMAR faults.

The coincidence may be a timing artifact of the iommu indirection? Or
just a mere coincidence.
-Chris
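The page-boundary arithmetic in the reply above can be checked with a minimal sketch. It assumes 4 KiB pages (inferred from 73728 = 18 * 4096) and that both figures are byte counts; the helper names are hypothetical and are not part of i915.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, consistent with 73728 = 18 * 4096. */
#define SKETCH_PAGE_SIZE 4096u

/* Round a size up to the next multiple of SKETCH_PAGE_SIZE. */
static inline size_t sketch_page_align_up(size_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) & ~((size_t)SKETCH_PAGE_SIZE - 1);
}

/* Bytes between the end of an image of the given size and the next page boundary. */
static inline size_t sketch_slack_to_boundary(size_t size)
{
	return sketch_page_align_up(size) - size;
}
```

With the quoted maximum of 70270, `sketch_page_align_up()` gives 73728 and `sketch_slack_to_boundary()` gives 3458, which is the headroom available before a prefetch would cross into the next page.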
Chris Wilson <chris@chris-wilson.co.uk> writes:

> A few times in CI, we have detected a GPU hang on our Haswell GT2
> systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
> was first introduced, it was applied to all Haswell, but later on we
> found an erratum that supposedly restricted the issue to GT1 and so
> constrained it to only be applied on GT1. That may have been a mistake...
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111692
> Fixes: 167bc759e823 ("drm/i915: Restrict PSMI context load w/a to Haswell GT1")
> References: 2c550183476d ("drm/i915: Disable PSMI sleep messages on all rings around context switches")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

I see no harm in extending the sleep-disabling umbrella, so

Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> index a73296e6b13d..a25b84b12ef1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> @@ -1574,7 +1574,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>  	struct intel_engine_cs *engine = rq->engine;
>  	enum intel_engine_id id;
>  	const int num_engines =
> -		IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
> +		IS_HASWELL(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
>  	bool force_restore = false;
>  	int len;
>  	u32 *cs;
> --
> 2.23.0
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index a73296e6b13d..a25b84b12ef1 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1574,7 +1574,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 	struct intel_engine_cs *engine = rq->engine;
 	enum intel_engine_id id;
 	const int num_engines =
-		IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
+		IS_HASWELL(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
 	bool force_restore = false;
 	int len;
 	u32 *cs;
A few times in CI, we have detected a GPU hang on our Haswell GT2
systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
was first introduced, it was applied to all Haswell, but later on we
found an erratum that supposedly restricted the issue to GT1 and so
constrained it to only be applied on GT1. That may have been a mistake...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111692
Fixes: 167bc759e823 ("drm/i915: Restrict PSMI context load w/a to Haswell GT1")
References: 2c550183476d ("drm/i915: Disable PSMI sleep messages on all rings around context switches")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
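To make the effect of the one-line change concrete, here is a minimal standalone sketch of the `num_engines` selection before and after the patch. The struct and function names are hypothetical stand-ins for `IS_HSW_GT1()`, `IS_HASWELL()` and `RUNTIME_INFO(i915)->num_engines`, not driver code, and the engine count in the usage note is an assumed value. The `- 1` presumably excludes the engine performing the context switch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device properties the driver consults. */
struct sketch_device {
	bool is_haswell;   /* models IS_HASWELL(i915) */
	bool is_hsw_gt1;   /* models IS_HSW_GT1(i915) */
	int num_engines;   /* models RUNTIME_INFO(i915)->num_engines */
};

/* Before the patch: the workaround covers other engines only on Haswell GT1. */
static inline int sketch_psmi_engines_before(const struct sketch_device *dev)
{
	return dev->is_hsw_gt1 ? dev->num_engines - 1 : 0;
}

/* After the patch: the workaround covers other engines on every Haswell. */
static inline int sketch_psmi_engines_after(const struct sketch_device *dev)
{
	return dev->is_haswell ? dev->num_engines - 1 : 0;
}
```

For a Haswell GT2 device modelled with four engines, the old expression yields 0 (workaround skipped) and the new one yields 3; non-Haswell devices yield 0 either way.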