Message ID | 20180517154726.686-2-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Chris Wilson (2018-05-17 16:47:26) > Inside the live_hangcheck (reset) selftests, we occasionally see > failures like > > <7>[ 239.094840] i915_gem_set_wedged rcs0 > <7>[ 239.094843] i915_gem_set_wedged current seqno 19a98, last 19a9a, hangcheck 0 [5158 ms] > <7>[ 239.094846] i915_gem_set_wedged Reset count: 6239 (global 1) > <7>[ 239.094848] i915_gem_set_wedged Requests: > <7>[ 239.095052] i915_gem_set_wedged first 19a99 [e8c:5f] prio=1024 @ 5159ms: (null) > <7>[ 239.095056] i915_gem_set_wedged last 19a9a [e81:1a] prio=139 @ 5159ms: igt/rcs0[5977]/1 > <7>[ 239.095059] i915_gem_set_wedged active 19a99 [e8c:5f] prio=1024 @ 5159ms: (null) > <7>[ 239.095062] i915_gem_set_wedged [head 0220, postfix 0280, tail 02a8, batch 0xffffffff_ffffffff] > <7>[ 239.100050] i915_gem_set_wedged ring->start: 0x00283000 > <7>[ 239.100053] i915_gem_set_wedged ring->head: 0x000001f8 > <7>[ 239.100055] i915_gem_set_wedged ring->tail: 0x000002a8 > <7>[ 239.100057] i915_gem_set_wedged ring->emit: 0x000002a8 > <7>[ 239.100059] i915_gem_set_wedged ring->space: 0x00000f10 > <7>[ 239.100085] i915_gem_set_wedged RING_START: 0x00283000 > <7>[ 239.100088] i915_gem_set_wedged RING_HEAD: 0x00000260 > <7>[ 239.100091] i915_gem_set_wedged RING_TAIL: 0x000002a8 > <7>[ 239.100094] i915_gem_set_wedged RING_CTL: 0x00000001 > <7>[ 239.100097] i915_gem_set_wedged RING_MODE: 0x00000300 [idle] > <7>[ 239.100100] i915_gem_set_wedged RING_IMR: fffffefe > <7>[ 239.100104] i915_gem_set_wedged ACTHD: 0x00000000_0000609c > <7>[ 239.100108] i915_gem_set_wedged BBADDR: 0x00000000_0000609d > <7>[ 239.100111] i915_gem_set_wedged DMA_FADDR: 0x00000000_00283260 > <7>[ 239.100114] i915_gem_set_wedged IPEIR: 0x00000000 > <7>[ 239.100117] i915_gem_set_wedged IPEHR: 0x02800000 > <7>[ 239.100120] i915_gem_set_wedged Execlist status: 0x00044052 00000002 > <7>[ 239.100124] i915_gem_set_wedged Execlist CSB read 5 [5 cached], write 5 [5 from hws], interrupt posted? no, tasklet queued? no (enabled) > <7>[ 239.100128] i915_gem_set_wedged ELSP[0] count=1, ring->start=00283000, rq: 19a99 [e8c:5f] prio=1024 @ 5164ms: (null) > <7>[ 239.100132] i915_gem_set_wedged ELSP[1] count=1, ring->start=00257000, rq: 19a9a [e81:1a] prio=139 @ 5164ms: igt/rcs0[5977]/1 > <7>[ 239.100135] i915_gem_set_wedged HW active? 0x5 > <7>[ 239.100250] i915_gem_set_wedged E 19a99 [e8c:5f] prio=1024 @ 5164ms: (null) > <7>[ 239.100338] i915_gem_set_wedged E 19a9a [e81:1a] prio=139 @ 5164ms: igt/rcs0[5977]/1 > <7>[ 239.100340] i915_gem_set_wedged Queue priority: 139 > <7>[ 239.100343] i915_gem_set_wedged Q 0 [e98:19] prio=132 @ 5164ms: igt/rcs0[5977]/8 > <7>[ 239.100346] i915_gem_set_wedged Q 0 [e84:19] prio=121 @ 5165ms: igt/rcs0[5977]/2 > <7>[ 239.100349] i915_gem_set_wedged Q 0 [e87:19] prio=82 @ 5165ms: igt/rcs0[5977]/3 > <7>[ 239.100352] i915_gem_set_wedged Q 0 [e84:1a] prio=44 @ 5164ms: igt/rcs0[5977]/2 > <7>[ 239.100356] i915_gem_set_wedged Q 0 [e8b:19] prio=20 @ 5165ms: igt/rcs0[5977]/4 > <7>[ 239.100362] i915_gem_set_wedged drv_selftest [5894] waiting for 19a99 > > where the GPU saw an arbitration point and idles; AND HAS NOT BEEN RESET! > The RING_MODE indicates that is idle and has the STOP_RING bit set, so > try clearing it. > > v2: Only clear the bit on restarting the ring, as we want to be sure the > STOP_RING bit is kept if reset fails on wedging. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> 2/2 passes, it might not just be a coincidence! Please kindly review, -Chris > --- > drivers/gpu/drm/i915/intel_lrc.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 646ecf267411..211585187d2f 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -1773,6 +1773,9 @@ static void enable_execlists(struct intel_engine_cs *engine) > I915_WRITE(RING_MODE_GEN7(engine), > _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE)); > > + I915_WRITE(RING_MI_MODE(engine->mmio_base), > + _MASKED_BIT_DISABLE(STOP_RING)); > + > I915_WRITE(RING_HWS_PGA(engine->mmio_base), > engine->status_page.ggtt_offset); > POSTING_READ(RING_HWS_PGA(engine->mmio_base)); > -- > 2.17.0 >
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 646ecf267411..211585187d2f 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1773,6 +1773,9 @@ static void enable_execlists(struct intel_engine_cs *engine) I915_WRITE(RING_MODE_GEN7(engine), _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE)); + I915_WRITE(RING_MI_MODE(engine->mmio_base), + _MASKED_BIT_DISABLE(STOP_RING)); + I915_WRITE(RING_HWS_PGA(engine->mmio_base), engine->status_page.ggtt_offset); POSTING_READ(RING_HWS_PGA(engine->mmio_base));
Inside the live_hangcheck (reset) selftests, we occasionally see failures like <7>[ 239.094840] i915_gem_set_wedged rcs0 <7>[ 239.094843] i915_gem_set_wedged current seqno 19a98, last 19a9a, hangcheck 0 [5158 ms] <7>[ 239.094846] i915_gem_set_wedged Reset count: 6239 (global 1) <7>[ 239.094848] i915_gem_set_wedged Requests: <7>[ 239.095052] i915_gem_set_wedged first 19a99 [e8c:5f] prio=1024 @ 5159ms: (null) <7>[ 239.095056] i915_gem_set_wedged last 19a9a [e81:1a] prio=139 @ 5159ms: igt/rcs0[5977]/1 <7>[ 239.095059] i915_gem_set_wedged active 19a99 [e8c:5f] prio=1024 @ 5159ms: (null) <7>[ 239.095062] i915_gem_set_wedged [head 0220, postfix 0280, tail 02a8, batch 0xffffffff_ffffffff] <7>[ 239.100050] i915_gem_set_wedged ring->start: 0x00283000 <7>[ 239.100053] i915_gem_set_wedged ring->head: 0x000001f8 <7>[ 239.100055] i915_gem_set_wedged ring->tail: 0x000002a8 <7>[ 239.100057] i915_gem_set_wedged ring->emit: 0x000002a8 <7>[ 239.100059] i915_gem_set_wedged ring->space: 0x00000f10 <7>[ 239.100085] i915_gem_set_wedged RING_START: 0x00283000 <7>[ 239.100088] i915_gem_set_wedged RING_HEAD: 0x00000260 <7>[ 239.100091] i915_gem_set_wedged RING_TAIL: 0x000002a8 <7>[ 239.100094] i915_gem_set_wedged RING_CTL: 0x00000001 <7>[ 239.100097] i915_gem_set_wedged RING_MODE: 0x00000300 [idle] <7>[ 239.100100] i915_gem_set_wedged RING_IMR: fffffefe <7>[ 239.100104] i915_gem_set_wedged ACTHD: 0x00000000_0000609c <7>[ 239.100108] i915_gem_set_wedged BBADDR: 0x00000000_0000609d <7>[ 239.100111] i915_gem_set_wedged DMA_FADDR: 0x00000000_00283260 <7>[ 239.100114] i915_gem_set_wedged IPEIR: 0x00000000 <7>[ 239.100117] i915_gem_set_wedged IPEHR: 0x02800000 <7>[ 239.100120] i915_gem_set_wedged Execlist status: 0x00044052 00000002 <7>[ 239.100124] i915_gem_set_wedged Execlist CSB read 5 [5 cached], write 5 [5 from hws], interrupt posted? no, tasklet queued? no (enabled) <7>[ 239.100128] i915_gem_set_wedged ELSP[0] count=1, ring->start=00283000, rq: 19a99 [e8c:5f] prio=1024 @ 5164ms: (null) <7>[ 239.100132] i915_gem_set_wedged ELSP[1] count=1, ring->start=00257000, rq: 19a9a [e81:1a] prio=139 @ 5164ms: igt/rcs0[5977]/1 <7>[ 239.100135] i915_gem_set_wedged HW active? 0x5 <7>[ 239.100250] i915_gem_set_wedged E 19a99 [e8c:5f] prio=1024 @ 5164ms: (null) <7>[ 239.100338] i915_gem_set_wedged E 19a9a [e81:1a] prio=139 @ 5164ms: igt/rcs0[5977]/1 <7>[ 239.100340] i915_gem_set_wedged Queue priority: 139 <7>[ 239.100343] i915_gem_set_wedged Q 0 [e98:19] prio=132 @ 5164ms: igt/rcs0[5977]/8 <7>[ 239.100346] i915_gem_set_wedged Q 0 [e84:19] prio=121 @ 5165ms: igt/rcs0[5977]/2 <7>[ 239.100349] i915_gem_set_wedged Q 0 [e87:19] prio=82 @ 5165ms: igt/rcs0[5977]/3 <7>[ 239.100352] i915_gem_set_wedged Q 0 [e84:1a] prio=44 @ 5164ms: igt/rcs0[5977]/2 <7>[ 239.100356] i915_gem_set_wedged Q 0 [e8b:19] prio=20 @ 5165ms: igt/rcs0[5977]/4 <7>[ 239.100362] i915_gem_set_wedged drv_selftest [5894] waiting for 19a99 where the GPU saw an arbitration point and idles; AND HAS NOT BEEN RESET! The RING_MODE indicates that is idle and has the STOP_RING bit set, so try clearing it. v2: Only clear the bit on restarting the ring, as we want to be sure the STOP_RING bit is kept if reset fails on wedging. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/intel_lrc.c | 3 +++ 1 file changed, 3 insertions(+)