Message ID | 1492082127-29007-1-git-send-email-mika.kuoppala@intel.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Apr 13, 2017 at 02:15:27PM +0300, Mika Kuoppala wrote:
> Previously with commit a9c1f90c8e17
> ("drm/i915: Don't mask EI UP interrupt on IVB|SNB") certain,
> seemingly unrelated bit (GEN6_PM_RP_UP_EI_EXPIRED) was needed
> to be unmasked for IVB and SNB in order to prevent system hang
> with chained batchbuffers.
>
> Our CI was seeing incomplete results with tests that used
> chained batches and it was found out that HSW needs to have this
> same bit unmasked to reliably survive chained batches.
>
> Always unmask GEN6_PM_RP_UP_EI_EXPIRED on Haswell to
> prevent system hang with batch chaining.
>
> Testcase: igt/gem_exec_fence/nb-await-default
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100672
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

* facepalm.

I am amazed that took so long for us to notice.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>

Did we ever get a w/a identifier for this?
-Chris

On Thu, Apr 13, 2017 at 11:39:40AM -0000, Patchwork wrote:
> == Series Details ==
>
> Series: drm/i915: Fix system hang with EI UP masked on Haswell
> URL : https://patchwork.freedesktop.org/series/22991/
> State : failure
>
> == Summary ==
>
> Series 22991v1 drm/i915: Fix system hang with EI UP masked on Haswell
> https://patchwork.freedesktop.org/api/1.0/series/22991/revisions/1/mbox/
>
> Test gem_exec_flush:
>   Subgroup basic-batch-kernel-default-uc:
>     pass -> FAIL (fi-snb-2600) fdo#100007
> Test gem_exec_suspend:
>   Subgroup basic-s4-devices:
>     pass -> DMESG-WARN (fi-kbl-7560u) fdo#100125
> Test kms_cursor_legacy:
>   Subgroup basic-flip-before-cursor-varying-size:
>     pass -> INCOMPLETE (fi-bxt-t5700)
>
> fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
> fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125

It passes the irony test ;)
-Chris

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Thu, Apr 13, 2017 at 02:15:27PM +0300, Mika Kuoppala wrote:
>> Previously with commit a9c1f90c8e17
>> ("drm/i915: Don't mask EI UP interrupt on IVB|SNB") certain,
>> seemingly unrelated bit (GEN6_PM_RP_UP_EI_EXPIRED) was needed
>> to be unmasked for IVB and SNB in order to prevent system hang
>> with chained batchbuffers.
>>
>> Our CI was seeing incomplete results with tests that used
>> chained batches and it was found out that HSW needs to have this
>> same bit unmasked to reliably survive chained batches.
>>
>> Always unmask GEN6_PM_RP_UP_EI_EXPIRED on Haswell to
>> prevent system hang with batch chaining.
>>
>> Testcase: igt/gem_exec_fence/nb-await-default
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100672
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>
> * facepalm.
>
> I am amazed that took so long for us to notice.

It could be that we don't chain batches that much in CI. It also seems
to be more subtle than on IVB. With a spin batch it didn't surface,
but with nb-await-default the store/spin and possibly(?) the CPU-side
sleep lured it out.

> Acked-by: Chris Wilson <chris@chris-wilson.co.uk>

Thanks.

>
> Did we ever get a w/a identifier for this?

Not that I know of. And in retrospect, excluding HSW in the original
patch was not wise. It was v3 where it was excluded, but I didn't find
the trail that led there. Trusting it not to inherit the peculiarities...
I like to think that we tested and it never hung with straight-up busy
chaining. nb-await-default is more sophisticated.

-Mika

Patchwork <patchwork@emeril.freedesktop.org> writes:

> == Series Details ==
>
> Series: drm/i915: Fix system hang with EI UP masked on Haswell
> URL : https://patchwork.freedesktop.org/series/22991/
> State : failure
>
> == Summary ==
>
> Series 22991v1 drm/i915: Fix system hang with EI UP masked on Haswell
> https://patchwork.freedesktop.org/api/1.0/series/22991/revisions/1/mbox/
>
> Test gem_exec_flush:
>   Subgroup basic-batch-kernel-default-uc:
>     pass -> FAIL (fi-snb-2600) fdo#100007
> Test gem_exec_suspend:
>   Subgroup basic-s4-devices:
>     pass -> DMESG-WARN (fi-kbl-7560u) fdo#100125
> Test kms_cursor_legacy:
>   Subgroup basic-flip-before-cursor-varying-size:
>     pass -> INCOMPLETE (fi-bxt-t5700)

The patch does not affect BXT, so this has to be a bug in drm-tip:
https://bugs.freedesktop.org/show_bug.cgi?id=100706

-Mika

>
> fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
> fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125
>
> fi-bdw-5557u    total:278 pass:267 dwarn:0 dfail:0 fail:0 skip:11 time:434s
> fi-bdw-gvtdvm   total:278 pass:256 dwarn:8 dfail:0 fail:0 skip:14 time:429s
> fi-bsw-n3050    total:278 pass:242 dwarn:0 dfail:0 fail:0 skip:36 time:580s
> fi-bxt-j4205    total:278 pass:259 dwarn:0 dfail:0 fail:0 skip:19 time:507s
> fi-bxt-t5700    total:206 pass:192 dwarn:0 dfail:0 fail:0 skip:13
> fi-byt-j1900    total:278 pass:254 dwarn:0 dfail:0 fail:0 skip:24 time:487s
> fi-byt-n2820    total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time:482s
> fi-hsw-4770     total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time:414s
> fi-hsw-4770r    total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time:405s
> fi-ilk-650      total:278 pass:228 dwarn:0 dfail:0 fail:0 skip:50 time:418s
> fi-ivb-3520m    total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:488s
> fi-ivb-3770     total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:463s
> fi-kbl-7500u    total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time:455s
> fi-kbl-7560u    total:278 pass:267 dwarn:1 dfail:0 fail:0 skip:10 time:565s
> fi-skl-6260u    total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time:459s
> fi-skl-6700hq   total:278 pass:261 dwarn:0 dfail:0 fail:0 skip:17 time:574s
> fi-skl-6700k    total:278 pass:256 dwarn:4 dfail:0 fail:0 skip:18 time:460s
> fi-skl-6770hq   total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time:491s
> fi-skl-gvtdvm   total:278 pass:265 dwarn:0 dfail:0 fail:0 skip:13 time:441s
> fi-snb-2520m    total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time:533s
> fi-snb-2600     total:278 pass:248 dwarn:0 dfail:0 fail:1 skip:29 time:402s
>
> 6184edce6665aee9c9131149a7b9314a1313eaf9 drm-tip: 2017y-04m-13d-08h-27m-10s UTC integration manifest
> aee691a drm/i915: Fix system hang with EI UP masked on Haswell
>
> == Logs ==
>
> For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4501/

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Thu, Apr 13, 2017 at 02:15:27PM +0300, Mika Kuoppala wrote:
>> Previously with commit a9c1f90c8e17
>> ("drm/i915: Don't mask EI UP interrupt on IVB|SNB") certain,
>> seemingly unrelated bit (GEN6_PM_RP_UP_EI_EXPIRED) was needed
>> to be unmasked for IVB and SNB in order to prevent system hang
>> with chained batchbuffers.
>>
>> Our CI was seeing incomplete results with tests that used
>> chained batches and it was found out that HSW needs to have this
>> same bit unmasked to reliably survive chained batches.
>>
>> Always unmask GEN6_PM_RP_UP_EI_EXPIRED on Haswell to
>> prevent system hang with batch chaining.
>>
>> Testcase: igt/gem_exec_fence/nb-await-default
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100672
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>
> * facepalm.
>
> I am amazed that took so long for us to notice.
> Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
>

Pushed to drm-intel-next-queued. Thanks.
-Mika

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index d9d1969..fd97fe0 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -4252,12 +4252,12 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 	dev_priv->rps.pm_intrmsk_mbz = 0;
 
 	/*
-	 * SNB,IVB can while VLV,CHV may hard hang on looping batchbuffer
+	 * SNB,IVB,HSW can while VLV,CHV may hard hang on looping batchbuffer
 	 * if GEN6_PM_UP_EI_EXPIRED is masked.
 	 *
 	 * TODO: verify if this can be reproduced on VLV,CHV.
 	 */
-	if (INTEL_INFO(dev_priv)->gen <= 7 && !IS_HASWELL(dev_priv))
+	if (INTEL_INFO(dev_priv)->gen <= 7)
 		dev_priv->rps.pm_intrmsk_mbz |= GEN6_PM_RP_UP_EI_EXPIRED;
 
 	if (INTEL_INFO(dev_priv)->gen >= 8)

Previously with commit a9c1f90c8e17
("drm/i915: Don't mask EI UP interrupt on IVB|SNB") certain,
seemingly unrelated bit (GEN6_PM_RP_UP_EI_EXPIRED) was needed
to be unmasked for IVB and SNB in order to prevent system hang
with chained batchbuffers.

Our CI was seeing incomplete results with tests that used
chained batches and it was found out that HSW needs to have this
same bit unmasked to reliably survive chained batches.

Always unmask GEN6_PM_RP_UP_EI_EXPIRED on Haswell to
prevent system hang with batch chaining.

Testcase: igt/gem_exec_fence/nb-await-default
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100672
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)