Message ID | 20170626203051.28480-1-ville.syrjala@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Jun 26, 2017 at 11:30:51PM +0300, ville.syrjala@linux.intel.com wrote: > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > We have pretty clear evidence that MSIs are getting lost on g4x and > somehow the interrupt logic doesn't seem to recover from that state > even if we try hard to clear the IIR. > > Disabling IER around the normal IIR clearing in the irq handler isn't > sufficient to avoid this, so the problem really seems to be further > up the interrupt chain. This should guarantee that there's always > an edge if any IIR bits are set after the interrupt handler is done, > which should normally guarantee that the CPU interrupt is generated. > That approach seems to work perfectly on VLV/CHV, but apparently > not on g4x. > > MSI is documented to be broken on 965gm at least. The chipset spec > says MSI is defeatured because interrupts can be delayed or lost, > which fits well with what we're seeing on g4x. Previously we've > already disabled GMBUS interrupts on g4x because somehow GMBUS > manages to raise legacy interrupts even when MSI is enabled. > > Since there's such widespread MSI breakahge all over in the pre-gen5 > land let's just give up on MSI on these platforms. > > Seqno reporting might be negatively affected by this since the legcy > interrupts aren't guaranteed to be ordered with the seqno writes, > whereas MSI interrupts may be? But an occasioanlly missed seqno > seems like a small price to pay for generally working interrupts. > > Cc: stable@vger.kernel.org > Cc: Diego Viola <diego.viola@gmail.com> > Tested-by: Diego Viola <diego.viola@gmail.com> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101261 > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> One caveat was that iirc we've also seen lost irq without msi, which was originally the reason to enable this. But that was misremembering on my side, apparently msi was done purely to get our own irq and avoid cpu overhead with shared irq. See: commit ed4cb4142b242d8090d3811d5eb4abf6aa985bc8 Author: Eric Anholt <eric@anholt.net> Date: Tue Jul 29 12:10:39 2008 -0700 i915: Add support for MSI and interrupt mitigation. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> > --- > drivers/gpu/drm/i915/i915_drv.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index e9d8e9ee51b2..9167a73f3c69 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -1141,10 +1141,12 @@ static int i915_driver_init_hw(struct drm_i915_private *dev_priv) > * and the registers being closely associated. > * > * According to chipset errata, on the 965GM, MSI interrupts may > - * be lost or delayed, but we use them anyways to avoid > - * stuck interrupts on some machines. > + * be lost or delayed, and was defeatured. MSI interrupts seem to > + * get lost on g4x as well, and interrupt delivery seems to stay > + * properly dead afterwards. So we'll just disable them for all > + * pre-gen5 chipsets. > */ > - if (!IS_I945G(dev_priv) && !IS_I945GM(dev_priv)) { > + if (INTEL_GEN(dev_priv) >= 5) { > if (pci_enable_msi(pdev) < 0) > DRM_DEBUG_DRIVER("can't enable MSI"); > } > -- > 2.13.0 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Mon, Jun 26, 2017 at 08:49:56PM -0000, Patchwork wrote: > == Series Details == > > Series: drm/i915: Disable MSI for all pre-gen5 > URL : https://patchwork.freedesktop.org/series/26392/ > State : failure > > == Summary == > > Series 26392v1 drm/i915: Disable MSI for all pre-gen5 > https://patchwork.freedesktop.org/api/1.0/series/26392/revisions/1/mbox/ > > Test kms_flip: > Subgroup basic-flip-vs-wf_vblank: > fail -> PASS (fi-blb-e6850) fdo#101599 > pass -> FAIL (fi-skl-6770hq) (kms_flip:4376) CRITICAL: Test assertion failure function calibrate_ts, file kms_flip.c:1207: (kms_flip:4376) CRITICAL: Failed assertion: ev.sequence == last_seq + 1 (kms_flip:4376) CRITICAL: error: 230 != 229 Subtest basic-flip-vs-wf_vblank failed. Not related since the patch didn't touch gen5+. > > fdo#101599 https://bugs.freedesktop.org/show_bug.cgi?id=101599 > > fi-bdw-5557u total:279 pass:268 dwarn:0 dfail:0 fail:0 skip:11 time:438s > fi-bdw-gvtdvm total:279 pass:257 dwarn:8 dfail:0 fail:0 skip:14 time:425s > fi-blb-e6850 total:279 pass:224 dwarn:1 dfail:0 fail:0 skip:54 time:353s > fi-bsw-n3050 total:279 pass:242 dwarn:1 dfail:0 fail:0 skip:36 time:521s > fi-bxt-j4205 total:279 pass:260 dwarn:0 dfail:0 fail:0 skip:19 time:509s > fi-byt-j1900 total:279 pass:253 dwarn:2 dfail:0 fail:0 skip:24 time:489s > fi-byt-n2820 total:279 pass:249 dwarn:2 dfail:0 fail:0 skip:28 time:481s > fi-glk-2a total:279 pass:260 dwarn:0 dfail:0 fail:0 skip:19 time:597s > fi-hsw-4770 total:279 pass:263 dwarn:0 dfail:0 fail:0 skip:16 time:434s > fi-hsw-4770r total:279 pass:263 dwarn:0 dfail:0 fail:0 skip:16 time:413s > fi-ilk-650 total:279 pass:229 dwarn:0 dfail:0 fail:0 skip:50 time:417s > fi-ivb-3520m total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:491s > fi-ivb-3770 total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:472s > fi-kbl-7500u total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:469s > fi-kbl-7560u total:279 pass:268 dwarn:1 dfail:0 fail:0 skip:10 time:569s > fi-kbl-r total:279 pass:260 dwarn:1 dfail:0 fail:0 skip:18 time:583s > fi-pnv-d510 total:279 pass:221 dwarn:3 dfail:0 fail:0 skip:55 time:562s > fi-skl-6260u total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:464s > fi-skl-6700k total:279 pass:257 dwarn:4 dfail:0 fail:0 skip:18 time:468s > fi-skl-6770hq total:279 pass:268 dwarn:0 dfail:0 fail:1 skip:10 time:472s > fi-skl-gvtdvm total:279 pass:266 dwarn:0 dfail:0 fail:0 skip:13 time:431s > fi-snb-2520m total:279 pass:251 dwarn:0 dfail:0 fail:0 skip:28 time:544s > fi-snb-2600 total:279 pass:250 dwarn:0 dfail:0 fail:0 skip:29 time:400s > > 21d74e215ad650f0e8e30de609bd65601f0aa11d drm-tip: 2017y-06m-26d-09h-12m-14s UTC integration manifest > dfa05d7 drm/i915: Disable MSI for all pre-gen5 > > == Logs == > > For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_5046/
On Tue, Jun 27, 2017 at 11:18:33AM +0200, Daniel Vetter wrote: > On Mon, Jun 26, 2017 at 11:30:51PM +0300, ville.syrjala@linux.intel.com wrote: > > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > We have pretty clear evidence that MSIs are getting lost on g4x and > > somehow the interrupt logic doesn't seem to recover from that state > > even if we try hard to clear the IIR. > > > > Disabling IER around the normal IIR clearing in the irq handler isn't > > sufficient to avoid this, so the problem really seems to be further > > up the interrupt chain. This should guarantee that there's always > > an edge if any IIR bits are set after the interrupt handler is done, > > which should normally guarantee that the CPU interrupt is generated. > > That approach seems to work perfectly on VLV/CHV, but apparently > > not on g4x. > > > > MSI is documented to be broken on 965gm at least. The chipset spec > > says MSI is defeatured because interrupts can be delayed or lost, > > which fits well with what we're seeing on g4x. Previously we've > > already disabled GMBUS interrupts on g4x because somehow GMBUS > > manages to raise legacy interrupts even when MSI is enabled. > > > > Since there's such widespread MSI breakahge all over in the pre-gen5 > > land let's just give up on MSI on these platforms. > > > > Seqno reporting might be negatively affected by this since the legcy > > interrupts aren't guaranteed to be ordered with the seqno writes, > > whereas MSI interrupts may be? But an occasioanlly missed seqno > > seems like a small price to pay for generally working interrupts. > > > > Cc: stable@vger.kernel.org > > Cc: Diego Viola <diego.viola@gmail.com> > > Tested-by: Diego Viola <diego.viola@gmail.com> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101261 > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > > One caveat was that iirc we've also seen lost irq without msi, which was > originally the reason to enable this. But that was misremembering on my > side, apparently msi was done purely to get our own irq and avoid cpu > overhead with shared irq. See: > > commit ed4cb4142b242d8090d3811d5eb4abf6aa985bc8 > Author: Eric Anholt <eric@anholt.net> > Date: Tue Jul 29 12:10:39 2008 -0700 > > i915: Add support for MSI and interrupt mitigation. > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Thanks for double checking the old lore. Pushed to dinq. > > --- > > drivers/gpu/drm/i915/i915_drv.c | 8 +++++--- > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > > index e9d8e9ee51b2..9167a73f3c69 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.c > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > @@ -1141,10 +1141,12 @@ static int i915_driver_init_hw(struct drm_i915_private *dev_priv) > > * and the registers being closely associated. > > * > > * According to chipset errata, on the 965GM, MSI interrupts may > > - * be lost or delayed, but we use them anyways to avoid > > - * stuck interrupts on some machines. > > + * be lost or delayed, and was defeatured. MSI interrupts seem to > > + * get lost on g4x as well, and interrupt delivery seems to stay > > + * properly dead afterwards. So we'll just disable them for all > > + * pre-gen5 chipsets. > > */ > > - if (!IS_I945G(dev_priv) && !IS_I945GM(dev_priv)) { > > + if (INTEL_GEN(dev_priv) >= 5) { > > if (pci_enable_msi(pdev) < 0) > > DRM_DEBUG_DRIVER("can't enable MSI"); > > } > > -- > > 2.13.0 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
Quoting ville.syrjala@linux.intel.com (2017-06-26 21:30:51) > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > We have pretty clear evidence that MSIs are getting lost on g4x and > somehow the interrupt logic doesn't seem to recover from that state > even if we try hard to clear the IIR. > > Disabling IER around the normal IIR clearing in the irq handler isn't > sufficient to avoid this, so the problem really seems to be further > up the interrupt chain. This should guarantee that there's always > an edge if any IIR bits are set after the interrupt handler is done, > which should normally guarantee that the CPU interrupt is generated. > That approach seems to work perfectly on VLV/CHV, but apparently > not on g4x. > > MSI is documented to be broken on 965gm at least. The chipset spec > says MSI is defeatured because interrupts can be delayed or lost, > which fits well with what we're seeing on g4x. Previously we've > already disabled GMBUS interrupts on g4x because somehow GMBUS > manages to raise legacy interrupts even when MSI is enabled. > > Since there's such widespread MSI breakahge all over in the pre-gen5 > land let's just give up on MSI on these platforms. An unmentioned sibling caught by this change is g33. It is very unlikely to have been fixed in g33 but broken in g4x, still deserves a mention as old gen3 used MSI and g33 tried to tame the beast. > Seqno reporting might be negatively affected by this since the legcy > interrupts aren't guaranteed to be ordered with the seqno writes, > whereas MSI interrupts may be? But an occasioanlly missed seqno > seems like a small price to pay for generally working interrupts. > > Cc: stable@vger.kernel.org > Cc: Diego Viola <diego.viola@gmail.com> > Tested-by: Diego Viola <diego.viola@gmail.com> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101261 > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index e9d8e9ee51b2..9167a73f3c69 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1141,10 +1141,12 @@ static int i915_driver_init_hw(struct drm_i915_private *dev_priv) * and the registers being closely associated. * * According to chipset errata, on the 965GM, MSI interrupts may - * be lost or delayed, but we use them anyways to avoid - * stuck interrupts on some machines. + * be lost or delayed, and was defeatured. MSI interrupts seem to + * get lost on g4x as well, and interrupt delivery seems to stay + * properly dead afterwards. So we'll just disable them for all + * pre-gen5 chipsets. */ - if (!IS_I945G(dev_priv) && !IS_I945GM(dev_priv)) { + if (INTEL_GEN(dev_priv) >= 5) { if (pci_enable_msi(pdev) < 0) DRM_DEBUG_DRIVER("can't enable MSI"); }