diff mbox

drm/i915: Disable MSI for all pre-gen5

Message ID 20170626203051.28480-1-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ville Syrjälä June 26, 2017, 8:30 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

We have pretty clear evidence that MSIs are getting lost on g4x and
somehow the interrupt logic doesn't seem to recover from that state
even if we try hard to clear the IIR.

Disabling IER around the normal IIR clearing in the irq handler isn't
sufficient to avoid this, so the problem really seems to be further
up the interrupt chain. This should guarantee that there's always
an edge if any IIR bits are set after the interrupt handler is done,
which should normally guarantee that the CPU interrupt is generated.
That approach seems to work perfectly on VLV/CHV, but apparently
not on g4x.

MSI is documented to be broken on 965gm at least. The chipset spec
says MSI is defeatured because interrupts can be delayed or lost,
which fits well with what we're seeing on g4x. Previously we've
already disabled GMBUS interrupts on g4x because somehow GMBUS
manages to raise legacy interrupts even when MSI is enabled.

Since there's such widespread MSI breakahge all over in the pre-gen5
land let's just give up on MSI on these platforms.

Seqno reporting might be negatively affected by this since the legcy
interrupts aren't guaranteed to be ordered with the seqno writes,
whereas MSI interrupts may be? But an occasioanlly missed seqno
seems like a small price to pay for generally working interrupts.

Cc: stable@vger.kernel.org
Cc: Diego Viola <diego.viola@gmail.com>
Tested-by: Diego Viola <diego.viola@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101261
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Daniel Vetter June 27, 2017, 9:18 a.m. UTC | #1
On Mon, Jun 26, 2017 at 11:30:51PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> We have pretty clear evidence that MSIs are getting lost on g4x and
> somehow the interrupt logic doesn't seem to recover from that state
> even if we try hard to clear the IIR.
> 
> Disabling IER around the normal IIR clearing in the irq handler isn't
> sufficient to avoid this, so the problem really seems to be further
> up the interrupt chain. This should guarantee that there's always
> an edge if any IIR bits are set after the interrupt handler is done,
> which should normally guarantee that the CPU interrupt is generated.
> That approach seems to work perfectly on VLV/CHV, but apparently
> not on g4x.
> 
> MSI is documented to be broken on 965gm at least. The chipset spec
> says MSI is defeatured because interrupts can be delayed or lost,
> which fits well with what we're seeing on g4x. Previously we've
> already disabled GMBUS interrupts on g4x because somehow GMBUS
> manages to raise legacy interrupts even when MSI is enabled.
> 
> Since there's such widespread MSI breakahge all over in the pre-gen5
> land let's just give up on MSI on these platforms.
> 
> Seqno reporting might be negatively affected by this since the legcy
> interrupts aren't guaranteed to be ordered with the seqno writes,
> whereas MSI interrupts may be? But an occasioanlly missed seqno
> seems like a small price to pay for generally working interrupts.
> 
> Cc: stable@vger.kernel.org
> Cc: Diego Viola <diego.viola@gmail.com>
> Tested-by: Diego Viola <diego.viola@gmail.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101261
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

One caveat was that iirc we've also seen lost irq without msi, which was
originally the reason to enable this. But that was misremembering on my
side, apparently msi was done purely to get our own irq and avoid cpu
overhead with shared irq. See:

commit ed4cb4142b242d8090d3811d5eb4abf6aa985bc8
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Jul 29 12:10:39 2008 -0700

    i915: Add support for MSI and interrupt mitigation.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index e9d8e9ee51b2..9167a73f3c69 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1141,10 +1141,12 @@ static int i915_driver_init_hw(struct drm_i915_private *dev_priv)
>  	 * and the registers being closely associated.
>  	 *
>  	 * According to chipset errata, on the 965GM, MSI interrupts may
> -	 * be lost or delayed, but we use them anyways to avoid
> -	 * stuck interrupts on some machines.
> +	 * be lost or delayed, and was defeatured. MSI interrupts seem to
> +	 * get lost on g4x as well, and interrupt delivery seems to stay
> +	 * properly dead afterwards. So we'll just disable them for all
> +	 * pre-gen5 chipsets.
>  	 */
> -	if (!IS_I945G(dev_priv) && !IS_I945GM(dev_priv)) {
> +	if (INTEL_GEN(dev_priv) >= 5) {
>  		if (pci_enable_msi(pdev) < 0)
>  			DRM_DEBUG_DRIVER("can't enable MSI");
>  	}
> -- 
> 2.13.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Ville Syrjälä June 27, 2017, 12:52 p.m. UTC | #2
On Mon, Jun 26, 2017 at 08:49:56PM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915: Disable MSI for all pre-gen5
> URL   : https://patchwork.freedesktop.org/series/26392/
> State : failure
> 
> == Summary ==
> 
> Series 26392v1 drm/i915: Disable MSI for all pre-gen5
> https://patchwork.freedesktop.org/api/1.0/series/26392/revisions/1/mbox/
> 
> Test kms_flip:
>         Subgroup basic-flip-vs-wf_vblank:
>                 fail       -> PASS       (fi-blb-e6850) fdo#101599
>                 pass       -> FAIL       (fi-skl-6770hq)

(kms_flip:4376) CRITICAL: Test assertion failure function calibrate_ts, file kms_flip.c:1207:
(kms_flip:4376) CRITICAL: Failed assertion: ev.sequence == last_seq + 1
(kms_flip:4376) CRITICAL: error: 230 != 229
Subtest basic-flip-vs-wf_vblank failed.

Not related since the patch didn't touch gen5+.

> 
> fdo#101599 https://bugs.freedesktop.org/show_bug.cgi?id=101599
> 
> fi-bdw-5557u     total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:438s
> fi-bdw-gvtdvm    total:279  pass:257  dwarn:8   dfail:0   fail:0   skip:14  time:425s
> fi-blb-e6850     total:279  pass:224  dwarn:1   dfail:0   fail:0   skip:54  time:353s
> fi-bsw-n3050     total:279  pass:242  dwarn:1   dfail:0   fail:0   skip:36  time:521s
> fi-bxt-j4205     total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:509s
> fi-byt-j1900     total:279  pass:253  dwarn:2   dfail:0   fail:0   skip:24  time:489s
> fi-byt-n2820     total:279  pass:249  dwarn:2   dfail:0   fail:0   skip:28  time:481s
> fi-glk-2a        total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:597s
> fi-hsw-4770      total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:434s
> fi-hsw-4770r     total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:413s
> fi-ilk-650       total:279  pass:229  dwarn:0   dfail:0   fail:0   skip:50  time:417s
> fi-ivb-3520m     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:491s
> fi-ivb-3770      total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:472s
> fi-kbl-7500u     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:469s
> fi-kbl-7560u     total:279  pass:268  dwarn:1   dfail:0   fail:0   skip:10  time:569s
> fi-kbl-r         total:279  pass:260  dwarn:1   dfail:0   fail:0   skip:18  time:583s
> fi-pnv-d510      total:279  pass:221  dwarn:3   dfail:0   fail:0   skip:55  time:562s
> fi-skl-6260u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:464s
> fi-skl-6700k     total:279  pass:257  dwarn:4   dfail:0   fail:0   skip:18  time:468s
> fi-skl-6770hq    total:279  pass:268  dwarn:0   dfail:0   fail:1   skip:10  time:472s
> fi-skl-gvtdvm    total:279  pass:266  dwarn:0   dfail:0   fail:0   skip:13  time:431s
> fi-snb-2520m     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:544s
> fi-snb-2600      total:279  pass:250  dwarn:0   dfail:0   fail:0   skip:29  time:400s
> 
> 21d74e215ad650f0e8e30de609bd65601f0aa11d drm-tip: 2017y-06m-26d-09h-12m-14s UTC integration manifest
> dfa05d7 drm/i915: Disable MSI for all pre-gen5
> 
> == Logs ==
> 
> For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_5046/
Ville Syrjälä June 27, 2017, 1:16 p.m. UTC | #3
On Tue, Jun 27, 2017 at 11:18:33AM +0200, Daniel Vetter wrote:
> On Mon, Jun 26, 2017 at 11:30:51PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > We have pretty clear evidence that MSIs are getting lost on g4x and
> > somehow the interrupt logic doesn't seem to recover from that state
> > even if we try hard to clear the IIR.
> > 
> > Disabling IER around the normal IIR clearing in the irq handler isn't
> > sufficient to avoid this, so the problem really seems to be further
> > up the interrupt chain. This should guarantee that there's always
> > an edge if any IIR bits are set after the interrupt handler is done,
> > which should normally guarantee that the CPU interrupt is generated.
> > That approach seems to work perfectly on VLV/CHV, but apparently
> > not on g4x.
> > 
> > MSI is documented to be broken on 965gm at least. The chipset spec
> > says MSI is defeatured because interrupts can be delayed or lost,
> > which fits well with what we're seeing on g4x. Previously we've
> > already disabled GMBUS interrupts on g4x because somehow GMBUS
> > manages to raise legacy interrupts even when MSI is enabled.
> > 
> > Since there's such widespread MSI breakahge all over in the pre-gen5
> > land let's just give up on MSI on these platforms.
> > 
> > Seqno reporting might be negatively affected by this since the legcy
> > interrupts aren't guaranteed to be ordered with the seqno writes,
> > whereas MSI interrupts may be? But an occasioanlly missed seqno
> > seems like a small price to pay for generally working interrupts.
> > 
> > Cc: stable@vger.kernel.org
> > Cc: Diego Viola <diego.viola@gmail.com>
> > Tested-by: Diego Viola <diego.viola@gmail.com>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101261
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> One caveat was that iirc we've also seen lost irq without msi, which was
> originally the reason to enable this. But that was misremembering on my
> side, apparently msi was done purely to get our own irq and avoid cpu
> overhead with shared irq. See:
> 
> commit ed4cb4142b242d8090d3811d5eb4abf6aa985bc8
> Author: Eric Anholt <eric@anholt.net>
> Date:   Tue Jul 29 12:10:39 2008 -0700
> 
>     i915: Add support for MSI and interrupt mitigation.
> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Thanks for double checking the old lore.

Pushed to dinq.

> > ---
> >  drivers/gpu/drm/i915/i915_drv.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index e9d8e9ee51b2..9167a73f3c69 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1141,10 +1141,12 @@ static int i915_driver_init_hw(struct drm_i915_private *dev_priv)
> >  	 * and the registers being closely associated.
> >  	 *
> >  	 * According to chipset errata, on the 965GM, MSI interrupts may
> > -	 * be lost or delayed, but we use them anyways to avoid
> > -	 * stuck interrupts on some machines.
> > +	 * be lost or delayed, and was defeatured. MSI interrupts seem to
> > +	 * get lost on g4x as well, and interrupt delivery seems to stay
> > +	 * properly dead afterwards. So we'll just disable them for all
> > +	 * pre-gen5 chipsets.
> >  	 */
> > -	if (!IS_I945G(dev_priv) && !IS_I945GM(dev_priv)) {
> > +	if (INTEL_GEN(dev_priv) >= 5) {
> >  		if (pci_enable_msi(pdev) < 0)
> >  			DRM_DEBUG_DRIVER("can't enable MSI");
> >  	}
> > -- 
> > 2.13.0
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Chris Wilson June 27, 2017, 1:18 p.m. UTC | #4
Quoting ville.syrjala@linux.intel.com (2017-06-26 21:30:51)
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> We have pretty clear evidence that MSIs are getting lost on g4x and
> somehow the interrupt logic doesn't seem to recover from that state
> even if we try hard to clear the IIR.
> 
> Disabling IER around the normal IIR clearing in the irq handler isn't
> sufficient to avoid this, so the problem really seems to be further
> up the interrupt chain. This should guarantee that there's always
> an edge if any IIR bits are set after the interrupt handler is done,
> which should normally guarantee that the CPU interrupt is generated.
> That approach seems to work perfectly on VLV/CHV, but apparently
> not on g4x.
> 
> MSI is documented to be broken on 965gm at least. The chipset spec
> says MSI is defeatured because interrupts can be delayed or lost,
> which fits well with what we're seeing on g4x. Previously we've
> already disabled GMBUS interrupts on g4x because somehow GMBUS
> manages to raise legacy interrupts even when MSI is enabled.
> 
> Since there's such widespread MSI breakahge all over in the pre-gen5
> land let's just give up on MSI on these platforms.

An unmentioned sibling caught by this change is g33. It is very unlikely
to have been fixed in g33 but broken in g4x, still deserves a mention as
old gen3 used MSI and g33 tried to tame the beast.

> Seqno reporting might be negatively affected by this since the legcy
> interrupts aren't guaranteed to be ordered with the seqno writes,
> whereas MSI interrupts may be? But an occasioanlly missed seqno
> seems like a small price to pay for generally working interrupts.
> 
> Cc: stable@vger.kernel.org
> Cc: Diego Viola <diego.viola@gmail.com>
> Tested-by: Diego Viola <diego.viola@gmail.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101261
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index e9d8e9ee51b2..9167a73f3c69 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1141,10 +1141,12 @@  static int i915_driver_init_hw(struct drm_i915_private *dev_priv)
 	 * and the registers being closely associated.
 	 *
 	 * According to chipset errata, on the 965GM, MSI interrupts may
-	 * be lost or delayed, but we use them anyways to avoid
-	 * stuck interrupts on some machines.
+	 * be lost or delayed, and was defeatured. MSI interrupts seem to
+	 * get lost on g4x as well, and interrupt delivery seems to stay
+	 * properly dead afterwards. So we'll just disable them for all
+	 * pre-gen5 chipsets.
 	 */
-	if (!IS_I945G(dev_priv) && !IS_I945GM(dev_priv)) {
+	if (INTEL_GEN(dev_priv) >= 5) {
 		if (pci_enable_msi(pdev) < 0)
 			DRM_DEBUG_DRIVER("can't enable MSI");
 	}