drm/i915: Officially give up on seqno coherency

Message ID 1434741369-28932-1-git-send-email-daniel.vetter@ffwll.ch
State New

Commit Message

Daniel Vetter June 19, 2015, 7:16 p.m. UTC
We've never figured out the magic trick to make irq vs. seqno
updates coherent, only tricks to make it work. And since

commit 094f9a54e35500739da185cdb78f2e92fc379458
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 25 17:34:55 2013 +0100

    drm/i915: Fix __wait_seqno to use true infinite timeouts

we automatically fall back to an irq augmented with polling scheme
after the first missed interrupt. There's really nothing else we can
do, hence tune down the message to informational level. It's still
useful for users in case it reliably precedes a hard system hang.

v2: Use NOTICE since it might be of value for bug reports (Chris).

Cc: Mark Janes <mark.a.janes@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Chris Wilson June 19, 2015, 7:26 p.m. UTC | #1
On Fri, Jun 19, 2015 at 09:16:09PM +0200, Daniel Vetter wrote:
> We've never figured out the magic trick to make irq vs. seqno
> updates coherent, only tricks to make it work. And since
> 
> commit 094f9a54e35500739da185cdb78f2e92fc379458
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Sep 25 17:34:55 2013 +0100
> 
>     drm/i915: Fix __wait_seqno to use true infinite timeouts
> 
> we automatically fall back to an irq augmented with polling scheme
> after the first missed interrupt. There's really nothing else we can
> do, hence tune down the message to informational level. It's still
> useful for users in case it reliably precedes a hard system hang.
> 
> v2: Use NOTICE since it might be of value for bug reports (Chris).
> 
> Cc: Mark Janes <mark.a.janes@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Now all we need to do is save the GPU state to the pstore in the
picoseconds before a hard hang, and we'll be sorted.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Jani Nikula June 23, 2015, 10:05 a.m. UTC | #2
On Fri, 19 Jun 2015, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> We've never figured out the magic trick to make irq vs. seqno
> updates coherent, only tricks to make it work. And since
>
> commit 094f9a54e35500739da185cdb78f2e92fc379458
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Sep 25 17:34:55 2013 +0100
>
>     drm/i915: Fix __wait_seqno to use true infinite timeouts
>
> we automatically fall back to an irq augmented with polling scheme
> after the first missed interrupt. There's really nothing else we can
> do, hence tune down the message to informational level. It's still
> useful for users in case it reliably precedes a hard system hang.
>
> v2: Use NOTICE since it might be of value for bug reports (Chris).
>
> Cc: Mark Janes <mark.a.janes@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index e6bb72dca3ff..5072fb49367e 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2946,8 +2946,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  					/* Issue a wake-up to catch stuck h/w. */
>  					if (!test_and_set_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings)) {
>  						if (!(dev_priv->gpu_error.test_irq_rings & intel_ring_flag(ring)))
> -							DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
> -								  ring->name);
> +							DRM_NOTICE("Hangcheck timer elapsed... %s idle\n",

drivers/gpu/drm/i915/i915_irq.c: In function ‘i915_hangcheck_elapsed’:
drivers/gpu/drm/i915/i915_irq.c:2949:8: error: implicit declaration of function ‘DRM_NOTICE’ [-Werror=implicit-function-declaration]
        DRM_NOTICE("Hangcheck timer elapsed... %s idle\n",
        ^


BR,
Jani.


> +								   ring->name);
>  						else
>  							DRM_INFO("Fake missed irq on %s\n",
>  								 ring->name);
> -- 
> 2.1.4
>
Daniel Vetter June 23, 2015, 11:52 a.m. UTC | #3
On Tue, Jun 23, 2015 at 01:05:41PM +0300, Jani Nikula wrote:
> On Fri, 19 Jun 2015, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > We've never figured out the magic trick to make irq vs. seqno
> > updates coherent, only tricks to make it work. And since
> >
> > commit 094f9a54e35500739da185cdb78f2e92fc379458
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Wed Sep 25 17:34:55 2013 +0100
> >
> >     drm/i915: Fix __wait_seqno to use true infinite timeouts
> >
> > we automatically fall back to an irq augmented with polling scheme
> > after the first missed interrupt. There's really nothing else we can
> > do, hence tune down the message to informational level. It's still
> > useful for users in case it reliably precedes a hard system hang.
> >
> > v2: Use NOTICE since it might be of value for bug reports (Chris).
> >
> > Cc: Mark Janes <mark.a.janes@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index e6bb72dca3ff..5072fb49367e 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -2946,8 +2946,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >  					/* Issue a wake-up to catch stuck h/w. */
> >  					if (!test_and_set_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings)) {
> >  						if (!(dev_priv->gpu_error.test_irq_rings & intel_ring_flag(ring)))
> > -							DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
> > -								  ring->name);
> > +							DRM_NOTICE("Hangcheck timer elapsed... %s idle\n",
> 
> drivers/gpu/drm/i915/i915_irq.c: In function ‘i915_hangcheck_elapsed’:
> drivers/gpu/drm/i915/i915_irq.c:2949:8: error: implicit declaration of function ‘DRM_NOTICE’ [-Werror=implicit-function-declaration]
>         DRM_NOTICE("Hangcheck timer elapsed... %s idle\n",
>         ^

Embarrassing. Can you pick up v1 instead please?
-Daniel
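
For context on the build failure quoted above: the compiler error indicates that DRM_NOTICE was not declared in the tree being built, while DRM_ERROR and DRM_INFO were. The following is only a user-space sketch of how a family of level-tagged logging macros could be layered on one helper. Every name in it (model_drm_log, MODEL_DRM_ERROR, MODEL_DRM_NOTICE, MODEL_DRM_INFO, the "[drm] " prefix) is hypothetical and is not the DRM core's implementation. The only kernel fact it relies on is the numeric ordering of the printk levels (3 for errors, 5 for notices, 6 for informational messages).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Numeric values mirror the kernel's printk levels; the names are made up. */
enum model_level {
	MODEL_LEVEL_ERR = 3,
	MODEL_LEVEL_NOTICE = 5,
	MODEL_LEVEL_INFO = 6
};

/*
 * Hypothetical helper: format "<level>[drm] message" into buf.
 * Returns the number of characters the message needs, or -1 on error.
 */
static int model_drm_log(char *buf, size_t len, int level, const char *fmt, ...)
{
	va_list ap;
	int prefix, body;

	assert(buf != NULL && len > 0);

	prefix = snprintf(buf, len, "<%d>[drm] ", level);
	if (prefix < 0 || (size_t)prefix >= len)
		return -1;

	va_start(ap, fmt);
	body = vsnprintf(buf + prefix, len - (size_t)prefix, fmt, ap);
	va_end(ap);

	return body < 0 ? -1 : prefix + body;
}

/* One thin macro per level, so a missing level is a one-line addition. */
#define MODEL_DRM_ERROR(buf, len, ...) \
	model_drm_log((buf), (len), MODEL_LEVEL_ERR, __VA_ARGS__)
#define MODEL_DRM_NOTICE(buf, len, ...) \
	model_drm_log((buf), (len), MODEL_LEVEL_NOTICE, __VA_ARGS__)
#define MODEL_DRM_INFO(buf, len, ...) \
	model_drm_log((buf), (len), MODEL_LEVEL_INFO, __VA_ARGS__)
```

In this model, calling a level macro that has not been defined fails at build time in the same way the quoted error does, which is why the choice in the thread is between adding such a macro and using a level that already exists.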

Patch

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e6bb72dca3ff..5072fb49367e 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2946,8 +2946,8 @@  static void i915_hangcheck_elapsed(struct work_struct *work)
 					/* Issue a wake-up to catch stuck h/w. */
 					if (!test_and_set_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings)) {
 						if (!(dev_priv->gpu_error.test_irq_rings & intel_ring_flag(ring)))
-							DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
-								  ring->name);
+							DRM_NOTICE("Hangcheck timer elapsed... %s idle\n",
+								   ring->name);
 						else
 							DRM_INFO("Fake missed irq on %s\n",
 								 ring->name);
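
The branch touched by this hunk can be modelled in user space roughly as follows. This is a simplified sketch, not the driver code: struct gpu_error_model, model_test_and_set_bit(), hangcheck_idle_ring() and ring_needs_polling() are hypothetical stand-ins, the bit helper is not atomic, and the ring flag is assumed to be 1 << id. It shows the two properties the commit message relies on: the message is printed only the first time a ring is flagged, and the flag stays set afterwards so that waiters can fall back to irq plus polling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_MAX_RINGS 8 /* hypothetical limit for this model */

/* Stand-in for the two bitmasks referenced via dev_priv->gpu_error. */
struct gpu_error_model {
	unsigned long missed_irq_rings; /* rings already seen missing an irq */
	unsigned long test_irq_rings;   /* rings with deliberately faked misses */
};

enum hangcheck_report {
	REPORT_NONE,   /* ring was already flagged, stay quiet */
	REPORT_MISSED, /* first real missed interrupt on this ring */
	REPORT_FAKE    /* first missed interrupt, but injected for testing */
};

/* Non-atomic model of test_and_set_bit(): set the bit, return its old value. */
static bool model_test_and_set_bit(unsigned int nr, unsigned long *addr)
{
	unsigned long mask = 1UL << nr;
	bool old = (*addr & mask) != 0;

	*addr |= mask;
	return old;
}

/* Called when hangcheck finds a ring idle while someone is still waiting. */
static enum hangcheck_report hangcheck_idle_ring(struct gpu_error_model *err,
						 unsigned int ring_id,
						 const char *name)
{
	assert(ring_id < MODEL_MAX_RINGS);

	/* Only the first miss per ring is reported. */
	if (model_test_and_set_bit(ring_id, &err->missed_irq_rings))
		return REPORT_NONE;

	if (!(err->test_irq_rings & (1UL << ring_id))) {
		fprintf(stderr, "Hangcheck timer elapsed... %s idle\n", name);
		return REPORT_MISSED;
	}

	fprintf(stderr, "Fake missed irq on %s\n", name);
	return REPORT_FAKE;
}

/* Assumption of this model: a flagged ring means waiters should also poll. */
static bool ring_needs_polling(const struct gpu_error_model *err,
			       unsigned int ring_id)
{
	assert(ring_id < MODEL_MAX_RINGS);
	return (err->missed_irq_rings & (1UL << ring_id)) != 0;
}
```

Because the flag is never cleared in this sketch, a second call for the same ring returns REPORT_NONE, which matches the once-per-ring behaviour that makes a lower log level sufficient.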