RFC drm/i915: Add a sunset clause to GPU hang logging

Message ID 20161014134428.29582-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Commit Message

Chris Wilson Oct. 14, 2016, 1:44 p.m. UTC
If the kernel is old, more than a few releases old, chances are that the
user is using an old kernel for a good reason, despite there being GPU
hangs. After 180 days since the driver release, stop suggesting that they
should send those reports upstream.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h       | 1 +
 drivers/gpu/drm/i915/i915_gpu_error.c | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)
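
A minimal userspace sketch of the age check described above, not the kernel code itself. DRIVER_TIMESTAMP and DAY_AS_SECONDS are copied from the patch; the helper name should_request_bug_report() and its parameters are assumptions made for illustration. The patch reads the wall clock with ktime_get_real_seconds(), whereas the sketch takes the current time as an argument so that the boundary is easy to exercise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch. */
#define DRIVER_TIMESTAMP 1476452087LL
#define DAY_AS_SECONDS(x) (24 * 60 * 60 * (x))

/*
 * Hypothetical helper (not part of the patch): returns true while `now`
 * (seconds since the Unix epoch) is less than `days` days past the
 * driver timestamp, i.e. while the "please file a bug" hint is still
 * worth printing.
 */
static bool should_request_bug_report(int64_t now, int days)
{
	assert(days >= 0);
	return now - DRIVER_TIMESTAMP < (int64_t)DAY_AS_SECONDS(days);
}
```

With a signed time value, a clock set earlier than DRIVER_TIMESTAMP gives a negative difference, so the hint is still printed in that case; only a clock at least 180 days past the timestamp suppresses it.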

Comments

Saarinen, Jani Oct. 14, 2016, 5:09 p.m. UTC | #1
> == Series Details ==
> 
> Series: RFC drm/i915: Add a sunset clause to GPU hang logging
> URL   : https://patchwork.freedesktop.org/series/13788/
> State : warning
> 
> == Summary ==
> 
> Series 13788v1 RFC drm/i915: Add a sunset clause to GPU hang logging
> https://patchwork.freedesktop.org/api/1.0/series/13788/revisions/1/mbox/
> 
> Test kms_force_connector_basic:
>         Subgroup prune-stale-modes:
>                 pass       -> SKIP       (fi-snb-2600)

This test has been stable before:
IGT-Version: 1.16-g171b21d (x86_64) (Linux: 4.8.0-CI-Patchwork_2722+ x86_64)
Test requirement not met in function main, file kms_force_connector_basic.c:111:
Test requirement: !(vga_connector->connection == DRM_MODE_CONNECTED)
Last errno: 2, No such file or directory
Subtest prune-stale-modes: SKIP

> Test kms_pipe_crc_basic:
>         Subgroup read-crc-pipe-b-frame-sequence:
>                 dmesg-warn -> PASS       (fi-skl-6770hq)
>         Subgroup suspend-read-crc-pipe-a:
>                 pass       -> DMESG-WARN (fi-byt-j1900)
>         Subgroup suspend-read-crc-pipe-b:
>                 pass       -> DMESG-WARN (fi-byt-j1900)

For suspend-read-crc-pipe-a/b, this is still https://bugs.freedesktop.org/show_bug.cgi?id=98040

> Test vgem_basic:
>         Subgroup unload:
>                 pass       -> SKIP       (fi-hsw-4770)
>                 skip       -> PASS       (fi-kbl-7200u)
>                 skip       -> PASS       (fi-skl-6700k)

This test is still unstable on HSW, SKL, and KBL.

> 
> fi-bdw-5557u     total:246  pass:231  dwarn:0   dfail:0   fail:0   skip:15
> fi-bsw-n3050     total:246  pass:204  dwarn:0   dfail:0   fail:0   skip:42
> fi-bxt-t5700     total:246  pass:216  dwarn:0   dfail:0   fail:0   skip:30
> fi-byt-j1900     total:246  pass:212  dwarn:2   dfail:0   fail:1   skip:31
> fi-byt-n2820     total:246  pass:210  dwarn:0   dfail:0   fail:1   skip:35
> fi-hsw-4770      total:246  pass:223  dwarn:0   dfail:0   fail:0   skip:23
> fi-hsw-4770r     total:246  pass:224  dwarn:0   dfail:0   fail:0   skip:22
> fi-ilk-650       total:246  pass:184  dwarn:0   dfail:0   fail:2   skip:60
> fi-ivb-3520m     total:246  pass:221  dwarn:0   dfail:0   fail:0   skip:25
> fi-ivb-3770      total:246  pass:221  dwarn:0   dfail:0   fail:0   skip:25
> fi-kbl-7200u     total:246  pass:222  dwarn:0   dfail:0   fail:0   skip:24
> fi-skl-6260u     total:246  pass:232  dwarn:0   dfail:0   fail:0   skip:14
> fi-skl-6700hq    total:246  pass:223  dwarn:0   dfail:0   fail:0   skip:23
> fi-skl-6700k     total:246  pass:221  dwarn:1   dfail:0   fail:0   skip:24
> fi-skl-6770hq    total:246  pass:230  dwarn:1   dfail:0   fail:1   skip:14
> fi-snb-2520m     total:246  pass:210  dwarn:0   dfail:0   fail:0   skip:36
> fi-snb-2600      total:246  pass:208  dwarn:0   dfail:0   fail:0   skip:38
> 
> Results at /archive/results/CI_IGT_test/Patchwork_2722/
> 
> e086610ff079f1bf1fe91d4ab175443590cacb8d drm-intel-nightly: 2016y-10m-14d-11h-43m-09s UTC integration manifest
> 5a74e57 RFC drm/i915: Add a sunset clause to GPU hang logging
> 


Jani Saarinen
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
Chris Wilson Oct. 14, 2016, 5:30 p.m. UTC | #2
On Fri, Oct 14, 2016 at 05:09:08PM +0000, Saarinen, Jani wrote:
> > == Series Details ==
> > 
> > Series: RFC drm/i915: Add a sunset clause to GPU hang logging
> > URL   : https://patchwork.freedesktop.org/series/13788/
> > State : warning
> > 
> > == Summary ==
> > 
> > Series 13788v1 RFC drm/i915: Add a sunset clause to GPU hang logging
> > https://patchwork.freedesktop.org/api/1.0/series/13788/revisions/1/mbox/
> > 
> > Test kms_force_connector_basic:
> >         Subgroup prune-stale-modes:
> >                 pass       -> SKIP       (fi-snb-2600)
> This test has been stable before:
> IGT-Version: 1.16-g171b21d (x86_64) (Linux: 4.8.0-CI-Patchwork_2722+ x86_64)
> Test requirement not met in function main, file kms_force_connector_basic.c:111:
> Test requirement: !(vga_connector->connection == DRM_MODE_CONNECTED)
> Last errno: 2, No such file or directory
> Subtest prune-stale-modes: SKIP

This is an indication that it is not as stable as you think it is ;)
-Chris
Joonas Lahtinen Oct. 17, 2016, 12:33 p.m. UTC | #3
On pe, 2016-10-14 at 14:44 +0100, Chris Wilson wrote:
> If the kernel is old, more than a few releases old, chances are that the
> user is using an old kernel for a good reason, despite there being GPU
> hangs. After 180 days since the driver release, stop suggesting that they
> should send those reports upstream.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Maybe we could even explicitly state that bugs should be reported to
the distro bugzilla because of running an old kernel?

Regards, Joonas
Daniel Vetter Oct. 17, 2016, 2:10 p.m. UTC | #4
On Mon, Oct 17, 2016 at 03:33:43PM +0300, Joonas Lahtinen wrote:
> On pe, 2016-10-14 at 14:44 +0100, Chris Wilson wrote:
> > If the kernel is old, more than a few releases old, chances are that the
> > user is using an old kernel for a good reason, despite there being GPU
> > hangs. After 180 days since the driver release, stop suggesting that they
> > should send those reports upstream.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Maybe we could even explicitly state that bugs should be reported to
> the distro bugzilla because of running an old kernel?

Distros already shut down our warnings "because too much noise", so I don't
think that's valuable.

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Joonas Lahtinen Oct. 17, 2016, 2:26 p.m. UTC | #5
On ma, 2016-10-17 at 16:10 +0200, Daniel Vetter wrote:
> On Mon, Oct 17, 2016 at 03:33:43PM +0300, Joonas Lahtinen wrote:
> > Maybe we could even explicitly state that bugs should be reported to
> > the distro bugzilla because of running an old kernel?
> 
> Distros already shut down our warnings "because too much noise", so I don't
> think that's valuable.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Just needs a patch to DIM to bump the timestamp.

Regards, Joonas

> 
> > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
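
On the point about bumping the timestamp: a small userspace sketch for sanity-checking a DRIVER_TIMESTAMP value against DRIVER_DATE and for seeing when the 180-day window closes. DRIVER_TIMESTAMP and DAY_AS_SECONDS are taken from the patch; the helper utc_date() is hypothetical, and nothing here reflects how DIM actually generates or updates the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Values copied from the patch. */
#define DRIVER_TIMESTAMP 1476452087LL
#define DAY_AS_SECONDS(x) (24 * 60 * 60 * (x))

/*
 * Hypothetical helper: break epoch seconds into a UTC calendar date.
 * Returns false if the value cannot be converted.
 */
static bool utc_date(long long epoch, int *year, int *month, int *day)
{
	time_t t = (time_t)epoch;
	struct tm *tm;

	assert(year && month && day);
	tm = gmtime(&t);
	if (!tm)
		return false;

	*year = tm->tm_year + 1900;	/* tm_year counts from 1900 */
	*month = tm->tm_mon + 1;	/* tm_mon is zero-based */
	*day = tm->tm_mday;
	return true;
}
```

For the value in this patch, utc_date(DRIVER_TIMESTAMP, ...) yields 2016-10-14 and utc_date(DRIVER_TIMESTAMP + DAY_AS_SECONDS(180), ...) yields 2017-04-12, both in UTC. The timestamp is therefore four days later than the DRIVER_DATE of "20161010", which is why the two need to be bumped together.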

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0e82f04ac3d6..0719104ebdd5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -73,6 +73,7 @@ 
 #define DRIVER_NAME		"i915"
 #define DRIVER_DESC		"Intel Graphics"
 #define DRIVER_DATE		"20161010"
+#define DRIVER_TIMESTAMP	1476452087
 
 #undef WARN_ON
 /* Many gcc seem to no see through this and fall over :( */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 2275a8d91539..e757783f935b 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1551,6 +1551,8 @@  static int capture(void *data)
 	return 0;
 }
 
+#define DAY_AS_SECONDS(x) (24 * 60 * 60 * (x))
+
 /**
  * i915_capture_error_state - capture an error record for later analysis
  * @dev: drm device
@@ -1603,7 +1605,8 @@  void i915_capture_error_state(struct drm_i915_private *dev_priv,
 		return;
 	}
 
-	if (!warned) {
+	if (!warned &&
+	    ktime_get_real_seconds() - DRIVER_TIMESTAMP < DAY_AS_SECONDS(180)) {
 		DRM_INFO("GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.\n");
 		DRM_INFO("Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel\n");
 		DRM_INFO("drm/i915 developers can then reassign to the right component if it's not a kernel issue.\n");