
drm/i915: Report to userspace if we have a (presumed) working GPU reset

Message ID 20150615135341.GA28462@nuc-i3427.alporthouse.com
State New, archived

Commit Message

Chris Wilson June 15, 2015, 1:53 p.m. UTC
On Mon, Jun 15, 2015 at 03:45:38PM +0200, Daniel Vetter wrote:
> On Mon, Jun 15, 2015 at 12:23:48PM +0100, Chris Wilson wrote:
> > In igt, we want to test handling of GPU hangs, both for recovery
> > purposes and for reporting. However, we don't want to inject a genuine
> > GPU hang onto a machine that cannot recover and so be permanently
> > wedged. Rather than embed heuristics into igt, have the kernel report
> > exactly when it expects the GPU reset to work.
> > 
> > This can also be usefully extended in future to indicate different
> > levels of fine-grained resets.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Tim Gore <tim.gore@intel.com>
> > Cc: Tomas Elf <tomas.elf@intel.com>
> 
> Yeah makes sense. Will merge as soon as someone smashes a t-b with a few
> igt patches using this on top.

 void igt_require_hang_ring(int fd, int ring)
 {
        gem_context_require_ban_period(fd);
-       igt_require(intel_gen(intel_get_drm_devid(fd)) >= 5);
+       igt_require(has_gpu_reset(fd));
 }
 
 /**

Comments

Chris Wilson June 15, 2015, 1:58 p.m. UTC | #1
On Mon, Jun 15, 2015 at 02:53:41PM +0100, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 03:45:38PM +0200, Daniel Vetter wrote:
> > On Mon, Jun 15, 2015 at 12:23:48PM +0100, Chris Wilson wrote:
> > > In igt, we want to test handling of GPU hangs, both for recovery
> > > purposes and for reporting. However, we don't want to inject a genuine
> > > GPU hang onto a machine that cannot recover and so be permanently
> > > wedged. Rather than embed heuristics into igt, have the kernel report
> > > exactly when it expects the GPU reset to work.
> > > 
> > > This can also be usefully extended in future to indicate different
> > > levels of fine-grained resets.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Tim Gore <tim.gore@intel.com>
> > > Cc: Tomas Elf <tomas.elf@intel.com>
> > 
> > Yeah makes sense. Will merge as soon as someone smashes a t-b with a few
> > igt patches using this on top.
> 
> diff --git a/lib/igt_gt.c b/lib/igt_gt.c
> index deb5560..8a1ffb2 100644
> --- a/lib/igt_gt.c
> +++ b/lib/igt_gt.c
> @@ -26,6 +26,7 @@
>  #include <errno.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> +#include <sys/ioctl.h>
>  #include <fcntl.h>
>  
>  #include "drmtest.h"
> @@ -47,6 +48,21 @@
>   * engines.
>   */
>  
> +static bool has_gpu_reset(int fd)
> +{
> +       struct drm_i915_getparam gp;
> +       int val = 0;
> +
> +       memset(&gp, 0, sizeof(gp));
> +       gp.param = 35; /* HAS_GPU_RESET */
> +       gp.value = &val;
> +
> +       if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
> +               return intel_gen(intel_get_drm_devid(fd)) >= 5;
> +
> +       return val > 0;
> +}
>  
>  /**
>   * igt_require_hang_ring:
> @@ -60,7 +76,7 @@
>  void igt_require_hang_ring(int fd, int ring)
>  {
>         gem_context_require_ban_period(fd);
> -       igt_require(intel_gen(intel_get_drm_devid(fd)) >= 5);
> +       igt_require(has_gpu_reset(fd));
>  }

Speaking of which, do we want
  igt_require(getenv("IGT_DISABLE_HANG") == NULL);
here?
-Chris

Daniel Vetter June 15, 2015, 3:01 p.m. UTC | #2
On Mon, Jun 15, 2015 at 02:58:17PM +0100, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 02:53:41PM +0100, Chris Wilson wrote:
> > On Mon, Jun 15, 2015 at 03:45:38PM +0200, Daniel Vetter wrote:
> > > On Mon, Jun 15, 2015 at 12:23:48PM +0100, Chris Wilson wrote:
> > > > In igt, we want to test handling of GPU hangs, both for recovery
> > > > purposes and for reporting. However, we don't want to inject a genuine
> > > > GPU hang onto a machine that cannot recover and so be permanently
> > > > wedged. Rather than embed heuristics into igt, have the kernel report
> > > > exactly when it expects the GPU reset to work.
> > > > 
> > > > This can also be usefully extended in future to indicate different
> > > > levels of fine-grained resets.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > Cc: Tim Gore <tim.gore@intel.com>
> > > > Cc: Tomas Elf <tomas.elf@intel.com>
> > > 
> > > Yeah makes sense. Will merge as soon as someone smashes a t-b with a few
> > > igt patches using this on top.
> > 
> > diff --git a/lib/igt_gt.c b/lib/igt_gt.c
> > index deb5560..8a1ffb2 100644
> > --- a/lib/igt_gt.c
> > +++ b/lib/igt_gt.c
> > @@ -26,6 +26,7 @@
> >  #include <errno.h>
> >  #include <sys/types.h>
> >  #include <sys/stat.h>
> > +#include <sys/ioctl.h>
> >  #include <fcntl.h>
> >  
> >  #include "drmtest.h"
> > @@ -47,6 +48,21 @@
> >   * engines.
> >   */
> >  
> > +static bool has_gpu_reset(int fd)
> > +{
> > +       struct drm_i915_getparam gp;
> > +       int val = 0;
> > +
> > +       memset(&gp, 0, sizeof(gp));
> > +       gp.param = 35; /* HAS_GPU_RESET */
> > +       gp.value = &val;
> > +
> > +       if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
> > +               return intel_gen(intel_get_drm_devid(fd)) >= 5;
> > +
> > +       return val > 0;
> > +}
> >  
> >  /**
> >   * igt_require_hang_ring:
> > @@ -60,7 +76,7 @@
> >  void igt_require_hang_ring(int fd, int ring)
> >  {
> >         gem_context_require_ban_period(fd);
> > -       igt_require(intel_gen(intel_get_drm_devid(fd)) >= 5);
> > +       igt_require(has_gpu_reset(fd));
> >  }

Count me convinced, patch applied ;-)

> Speaking of which, do we want
>   igt_require(getenv("IGT_DISABLE_HANG") == NULL);
> here?

Well igt_require(!igt_check_boolean_env_var("IGT_DISABLE_HANG", false)); but
tbh I'm not sure of that. Filtering testcases with piglit using -x hang
should amount to the same really.
-Daniel
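The two suggestions differ for a setting such as IGT_DISABLE_HANG=0: a getenv() == NULL test treats any value, including "0", as "disable", while a boolean check treats "0" as false. The sketch below contrasts the two. env_bool() is a hypothetical stand-in written in the spirit of igt_check_boolean_env_var() (unset returns the default, otherwise the numeric value is tested), not its actual source; the other function names are likewise illustrative.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Presence check, as in igt_require(getenv("IGT_DISABLE_HANG") == NULL):
 * any value at all, even "0", disables the hang tests. */
static bool hang_disabled_by_presence(void)
{
    return getenv("IGT_DISABLE_HANG") != NULL;
}

/* Hypothetical boolean parse: unset -> default_value,
 * otherwise the variable is "true" when atoi() of it is non-zero. */
static bool env_bool(const char *name, bool default_value)
{
    const char *val;

    assert(name != NULL);
    val = getenv(name);
    if (!val)
        return default_value;
    return atoi(val) != 0;
}

/* Value check, as in !igt_check_boolean_env_var("IGT_DISABLE_HANG", false). */
static bool hang_disabled_by_value(void)
{
    return env_bool("IGT_DISABLE_HANG", false);
}
```

Daniel's alternative, excluding the hang tests at the runner level with piglit's `-x hang` filter, avoids the environment variable altogether.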

Patch

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index deb5560..8a1ffb2 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -26,6 +26,7 @@ 
 #include <errno.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/ioctl.h>
 #include <fcntl.h>
 
 #include "drmtest.h"
@@ -47,6 +48,21 @@ 
  * engines.
  */
 
+static bool has_gpu_reset(int fd)
+{
+       struct drm_i915_getparam gp;
+       int val = 0;
+
+       memset(&gp, 0, sizeof(gp));
+       gp.param = 35; /* HAS_GPU_RESET */
+       gp.value = &val;
+
+       if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
+               return intel_gen(intel_get_drm_devid(fd)) >= 5;
+
+       return val > 0;
+}
 
 /**
  * igt_require_hang_ring:
@@ -60,7 +76,7 @@