diff mbox

[03/15] drm/i915: Upgrade execbuffer fail after resume failure to EIO

Message ID 1407250286-1801-4-git-send-email-rodrigo.vivi@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Rodrigo Vivi Aug. 5, 2014, 2:51 p.m. UTC
From: Chris Wilson <chris@chris-wilson.co.uk>

If we try to execute on a known ring, but it has failed to be
initialised correctly, report that the GPU is hung rather than the
command invalid. This leaves us reporting EINVAL only if the user
requests execution on a ring that is not supported by the device.

This should prevent UXA from getting stuck in a null render loop after a
failed resume.

v2 (Rodrigo): Fix conflict and add VCS2 ring and
   	      s/intel_ring_buffer/intel_engine_cs.

Reported-by: Jiri Kosina <jikos@jikos.cz>
References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Daniel Vetter Aug. 6, 2014, 7:56 a.m. UTC | #1
On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> If we try to execute on a known ring, but it has failed to be
> initialised correctly, report that the GPU is hung rather than the
> command invalid. This leaves us reporting EINVAL only if the user
> requests execution on a ring that is not supported by the device.
> 
> This should prevent UXA from getting stuck in a null render loop after a
> failed resume.
> 
> v2 (Rodrigo): Fix conflict and add VCS2 ring and
>    	      s/intel_ring_buffer/intel_engine_cs.
> 
> Reported-by: Jiri Kosina <jikos@jikos.cz>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

This isn't required any more, see

commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 9 09:19:43 2014 +0100

    drm/i915: Mark device as wedged if we fail to resume

for the alternate merged patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 60998fc..288ff61 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1233,6 +1233,19 @@ eb_get_batch(struct eb_vmas *eb)
>  	return vma->obj;
>  }
>  
> +static bool
> +intel_ring_valid(struct intel_engine_cs *ring)
> +{
> +	switch (ring->id) {
> +	case RCS: return true;
> +	case VCS: return HAS_BSD(ring->dev);
> +	case BCS: return HAS_BLT(ring->dev);
> +	case VECS: return HAS_VEBOX(ring->dev);
> +	case VCS2: return HAS_BSD2(ring->dev);
> +	default: return false;
> +	}
> +}
> +
>  static int
>  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  		       struct drm_file *file,
> @@ -1289,7 +1302,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	if (!intel_ring_initialized(ring)) {
>  		DRM_DEBUG("execbuf with invalid ring: %d\n",
>  			  (int)(args->flags & I915_EXEC_RING_MASK));
> -		return -EINVAL;
> +		return intel_ring_valid(ring) ? -EIO : -EINVAL;
>  	}
>  
>  	if (args->buffer_count < 1) {
> -- 
> 1.9.3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson Aug. 6, 2014, 8:12 a.m. UTC | #2
On Wed, Aug 06, 2014 at 09:56:45AM +0200, Daniel Vetter wrote:
> On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > If we try to execute on a known ring, but it has failed to be
> > initialised correctly, report that the GPU is hung rather than the
> > command invalid. This leaves us reporting EINVAL only if the user
> > requests execution on a ring that is not supported by the device.
> > 
> > This should prevent UXA from getting stuck in a null render loop after a
> > failed resume.
> > 
> > v2 (Rodrigo): Fix conflict and add VCS2 ring and
> >    	      s/intel_ring_buffer/intel_engine_cs.
> > 
> > Reported-by: Jiri Kosina <jikos@jikos.cz>
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> 
> This isn't required any more, see
> 
> commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Apr 9 09:19:43 2014 +0100
> 
>     drm/i915: Mark device as wedged if we fail to resume
> 
> for the alternate merged patch.

Hmm, there is still a path that ends here, but the example above is
already fixed as you say.
-Chris
Daniel Vetter Aug. 6, 2014, 8:39 a.m. UTC | #3
On Wed, Aug 06, 2014 at 09:12:32AM +0100, Chris Wilson wrote:
> On Wed, Aug 06, 2014 at 09:56:45AM +0200, Daniel Vetter wrote:
> > On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote:
> > > From: Chris Wilson <chris@chris-wilson.co.uk>
> > > 
> > > If we try to execute on a known ring, but it has failed to be
> > > initialised correctly, report that the GPU is hung rather than the
> > > command invalid. This leaves us reporting EINVAL only if the user
> > > requests execution on a ring that is not supported by the device.
> > > 
> > > This should prevent UXA from getting stuck in a null render loop after a
> > > failed resume.
> > > 
> > > v2 (Rodrigo): Fix conflict and add VCS2 ring and
> > >    	      s/intel_ring_buffer/intel_engine_cs.
> > > 
> > > Reported-by: Jiri Kosina <jikos@jikos.cz>
> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > 
> > This isn't required any more, see
> > 
> > commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Wed Apr 9 09:19:43 2014 +0100
> > 
> >     drm/i915: Mark device as wedged if we fail to resume
> > 
> > for the alternate merged patch.
> 
> Hmm, there is still a path that ends here, but the example above is
> already fixed as you say.

We have the EIO check both in the resume and driver load paths. Which
other path are we missing?
-Daniel
Chris Wilson Aug. 8, 2014, 9:17 a.m. UTC | #4
On Wed, Aug 06, 2014 at 10:39:16AM +0200, Daniel Vetter wrote:
> On Wed, Aug 06, 2014 at 09:12:32AM +0100, Chris Wilson wrote:
> > On Wed, Aug 06, 2014 at 09:56:45AM +0200, Daniel Vetter wrote:
> > > On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote:
> > > > From: Chris Wilson <chris@chris-wilson.co.uk>
> > > > 
> > > > If we try to execute on a known ring, but it has failed to be
> > > > initialised correctly, report that the GPU is hung rather than the
> > > > command invalid. This leaves us reporting EINVAL only if the user
> > > > requests execution on a ring that is not supported by the device.
> > > > 
> > > > This should prevent UXA from getting stuck in a null render loop after a
> > > > failed resume.
> > > > 
> > > > v2 (Rodrigo): Fix conflict and add VCS2 ring and
> > > >    	      s/intel_ring_buffer/intel_engine_cs.
> > > > 
> > > > Reported-by: Jiri Kosina <jikos@jikos.cz>
> > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > 
> > > This isn't required any more, see
> > > 
> > > commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date:   Wed Apr 9 09:19:43 2014 +0100
> > > 
> > >     drm/i915: Mark device as wedged if we fail to resume
> > > 
> > > for the alternate merged patch.
> > 
> > Hmm, there is still a path that ends here, but the example above is
> > already fixed as you say.
> 
> We have the EIO check both in the resume and driver load paths. Which
> other path are we missing?

The GPU may be set to wedged, but this check in execbuffer occurs before
we check for a wedged GPU.
-Chris
Daniel Vetter Aug. 8, 2014, 9:46 a.m. UTC | #5
On Fri, Aug 08, 2014 at 10:17:10AM +0100, Chris Wilson wrote:
> On Wed, Aug 06, 2014 at 10:39:16AM +0200, Daniel Vetter wrote:
> > On Wed, Aug 06, 2014 at 09:12:32AM +0100, Chris Wilson wrote:
> > > On Wed, Aug 06, 2014 at 09:56:45AM +0200, Daniel Vetter wrote:
> > > > On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote:
> > > > > From: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > 
> > > > > If we try to execute on a known ring, but it has failed to be
> > > > > initialised correctly, report that the GPU is hung rather than the
> > > > > command invalid. This leaves us reporting EINVAL only if the user
> > > > > requests execution on a ring that is not supported by the device.
> > > > > 
> > > > > This should prevent UXA from getting stuck in a null render loop after a
> > > > > failed resume.
> > > > > 
> > > > > v2 (Rodrigo): Fix conflict and add VCS2 ring and
> > > > >    	      s/intel_ring_buffer/intel_engine_cs.
> > > > > 
> > > > > Reported-by: Jiri Kosina <jikos@jikos.cz>
> > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
> > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > 
> > > > This isn't required any more, see
> > > > 
> > > > commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0
> > > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Date:   Wed Apr 9 09:19:43 2014 +0100
> > > > 
> > > >     drm/i915: Mark device as wedged if we fail to resume
> > > > 
> > > > for the alternate merged patch.
> > > 
> > > Hmm, there is still a path that ends here, but the example above is
> > > already fixed as you say.
> > 
> > We have the EIO check both in the resume and driver load paths. Which
> > other path are we missing?
> 
> The GPU may be set to wedged, but this check in execbuffer occurs before
> we check for a wedged GPU.

But we no longer free the ring structures over suspedn/resume, so at least
the commit message is outdated.

I wonder whether the easier fix wouldn't be to continue ring init if we
get an -EIO.
-Daniel
Chris Wilson Aug. 8, 2014, 9:52 a.m. UTC | #6
On Fri, Aug 08, 2014 at 11:46:07AM +0200, Daniel Vetter wrote:
> On Fri, Aug 08, 2014 at 10:17:10AM +0100, Chris Wilson wrote:
> > On Wed, Aug 06, 2014 at 10:39:16AM +0200, Daniel Vetter wrote:
> > > On Wed, Aug 06, 2014 at 09:12:32AM +0100, Chris Wilson wrote:
> > > > On Wed, Aug 06, 2014 at 09:56:45AM +0200, Daniel Vetter wrote:
> > > > > On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote:
> > > > > > From: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > 
> > > > > > If we try to execute on a known ring, but it has failed to be
> > > > > > initialised correctly, report that the GPU is hung rather than the
> > > > > > command invalid. This leaves us reporting EINVAL only if the user
> > > > > > requests execution on a ring that is not supported by the device.
> > > > > > 
> > > > > > This should prevent UXA from getting stuck in a null render loop after a
> > > > > > failed resume.
> > > > > > 
> > > > > > v2 (Rodrigo): Fix conflict and add VCS2 ring and
> > > > > >    	      s/intel_ring_buffer/intel_engine_cs.
> > > > > > 
> > > > > > Reported-by: Jiri Kosina <jikos@jikos.cz>
> > > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
> > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > > 
> > > > > This isn't required any more, see
> > > > > 
> > > > > commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0
> > > > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Date:   Wed Apr 9 09:19:43 2014 +0100
> > > > > 
> > > > >     drm/i915: Mark device as wedged if we fail to resume
> > > > > 
> > > > > for the alternate merged patch.
> > > > 
> > > > Hmm, there is still a path that ends here, but the example above is
> > > > already fixed as you say.
> > > 
> > > We have the EIO check both in the resume and driver load paths. Which
> > > other path are we missing?
> > 
> > The GPU may be set to wedged, but this check in execbuffer occurs before
> > we check for a wedged GPU.
> 
> But we no longer free the ring structures over suspedn/resume, so at least
> the commit message is outdated.

Went off on an incorrect tangent.
 
> I wonder whether the easier fix wouldn't be to continue ring init if we
> get an -EIO.

Right, that's the problem I remember. But I am not sure if it is really
worth it. It's only to hide log spew in a corner case of a corner case in
UXA.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 60998fc..288ff61 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1233,6 +1233,19 @@  eb_get_batch(struct eb_vmas *eb)
 	return vma->obj;
 }
 
+static bool
+intel_ring_valid(struct intel_engine_cs *ring)
+{
+	switch (ring->id) {
+	case RCS: return true;
+	case VCS: return HAS_BSD(ring->dev);
+	case BCS: return HAS_BLT(ring->dev);
+	case VECS: return HAS_VEBOX(ring->dev);
+	case VCS2: return HAS_BSD2(ring->dev);
+	default: return false;
+	}
+}
+
 static int
 i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		       struct drm_file *file,
@@ -1289,7 +1302,7 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (!intel_ring_initialized(ring)) {
 		DRM_DEBUG("execbuf with invalid ring: %d\n",
 			  (int)(args->flags & I915_EXEC_RING_MASK));
-		return -EINVAL;
+		return intel_ring_valid(ring) ? -EIO : -EINVAL;
 	}
 
 	if (args->buffer_count < 1) {