diff mbox series

[03/25] drm/i915: Prevent user context creation while wedged

Message ID 20190219122215.8941-3-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [01/25] drm/i915: Move verify_wm_state() to heap | expand

Commit Message

Chris Wilson Feb. 19, 2019, 12:21 p.m. UTC
Introduce a new ABI method for detecting a wedged driver by reporting
-EIO from DRM_IOCTL_I915_GEM_CONTEXT_CREATE.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Mika Kuoppala Feb. 19, 2019, 1:07 p.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Introduce a new ABI method for detecting a wedged driver by reporting
> -EIO from DRM_IOCTL_I915_GEM_CONTEXT_CREATE.

Need more on the 'why' department. What is lacking with
the method of submitting and noticing the wedged?

Also detecting banned from wedged is not trivial.

I am trying to figure out what is the userspace
need and flow wrt this.

If we will have object parameters, should we
convey this type of information with status query?

-Mika

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 7541c6f961b3..7337aa01c361 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -802,18 +802,23 @@ static bool client_is_banned(struct drm_i915_file_private *file_priv)
>  int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>  				  struct drm_file *file)
>  {
> -	struct drm_i915_private *dev_priv = to_i915(dev);
> +	struct drm_i915_private *i915 = to_i915(dev);
>  	struct drm_i915_gem_context_create *args = data;
>  	struct drm_i915_file_private *file_priv = file->driver_priv;
>  	struct i915_gem_context *ctx;
>  	int ret;
>  
> -	if (!DRIVER_CAPS(dev_priv)->has_logical_contexts)
> +	if (!DRIVER_CAPS(i915)->has_logical_contexts)
>  		return -ENODEV;
>  
>  	if (args->pad != 0)
>  		return -EINVAL;
>  
> +	if (i915_terminally_wedged(&i915->gpu_error)) {
> +		DRM_DEBUG("driver is wedged; banning new ctx!\n");
> +		return -EIO;
> +	}
> +
>  	if (client_is_banned(file_priv)) {
>  		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
>  			  current->comm,
> @@ -826,7 +831,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>  	if (ret)
>  		return ret;
>  
> -	ctx = i915_gem_create_context(dev_priv, file_priv);
> +	ctx = i915_gem_create_context(i915, file_priv);
>  	mutex_unlock(&dev->struct_mutex);
>  	if (IS_ERR(ctx))
>  		return PTR_ERR(ctx);
> -- 
> 2.20.1
Chris Wilson Feb. 19, 2019, 1:11 p.m. UTC | #2
Quoting Mika Kuoppala (2019-02-19 13:07:08)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Introduce a new ABI method for detecting a wedged driver by reporting
> > -EIO from DRM_IOCTL_I915_GEM_CONTEXT_CREATE.
> 
> Need more on the 'why' department. What is lacking with
> the method of submitting and noticing the wedged?

This fits in with

https://lists.freedesktop.org/archives/mesa-dev/2019-February/215469.html

where we are recreating the contexting the context after receiving the
-EIO from execbuf and so this allows us to actually break that loop if
the driver is wedged.
 
> Also detecting banned from wedged is not trivial.

We are not detecting banned per-se, just saying that if the driver is
wedged.
 
> I am trying to figure out what is the userspace
> need and flow wrt this.
> 
> If we will have object parameters, should we
> convey this type of information with status query?

It's not an object property, it's the status of the driver. Currently the
indication is throttle returning -EIO, but that would look mighty
strange in the above recreate_context().
-Chris
Chris Wilson Feb. 19, 2019, 1:20 p.m. UTC | #3
Quoting Mika Kuoppala (2019-02-19 13:07:08)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Introduce a new ABI method for detecting a wedged driver by reporting
> > -EIO from DRM_IOCTL_I915_GEM_CONTEXT_CREATE.
> 
> Need more on the 'why' department. What is lacking with
> the method of submitting and noticing the wedged?
> 
> Also detecting banned from wedged is not trivial.

The biggest problem is our transient wedging, where we use
i915_gem_set_wedged() to cancel inflight rendering to avoid a deadlock
during reset. That is an issue here as we may submit and see -EIO even
though the driver isn't strictly terminally wedged and may reset.

We've even seen that once in practice,
https://bugs.freedesktop.org/show_bug.cgi?id=109580.

Hmm, pre-existing problem requiring more general solution. I'm thinking
along the lines of

	if (i915_terminally_wedged())
		return i915_reset_backoff() ? -EAGAIN : -EIO;

Joy. /o\

ret = i915_reset_backoff();

static int i915_reset_backoff()
{
	unsigned long flags = READ_ONCE(error->flags);

	if (!(flags & WEDGED))
		return 0;
	
	if (flags & BACKOFF)
		return -EAGAIN;

	return -EIO;
}

Seems eerily familiar, like a full circle.
-Chris
Mika Kuoppala Feb. 19, 2019, 1:31 p.m. UTC | #4
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2019-02-19 13:07:08)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > Introduce a new ABI method for detecting a wedged driver by reporting
>> > -EIO from DRM_IOCTL_I915_GEM_CONTEXT_CREATE.
>> 
>> Need more on the 'why' department. What is lacking with
>> the method of submitting and noticing the wedged?
>
> This fits in with
>
> https://lists.freedesktop.org/archives/mesa-dev/2019-February/215469.html
>
> where we are recreating the contexting the context after receiving the
> -EIO from execbuf and so this allows us to actually break that loop if
> the driver is wedged.
>  
>> Also detecting banned from wedged is not trivial.
>
> We are not detecting banned per-se, just saying that if the driver is
> wedged.

Yes. I was meaning that you will now get -EIO from both wedged
driver and from banned client and you can't tell em apart.
Not that you would need to :O

Add that mesa commit as a reference? (even tho I didn't notice
it setting unrecoverable as it boldy promised :)

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

-Mika
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 7541c6f961b3..7337aa01c361 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -802,18 +802,23 @@  static bool client_is_banned(struct drm_i915_file_private *file_priv)
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_context_create *args = data;
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct i915_gem_context *ctx;
 	int ret;
 
-	if (!DRIVER_CAPS(dev_priv)->has_logical_contexts)
+	if (!DRIVER_CAPS(i915)->has_logical_contexts)
 		return -ENODEV;
 
 	if (args->pad != 0)
 		return -EINVAL;
 
+	if (i915_terminally_wedged(&i915->gpu_error)) {
+		DRM_DEBUG("driver is wedged; banning new ctx!\n");
+		return -EIO;
+	}
+
 	if (client_is_banned(file_priv)) {
 		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
 			  current->comm,
@@ -826,7 +831,7 @@  int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	ctx = i915_gem_create_context(dev_priv, file_priv);
+	ctx = i915_gem_create_context(i915, file_priv);
 	mutex_unlock(&dev->struct_mutex);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);