diff mbox

[2/3] drm/i915: Fix some invalid requests cancellations

Message ID 1454086145-16160-2-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson Jan. 29, 2016, 4:49 p.m. UTC
As we add the VMA to the request early, it may be cancelled during
execbuf reservation. This will leave the context object pointing to a
dangling request; i915_wait_request() simply skips the wait and so we
may unbind the object whilst it is still active.

However, if at any point we make a change to the hardware (and equally
importantly our bookkeeping in the driver), we cannot cancel the request
as what has already been written must be submitted. Submitting a partial
request is far easier than trying to unwind the incomplete change.

Unfortunately this patch undoes the excess breadcrumb usage that olr
prevented, e.g. if we interrupt batchbuffer submission then we submit
the requests along with the memory writes and interrupt (even though we
do no real work). Disassociating requests from breadcrumbs (and
semaphores) is a topic for a past/future series, but now much more
important.

v2: Rebase

Note that igt/gem_concurrent_blit triggers both misrendering and a GPF
that is fixed by this patch.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_drv.h            |  1 -
 drivers/gpu/drm/i915/i915_gem.c            |  7 ++-----
 drivers/gpu/drm/i915/i915_gem_context.c    | 21 +++++++++------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 16 +++++-----------
 drivers/gpu/drm/i915/intel_display.c       |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  1 -
 6 files changed, 17 insertions(+), 31 deletions(-)

Comments

Tvrtko Ursulin Feb. 11, 2016, 1:01 p.m. UTC | #1
On 29/01/16 16:49, Chris Wilson wrote:
> As we add the VMA to the request early, it may be cancelled during
> execbuf reservation. This will leave the context object pointing to a

I don't get it, request is created after the reservation.

> dangling request; i915_wait_request() simply skips the wait and so we
> may unbind the object whilst it is still active.
>
> However, if at any point we make a change to the hardware (and equally
> importantly our bookkeeping in the driver), we cannot cancel the request
> as what has already been written must be submitted. Submitting a partial
> request is far easier than trying to unwind the incomplete change.
>
> Unfortunately this patch undoes the excess breadcrumb usage that olr
> prevented, e.g. if we interrupt batchbuffer submission then we submit
> the requests along with the memory writes and interrupt (even though we
> do no real work). Disassociating requests from breadcrumbs (and
> semaphores) is a topic for a past/future series, but now much more
> important.
 >
> v2: Rebase
>
> Note that igt/gem_concurrent_blit triggers both misrendering and a GPF
> that is fixed by this patch.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
> Testcase: igt/gem_concurrent_blit
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: stable@vger.kernel.org
> ---
>   drivers/gpu/drm/i915/i915_drv.h            |  1 -
>   drivers/gpu/drm/i915/i915_gem.c            |  7 ++-----
>   drivers/gpu/drm/i915/i915_gem_context.c    | 21 +++++++++------------
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 16 +++++-----------
>   drivers/gpu/drm/i915/intel_display.c       |  2 +-
>   drivers/gpu/drm/i915/intel_lrc.c           |  1 -
>   6 files changed, 17 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a2d2f08b7515..f828a7ffed37 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2823,7 +2823,6 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
>   			     struct drm_file *file_priv);
>   void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>   					struct drm_i915_gem_request *req);
> -void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
>   int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
>   				   struct drm_i915_gem_execbuffer2 *args,
>   				   struct list_head *vmas);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e9b19bca1383..f764f33580fc 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3407,12 +3407,9 @@ int i915_gpu_idle(struct drm_device *dev)
>   				return PTR_ERR(req);
>
>   			ret = i915_switch_context(req);
> -			if (ret) {
> -				i915_gem_request_cancel(req);
> -				return ret;
> -			}
> -
>   			i915_add_request_no_flush(req);
> +			if (ret)
> +				return ret;
>   		}
>
>   		ret = intel_ring_idle(ring);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 83a097c94911..5da7adc3f7b2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -652,7 +652,6 @@ static int do_switch(struct drm_i915_gem_request *req)
>   	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>   	struct intel_context *from = ring->last_context;
>   	u32 hw_flags = 0;
> -	bool uninitialized = false;
>   	int ret, i;
>
>   	if (from != NULL && ring == &dev_priv->ring[RCS]) {
> @@ -759,6 +758,15 @@ static int do_switch(struct drm_i915_gem_request *req)
>   			to->remap_slice &= ~(1<<i);
>   	}
>
> +	if (!to->legacy_hw_ctx.initialized) {
> +		if (ring->init_context) {
> +			ret = ring->init_context(req);
> +			if (ret)
> +				goto unpin_out;
> +		}
> +		to->legacy_hw_ctx.initialized = true;
> +	}
> +
>   	/* The backing object for the context is done after switching to the
>   	 * *next* context. Therefore we cannot retire the previous context until
>   	 * the next context has already started running. In fact, the below code
> @@ -782,21 +790,10 @@ static int do_switch(struct drm_i915_gem_request *req)
>   		i915_gem_context_unreference(from);
>   	}
>
> -	uninitialized = !to->legacy_hw_ctx.initialized;
> -	to->legacy_hw_ctx.initialized = true;
> -
>   done:
>   	i915_gem_context_reference(to);
>   	ring->last_context = to;
>
> -	if (uninitialized) {
> -		if (ring->init_context) {
> -			ret = ring->init_context(req);
> -			if (ret)
> -				DRM_ERROR("ring init context: %d\n", ret);
> -		}
> -	}
> -
>   	return 0;
>
>   unpin_out:
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 8fd00d279447..61bb15507b30 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1133,7 +1133,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>   	}
>   }
>
> -void
> +static void
>   i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
>   {
>   	/* Unconditionally force add_request to emit a full flush. */
> @@ -1318,7 +1318,6 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
>   	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
>
>   	i915_gem_execbuffer_move_to_active(vmas, params->request);
> -	i915_gem_execbuffer_retire_commands(params);

Hm this whole block from trace_... to 
i915_gem_execbuffer_retire_commands could be moved to do_execbuffer as 
an independent cleanup patch if under "if (ret == 0)".

So the concern is that something has been written to the ringbuffer but 
not all of it?

Why do we have to submit that, I am assuming whatever was written is not 
interesting to be executed unless the whole bb went in.

So could we just rewind the pointers? Store them at the beginning and 
rewind if something failed.

>
>   	return 0;
>   }
> @@ -1616,8 +1615,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	}
>
>   	ret = i915_gem_request_add_to_client(req, file);
> -	if (ret)
> +	if (ret) {
> +		i915_gem_request_cancel(req);
>   		goto err_batch_unpin;
> +	}

Doesn't look like this can fail so a side cleanup only?

Regards,

Tvrtko
Daniel Vetter Feb. 15, 2016, 9:47 a.m. UTC | #2
On Thu, Feb 11, 2016 at 01:01:34PM +0000, Tvrtko Ursulin wrote:
> On 29/01/16 16:49, Chris Wilson wrote:
> >As we add the VMA to the request early, it may be cancelled during
> >execbuf reservation. This will leave the context object pointing to a
> 
> I don't get it, request is created after the reservation.

That's the problem - we add vma for a given request (well ring/seqno pair)
before that request exists.
-Daniel

> 
> >dangling request; i915_wait_request() simply skips the wait and so we
> >may unbind the object whilst it is still active.
> >
> >However, if at any point we make a change to the hardware (and equally
> >importantly our bookkeeping in the driver), we cannot cancel the request
> >as what has already been written must be submitted. Submitting a partial
> >request is far easier than trying to unwind the incomplete change.
> >
> >Unfortunately this patch undoes the excess breadcrumb usage that olr
> >prevented, e.g. if we interrupt batchbuffer submission then we submit
> >the requests along with the memory writes and interrupt (even though we
> >do no real work). Disassociating requests from breadcrumbs (and
> >semaphores) is a topic for a past/future series, but now much more
> >important.
> >
> >v2: Rebase
> >
> >Note that igt/gem_concurrent_blit triggers both misrendering and a GPF
> >that is fixed by this patch.
> >
> >Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
> >Testcase: igt/gem_concurrent_blit
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >Cc: stable@vger.kernel.org
> >---
> >  drivers/gpu/drm/i915/i915_drv.h            |  1 -
> >  drivers/gpu/drm/i915/i915_gem.c            |  7 ++-----
> >  drivers/gpu/drm/i915/i915_gem_context.c    | 21 +++++++++------------
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 16 +++++-----------
> >  drivers/gpu/drm/i915/intel_display.c       |  2 +-
> >  drivers/gpu/drm/i915/intel_lrc.c           |  1 -
> >  6 files changed, 17 insertions(+), 31 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index a2d2f08b7515..f828a7ffed37 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -2823,7 +2823,6 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
> >  			     struct drm_file *file_priv);
> >  void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
> >  					struct drm_i915_gem_request *req);
> >-void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
> >  int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
> >  				   struct drm_i915_gem_execbuffer2 *args,
> >  				   struct list_head *vmas);
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index e9b19bca1383..f764f33580fc 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -3407,12 +3407,9 @@ int i915_gpu_idle(struct drm_device *dev)
> >  				return PTR_ERR(req);
> >
> >  			ret = i915_switch_context(req);
> >-			if (ret) {
> >-				i915_gem_request_cancel(req);
> >-				return ret;
> >-			}
> >-
> >  			i915_add_request_no_flush(req);
> >+			if (ret)
> >+				return ret;
> >  		}
> >
> >  		ret = intel_ring_idle(ring);
> >diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> >index 83a097c94911..5da7adc3f7b2 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_context.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >@@ -652,7 +652,6 @@ static int do_switch(struct drm_i915_gem_request *req)
> >  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> >  	struct intel_context *from = ring->last_context;
> >  	u32 hw_flags = 0;
> >-	bool uninitialized = false;
> >  	int ret, i;
> >
> >  	if (from != NULL && ring == &dev_priv->ring[RCS]) {
> >@@ -759,6 +758,15 @@ static int do_switch(struct drm_i915_gem_request *req)
> >  			to->remap_slice &= ~(1<<i);
> >  	}
> >
> >+	if (!to->legacy_hw_ctx.initialized) {
> >+		if (ring->init_context) {
> >+			ret = ring->init_context(req);
> >+			if (ret)
> >+				goto unpin_out;
> >+		}
> >+		to->legacy_hw_ctx.initialized = true;
> >+	}
> >+
> >  	/* The backing object for the context is done after switching to the
> >  	 * *next* context. Therefore we cannot retire the previous context until
> >  	 * the next context has already started running. In fact, the below code
> >@@ -782,21 +790,10 @@ static int do_switch(struct drm_i915_gem_request *req)
> >  		i915_gem_context_unreference(from);
> >  	}
> >
> >-	uninitialized = !to->legacy_hw_ctx.initialized;
> >-	to->legacy_hw_ctx.initialized = true;
> >-
> >  done:
> >  	i915_gem_context_reference(to);
> >  	ring->last_context = to;
> >
> >-	if (uninitialized) {
> >-		if (ring->init_context) {
> >-			ret = ring->init_context(req);
> >-			if (ret)
> >-				DRM_ERROR("ring init context: %d\n", ret);
> >-		}
> >-	}
> >-
> >  	return 0;
> >
> >  unpin_out:
> >diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >index 8fd00d279447..61bb15507b30 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >@@ -1133,7 +1133,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
> >  	}
> >  }
> >
> >-void
> >+static void
> >  i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
> >  {
> >  	/* Unconditionally force add_request to emit a full flush. */
> >@@ -1318,7 +1318,6 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
> >  	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
> >
> >  	i915_gem_execbuffer_move_to_active(vmas, params->request);
> >-	i915_gem_execbuffer_retire_commands(params);
> 
> Hm this whole block from trace_... to
> i915_gem_execbuffer_retire_commands could be moved to do_execbuffer as an
> independent cleanup patch if under "if (ret == 0)".
> 
> So the concern is that something has been written to the ringbuffer but not
> all of it?
> 
> Why do we have to submit that, I am assuming whatever was written is not
> interesting to be executed unless the whole bb went in.
> 
> So could we just rewind the pointers? Store them at the beginning and rewind
> if something failed.
> 
> >
> >  	return 0;
> >  }
> >@@ -1616,8 +1615,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	}
> >
> >  	ret = i915_gem_request_add_to_client(req, file);
> >-	if (ret)
> >+	if (ret) {
> >+		i915_gem_request_cancel(req);
> >  		goto err_batch_unpin;
> >+	}
> 
> Doesn't look like this can fail so a side cleanup only?
> 
> Regards,
> 
> Tvrtko
Tvrtko Ursulin Feb. 19, 2016, 12:29 p.m. UTC | #3
On 15/02/16 09:47, Daniel Vetter wrote:
> On Thu, Feb 11, 2016 at 01:01:34PM +0000, Tvrtko Ursulin wrote:
>> On 29/01/16 16:49, Chris Wilson wrote:
>>> As we add the VMA to the request early, it may be cancelled during
>>> execbuf reservation. This will leave the context object pointing to a
>>
>> I don't get it, request is created after the reservation.
>
> That's the problem - we add vma for a given request (well ring/seqno pair)
> before that request exists.

Why can't we remove them?

Sorry, I just can't penetrate the commit message. I don't get what 
wait_request it is talking about when the request has been cancelled and 
so no longer exists. Or how contexts point to requests, etc.

Maybe then find someone else to review this.

Regards,

Tvrtko

> -Daniel

>
>>
>>> dangling request; i915_wait_request() simply skips the wait and so we
>>> may unbind the object whilst it is still active.
>>>
>>> However, if at any point we make a change to the hardware (and equally
>>> importantly our bookkeeping in the driver), we cannot cancel the request
>>> as what has already been written must be submitted. Submitting a partial
>>> request is far easier than trying to unwind the incomplete change.
>>>
>>> Unfortunately this patch undoes the excess breadcrumb usage that olr
>>> prevented, e.g. if we interrupt batchbuffer submission then we submit
>>> the requests along with the memory writes and interrupt (even though we
>>> do no real work). Disassociating requests from breadcrumbs (and
>>> semaphores) is a topic for a past/future series, but now much more
>>> important.
>>>
>>> v2: Rebase
>>>
>>> Note that igt/gem_concurrent_blit triggers both misrendering and a GPF
>>> that is fixed by this patch.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
>>> Testcase: igt/gem_concurrent_blit
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> Cc: stable@vger.kernel.org
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h            |  1 -
>>>   drivers/gpu/drm/i915/i915_gem.c            |  7 ++-----
>>>   drivers/gpu/drm/i915/i915_gem_context.c    | 21 +++++++++------------
>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 16 +++++-----------
>>>   drivers/gpu/drm/i915/intel_display.c       |  2 +-
>>>   drivers/gpu/drm/i915/intel_lrc.c           |  1 -
>>>   6 files changed, 17 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index a2d2f08b7515..f828a7ffed37 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -2823,7 +2823,6 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
>>>   			     struct drm_file *file_priv);
>>>   void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>>>   					struct drm_i915_gem_request *req);
>>> -void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
>>>   int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
>>>   				   struct drm_i915_gem_execbuffer2 *args,
>>>   				   struct list_head *vmas);
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index e9b19bca1383..f764f33580fc 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -3407,12 +3407,9 @@ int i915_gpu_idle(struct drm_device *dev)
>>>   				return PTR_ERR(req);
>>>
>>>   			ret = i915_switch_context(req);
>>> -			if (ret) {
>>> -				i915_gem_request_cancel(req);
>>> -				return ret;
>>> -			}
>>> -
>>>   			i915_add_request_no_flush(req);
>>> +			if (ret)
>>> +				return ret;
>>>   		}
>>>
>>>   		ret = intel_ring_idle(ring);
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>> index 83a097c94911..5da7adc3f7b2 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>> @@ -652,7 +652,6 @@ static int do_switch(struct drm_i915_gem_request *req)
>>>   	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>>>   	struct intel_context *from = ring->last_context;
>>>   	u32 hw_flags = 0;
>>> -	bool uninitialized = false;
>>>   	int ret, i;
>>>
>>>   	if (from != NULL && ring == &dev_priv->ring[RCS]) {
>>> @@ -759,6 +758,15 @@ static int do_switch(struct drm_i915_gem_request *req)
>>>   			to->remap_slice &= ~(1<<i);
>>>   	}
>>>
>>> +	if (!to->legacy_hw_ctx.initialized) {
>>> +		if (ring->init_context) {
>>> +			ret = ring->init_context(req);
>>> +			if (ret)
>>> +				goto unpin_out;
>>> +		}
>>> +		to->legacy_hw_ctx.initialized = true;
>>> +	}
>>> +
>>>   	/* The backing object for the context is done after switching to the
>>>   	 * *next* context. Therefore we cannot retire the previous context until
>>>   	 * the next context has already started running. In fact, the below code
>>> @@ -782,21 +790,10 @@ static int do_switch(struct drm_i915_gem_request *req)
>>>   		i915_gem_context_unreference(from);
>>>   	}
>>>
>>> -	uninitialized = !to->legacy_hw_ctx.initialized;
>>> -	to->legacy_hw_ctx.initialized = true;
>>> -
>>>   done:
>>>   	i915_gem_context_reference(to);
>>>   	ring->last_context = to;
>>>
>>> -	if (uninitialized) {
>>> -		if (ring->init_context) {
>>> -			ret = ring->init_context(req);
>>> -			if (ret)
>>> -				DRM_ERROR("ring init context: %d\n", ret);
>>> -		}
>>> -	}
>>> -
>>>   	return 0;
>>>
>>>   unpin_out:
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> index 8fd00d279447..61bb15507b30 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> @@ -1133,7 +1133,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>>>   	}
>>>   }
>>>
>>> -void
>>> +static void
>>>   i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
>>>   {
>>>   	/* Unconditionally force add_request to emit a full flush. */
>>> @@ -1318,7 +1318,6 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
>>>   	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
>>>
>>>   	i915_gem_execbuffer_move_to_active(vmas, params->request);
>>> -	i915_gem_execbuffer_retire_commands(params);
>>
>> Hm this whole block from trace_... to
>> i915_gem_execbuffer_retire_commands could be moved to do_execbuffer as an
>> independent cleanup patch if under "if (ret == 0)".
>>
>> So the concern is that something has been written to the ringbuffer but not
>> all of it?
>>
>> Why do we have to submit that, I am assuming whatever was written is not
>> interesting to be executed unless the whole bb went in.
>>
>> So could we just rewind the pointers? Store them at the beginning and rewind
>> if something failed.
>>
>>>
>>>   	return 0;
>>>   }
>>> @@ -1616,8 +1615,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>>   	}
>>>
>>>   	ret = i915_gem_request_add_to_client(req, file);
>>> -	if (ret)
>>> +	if (ret) {
>>> +		i915_gem_request_cancel(req);
>>>   		goto err_batch_unpin;
>>> +	}
>>
>> Doesn't look like this can fail so a side cleanup only?
>>
>> Regards,
>>
>> Tvrtko
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a2d2f08b7515..f828a7ffed37 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2823,7 +2823,6 @@  int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 			     struct drm_file *file_priv);
 void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 					struct drm_i915_gem_request *req);
-void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
 int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 				   struct drm_i915_gem_execbuffer2 *args,
 				   struct list_head *vmas);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e9b19bca1383..f764f33580fc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3407,12 +3407,9 @@  int i915_gpu_idle(struct drm_device *dev)
 				return PTR_ERR(req);
 
 			ret = i915_switch_context(req);
-			if (ret) {
-				i915_gem_request_cancel(req);
-				return ret;
-			}
-
 			i915_add_request_no_flush(req);
+			if (ret)
+				return ret;
 		}
 
 		ret = intel_ring_idle(ring);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 83a097c94911..5da7adc3f7b2 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -652,7 +652,6 @@  static int do_switch(struct drm_i915_gem_request *req)
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct intel_context *from = ring->last_context;
 	u32 hw_flags = 0;
-	bool uninitialized = false;
 	int ret, i;
 
 	if (from != NULL && ring == &dev_priv->ring[RCS]) {
@@ -759,6 +758,15 @@  static int do_switch(struct drm_i915_gem_request *req)
 			to->remap_slice &= ~(1<<i);
 	}
 
+	if (!to->legacy_hw_ctx.initialized) {
+		if (ring->init_context) {
+			ret = ring->init_context(req);
+			if (ret)
+				goto unpin_out;
+		}
+		to->legacy_hw_ctx.initialized = true;
+	}
+
 	/* The backing object for the context is done after switching to the
 	 * *next* context. Therefore we cannot retire the previous context until
 	 * the next context has already started running. In fact, the below code
@@ -782,21 +790,10 @@  static int do_switch(struct drm_i915_gem_request *req)
 		i915_gem_context_unreference(from);
 	}
 
-	uninitialized = !to->legacy_hw_ctx.initialized;
-	to->legacy_hw_ctx.initialized = true;
-
 done:
 	i915_gem_context_reference(to);
 	ring->last_context = to;
 
-	if (uninitialized) {
-		if (ring->init_context) {
-			ret = ring->init_context(req);
-			if (ret)
-				DRM_ERROR("ring init context: %d\n", ret);
-		}
-	}
-
 	return 0;
 
 unpin_out:
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8fd00d279447..61bb15507b30 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1133,7 +1133,7 @@  i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 	}
 }
 
-void
+static void
 i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
 {
 	/* Unconditionally force add_request to emit a full flush. */
@@ -1318,7 +1318,6 @@  i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
-	i915_gem_execbuffer_retire_commands(params);
 
 	return 0;
 }
@@ -1616,8 +1615,10 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	ret = i915_gem_request_add_to_client(req, file);
-	if (ret)
+	if (ret) {
+		i915_gem_request_cancel(req);
 		goto err_batch_unpin;
+	}
 
 	/*
 	 * Save assorted stuff away to pass through to *_submission().
@@ -1634,6 +1635,7 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->request                 = req;
 
 	ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
+	i915_gem_execbuffer_retire_commands(params);
 
 err_batch_unpin:
 	/*
@@ -1650,14 +1652,6 @@  err:
 	i915_gem_context_unreference(ctx);
 	eb_destroy(eb);
 
-	/*
-	 * If the request was created but not successfully submitted then it
-	 * must be freed again. If it was submitted then it is being tracked
-	 * on the active request list and no clean up is required here.
-	 */
-	if (ret && !IS_ERR_OR_NULL(req))
-		i915_gem_request_cancel(req);
-
 	mutex_unlock(&dev->struct_mutex);
 
 pre_mutex_err:
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 304fc9637026..732c915bef9e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11723,7 +11723,7 @@  cleanup_unpin:
 	intel_unpin_fb_obj(fb, crtc->primary->state);
 cleanup_pending:
 	if (!IS_ERR_OR_NULL(request))
-		i915_gem_request_cancel(request);
+		i915_add_request_no_flush(request);
 	atomic_dec(&intel_crtc->unpin_work_count);
 	mutex_unlock(&dev->struct_mutex);
 cleanup:
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3a03646e343d..4edd700e6402 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1002,7 +1002,6 @@  int intel_execlists_submission(struct i915_execbuffer_params *params,
 	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
-	i915_gem_execbuffer_retire_commands(params);
 
 	return 0;
 }