[02/55] drm/i915: Reserve ring buffer space for i915_add_request() commands
diff mbox

Message ID 1434731652-15672-1-git-send-email-John.C.Harrison@Intel.com
State New
Headers show

Commit Message

John Harrison June 19, 2015, 4:34 p.m. UTC
From: John Harrison <John.C.Harrison@Intel.com>

It is a bad idea for i915_add_request() to fail. The work will already have been
send to the ring and will be processed, but there will not be any tracking or
management of that work.

The only way the add request call can fail is if it can't write its epilogue
commands to the ring (cache flushing, seqno updates, interrupt signalling). The
reasons for that are mostly down to running out of ring buffer space and the
problems associated with trying to get some more. This patch prevents that
situation from happening in the first place.

When a request is created, it marks sufficient space as reserved for the
epilogue commands. Thus guaranteeing that by the time the epilogue is written,
there will be plenty of space for it. Note that a ring_begin() call is required
to actually reserve the space (and do any potential waiting). However, that is
not currently done at request creation time. This is because the ring_begin()
code can allocate a request. Hence calling begin() from the request allocation
code would lead to infinite recursion! Later patches in this series remove the
need for begin() to do the allocate. At that point, it becomes safe for the
allocate to call begin() and really reserve the space.

Until then, there is a potential for insufficient space to be available at the
point of calling i915_add_request(). However, that would only be in the case
where the request was created and immediately submitted without ever calling
ring_begin() and adding any work to that request. Which should never happen. And
even if it does, and if that request happens to fall down the tiny window of
opportunity for failing due to being out of ring space then does it really
matter because the request wasn't doing anything in the first place?

v2: Updated the 'reserved space too small' warning to include the offending
sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
re-initialisation of tracking state after a buffer wrap to keep the sanity
checks accurate.

v3: Incremented the reserved size to accommodate Ironlake (after finally
managing to run on an ILK system). Also fixed missing wrap code in LRC mode.

v4: Added extra comment and removed duplicate WARN (feedback from Tomas).

v5: Re-write of wrap handling to prevent unnecessary early wraps (feedback from
Daniel Vetter).

For: VIZ-5115
CC: Tomas Elf <tomas.elf@intel.com>
CC: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |    1 +
 drivers/gpu/drm/i915/i915_gem.c         |   37 ++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        |   35 +++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.c |   98 +++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |   25 ++++++++
 5 files changed, 186 insertions(+), 10 deletions(-)

Comments

Daniel Vetter June 22, 2015, 8:12 p.m. UTC | #1
On Fri, Jun 19, 2015 at 05:34:12PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> It is a bad idea for i915_add_request() to fail. The work will already have been
> send to the ring and will be processed, but there will not be any tracking or
> management of that work.
> 
> The only way the add request call can fail is if it can't write its epilogue
> commands to the ring (cache flushing, seqno updates, interrupt signalling). The
> reasons for that are mostly down to running out of ring buffer space and the
> problems associated with trying to get some more. This patch prevents that
> situation from happening in the first place.
> 
> When a request is created, it marks sufficient space as reserved for the
> epilogue commands. Thus guaranteeing that by the time the epilogue is written,
> there will be plenty of space for it. Note that a ring_begin() call is required
> to actually reserve the space (and do any potential waiting). However, that is
> not currently done at request creation time. This is because the ring_begin()
> code can allocate a request. Hence calling begin() from the request allocation
> code would lead to infinite recursion! Later patches in this series remove the
> need for begin() to do the allocate. At that point, it becomes safe for the
> allocate to call begin() and really reserve the space.
> 
> Until then, there is a potential for insufficient space to be available at the
> point of calling i915_add_request(). However, that would only be in the case
> where the request was created and immediately submitted without ever calling
> ring_begin() and adding any work to that request. Which should never happen. And
> even if it does, and if that request happens to fall down the tiny window of
> opportunity for failing due to being out of ring space then does it really
> matter because the request wasn't doing anything in the first place?
> 
> v2: Updated the 'reserved space too small' warning to include the offending
> sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
> re-initialisation of tracking state after a buffer wrap to keep the sanity
> checks accurate.
> 
> v3: Incremented the reserved size to accommodate Ironlake (after finally
> managing to run on an ILK system). Also fixed missing wrap code in LRC mode.
> 
> v4: Added extra comment and removed duplicate WARN (feedback from Tomas).
> 
> v5: Re-write of wrap handling to prevent unnecessary early wraps (feedback from
> Daniel Vetter).

This didn't actually implement what I suggested (wrapping is the worst
case, hence skipping the check for that is breaking the sanity check) and
so changed the patch from "correct, but a bit fragile" to broken. I've
merged the previous version instead.
-Daniel

> 
> For: VIZ-5115
> CC: Tomas Elf <tomas.elf@intel.com>
> CC: Daniel Vetter <daniel@ffwll.ch>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |    1 +
>  drivers/gpu/drm/i915/i915_gem.c         |   37 ++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        |   35 +++++++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   98 +++++++++++++++++++++++++++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   25 ++++++++
>  5 files changed, 186 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 0347eb9..eba1857 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2187,6 +2187,7 @@ struct drm_i915_gem_request {
>  
>  int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  			   struct intel_context *ctx);
> +void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>  void i915_gem_request_free(struct kref *req_ref);
>  
>  static inline uint32_t
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 81f3512..85fa27b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2485,6 +2485,13 @@ int __i915_add_request(struct intel_engine_cs *ring,
>  	} else
>  		ringbuf = ring->buffer;
>  
> +	/*
> +	 * To ensure that this call will not fail, space for its emissions
> +	 * should already have been reserved in the ring buffer. Let the ring
> +	 * know that it is time to use that space up.
> +	 */
> +	intel_ring_reserved_space_use(ringbuf);
> +
>  	request_start = intel_ring_get_tail(ringbuf);
>  	/*
>  	 * Emit any outstanding flushes - execbuf can fail to emit the flush
> @@ -2567,6 +2574,9 @@ int __i915_add_request(struct intel_engine_cs *ring,
>  			   round_jiffies_up_relative(HZ));
>  	intel_mark_busy(dev_priv->dev);
>  
> +	/* Sanity check that the reserved size was large enough. */
> +	intel_ring_reserved_space_end(ringbuf);
> +
>  	return 0;
>  }
>  
> @@ -2666,6 +2676,26 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  	if (ret)
>  		goto err;
>  
> +	/*
> +	 * Reserve space in the ring buffer for all the commands required to
> +	 * eventually emit this request. This is to guarantee that the
> +	 * i915_add_request() call can't fail. Note that the reserve may need
> +	 * to be redone if the request is not actually submitted straight
> +	 * away, e.g. because a GPU scheduler has deferred it.
> +	 *
> +	 * Note further that this call merely notes the reserve request. A
> +	 * subsequent call to *_ring_begin() is required to actually ensure
> +	 * that the reservation is available. Without the begin, if the
> +	 * request creator immediately submitted the request without adding
> +	 * any commands to it then there might not actually be sufficient
> +	 * room for the submission commands. Unfortunately, the current
> +	 * *_ring_begin() implementations potentially call back here to
> +	 * i915_gem_request_alloc(). Thus calling _begin() here would lead to
> +	 * infinite recursion! Until that back call path is removed, it is
> +	 * necessary to do a manual _begin() outside.
> +	 */
> +	intel_ring_reserved_space_reserve(req->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
> +
>  	ring->outstanding_lazy_request = req;
>  	return 0;
>  
> @@ -2674,6 +2704,13 @@ err:
>  	return ret;
>  }
>  
> +void i915_gem_request_cancel(struct drm_i915_gem_request *req)
> +{
> +	intel_ring_reserved_space_cancel(req->ringbuf);
> +
> +	i915_gem_request_unreference(req);
> +}
> +
>  struct drm_i915_gem_request *
>  i915_gem_find_active_request(struct intel_engine_cs *ring)
>  {
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 6a5ed07..bd62bd6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -690,6 +690,9 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
>  	if (intel_ring_space(ringbuf) >= bytes)
>  		return 0;
>  
> +	/* The whole point of reserving space is to not wait! */
> +	WARN_ON(ringbuf->reserved_in_use);
> +
>  	list_for_each_entry(request, &ring->request_list, list) {
>  		/*
>  		 * The request queue is per-engine, so can contain requests
> @@ -748,8 +751,12 @@ static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
>  	int rem = ringbuf->size - ringbuf->tail;
>  
>  	if (ringbuf->space < rem) {
> -		int ret = logical_ring_wait_for_space(ringbuf, ctx, rem);
> +		int ret;
> +
> +		/* Can't wait if space has already been reserved! */
> +		WARN_ON(ringbuf->reserved_in_use);
>  
> +		ret = logical_ring_wait_for_space(ringbuf, ctx, rem);
>  		if (ret)
>  			return ret;
>  	}
> @@ -768,7 +775,7 @@ static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
>  static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
>  				struct intel_context *ctx, int bytes)
>  {
> -	int ret;
> +	int ret, max_bytes;
>  
>  	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
>  		ret = logical_ring_wrap_buffer(ringbuf, ctx);
> @@ -776,8 +783,28 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
>  			return ret;
>  	}
>  
> -	if (unlikely(ringbuf->space < bytes)) {
> -		ret = logical_ring_wait_for_space(ringbuf, ctx, bytes);
> +	/*
> +	 * Add on the reserved size to the request to make sure that after
> +	 * the intended commands have been emitted, there is guaranteed to
> +	 * still be enough free space to send them to the hardware.
> +	 */
> +	max_bytes = bytes + ringbuf->reserved_size;
> +
> +	if (unlikely(ringbuf->space < max_bytes)) {
> +		/*
> +		 * Bytes is guaranteed to fit within the tail of the buffer,
> +		 * but the reserved space may push it off the end. If so then
> +		 * need to wait for the whole of the tail plus the reserved
> +		 * size. That should guarantee that the actual request
> +		 * (bytes) will fit between here and the end and the reserved
> +		 * usage will fit either in the same or at the start. Either
> +		 * way, if a wrap occurs it will not involve a wait and thus
> +		 * cannot fail.
> +		 */
> +		if (unlikely(ringbuf->tail + max_bytes + I915_RING_FREE_SPACE > ringbuf->effective_size))
> +			max_bytes = ringbuf->reserved_size + I915_RING_FREE_SPACE + ringbuf->size - ringbuf->tail;
> +
> +		ret = logical_ring_wait_for_space(ringbuf, ctx, max_bytes);
>  		if (unlikely(ret))
>  			return ret;
>  	}
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index d934f85..1c125e9 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2106,6 +2106,9 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
>  	if (intel_ring_space(ringbuf) >= n)
>  		return 0;
>  
> +	/* The whole point of reserving space is to not wait! */
> +	WARN_ON(ringbuf->reserved_in_use);
> +
>  	list_for_each_entry(request, &ring->request_list, list) {
>  		space = __intel_ring_space(request->postfix, ringbuf->tail,
>  					   ringbuf->size);
> @@ -2131,7 +2134,12 @@ static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
>  	int rem = ringbuf->size - ringbuf->tail;
>  
>  	if (ringbuf->space < rem) {
> -		int ret = ring_wait_for_space(ring, rem);
> +		int ret;
> +
> +		/* Can't wait if space has already been reserved! */
> +		WARN_ON(ringbuf->reserved_in_use);
> +
> +		ret = ring_wait_for_space(ring, rem);
>  		if (ret)
>  			return ret;
>  	}
> @@ -2180,11 +2188,69 @@ int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
>  	return 0;
>  }
>  
> -static int __intel_ring_prepare(struct intel_engine_cs *ring,
> -				int bytes)
> +void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size)
> +{
> +	/* NB: Until request management is fully tidied up and the OLR is
> +	 * removed, there are too many ways for get false hits on this
> +	 * anti-recursion check! */
> +	/*WARN_ON(ringbuf->reserved_size);*/
> +	WARN_ON(ringbuf->reserved_in_use);
> +
> +	ringbuf->reserved_size = size;
> +
> +	/*
> +	 * Really need to call _begin() here but that currently leads to
> +	 * recursion problems! This will be fixed later but for now just
> +	 * return and hope for the best. Note that there is only a real
> +	 * problem if the create of the request never actually calls _begin()
> +	 * but if they are not submitting any work then why did they create
> +	 * the request in the first place?
> +	 */
> +}
> +
> +void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf)
> +{
> +	WARN_ON(ringbuf->reserved_in_use);
> +
> +	ringbuf->reserved_size   = 0;
> +	ringbuf->reserved_in_use = false;
> +}
> +
> +void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf)
> +{
> +	WARN_ON(ringbuf->reserved_in_use);
> +
> +	ringbuf->reserved_in_use = true;
> +	ringbuf->reserved_tail   = ringbuf->tail;
> +}
> +
> +void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf)
> +{
> +	WARN_ON(!ringbuf->reserved_in_use);
> +	if (ringbuf->tail > ringbuf->reserved_tail) {
> +		WARN(ringbuf->tail > ringbuf->reserved_tail + ringbuf->reserved_size,
> +		     "request reserved size too small: %d vs %d!\n",
> +		     ringbuf->tail - ringbuf->reserved_tail, ringbuf->reserved_size);
> +	} else {
> +		/*
> +		 * The ring was wrapped while the reserved space was in use.
> +		 * That means that some unknown amount of the ring tail was
> +		 * no-op filled and skipped. Thus simply adding the ring size
> +		 * to the tail and doing the above space check will not work.
> +		 * Rather than attempt to track how much tail was skipped,
> +		 * it is much simpler to say that also skipping the sanity
> +		 * check every once in a while is not a big issue.
> +		 */
> +	}
> +
> +	ringbuf->reserved_size   = 0;
> +	ringbuf->reserved_in_use = false;
> +}
> +
> +static int __intel_ring_prepare(struct intel_engine_cs *ring, int bytes)
>  {
>  	struct intel_ringbuffer *ringbuf = ring->buffer;
> -	int ret;
> +	int ret, max_bytes;
>  
>  	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
>  		ret = intel_wrap_ring_buffer(ring);
> @@ -2192,8 +2258,28 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring,
>  			return ret;
>  	}
>  
> -	if (unlikely(ringbuf->space < bytes)) {
> -		ret = ring_wait_for_space(ring, bytes);
> +	/*
> +	 * Add on the reserved size to the request to make sure that after
> +	 * the intended commands have been emitted, there is guaranteed to
> +	 * still be enough free space to send them to the hardware.
> +	 */
> +	max_bytes = bytes + ringbuf->reserved_size;
> +
> +	if (unlikely(ringbuf->space < max_bytes)) {
> +		/*
> +		 * Bytes is guaranteed to fit within the tail of the buffer,
> +		 * but the reserved space may push it off the end. If so then
> +		 * need to wait for the whole of the tail plus the reserved
> +		 * size. That should guarantee that the actual request
> +		 * (bytes) will fit between here and the end and the reserved
> +		 * usage will fit either in the same or at the start. Either
> +		 * way, if a wrap occurs it will not involve a wait and thus
> +		 * cannot fail.
> +		 */
> +		if (unlikely(ringbuf->tail + max_bytes > ringbuf->effective_size))
> +			max_bytes = ringbuf->reserved_size + I915_RING_FREE_SPACE + ringbuf->size - ringbuf->tail;
> +
> +		ret = ring_wait_for_space(ring, max_bytes);
>  		if (unlikely(ret))
>  			return ret;
>  	}
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 39f6dfc..bf2ac28 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -105,6 +105,9 @@ struct intel_ringbuffer {
>  	int space;
>  	int size;
>  	int effective_size;
> +	int reserved_size;
> +	int reserved_tail;
> +	bool reserved_in_use;
>  
>  	/** We track the position of the requests in the ring buffer, and
>  	 * when each is retired we increment last_retired_head as the GPU
> @@ -450,4 +453,26 @@ intel_ring_get_request(struct intel_engine_cs *ring)
>  	return ring->outstanding_lazy_request;
>  }
>  
> +/*
> + * Arbitrary size for largest possible 'add request' sequence. The code paths
> + * are complex and variable. Empirical measurement shows that the worst case
> + * is ILK at 136 words. Reserving too much is better than reserving too little
> + * as that allows for corner cases that might have been missed. So the figure
> + * has been rounded up to 160 words.
> + */
> +#define MIN_SPACE_FOR_ADD_REQUEST	160
> +
> +/*
> + * Reserve space in the ring to guarantee that the i915_add_request() call
> + * will always have sufficient room to do its stuff. The request creation
> + * code calls this automatically.
> + */
> +void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size);
> +/* Cancel the reservation, e.g. because the request is being discarded. */
> +void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf);
> +/* Use the reserved space - for use by i915_add_request() only. */
> +void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf);
> +/* Finish with the reserved space - for use by i915_add_request() only. */
> +void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf);
> +
>  #endif /* _INTEL_RINGBUFFER_H_ */
> -- 
> 1.7.9.5
>
John Harrison June 23, 2015, 11:38 a.m. UTC | #2
On 22/06/2015 21:12, Daniel Vetter wrote:
> On Fri, Jun 19, 2015 at 05:34:12PM +0100, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> It is a bad idea for i915_add_request() to fail. The work will already have been
>> send to the ring and will be processed, but there will not be any tracking or
>> management of that work.
>>
>> The only way the add request call can fail is if it can't write its epilogue
>> commands to the ring (cache flushing, seqno updates, interrupt signalling). The
>> reasons for that are mostly down to running out of ring buffer space and the
>> problems associated with trying to get some more. This patch prevents that
>> situation from happening in the first place.
>>
>> When a request is created, it marks sufficient space as reserved for the
>> epilogue commands. Thus guaranteeing that by the time the epilogue is written,
>> there will be plenty of space for it. Note that a ring_begin() call is required
>> to actually reserve the space (and do any potential waiting). However, that is
>> not currently done at request creation time. This is because the ring_begin()
>> code can allocate a request. Hence calling begin() from the request allocation
>> code would lead to infinite recursion! Later patches in this series remove the
>> need for begin() to do the allocate. At that point, it becomes safe for the
>> allocate to call begin() and really reserve the space.
>>
>> Until then, there is a potential for insufficient space to be available at the
>> point of calling i915_add_request(). However, that would only be in the case
>> where the request was created and immediately submitted without ever calling
>> ring_begin() and adding any work to that request. Which should never happen. And
>> even if it does, and if that request happens to fall down the tiny window of
>> opportunity for failing due to being out of ring space then does it really
>> matter because the request wasn't doing anything in the first place?
>>
>> v2: Updated the 'reserved space too small' warning to include the offending
>> sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
>> re-initialisation of tracking state after a buffer wrap to keep the sanity
>> checks accurate.
>>
>> v3: Incremented the reserved size to accommodate Ironlake (after finally
>> managing to run on an ILK system). Also fixed missing wrap code in LRC mode.
>>
>> v4: Added extra comment and removed duplicate WARN (feedback from Tomas).
>>
>> v5: Re-write of wrap handling to prevent unnecessary early wraps (feedback from
>> Daniel Vetter).
> This didn't actually implement what I suggested (wrapping is the worst
> case, hence skipping the check for that is breaking the sanity check) and
> so changed the patch from "correct, but a bit fragile" to broken. I've
> merged the previous version instead.
> -Daniel
I'm confused. I thought your main issue was the early wrapping not the 
sanity check. The check is to ensure that the reservation is large 
enough to cover all the commands written during request submission. That 
should not be affected by whether a wrap occurs or not. Wrapping does 
not magically add an extra bunch of dwords to the emit_request() call. 
Whereas making the check work with the wrap condition requires adding in 
extra tracking state of exactly where the wrap occurred. That is extra 
code that only exists to catch something in the very rare case which 
should already have been caught in the very common case. I.e. if your 
reserved size is too small then you will hit the warning on every batch 
buffer submission.

John.

>> For: VIZ-5115
>> CC: Tomas Elf <tomas.elf@intel.com>
>> CC: Daniel Vetter <daniel@ffwll.ch>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         |    1 +
>>   drivers/gpu/drm/i915/i915_gem.c         |   37 ++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c        |   35 +++++++++--
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   98 +++++++++++++++++++++++++++++--
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   25 ++++++++
>>   5 files changed, 186 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 0347eb9..eba1857 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2187,6 +2187,7 @@ struct drm_i915_gem_request {
>>   
>>   int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>   			   struct intel_context *ctx);
>> +void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>>   void i915_gem_request_free(struct kref *req_ref);
>>   
>>   static inline uint32_t
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 81f3512..85fa27b 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2485,6 +2485,13 @@ int __i915_add_request(struct intel_engine_cs *ring,
>>   	} else
>>   		ringbuf = ring->buffer;
>>   
>> +	/*
>> +	 * To ensure that this call will not fail, space for its emissions
>> +	 * should already have been reserved in the ring buffer. Let the ring
>> +	 * know that it is time to use that space up.
>> +	 */
>> +	intel_ring_reserved_space_use(ringbuf);
>> +
>>   	request_start = intel_ring_get_tail(ringbuf);
>>   	/*
>>   	 * Emit any outstanding flushes - execbuf can fail to emit the flush
>> @@ -2567,6 +2574,9 @@ int __i915_add_request(struct intel_engine_cs *ring,
>>   			   round_jiffies_up_relative(HZ));
>>   	intel_mark_busy(dev_priv->dev);
>>   
>> +	/* Sanity check that the reserved size was large enough. */
>> +	intel_ring_reserved_space_end(ringbuf);
>> +
>>   	return 0;
>>   }
>>   
>> @@ -2666,6 +2676,26 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>   	if (ret)
>>   		goto err;
>>   
>> +	/*
>> +	 * Reserve space in the ring buffer for all the commands required to
>> +	 * eventually emit this request. This is to guarantee that the
>> +	 * i915_add_request() call can't fail. Note that the reserve may need
>> +	 * to be redone if the request is not actually submitted straight
>> +	 * away, e.g. because a GPU scheduler has deferred it.
>> +	 *
>> +	 * Note further that this call merely notes the reserve request. A
>> +	 * subsequent call to *_ring_begin() is required to actually ensure
>> +	 * that the reservation is available. Without the begin, if the
>> +	 * request creator immediately submitted the request without adding
>> +	 * any commands to it then there might not actually be sufficient
>> +	 * room for the submission commands. Unfortunately, the current
>> +	 * *_ring_begin() implementations potentially call back here to
>> +	 * i915_gem_request_alloc(). Thus calling _begin() here would lead to
>> +	 * infinite recursion! Until that back call path is removed, it is
>> +	 * necessary to do a manual _begin() outside.
>> +	 */
>> +	intel_ring_reserved_space_reserve(req->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
>> +
>>   	ring->outstanding_lazy_request = req;
>>   	return 0;
>>   
>> @@ -2674,6 +2704,13 @@ err:
>>   	return ret;
>>   }
>>   
>> +void i915_gem_request_cancel(struct drm_i915_gem_request *req)
>> +{
>> +	intel_ring_reserved_space_cancel(req->ringbuf);
>> +
>> +	i915_gem_request_unreference(req);
>> +}
>> +
>>   struct drm_i915_gem_request *
>>   i915_gem_find_active_request(struct intel_engine_cs *ring)
>>   {
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 6a5ed07..bd62bd6 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -690,6 +690,9 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
>>   	if (intel_ring_space(ringbuf) >= bytes)
>>   		return 0;
>>   
>> +	/* The whole point of reserving space is to not wait! */
>> +	WARN_ON(ringbuf->reserved_in_use);
>> +
>>   	list_for_each_entry(request, &ring->request_list, list) {
>>   		/*
>>   		 * The request queue is per-engine, so can contain requests
>> @@ -748,8 +751,12 @@ static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
>>   	int rem = ringbuf->size - ringbuf->tail;
>>   
>>   	if (ringbuf->space < rem) {
>> -		int ret = logical_ring_wait_for_space(ringbuf, ctx, rem);
>> +		int ret;
>> +
>> +		/* Can't wait if space has already been reserved! */
>> +		WARN_ON(ringbuf->reserved_in_use);
>>   
>> +		ret = logical_ring_wait_for_space(ringbuf, ctx, rem);
>>   		if (ret)
>>   			return ret;
>>   	}
>> @@ -768,7 +775,7 @@ static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
>>   static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
>>   				struct intel_context *ctx, int bytes)
>>   {
>> -	int ret;
>> +	int ret, max_bytes;
>>   
>>   	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
>>   		ret = logical_ring_wrap_buffer(ringbuf, ctx);
>> @@ -776,8 +783,28 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
>>   			return ret;
>>   	}
>>   
>> -	if (unlikely(ringbuf->space < bytes)) {
>> -		ret = logical_ring_wait_for_space(ringbuf, ctx, bytes);
>> +	/*
>> +	 * Add on the reserved size to the request to make sure that after
>> +	 * the intended commands have been emitted, there is guaranteed to
>> +	 * still be enough free space to send them to the hardware.
>> +	 */
>> +	max_bytes = bytes + ringbuf->reserved_size;
>> +
>> +	if (unlikely(ringbuf->space < max_bytes)) {
>> +		/*
>> +		 * Bytes is guaranteed to fit within the tail of the buffer,
>> +		 * but the reserved space may push it off the end. If so then
>> +		 * need to wait for the whole of the tail plus the reserved
>> +		 * size. That should guarantee that the actual request
>> +		 * (bytes) will fit between here and the end and the reserved
>> +		 * usage will fit either in the same or at the start. Either
>> +		 * way, if a wrap occurs it will not involve a wait and thus
>> +		 * cannot fail.
>> +		 */
>> +		if (unlikely(ringbuf->tail + max_bytes + I915_RING_FREE_SPACE > ringbuf->effective_size))
>> +			max_bytes = ringbuf->reserved_size + I915_RING_FREE_SPACE + ringbuf->size - ringbuf->tail;
>> +
>> +		ret = logical_ring_wait_for_space(ringbuf, ctx, max_bytes);
>>   		if (unlikely(ret))
>>   			return ret;
>>   	}
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index d934f85..1c125e9 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2106,6 +2106,9 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
>>   	if (intel_ring_space(ringbuf) >= n)
>>   		return 0;
>>   
>> +	/* The whole point of reserving space is to not wait! */
>> +	WARN_ON(ringbuf->reserved_in_use);
>> +
>>   	list_for_each_entry(request, &ring->request_list, list) {
>>   		space = __intel_ring_space(request->postfix, ringbuf->tail,
>>   					   ringbuf->size);
>> @@ -2131,7 +2134,12 @@ static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
>>   	int rem = ringbuf->size - ringbuf->tail;
>>   
>>   	if (ringbuf->space < rem) {
>> -		int ret = ring_wait_for_space(ring, rem);
>> +		int ret;
>> +
>> +		/* Can't wait if space has already been reserved! */
>> +		WARN_ON(ringbuf->reserved_in_use);
>> +
>> +		ret = ring_wait_for_space(ring, rem);
>>   		if (ret)
>>   			return ret;
>>   	}
>> @@ -2180,11 +2188,69 @@ int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
>>   	return 0;
>>   }
>>   
>> -static int __intel_ring_prepare(struct intel_engine_cs *ring,
>> -				int bytes)
>> +void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size)
>> +{
>> +	/* NB: Until request management is fully tidied up and the OLR is
>> +	 * removed, there are too many ways for get false hits on this
>> +	 * anti-recursion check! */
>> +	/*WARN_ON(ringbuf->reserved_size);*/
>> +	WARN_ON(ringbuf->reserved_in_use);
>> +
>> +	ringbuf->reserved_size = size;
>> +
>> +	/*
>> +	 * Really need to call _begin() here but that currently leads to
>> +	 * recursion problems! This will be fixed later but for now just
>> +	 * return and hope for the best. Note that there is only a real
>> +	 * problem if the create of the request never actually calls _begin()
>> +	 * but if they are not submitting any work then why did they create
>> +	 * the request in the first place?
>> +	 */
>> +}
>> +
>> +void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf)
>> +{
>> +	WARN_ON(ringbuf->reserved_in_use);
>> +
>> +	ringbuf->reserved_size   = 0;
>> +	ringbuf->reserved_in_use = false;
>> +}
>> +
>> +void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf)
>> +{
>> +	WARN_ON(ringbuf->reserved_in_use);
>> +
>> +	ringbuf->reserved_in_use = true;
>> +	ringbuf->reserved_tail   = ringbuf->tail;
>> +}
>> +
>> +void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf)
>> +{
>> +	WARN_ON(!ringbuf->reserved_in_use);
>> +	if (ringbuf->tail > ringbuf->reserved_tail) {
>> +		WARN(ringbuf->tail > ringbuf->reserved_tail + ringbuf->reserved_size,
>> +		     "request reserved size too small: %d vs %d!\n",
>> +		     ringbuf->tail - ringbuf->reserved_tail, ringbuf->reserved_size);
>> +	} else {
>> +		/*
>> +		 * The ring was wrapped while the reserved space was in use.
>> +		 * That means that some unknown amount of the ring tail was
>> +		 * no-op filled and skipped. Thus simply adding the ring size
>> +		 * to the tail and doing the above space check will not work.
>> +		 * Rather than attempt to track how much tail was skipped,
>> +		 * it is much simpler to say that also skipping the sanity
>> +		 * check every once in a while is not a big issue.
>> +		 */
>> +	}
>> +
>> +	ringbuf->reserved_size   = 0;
>> +	ringbuf->reserved_in_use = false;
>> +}
>> +
>> +static int __intel_ring_prepare(struct intel_engine_cs *ring, int bytes)
>>   {
>>   	struct intel_ringbuffer *ringbuf = ring->buffer;
>> -	int ret;
>> +	int ret, max_bytes;
>>   
>>   	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
>>   		ret = intel_wrap_ring_buffer(ring);
>> @@ -2192,8 +2258,28 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring,
>>   			return ret;
>>   	}
>>   
>> -	if (unlikely(ringbuf->space < bytes)) {
>> -		ret = ring_wait_for_space(ring, bytes);
>> +	/*
>> +	 * Add on the reserved size to the request to make sure that after
>> +	 * the intended commands have been emitted, there is guaranteed to
>> +	 * still be enough free space to send them to the hardware.
>> +	 */
>> +	max_bytes = bytes + ringbuf->reserved_size;
>> +
>> +	if (unlikely(ringbuf->space < max_bytes)) {
>> +		/*
>> +		 * Bytes is guaranteed to fit within the tail of the buffer,
>> +		 * but the reserved space may push it off the end. If so then
>> +		 * need to wait for the whole of the tail plus the reserved
>> +		 * size. That should guarantee that the actual request
>> +		 * (bytes) will fit between here and the end and the reserved
>> +		 * usage will fit either in the same or at the start. Either
>> +		 * way, if a wrap occurs it will not involve a wait and thus
>> +		 * cannot fail.
>> +		 */
>> +		if (unlikely(ringbuf->tail + max_bytes > ringbuf->effective_size))
>> +			max_bytes = ringbuf->reserved_size + I915_RING_FREE_SPACE + ringbuf->size - ringbuf->tail;
>> +
>> +		ret = ring_wait_for_space(ring, max_bytes);
>>   		if (unlikely(ret))
>>   			return ret;
>>   	}
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 39f6dfc..bf2ac28 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -105,6 +105,9 @@ struct intel_ringbuffer {
>>   	int space;
>>   	int size;
>>   	int effective_size;
>> +	int reserved_size;
>> +	int reserved_tail;
>> +	bool reserved_in_use;
>>   
>>   	/** We track the position of the requests in the ring buffer, and
>>   	 * when each is retired we increment last_retired_head as the GPU
>> @@ -450,4 +453,26 @@ intel_ring_get_request(struct intel_engine_cs *ring)
>>   	return ring->outstanding_lazy_request;
>>   }
>>   
>> +/*
>> + * Arbitrary size for largest possible 'add request' sequence. The code paths
>> + * are complex and variable. Empirical measurement shows that the worst case
>> + * is ILK at 136 words. Reserving too much is better than reserving too little
>> + * as that allows for corner cases that might have been missed. So the figure
>> + * has been rounded up to 160 words.
>> + */
>> +#define MIN_SPACE_FOR_ADD_REQUEST	160
>> +
>> +/*
>> + * Reserve space in the ring to guarantee that the i915_add_request() call
>> + * will always have sufficient room to do its stuff. The request creation
>> + * code calls this automatically.
>> + */
>> +void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size);
>> +/* Cancel the reservation, e.g. because the request is being discarded. */
>> +void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf);
>> +/* Use the reserved space - for use by i915_add_request() only. */
>> +void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf);
>> +/* Finish with the reserved space - for use by i915_add_request() only. */
>> +void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf);
>> +
>>   #endif /* _INTEL_RINGBUFFER_H_ */
>> -- 
>> 1.7.9.5
>>
Daniel Vetter June 23, 2015, 1:24 p.m. UTC | #3
On Tue, Jun 23, 2015 at 12:38:01PM +0100, John Harrison wrote:
> On 22/06/2015 21:12, Daniel Vetter wrote:
> >On Fri, Jun 19, 2015 at 05:34:12PM +0100, John.C.Harrison@Intel.com wrote:
> >>From: John Harrison <John.C.Harrison@Intel.com>
> >>
> >>It is a bad idea for i915_add_request() to fail. The work will already have been
> >>send to the ring and will be processed, but there will not be any tracking or
> >>management of that work.
> >>
> >>The only way the add request call can fail is if it can't write its epilogue
> >>commands to the ring (cache flushing, seqno updates, interrupt signalling). The
> >>reasons for that are mostly down to running out of ring buffer space and the
> >>problems associated with trying to get some more. This patch prevents that
> >>situation from happening in the first place.
> >>
> >>When a request is created, it marks sufficient space as reserved for the
> >>epilogue commands. Thus guaranteeing that by the time the epilogue is written,
> >>there will be plenty of space for it. Note that a ring_begin() call is required
> >>to actually reserve the space (and do any potential waiting). However, that is
> >>not currently done at request creation time. This is because the ring_begin()
> >>code can allocate a request. Hence calling begin() from the request allocation
> >>code would lead to infinite recursion! Later patches in this series remove the
> >>need for begin() to do the allocate. At that point, it becomes safe for the
> >>allocate to call begin() and really reserve the space.
> >>
> >>Until then, there is a potential for insufficient space to be available at the
> >>point of calling i915_add_request(). However, that would only be in the case
> >>where the request was created and immediately submitted without ever calling
> >>ring_begin() and adding any work to that request. Which should never happen. And
> >>even if it does, and if that request happens to fall down the tiny window of
> >>opportunity for failing due to being out of ring space then does it really
> >>matter because the request wasn't doing anything in the first place?
> >>
> >>v2: Updated the 'reserved space too small' warning to include the offending
> >>sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
> >>re-initialisation of tracking state after a buffer wrap to keep the sanity
> >>checks accurate.
> >>
> >>v3: Incremented the reserved size to accommodate Ironlake (after finally
> >>managing to run on an ILK system). Also fixed missing wrap code in LRC mode.
> >>
> >>v4: Added extra comment and removed duplicate WARN (feedback from Tomas).
> >>
> >>v5: Re-write of wrap handling to prevent unnecessary early wraps (feedback from
> >>Daniel Vetter).
> >This didn't actually implement what I suggested (wrapping is the worst
> >case, hence skipping the check for that is breaking the sanity check) and
> >so changed the patch from "correct, but a bit fragile" to broken. I've
> >merged the previous version instead.
> >-Daniel
> I'm confused. I thought your main issue was the early wrapping not the
> sanity check. The check is to ensure that the reservation is large enough to
> cover all the commands written during request submission. That should not be
> affected by whether a wrap occurs or not. Wrapping does not magically add an
> extra bunch of dwords to the emit_request() call. Whereas making the check
> work with the wrap condition requires adding in extra tracking state of
> exactly where the wrap occurred. That is extra code that only exists to
> catch something in the very rare case which should already have been caught
> in the very common case. I.e. if your reserved size is too small then you
> will hit the warning on every batch buffer submission.

The problem is that if you allow a wrap in the reserve size then the
ringspace requirements are bigger than if you don't wrap. And since the
add request is split up into many intel_ring_begin that's possible. Hence
if you allow wrapping in the reserved space, then the most important case
for the debug check is to make sure that it catches any kind of
reservation overflow while wrapping. The not-wrapped case is probably the
boring one.

And indeed eventually we should overflow since according to your comment
the worst case add request on ilk is 136 dwords. And the largest
intel_ring_begin in there is 32 dwords, which means at most we'll throw
away 31 dwords when wrapping. Which means the 160 dwords of reservation
are not enough since we'd need 167 dwords of space for the worst case. But
since the space_end debug check was a no-op for the wrapped case you won't
catch this one.

Wrt keeping track of wrapping while the reservation is in use, the
following should do that without any need of additional tracking:


	int used_size = ringbuf->tail - ringbuf->reserved_tail;

	if (used_size < 0)
		used_size += ringbuf->size;

	WARN(used_size < ringbuf->reserved_size,
	     "request reserved size too small: %d vs %d!\n",
	     used_size, ringbuf->reserved_size);

I was mistaken that you can reuse __intel_ring_space (since that has
slightly different requirements), but this gives you a nicely localized
check for reservation overflow which works even when you wrap. Ofc it
won't work if an add_request is bigger than the entire ring, but that's
impossible anyway since we can at most reserve ringbuf->size -
I915_RING_FREE_SPACE.

Or do I still miss something?
-Daniel
John Harrison June 23, 2015, 3:43 p.m. UTC | #4
On 23/06/2015 14:24, Daniel Vetter wrote:
> On Tue, Jun 23, 2015 at 12:38:01PM +0100, John Harrison wrote:
>> On 22/06/2015 21:12, Daniel Vetter wrote:
>>> On Fri, Jun 19, 2015 at 05:34:12PM +0100, John.C.Harrison@Intel.com wrote:
>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>
>>>> It is a bad idea for i915_add_request() to fail. The work will already have been
>>>> send to the ring and will be processed, but there will not be any tracking or
>>>> management of that work.
>>>>
>>>> The only way the add request call can fail is if it can't write its epilogue
>>>> commands to the ring (cache flushing, seqno updates, interrupt signalling). The
>>>> reasons for that are mostly down to running out of ring buffer space and the
>>>> problems associated with trying to get some more. This patch prevents that
>>>> situation from happening in the first place.
>>>>
>>>> When a request is created, it marks sufficient space as reserved for the
>>>> epilogue commands. Thus guaranteeing that by the time the epilogue is written,
>>>> there will be plenty of space for it. Note that a ring_begin() call is required
>>>> to actually reserve the space (and do any potential waiting). However, that is
>>>> not currently done at request creation time. This is because the ring_begin()
>>>> code can allocate a request. Hence calling begin() from the request allocation
>>>> code would lead to infinite recursion! Later patches in this series remove the
>>>> need for begin() to do the allocate. At that point, it becomes safe for the
>>>> allocate to call begin() and really reserve the space.
>>>>
>>>> Until then, there is a potential for insufficient space to be available at the
>>>> point of calling i915_add_request(). However, that would only be in the case
>>>> where the request was created and immediately submitted without ever calling
>>>> ring_begin() and adding any work to that request. Which should never happen. And
>>>> even if it does, and if that request happens to fall down the tiny window of
>>>> opportunity for failing due to being out of ring space then does it really
>>>> matter because the request wasn't doing anything in the first place?
>>>>
>>>> v2: Updated the 'reserved space too small' warning to include the offending
>>>> sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
>>>> re-initialisation of tracking state after a buffer wrap to keep the sanity
>>>> checks accurate.
>>>>
>>>> v3: Incremented the reserved size to accommodate Ironlake (after finally
>>>> managing to run on an ILK system). Also fixed missing wrap code in LRC mode.
>>>>
>>>> v4: Added extra comment and removed duplicate WARN (feedback from Tomas).
>>>>
>>>> v5: Re-write of wrap handling to prevent unnecessary early wraps (feedback from
>>>> Daniel Vetter).
>>> This didn't actually implement what I suggested (wrapping is the worst
>>> case, hence skipping the check for that is breaking the sanity check) and
>>> so changed the patch from "correct, but a bit fragile" to broken. I've
>>> merged the previous version instead.
>>> -Daniel
>> I'm confused. I thought your main issue was the early wrapping not the
>> sanity check. The check is to ensure that the reservation is large enough to
>> cover all the commands written during request submission. That should not be
>> affected by whether a wrap occurs or not. Wrapping does not magically add an
>> extra bunch of dwords to the emit_request() call. Whereas making the check
>> work with the wrap condition requires adding in extra tracking state of
>> exactly where the wrap occurred. That is extra code that only exists to
>> catch something in the very rare case which should already have been caught
>> in the very common case. I.e. if your reserved size is too small then you
>> will hit the warning on every batch buffer submission.
> The problem is that if you allow a wrap in the reserve size then the
> ringspace requirements are bigger than if you don't wrap. And since the
> add request is split up into many intel_ring_begin that's possible. Hence
> if you allow wrapping in the reserved space, then the most important case
> for the debug check is to make sure that it catches any kind of
> reservation overflow while wrapping. The not-wrapped case is probably the
> boring one.
>
> And indeed eventually we should overflow since according to your comment
> the worst case add request on ilk is 136 dwords. And the largest
> intel_ring_begin in there is 32 dwords, which means at most we'll throw
> away 31 dwords when wrapping. Which means the 160 dwords of reservation
> are not enough since we'd need 167 dwords of space for the worst case. But
> since the space_end debug check was a no-op for the wrapped case you won't
> catch this one.

The minimum reservation size in this case is still only 136. The prepare 
code checks for the 32 words actually requested and wraps if necessary. 
It then checks for 136+32 words of space. If that would cause a wrap it 
will then add on the amount of space actually left in the ring and wait 
for that bigger total. That guarantees that it has waited for the 136 at 
the start of the ring. The caller is then free to fill in the 32 words 
and there is still guaranteed to be a minimum of 136 words available 
(with or without wrapping) before any further wait for space is 
necessary. Thus the add_request() code is safe from fear of failure 
irrespective of where any wrap might occur.


>
> Wrt keeping track of wrapping while the reservation is in use, the
> following should do that without any need of additional tracking:
>
>
> 	int used_size = ringbuf->tail - ringbuf->reserved_tail;
>
> 	if (used_size < 0)
> 		used_size += ringbuf->size;
>
> 	WARN(used_size < ringbuf->reserved_size,
> 	     "request reserved size too small: %d vs %d!\n",
> 	     used_size, ringbuf->reserved_size);
>
> I was mistaken that you can reuse __intel_ring_space (since that has
> slightly different requirements), but this gives you a nicely localized
> check for reservation overflow which works even when you wrap. Ofc it
> won't work if an add_request is bigger than the entire ring, but that's
> impossible anyway since we can at most reserve ringbuf->size -
> I915_RING_FREE_SPACE.
The problem with the above calculation is that it includes the wasted 
space at the end of the ring. Thus it will complain the reserved size 
was too small when in fact it was just fine.


> Or do I still miss something?
> -Daniel
Daniel Vetter June 23, 2015, 8 p.m. UTC | #5
On Tue, Jun 23, 2015 at 04:43:24PM +0100, John Harrison wrote:
> On 23/06/2015 14:24, Daniel Vetter wrote:
> >On Tue, Jun 23, 2015 at 12:38:01PM +0100, John Harrison wrote:
> >>On 22/06/2015 21:12, Daniel Vetter wrote:
> >>>On Fri, Jun 19, 2015 at 05:34:12PM +0100, John.C.Harrison@Intel.com wrote:
> >>>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>>
> >>>>It is a bad idea for i915_add_request() to fail. The work will already have been
> >>>>send to the ring and will be processed, but there will not be any tracking or
> >>>>management of that work.
> >>>>
> >>>>The only way the add request call can fail is if it can't write its epilogue
> >>>>commands to the ring (cache flushing, seqno updates, interrupt signalling). The
> >>>>reasons for that are mostly down to running out of ring buffer space and the
> >>>>problems associated with trying to get some more. This patch prevents that
> >>>>situation from happening in the first place.
> >>>>
> >>>>When a request is created, it marks sufficient space as reserved for the
> >>>>epilogue commands. Thus guaranteeing that by the time the epilogue is written,
> >>>>there will be plenty of space for it. Note that a ring_begin() call is required
> >>>>to actually reserve the space (and do any potential waiting). However, that is
> >>>>not currently done at request creation time. This is because the ring_begin()
> >>>>code can allocate a request. Hence calling begin() from the request allocation
> >>>>code would lead to infinite recursion! Later patches in this series remove the
> >>>>need for begin() to do the allocate. At that point, it becomes safe for the
> >>>>allocate to call begin() and really reserve the space.
> >>>>
> >>>>Until then, there is a potential for insufficient space to be available at the
> >>>>point of calling i915_add_request(). However, that would only be in the case
> >>>>where the request was created and immediately submitted without ever calling
> >>>>ring_begin() and adding any work to that request. Which should never happen. And
> >>>>even if it does, and if that request happens to fall down the tiny window of
> >>>>opportunity for failing due to being out of ring space then does it really
> >>>>matter because the request wasn't doing anything in the first place?
> >>>>
> >>>>v2: Updated the 'reserved space too small' warning to include the offending
> >>>>sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
> >>>>re-initialisation of tracking state after a buffer wrap to keep the sanity
> >>>>checks accurate.
> >>>>
> >>>>v3: Incremented the reserved size to accommodate Ironlake (after finally
> >>>>managing to run on an ILK system). Also fixed missing wrap code in LRC mode.
> >>>>
> >>>>v4: Added extra comment and removed duplicate WARN (feedback from Tomas).
> >>>>
> >>>>v5: Re-write of wrap handling to prevent unnecessary early wraps (feedback from
> >>>>Daniel Vetter).
> >>>This didn't actually implement what I suggested (wrapping is the worst
> >>>case, hence skipping the check for that is breaking the sanity check) and
> >>>so changed the patch from "correct, but a bit fragile" to broken. I've
> >>>merged the previous version instead.
> >>>-Daniel
> >>I'm confused. I thought your main issue was the early wrapping not the
> >>sanity check. The check is to ensure that the reservation is large enough to
> >>cover all the commands written during request submission. That should not be
> >>affected by whether a wrap occurs or not. Wrapping does not magically add an
> >>extra bunch of dwords to the emit_request() call. Whereas making the check
> >>work with the wrap condition requires adding in extra tracking state of
> >>exactly where the wrap occurred. That is extra code that only exists to
> >>catch something in the very rare case which should already have been caught
> >>in the very common case. I.e. if your reserved size is too small then you
> >>will hit the warning on every batch buffer submission.
> >The problem is that if you allow a wrap in the reserve size then the
> >ringspace requirements are bigger than if you don't wrap. And since the
> >add request is split up into many intel_ring_begin that's possible. Hence
> >if you allow wrapping in the reserved space, then the most important case
> >for the debug check is to make sure that it catches any kind of
> >reservation overflow while wrapping. The not-wrapped case is probably the
> >boring one.
> >
> >And indeed eventually we should overflow since according to your comment
> >the worst case add request on ilk is 136 dwords. And the largest
> >intel_ring_begin in there is 32 dwords, which means at most we'll throw
> >away 31 dwords when wrapping. Which means the 160 dwords of reservation
> >are not enough since we'd need 167 dwords of space for the worst case. But
> >since the space_end debug check was a no-op for the wrapped case you won't
> >catch this one.
> 
> The minimum reservation size in this case is still only 136. The prepare
> code checks for the 32 words actually requested and wraps if necessary. It
> then checks for 136+32 words of space. If that would cause a wrap it will
> then add on the amount of space actually left in the ring and wait for that
> bigger total. That guarantees that it has waited for the 136 at the start of
> the ring. The caller is then free to fill in the 32 words and there is still
> guaranteed to be a minimum of 136 words available (with or without wrapping)
> before any further wait for space is necessary. Thus the add_request() code
> is safe from fear of failure irrespective of where any wrap might occur.
> 
> 
> >
> >Wrt keeping track of wrapping while the reservation is in use, the
> >following should do that without any need of additional tracking:
> >
> >
> >	int used_size = ringbuf->tail - ringbuf->reserved_tail;
> >
> >	if (used_size < 0)
> >		used_size += ringbuf->size;
> >
> >	WARN(used_size < ringbuf->reserved_size,
> >	     "request reserved size too small: %d vs %d!\n",
> >	     used_size, ringbuf->reserved_size);
> >
> >I was mistaken that you can reuse __intel_ring_space (since that has
> >slightly different requirements), but this gives you a nicely localized
> >check for reservation overflow which works even when you wrap. Ofc it
> >won't work if an add_request is bigger than the entire ring, but that's
> >impossible anyway since we can at most reserve ringbuf->size -
> >I915_RING_FREE_SPACE.
> The problem with the above calculation is that it includes the wasted space
> at the end of the ring. Thus it will complain the reserved size was too
> small when in fact it was just fine.

Ok I again misunderstood your patch a bit since it didn't quite do what I
expect, and I stand corrected that v5 works too. But I still seem to fail
to get my main concern across. I'll see whether I can whip up a patch as a
short demonstration, maybe that helps to unconfuse this dicussion.

For now I think we're covered with either v4 or v5 so sticking with either
is ok with me.
-Daniel
John Harrison June 24, 2015, 12:18 p.m. UTC | #6
On 23/06/2015 21:00, Daniel Vetter wrote:
> On Tue, Jun 23, 2015 at 04:43:24PM +0100, John Harrison wrote:
>> On 23/06/2015 14:24, Daniel Vetter wrote:
>>> On Tue, Jun 23, 2015 at 12:38:01PM +0100, John Harrison wrote:
>>>> On 22/06/2015 21:12, Daniel Vetter wrote:
>>>>> On Fri, Jun 19, 2015 at 05:34:12PM +0100, John.C.Harrison@Intel.com wrote:
>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>
>>>>>> It is a bad idea for i915_add_request() to fail. The work will already have been
>>>>>> send to the ring and will be processed, but there will not be any tracking or
>>>>>> management of that work.
>>>>>>
>>>>>> The only way the add request call can fail is if it can't write its epilogue
>>>>>> commands to the ring (cache flushing, seqno updates, interrupt signalling). The
>>>>>> reasons for that are mostly down to running out of ring buffer space and the
>>>>>> problems associated with trying to get some more. This patch prevents that
>>>>>> situation from happening in the first place.
>>>>>>
>>>>>> When a request is created, it marks sufficient space as reserved for the
>>>>>> epilogue commands. Thus guaranteeing that by the time the epilogue is written,
>>>>>> there will be plenty of space for it. Note that a ring_begin() call is required
>>>>>> to actually reserve the space (and do any potential waiting). However, that is
>>>>>> not currently done at request creation time. This is because the ring_begin()
>>>>>> code can allocate a request. Hence calling begin() from the request allocation
>>>>>> code would lead to infinite recursion! Later patches in this series remove the
>>>>>> need for begin() to do the allocate. At that point, it becomes safe for the
>>>>>> allocate to call begin() and really reserve the space.
>>>>>>
>>>>>> Until then, there is a potential for insufficient space to be available at the
>>>>>> point of calling i915_add_request(). However, that would only be in the case
>>>>>> where the request was created and immediately submitted without ever calling
>>>>>> ring_begin() and adding any work to that request. Which should never happen. And
>>>>>> even if it does, and if that request happens to fall down the tiny window of
>>>>>> opportunity for failing due to being out of ring space then does it really
>>>>>> matter because the request wasn't doing anything in the first place?
>>>>>>
>>>>>> v2: Updated the 'reserved space too small' warning to include the offending
>>>>>> sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
>>>>>> re-initialisation of tracking state after a buffer wrap to keep the sanity
>>>>>> checks accurate.
>>>>>>
>>>>>> v3: Incremented the reserved size to accommodate Ironlake (after finally
>>>>>> managing to run on an ILK system). Also fixed missing wrap code in LRC mode.
>>>>>>
>>>>>> v4: Added extra comment and removed duplicate WARN (feedback from Tomas).
>>>>>>
>>>>>> v5: Re-write of wrap handling to prevent unnecessary early wraps (feedback from
>>>>>> Daniel Vetter).
>>>>> This didn't actually implement what I suggested (wrapping is the worst
>>>>> case, hence skipping the check for that is breaking the sanity check) and
>>>>> so changed the patch from "correct, but a bit fragile" to broken. I've
>>>>> merged the previous version instead.
>>>>> -Daniel
>>>> I'm confused. I thought your main issue was the early wrapping not the
>>>> sanity check. The check is to ensure that the reservation is large enough to
>>>> cover all the commands written during request submission. That should not be
>>>> affected by whether a wrap occurs or not. Wrapping does not magically add an
>>>> extra bunch of dwords to the emit_request() call. Whereas making the check
>>>> work with the wrap condition requires adding in extra tracking state of
>>>> exactly where the wrap occurred. That is extra code that only exists to
>>>> catch something in the very rare case which should already have been caught
>>>> in the very common case. I.e. if your reserved size is too small then you
>>>> will hit the warning on every batch buffer submission.
>>> The problem is that if you allow a wrap in the reserve size then the
>>> ringspace requirements are bigger than if you don't wrap. And since the
>>> add request is split up into many intel_ring_begin that's possible. Hence
>>> if you allow wrapping in the reserved space, then the most important case
>>> for the debug check is to make sure that it catches any kind of
>>> reservation overflow while wrapping. The not-wrapped case is probably the
>>> boring one.
>>>
>>> And indeed eventually we should overflow since according to your comment
>>> the worst case add request on ilk is 136 dwords. And the largest
>>> intel_ring_begin in there is 32 dwords, which means at most we'll throw
>>> away 31 dwords when wrapping. Which means the 160 dwords of reservation
>>> are not enough since we'd need 167 dwords of space for the worst case. But
>>> since the space_end debug check was a no-op for the wrapped case you won't
>>> catch this one.
>> The minimum reservation size in this case is still only 136. The prepare
>> code checks for the 32 words actually requested and wraps if necessary. It
>> then checks for 136+32 words of space. If that would cause a wrap it will
>> then add on the amount of space actually left in the ring and wait for that
>> bigger total. That guarantees that it has waited for the 136 at the start of
>> the ring. The caller is then free to fill in the 32 words and there is still
>> guaranteed to be a minimum of 136 words available (with or without wrapping)
>> before any further wait for space is necessary. Thus the add_request() code
>> is safe from fear of failure irrespective of where any wrap might occur.
>>
>>
>>> Wrt keeping track of wrapping while the reservation is in use, the
>>> following should do that without any need of additional tracking:
>>>
>>>
>>> 	int used_size = ringbuf->tail - ringbuf->reserved_tail;
>>>
>>> 	if (used_size < 0)
>>> 		used_size += ringbuf->size;
>>>
>>> 	WARN(used_size < ringbuf->reserved_size,
>>> 	     "request reserved size too small: %d vs %d!\n",
>>> 	     used_size, ringbuf->reserved_size);
>>>
>>> I was mistaken that you can reuse __intel_ring_space (since that has
>>> slightly different requirements), but this gives you a nicely localized
>>> check for reservation overflow which works even when you wrap. Ofc it
>>> won't work if an add_request is bigger than the entire ring, but that's
>>> impossible anyway since we can at most reserve ringbuf->size -
>>> I915_RING_FREE_SPACE.
>> The problem with the above calculation is that it includes the wasted space
>> at the end of the ring. Thus it will complain the reserved size was too
>> small when in fact it was just fine.
> Ok I again misunderstood your patch a bit since it didn't quite do what I
> expect, and I stand corrected that v5 works too. But I still seem to fail
> to get my main concern across. I'll see whether I can whip up a patch as a
> short demonstration, maybe that helps to unconfuse this dicussion.
>
> For now I think we're covered with either v4 or v5 so sticking with either
> is ok with me.
> -Daniel

I think v5 is much better. It reduces the ring space wastage which I 
thought was your main concern.

The problem with a more simplistic approach that just doubles the 
minimum reserve size to ensure that it will fit before or after a wrap 
is that you are doubling the reserve size. That too is rather wasteful 
of ring space. It also means that you only find out when the reserve 
size is too small when you hit the maximum usage coincident with a worst 
case wrap point. Whereas the v5 method means that you notice a too small 
reserve whether wrapping or not.

John.
Daniel Vetter June 24, 2015, 12:45 p.m. UTC | #7
On Wed, Jun 24, 2015 at 01:18:48PM +0100, John Harrison wrote:
> On 23/06/2015 21:00, Daniel Vetter wrote:
> >On Tue, Jun 23, 2015 at 04:43:24PM +0100, John Harrison wrote:
> >>On 23/06/2015 14:24, Daniel Vetter wrote:
> >>>On Tue, Jun 23, 2015 at 12:38:01PM +0100, John Harrison wrote:
> >>>>On 22/06/2015 21:12, Daniel Vetter wrote:
> >>>>>On Fri, Jun 19, 2015 at 05:34:12PM +0100, John.C.Harrison@Intel.com wrote:
> >>>>>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>>>>
> >>>>>>It is a bad idea for i915_add_request() to fail. The work will already have been
> >>>>>>send to the ring and will be processed, but there will not be any tracking or
> >>>>>>management of that work.
> >>>>>>
> >>>>>>The only way the add request call can fail is if it can't write its epilogue
> >>>>>>commands to the ring (cache flushing, seqno updates, interrupt signalling). The
> >>>>>>reasons for that are mostly down to running out of ring buffer space and the
> >>>>>>problems associated with trying to get some more. This patch prevents that
> >>>>>>situation from happening in the first place.
> >>>>>>
> >>>>>>When a request is created, it marks sufficient space as reserved for the
> >>>>>>epilogue commands. Thus guaranteeing that by the time the epilogue is written,
> >>>>>>there will be plenty of space for it. Note that a ring_begin() call is required
> >>>>>>to actually reserve the space (and do any potential waiting). However, that is
> >>>>>>not currently done at request creation time. This is because the ring_begin()
> >>>>>>code can allocate a request. Hence calling begin() from the request allocation
> >>>>>>code would lead to infinite recursion! Later patches in this series remove the
> >>>>>>need for begin() to do the allocate. At that point, it becomes safe for the
> >>>>>>allocate to call begin() and really reserve the space.
> >>>>>>
> >>>>>>Until then, there is a potential for insufficient space to be available at the
> >>>>>>point of calling i915_add_request(). However, that would only be in the case
> >>>>>>where the request was created and immediately submitted without ever calling
> >>>>>>ring_begin() and adding any work to that request. Which should never happen. And
> >>>>>>even if it does, and if that request happens to fall down the tiny window of
> >>>>>>opportunity for failing due to being out of ring space then does it really
> >>>>>>matter because the request wasn't doing anything in the first place?
> >>>>>>
> >>>>>>v2: Updated the 'reserved space too small' warning to include the offending
> >>>>>>sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
> >>>>>>re-initialisation of tracking state after a buffer wrap to keep the sanity
> >>>>>>checks accurate.
> >>>>>>
> >>>>>>v3: Incremented the reserved size to accommodate Ironlake (after finally
> >>>>>>managing to run on an ILK system). Also fixed missing wrap code in LRC mode.
> >>>>>>
> >>>>>>v4: Added extra comment and removed duplicate WARN (feedback from Tomas).
> >>>>>>
> >>>>>>v5: Re-write of wrap handling to prevent unnecessary early wraps (feedback from
> >>>>>>Daniel Vetter).
> >>>>>This didn't actually implement what I suggested (wrapping is the worst
> >>>>>case, hence skipping the check for that is breaking the sanity check) and
> >>>>>so changed the patch from "correct, but a bit fragile" to broken. I've
> >>>>>merged the previous version instead.
> >>>>>-Daniel
> >>>>I'm confused. I thought your main issue was the early wrapping not the
> >>>>sanity check. The check is to ensure that the reservation is large enough to
> >>>>cover all the commands written during request submission. That should not be
> >>>>affected by whether a wrap occurs or not. Wrapping does not magically add an
> >>>>extra bunch of dwords to the emit_request() call. Whereas making the check
> >>>>work with the wrap condition requires adding in extra tracking state of
> >>>>exactly where the wrap occurred. That is extra code that only exists to
> >>>>catch something in the very rare case which should already have been caught
> >>>>in the very common case. I.e. if your reserved size is too small then you
> >>>>will hit the warning on every batch buffer submission.
> >>>The problem is that if you allow a wrap in the reserve size then the
> >>>ringspace requirements are bigger than if you don't wrap. And since the
> >>>add request is split up into many intel_ring_begin that's possible. Hence
> >>>if you allow wrapping in the reserved space, then the most important case
> >>>for the debug check is to make sure that it catches any kind of
> >>>reservation overflow while wrapping. The not-wrapped case is probably the
> >>>boring one.
> >>>
> >>>And indeed eventually we should overflow since according to your comment
> >>>the worst case add request on ilk is 136 dwords. And the largest
> >>>intel_ring_begin in there is 32 dwords, which means at most we'll throw
> >>>away 31 dwords when wrapping. Which means the 160 dwords of reservation
> >>>are not enough since we'd need 167 dwords of space for the worst case. But
> >>>since the space_end debug check was a no-op for the wrapped case you won't
> >>>catch this one.
> >>The minimum reservation size in this case is still only 136. The prepare
> >>code checks for the 32 words actually requested and wraps if necessary. It
> >>then checks for 136+32 words of space. If that would cause a wrap it will
> >>then add on the amount of space actually left in the ring and wait for that
> >>bigger total. That guarantees that it has waited for the 136 at the start of
> >>the ring. The caller is then free to fill in the 32 words and there is still
> >>guaranteed to be a minimum of 136 words available (with or without wrapping)
> >>before any further wait for space is necessary. Thus the add_request() code
> >>is safe from fear of failure irrespective of where any wrap might occur.
> >>
> >>
> >>>Wrt keeping track of wrapping while the reservation is in use, the
> >>>following should do that without any need of additional tracking:
> >>>
> >>>
> >>>	int used_size = ringbuf->tail - ringbuf->reserved_tail;
> >>>
> >>>	if (used_size < 0)
> >>>		used_size += ringbuf->size;
> >>>
> >>>	WARN(used_size < ringbuf->reserved_size,
> >>>	     "request reserved size too small: %d vs %d!\n",
> >>>	     used_size, ringbuf->reserved_size);
> >>>
> >>>I was mistaken that you can reuse __intel_ring_space (since that has
> >>>slightly different requirements), but this gives you a nicely localized
> >>>check for reservation overflow which works even when you wrap. Ofc it
> >>>won't work if an add_request is bigger than the entire ring, but that's
> >>>impossible anyway since we can at most reserve ringbuf->size -
> >>>I915_RING_FREE_SPACE.
> >>The problem with the above calculation is that it includes the wasted space
> >>at the end of the ring. Thus it will complain the reserved size was too
> >>small when in fact it was just fine.
> >Ok I again misunderstood your patch a bit since it didn't quite do what I
> >expect, and I stand corrected that v5 works too. But I still seem to fail
> >to get my main concern across. I'll see whether I can whip up a patch as a
> >short demonstration, maybe that helps to unconfuse this dicussion.
> >
> >For now I think we're covered with either v4 or v5 so sticking with either
> >is ok with me.
> >-Daniel
> 
> I think v5 is much better. It reduces the ring space wastage which I thought
> was your main concern.

Ok with me too - I simply didn't pick it up when merging yesterday because
I couldn't immediately convince myself it's correct, but really wanted to
pull in your series. Unfortunately it's now burried below piles of
patches, so can you please do a delta patch?

> The problem with a more simplistic approach that just doubles the minimum
> reserve size to ensure that it will fit before or after a wrap is that you
> are doubling the reserve size. That too is rather wasteful of ring space. It
> also means that you only find out when the reserve size is too small when
> you hit the maximum usage coincident with a worst case wrap point. Whereas
> the v5 method means that you notice a too small reserve whether wrapping or
> not.

We don't need to double the reservation since the add_request tail is
split up into many individual intel_ring_begin. And we'd only need to wrap
for the largest of those, which is substantially less than the entire
reservation. Furthermore with the reservation these commands can't ever
fail, so for those we know are only used in the add_request tail we could
go to a wrap-only intel_ring_begin which never waits and have one at a
dword cmd boundary. That means we'd need to overestimate the needed
ringbuffer space by just a few dwords (namely the size of the longest CS
cmd we emit under reservation). Which is around 6 dwords or so iirc. And
to avoid changing ilk we could just special case that in reserve_space().

In practice I don't think there would be any difference with your v5 since
especially with the scheduler we shouldn't ever overfill rings really. But
the clear upside is that the reserve_space_end check would be independent
of any implementation details of how reservation vs. wrapping is done
exactly. And hence robust against any future fumbles in this area. Looking
at our history of the relevant code we can expect a lot of those.
-Daniel
John Harrison June 24, 2015, 5:05 p.m. UTC | #8
On 24/06/2015 13:45, Daniel Vetter wrote:
> On Wed, Jun 24, 2015 at 01:18:48PM +0100, John Harrison wrote:
>> On 23/06/2015 21:00, Daniel Vetter wrote:
>>> On Tue, Jun 23, 2015 at 04:43:24PM +0100, John Harrison wrote:
>>>> On 23/06/2015 14:24, Daniel Vetter wrote:
>>>>> On Tue, Jun 23, 2015 at 12:38:01PM +0100, John Harrison wrote:
>>>>>> On 22/06/2015 21:12, Daniel Vetter wrote:
>>>>>>> On Fri, Jun 19, 2015 at 05:34:12PM +0100, John.C.Harrison@Intel.com wrote:
>>>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>>
>>>>>>>> It is a bad idea for i915_add_request() to fail. The work will already have been
>>>>>>>> send to the ring and will be processed, but there will not be any tracking or
>>>>>>>> management of that work.
>>>>>>>>
>>>>>>>> The only way the add request call can fail is if it can't write its epilogue
>>>>>>>> commands to the ring (cache flushing, seqno updates, interrupt signalling). The
>>>>>>>> reasons for that are mostly down to running out of ring buffer space and the
>>>>>>>> problems associated with trying to get some more. This patch prevents that
>>>>>>>> situation from happening in the first place.
>>>>>>>>
>>>>>>>> When a request is created, it marks sufficient space as reserved for the
>>>>>>>> epilogue commands. Thus guaranteeing that by the time the epilogue is written,
>>>>>>>> there will be plenty of space for it. Note that a ring_begin() call is required
>>>>>>>> to actually reserve the space (and do any potential waiting). However, that is
>>>>>>>> not currently done at request creation time. This is because the ring_begin()
>>>>>>>> code can allocate a request. Hence calling begin() from the request allocation
>>>>>>>> code would lead to infinite recursion! Later patches in this series remove the
>>>>>>>> need for begin() to do the allocate. At that point, it becomes safe for the
>>>>>>>> allocate to call begin() and really reserve the space.
>>>>>>>>
>>>>>>>> Until then, there is a potential for insufficient space to be available at the
>>>>>>>> point of calling i915_add_request(). However, that would only be in the case
>>>>>>>> where the request was created and immediately submitted without ever calling
>>>>>>>> ring_begin() and adding any work to that request. Which should never happen. And
>>>>>>>> even if it does, and if that request happens to fall down the tiny window of
>>>>>>>> opportunity for failing due to being out of ring space then does it really
>>>>>>>> matter because the request wasn't doing anything in the first place?
>>>>>>>>
>>>>>>>> v2: Updated the 'reserved space too small' warning to include the offending
>>>>>>>> sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
>>>>>>>> re-initialisation of tracking state after a buffer wrap to keep the sanity
>>>>>>>> checks accurate.
>>>>>>>>
>>>>>>>> v3: Incremented the reserved size to accommodate Ironlake (after finally
>>>>>>>> managing to run on an ILK system). Also fixed missing wrap code in LRC mode.
>>>>>>>>
>>>>>>>> v4: Added extra comment and removed duplicate WARN (feedback from Tomas).
>>>>>>>>
>>>>>>>> v5: Re-write of wrap handling to prevent unnecessary early wraps (feedback from
>>>>>>>> Daniel Vetter).
>>>>>>> This didn't actually implement what I suggested (wrapping is the worst
>>>>>>> case, hence skipping the check for that is breaking the sanity check) and
>>>>>>> so changed the patch from "correct, but a bit fragile" to broken. I've
>>>>>>> merged the previous version instead.
>>>>>>> -Daniel
>>>>>> I'm confused. I thought your main issue was the early wrapping not the
>>>>>> sanity check. The check is to ensure that the reservation is large enough to
>>>>>> cover all the commands written during request submission. That should not be
>>>>>> affected by whether a wrap occurs or not. Wrapping does not magically add an
>>>>>> extra bunch of dwords to the emit_request() call. Whereas making the check
>>>>>> work with the wrap condition requires adding in extra tracking state of
>>>>>> exactly where the wrap occurred. That is extra code that only exists to
>>>>>> catch something in the very rare case which should already have been caught
>>>>>> in the very common case. I.e. if your reserved size is too small then you
>>>>>> will hit the warning on every batch buffer submission.
>>>>> The problem is that if you allow a wrap in the reserve size then the
>>>>> ringspace requirements are bigger than if you don't wrap. And since the
>>>>> add request is split up into many intel_ring_begin that's possible. Hence
>>>>> if you allow wrapping in the reserved space, then the most important case
>>>>> for the debug check is to make sure that it catches any kind of
>>>>> reservation overflow while wrapping. The not-wrapped case is probably the
>>>>> boring one.
>>>>>
>>>>> And indeed eventually we should overflow since according to your comment
>>>>> the worst case add request on ilk is 136 dwords. And the largest
>>>>> intel_ring_begin in there is 32 dwords, which means at most we'll throw
>>>>> away 31 dwords when wrapping. Which means the 160 dwords of reservation
>>>>> are not enough since we'd need 167 dwords of space for the worst case. But
>>>>> since the space_end debug check was a no-op for the wrapped case you won't
>>>>> catch this one.
>>>> The minimum reservation size in this case is still only 136. The prepare
>>>> code checks for the 32 words actually requested and wraps if necessary. It
>>>> then checks for 136+32 words of space. If that would cause a wrap it will
>>>> then add on the amount of space actually left in the ring and wait for that
>>>> bigger total. That guarantees that it has waited for the 136 at the start of
>>>> the ring. The caller is then free to fill in the 32 words and there is still
>>>> guaranteed to be a minimum of 136 words available (with or without wrapping)
>>>> before any further wait for space is necessary. Thus the add_request() code
>>>> is safe from fear of failure irrespective of where any wrap might occur.
>>>>
>>>>
>>>>> Wrt keeping track of wrapping while the reservation is in use, the
>>>>> following should do that without any need of additional tracking:
>>>>>
>>>>>
>>>>> 	int used_size = ringbuf->tail - ringbuf->reserved_tail;
>>>>>
>>>>> 	if (used_size < 0)
>>>>> 		used_size += ringbuf->size;
>>>>>
>>>>> 	WARN(used_size < ringbuf->reserved_size,
>>>>> 	     "request reserved size too small: %d vs %d!\n",
>>>>> 	     used_size, ringbuf->reserved_size);
>>>>>
>>>>> I was mistaken that you can reuse __intel_ring_space (since that has
>>>>> slightly different requirements), but this gives you a nicely localized
>>>>> check for reservation overflow which works even when you wrap. Ofc it
>>>>> won't work if an add_request is bigger than the entire ring, but that's
>>>>> impossible anyway since we can at most reserve ringbuf->size -
>>>>> I915_RING_FREE_SPACE.
>>>> The problem with the above calculation is that it includes the wasted space
>>>> at the end of the ring. Thus it will complain the reserved size was too
>>>> small when in fact it was just fine.
>>> Ok I again misunderstood your patch a bit since it didn't quite do what I
>>> expect, and I stand corrected that v5 works too. But I still seem to fail
>>> to get my main concern across. I'll see whether I can whip up a patch as a
>>> short demonstration, maybe that helps to unconfuse this dicussion.
>>>
>>> For now I think we're covered with either v4 or v5 so sticking with either
>>> is ok with me.
>>> -Daniel
>> I think v5 is much better. It reduces the ring space wastage which I thought
>> was your main concern.
> Ok with me too - I simply didn't pick it up when merging yesterday because
> I couldn't immediately convince myself it's correct, but really wanted to
> pull in your series. Unfortunately it's now burried below piles of
> patches, so can you please do a delta patch?
Delta patch posted:  '[PATCH] drm/i915: Reserve space improvements'.


>
>> The problem with a more simplistic approach that just doubles the minimum
>> reserve size to ensure that it will fit before or after a wrap is that you
>> are doubling the reserve size. That too is rather wasteful of ring space. It
>> also means that you only find out when the reserve size is too small when
>> you hit the maximum usage coincident with a worst case wrap point. Whereas
>> the v5 method means that you notice a too small reserve whether wrapping or
>> not.
> We don't need to double the reservation since the add_request tail is
> split up into many individual intel_ring_begin. And we'd only need to wrap
> for the largest of those, which is substantially less than the entire
> reservation. Furthermore with the reservation these commands can't ever
> fail, so for those we know are only used in the add_request tail we could
> go to a wrap-only intel_ring_begin which never waits and have one at a
> dword cmd boundary. That means we'd need to overestimate the needed
> ringbuffer space by just a few dwords (namely the size of the longest CS
> cmd we emit under reservation). Which is around 6 dwords or so iirc. And
> to avoid changing ilk we could just special case that in reserve_space().
>
> In practice I don't think there would be any difference with your v5 since
> especially with the scheduler we shouldn't ever overfill rings really. But
> the clear upside is that the reserve_space_end check would be independent
> of any implementation details of how reservation vs. wrapping is done
> exactly. And hence robust against any future fumbles in this area. Looking
> at our history of the relevant code we can expect a lot of those.
> -Daniel

Patch
diff mbox

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0347eb9..eba1857 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2187,6 +2187,7 @@  struct drm_i915_gem_request {
 
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx);
+void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 void i915_gem_request_free(struct kref *req_ref);
 
 static inline uint32_t
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 81f3512..85fa27b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2485,6 +2485,13 @@  int __i915_add_request(struct intel_engine_cs *ring,
 	} else
 		ringbuf = ring->buffer;
 
+	/*
+	 * To ensure that this call will not fail, space for its emissions
+	 * should already have been reserved in the ring buffer. Let the ring
+	 * know that it is time to use that space up.
+	 */
+	intel_ring_reserved_space_use(ringbuf);
+
 	request_start = intel_ring_get_tail(ringbuf);
 	/*
 	 * Emit any outstanding flushes - execbuf can fail to emit the flush
@@ -2567,6 +2574,9 @@  int __i915_add_request(struct intel_engine_cs *ring,
 			   round_jiffies_up_relative(HZ));
 	intel_mark_busy(dev_priv->dev);
 
+	/* Sanity check that the reserved size was large enough. */
+	intel_ring_reserved_space_end(ringbuf);
+
 	return 0;
 }
 
@@ -2666,6 +2676,26 @@  int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	if (ret)
 		goto err;
 
+	/*
+	 * Reserve space in the ring buffer for all the commands required to
+	 * eventually emit this request. This is to guarantee that the
+	 * i915_add_request() call can't fail. Note that the reserve may need
+	 * to be redone if the request is not actually submitted straight
+	 * away, e.g. because a GPU scheduler has deferred it.
+	 *
+	 * Note further that this call merely notes the reserve request. A
+	 * subsequent call to *_ring_begin() is required to actually ensure
+	 * that the reservation is available. Without the begin, if the
+	 * request creator immediately submitted the request without adding
+	 * any commands to it then there might not actually be sufficient
+	 * room for the submission commands. Unfortunately, the current
+	 * *_ring_begin() implementations potentially call back here to
+	 * i915_gem_request_alloc(). Thus calling _begin() here would lead to
+	 * infinite recursion! Until that back call path is removed, it is
+	 * necessary to do a manual _begin() outside.
+	 */
+	intel_ring_reserved_space_reserve(req->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
+
 	ring->outstanding_lazy_request = req;
 	return 0;
 
@@ -2674,6 +2704,13 @@  err:
 	return ret;
 }
 
+void i915_gem_request_cancel(struct drm_i915_gem_request *req)
+{
+	intel_ring_reserved_space_cancel(req->ringbuf);
+
+	i915_gem_request_unreference(req);
+}
+
 struct drm_i915_gem_request *
 i915_gem_find_active_request(struct intel_engine_cs *ring)
 {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6a5ed07..bd62bd6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -690,6 +690,9 @@  static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
 	if (intel_ring_space(ringbuf) >= bytes)
 		return 0;
 
+	/* The whole point of reserving space is to not wait! */
+	WARN_ON(ringbuf->reserved_in_use);
+
 	list_for_each_entry(request, &ring->request_list, list) {
 		/*
 		 * The request queue is per-engine, so can contain requests
@@ -748,8 +751,12 @@  static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
 	int rem = ringbuf->size - ringbuf->tail;
 
 	if (ringbuf->space < rem) {
-		int ret = logical_ring_wait_for_space(ringbuf, ctx, rem);
+		int ret;
+
+		/* Can't wait if space has already been reserved! */
+		WARN_ON(ringbuf->reserved_in_use);
 
+		ret = logical_ring_wait_for_space(ringbuf, ctx, rem);
 		if (ret)
 			return ret;
 	}
@@ -768,7 +775,7 @@  static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
 static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
 				struct intel_context *ctx, int bytes)
 {
-	int ret;
+	int ret, max_bytes;
 
 	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
 		ret = logical_ring_wrap_buffer(ringbuf, ctx);
@@ -776,8 +783,28 @@  static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
 			return ret;
 	}
 
-	if (unlikely(ringbuf->space < bytes)) {
-		ret = logical_ring_wait_for_space(ringbuf, ctx, bytes);
+	/*
+	 * Add on the reserved size to the request to make sure that after
+	 * the intended commands have been emitted, there is guaranteed to
+	 * still be enough free space to send them to the hardware.
+	 */
+	max_bytes = bytes + ringbuf->reserved_size;
+
+	if (unlikely(ringbuf->space < max_bytes)) {
+		/*
+		 * Bytes is guaranteed to fit within the tail of the buffer,
+		 * but the reserved space may push it off the end. If so then
+		 * need to wait for the whole of the tail plus the reserved
+		 * size. That should guarantee that the actual request
+		 * (bytes) will fit between here and the end and the reserved
+		 * usage will fit either in the same or at the start. Either
+		 * way, if a wrap occurs it will not involve a wait and thus
+		 * cannot fail.
+		 */
+		if (unlikely(ringbuf->tail + max_bytes + I915_RING_FREE_SPACE > ringbuf->effective_size))
+			max_bytes = ringbuf->reserved_size + I915_RING_FREE_SPACE + ringbuf->size - ringbuf->tail;
+
+		ret = logical_ring_wait_for_space(ringbuf, ctx, max_bytes);
 		if (unlikely(ret))
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d934f85..1c125e9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2106,6 +2106,9 @@  static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 	if (intel_ring_space(ringbuf) >= n)
 		return 0;
 
+	/* The whole point of reserving space is to not wait! */
+	WARN_ON(ringbuf->reserved_in_use);
+
 	list_for_each_entry(request, &ring->request_list, list) {
 		space = __intel_ring_space(request->postfix, ringbuf->tail,
 					   ringbuf->size);
@@ -2131,7 +2134,12 @@  static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
 	int rem = ringbuf->size - ringbuf->tail;
 
 	if (ringbuf->space < rem) {
-		int ret = ring_wait_for_space(ring, rem);
+		int ret;
+
+		/* Can't wait if space has already been reserved! */
+		WARN_ON(ringbuf->reserved_in_use);
+
+		ret = ring_wait_for_space(ring, rem);
 		if (ret)
 			return ret;
 	}
@@ -2180,11 +2188,69 @@  int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 	return 0;
 }
 
-static int __intel_ring_prepare(struct intel_engine_cs *ring,
-				int bytes)
+void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size)
+{
+	/* NB: Until request management is fully tidied up and the OLR is
+	 * removed, there are too many ways for get false hits on this
+	 * anti-recursion check! */
+	/*WARN_ON(ringbuf->reserved_size);*/
+	WARN_ON(ringbuf->reserved_in_use);
+
+	ringbuf->reserved_size = size;
+
+	/*
+	 * Really need to call _begin() here but that currently leads to
+	 * recursion problems! This will be fixed later but for now just
+	 * return and hope for the best. Note that there is only a real
+	 * problem if the create of the request never actually calls _begin()
+	 * but if they are not submitting any work then why did they create
+	 * the request in the first place?
+	 */
+}
+
+void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf)
+{
+	WARN_ON(ringbuf->reserved_in_use);
+
+	ringbuf->reserved_size   = 0;
+	ringbuf->reserved_in_use = false;
+}
+
+void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf)
+{
+	WARN_ON(ringbuf->reserved_in_use);
+
+	ringbuf->reserved_in_use = true;
+	ringbuf->reserved_tail   = ringbuf->tail;
+}
+
+void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf)
+{
+	WARN_ON(!ringbuf->reserved_in_use);
+	if (ringbuf->tail > ringbuf->reserved_tail) {
+		WARN(ringbuf->tail > ringbuf->reserved_tail + ringbuf->reserved_size,
+		     "request reserved size too small: %d vs %d!\n",
+		     ringbuf->tail - ringbuf->reserved_tail, ringbuf->reserved_size);
+	} else {
+		/*
+		 * The ring was wrapped while the reserved space was in use.
+		 * That means that some unknown amount of the ring tail was
+		 * no-op filled and skipped. Thus simply adding the ring size
+		 * to the tail and doing the above space check will not work.
+		 * Rather than attempt to track how much tail was skipped,
+		 * it is much simpler to say that also skipping the sanity
+		 * check every once in a while is not a big issue.
+		 */
+	}
+
+	ringbuf->reserved_size   = 0;
+	ringbuf->reserved_in_use = false;
+}
+
+static int __intel_ring_prepare(struct intel_engine_cs *ring, int bytes)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
-	int ret;
+	int ret, max_bytes;
 
 	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
 		ret = intel_wrap_ring_buffer(ring);
@@ -2192,8 +2258,28 @@  static int __intel_ring_prepare(struct intel_engine_cs *ring,
 			return ret;
 	}
 
-	if (unlikely(ringbuf->space < bytes)) {
-		ret = ring_wait_for_space(ring, bytes);
+	/*
+	 * Add on the reserved size to the request to make sure that after
+	 * the intended commands have been emitted, there is guaranteed to
+	 * still be enough free space to send them to the hardware.
+	 */
+	max_bytes = bytes + ringbuf->reserved_size;
+
+	if (unlikely(ringbuf->space < max_bytes)) {
+		/*
+		 * Bytes is guaranteed to fit within the tail of the buffer,
+		 * but the reserved space may push it off the end. If so then
+		 * need to wait for the whole of the tail plus the reserved
+		 * size. That should guarantee that the actual request
+		 * (bytes) will fit between here and the end and the reserved
+		 * usage will fit either in the same or at the start. Either
+		 * way, if a wrap occurs it will not involve a wait and thus
+		 * cannot fail.
+		 */
+		if (unlikely(ringbuf->tail + max_bytes > ringbuf->effective_size))
+			max_bytes = ringbuf->reserved_size + I915_RING_FREE_SPACE + ringbuf->size - ringbuf->tail;
+
+		ret = ring_wait_for_space(ring, max_bytes);
 		if (unlikely(ret))
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 39f6dfc..bf2ac28 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -105,6 +105,9 @@  struct intel_ringbuffer {
 	int space;
 	int size;
 	int effective_size;
+	int reserved_size;
+	int reserved_tail;
+	bool reserved_in_use;
 
 	/** We track the position of the requests in the ring buffer, and
 	 * when each is retired we increment last_retired_head as the GPU
@@ -450,4 +453,26 @@  intel_ring_get_request(struct intel_engine_cs *ring)
 	return ring->outstanding_lazy_request;
 }
 
+/*
+ * Arbitrary size for largest possible 'add request' sequence. The code paths
+ * are complex and variable. Empirical measurement shows that the worst case
+ * is ILK at 136 words. Reserving too much is better than reserving too little
+ * as that allows for corner cases that might have been missed. So the figure
+ * has been rounded up to 160 words.
+ */
+#define MIN_SPACE_FOR_ADD_REQUEST	160
+
+/*
+ * Reserve space in the ring to guarantee that the i915_add_request() call
+ * will always have sufficient room to do its stuff. The request creation
+ * code calls this automatically.
+ */
+void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size);
+/* Cancel the reservation, e.g. because the request is being discarded. */
+void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf);
+/* Use the reserved space - for use by i915_add_request() only. */
+void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf);
+/* Finish with the reserved space - for use by i915_add_request() only. */
+void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf);
+
 #endif /* _INTEL_RINGBUFFER_H_ */