| Message ID | 20200715115147.11866-22-chris@chris-wilson.co.uk (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [01/66] drm/i915: Reduce i915_request.lock contention for i915_request_wait |
On 7/15/20 1:51 PM, Chris Wilson wrote:
> It is illegal to wait on another vma while holding the vm->mutex, as
> that easily leads to ABBA deadlocks (we wait on a second vma that waits
> on us to release the vm->mutex). So while the vm->mutex exists, move the
> waiting outside of the lock into the async binding pipeline.

Why is it we don't just move the fence binding to a separate loop after
unlocking the vm->mutex in eb_reserve_vm()?

/Thomas

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c   |  21 +--
>  drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 137 +++++++++++++++++-
>  drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h |   5 +
>  3 files changed, 151 insertions(+), 12 deletions(-)
>
> [snip: the full patch was quoted here; the diff is reproduced in full below]
Quoting Thomas Hellström (Intel) (2020-07-27 19:19:19)
> On 7/15/20 1:51 PM, Chris Wilson wrote:
> > It is illegal to wait on another vma while holding the vm->mutex, as
> > that easily leads to ABBA deadlocks (we wait on a second vma that waits
> > on us to release the vm->mutex). So while the vm->mutex exists, move the
> > waiting outside of the lock into the async binding pipeline.
>
> Why is it we don't just move the fence binding to a separate loop after
> unlocking the vm->mutex in eb_reserve_vm()?

That is what is done. The work is called immediately when possible. Just
the loop may be deferred if what we need to unbind is still active.
-Chris
On 7/28/20 5:08 PM, Chris Wilson wrote:
> Quoting Thomas Hellström (Intel) (2020-07-27 19:19:19)
>> On 7/15/20 1:51 PM, Chris Wilson wrote:
>>> It is illegal to wait on another vma while holding the vm->mutex, as
>>> that easily leads to ABBA deadlocks (we wait on a second vma that waits
>>> on us to release the vm->mutex). So while the vm->mutex exists, move the
>>> waiting outside of the lock into the async binding pipeline.
>> Why is it we don't just move the fence binding to a separate loop after
>> unlocking the vm->mutex in eb_reserve_vm()?
> That is what is done. The work is called immediately when possible. Just
> the loop may be deferred if what we need to unbind is still active.

OK, then

Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>

> -Chris
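For readers unfamiliar with the async binding pipeline, the behaviour Chris describes can be sketched in a few lines of self-contained C. Every name below is a hypothetical stand-in, not the actual i915 dma_fence_work API; the point is only the shape: the bind work executes inline when its dependencies have already signaled, and is parked on a fence callback otherwise.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; not the i915 dma_fence_work types. */
struct bind_work {
	bool (*deps_signaled)(struct bind_work *w); /* prior fences done? */
	void (*execute)(struct bind_work *w);       /* the bind loop */
};

/* Stub: park the work until its dependencies signal, then execute it. */
static void defer_until_signaled(struct bind_work *w)
{
	printf("bind deferred until the active unbind completes\n");
}

/*
 * The behaviour described above: the bind work runs synchronously when
 * nothing blocks it, and is only handed off to a fence callback when
 * something it depends on is still active.
 */
static void queue_bind_work(struct bind_work *w)
{
	if (w->deps_signaled(w))
		w->execute(w);           /* "called immediately when possible" */
	else
		defer_until_signaled(w); /* "the loop may be deferred" */
}
```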
```diff
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index bdcbb82bfc3d..af2b4aeb6df0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1056,15 +1056,6 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind)
 		return err;
 
 pin:
-	if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
-		err = __i915_vma_pin_fence(vma); /* XXX no waiting */
-		if (unlikely(err))
-			return err;
-
-		if (vma->fence)
-			bind->ev->flags |= __EXEC_OBJECT_HAS_FENCE;
-	}
-
 	bind_flags &= ~atomic_read(&vma->flags);
 	if (bind_flags) {
 		err = set_bind_fence(vma, work);
@@ -1095,6 +1086,15 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind)
 	bind->ev->flags |= __EXEC_OBJECT_HAS_PIN;
 	GEM_BUG_ON(eb_vma_misplaced(entry, vma, bind->ev->flags));
 
+	if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
+		err = __i915_vma_pin_fence_async(vma, &work->base);
+		if (unlikely(err))
+			return err;
+
+		if (vma->fence)
+			bind->ev->flags |= __EXEC_OBJECT_HAS_FENCE;
+	}
+
 	return 0;
 }
 
@@ -1160,6 +1160,9 @@ static void __eb_bind_vma(struct eb_vm_work *work)
 		struct eb_bind_vma *bind = &work->bind[n];
 		struct i915_vma *vma = bind->ev->vma;
 
+		if (bind->ev->flags & __EXEC_OBJECT_HAS_FENCE)
+			__i915_vma_apply_fence_async(vma);
+
 		if (!bind->bind_flags)
 			goto put;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
index 7fb36b12fe7a..734b6aa61809 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
@@ -21,10 +21,13 @@
  * IN THE SOFTWARE.
  */
 
+#include "i915_active.h"
 #include "i915_drv.h"
 #include "i915_scatterlist.h"
+#include "i915_sw_fence_work.h"
 #include "i915_pvinfo.h"
 #include "i915_vgpu.h"
+#include "i915_vma.h"
 
 /**
  * DOC: fence register handling
@@ -340,19 +343,37 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
 	return ERR_PTR(-EDEADLK);
 }
 
+static int fence_wait_bind(struct i915_fence_reg *reg)
+{
+	struct dma_fence *fence;
+	int err = 0;
+
+	fence = i915_active_fence_get(&reg->active.excl);
+	if (fence) {
+		err = dma_fence_wait(fence, true);
+		dma_fence_put(fence);
+	}
+
+	return err;
+}
+
 int __i915_vma_pin_fence(struct i915_vma *vma)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
-	struct i915_fence_reg *fence;
+	struct i915_fence_reg *fence = vma->fence;
 	struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL;
 	int err;
 
 	lockdep_assert_held(&vma->vm->mutex);
 
 	/* Just update our place in the LRU if our fence is getting reused. */
-	if (vma->fence) {
-		fence = vma->fence;
+	if (fence) {
 		GEM_BUG_ON(fence->vma != vma);
+
+		err = fence_wait_bind(fence);
+		if (err)
+			return err;
+
 		atomic_inc(&fence->pin_count);
 		if (!fence->dirty) {
 			list_move_tail(&fence->link, &ggtt->fence_list);
@@ -384,6 +405,116 @@ int __i915_vma_pin_fence(struct i915_vma *vma)
 	return err;
 }
 
+static int set_bind_fence(struct i915_fence_reg *fence,
+			  struct dma_fence_work *work)
+{
+	struct dma_fence *prev;
+	int err;
+
+	if (rcu_access_pointer(fence->active.excl.fence) == &work->dma)
+		return 0;
+
+	err = i915_sw_fence_await_active(&work->chain,
+					 &fence->active,
+					 I915_ACTIVE_AWAIT_ACTIVE);
+	if (err)
+		return err;
+
+	if (i915_active_acquire(&fence->active))
+		return -ENOENT;
+
+	prev = i915_active_set_exclusive(&fence->active, &work->dma);
+	if (unlikely(prev)) {
+		err = i915_sw_fence_await_dma_fence(&work->chain, prev, 0,
+						    GFP_NOWAIT | __GFP_NOWARN);
+		dma_fence_put(prev);
+	}
+
+	i915_active_release(&fence->active);
+	return err < 0 ? err : 0;
+}
+
+int __i915_vma_pin_fence_async(struct i915_vma *vma,
+			       struct dma_fence_work *work)
+{
+	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
+	struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL;
+	struct i915_fence_reg *fence = vma->fence;
+	int err;
+
+	lockdep_assert_held(&vma->vm->mutex);
+
+	/* Just update our place in the LRU if our fence is getting reused. */
+	if (fence) {
+		GEM_BUG_ON(fence->vma != vma);
+		GEM_BUG_ON(!i915_vma_is_map_and_fenceable(vma));
+	} else if (set) {
+		if (!i915_vma_is_map_and_fenceable(vma))
+			return -EINVAL;
+
+		fence = fence_find(ggtt);
+		if (IS_ERR(fence))
+			return -ENOSPC;
+
+		GEM_BUG_ON(atomic_read(&fence->pin_count));
+		fence->dirty = true;
+	} else {
+		return 0;
+	}
+
+	atomic_inc(&fence->pin_count);
+	list_move_tail(&fence->link, &ggtt->fence_list);
+	if (!fence->dirty)
+		return 0;
+
+	if (INTEL_GEN(fence_to_i915(fence)) < 4 &&
+	    rcu_access_pointer(vma->active.excl.fence) != &work->dma) {
+		/* implicit 'unfenced' GPU blits */
+		err = i915_sw_fence_await_active(&work->chain,
+						 &vma->active,
+						 I915_ACTIVE_AWAIT_ACTIVE);
+		if (err)
+			goto err_unpin;
+	}
+
+	err = set_bind_fence(fence, work);
+	if (err)
+		goto err_unpin;
+
+	if (set) {
+		fence->start = vma->node.start;
+		fence->size = vma->fence_size;
+		fence->stride = i915_gem_object_get_stride(vma->obj);
+		fence->tiling = i915_gem_object_get_tiling(vma->obj);
+
+		vma->fence = fence;
+	} else {
+		fence->tiling = 0;
+		vma->fence = NULL;
+	}
+
+	set = xchg(&fence->vma, set);
+	if (set && set != vma) {
+		GEM_BUG_ON(set->fence != fence);
+		WRITE_ONCE(set->fence, NULL);
+		i915_vma_revoke_mmap(set);
+	}
+
+	return 0;
+
+err_unpin:
+	atomic_dec(&fence->pin_count);
+	return err;
+}
+
+void __i915_vma_apply_fence_async(struct i915_vma *vma)
+{
+	struct i915_fence_reg *fence = vma->fence;
+
+	if (fence->dirty)
+		fence_write(fence);
+}
+
 /**
  * i915_vma_pin_fence - set up fencing for a vma
  * @vma: vma to map through a fence reg
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h
index 9eef679e1311..d306ac14d47e 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h
@@ -30,6 +30,7 @@
 
 #include "i915_active.h"
 
+struct dma_fence_work;
 struct drm_i915_gem_object;
 struct i915_ggtt;
 struct i915_vma;
@@ -70,6 +71,10 @@ void i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj,
 void i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj,
 					 struct sg_table *pages);
 
+int __i915_vma_pin_fence_async(struct i915_vma *vma,
+			       struct dma_fence_work *work);
+void __i915_vma_apply_fence_async(struct i915_vma *vma);
+
 void intel_ggtt_init_fences(struct i915_ggtt *ggtt);
 void intel_ggtt_fini_fences(struct i915_ggtt *ggtt);
```
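One detail of the diff worth calling out is `set = xchg(&fence->vma, set)`: ownership of the fence register is transferred with a single atomic exchange, and whatever vma previously owned it has its mmap revoked afterwards. A minimal userspace analogue of that swap-then-revoke pattern, using C11 atomics with purely illustrative names:

```c
#include <stdatomic.h>
#include <stdio.h>

struct owner { const char *name; };

/* One register slot, owned by at most one owner at a time. */
static _Atomic(struct owner *) slot_owner;

/*
 * Atomically install @new_owner (NULL releases the slot) and tear down
 * whatever the previous owner had built on it -- the same shape as
 * xchg(&fence->vma, set) followed by i915_vma_revoke_mmap() above.
 */
static void take_slot(struct owner *new_owner)
{
	struct owner *prev = atomic_exchange(&slot_owner, new_owner);

	if (prev && prev != new_owner)
		printf("revoking %s's mapping\n", prev->name);
}
```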
It is illegal to wait on another vma while holding the vm->mutex, as
that easily leads to ABBA deadlocks (we wait on a second vma that waits
on us to release the vm->mutex). So while the vm->mutex exists, move the
waiting outside of the lock into the async binding pipeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c   |  21 +--
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 137 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h |   5 +
 3 files changed, 151 insertions(+), 12 deletions(-)
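To make the deadlock in the commit message concrete: ABBA is two threads acquiring the same pair of locks in opposite order. The i915 case waits on a fence rather than a second mutex, but the shape is the same; a minimal pthreads sketch (illustrative only, not i915 code):

```c
#include <pthread.h>

static pthread_mutex_t vm_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t vma_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread A: holds vm_mutex, then waits on the vma (A -> B). */
static void *bind_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&vm_mutex);
	pthread_mutex_lock(&vma_lock);   /* blocks if thread B holds it */
	/* ... bind the vma ... */
	pthread_mutex_unlock(&vma_lock);
	pthread_mutex_unlock(&vm_mutex);
	return NULL;
}

/* Thread B: holds the vma, then needs vm_mutex (B -> A): deadlock. */
static void *unbind_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&vma_lock);
	pthread_mutex_lock(&vm_mutex);   /* blocks if thread A holds it */
	/* ... unbind the vma ... */
	pthread_mutex_unlock(&vm_mutex);
	pthread_mutex_unlock(&vma_lock);
	return NULL;
}
```

Moving the wait out of the locked section, as this patch does for fence binding, breaks the cycle: thread A never sleeps on B's work while still holding vm_mutex.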