[v4] drm/i915: Check for awaits on still currently executing requests

Message ID 20200529122851.8540-1-chris@chris-wilson.co.uk
State New

Commit Message

Chris Wilson May 29, 2020, 12:28 p.m. UTC
With the advent of preempt-to-busy, a request may still be on the GPU as
we unwind. And in the case of an unpreemptible [due to HW] request, that
request will remain indefinitely on the GPU even though we have
returned it back to our submission queue, and cleared the active bit.

We only run the execution callbacks on transferring the request from our
submission queue to the execution queue, but if this is a bonded request
that the HW is waiting for, we will not submit it (as we wait for a
fresh execution) even though it is still being executed.

As we know that there are always preemption points between requests, we
know that only the currently executing request may be still active even
though we have cleared the flag.

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Testcase: igt/gem_exec_balancer/bonded-dual
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)
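
As a rough illustration of the condition described in the commit message, here is a minimal, self-contained sketch in plain C. It is a toy model, not driver code: every `toy_*` name is hypothetical, and the engine is reduced to a single pointer to whatever the hardware is assumed to be executing. It only shows why a request can still need its execution hook run after its active flag has been cleared by an unwind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real i915 structures. */
struct toy_request {
	bool active;	/* models the "active" flag that unwinding clears */
	int hooks_run;	/* counts how often the execution hook fired */
};

struct toy_engine {
	/* Assumption: the one request the HW is still executing, if any. */
	const struct toy_request *executing;
};

/* Models the extra check: is this the request the HW is executing? */
static bool toy_request_in_flight(const struct toy_engine *engine,
				  const struct toy_request *signal)
{
	assert(engine && signal);
	return engine->executing == signal;
}

/* Models an unwind under preempt-to-busy: flag cleared, HW untouched. */
static void toy_unwind(struct toy_request *rq)
{
	rq->active = false;
}

/*
 * Models the patched branch: run the hook at once if the signaler is
 * flagged active OR is still on the HW after being unwound. Returns
 * true if the hook ran, false if the waiter would instead have to wait
 * for a later submission of the signaler.
 */
static bool toy_await_execution(const struct toy_engine *engine,
				struct toy_request *signal)
{
	if (signal->active || toy_request_in_flight(engine, signal)) {
		signal->hooks_run++;
		return true;
	}
	return false;
}
```

With only the flag test, an unwound but still executing request would take the `false` path, which is the situation the commit message describes for a bonded request that the HW is waiting on.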

Comments

Chris Wilson May 29, 2020, 2:03 p.m. UTC | #1
Quoting Chris Wilson (2020-05-29 13:28:51)
> With the advent of preempt-to-busy, a request may still be on the GPU as
> we unwind. And in the case of an unpreemptible [due to HW] request, that
> request will remain indefinitely on the GPU even though we have
> returned it back to our submission queue, and cleared the active bit.
> 
> We only run the execution callbacks on transferring the request from our
> submission queue to the execution queue, but if this is a bonded request
> that the HW is waiting for, we will not submit it (as we wait for a
> fresh execution) even though it is still being executed.
> 
> As we know that there are always preemption points between requests, we
> know that only the currently executing request may be still active even
> though we have cleared the flag.
> 
> Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
> Testcase: igt/gem_exec_balancer/bonded-dual
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index e5aba6824e26..2f0e9a63002d 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -363,6 +363,23 @@ static void __llist_add(struct llist_node *node, struct llist_head *head)
>         head->first = node;
>  }
>  
> +static bool __request_in_flight(const struct i915_request *signal)
> +{
> +       /*
> +        * Even if we have unwound the request, it may still be on
> +        * the GPU (preempt-to-busy). If that request is inside an
> +        * unpreemptible critical section, it will not be removed. Some
> +        * GPU functions may even be stuck waiting for the paired request
> +        * (__await_execution) to be submitted and cannot be preempted
> +        * until the bond is executing.
> +        *
> +        * As we know that there are always preemption points between
> +        * requests, we know that only the currently executing request
> +        * may be still active even though we have cleared the flag.
> +        */
> +       return signal == execlists_active(&signal->engine->execlists);

If and only if there is one request in ELSP[0], and presuming
process_csb has been run recently.

I think I'm back at intel_context_inflight(signal).
-Chris
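
One possible reading of the objection above, sketched as a self-contained toy model in plain C. The assumption here (not stated in the thread) is that a port records a single request pointer even when the hardware is working through several requests of the same context, so comparing pointers can miss an earlier request of that context. All `toy_*` names are hypothetical, and the context-based check is only loosely modelled on the idea behind `intel_context_inflight(signal)`; it does not reproduce that function, nor the `process_csb` staleness concern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real i915 structures. */
struct toy_context {
	int id;
};

struct toy_request {
	const struct toy_context *context;
	unsigned int seqno;
};

/*
 * Assumption: a port remembers one request pointer, although the HW may
 * be executing any request belonging to that request's context.
 */
struct toy_port {
	const struct toy_request *tracked;
};

/* Pointer comparison: only matches the single tracked request. */
static bool toy_in_flight_by_pointer(const struct toy_port *port,
				     const struct toy_request *signal)
{
	assert(port && signal);
	return port->tracked == signal;
}

/* Context comparison: matches any request of the tracked context. */
static bool toy_in_flight_by_context(const struct toy_port *port,
				     const struct toy_request *signal)
{
	assert(port && signal);
	return port->tracked && port->tracked->context == signal->context;
}
```

In this model the context comparison is deliberately coarser: it reports that the context is on the hardware, not that this particular request is, so a real implementation would presumably need a further per-request test.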

Patch

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index e5aba6824e26..2f0e9a63002d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -363,6 +363,23 @@  static void __llist_add(struct llist_node *node, struct llist_head *head)
 	head->first = node;
 }
 
+static bool __request_in_flight(const struct i915_request *signal)
+{
+	/*
+	 * Even if we have unwound the request, it may still be on
+	 * the GPU (preempt-to-busy). If that request is inside an
+	 * unpreemptible critical section, it will not be removed. Some
+	 * GPU functions may even be stuck waiting for the paired request
+	 * (__await_execution) to be submitted and cannot be preempted
+	 * until the bond is executing.
+	 *
+	 * As we know that there are always preemption points between
+	 * requests, we know that only the currently executing request
+	 * may be still active even though we have cleared the flag.
+	 */
+	return signal == execlists_active(&signal->engine->execlists);
+}
+
 static int
 __await_execution(struct i915_request *rq,
 		  struct i915_request *signal,
@@ -393,7 +410,7 @@  __await_execution(struct i915_request *rq,
 	}
 
 	spin_lock_irq(&signal->lock);
-	if (i915_request_is_active(signal)) {
+	if (i915_request_is_active(signal) || __request_in_flight(signal)) {
 		if (hook) {
 			hook(rq, &signal->fence);
 			i915_request_put(signal);