drm/i915: Propagate fence->error across semaphores
diff mbox series

Message ID 20200505161302.21726-1-chris@chris-wilson.co.uk
State New
Headers show
Series
  • drm/i915: Propagate fence->error across semaphores
Related show

Commit Message

Chris Wilson May 5, 2020, 4:13 p.m. UTC
Replacing an inter-engine fence with a semaphore reduced the HW
execution latency, but that comes at a cost. For normal fences, we are
able to propagate the metadata such as errors along with the signaling.
For semaphores, we are missing this error propagation so add it in the
back channel we use to monitor the semaphore overload.

This raises a valid point on whether error propagation is sufficient in
the semaphore case if it is coupled to a fatal error, such as EFAULT. It
is not, and we should teach ourselves not to use a semaphore if we would
chain up to an external fence whose error we must not ignore.

Fixes: ef4688497512 ("drm/i915: Propagate fence errors")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Chris Wilson May 5, 2020, 4:57 p.m. UTC | #1
Quoting Chris Wilson (2020-05-05 17:13:02)
> Replacing an inter-engine fence with a semaphore reduced the HW
> execution latency, but that comes at a cost. For normal fences, we are
> able to propagate the metadata such as errors along with the signaling.
> For semaphores, we are missing this error propagation so add it in the
> back channel we use to monitor the semaphore overload.
> 
> This raises a valid point on whether error propagation is sufficient in
> the semaphore case if it is coupled to a fatal error, such as EFAULT. It
> is not, and we should teach ourselves not to use a semaphore if we would
> chain up to an external fence whose error we must not ignore.
> 
> Fixes: ef4688497512 ("drm/i915: Propagate fence errors")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 9c5de07db47d..96a8c7a1be73 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -614,6 +614,9 @@ semaphore_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>  
>         switch (state) {
>         case FENCE_COMPLETE:
> +               if (unlikely(fence->error))
> +                       i915_request_set_error_once(rq, fence->error);

This is just horrible. I don't like it even as a hack.
-Chris

Patch
diff mbox series

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9c5de07db47d..96a8c7a1be73 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -614,6 +614,9 @@  semaphore_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 
 	switch (state) {
 	case FENCE_COMPLETE:
+		if (unlikely(fence->error))
+			i915_request_set_error_once(rq, fence->error);
+
 		if (!(READ_ONCE(rq->sched.attr.priority) & I915_PRIORITY_NOSEMAPHORE)) {
 			i915_request_get(rq);
 			init_irq_work(&rq->semaphore_work, irq_semaphore_cb);