[1/3] drm/i915: Fix timeout handling when retiring requests

Message ID: 20221109190937.64155-2-janusz.krzysztofik@linux.intel.com
State: New, archived
Series: Fix timeout handling when retiring requests

Commit Message

Janusz Krzysztofik Nov. 9, 2022, 7:09 p.m. UTC
I believe that intel_gt_retire_requests_timeout() should return -ETIME if
all the time designated by the timeout argument has been consumed while
waiting for fences to be signaled, the remaining time if there are requests
still not retired, or 0 otherwise.  In the last case, the remaining time
should be passed back via the remaining_timeout argument.

Remaining time is updated with return value of each consecutive call to
dma_fence_wait_timeout().  If an error code is returned instead of
remaining time, a few potentially unexpected side effects occur:
- we no longer wait for consecutive timelines' last request fences to be
  signaled before we try to retire requests from those timelines -- while
  expected in case of -ETIME, that's probably not intended in case of
  other errors that dma_fence_wait_timeout() can return,
- the error code (a negative value) is passed back as remaining time and
  if no more requests happen to be left pending despite the error, a user
  may pass that value forward as a remaining timeout -- that can
  potentially trigger a WARN or BUG,
- a potentially unexpected error code is returned to the user when a
  non-critical error that probably shouldn't stop the user from retrying
  occurs while active requests are still pending.
Moreover, should dma_fence_wait_timeout() ever return 0 (which should mean
timeout expiration) while we are processing requests and there are still
pending requests when we are about to return, that 0 value is returned to
the user as if all requests had been successfully retired.

Ignore error codes from dma_fence_wait_timeout() other than -ETIME and
don't overwrite remaining time with those error codes.  Also, convert a 0
value returned by dma_fence_wait_timeout() to -ETIME.
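
The intended handling of a single dma_fence_wait_timeout() return value can
be sketched as a standalone userspace helper.  This is only an illustration
under assumptions, not code from the patch: fold_wait_result() is a
hypothetical name, and in the usage below EINTR merely stands in for a
kernel-internal error such as -ERESTARTSYS.

```c
#include <errno.h>

/*
 * Hypothetical helper (not part of the patch): fold one return value of
 * dma_fence_wait_timeout() into the running timeout.
 *
 * timeout:   time budget before the wait
 * time_left: value returned by the wait
 */
static long fold_wait_result(long timeout, long time_left)
{
	/* 0 or -ETIME: the timeout expired */
	if (time_left == -ETIME || time_left == 0)
		return -ETIME;

	/* Positive value: the wait succeeded, this is the remaining time */
	if (time_left > 0)
		return time_left;

	/* Other errors: ignore them and assume no time was consumed */
	return timeout;
}
```

With a budget of 100, a wait that returns 40 leaves 40, a wait that returns
0 or -ETIME yields -ETIME, and a wait that fails with another error (for
example -EINTR in this userspace sketch) leaves the budget at 100.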

Fixes: f33a8a51602c ("drm/i915: Merge wait_for_timelines with retire_request")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: stable@vger.kernel.org # v5.5+
---
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)
Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index edb881d756309..6c3b8ac3055c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -156,11 +156,22 @@  long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
 
 			fence = i915_active_fence_get(&tl->last_request);
 			if (fence) {
+				signed long time_left;
+
 				mutex_unlock(&tl->mutex);
 
-				timeout = dma_fence_wait_timeout(fence,
-								 true,
-								 timeout);
+				time_left = dma_fence_wait_timeout(fence,
+								   true,
+								   timeout);
+				/*
+				 * 0 or -ETIME: timeout expired
+				 * other errors: ignore, assume no time consumed
+				 */
+				if (time_left == -ETIME || time_left == 0)
+					timeout = -ETIME;
+				else if (time_left > 0)
+					timeout = time_left;
+
 				dma_fence_put(fence);
 
 				/* Retirement is best effort */