
[1/3] drm/i915: Fix negative remaining time after retire requests

Message ID 20221116112532.36253-2-janusz.krzysztofik@linux.intel.com (mailing list archive)
State New, archived
Series drm/i915: Fix timeout handling when retiring requests

Commit Message

Janusz Krzysztofik Nov. 16, 2022, 11:25 a.m. UTC
Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
with GuC") extended the API of intel_gt_retire_requests_timeout() with an
extra argument, 'remaining_timeout', intended for passing back the unconsumed
portion of the requested timeout when 0 (success) is returned.  However, when
request retirement happens to succeed despite an error returned by
dma_fence_wait_timeout(), the error code (a negative value) is passed back
instead of the remaining time.  If a caller then passes that negative value
on as the requested timeout to another wait, an explicit WARN or BUG can
be triggered.
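
To make the failure mode concrete, here is a minimal userspace model of the pre-fix behavior. It is a sketch, not kernel code: old_retire() and next_wait() are hypothetical names, wait_ret stands in for the value returned by dma_fence_wait_timeout() (which overwrites timeout), still_active stands in for a non-zero active_count, and next_wait() stands in for any later wait that warns on a negative timeout.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical model (not kernel code) of the pre-fix flow in
 * intel_gt_retire_requests_timeout(). 'wait_ret' simulates the value
 * returned by the fence wait, which overwrites 'timeout'.
 */
static long old_retire(long timeout, long wait_ret, int still_active,
		       long *remaining_timeout)
{
	timeout = wait_ret;	/* timeout = dma_fence_wait_timeout(...) */

	/* Pre-fix: copied once, just before returning. */
	if (remaining_timeout)
		*remaining_timeout = timeout;	/* may be a negative errno */

	/* Success (0) is reported whenever nothing is left active. */
	return still_active ? timeout : 0;
}

/* Hypothetical follow-up wait that, like a WARN_ON(), rejects a negative budget. */
static int next_wait(long timeout)
{
	if (timeout < 0) {
		fprintf(stderr, "WARN: negative timeout %ld\n", timeout);
		return -EINVAL;
	}
	return 0;	/* a real wait would block for up to 'timeout' jiffies */
}
```

Calling old_retire(100, -ETIME, 0, &remaining) reports success (0) while leaving remaining equal to -ETIME, so handing that value to next_wait() takes the warning path.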

Instead of copying the value of the timeout variable to *remaining_timeout
just before returning, initialize *remaining_timeout to the requested timeout
on entry and update it after each DMA fence wait.  Set it to 0 on -ETIME,
-EINTR or -ERESTARTSYS, and assume no time has been consumed on other errors
returned from the wait.
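
The per-wait update can be modeled outside the kernel as below. This is a sketch under stated assumptions: update_remaining() and new_retire() are hypothetical names, the fence waits are replaced by an array of simulated return values, ERESTARTSYS is defined locally because it is a kernel-internal errno (512) absent from userspace headers, and the model only attempts another wait while the budget is still positive.

```c
#include <errno.h>

/* Kernel-internal errno, not exported to userspace headers. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical helper mirroring the update added after each fence wait. */
static void update_remaining(long wait_ret, long *remaining_timeout)
{
	if (!remaining_timeout)
		return;

	if (wait_ret == -ETIME || wait_ret == -EINTR || wait_ret == -ERESTARTSYS)
		*remaining_timeout = 0;		/* budget used up or interrupted */
	else if (wait_ret >= 0)
		*remaining_timeout = wait_ret;	/* unconsumed jiffies */
	/* else: other error, assume no time consumed; keep the last value */
}

/*
 * Hypothetical driver: 'waits' holds simulated results of successive
 * fence waits; 'still_active' stands in for a non-zero active_count.
 */
static long new_retire(long timeout, const long *waits, int nwaits,
		       int still_active, long *remaining_timeout)
{
	if (remaining_timeout)
		*remaining_timeout = timeout;	/* full budget on entry */

	for (int i = 0; i < nwaits && timeout > 0; i++) {
		timeout = waits[i];	/* timeout = dma_fence_wait_timeout(...) */
		update_remaining(timeout, remaining_timeout);
	}

	return still_active ? timeout : 0;
}
```

With simulated waits { 40, -ETIME } and nothing left active, new_retire() still returns 0, but *remaining_timeout is now 0 rather than -ETIME; with { 70, -ENODEV } it stays at 70, matching the "assume no time consumed" rule.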

Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: stable@vger.kernel.org # v5.15+
---
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 23 ++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

Comments

Andrzej Hajda Nov. 16, 2022, 1:13 p.m. UTC | #1
On 16.11.2022 12:25, Janusz Krzysztofik wrote:
> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> extra argument 'remaining_timeout', intended for passing back unconsumed
> portion of requested timeout when 0 (success) is returned.  However, when
> request retirement happens to succeed despite an error returned by
> dma_fence_wait_timeout(), the error code (a negative value) is passed back
> instead of remaining time.  If a user then passes that negative value
> forward as requested timeout to another wait, an explicit WARN or BUG can
> be triggered.
> 
> Instead of copying the value of timeout variable to *remaining_timeout
> before return, update the *remaining_timeout after each DMA fence wait.
> Set it to 0 on -ETIME, -EINTR or -ERESTARTSYS, and assume no time has been
> consumed on other errors returned from the wait.
> 
> Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Cc: stable@vger.kernel.org # v5.15+
> ---
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c | 23 ++++++++++++++++++---
>   1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index edb881d756309..ccaf2fd80625b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -138,6 +138,9 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>   	unsigned long active_count = 0;
>   	LIST_HEAD(free);
>   
> +	if (remaining_timeout)
> +		*remaining_timeout = timeout;
> +
>   	flush_submission(gt, timeout); /* kick the ksoftirqd tasklets */
>   	spin_lock(&timelines->lock);
>   	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
> @@ -163,6 +166,23 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>   								 timeout);
>   				dma_fence_put(fence);
>   
> +				if (remaining_timeout) {
> +					/*
> +					 * If we get an error here but request
> +					 * retirement succeeds anyway
> +					 * (!active_count) and we return 0, the
> +					 * caller may want to spend remaining
> +					 * time on waiting for other events.
> +					 */
> +					if (timeout == -ETIME ||
> +					    timeout == -EINTR ||
> +					    timeout == -ERESTARTSYS)
> +						*remaining_timeout = 0;
> +					else if (timeout >= 0)
> +						*remaining_timeout = timeout;
> +					/* else assume no time consumed */

Looks correct, but the crazy semantics of dma_fence_wait_timeout() do not
make it easy to understand.

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej


> +				}
> +
>   				/* Retirement is best effort */
>   				if (!mutex_trylock(&tl->mutex)) {
>   					active_count++;
> @@ -196,9 +216,6 @@ out_active:	spin_lock(&timelines->lock);
>   	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>   		active_count++;
>   
> -	if (remaining_timeout)
> -		*remaining_timeout = timeout;
> -
>   	return active_count ? timeout : 0;
>   }
>

Nirmoy Das Nov. 17, 2022, 9:58 a.m. UTC | #2
On 11/16/2022 12:25 PM, Janusz Krzysztofik wrote:

> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> extra argument 'remaining_timeout', intended for passing back unconsumed
> portion of requested timeout when 0 (success) is returned.  However, when
> request retirement happens to succeed despite an error returned by
> dma_fence_wait_timeout(), the error code (a negative value) is passed back
> instead of remaining time.  If a user then passes that negative value
> forward as requested timeout to another wait, an explicit WARN or BUG can
> be triggered.
>
> Instead of copying the value of timeout variable to *remaining_timeout
> before return, update the *remaining_timeout after each DMA fence wait.


Thanks for the detailed comment; indeed, we were not accounting for the
return value of dma_fence_wait_timeout().

Acked-by: Nirmoy Das <nirmoy.das@intel.com>


Thanks,

Nirmoy


> Set it to 0 on -ETIME, -EINTR or -ERESTARTSYS, and assume no time has been
> consumed on other errors returned from the wait.

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index edb881d756309..ccaf2fd80625b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -138,6 +138,9 @@  long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
 	unsigned long active_count = 0;
 	LIST_HEAD(free);
 
+	if (remaining_timeout)
+		*remaining_timeout = timeout;
+
 	flush_submission(gt, timeout); /* kick the ksoftirqd tasklets */
 	spin_lock(&timelines->lock);
 	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
@@ -163,6 +166,23 @@  long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
 								 timeout);
 				dma_fence_put(fence);
 
+				if (remaining_timeout) {
+					/*
+					 * If we get an error here but request
+					 * retirement succeeds anyway
+					 * (!active_count) and we return 0, the
+					 * caller may want to spend remaining
+					 * time on waiting for other events.
+					 */
+					if (timeout == -ETIME ||
+					    timeout == -EINTR ||
+					    timeout == -ERESTARTSYS)
+						*remaining_timeout = 0;
+					else if (timeout >= 0)
+						*remaining_timeout = timeout;
+					/* else assume no time consumed */
+				}
+
 				/* Retirement is best effort */
 				if (!mutex_trylock(&tl->mutex)) {
 					active_count++;
@@ -196,9 +216,6 @@  out_active:	spin_lock(&timelines->lock);
 	if (flush_submission(gt, timeout)) /* Wait, there's more! */
 		active_count++;
 
-	if (remaining_timeout)
-		*remaining_timeout = timeout;
-
 	return active_count ? timeout : 0;
 }