Message ID | 20221116112532.36253-2-janusz.krzysztofik@linux.intel.com |
---|---|
State | New, archived |
Series | drm/i915: Fix timeout handling when retiring requests |
On 16.11.2022 12:25, Janusz Krzysztofik wrote:
> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> extra argument 'remaining_timeout', intended for passing back unconsumed
> portion of requested timeout when 0 (success) is returned. However, when
> request retirement happens to succeed despite an error returned by
> dma_fence_wait_timeout(), the error code (a negative value) is passed back
> instead of remaining time. If a user then passes that negative value
> forward as requested timeout to another wait, an explicit WARN or BUG can
> be triggered.
>
> Instead of copying the value of timeout variable to *remaining_timeout
> before return, update the *remaining_timeout after each DMA fence wait.
> Set it to 0 on -ETIME, -EINTR or -ERESTARTSYS, and assume no time has been
> consumed on other errors returned from the wait.
>
> Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Cc: stable@vger.kernel.org # v5.15+
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 23 ++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index edb881d756309..ccaf2fd80625b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -138,6 +138,9 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>  	unsigned long active_count = 0;
>  	LIST_HEAD(free);
>
> +	if (remaining_timeout)
> +		*remaining_timeout = timeout;
> +
>  	flush_submission(gt, timeout); /* kick the ksoftirqd tasklets */
>  	spin_lock(&timelines->lock);
>  	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
> @@ -163,6 +166,23 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>  								 timeout);
>  				dma_fence_put(fence);
>
> +				if (remaining_timeout) {
> +					/*
> +					 * If we get an error here but request
> +					 * retirement succeeds anyway
> +					 * (!active_count) and we return 0, the
> +					 * caller may want to spend remaining
> +					 * time on waiting for other events.
> +					 */
> +					if (timeout == -ETIME ||
> +					    timeout == -EINTR ||
> +					    timeout == -ERESTARTSYS)
> +						*remaining_timeout = 0;
> +					else if (timeout >= 0)
> +						*remaining_timeout = timeout;
> +					/* else assume no time consumed */

Looks correct, but the crazy semantic of dma_fence_wait_timeout does not
make it easy to understand.

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej

> +				}
> +
>  				/* Retirement is best effort */
>  				if (!mutex_trylock(&tl->mutex)) {
>  					active_count++;
> @@ -196,9 +216,6 @@ out_active: spin_lock(&timelines->lock);
>  	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>  		active_count++;
>
> -	if (remaining_timeout)
> -		*remaining_timeout = timeout;
> -
>  	return active_count ? timeout : 0;
>  }
>
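
As a reading aid for the hunk reviewed above, the three-way handling of the wait result can be sketched as a standalone userspace helper. This is only an illustration under stated assumptions, not code from the patch: `classify_wait()`, `update_remaining()` and the `wait_outcome` enum are hypothetical names, and `ERESTARTSYS` is defined locally because it is a kernel-internal code that userspace `<errno.h>` does not normally provide.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Assumption: ERESTARTSYS is kernel-internal; 512 mirrors the value used in
 * the kernel's include/linux/errno.h and is defined here only for this sketch.
 */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical labels for the three cases the hunk distinguishes. */
enum wait_outcome {
	WAIT_REMAINING,		/* ret >= 0: ret is the unconsumed time */
	WAIT_BUDGET_SPENT,	/* timed out or interrupted */
	WAIT_OTHER_ERROR,	/* any other negative value */
};

static enum wait_outcome classify_wait(long ret)
{
	if (ret >= 0)
		return WAIT_REMAINING;
	if (ret == -ETIME || ret == -EINTR || ret == -ERESTARTSYS)
		return WAIT_BUDGET_SPENT;
	return WAIT_OTHER_ERROR;
}

/*
 * Hypothetical helper: fold one wait result into *remaining using the same
 * rules as the patch. A NULL pointer means the caller is not interested.
 */
static void update_remaining(long ret, long *remaining)
{
	if (!remaining)
		return;

	switch (classify_wait(ret)) {
	case WAIT_REMAINING:
		*remaining = ret;	/* pass back what is left */
		break;
	case WAIT_BUDGET_SPENT:
		*remaining = 0;		/* nothing left to spend */
		break;
	case WAIT_OTHER_ERROR:
		break;			/* assume no time consumed */
	}
}
```

With this shape, `*remaining` can never become negative as long as the initial value is non-negative, which is the property the patch is after.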
On 11/16/2022 12:25 PM, Janusz Krzysztofik wrote:
> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> extra argument 'remaining_timeout', intended for passing back unconsumed
> portion of requested timeout when 0 (success) is returned. However, when
> request retirement happens to succeed despite an error returned by
> dma_fence_wait_timeout(), the error code (a negative value) is passed back
> instead of remaining time. If a user then passes that negative value
> forward as requested timeout to another wait, an explicit WARN or BUG can
> be triggered.
>
> Instead of copying the value of timeout variable to *remaining_timeout
> before return, update the *remaining_timeout after each DMA fence wait.

Thanks for the detailed comment, indeed we were not accounting for the
return value of dma_fence_wait_timeout()

Acked-by: Nirmoy Das <nirmoy.das@intel.com>

Thanks,
Nirmoy

> Set it to 0 on -ETIME, -EINTR or -ERESTARTSYS, and assume no time has been
> consumed on other errors returned from the wait.
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index edb881d756309..ccaf2fd80625b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -138,6 +138,9 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
 	unsigned long active_count = 0;
 	LIST_HEAD(free);
 
+	if (remaining_timeout)
+		*remaining_timeout = timeout;
+
 	flush_submission(gt, timeout); /* kick the ksoftirqd tasklets */
 	spin_lock(&timelines->lock);
 	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
@@ -163,6 +166,23 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
 								 timeout);
 				dma_fence_put(fence);
 
+				if (remaining_timeout) {
+					/*
+					 * If we get an error here but request
+					 * retirement succeeds anyway
+					 * (!active_count) and we return 0, the
+					 * caller may want to spend remaining
+					 * time on waiting for other events.
+					 */
+					if (timeout == -ETIME ||
+					    timeout == -EINTR ||
+					    timeout == -ERESTARTSYS)
+						*remaining_timeout = 0;
+					else if (timeout >= 0)
+						*remaining_timeout = timeout;
+					/* else assume no time consumed */
+				}
+
 				/* Retirement is best effort */
 				if (!mutex_trylock(&tl->mutex)) {
 					active_count++;
@@ -196,9 +216,6 @@ out_active: spin_lock(&timelines->lock);
 	if (flush_submission(gt, timeout)) /* Wait, there's more! */
 		active_count++;
 
-	if (remaining_timeout)
-		*remaining_timeout = timeout;
-
 	return active_count ? timeout : 0;
 }
 
Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
with GuC") extended the API of intel_gt_retire_requests_timeout() with an
extra argument 'remaining_timeout', intended for passing back unconsumed
portion of requested timeout when 0 (success) is returned. However, when
request retirement happens to succeed despite an error returned by
dma_fence_wait_timeout(), the error code (a negative value) is passed back
instead of remaining time. If a user then passes that negative value
forward as requested timeout to another wait, an explicit WARN or BUG can
be triggered.

Instead of copying the value of timeout variable to *remaining_timeout
before return, update the *remaining_timeout after each DMA fence wait.
Set it to 0 on -ETIME, -EINTR or -ERESTARTSYS, and assume no time has been
consumed on other errors returned from the wait.

Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: stable@vger.kernel.org # v5.15+
---
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 23 ++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
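
To make the difference between "copy at the end" and "update after each wait" concrete, here is a toy userspace model of the two behaviours described in the commit message. It is a sketch under several assumptions and not the kernel function: `retire_model_old()` and `retire_model_new()` are hypothetical names, the results of the fence waits are injected through the `waits` array instead of calling dma_fence_wait_timeout(), retirement is assumed to always succeed (so both models return 0), and `ERESTARTSYS` is defined locally because userspace headers do not normally provide it.

```c
#include <errno.h>
#include <stddef.h>

/* Assumption: kernel-internal code, defined locally for this sketch only. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/*
 * Old behaviour: whatever the last wait returned, including a negative
 * error code, is copied to *remaining right before returning.
 */
static long retire_model_old(const long *waits, size_t n, long timeout,
			     long *remaining)
{
	for (size_t i = 0; i < n; i++) {
		if (timeout > 0)
			timeout = waits[i];	/* stands in for the fence wait */
	}

	if (remaining)
		*remaining = timeout;

	return 0;	/* model: nothing left active, i.e. success */
}

/*
 * New behaviour: start from the requested timeout and adjust *remaining
 * after every wait, so an error code is never stored in it.
 */
static long retire_model_new(const long *waits, size_t n, long timeout,
			     long *remaining)
{
	if (remaining)
		*remaining = timeout;

	for (size_t i = 0; i < n; i++) {
		if (timeout <= 0)
			continue;	/* no budget left, skip further waits */

		timeout = waits[i];	/* stands in for the fence wait */
		if (!remaining)
			continue;

		if (timeout == -ETIME || timeout == -EINTR ||
		    timeout == -ERESTARTSYS)
			*remaining = 0;
		else if (timeout >= 0)
			*remaining = timeout;
		/* else assume no time consumed */
	}

	return 0;	/* model: nothing left active, i.e. success */
}
```

Feeding both models a single wait result of -ETIME with a requested timeout of 100 leaves a negative value in `*remaining` for the old variant and 0 for the new one, while both still report success. A caller that forwards the old value as the timeout of a follow-up wait would be passing a negative timeout, which is the situation the commit message warns about.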