Message ID | 20221118104222.57328-2-janusz.krzysztofik@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: Fix timeout handling when retiring requests | expand |
On Friday, 18 November 2022 11:42:21 CET Janusz Krzysztofik wrote: > Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work > with GuC") extended the API of intel_gt_retire_requests_timeout() with an > extra argument 'remaining_timeout', intended for passing back unconsumed > portion of requested timeout when 0 (success) is returned. However, when > request retirement happens to succeed despite an error returned by a call > to dma_fence_wait_timeout(), that error code (a negative value) is passed > back instead of remaining time. If we then pass that negative value > forward as requested timeout to intel_uc_wait_for_idle(), an explicit BUG > will be triggered. > > If request retirement succeeds but an error code other than -ETIME is > passed back via remaininig_timeout, we have no clue on how much of > the initial timeout might have been left for spending it on waiting for > GuC to become idle. Then, we have no choice other than fail in that case > -- do it. Looking at this again, I think we should ignore those errors, like they have been already ignored by intel_gt_retire_requests_timeout() returning 0, and call intel_uc_wait_for_idle() with 0 timeout. I'll submit new version if you agree. Thanks, Janusz > However, if -ETIME is returned via remaining_timeout then we > know that no more time has been left. Then, pass 0 timeout value to > intel_uc_wait_for_idle() to give it a chance to return success if GuC is > already idle. > > v2: Fix the issue on the caller side, not the provider. > > Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC") > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > Cc: stable@vger.kernel.org # v5.15+ > --- > drivers/gpu/drm/i915/gt/intel_gt.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/ intel_gt.c > index 0325f071046ca..5d612ba547d23 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c > @@ -677,8 +677,15 @@ int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout) > return -EINTR; > } > > - return timeout ? timeout : intel_uc_wait_for_idle(>->uc, > - remaining_timeout); > + if (timeout) > + return timeout; > + > + if (remaining_timeout == -ETIME) > + remaining_timeout = 0; > + else if (remaining_timeout < 0) > + return remaining_timeout; > + > + return intel_uc_wait_for_idle(>->uc, remaining_timeout); > } > > int intel_gt_init(struct intel_gt *gt) >
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 0325f071046ca..5d612ba547d23 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -677,8 +677,15 @@ int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout) return -EINTR; } - return timeout ? timeout : intel_uc_wait_for_idle(>->uc, - remaining_timeout); + if (timeout) + return timeout; + + if (remaining_timeout == -ETIME) + remaining_timeout = 0; + else if (remaining_timeout < 0) + return remaining_timeout; + + return intel_uc_wait_for_idle(>->uc, remaining_timeout); } int intel_gt_init(struct intel_gt *gt)
Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC") extended the API of intel_gt_retire_requests_timeout() with an extra argument 'remaining_timeout', intended for passing back unconsumed portion of requested timeout when 0 (success) is returned. However, when request retirement happens to succeed despite an error returned by a call to dma_fence_wait_timeout(), that error code (a negative value) is passed back instead of remaining time. If we then pass that negative value forward as requested timeout to intel_uc_wait_for_idle(), an explicit BUG will be triggered. If request retirement succeeds but an error code other than -ETIME is passed back via remaininig_timeout, we have no clue on how much of the initial timeout might have been left for spending it on waiting for GuC to become idle. Then, we have no choice other than fail in that case -- do it. However, if -ETIME is returned via remaining_timeout then we know that no more time has been left. Then, pass 0 timeout value to intel_uc_wait_for_idle() to give it a chance to return success if GuC is already idle. v2: Fix the issue on the caller side, not the provider. Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC") Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: stable@vger.kernel.org # v5.15+ --- drivers/gpu/drm/i915/gt/intel_gt.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)