diff mbox series

[v3,1/2] drm/i915: Fix negative value passed as remaining time

Message ID 20221121145655.75141-2-janusz.krzysztofik@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915: Fix timeout handling when retiring requests | expand

Commit Message

Janusz Krzysztofik Nov. 21, 2022, 2:56 p.m. UTC
Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
with GuC") extended the API of intel_gt_retire_requests_timeout() with an
extra argument 'remaining_timeout', intended for passing back unconsumed
portion of requested timeout when 0 (success) is returned.  However, when
request retirement happens to succeed despite an error returned by a call
to dma_fence_wait_timeout(), that error code (a negative value) is passed
back instead of remaining time.  If we then pass that negative value
forward as requested timeout to intel_uc_wait_for_idle(), an explicit BUG
will be triggered.

If request retirement succeeds but an error code is passed back via
remaininig_timeout, we may have no clue on how much of the initial timeout
might have been left for spending it on waiting for GuC to become idle.
OTOH, since all pending requests have been successfully retired, that
error code has been already ignored by intel_gt_retire_requests_timeout(),
then we shouldn't fail.

Assume no more time has been left on error and pass 0 timeout value to
intel_uc_wait_for_idle() to give it a chance to return success if GuC is
already idle.

v3: Don't fail on any error passed back via remaining_timeout.

v2: Fix the issue on the caller side, not the provider.

Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: stable@vger.kernel.org # v5.15+
---
 drivers/gpu/drm/i915/gt/intel_gt.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Andrzej Hajda Nov. 21, 2022, 5:40 p.m. UTC | #1
On 21.11.2022 15:56, Janusz Krzysztofik wrote:
> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> extra argument 'remaining_timeout', intended for passing back unconsumed
> portion of requested timeout when 0 (success) is returned.  However, when
> request retirement happens to succeed despite an error returned by a call
> to dma_fence_wait_timeout(), that error code (a negative value) is passed
> back instead of remaining time.  If we then pass that negative value
> forward as requested timeout to intel_uc_wait_for_idle(), an explicit BUG
> will be triggered.
> 
> If request retirement succeeds but an error code is passed back via
> remaininig_timeout, we may have no clue on how much of the initial timeout
> might have been left for spending it on waiting for GuC to become idle.
> OTOH, since all pending requests have been successfully retired, that
> error code has been already ignored by intel_gt_retire_requests_timeout(),
> then we shouldn't fail.
> 
> Assume no more time has been left on error and pass 0 timeout value to
> intel_uc_wait_for_idle() to give it a chance to return success if GuC is
> already idle.
> 
> v3: Don't fail on any error passed back via remaining_timeout.
> 
> v2: Fix the issue on the caller side, not the provider.
> 
> Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Cc: stable@vger.kernel.org # v5.15+

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej

> ---
>   drivers/gpu/drm/i915/gt/intel_gt.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index b5ad9caa55372..7ef0edb2e37cd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -677,8 +677,13 @@ int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
>   			return -EINTR;
>   	}
>   
> -	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
> -							  remaining_timeout);
> +	if (timeout)
> +		return timeout;
> +
> +	if (remaining_timeout < 0)
> +		remaining_timeout = 0;
> +
> +	return intel_uc_wait_for_idle(&gt->uc, remaining_timeout);
>   }
>   
>   int intel_gt_init(struct intel_gt *gt)
Janusz Krzysztofik Nov. 21, 2022, 11:19 p.m. UTC | #2
Hi Andrzej,

Thanks for providing your R-b.

On Monday, 21 November 2022 18:40:51 CET Andrzej Hajda wrote:
> On 21.11.2022 15:56, Janusz Krzysztofik wrote:
> > Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> > with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> > extra argument 'remaining_timeout', intended for passing back unconsumed
> > portion of requested timeout when 0 (success) is returned.  However, when
> > request retirement happens to succeed despite an error returned by a call
> > to dma_fence_wait_timeout(), that error code (a negative value) is passed
> > back instead of remaining time.  If we then pass that negative value
> > forward as requested timeout to intel_uc_wait_for_idle(), an explicit BUG
> > will be triggered.
> > 
> > If request retirement succeeds but an error code is passed back via
> > remaininig_timeout, we may have no clue on how much of the initial timeout
> > might have been left for spending it on waiting for GuC to become idle.
> > OTOH, since all pending requests have been successfully retired, that
> > error code has been already ignored by intel_gt_retire_requests_timeout(),
> > then we shouldn't fail.
> > 
> > Assume no more time has been left on error and pass 0 timeout value to
> > intel_uc_wait_for_idle() to give it a chance to return success if GuC is
> > already idle.
> > 
> > v3: Don't fail on any error passed back via remaining_timeout.
> > 
> > v2: Fix the issue on the caller side, not the provider.
> > 
> > Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work 
with GuC")
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > Cc: stable@vger.kernel.org # v5.15+
> 
> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

While still open for comments from others, I'm now looking for potential 
committer.

Thanks,
Janusz


> 
> Regards
> Andrzej
> 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_gt.c | 9 +++++++--
> >   1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/
intel_gt.c
> > index b5ad9caa55372..7ef0edb2e37cd 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > @@ -677,8 +677,13 @@ int intel_gt_wait_for_idle(struct intel_gt *gt, long 
timeout)
> >   			return -EINTR;
> >   	}
> >   
> > -	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
> > -							  
remaining_timeout);
> > +	if (timeout)
> > +		return timeout;
> > +
> > +	if (remaining_timeout < 0)
> > +		remaining_timeout = 0;
> > +
> > +	return intel_uc_wait_for_idle(&gt->uc, remaining_timeout);
> >   }
> >   
> >   int intel_gt_init(struct intel_gt *gt)
> 
>
Tvrtko Ursulin Nov. 22, 2022, 10:41 a.m. UTC | #3
On 21/11/2022 23:19, Janusz Krzysztofik wrote:
> Hi Andrzej,
> 
> Thanks for providing your R-b.
> 
> On Monday, 21 November 2022 18:40:51 CET Andrzej Hajda wrote:
>> On 21.11.2022 15:56, Janusz Krzysztofik wrote:
>>> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
>>> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
>>> extra argument 'remaining_timeout', intended for passing back unconsumed
>>> portion of requested timeout when 0 (success) is returned.  However, when
>>> request retirement happens to succeed despite an error returned by a call
>>> to dma_fence_wait_timeout(), that error code (a negative value) is passed
>>> back instead of remaining time.  If we then pass that negative value
>>> forward as requested timeout to intel_uc_wait_for_idle(), an explicit BUG
>>> will be triggered.

Right, AFAICT a GEM_BUG_ON in debug builds, but in production builds 
negative timeout will get passed along all the way to schedule_timeout 
where error and call trace will be dumped. So fix appears warranted indeed.

>>> If request retirement succeeds but an error code is passed back via
>>> remaininig_timeout, we may have no clue on how much of the initial timeout
>>> might have been left for spending it on waiting for GuC to become idle.
>>> OTOH, since all pending requests have been successfully retired, that
>>> error code has been already ignored by intel_gt_retire_requests_timeout(),
>>> then we shouldn't fail.
>>>
>>> Assume no more time has been left on error and pass 0 timeout value to
>>> intel_uc_wait_for_idle() to give it a chance to return success if GuC is
>>> already idle.
>>>
>>> v3: Don't fail on any error passed back via remaining_timeout.
>>>
>>> v2: Fix the issue on the caller side, not the provider.
>>>
>>> Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> with GuC")
>>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
>>> Cc: stable@vger.kernel.org # v5.15+
>>
>> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
> 
> While still open for comments from others, I'm now looking for potential
> committer.

Both patches are considered good to go?

Regards,

Tvrtko

> 
> Thanks,
> Janusz
> 
> 
>>
>> Regards
>> Andrzej
>>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_gt.c | 9 +++++++--
>>>    1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/
> intel_gt.c
>>> index b5ad9caa55372..7ef0edb2e37cd 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> @@ -677,8 +677,13 @@ int intel_gt_wait_for_idle(struct intel_gt *gt, long
> timeout)
>>>    			return -EINTR;
>>>    	}
>>>    
>>> -	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
>>> -							
> remaining_timeout);
>>> +	if (timeout)
>>> +		return timeout;
>>> +
>>> +	if (remaining_timeout < 0)
>>> +		remaining_timeout = 0;
>>> +
>>> +	return intel_uc_wait_for_idle(&gt->uc, remaining_timeout);
>>>    }
>>>    
>>>    int intel_gt_init(struct intel_gt *gt)
>>
>>
> 
> 
> 
>
Janusz Krzysztofik Nov. 23, 2022, 11:29 a.m. UTC | #4
On Tuesday, 22 November 2022 11:41:29 CET Tvrtko Ursulin wrote:
> 
> On 21/11/2022 23:19, Janusz Krzysztofik wrote:
> > Hi Andrzej,
> > 
> > Thanks for providing your R-b.
> > 
> > On Monday, 21 November 2022 18:40:51 CET Andrzej Hajda wrote:
> >> On 21.11.2022 15:56, Janusz Krzysztofik wrote:
> >>> Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> >>> with GuC") extended the API of intel_gt_retire_requests_timeout() with an
> >>> extra argument 'remaining_timeout', intended for passing back unconsumed
> >>> portion of requested timeout when 0 (success) is returned.  However, when
> >>> request retirement happens to succeed despite an error returned by a call
> >>> to dma_fence_wait_timeout(), that error code (a negative value) is passed
> >>> back instead of remaining time.  If we then pass that negative value
> >>> forward as requested timeout to intel_uc_wait_for_idle(), an explicit BUG
> >>> will be triggered.
> 
> Right, AFAICT a GEM_BUG_ON in debug builds, but in production builds 
> negative timeout will get passed along all the way to schedule_timeout 
> where error and call trace will be dumped. So fix appears warranted indeed.
> 
> >>> If request retirement succeeds but an error code is passed back via
> >>> remaininig_timeout, we may have no clue on how much of the initial timeout
> >>> might have been left for spending it on waiting for GuC to become idle.
> >>> OTOH, since all pending requests have been successfully retired, that
> >>> error code has been already ignored by intel_gt_retire_requests_timeout(),
> >>> then we shouldn't fail.
> >>>
> >>> Assume no more time has been left on error and pass 0 timeout value to
> >>> intel_uc_wait_for_idle() to give it a chance to return success if GuC is
> >>> already idle.
> >>>
> >>> v3: Don't fail on any error passed back via remaining_timeout.
> >>>
> >>> v2: Fix the issue on the caller side, not the provider.
> >>>
> >>> Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
> > with GuC")
> >>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> >>> Cc: stable@vger.kernel.org # v5.15+
> >>
> >> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
> > 
> > While still open for comments from others, I'm now looking for potential
> > committer.
> 
> Both patches are considered good to go?

Yes, I hope so.  The actual diff of 2/2 v3 has received R-b from Andrzej still 
under discussion of v2, then I decided to add that tag to v3.

Andrzej, can you please respond to 2/2 v3 and confirm your R-b applies?

Thanks,
Janusz

> 
> Regards,
> 
> Tvrtko
> 
> > 
> > Thanks,
> > Janusz
> > 
> > 
> >>
> >> Regards
> >> Andrzej
> >>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/intel_gt.c | 9 +++++++--
> >>>    1 file changed, 7 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/
> > intel_gt.c
> >>> index b5ad9caa55372..7ef0edb2e37cd 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> >>> @@ -677,8 +677,13 @@ int intel_gt_wait_for_idle(struct intel_gt *gt, long
> > timeout)
> >>>    			return -EINTR;
> >>>    	}
> >>>    
> >>> -	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
> >>> -							
> > remaining_timeout);
> >>> +	if (timeout)
> >>> +		return timeout;
> >>> +
> >>> +	if (remaining_timeout < 0)
> >>> +		remaining_timeout = 0;
> >>> +
> >>> +	return intel_uc_wait_for_idle(&gt->uc, remaining_timeout);
> >>>    }
> >>>    
> >>>    int intel_gt_init(struct intel_gt *gt)
> >>
> >>
> > 
> > 
> > 
> > 
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index b5ad9caa55372..7ef0edb2e37cd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -677,8 +677,13 @@  int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
 			return -EINTR;
 	}
 
-	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
-							  remaining_timeout);
+	if (timeout)
+		return timeout;
+
+	if (remaining_timeout < 0)
+		remaining_timeout = 0;
+
+	return intel_uc_wait_for_idle(&gt->uc, remaining_timeout);
 }
 
 int intel_gt_init(struct intel_gt *gt)