diff mbox series

drm/i915: fix remaining_timeout in intel_gt_retire_requests_timeout

Message ID 20220325175839.2717499-1-daniele.ceraolospurio@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915: fix remaining_timeout in intel_gt_retire_requests_timeout | expand

Commit Message

Daniele Ceraolo Spurio March 25, 2022, 5:58 p.m. UTC
In intel_gt_wait_for_idle, we use the remaining timeout returned from
intel_gt_retire_requests_timeout to wait on the GuC being idle. However,
the returned variable can have a negative value if something goes wrong
during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait
function.
To fix this, make sure to only return the timeout if it is positive.

Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Matthew Brost March 25, 2022, 6:29 p.m. UTC | #1
On Fri, Mar 25, 2022 at 10:58:39AM -0700, Daniele Ceraolo Spurio wrote:
> In intel_gt_wait_for_idle, we use the remaining timeout returned from
> intel_gt_retire_requests_timeout to wait on the GuC being idle. However,
> the returned variable can have a negative value if something goes wrong
> during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait
> function.
> To fix this, make sure to only return the timeout if it is positive.
> 
> Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: John Harrison <john.c.harrison@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index edb881d756309..ef70c209976d8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -197,7 +197,7 @@ out_active:	spin_lock(&timelines->lock);
>  		active_count++;
>  
>  	if (remaining_timeout)
> -		*remaining_timeout = timeout;
> +		*remaining_timeout = timeout > 0 ? timeout : 0;
>  
>  	return active_count ? timeout : 0;
>  }
> -- 
> 2.25.1
>
Nirmoy Das March 25, 2022, 6:37 p.m. UTC | #2
On 3/25/2022 6:58 PM, Daniele Ceraolo Spurio wrote:
> In intel_gt_wait_for_idle, we use the remaining timeout returned from
> intel_gt_retire_requests_timeout to wait on the GuC being idle. However,
> the returned variable can have a negative value if something goes wrong
> during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait
> function.
> To fix this, make sure to only return the timeout if it is positive.
>
> Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: John Harrison <john.c.harrison@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index edb881d756309..ef70c209976d8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -197,7 +197,7 @@ out_active:	spin_lock(&timelines->lock);
>   		active_count++;
>   
>   	if (remaining_timeout)
> -		*remaining_timeout = timeout;
> +		*remaining_timeout = timeout > 0 ? timeout : 0;


Should the last flush_submission() be  "if ( timeout > 0 
&&flush_submission(gt, timeout))" ?


Nirmoy

>   
>   	return active_count ? timeout : 0;
>   }
Daniele Ceraolo Spurio March 25, 2022, 8:33 p.m. UTC | #3
On 3/25/2022 11:37 AM, Das, Nirmoy wrote:
>
> On 3/25/2022 6:58 PM, Daniele Ceraolo Spurio wrote:
>> In intel_gt_wait_for_idle, we use the remaining timeout returned from
>> intel_gt_retire_requests_timeout to wait on the GuC being idle. However,
>> the returned variable can have a negative value if something goes wrong
>> during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait
>> function.
>> To fix this, make sure to only return the timeout if it is positive.
>>
>> Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle to 
>> work with GuC")
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: John Harrison <john.c.harrison@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
>> b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>> index edb881d756309..ef70c209976d8 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>> @@ -197,7 +197,7 @@ out_active: spin_lock(&timelines->lock);
>>           active_count++;
>>         if (remaining_timeout)
>> -        *remaining_timeout = timeout;
>> +        *remaining_timeout = timeout > 0 ? timeout : 0;
>
>
> Should the last flush_submission() be  "if ( timeout > 0 
> &&flush_submission(gt, timeout))" ?

I considered it, but flush_submission only checks for timeout != 0 so it 
won't accidentally make use of a negative value thinking it's positive. 
I don't know if the flush is purposely done even if timeout is negative 
or if that's a mistake, but that code has been there long before we 
modified the function to return the remaining timeout and never seems to 
have caused issues, so I decided not to change it.

Daniele

>
>
> Nirmoy
>
>>         return active_count ? timeout : 0;
>>   }
Nirmoy Das March 28, 2022, 9:16 a.m. UTC | #4
On 3/25/2022 9:33 PM, Ceraolo Spurio, Daniele wrote:
>
>
> On 3/25/2022 11:37 AM, Das, Nirmoy wrote:
>>
>> On 3/25/2022 6:58 PM, Daniele Ceraolo Spurio wrote:
>>> In intel_gt_wait_for_idle, we use the remaining timeout returned from
>>> intel_gt_retire_requests_timeout to wait on the GuC being idle. 
>>> However,
>>> the returned variable can have a negative value if something goes wrong
>>> during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait
>>> function.
>>> To fix this, make sure to only return the timeout if it is positive.
>>>
>>> Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle 
>>> to work with GuC")
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
>>> b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> index edb881d756309..ef70c209976d8 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> @@ -197,7 +197,7 @@ out_active: spin_lock(&timelines->lock);
>>>           active_count++;
>>>         if (remaining_timeout)
>>> -        *remaining_timeout = timeout;
>>> +        *remaining_timeout = timeout > 0 ? timeout : 0;
>>
>>
>> Should the last flush_submission() be  "if ( timeout > 0 
>> &&flush_submission(gt, timeout))" ?
>
> I considered it, but flush_submission only checks for timeout != 0 so 
> it won't accidentally make use of a negative value thinking it's 
> positive. I don't know if the flush is purposely done even if timeout 
> is negative or if that's a mistake, but that code has been there long 
> before we modified the function to return the remaining timeout and 
> never seems to have caused issues, so I decided not to change it.


Yes, we need clarify if we really need the final flush if the timeout is 
negative.

But this patch  is Acked-by: Nirmoy Das <nirmoy.das@intel.com>

Nirmoy

>
> Daniele
>
>>
>>
>> Nirmoy
>>
>>>         return active_count ? timeout : 0;
>>>   }
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index edb881d756309..ef70c209976d8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -197,7 +197,7 @@  out_active:	spin_lock(&timelines->lock);
 		active_count++;
 
 	if (remaining_timeout)
-		*remaining_timeout = timeout;
+		*remaining_timeout = timeout > 0 ? timeout : 0;
 
 	return active_count ? timeout : 0;
 }