diff mbox series

[02/12] drm/i915/guc: Don't call ring_is_idle in GuC submission

Message ID 20220712233136.1044951-3-John.C.Harrison@Intel.com (mailing list archive)
State New, archived
Headers show
Series Random assortment of (mostly) GuC related patches | expand

Commit Message

John Harrison July 12, 2022, 11:31 p.m. UTC
From: Matthew Brost <matthew.brost@intel.com>

The engine registers really shouldn't be touched during GuC submission
as the GuC owns the registers. Don't call ring_is_idle and tie
intel_engine_is_idle strictly to the engine pm.

Because intel_engine_is_idle tied to the engine pm, retire requests
before checking intel_engines_are_idle in gt_drop_caches, and lastly
increase the timeout in gt_drop_caches for the intel_engines_are_idle
check.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 13 +++++++++++++
 drivers/gpu/drm/i915/i915_debugfs.c       |  6 +++---
 drivers/gpu/drm/i915/i915_drv.h           |  2 +-
 3 files changed, 17 insertions(+), 4 deletions(-)

Comments

Tvrtko Ursulin July 18, 2022, 12:26 p.m. UTC | #1
On 13/07/2022 00:31, John.C.Harrison@Intel.com wrote:
> From: Matthew Brost <matthew.brost@intel.com>
> 
> The engine registers really shouldn't be touched during GuC submission
> as the GuC owns the registers. Don't call ring_is_idle and tie

Touch being just read and it is somehow harmful?

> intel_engine_is_idle strictly to the engine pm.

Strictly seems wrong - it is just ring_is_idle check that is replaced 
and not the whole implementation of intel_engine_is_idle.

> Because intel_engine_is_idle tied to the engine pm, retire requests
> before checking intel_engines_are_idle in gt_drop_caches, and lastly
Why is re-ordering important? I at least can't understand it. I hope 
it's not working around IGT failures.

> increase the timeout in gt_drop_caches for the intel_engines_are_idle
> check.

Same here - why?

Regards,

Tvrtko

> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c | 13 +++++++++++++
>   drivers/gpu/drm/i915/i915_debugfs.c       |  6 +++---
>   drivers/gpu/drm/i915/i915_drv.h           |  2 +-
>   3 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 283870c659911..959a7c92e8f4d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1602,6 +1602,9 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
>   {
>   	bool idle = true;
>   
> +	/* GuC submission shouldn't access HEAD & TAIL via MMIO */
> +	GEM_BUG_ON(intel_engine_uses_guc(engine));
> +
>   	if (I915_SELFTEST_ONLY(!engine->mmio_base))
>   		return true;
>   
> @@ -1668,6 +1671,16 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
>   	if (!i915_sched_engine_is_empty(engine->sched_engine))
>   		return false;
>   
> +	/*
> +	 * We shouldn't touch engine registers with GuC submission as the GuC
> +	 * owns the registers. Let's tie the idle to engine pm, at worst this
> +	 * function sometimes will falsely report non-idle when idle during the
> +	 * delay to retire requests or with virtual engines and a request
> +	 * running on another instance within the same class / submit mask.
> +	 */
> +	if (intel_engine_uses_guc(engine))
> +		return false;
> +
>   	/* Ring stopped? */
>   	return ring_is_idle(engine);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 94e5c29d2ee3a..ee5334840e9cb 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -654,13 +654,13 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
>   {
>   	int ret;
>   
> +	if (val & DROP_RETIRE || val & DROP_RESET_ACTIVE)
> +		intel_gt_retire_requests(gt);
> +
>   	if (val & DROP_RESET_ACTIVE &&
>   	    wait_for(intel_engines_are_idle(gt), I915_IDLE_ENGINES_TIMEOUT))
>   		intel_gt_set_wedged(gt);
>   
> -	if (val & DROP_RETIRE)
> -		intel_gt_retire_requests(gt);
> -
>   	if (val & (DROP_IDLE | DROP_ACTIVE)) {
>   		ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
>   		if (ret)
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c22f29c3faa0e..53c7474dde495 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -278,7 +278,7 @@ struct i915_gem_mm {
>   	u32 shrink_count;
>   };
>   
> -#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
> +#define I915_IDLE_ENGINES_TIMEOUT (500) /* in ms */
>   
>   unsigned long i915_fence_context_timeout(const struct drm_i915_private *i915,
>   					 u64 context);
John Harrison July 19, 2022, 12:09 a.m. UTC | #2
On 7/18/2022 05:26, Tvrtko Ursulin wrote:
>
> On 13/07/2022 00:31, John.C.Harrison@Intel.com wrote:
>> From: Matthew Brost <matthew.brost@intel.com>
>>
>> The engine registers really shouldn't be touched during GuC submission
>> as the GuC owns the registers. Don't call ring_is_idle and tie
>
> Touch being just read and it is somehow harmful?
The registers are meaningless when GuC is controlling the submission. 
The i915 driver has no knowledge of what context is or is not executing 
on any given engine at any given time. So reading reading the ring 
registers is incorrect - it can lead to bad assumptions about what state 
the hardware is in.

>
>> intel_engine_is_idle strictly to the engine pm.
>
> Strictly seems wrong - it is just ring_is_idle check that is replaced 
> and not the whole implementation of intel_engine_is_idle.
>
>> Because intel_engine_is_idle tied to the engine pm, retire requests
>> before checking intel_engines_are_idle in gt_drop_caches, and lastly
> Why is re-ordering important? I at least can't understand it. I hope 
> it's not working around IGT failures.
If requests are physically completed but not retired then they will be 
holding unnecessary PM references. So we need to flush those out before 
checking for idle.

>
>> increase the timeout in gt_drop_caches for the intel_engines_are_idle
>> check.
>
> Same here - why?
@Matthew Brost - do you recall which particular tests were hitting an 
issue? I'm guessing gem_ctx_create? I believe that's the one that 
creates and destroys thousands of contexts. That is much slower with GuC 
(GuC communication required) than with execlists (i915 internal state 
change only).

John.



>
> Regards,
>
> Tvrtko
>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_engine_cs.c | 13 +++++++++++++
>>   drivers/gpu/drm/i915/i915_debugfs.c       |  6 +++---
>>   drivers/gpu/drm/i915/i915_drv.h           |  2 +-
>>   3 files changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> index 283870c659911..959a7c92e8f4d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> @@ -1602,6 +1602,9 @@ static bool ring_is_idle(struct intel_engine_cs 
>> *engine)
>>   {
>>       bool idle = true;
>>   +    /* GuC submission shouldn't access HEAD & TAIL via MMIO */
>> +    GEM_BUG_ON(intel_engine_uses_guc(engine));
>> +
>>       if (I915_SELFTEST_ONLY(!engine->mmio_base))
>>           return true;
>>   @@ -1668,6 +1671,16 @@ bool intel_engine_is_idle(struct 
>> intel_engine_cs *engine)
>>       if (!i915_sched_engine_is_empty(engine->sched_engine))
>>           return false;
>>   +    /*
>> +     * We shouldn't touch engine registers with GuC submission as 
>> the GuC
>> +     * owns the registers. Let's tie the idle to engine pm, at worst 
>> this
>> +     * function sometimes will falsely report non-idle when idle 
>> during the
>> +     * delay to retire requests or with virtual engines and a request
>> +     * running on another instance within the same class / submit mask.
>> +     */
>> +    if (intel_engine_uses_guc(engine))
>> +        return false;
>> +
>>       /* Ring stopped? */
>>       return ring_is_idle(engine);
>>   }
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
>> b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 94e5c29d2ee3a..ee5334840e9cb 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -654,13 +654,13 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
>>   {
>>       int ret;
>>   +    if (val & DROP_RETIRE || val & DROP_RESET_ACTIVE)
>> +        intel_gt_retire_requests(gt);
>> +
>>       if (val & DROP_RESET_ACTIVE &&
>>           wait_for(intel_engines_are_idle(gt), 
>> I915_IDLE_ENGINES_TIMEOUT))
>>           intel_gt_set_wedged(gt);
>>   -    if (val & DROP_RETIRE)
>> -        intel_gt_retire_requests(gt);
>> -
>>       if (val & (DROP_IDLE | DROP_ACTIVE)) {
>>           ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
>>           if (ret)
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index c22f29c3faa0e..53c7474dde495 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -278,7 +278,7 @@ struct i915_gem_mm {
>>       u32 shrink_count;
>>   };
>>   -#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
>> +#define I915_IDLE_ENGINES_TIMEOUT (500) /* in ms */
>>     unsigned long i915_fence_context_timeout(const struct 
>> drm_i915_private *i915,
>>                        u64 context);
Tvrtko Ursulin July 19, 2022, 9:49 a.m. UTC | #3
On 19/07/2022 01:09, John Harrison wrote:
> On 7/18/2022 05:26, Tvrtko Ursulin wrote:
>>
>> On 13/07/2022 00:31, John.C.Harrison@Intel.com wrote:
>>> From: Matthew Brost <matthew.brost@intel.com>
>>>
>>> The engine registers really shouldn't be touched during GuC submission
>>> as the GuC owns the registers. Don't call ring_is_idle and tie
>>
>> Touch being just read and it is somehow harmful?
> The registers are meaningless when GuC is controlling the submission. 
> The i915 driver has no knowledge of what context is or is not executing 
> on any given engine at any given time. So reading reading the ring 
> registers is incorrect - it can lead to bad assumptions about what state 
> the hardware is in.

Same is actually true with the execlists backend. The code in 
ring_is_idle is not concerning itself with which context is running or 
not. Just that the head/tail/ctl appear idle.

Problem/motivation appears to be on a higher than simply ring registers.

I am not claiming it makes sense with Guc and that it has to remain but 
just suggesting for as a minimum clearer commit message.

>>> intel_engine_is_idle strictly to the engine pm.
>>
>> Strictly seems wrong - it is just ring_is_idle check that is replaced 
>> and not the whole implementation of intel_engine_is_idle.
>>
>>> Because intel_engine_is_idle tied to the engine pm, retire requests
>>> before checking intel_engines_are_idle in gt_drop_caches, and lastly
>> Why is re-ordering important? I at least can't understand it. I hope 
>> it's not working around IGT failures.
> If requests are physically completed but not retired then they will be 
> holding unnecessary PM references. So we need to flush those out before 
> checking for idle.

And if they are not as someone passes in DROP_RESET_ACTIVE? They will 
not retire and code still enters intel_engines_are_idle so that has to 
work, no? Something does not align for me still.

>>> increase the timeout in gt_drop_caches for the intel_engines_are_idle
>>> check.
>>
>> Same here - why?
> @Matthew Brost - do you recall which particular tests were hitting an 
> issue? I'm guessing gem_ctx_create? I believe that's the one that 
> creates and destroys thousands of contexts. That is much slower with GuC 
> (GuC communication required) than with execlists (i915 internal state 
> change only).

And if that is a logically separate change please split the patch up.

Regards,

Tvrtko

> 
> John.
> 
> 
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_engine_cs.c | 13 +++++++++++++
>>>   drivers/gpu/drm/i915/i915_debugfs.c       |  6 +++---
>>>   drivers/gpu/drm/i915/i915_drv.h           |  2 +-
>>>   3 files changed, 17 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
>>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> index 283870c659911..959a7c92e8f4d 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> @@ -1602,6 +1602,9 @@ static bool ring_is_idle(struct intel_engine_cs 
>>> *engine)
>>>   {
>>>       bool idle = true;
>>>   +    /* GuC submission shouldn't access HEAD & TAIL via MMIO */
>>> +    GEM_BUG_ON(intel_engine_uses_guc(engine));
>>> +
>>>       if (I915_SELFTEST_ONLY(!engine->mmio_base))
>>>           return true;
>>>   @@ -1668,6 +1671,16 @@ bool intel_engine_is_idle(struct 
>>> intel_engine_cs *engine)
>>>       if (!i915_sched_engine_is_empty(engine->sched_engine))
>>>           return false;
>>>   +    /*
>>> +     * We shouldn't touch engine registers with GuC submission as 
>>> the GuC
>>> +     * owns the registers. Let's tie the idle to engine pm, at worst 
>>> this
>>> +     * function sometimes will falsely report non-idle when idle 
>>> during the
>>> +     * delay to retire requests or with virtual engines and a request
>>> +     * running on another instance within the same class / submit mask.
>>> +     */
>>> +    if (intel_engine_uses_guc(engine))
>>> +        return false;
>>> +
>>>       /* Ring stopped? */
>>>       return ring_is_idle(engine);
>>>   }
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
>>> b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index 94e5c29d2ee3a..ee5334840e9cb 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -654,13 +654,13 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
>>>   {
>>>       int ret;
>>>   +    if (val & DROP_RETIRE || val & DROP_RESET_ACTIVE)
>>> +        intel_gt_retire_requests(gt);
>>> +
>>>       if (val & DROP_RESET_ACTIVE &&
>>>           wait_for(intel_engines_are_idle(gt), 
>>> I915_IDLE_ENGINES_TIMEOUT))
>>>           intel_gt_set_wedged(gt);
>>>   -    if (val & DROP_RETIRE)
>>> -        intel_gt_retire_requests(gt);
>>> -
>>>       if (val & (DROP_IDLE | DROP_ACTIVE)) {
>>>           ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
>>>           if (ret)
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>>> b/drivers/gpu/drm/i915/i915_drv.h
>>> index c22f29c3faa0e..53c7474dde495 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -278,7 +278,7 @@ struct i915_gem_mm {
>>>       u32 shrink_count;
>>>   };
>>>   -#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
>>> +#define I915_IDLE_ENGINES_TIMEOUT (500) /* in ms */
>>>     unsigned long i915_fence_context_timeout(const struct 
>>> drm_i915_private *i915,
>>>                        u64 context);
>
Tvrtko Ursulin July 19, 2022, 10:14 a.m. UTC | #4
On 19/07/2022 10:49, Tvrtko Ursulin wrote:
> 
> On 19/07/2022 01:09, John Harrison wrote:
>> On 7/18/2022 05:26, Tvrtko Ursulin wrote:
>>>
>>> On 13/07/2022 00:31, John.C.Harrison@Intel.com wrote:
>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>
>>>> The engine registers really shouldn't be touched during GuC submission
>>>> as the GuC owns the registers. Don't call ring_is_idle and tie
>>>
>>> Touch being just read and it is somehow harmful?
>> The registers are meaningless when GuC is controlling the submission. 
>> The i915 driver has no knowledge of what context is or is not 
>> executing on any given engine at any given time. So reading reading 
>> the ring registers is incorrect - it can lead to bad assumptions about 
>> what state the hardware is in.
> 
> Same is actually true with the execlists backend. The code in 
> ring_is_idle is not concerning itself with which context is running or 
> not. Just that the head/tail/ctl appear idle.
> 
> Problem/motivation appears to be on a higher than simply ring registers.
> 
> I am not claiming it makes sense with Guc and that it has to remain but 
> just suggesting for as a minimum clearer commit message.
> 
>>>> intel_engine_is_idle strictly to the engine pm.
>>>
>>> Strictly seems wrong - it is just ring_is_idle check that is replaced 
>>> and not the whole implementation of intel_engine_is_idle.
>>>
>>>> Because intel_engine_is_idle tied to the engine pm, retire requests
>>>> before checking intel_engines_are_idle in gt_drop_caches, and lastly
>>> Why is re-ordering important? I at least can't understand it. I hope 
>>> it's not working around IGT failures.
>> If requests are physically completed but not retired then they will be 
>> holding unnecessary PM references. So we need to flush those out 
>> before checking for idle.
> 
> And if they are not as someone passes in DROP_RESET_ACTIVE? They will 
> not retire and code still enters intel_engines_are_idle so that has to 
> work, no? Something does not align for me still.

With "not retire" I meant potentially not retire within 
I915_IDLE_ENGINES_TIMEOUT. I guess hack happens to work for some or all 
IGTs which use DROP_RESET_ACTIVE.

Does it also mean patch would fix that problem without touching 
intel_engine_is_idle/ring_is_idle - with just the re-ordering in 
gt_drop_caches?

Regards,

Tvrtko

> 
>>>> increase the timeout in gt_drop_caches for the intel_engines_are_idle
>>>> check.
>>>
>>> Same here - why?
>> @Matthew Brost - do you recall which particular tests were hitting an 
>> issue? I'm guessing gem_ctx_create? I believe that's the one that 
>> creates and destroys thousands of contexts. That is much slower with 
>> GuC (GuC communication required) than with execlists (i915 internal 
>> state change only).
> 
> And if that is a logically separate change please split the patch up.
> 
> Regards,
> 
> Tvrtko
> 
>>
>> John.
>>
>>
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/gt/intel_engine_cs.c | 13 +++++++++++++
>>>>   drivers/gpu/drm/i915/i915_debugfs.c       |  6 +++---
>>>>   drivers/gpu/drm/i915/i915_drv.h           |  2 +-
>>>>   3 files changed, 17 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
>>>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>>> index 283870c659911..959a7c92e8f4d 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>>> @@ -1602,6 +1602,9 @@ static bool ring_is_idle(struct 
>>>> intel_engine_cs *engine)
>>>>   {
>>>>       bool idle = true;
>>>>   +    /* GuC submission shouldn't access HEAD & TAIL via MMIO */
>>>> +    GEM_BUG_ON(intel_engine_uses_guc(engine));
>>>> +
>>>>       if (I915_SELFTEST_ONLY(!engine->mmio_base))
>>>>           return true;
>>>>   @@ -1668,6 +1671,16 @@ bool intel_engine_is_idle(struct 
>>>> intel_engine_cs *engine)
>>>>       if (!i915_sched_engine_is_empty(engine->sched_engine))
>>>>           return false;
>>>>   +    /*
>>>> +     * We shouldn't touch engine registers with GuC submission as 
>>>> the GuC
>>>> +     * owns the registers. Let's tie the idle to engine pm, at 
>>>> worst this
>>>> +     * function sometimes will falsely report non-idle when idle 
>>>> during the
>>>> +     * delay to retire requests or with virtual engines and a request
>>>> +     * running on another instance within the same class / submit 
>>>> mask.
>>>> +     */
>>>> +    if (intel_engine_uses_guc(engine))
>>>> +        return false;
>>>> +
>>>>       /* Ring stopped? */
>>>>       return ring_is_idle(engine);
>>>>   }
>>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
>>>> b/drivers/gpu/drm/i915/i915_debugfs.c
>>>> index 94e5c29d2ee3a..ee5334840e9cb 100644
>>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>>> @@ -654,13 +654,13 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
>>>>   {
>>>>       int ret;
>>>>   +    if (val & DROP_RETIRE || val & DROP_RESET_ACTIVE)
>>>> +        intel_gt_retire_requests(gt);
>>>> +
>>>>       if (val & DROP_RESET_ACTIVE &&
>>>>           wait_for(intel_engines_are_idle(gt), 
>>>> I915_IDLE_ENGINES_TIMEOUT))
>>>>           intel_gt_set_wedged(gt);
>>>>   -    if (val & DROP_RETIRE)
>>>> -        intel_gt_retire_requests(gt);
>>>> -
>>>>       if (val & (DROP_IDLE | DROP_ACTIVE)) {
>>>>           ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
>>>>           if (ret)
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>>>> b/drivers/gpu/drm/i915/i915_drv.h
>>>> index c22f29c3faa0e..53c7474dde495 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -278,7 +278,7 @@ struct i915_gem_mm {
>>>>       u32 shrink_count;
>>>>   };
>>>>   -#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
>>>> +#define I915_IDLE_ENGINES_TIMEOUT (500) /* in ms */
>>>>     unsigned long i915_fence_context_timeout(const struct 
>>>> drm_i915_private *i915,
>>>>                        u64 context);
>>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 283870c659911..959a7c92e8f4d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1602,6 +1602,9 @@  static bool ring_is_idle(struct intel_engine_cs *engine)
 {
 	bool idle = true;
 
+	/* GuC submission shouldn't access HEAD & TAIL via MMIO */
+	GEM_BUG_ON(intel_engine_uses_guc(engine));
+
 	if (I915_SELFTEST_ONLY(!engine->mmio_base))
 		return true;
 
@@ -1668,6 +1671,16 @@  bool intel_engine_is_idle(struct intel_engine_cs *engine)
 	if (!i915_sched_engine_is_empty(engine->sched_engine))
 		return false;
 
+	/*
+	 * We shouldn't touch engine registers with GuC submission as the GuC
+	 * owns the registers. Let's tie the idle to engine pm, at worst this
+	 * function sometimes will falsely report non-idle when idle during the
+	 * delay to retire requests or with virtual engines and a request
+	 * running on another instance within the same class / submit mask.
+	 */
+	if (intel_engine_uses_guc(engine))
+		return false;
+
 	/* Ring stopped? */
 	return ring_is_idle(engine);
 }
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 94e5c29d2ee3a..ee5334840e9cb 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -654,13 +654,13 @@  gt_drop_caches(struct intel_gt *gt, u64 val)
 {
 	int ret;
 
+	if (val & DROP_RETIRE || val & DROP_RESET_ACTIVE)
+		intel_gt_retire_requests(gt);
+
 	if (val & DROP_RESET_ACTIVE &&
 	    wait_for(intel_engines_are_idle(gt), I915_IDLE_ENGINES_TIMEOUT))
 		intel_gt_set_wedged(gt);
 
-	if (val & DROP_RETIRE)
-		intel_gt_retire_requests(gt);
-
 	if (val & (DROP_IDLE | DROP_ACTIVE)) {
 		ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
 		if (ret)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c22f29c3faa0e..53c7474dde495 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -278,7 +278,7 @@  struct i915_gem_mm {
 	u32 shrink_count;
 };
 
-#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
+#define I915_IDLE_ENGINES_TIMEOUT (500) /* in ms */
 
 unsigned long i915_fence_context_timeout(const struct drm_i915_private *i915,
 					 u64 context);