diff mbox series

drm/i915/guc: GuC suspend path cleanup

Message ID 20190321171400.31900-1-sujaritha.sundaresan@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/guc: GuC suspend path cleanup | expand

Commit Message

Sundaresan, Sujaritha March 21, 2019, 5:14 p.m. UTC
Adding a call to intel_uc_suspend in i915_gem_suspend, which
is a common point for the suspend/resume and hibernate paths.
This fixes an unbalanced call that causes issues with the CTB
register/deregister.

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Daniele Ceraolo Spurio March 21, 2019, 5:49 p.m. UTC | #1
On 3/21/19 10:14 AM, Sujaritha Sundaresan wrote:
> Adding a call to intel_uc_suspend in i915_gem_suspend, which
> is a common point for the suspend/resume and hibernate paths.
> This fixes an unbalanced call that causes issues with the CTB
> register/deregister.
> 
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1a684b7e8c09..980855ebdeda 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4371,6 +4371,9 @@ void i915_gem_suspend(struct drm_i915_private *i915)
>   	 */
>   	switch_to_kernel_context_sync(i915, i915->gt.active_engines);
>   
> +	if (!__i915_wedged(&i915->gpu_error))
> +		intel_uc_suspend(i915);

We can probably call intel_uc_suspend unconditionally, because it 
internally checks for INTEL_UC_FIRMWARE_SUCCESS, which should be false 
if we wedged due to the uc_reset_prepare we added to the wedge function.

Daniele

> +
>   	mutex_unlock(&i915->drm.struct_mutex);
>   	i915_reset_flush(i915);
>   
>
Chris Wilson March 21, 2019, 5:54 p.m. UTC | #2
Quoting Daniele Ceraolo Spurio (2019-03-21 17:49:55)
> 
> 
> On 3/21/19 10:14 AM, Sujaritha Sundaresan wrote:
> > Adding a call to intel_uc_suspend in i915_gem_suspend, which
> > is a common point for the suspend/resume and hibernate paths.
> > This fixes an unbalanced call that causes issues with the CTB
> > register/deregister.
> > 
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Signed-off-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 1a684b7e8c09..980855ebdeda 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4371,6 +4371,9 @@ void i915_gem_suspend(struct drm_i915_private *i915)
> >        */
> >       switch_to_kernel_context_sync(i915, i915->gt.active_engines);
> >   
> > +     if (!__i915_wedged(&i915->gpu_error))
> > +             intel_uc_suspend(i915);
> 
> We can probably call intel_uc_suspend unconditionally, because it 
> internally checks for INTEL_UC_FIRMWARE_SUCCESS, which should be false 
> if we wedged due to the uc_reset_prepare we added to the wedge function.

\o/ I was hoping there was a more natural test inside intel_uc.
-Chris
Chris Wilson March 21, 2019, 7:37 p.m. UTC | #3
Quoting Patchwork (2019-03-21 19:26:27)
> == Series Details ==
> 
> Series: drm/i915/guc: GuC suspend path cleanup
> URL   : https://patchwork.freedesktop.org/series/58370/
> State : failure
> 
> == Summary ==
> 
> CI Bug Log - changes from CI_DRM_5789 -> Patchwork_12553
> ====================================================
> 
> Summary
> -------
> 
>   **FAILURE**
> 
>   Serious unknown changes coming with Patchwork_12553 absolutely need to be
>   verified manually.
>   
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_12553, please notify your bug team to allow them
>   to document this new failure mode, which will reduce false positives in CI.
> 
>   External URL: https://patchwork.freedesktop.org/api/1.0/series/58370/revisions/1/mbox/
> 
> Possible new issues
> -------------------
> 
>   Here are the unknown changes that may have been introduced in Patchwork_12553:
> 
> ### IGT changes ###
> 
> #### Possible regressions ####
> 
>   * igt@gem_exec_suspend@basic-s3:
>     - fi-apl-guc:         PASS -> DMESG-WARN

That says we turned the guc off before completing the idle sequence, so
the intel_uc_suspend() has to be after the flush_workqueues.
-Chris
Sundaresan, Sujaritha March 21, 2019, 7:41 p.m. UTC | #4
On 3/21/19 12:37 PM, Chris Wilson wrote:
> Quoting Patchwork (2019-03-21 19:26:27)
>> == Series Details ==
>>
>> Series: drm/i915/guc: GuC suspend path cleanup
>> URL   : https://patchwork.freedesktop.org/series/58370/
>> State : failure
>>
>> == Summary ==
>>
>> CI Bug Log - changes from CI_DRM_5789 -> Patchwork_12553
>> ====================================================
>>
>> Summary
>> -------
>>
>>    **FAILURE**
>>
>>    Serious unknown changes coming with Patchwork_12553 absolutely need to be
>>    verified manually.
>>    
>>    If you think the reported changes have nothing to do with the changes
>>    introduced in Patchwork_12553, please notify your bug team to allow them
>>    to document this new failure mode, which will reduce false positives in CI.
>>
>>    External URL: https://patchwork.freedesktop.org/api/1.0/series/58370/revisions/1/mbox/
>>
>> Possible new issues
>> -------------------
>>
>>    Here are the unknown changes that may have been introduced in Patchwork_12553:
>>
>> ### IGT changes ###
>>
>> #### Possible regressions ####
>>
>>    * igt@gem_exec_suspend@basic-s3:
>>      - fi-apl-guc:         PASS -> DMESG-WARN
> That says we turned the guc off before completing the idle sequence, so
> the intel_uc_suspend() has to be after the flush_workqueues.
> -Chris


But shouldn't this be taken care of by the switch_to_kernel_context_sync ?

And would be better have uc_suspend after drain_delayed_work instead of

just after flush_workqueue ?

-Sujaritha
Sundaresan, Sujaritha March 21, 2019, 7:46 p.m. UTC | #5
On 3/21/19 10:54 AM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2019-03-21 17:49:55)
>>
>> On 3/21/19 10:14 AM, Sujaritha Sundaresan wrote:
>>> Adding a call to intel_uc_suspend in i915_gem_suspend, which
>>> is a common point for the suspend/resume and hibernate paths.
>>> This fixes an unbalanced call that causes issues with the CTB
>>> register/deregister.
>>>
>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> Signed-off-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 1a684b7e8c09..980855ebdeda 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -4371,6 +4371,9 @@ void i915_gem_suspend(struct drm_i915_private *i915)
>>>         */
>>>        switch_to_kernel_context_sync(i915, i915->gt.active_engines);
>>>    
>>> +     if (!__i915_wedged(&i915->gpu_error))
>>> +             intel_uc_suspend(i915);
>> We can probably call intel_uc_suspend unconditionally, because it
>> internally checks for INTEL_UC_FIRMWARE_SUCCESS, which should be false
>> if we wedged due to the uc_reset_prepare we added to the wedge function.
> \o/ I was hoping there was a more natural test inside intel_uc.
> -Chris

Sure that seems like a more straight forward check.

I will drop the condition.

-Sujaritha
Sundaresan, Sujaritha March 21, 2019, 8:02 p.m. UTC | #6
On 3/21/19 1:08 PM, Chris Wilson wrote:
> Quoting Sujaritha (2019-03-21 19:41:17)
>> On 3/21/19 12:37 PM, Chris Wilson wrote:
>>> Quoting Patchwork (2019-03-21 19:26:27)
>>>> == Series Details ==
>>>>
>>>> Series: drm/i915/guc: GuC suspend path cleanup
>>>> URL   : https://patchwork.freedesktop.org/series/58370/
>>>> State : failure
>>>>
>>>> == Summary ==
>>>>
>>>> CI Bug Log - changes from CI_DRM_5789 -> Patchwork_12553
>>>> ====================================================
>>>>
>>>> Summary
>>>> -------
>>>>
>>>>     **FAILURE**
>>>>
>>>>     Serious unknown changes coming with Patchwork_12553 absolutely need to be
>>>>     verified manually.
>>>>     
>>>>     If you think the reported changes have nothing to do with the changes
>>>>     introduced in Patchwork_12553, please notify your bug team to allow them
>>>>     to document this new failure mode, which will reduce false positives in CI.
>>>>
>>>>     External URL: https://patchwork.freedesktop.org/api/1.0/series/58370/revisions/1/mbox/
>>>>
>>>> Possible new issues
>>>> -------------------
>>>>
>>>>     Here are the unknown changes that may have been introduced in Patchwork_12553:
>>>>
>>>> ### IGT changes ###
>>>>
>>>> #### Possible regressions ####
>>>>
>>>>     * igt@gem_exec_suspend@basic-s3:
>>>>       - fi-apl-guc:         PASS -> DMESG-WARN
>>> That says we turned the guc off before completing the idle sequence, so
>>> the intel_uc_suspend() has to be after the flush_workqueues.
>>> -Chris
>>
>> But shouldn't this be taken care of by the switch_to_kernel_context_sync ?
> Hmm, no, we can't flush the retire worker there (because of
> struct_mutex). But it should be taken care! Something to work on :)
>
>> And would be better have uc_suspend after drain_delayed_work instead of
>>
>> just after flush_workqueue ?
> Basically right at the end; you don't need struct_mutex right? And the
> assert that the gt is !awake fits in with the intent to switch guc off.
> -Chris


Yes at the end, before the GEM_BUG_ON. The struct_mutex is not required.

-Sujaritha
Chris Wilson March 21, 2019, 8:08 p.m. UTC | #7
Quoting Sujaritha (2019-03-21 19:41:17)
> 
> On 3/21/19 12:37 PM, Chris Wilson wrote:
> > Quoting Patchwork (2019-03-21 19:26:27)
> >> == Series Details ==
> >>
> >> Series: drm/i915/guc: GuC suspend path cleanup
> >> URL   : https://patchwork.freedesktop.org/series/58370/
> >> State : failure
> >>
> >> == Summary ==
> >>
> >> CI Bug Log - changes from CI_DRM_5789 -> Patchwork_12553
> >> ====================================================
> >>
> >> Summary
> >> -------
> >>
> >>    **FAILURE**
> >>
> >>    Serious unknown changes coming with Patchwork_12553 absolutely need to be
> >>    verified manually.
> >>    
> >>    If you think the reported changes have nothing to do with the changes
> >>    introduced in Patchwork_12553, please notify your bug team to allow them
> >>    to document this new failure mode, which will reduce false positives in CI.
> >>
> >>    External URL: https://patchwork.freedesktop.org/api/1.0/series/58370/revisions/1/mbox/
> >>
> >> Possible new issues
> >> -------------------
> >>
> >>    Here are the unknown changes that may have been introduced in Patchwork_12553:
> >>
> >> ### IGT changes ###
> >>
> >> #### Possible regressions ####
> >>
> >>    * igt@gem_exec_suspend@basic-s3:
> >>      - fi-apl-guc:         PASS -> DMESG-WARN
> > That says we turned the guc off before completing the idle sequence, so
> > the intel_uc_suspend() has to be after the flush_workqueues.
> > -Chris
> 
> 
> But shouldn't this be taken care of by the switch_to_kernel_context_sync ?

Hmm, no, we can't flush the retire worker there (because of
struct_mutex). But it should be taken care! Something to work on :)

> And would be better have uc_suspend after drain_delayed_work instead of
> 
> just after flush_workqueue ?

Basically right at the end; you don't need struct_mutex right? And the
assert that the gt is !awake fits in with the intent to switch guc off.
-Chris
Chris Wilson March 21, 2019, 8:23 p.m. UTC | #8
Quoting Sujaritha (2019-03-21 20:02:36)
> 
> On 3/21/19 1:08 PM, Chris Wilson wrote:
> > Quoting Sujaritha (2019-03-21 19:41:17)
> >> On 3/21/19 12:37 PM, Chris Wilson wrote:
> >>> Quoting Patchwork (2019-03-21 19:26:27)
> >>>> == Series Details ==
> >>>>
> >>>> Series: drm/i915/guc: GuC suspend path cleanup
> >>>> URL   : https://patchwork.freedesktop.org/series/58370/
> >>>> State : failure
> >>>>
> >>>> == Summary ==
> >>>>
> >>>> CI Bug Log - changes from CI_DRM_5789 -> Patchwork_12553
> >>>> ====================================================
> >>>>
> >>>> Summary
> >>>> -------
> >>>>
> >>>>     **FAILURE**
> >>>>
> >>>>     Serious unknown changes coming with Patchwork_12553 absolutely need to be
> >>>>     verified manually.
> >>>>     
> >>>>     If you think the reported changes have nothing to do with the changes
> >>>>     introduced in Patchwork_12553, please notify your bug team to allow them
> >>>>     to document this new failure mode, which will reduce false positives in CI.
> >>>>
> >>>>     External URL: https://patchwork.freedesktop.org/api/1.0/series/58370/revisions/1/mbox/
> >>>>
> >>>> Possible new issues
> >>>> -------------------
> >>>>
> >>>>     Here are the unknown changes that may have been introduced in Patchwork_12553:
> >>>>
> >>>> ### IGT changes ###
> >>>>
> >>>> #### Possible regressions ####
> >>>>
> >>>>     * igt@gem_exec_suspend@basic-s3:
> >>>>       - fi-apl-guc:         PASS -> DMESG-WARN
> >>> That says we turned the guc off before completing the idle sequence, so
> >>> the intel_uc_suspend() has to be after the flush_workqueues.
> >>> -Chris
> >>
> >> But shouldn't this be taken care of by the switch_to_kernel_context_sync ?
> > Hmm, no, we can't flush the retire worker there (because of
> > struct_mutex). But it should be taken care! Something to work on :)
> >
> >> And would be better have uc_suspend after drain_delayed_work instead of
> >>
> >> just after flush_workqueue ?
> > Basically right at the end; you don't need struct_mutex right? And the
> > assert that the gt is !awake fits in with the intent to switch guc off.
> > -Chris
> 
> 
> Yes at the end, before the GEM_BUG_ON. The struct_mutex is not required.

I'd go just after. The assert is that GEM is idle, so ready to suspend
the FW. Worksforme.
-Chris
Sundaresan, Sujaritha March 21, 2019, 8:28 p.m. UTC | #9
On 3/21/19 1:23 PM, Chris Wilson wrote:
> Quoting Sujaritha (2019-03-21 20:02:36)
>> On 3/21/19 1:08 PM, Chris Wilson wrote:
>>> Quoting Sujaritha (2019-03-21 19:41:17)
>>>> On 3/21/19 12:37 PM, Chris Wilson wrote:
>>>>> Quoting Patchwork (2019-03-21 19:26:27)
>>>>>> == Series Details ==
>>>>>>
>>>>>> Series: drm/i915/guc: GuC suspend path cleanup
>>>>>> URL   : https://patchwork.freedesktop.org/series/58370/
>>>>>> State : failure
>>>>>>
>>>>>> == Summary ==
>>>>>>
>>>>>> CI Bug Log - changes from CI_DRM_5789 -> Patchwork_12553
>>>>>> ====================================================
>>>>>>
>>>>>> Summary
>>>>>> -------
>>>>>>
>>>>>>      **FAILURE**
>>>>>>
>>>>>>      Serious unknown changes coming with Patchwork_12553 absolutely need to be
>>>>>>      verified manually.
>>>>>>      
>>>>>>      If you think the reported changes have nothing to do with the changes
>>>>>>      introduced in Patchwork_12553, please notify your bug team to allow them
>>>>>>      to document this new failure mode, which will reduce false positives in CI.
>>>>>>
>>>>>>      External URL: https://patchwork.freedesktop.org/api/1.0/series/58370/revisions/1/mbox/
>>>>>>
>>>>>> Possible new issues
>>>>>> -------------------
>>>>>>
>>>>>>      Here are the unknown changes that may have been introduced in Patchwork_12553:
>>>>>>
>>>>>> ### IGT changes ###
>>>>>>
>>>>>> #### Possible regressions ####
>>>>>>
>>>>>>      * igt@gem_exec_suspend@basic-s3:
>>>>>>        - fi-apl-guc:         PASS -> DMESG-WARN
>>>>> That says we turned the guc off before completing the idle sequence, so
>>>>> the intel_uc_suspend() has to be after the flush_workqueues.
>>>>> -Chris
>>>> But shouldn't this be taken care of by the switch_to_kernel_context_sync ?
>>> Hmm, no, we can't flush the retire worker there (because of
>>> struct_mutex). But it should be taken care! Something to work on :)
>>>
>>>> And would be better have uc_suspend after drain_delayed_work instead of
>>>>
>>>> just after flush_workqueue ?
>>> Basically right at the end; you don't need struct_mutex right? And the
>>> assert that the gt is !awake fits in with the intent to switch guc off.
>>> -Chris
>>
>> Yes at the end, before the GEM_BUG_ON. The struct_mutex is not required.
> I'd go just after. The assert is that GEM is idle, so ready to suspend
> the FW. Worksforme.
> -Chris


Okay sure, I will add it just after the BUG_ON.

-Sujaritha
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1a684b7e8c09..980855ebdeda 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4371,6 +4371,9 @@  void i915_gem_suspend(struct drm_i915_private *i915)
 	 */
 	switch_to_kernel_context_sync(i915, i915->gt.active_engines);
 
+	if (!__i915_wedged(&i915->gpu_error))
+		intel_uc_suspend(i915);
+
 	mutex_unlock(&i915->drm.struct_mutex);
 	i915_reset_flush(i915);