diff mbox series

[v3] drm/amdgpu: Cancel delayed work when GFXOFF is disabled

Message ID 20210816103506.2671-1-michel@daenzer.net (mailing list archive)
State New, archived
Headers show
Series [v3] drm/amdgpu: Cancel delayed work when GFXOFF is disabled | expand

Commit Message

Michel Dänzer Aug. 16, 2021, 10:35 a.m. UTC
From: Michel Dänzer <mdaenzer@redhat.com>

schedule_delayed_work does not push back the work if it was already
scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
was disabled and re-enabled again during those 100 ms.

This resulted in frame drops / stutter with the upcoming mutter 41
release on Navi 14, due to constantly enabling GFXOFF in the HW and
disabling it again (for getting the GPU clock counter).

To fix this, call cancel_delayed_work_sync when the disable count
transitions from 0 to 1, and only schedule the delayed work on the
reverse transition, not if the disable count was already 0. This makes
sure the delayed work doesn't run at unexpected times, and allows it to
be lock-free.

v2:
* Use cancel_delayed_work_sync & mutex_trylock instead of
  mod_delayed_work.
v3:
* Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)

Cc: stable@vger.kernel.org
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 22 +++++++++++++++++-----
 2 files changed, 22 insertions(+), 11 deletions(-)

Comments

Lazar, Lijo Aug. 16, 2021, 11:33 a.m. UTC | #1
On 8/16/2021 4:05 PM, Michel Dänzer wrote:
> From: Michel Dänzer <mdaenzer@redhat.com>
> 
> schedule_delayed_work does not push back the work if it was already
> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
> was disabled and re-enabled again during those 100 ms.
> 
> This resulted in frame drops / stutter with the upcoming mutter 41
> release on Navi 14, due to constantly enabling GFXOFF in the HW and
> disabling it again (for getting the GPU clock counter).
> 
> To fix this, call cancel_delayed_work_sync when the disable count
> transitions from 0 to 1, and only schedule the delayed work on the
> reverse transition, not if the disable count was already 0. This makes
> sure the delayed work doesn't run at unexpected times, and allows it to
> be lock-free.
> 
> v2:
> * Use cancel_delayed_work_sync & mutex_trylock instead of
>    mod_delayed_work.
> v3:
> * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++++------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 22 +++++++++++++++++-----
>   2 files changed, 22 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index f3fd5ec710b6..f944ed858f3e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2777,12 +2777,11 @@ static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
>   	struct amdgpu_device *adev =
>   		container_of(work, struct amdgpu_device, gfx.gfx_off_delay_work.work);
>   
> -	mutex_lock(&adev->gfx.gfx_off_mutex);
> -	if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
> -		if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true))
> -			adev->gfx.gfx_off_state = true;
> -	}
> -	mutex_unlock(&adev->gfx.gfx_off_mutex);
> +	WARN_ON_ONCE(adev->gfx.gfx_off_state);

Don't see any case for this. It's not expected to be scheduled in this 
case, right?

> +	WARN_ON_ONCE(adev->gfx.gfx_off_req_count);
> +

Thinking about ON_ONCE here - this may happen more than once if it's 
completed as part of cancel_ call. Is the warning needed?

Anyway,
	Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>

> +	if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true))
> +		adev->gfx.gfx_off_state = true;
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index a0be0772c8b3..ca91aafcb32b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -563,15 +563,26 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
>   
>   	mutex_lock(&adev->gfx.gfx_off_mutex);
>   
> -	if (!enable)
> -		adev->gfx.gfx_off_req_count++;
> -	else if (adev->gfx.gfx_off_req_count > 0)
> +	if (enable) {
> +		/* If the count is already 0, it means there's an imbalance bug somewhere.
> +		 * Note that the bug may be in a different caller than the one which triggers the
> +		 * WARN_ON_ONCE.
> +		 */
> +		if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0))
> +			goto unlock;
> +
>   		adev->gfx.gfx_off_req_count--;
> +	} else {
> +		adev->gfx.gfx_off_req_count++;
> +	}
>   
>   	if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>   		schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE);
> -	} else if (!enable && adev->gfx.gfx_off_state) {
> -		if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) {
> +	} else if (!enable && adev->gfx.gfx_off_req_count == 1) {
> +		cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);
> +
> +		if (adev->gfx.gfx_off_state &&
> +		    !amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) {
>   			adev->gfx.gfx_off_state = false;
>   
>   			if (adev->gfx.funcs->init_spm_golden) {
> @@ -581,6 +592,7 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
>   		}
>   	}
>   
> +unlock:
>   	mutex_unlock(&adev->gfx.gfx_off_mutex);
>   }
>   
>
Christian König Aug. 16, 2021, 12:06 p.m. UTC | #2
Am 16.08.21 um 13:33 schrieb Lazar, Lijo:
> On 8/16/2021 4:05 PM, Michel Dänzer wrote:
>> From: Michel Dänzer <mdaenzer@redhat.com>
>>
>> schedule_delayed_work does not push back the work if it was already
>> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
>> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
>> was disabled and re-enabled again during those 100 ms.
>>
>> This resulted in frame drops / stutter with the upcoming mutter 41
>> release on Navi 14, due to constantly enabling GFXOFF in the HW and
>> disabling it again (for getting the GPU clock counter).
>>
>> To fix this, call cancel_delayed_work_sync when the disable count
>> transitions from 0 to 1, and only schedule the delayed work on the
>> reverse transition, not if the disable count was already 0. This makes
>> sure the delayed work doesn't run at unexpected times, and allows it to
>> be lock-free.
>>
>> v2:
>> * Use cancel_delayed_work_sync & mutex_trylock instead of
>>    mod_delayed_work.
>> v3:
>> * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++++------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 22 +++++++++++++++++-----
>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index f3fd5ec710b6..f944ed858f3e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2777,12 +2777,11 @@ static void 
>> amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
>>       struct amdgpu_device *adev =
>>           container_of(work, struct amdgpu_device, 
>> gfx.gfx_off_delay_work.work);
>>   -    mutex_lock(&adev->gfx.gfx_off_mutex);
>> -    if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>> -        if (!amdgpu_dpm_set_powergating_by_smu(adev, 
>> AMD_IP_BLOCK_TYPE_GFX, true))
>> -            adev->gfx.gfx_off_state = true;
>> -    }
>> -    mutex_unlock(&adev->gfx.gfx_off_mutex);
>> +    WARN_ON_ONCE(adev->gfx.gfx_off_state);
>
> Don't see any case for this. It's not expected to be scheduled in this 
> case, right?
>
>> + WARN_ON_ONCE(adev->gfx.gfx_off_req_count);
>> +
>
> Thinking about ON_ONCE here - this may happen more than once if it's 
> completed as part of cancel_ call. Is the warning needed?

WARN_ON_ONCE() is usually used to prevent spamming the system log with 
warnings. E.g. the warning is only printed once indicating a driver bug 
and that's it.

>
> Anyway,
>     Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>

Acked-by: Christian König <christian.koenig@amd.com>

Regards,
Christian.

>
>> +    if (!amdgpu_dpm_set_powergating_by_smu(adev, 
>> AMD_IP_BLOCK_TYPE_GFX, true))
>> +        adev->gfx.gfx_off_state = true;
>>   }
>>     /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index a0be0772c8b3..ca91aafcb32b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -563,15 +563,26 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device 
>> *adev, bool enable)
>>         mutex_lock(&adev->gfx.gfx_off_mutex);
>>   -    if (!enable)
>> -        adev->gfx.gfx_off_req_count++;
>> -    else if (adev->gfx.gfx_off_req_count > 0)
>> +    if (enable) {
>> +        /* If the count is already 0, it means there's an imbalance 
>> bug somewhere.
>> +         * Note that the bug may be in a different caller than the 
>> one which triggers the
>> +         * WARN_ON_ONCE.
>> +         */
>> +        if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0))
>> +            goto unlock;
>> +
>>           adev->gfx.gfx_off_req_count--;
>> +    } else {
>> +        adev->gfx.gfx_off_req_count++;
>> +    }
>>         if (enable && !adev->gfx.gfx_off_state && 
>> !adev->gfx.gfx_off_req_count) {
>> schedule_delayed_work(&adev->gfx.gfx_off_delay_work, 
>> GFX_OFF_DELAY_ENABLE);
>> -    } else if (!enable && adev->gfx.gfx_off_state) {
>> -        if (!amdgpu_dpm_set_powergating_by_smu(adev, 
>> AMD_IP_BLOCK_TYPE_GFX, false)) {
>> +    } else if (!enable && adev->gfx.gfx_off_req_count == 1) {
>> + cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);
>> +
>> +        if (adev->gfx.gfx_off_state &&
>> +            !amdgpu_dpm_set_powergating_by_smu(adev, 
>> AMD_IP_BLOCK_TYPE_GFX, false)) {
>>               adev->gfx.gfx_off_state = false;
>>                 if (adev->gfx.funcs->init_spm_golden) {
>> @@ -581,6 +592,7 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device 
>> *adev, bool enable)
>>           }
>>       }
>>   +unlock:
>>       mutex_unlock(&adev->gfx.gfx_off_mutex);
>>   }
>>
Michel Dänzer Aug. 16, 2021, 3:06 p.m. UTC | #3
On 2021-08-16 2:06 p.m., Christian König wrote:
> Am 16.08.21 um 13:33 schrieb Lazar, Lijo:
>> On 8/16/2021 4:05 PM, Michel Dänzer wrote:
>>> From: Michel Dänzer <mdaenzer@redhat.com>
>>>
>>> schedule_delayed_work does not push back the work if it was already
>>> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
>>> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
>>> was disabled and re-enabled again during those 100 ms.
>>>
>>> This resulted in frame drops / stutter with the upcoming mutter 41
>>> release on Navi 14, due to constantly enabling GFXOFF in the HW and
>>> disabling it again (for getting the GPU clock counter).
>>>
>>> To fix this, call cancel_delayed_work_sync when the disable count
>>> transitions from 0 to 1, and only schedule the delayed work on the
>>> reverse transition, not if the disable count was already 0. This makes
>>> sure the delayed work doesn't run at unexpected times, and allows it to
>>> be lock-free.
>>>
>>> v2:
>>> * Use cancel_delayed_work_sync & mutex_trylock instead of
>>>    mod_delayed_work.
>>> v3:
>>> * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++++------
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 22 +++++++++++++++++-----
>>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index f3fd5ec710b6..f944ed858f3e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -2777,12 +2777,11 @@ static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
>>>       struct amdgpu_device *adev =
>>>           container_of(work, struct amdgpu_device, gfx.gfx_off_delay_work.work);
>>>   -    mutex_lock(&adev->gfx.gfx_off_mutex);
>>> -    if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>>> -        if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true))
>>> -            adev->gfx.gfx_off_state = true;
>>> -    }
>>> -    mutex_unlock(&adev->gfx.gfx_off_mutex);
>>> +    WARN_ON_ONCE(adev->gfx.gfx_off_state);
>>
>> Don't see any case for this. It's not expected to be scheduled in this case, right?
>>
>>> + WARN_ON_ONCE(adev->gfx.gfx_off_req_count);
>>> +
>>
>> Thinking about ON_ONCE here - this may happen more than once if it's completed as part of cancel_ call. Is the warning needed?
> 
> WARN_ON_ONCE() is usually used to prevent spamming the system log with warnings. E.g. the warning is only printed once indicating a driver bug and that's it.

Right, these WARN_ONs are like assert()s in user-space code, documenting the pre-conditions and checking them at runtime. And I use _ONCE so that if a pre-condition is ever violated for some reason, dmesg isn't spammed with multiple warnings.


>> Anyway,
>>     Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
> 
> Acked-by: Christian König <christian.koenig@amd.com>

Thanks guys!
Alex Deucher Aug. 16, 2021, 7:02 p.m. UTC | #4
Applied.  Thanks!

Alex

On Mon, Aug 16, 2021 at 11:07 AM Michel Dänzer <michel@daenzer.net> wrote:
>
> On 2021-08-16 2:06 p.m., Christian König wrote:
> > Am 16.08.21 um 13:33 schrieb Lazar, Lijo:
> >> On 8/16/2021 4:05 PM, Michel Dänzer wrote:
> >>> From: Michel Dänzer <mdaenzer@redhat.com>
> >>>
> >>> schedule_delayed_work does not push back the work if it was already
> >>> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
> >>> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
> >>> was disabled and re-enabled again during those 100 ms.
> >>>
> >>> This resulted in frame drops / stutter with the upcoming mutter 41
> >>> release on Navi 14, due to constantly enabling GFXOFF in the HW and
> >>> disabling it again (for getting the GPU clock counter).
> >>>
> >>> To fix this, call cancel_delayed_work_sync when the disable count
> >>> transitions from 0 to 1, and only schedule the delayed work on the
> >>> reverse transition, not if the disable count was already 0. This makes
> >>> sure the delayed work doesn't run at unexpected times, and allows it to
> >>> be lock-free.
> >>>
> >>> v2:
> >>> * Use cancel_delayed_work_sync & mutex_trylock instead of
> >>>    mod_delayed_work.
> >>> v3:
> >>> * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
> >>>
> >>> Cc: stable@vger.kernel.org
> >>> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
> >>> ---
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++++------
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 22 +++++++++++++++++-----
> >>>   2 files changed, 22 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> index f3fd5ec710b6..f944ed858f3e 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> @@ -2777,12 +2777,11 @@ static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
> >>>       struct amdgpu_device *adev =
> >>>           container_of(work, struct amdgpu_device, gfx.gfx_off_delay_work.work);
> >>>   -    mutex_lock(&adev->gfx.gfx_off_mutex);
> >>> -    if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
> >>> -        if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true))
> >>> -            adev->gfx.gfx_off_state = true;
> >>> -    }
> >>> -    mutex_unlock(&adev->gfx.gfx_off_mutex);
> >>> +    WARN_ON_ONCE(adev->gfx.gfx_off_state);
> >>
> >> Don't see any case for this. It's not expected to be scheduled in this case, right?
> >>
> >>> + WARN_ON_ONCE(adev->gfx.gfx_off_req_count);
> >>> +
> >>
> >> Thinking about ON_ONCE here - this may happen more than once if it's completed as part of cancel_ call. Is the warning needed?
> >
> > WARN_ON_ONCE() is usually used to prevent spamming the system log with warnings. E.g. the warning is only printed once indicating a driver bug and that's it.
>
> Right, these WARN_ONs are like assert()s in user-space code, documenting the pre-conditions and checking them at runtime. And I use _ONCE so that if a pre-condition is ever violated for some reason, dmesg isn't spammed with multiple warnings.
>
>
> >> Anyway,
> >>     Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
> >
> > Acked-by: Christian König <christian.koenig@amd.com>
>
> Thanks guys!
>
>
> --
> Earthling Michel Dänzer               |               https://redhat.com
> Libre software enthusiast             |             Mesa and X developer
Evan Quan Aug. 17, 2021, 7:51 a.m. UTC | #5
[AMD Official Use Only]



> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
> Michel Dänzer
> Sent: Monday, August 16, 2021 6:35 PM
> To: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>
> Cc: Liu, Leo <Leo.Liu@amd.com>; Zhu, James <James.Zhu@amd.com>; amd-
> gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Subject: [PATCH v3] drm/amdgpu: Cancel delayed work when GFXOFF is
> disabled
> 
> From: Michel Dänzer <mdaenzer@redhat.com>
> 
> schedule_delayed_work does not push back the work if it was already
> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
> was disabled and re-enabled again during those 100 ms.
> 
> This resulted in frame drops / stutter with the upcoming mutter 41
> release on Navi 14, due to constantly enabling GFXOFF in the HW and
> disabling it again (for getting the GPU clock counter).
> 
> To fix this, call cancel_delayed_work_sync when the disable count
> transitions from 0 to 1, and only schedule the delayed work on the
> reverse transition, not if the disable count was already 0. This makes
> sure the delayed work doesn't run at unexpected times, and allows it to
> be lock-free.
> 
> v2:
> * Use cancel_delayed_work_sync & mutex_trylock instead of
>   mod_delayed_work.
> v3:
> * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++++------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 22 +++++++++++++++++-
> ----
>  2 files changed, 22 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index f3fd5ec710b6..f944ed858f3e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2777,12 +2777,11 @@ static void
> amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
>  	struct amdgpu_device *adev =
>  		container_of(work, struct amdgpu_device,
> gfx.gfx_off_delay_work.work);
> 
> -	mutex_lock(&adev->gfx.gfx_off_mutex);
> -	if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
> -		if (!amdgpu_dpm_set_powergating_by_smu(adev,
> AMD_IP_BLOCK_TYPE_GFX, true))
> -			adev->gfx.gfx_off_state = true;
> -	}
> -	mutex_unlock(&adev->gfx.gfx_off_mutex);
> +	WARN_ON_ONCE(adev->gfx.gfx_off_state);
> +	WARN_ON_ONCE(adev->gfx.gfx_off_req_count);
> +
> +	if (!amdgpu_dpm_set_powergating_by_smu(adev,
> AMD_IP_BLOCK_TYPE_GFX, true))
> +		adev->gfx.gfx_off_state = true;
>  }
> 
>  /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index a0be0772c8b3..ca91aafcb32b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -563,15 +563,26 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device
> *adev, bool enable)
> 
>  	mutex_lock(&adev->gfx.gfx_off_mutex);
> 
> -	if (!enable)
> -		adev->gfx.gfx_off_req_count++;
> -	else if (adev->gfx.gfx_off_req_count > 0)
> +	if (enable) {
> +		/* If the count is already 0, it means there's an imbalance bug
> somewhere.
> +		 * Note that the bug may be in a different caller than the one
> which triggers the
> +		 * WARN_ON_ONCE.
> +		 */
> +		if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0))
> +			goto unlock;
> +
>  		adev->gfx.gfx_off_req_count--;
> +	} else {
> +		adev->gfx.gfx_off_req_count++;
> +	}
> 
>  	if (enable && !adev->gfx.gfx_off_state && !adev-
> >gfx.gfx_off_req_count) {
>  		schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
> GFX_OFF_DELAY_ENABLE);
> -	} else if (!enable && adev->gfx.gfx_off_state) {
> -		if (!amdgpu_dpm_set_powergating_by_smu(adev,
> AMD_IP_BLOCK_TYPE_GFX, false)) {
> +	} else if (!enable && adev->gfx.gfx_off_req_count == 1) {
[Quan, Evan] It seems here will leave a small time window for race condition. If amdgpu_device_delay_enable_gfx_off() happens to occur here, it will "WARN_ON_ONCE(adev->gfx.gfx_off_req_count);". How about something as below?
@@ -573,13 +573,11 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
                        goto unlock;

                adev->gfx.gfx_off_req_count--;
-       } else {
-               adev->gfx.gfx_off_req_count++;
        }

        if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
                schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE);
-       } else if (!enable && adev->gfx.gfx_off_req_count == 1) {
+       } else if (!enable && adev->gfx.gfx_off_req_count == 0) {
                cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);

                if (adev->gfx.gfx_off_state &&
@@ -593,6 +591,9 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
                }
        }

+       if (!enable)
+               adev->gfx.gfx_off_req_count++;
+
 unlock:

BR
Evan
> +		cancel_delayed_work_sync(&adev-
> >gfx.gfx_off_delay_work);
> +
> +		if (adev->gfx.gfx_off_state &&
> +		    !amdgpu_dpm_set_powergating_by_smu(adev,
> AMD_IP_BLOCK_TYPE_GFX, false)) {
>  			adev->gfx.gfx_off_state = false;
> 
>  			if (adev->gfx.funcs->init_spm_golden) {
> @@ -581,6 +592,7 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device
> *adev, bool enable)
>  		}
>  	}
> 
> +unlock:
>  	mutex_unlock(&adev->gfx.gfx_off_mutex);
>  }
> 
> --
> 2.32.0
Lazar, Lijo Aug. 17, 2021, 8:17 a.m. UTC | #6
On 8/17/2021 1:21 PM, Quan, Evan wrote:
> [AMD Official Use Only]
> 
> 
> 
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>> Michel Dänzer
>> Sent: Monday, August 16, 2021 6:35 PM
>> To: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
>> <Christian.Koenig@amd.com>
>> Cc: Liu, Leo <Leo.Liu@amd.com>; Zhu, James <James.Zhu@amd.com>; amd-
>> gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
>> Subject: [PATCH v3] drm/amdgpu: Cancel delayed work when GFXOFF is
>> disabled
>>
>> From: Michel Dänzer <mdaenzer@redhat.com>
>>
>> schedule_delayed_work does not push back the work if it was already
>> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
>> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
>> was disabled and re-enabled again during those 100 ms.
>>
>> This resulted in frame drops / stutter with the upcoming mutter 41
>> release on Navi 14, due to constantly enabling GFXOFF in the HW and
>> disabling it again (for getting the GPU clock counter).
>>
>> To fix this, call cancel_delayed_work_sync when the disable count
>> transitions from 0 to 1, and only schedule the delayed work on the
>> reverse transition, not if the disable count was already 0. This makes
>> sure the delayed work doesn't run at unexpected times, and allows it to
>> be lock-free.
>>
>> v2:
>> * Use cancel_delayed_work_sync & mutex_trylock instead of
>>    mod_delayed_work.
>> v3:
>> * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++++------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 22 +++++++++++++++++-
>> ----
>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index f3fd5ec710b6..f944ed858f3e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2777,12 +2777,11 @@ static void
>> amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
>>   	struct amdgpu_device *adev =
>>   		container_of(work, struct amdgpu_device,
>> gfx.gfx_off_delay_work.work);
>>
>> -	mutex_lock(&adev->gfx.gfx_off_mutex);
>> -	if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>> -		if (!amdgpu_dpm_set_powergating_by_smu(adev,
>> AMD_IP_BLOCK_TYPE_GFX, true))
>> -			adev->gfx.gfx_off_state = true;
>> -	}
>> -	mutex_unlock(&adev->gfx.gfx_off_mutex);
>> +	WARN_ON_ONCE(adev->gfx.gfx_off_state);
>> +	WARN_ON_ONCE(adev->gfx.gfx_off_req_count);
>> +
>> +	if (!amdgpu_dpm_set_powergating_by_smu(adev,
>> AMD_IP_BLOCK_TYPE_GFX, true))
>> +		adev->gfx.gfx_off_state = true;
>>   }
>>
>>   /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index a0be0772c8b3..ca91aafcb32b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -563,15 +563,26 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device
>> *adev, bool enable)
>>
>>   	mutex_lock(&adev->gfx.gfx_off_mutex);
>>
>> -	if (!enable)
>> -		adev->gfx.gfx_off_req_count++;
>> -	else if (adev->gfx.gfx_off_req_count > 0)
>> +	if (enable) {
>> +		/* If the count is already 0, it means there's an imbalance bug
>> somewhere.
>> +		 * Note that the bug may be in a different caller than the one
>> which triggers the
>> +		 * WARN_ON_ONCE.
>> +		 */
>> +		if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0))
>> +			goto unlock;
>> +
>>   		adev->gfx.gfx_off_req_count--;
>> +	} else {
>> +		adev->gfx.gfx_off_req_count++;
>> +	}
>>
>>   	if (enable && !adev->gfx.gfx_off_state && !adev-
>>> gfx.gfx_off_req_count) {
>>   		schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
>> GFX_OFF_DELAY_ENABLE);
>> -	} else if (!enable && adev->gfx.gfx_off_state) {
>> -		if (!amdgpu_dpm_set_powergating_by_smu(adev,
>> AMD_IP_BLOCK_TYPE_GFX, false)) {
>> +	} else if (!enable && adev->gfx.gfx_off_req_count == 1) {
> [Quan, Evan] It seems here will leave a small time window for race condition. If amdgpu_device_delay_enable_gfx_off() happens to occur here, it will "WARN_ON_ONCE(adev->gfx.gfx_off_req_count);". How about something as below?
> @@ -573,13 +573,11 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
>                          goto unlock;
> 
>                  adev->gfx.gfx_off_req_count--;
> -       } else {
> -               adev->gfx.gfx_off_req_count++;
>          }
> 
>          if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>                  schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE);
> -       } else if (!enable && adev->gfx.gfx_off_req_count == 1) {
> +       } else if (!enable && adev->gfx.gfx_off_req_count == 0) {
>                  cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);
> 
>                  if (adev->gfx.gfx_off_state &&
> @@ -593,6 +591,9 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
>                  }
>          }
> 
> +       if (!enable)
> +               adev->gfx.gfx_off_req_count++;
> +
>   unlock:
> 

Hi Evan,

It's not a race per se, it is just an undesirable condition of Enable 
Gfxoff immediately followed by a Disable GfxOff. The purpose of the WARN 
is to intimate the user about it.

There are other cases - for ex: if amdgpu_device_delay_enable_gfx_off() 
called amdgpu_dpm_set_powergating_by_smu() already at the same place you 
pointed out. In this case WARN doesn't get printed, but it's not an 
optimal situation either. Probably it makes sense to move the WARN_ON as 
the last line of amdgpu_device_delay_enable_gfx_off. Either way, I don't 
think it's a race condition.

Thanks,
Lijo


> BR
> Evan
>> +		cancel_delayed_work_sync(&adev-
>>> gfx.gfx_off_delay_work);
>> +
>> +		if (adev->gfx.gfx_off_state &&
>> +		    !amdgpu_dpm_set_powergating_by_smu(adev,
>> AMD_IP_BLOCK_TYPE_GFX, false)) {
>>   			adev->gfx.gfx_off_state = false;
>>
>>   			if (adev->gfx.funcs->init_spm_golden) {
>> @@ -581,6 +592,7 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device
>> *adev, bool enable)
>>   		}
>>   	}
>>
>> +unlock:
>>   	mutex_unlock(&adev->gfx.gfx_off_mutex);
>>   }
>>
>> --
>> 2.32.0
Michel Dänzer Aug. 17, 2021, 8:35 a.m. UTC | #7
On 2021-08-17 10:17 a.m., Lazar, Lijo wrote:
> On 8/17/2021 1:21 PM, Quan, Evan wrote:
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>>> Michel Dänzer
>>> Sent: Monday, August 16, 2021 6:35 PM
>>> To: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
>>> <Christian.Koenig@amd.com>
>>> Cc: Liu, Leo <Leo.Liu@amd.com>; Zhu, James <James.Zhu@amd.com>; amd-
>>> gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
>>> Subject: [PATCH v3] drm/amdgpu: Cancel delayed work when GFXOFF is
>>> disabled
>>>
>>> From: Michel Dänzer <mdaenzer@redhat.com>
>>>
>>> schedule_delayed_work does not push back the work if it was already
>>> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
>>> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
>>> was disabled and re-enabled again during those 100 ms.
>>>
>>> This resulted in frame drops / stutter with the upcoming mutter 41
>>> release on Navi 14, due to constantly enabling GFXOFF in the HW and
>>> disabling it again (for getting the GPU clock counter).
>>>
>>> To fix this, call cancel_delayed_work_sync when the disable count
>>> transitions from 0 to 1, and only schedule the delayed work on the
>>> reverse transition, not if the disable count was already 0. This makes
>>> sure the delayed work doesn't run at unexpected times, and allows it to
>>> be lock-free.
>>>
>>> v2:
>>> * Use cancel_delayed_work_sync & mutex_trylock instead of
>>>    mod_delayed_work.
>>> v3:
>>> * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++++------
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 22 +++++++++++++++++-
>>> ----
>>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index f3fd5ec710b6..f944ed858f3e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -2777,12 +2777,11 @@ static void
>>> amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
>>>       struct amdgpu_device *adev =
>>>           container_of(work, struct amdgpu_device,
>>> gfx.gfx_off_delay_work.work);
>>>
>>> -    mutex_lock(&adev->gfx.gfx_off_mutex);
>>> -    if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>>> -        if (!amdgpu_dpm_set_powergating_by_smu(adev,
>>> AMD_IP_BLOCK_TYPE_GFX, true))
>>> -            adev->gfx.gfx_off_state = true;
>>> -    }
>>> -    mutex_unlock(&adev->gfx.gfx_off_mutex);
>>> +    WARN_ON_ONCE(adev->gfx.gfx_off_state);
>>> +    WARN_ON_ONCE(adev->gfx.gfx_off_req_count);
>>> +
>>> +    if (!amdgpu_dpm_set_powergating_by_smu(adev,
>>> AMD_IP_BLOCK_TYPE_GFX, true))
>>> +        adev->gfx.gfx_off_state = true;
>>>   }
>>>
>>>   /**
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> index a0be0772c8b3..ca91aafcb32b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> @@ -563,15 +563,26 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device
>>> *adev, bool enable)
>>>
>>>       mutex_lock(&adev->gfx.gfx_off_mutex);
>>>
>>> -    if (!enable)
>>> -        adev->gfx.gfx_off_req_count++;
>>> -    else if (adev->gfx.gfx_off_req_count > 0)
>>> +    if (enable) {
>>> +        /* If the count is already 0, it means there's an imbalance bug
>>> somewhere.
>>> +         * Note that the bug may be in a different caller than the one
>>> which triggers the
>>> +         * WARN_ON_ONCE.
>>> +         */
>>> +        if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0))
>>> +            goto unlock;
>>> +
>>>           adev->gfx.gfx_off_req_count--;
>>> +    } else {
>>> +        adev->gfx.gfx_off_req_count++;
>>> +    }
>>>
>>>       if (enable && !adev->gfx.gfx_off_state && !adev-
>>>> gfx.gfx_off_req_count) {
>>>           schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
>>> GFX_OFF_DELAY_ENABLE);
>>> -    } else if (!enable && adev->gfx.gfx_off_state) {
>>> -        if (!amdgpu_dpm_set_powergating_by_smu(adev,
>>> AMD_IP_BLOCK_TYPE_GFX, false)) {
>>> +    } else if (!enable && adev->gfx.gfx_off_req_count == 1) {
>> [Quan, Evan] It seems here will leave a small time window for race condition. If amdgpu_device_delay_enable_gfx_off() happens to occur here, it will "WARN_ON_ONCE(adev->gfx.gfx_off_req_count);". How about something as below?
>> @@ -573,13 +573,11 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
>>                          goto unlock;
>>
>>                  adev->gfx.gfx_off_req_count--;
>> -       } else {
>> -               adev->gfx.gfx_off_req_count++;
>>          }
>>
>>          if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
>>                  schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE);
>> -       } else if (!enable && adev->gfx.gfx_off_req_count == 1) {
>> +       } else if (!enable && adev->gfx.gfx_off_req_count == 0) {
>>                  cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);
>>
>>                  if (adev->gfx.gfx_off_state &&
>> @@ -593,6 +591,9 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
>>                  }
>>          }
>>
>> +       if (!enable)
>> +               adev->gfx.gfx_off_req_count++;
>> +
>>   unlock:
>>
> 
> Hi Evan,
> 
> It's not a race per se, it is just an undesirable condition of Enable Gfxoff immediately followed by a Disable GfxOff. The purpose of the WARN is to intimate the user about it.

What Evan pointed out (good catch, thanks!) is technically a race condition WRT adev->gfx.gfx_off_req_count, even though in this case it would have only triggered the sanity checks in place to catch bugs like it, it wouldn't otherwise have affected the correctness of the code.

Fixed in v4.


> There are other cases - for ex: if amdgpu_device_delay_enable_gfx_off() called amdgpu_dpm_set_powergating_by_smu() already at the same place you pointed out.

That OTOH is indeed not a race condition, just unlucky timing.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f3fd5ec710b6..f944ed858f3e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2777,12 +2777,11 @@  static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
 	struct amdgpu_device *adev =
 		container_of(work, struct amdgpu_device, gfx.gfx_off_delay_work.work);
 
-	mutex_lock(&adev->gfx.gfx_off_mutex);
-	if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
-		if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true))
-			adev->gfx.gfx_off_state = true;
-	}
-	mutex_unlock(&adev->gfx.gfx_off_mutex);
+	WARN_ON_ONCE(adev->gfx.gfx_off_state);
+	WARN_ON_ONCE(adev->gfx.gfx_off_req_count);
+
+	if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true))
+		adev->gfx.gfx_off_state = true;
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index a0be0772c8b3..ca91aafcb32b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -563,15 +563,26 @@  void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
 
 	mutex_lock(&adev->gfx.gfx_off_mutex);
 
-	if (!enable)
-		adev->gfx.gfx_off_req_count++;
-	else if (adev->gfx.gfx_off_req_count > 0)
+	if (enable) {
+		/* If the count is already 0, it means there's an imbalance bug somewhere.
+		 * Note that the bug may be in a different caller than the one which triggers the
+		 * WARN_ON_ONCE.
+		 */
+		if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0))
+			goto unlock;
+
 		adev->gfx.gfx_off_req_count--;
+	} else {
+		adev->gfx.gfx_off_req_count++;
+	}
 
 	if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
 		schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE);
-	} else if (!enable && adev->gfx.gfx_off_state) {
-		if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) {
+	} else if (!enable && adev->gfx.gfx_off_req_count == 1) {
+		cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);
+
+		if (adev->gfx.gfx_off_state &&
+		    !amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) {
 			adev->gfx.gfx_off_state = false;
 
 			if (adev->gfx.funcs->init_spm_golden) {
@@ -581,6 +592,7 @@  void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
 		}
 	}
 
+unlock:
 	mutex_unlock(&adev->gfx.gfx_off_mutex);
 }