diff mbox

drm/amd/powerplay: don't give up if DPM is already running

Message ID 1476040988-11676-1-git-send-email-notasas@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Grazvydas Ignotas Oct. 9, 2016, 7:23 p.m. UTC
Currently the driver crashes if smu7_enable_dpm_tasks() returns early,
which it does if DPM is already active. It seems to be better just to
continue anyway, at least I haven't noticed any ill effects. It's also
unclear at what state the hardware was left by the previous driver, so
IMO it's better to always fully initialize.

Way to reproduce:
$ modprobe amdgpu
$ rmmod amdgpu
$ modprobe amdgpu
...
DPM is already running right now, no need to enable DPM!
...
failed to send message 18b ret is 0
BUG: unable to handle kernel paging request at ffffed01fc9ab21f
Call Trace:
 smu7_set_power_state_tasks+0x499/0x1940 [amdgpu]
 phm_set_power_state+0xcb/0x120 [amdgpu]
 psm_adjust_power_state_dynamic+0x11e/0x1b0 [amdgpu]
 pem_task_adjust_power_state+0xb9/0xd0 [amdgpu]
 pem_excute_event_chain+0x7d/0xe0 [amdgpu]
 pem_handle_event_unlocked+0x49/0x60 [amdgpu]
 pem_handle_event+0xe/0x10 [amdgpu]
 pp_dpm_dispatch_tasks+0xe0/0x190 [amdgpu]
 amdgpu_pm_compute_clocks+0x10c/0xc60 [amdgpu]
 dce_v11_0_crtc_dpms+0x7d/0x150 [amdgpu]
 dce_v11_0_crtc_disable+0x90/0x4a0 [amdgpu]
 drm_helper_disable_unused_functions+0x67/0x80 [drm_kms_helper]
 amdgpu_fbdev_init+0x13e/0x170 [amdgpu]
 amdgpu_device_init+0x1aeb/0x2490 [amdgpu]

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
---
 drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Edward O'Callaghan Oct. 9, 2016, 10:12 p.m. UTC | #1
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

On 10/10/2016 06:23 AM, Grazvydas Ignotas wrote:
> Currently the driver crashes if smu7_enable_dpm_tasks() returns early,
> which it does if DPM is already active. It seems to be better just to
> continue anyway, at least I haven't noticed any ill effects. It's also
> unclear at what state the hardware was left by the previous driver, so
> IMO it's better to always fully initialize.
> 
> Way to reproduce:
> $ modprobe amdgpu
> $ rmmod amdgpu
> $ modprobe amdgpu
> ...
> DPM is already running right now, no need to enable DPM!
> ...
> failed to send message 18b ret is 0
> BUG: unable to handle kernel paging request at ffffed01fc9ab21f
> Call Trace:
>  smu7_set_power_state_tasks+0x499/0x1940 [amdgpu]
>  phm_set_power_state+0xcb/0x120 [amdgpu]
>  psm_adjust_power_state_dynamic+0x11e/0x1b0 [amdgpu]
>  pem_task_adjust_power_state+0xb9/0xd0 [amdgpu]
>  pem_excute_event_chain+0x7d/0xe0 [amdgpu]
>  pem_handle_event_unlocked+0x49/0x60 [amdgpu]
>  pem_handle_event+0xe/0x10 [amdgpu]
>  pp_dpm_dispatch_tasks+0xe0/0x190 [amdgpu]
>  amdgpu_pm_compute_clocks+0x10c/0xc60 [amdgpu]
>  dce_v11_0_crtc_dpms+0x7d/0x150 [amdgpu]
>  dce_v11_0_crtc_disable+0x90/0x4a0 [amdgpu]
>  drm_helper_disable_unused_functions+0x67/0x80 [drm_kms_helper]
>  amdgpu_fbdev_init+0x13e/0x170 [amdgpu]
>  amdgpu_device_init+0x1aeb/0x2490 [amdgpu]
> 
> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
> ---
>  drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> index f6afa6a..327030b 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> @@ -1166,8 +1166,8 @@ static int smu7_enable_dpm_tasks(struct pp_hwmgr *hwmgr)
>  
>  	tmp_result = (!smum_is_dpm_running(hwmgr)) ? 0 : -1;
>  	PP_ASSERT_WITH_CODE(tmp_result == 0,
> -			"DPM is already running right now, no need to enable DPM!",
> -			return 0);
> +			"DPM is already running",
> +			);
>  
>  	if (smu7_voltage_control(hwmgr)) {
>  		tmp_result = smu7_enable_voltage_control(hwmgr);
>
Alex Deucher Oct. 11, 2016, 8 p.m. UTC | #2
+Rex to review this.

Alex

On Sun, Oct 9, 2016 at 3:23 PM, Grazvydas Ignotas <notasas@gmail.com> wrote:
> Currently the driver crashes if smu7_enable_dpm_tasks() returns early,
> which it does if DPM is already active. It seems to be better just to
> continue anyway, at least I haven't noticed any ill effects. It's also
> unclear at what state the hardware was left by the previous driver, so
> IMO it's better to always fully initialize.
>
> Way to reproduce:
> $ modprobe amdgpu
> $ rmmod amdgpu
> $ modprobe amdgpu
> ...
> DPM is already running right now, no need to enable DPM!
> ...
> failed to send message 18b ret is 0
> BUG: unable to handle kernel paging request at ffffed01fc9ab21f
> Call Trace:
>  smu7_set_power_state_tasks+0x499/0x1940 [amdgpu]
>  phm_set_power_state+0xcb/0x120 [amdgpu]
>  psm_adjust_power_state_dynamic+0x11e/0x1b0 [amdgpu]
>  pem_task_adjust_power_state+0xb9/0xd0 [amdgpu]
>  pem_excute_event_chain+0x7d/0xe0 [amdgpu]
>  pem_handle_event_unlocked+0x49/0x60 [amdgpu]
>  pem_handle_event+0xe/0x10 [amdgpu]
>  pp_dpm_dispatch_tasks+0xe0/0x190 [amdgpu]
>  amdgpu_pm_compute_clocks+0x10c/0xc60 [amdgpu]
>  dce_v11_0_crtc_dpms+0x7d/0x150 [amdgpu]
>  dce_v11_0_crtc_disable+0x90/0x4a0 [amdgpu]
>  drm_helper_disable_unused_functions+0x67/0x80 [drm_kms_helper]
>  amdgpu_fbdev_init+0x13e/0x170 [amdgpu]
>  amdgpu_device_init+0x1aeb/0x2490 [amdgpu]
>
> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
> ---
>  drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> index f6afa6a..327030b 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> @@ -1166,8 +1166,8 @@ static int smu7_enable_dpm_tasks(struct pp_hwmgr *hwmgr)
>
>         tmp_result = (!smum_is_dpm_running(hwmgr)) ? 0 : -1;
>         PP_ASSERT_WITH_CODE(tmp_result == 0,
> -                       "DPM is already running right now, no need to enable DPM!",
> -                       return 0);
> +                       "DPM is already running",
> +                       );
>
>         if (smu7_voltage_control(hwmgr)) {
>                 tmp_result = smu7_enable_voltage_control(hwmgr);
> --
> 2.7.4
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Zhu, Rex Oct. 12, 2016, 1:44 p.m. UTC | #3
Hi Grazvydas and Alex,

We needed to disable dpm when rmmod amdgpu for this issue.

I am checking the function of disable dpm task. 

Best Regards
Rex

-----Original Message-----
From: Alex Deucher [mailto:alexdeucher@gmail.com] 

Sent: Wednesday, October 12, 2016 4:01 AM
To: Grazvydas Ignotas; Zhu, Rex
Cc: Maling list - DRI developers; amd-gfx list
Subject: Re: [PATCH] drm/amd/powerplay: don't give up if DPM is already running

+Rex to review this.

Alex

On Sun, Oct 9, 2016 at 3:23 PM, Grazvydas Ignotas <notasas@gmail.com> wrote:
> Currently the driver crashes if smu7_enable_dpm_tasks() returns early, 

> which it does if DPM is already active. It seems to be better just to 

> continue anyway, at least I haven't noticed any ill effects. It's also 

> unclear at what state the hardware was left by the previous driver, so 

> IMO it's better to always fully initialize.

>

> Way to reproduce:

> $ modprobe amdgpu

> $ rmmod amdgpu

> $ modprobe amdgpu

> ...

> DPM is already running right now, no need to enable DPM!

> ...

> failed to send message 18b ret is 0

> BUG: unable to handle kernel paging request at ffffed01fc9ab21f Call 

> Trace:

>  smu7_set_power_state_tasks+0x499/0x1940 [amdgpu]

>  phm_set_power_state+0xcb/0x120 [amdgpu]

>  psm_adjust_power_state_dynamic+0x11e/0x1b0 [amdgpu]

>  pem_task_adjust_power_state+0xb9/0xd0 [amdgpu]

>  pem_excute_event_chain+0x7d/0xe0 [amdgpu]

>  pem_handle_event_unlocked+0x49/0x60 [amdgpu]

>  pem_handle_event+0xe/0x10 [amdgpu]

>  pp_dpm_dispatch_tasks+0xe0/0x190 [amdgpu]

>  amdgpu_pm_compute_clocks+0x10c/0xc60 [amdgpu]

>  dce_v11_0_crtc_dpms+0x7d/0x150 [amdgpu]

>  dce_v11_0_crtc_disable+0x90/0x4a0 [amdgpu]

>  drm_helper_disable_unused_functions+0x67/0x80 [drm_kms_helper]

>  amdgpu_fbdev_init+0x13e/0x170 [amdgpu]

>  amdgpu_device_init+0x1aeb/0x2490 [amdgpu]

>

> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>

> ---

>  drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c | 4 ++--

>  1 file changed, 2 insertions(+), 2 deletions(-)

>

> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c 

> b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c

> index f6afa6a..327030b 100644

> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c

> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c

> @@ -1166,8 +1166,8 @@ static int smu7_enable_dpm_tasks(struct pp_hwmgr 

> *hwmgr)

>

>         tmp_result = (!smum_is_dpm_running(hwmgr)) ? 0 : -1;

>         PP_ASSERT_WITH_CODE(tmp_result == 0,

> -                       "DPM is already running right now, no need to enable DPM!",

> -                       return 0);

> +                       "DPM is already running",

> +                       );

>

>         if (smu7_voltage_control(hwmgr)) {

>                 tmp_result = smu7_enable_voltage_control(hwmgr);

> --

> 2.7.4

>

> _______________________________________________

> amd-gfx mailing list

> amd-gfx@lists.freedesktop.org

> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Zhu, Rex Oct. 13, 2016, 7:45 a.m. UTC | #4
Hi all,

The attached patches were also for this issue.
Disable dpm when rmmod amdgpu.

Please help to review.

Best Regards
Rex

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Zhu, Rex

Sent: Wednesday, October 12, 2016 9:45 PM
To: Alex Deucher; Grazvydas Ignotas
Cc: amd-gfx list; Maling list - DRI developers
Subject: RE: [PATCH] drm/amd/powerplay: don't give up if DPM is already running

Hi Grazvydas and Alex,

We needed to disable dpm when rmmod amdgpu for this issue.

I am checking the function of disable dpm task. 

Best Regards
Rex

-----Original Message-----
From: Alex Deucher [mailto:alexdeucher@gmail.com]

Sent: Wednesday, October 12, 2016 4:01 AM
To: Grazvydas Ignotas; Zhu, Rex
Cc: Maling list - DRI developers; amd-gfx list
Subject: Re: [PATCH] drm/amd/powerplay: don't give up if DPM is already running

+Rex to review this.

Alex

On Sun, Oct 9, 2016 at 3:23 PM, Grazvydas Ignotas <notasas@gmail.com> wrote:
> Currently the driver crashes if smu7_enable_dpm_tasks() returns early, 

> which it does if DPM is already active. It seems to be better just to 

> continue anyway, at least I haven't noticed any ill effects. It's also 

> unclear at what state the hardware was left by the previous driver, so 

> IMO it's better to always fully initialize.

>

> Way to reproduce:

> $ modprobe amdgpu

> $ rmmod amdgpu

> $ modprobe amdgpu

> ...

> DPM is already running right now, no need to enable DPM!

> ...

> failed to send message 18b ret is 0

> BUG: unable to handle kernel paging request at ffffed01fc9ab21f Call

> Trace:

>  smu7_set_power_state_tasks+0x499/0x1940 [amdgpu]

>  phm_set_power_state+0xcb/0x120 [amdgpu]

>  psm_adjust_power_state_dynamic+0x11e/0x1b0 [amdgpu]

>  pem_task_adjust_power_state+0xb9/0xd0 [amdgpu]

>  pem_excute_event_chain+0x7d/0xe0 [amdgpu]

>  pem_handle_event_unlocked+0x49/0x60 [amdgpu]

>  pem_handle_event+0xe/0x10 [amdgpu]

>  pp_dpm_dispatch_tasks+0xe0/0x190 [amdgpu]

>  amdgpu_pm_compute_clocks+0x10c/0xc60 [amdgpu]

>  dce_v11_0_crtc_dpms+0x7d/0x150 [amdgpu]

>  dce_v11_0_crtc_disable+0x90/0x4a0 [amdgpu]

>  drm_helper_disable_unused_functions+0x67/0x80 [drm_kms_helper]

>  amdgpu_fbdev_init+0x13e/0x170 [amdgpu]

>  amdgpu_device_init+0x1aeb/0x2490 [amdgpu]

>

> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>

> ---

>  drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c | 4 ++--

>  1 file changed, 2 insertions(+), 2 deletions(-)

>

> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c

> b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c

> index f6afa6a..327030b 100644

> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c

> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c

> @@ -1166,8 +1166,8 @@ static int smu7_enable_dpm_tasks(struct pp_hwmgr

> *hwmgr)

>

>         tmp_result = (!smum_is_dpm_running(hwmgr)) ? 0 : -1;

>         PP_ASSERT_WITH_CODE(tmp_result == 0,

> -                       "DPM is already running right now, no need to enable DPM!",

> -                       return 0);

> +                       "DPM is already running",

> +                       );

>

>         if (smu7_voltage_control(hwmgr)) {

>                 tmp_result = smu7_enable_voltage_control(hwmgr);

> --

> 2.7.4

>

> _______________________________________________

> amd-gfx mailing list

> amd-gfx@lists.freedesktop.org

> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Grazvydas Ignotas Oct. 15, 2016, 6:55 p.m. UTC | #5
Hi,

On Thu, Oct 13, 2016 at 10:45 AM, Zhu, Rex <Rex.Zhu@amd.com> wrote:
>
> The attached patches were also for this issue.
> Disable dpm when rmmod amdgpu.

It works for modprobe-rmmod-modprobe test, thanks.

However with GPU passthrough (giving control of the GPU to a Windows
virtual machine using iommu, then shutting down the VM and loading
amdgpu) the problem is still there, same backtrace as in my commit
message. It seems the Windows driver leaves DPM enabled on shutdown.
With my patch the crash goes away.

It would be nice to have GPU passthrough working, this is the only
issue that breaks it. Windows can drive the GPU after amdgpu fine
already, only amdgpu has problems to run after Windows.

Gražvydas
Zhu, Rex Oct. 19, 2016, 1:59 p.m. UTC | #6
Hi Gražvydas,

Sorry for so late response.

I can reproduce this issue in gpu passthrough case.

When we forced power off the VM or forced reset the VM without shutting down the Os, the rmmod will not be called. So dpm was still running.

If we skipped the enable dpm tasks, some clock tables were not be initialized, so kernel panic when we visited those tables in set power state.

I will send the fix patch for review tomorrow.

Thanks.

Best Regards
Rex

-----Original Message-----
From: Grazvydas Ignotas [mailto:notasas@gmail.com] 

Sent: Sunday, October 16, 2016 2:55 AM
To: Zhu, Rex
Cc: Alex Deucher; amd-gfx list; Maling list - DRI developers
Subject: Re: [PATCH] drm/amd/powerplay: don't give up if DPM is already running

Hi,

On Thu, Oct 13, 2016 at 10:45 AM, Zhu, Rex <Rex.Zhu@amd.com> wrote:
>

> The attached patches were also for this issue.

> Disable dpm when rmmod amdgpu.


It works for modprobe-rmmod-modprobe test, thanks.

However with GPU passthrough (giving control of the GPU to a Windows virtual machine using iommu, then shutting down the VM and loading
amdgpu) the problem is still there, same backtrace as in my commit message. It seems the Windows driver leaves DPM enabled on shutdown.
With my patch the crash goes away.

It would be nice to have GPU passthrough working, this is the only issue that breaks it. Windows can drive the GPU after amdgpu fine already, only amdgpu has problems to run after Windows.

Gražvydas
Zhu, Rex Oct. 20, 2016, 8:41 a.m. UTC | #7
Hi Gražvydas,

In GPU passthrough case, I can't find better solution than yours from driver.
So will apply your patch and just update reproduce step.

Thanks.


Best Regards
Rex

-----Original Message-----
From: Zhu, Rex 

Sent: Wednesday, October 19, 2016 10:00 PM
To: 'Grazvydas Ignotas'
Cc: Alex Deucher; amd-gfx list; Maling list - DRI developers
Subject: RE: [PATCH] drm/amd/powerplay: don't give up if DPM is already running

Hi Gražvydas,

Sorry for so late response.

I can reproduce this issue in gpu passthrough case.

When we forced power off the VM or forced reset the VM without shutting down the Os, the rmmod will not be called. So dpm was still running.

If we skipped the enable dpm tasks, some clock tables were not be initialized, so kernel panic when we visited those tables in set power state.

I will send the fix patch for review tomorrow.

Thanks.

Best Regards
Rex

-----Original Message-----
From: Grazvydas Ignotas [mailto:notasas@gmail.com] 

Sent: Sunday, October 16, 2016 2:55 AM
To: Zhu, Rex
Cc: Alex Deucher; amd-gfx list; Maling list - DRI developers
Subject: Re: [PATCH] drm/amd/powerplay: don't give up if DPM is already running

Hi,

On Thu, Oct 13, 2016 at 10:45 AM, Zhu, Rex <Rex.Zhu@amd.com> wrote:
>

> The attached patches were also for this issue.

> Disable dpm when rmmod amdgpu.


It works for modprobe-rmmod-modprobe test, thanks.

However with GPU passthrough (giving control of the GPU to a Windows virtual machine using iommu, then shutting down the VM and loading
amdgpu) the problem is still there, same backtrace as in my commit message. It seems the Windows driver leaves DPM enabled on shutdown.
With my patch the crash goes away.

It would be nice to have GPU passthrough working, this is the only issue that breaks it. Windows can drive the GPU after amdgpu fine already, only amdgpu has problems to run after Windows.

Gražvydas
diff mbox

Patch

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
index f6afa6a..327030b 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
@@ -1166,8 +1166,8 @@  static int smu7_enable_dpm_tasks(struct pp_hwmgr *hwmgr)
 
 	tmp_result = (!smum_is_dpm_running(hwmgr)) ? 0 : -1;
 	PP_ASSERT_WITH_CODE(tmp_result == 0,
-			"DPM is already running right now, no need to enable DPM!",
-			return 0);
+			"DPM is already running",
+			);
 
 	if (smu7_voltage_control(hwmgr)) {
 		tmp_result = smu7_enable_voltage_control(hwmgr);