Message ID | 20230310074000.2078124-1-lizhenneng@kylinos.cn (mailing list archive) |
---|---|
State | New, archived |
Series | drm/amdgpu: resove reboot exception for si oland |
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
> Zhenneng Li
> Sent: Friday, March 10, 2023 3:40 PM
> To: Deucher, Alexander <Alexander.Deucher@amd.com>
> Cc: David Airlie <airlied@linux.ie>; Pan, Xinhui <Xinhui.Pan@amd.com>;
> linux-kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Zhenneng Li
> <lizhenneng@kylinos.cn>; amd-gfx@lists.freedesktop.org; Daniel Vetter
> <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
> Subject: [PATCH] drm/amdgpu: resove reboot exception for si oland
>
> During reboot test on arm64 platform, it may failure on boot.
>
> The error message are as follows:
> [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
> [ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init failed
> [ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init
> ---
>  drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index d6d9e3b1b2c0..dee51c757ac0 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7632,9 +7632,6 @@ static int si_dpm_late_init(void *handle)
>  	if (!adev->pm.dpm_enabled)
>  		return 0;
>
> -	ret = si_set_temperature_range(adev);
> -	if (ret)
> -		return ret;

si_set_temperature_range should be platform agnostic. Can you please elaborate more?

Regards,
Guchun

>  #if 0 //TODO ?
>  	si_dpm_powergate_uvd(adev, true);
>  #endif
> --
> 2.25.1
On Fri, Mar 10, 2023 at 3:18 AM Chen, Guchun <Guchun.Chen@amd.com> wrote:
>
> > @@ -7632,9 +7632,6 @@ static int si_dpm_late_init(void *handle)
> >  	if (!adev->pm.dpm_enabled)
> >  		return 0;
> >
> > -	ret = si_set_temperature_range(adev);
> > -	if (ret)
> > -		return ret;
>
> si_set_temperature_range should be platform agnostic. Can you please elaborate more?
>

Yes. Not setting this means we won't get thermal interrupts. We shouldn't skip this.

Alex

> Regards,
> Guchun
[AMD Official Use Only - General]

I recall a previous discussion about this. At that time we found that the temperature range is already set earlier, during DPM enablement. The suspected root cause was the disabling and re-enabling of the thermal alert inside this second call to set the range.

Thanks,
Lijo
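To make the suspected sequence concrete, here is a minimal toy model in C. It is a sketch under assumptions, not the driver code: `struct toy_dev` and every `toy_*` function are hypothetical stand-ins, and the `smc_rejects_enable` flag only simulates a firmware that refuses to re-enable the thermal alert. The model assumes, as described above, that the range is first programmed during DPM enablement and that setting it again disables and then re-enables the alert.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state; NOT struct amdgpu_device. */
struct toy_dev {
	bool alert_enabled;      /* thermal alert currently armed */
	bool range_set;          /* temperature range has been programmed */
	bool smc_rejects_enable; /* assumption: firmware refuses a re-enable */
};

/* Switch the thermal alert on or off; may fail like a rejected SMC message. */
static int toy_enable_alert(struct toy_dev *d, bool enable)
{
	if (enable && d->smc_rejects_enable)
		return -EINVAL; /* -22 on Linux, the code seen in the log */
	d->alert_enabled = enable;
	return 0;
}

/* Assumed shape of "set range": alert off, program range, alert on. */
static int toy_set_temperature_range(struct toy_dev *d)
{
	int ret = toy_enable_alert(d, false);

	if (ret)
		return ret;
	d->range_set = true;
	return toy_enable_alert(d, true);
}

/* First call site: DPM enablement programs the range once. */
static int toy_dpm_enable(struct toy_dev *d)
{
	return toy_set_temperature_range(d);
}

/* Second call site: late init, with or without the repeated call. */
static int toy_late_init(struct toy_dev *d, bool set_range_again)
{
	if (set_range_again)
		return toy_set_temperature_range(d);
	return 0;
}
```

In this model, if `toy_dpm_enable()` succeeds and the firmware then rejects the re-enable, `toy_late_init(&dev, true)` returns `-EINVAL` and leaves the alert disabled, while `toy_late_init(&dev, false)` returns 0 and keeps the alert that the earlier call armed. Whether the real SMC behaves this way is the open question in this thread.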
This bug was first reported here:
https://lore.kernel.org/lkml/1a620e7c-5b71-3d16-001a-0d79b292aca7@amd.com/

I modified the patch according to the mailing list discussion and ran reboot tests tens of thousands of times across about 10 arm64 machines; no failures were reported.

On 2023/3/10 16:18, Chen, Guchun wrote:
>> @@ -7632,9 +7632,6 @@ static int si_dpm_late_init(void *handle)
>>  	if (!adev->pm.dpm_enabled)
>>  		return 0;
>>
>> -	ret = si_set_temperature_range(adev);
>> -	if (ret)
>> -		return ret;
> si_set_temperature_range should be platform agnostic. Can you please elaborate more?
>
> Regards,
> Guchun
diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index d6d9e3b1b2c0..dee51c757ac0 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -7632,9 +7632,6 @@ static int si_dpm_late_init(void *handle)
 	if (!adev->pm.dpm_enabled)
 		return 0;
 
-	ret = si_set_temperature_range(adev);
-	if (ret)
-		return ret;
 #if 0 //TODO ?
 	si_dpm_powergate_uvd(adev, true);
 #endif