diff mbox series

gpu/drm/msm: fix shutdown hook in case GPU components failed to bind

Message ID 20210301214152.1805737-1-dmitry.baryshkov@linaro.org (mailing list archive)
State Superseded
Headers show
Series gpu/drm/msm: fix shutdown hook in case GPU components failed to bind | expand

Commit Message

Dmitry Baryshkov March 1, 2021, 9:41 p.m. UTC
if GPU components have failed to bind, shutdown callback would fail with
the following backtrace. Add safeguard check to stop that oops from
happening and allow the board to reboot.

[   66.617046] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[   66.626066] Mem abort info:
[   66.628939]   ESR = 0x96000006
[   66.632088]   EC = 0x25: DABT (current EL), IL = 32 bits
[   66.637542]   SET = 0, FnV = 0
[   66.640688]   EA = 0, S1PTW = 0
[   66.643924] Data abort info:
[   66.646889]   ISV = 0, ISS = 0x00000006
[   66.650832]   CM = 0, WnR = 0
[   66.653890] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000107f81000
[   66.660505] [0000000000000000] pgd=0000000100bb2003, p4d=0000000100bb2003, pud=0000000100897003, pmd=0000000000000000
[   66.671398] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[   66.677115] Modules linked in:
[   66.680261] CPU: 6 PID: 352 Comm: reboot Not tainted 5.11.0-rc2-00309-g79e3faa756b2 #38
[   66.688473] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
[   66.695347] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[   66.701507] pc : msm_atomic_commit_tail+0x78/0x4e0
[   66.706437] lr : commit_tail+0xa4/0x184
[   66.710381] sp : ffff8000108f3af0
[   66.713791] x29: ffff8000108f3af0 x28: ffff418c44337000
[   66.719242] x27: 0000000000000000 x26: ffff418c40a24490
[   66.724693] x25: ffffd3a842a4f1a0 x24: 0000000000000008
[   66.730146] x23: ffffd3a84313f030 x22: ffff418c444ce000
[   66.735598] x21: ffff418c408a4980 x20: 0000000000000000
[   66.741049] x19: 0000000000000000 x18: ffff800010710fbc
[   66.746500] x17: 000000000000000c x16: 0000000000000001
[   66.751954] x15: 0000000000010008 x14: 0000000000000068
[   66.757405] x13: 0000000000000001 x12: 0000000000000000
[   66.762855] x11: 0000000000000001 x10: 00000000000009b0
[   66.768306] x9 : ffffd3a843192000 x8 : ffff418c44337000
[   66.773757] x7 : 0000000000000000 x6 : 00000000a401b34e
[   66.779210] x5 : 00ffffffffffffff x4 : 0000000000000000
[   66.784660] x3 : 0000000000000000 x2 : ffff418c444ce000
[   66.790111] x1 : ffffd3a841dce530 x0 : ffff418c444cf000
[   66.795563] Call trace:
[   66.798075]  msm_atomic_commit_tail+0x78/0x4e0
[   66.802633]  commit_tail+0xa4/0x184
[   66.806217]  drm_atomic_helper_commit+0x160/0x390
[   66.811051]  drm_atomic_commit+0x4c/0x60
[   66.815082]  drm_atomic_helper_disable_all+0x1f4/0x210
[   66.820355]  drm_atomic_helper_shutdown+0x80/0x130
[   66.825276]  msm_pdev_shutdown+0x14/0x20
[   66.829303]  platform_shutdown+0x28/0x40
[   66.833330]  device_shutdown+0x158/0x330
[   66.837357]  kernel_restart+0x40/0xa0
[   66.841122]  __do_sys_reboot+0x228/0x250
[   66.845148]  __arm64_sys_reboot+0x28/0x34
[   66.849264]  el0_svc_common.constprop.0+0x74/0x190
[   66.854187]  do_el0_svc+0x24/0x90
[   66.857595]  el0_svc+0x14/0x20
[   66.860739]  el0_sync_handler+0x1a4/0x1b0
[   66.864858]  el0_sync+0x174/0x180
[   66.868269] Code: 1ac020a0 2a000273 eb02007f 54ffff01 (f9400285)
[   66.874525] ---[ end trace 20dedb2a3229fec8 ]---

Fixes: 9d5cbf5fe46e ("drm/msm: add shutdown support for display platform_driver")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
 drivers/gpu/drm/msm/msm_atomic.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Abhinav Kumar March 4, 2021, 12:22 a.m. UTC | #1
On 2021-03-01 13:41, Dmitry Baryshkov wrote:
> if GPU components have failed to bind, shutdown callback would fail 
> with
> the following backtrace. Add safeguard check to stop that oops from
> happening and allow the board to reboot.
> 
> [   66.617046] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000000
> [   66.626066] Mem abort info:
> [   66.628939]   ESR = 0x96000006
> [   66.632088]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   66.637542]   SET = 0, FnV = 0
> [   66.640688]   EA = 0, S1PTW = 0
> [   66.643924] Data abort info:
> [   66.646889]   ISV = 0, ISS = 0x00000006
> [   66.650832]   CM = 0, WnR = 0
> [   66.653890] user pgtable: 4k pages, 48-bit VAs, 
> pgdp=0000000107f81000
> [   66.660505] [0000000000000000] pgd=0000000100bb2003,
> p4d=0000000100bb2003, pud=0000000100897003, pmd=0000000000000000
> [   66.671398] Internal error: Oops: 96000006 [#1] PREEMPT SMP
> [   66.677115] Modules linked in:
> [   66.680261] CPU: 6 PID: 352 Comm: reboot Not tainted
> 5.11.0-rc2-00309-g79e3faa756b2 #38
> [   66.688473] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 
> (DT)
> [   66.695347] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> [   66.701507] pc : msm_atomic_commit_tail+0x78/0x4e0
> [   66.706437] lr : commit_tail+0xa4/0x184
> [   66.710381] sp : ffff8000108f3af0
> [   66.713791] x29: ffff8000108f3af0 x28: ffff418c44337000
> [   66.719242] x27: 0000000000000000 x26: ffff418c40a24490
> [   66.724693] x25: ffffd3a842a4f1a0 x24: 0000000000000008
> [   66.730146] x23: ffffd3a84313f030 x22: ffff418c444ce000
> [   66.735598] x21: ffff418c408a4980 x20: 0000000000000000
> [   66.741049] x19: 0000000000000000 x18: ffff800010710fbc
> [   66.746500] x17: 000000000000000c x16: 0000000000000001
> [   66.751954] x15: 0000000000010008 x14: 0000000000000068
> [   66.757405] x13: 0000000000000001 x12: 0000000000000000
> [   66.762855] x11: 0000000000000001 x10: 00000000000009b0
> [   66.768306] x9 : ffffd3a843192000 x8 : ffff418c44337000
> [   66.773757] x7 : 0000000000000000 x6 : 00000000a401b34e
> [   66.779210] x5 : 00ffffffffffffff x4 : 0000000000000000
> [   66.784660] x3 : 0000000000000000 x2 : ffff418c444ce000
> [   66.790111] x1 : ffffd3a841dce530 x0 : ffff418c444cf000
> [   66.795563] Call trace:
> [   66.798075]  msm_atomic_commit_tail+0x78/0x4e0
> [   66.802633]  commit_tail+0xa4/0x184
> [   66.806217]  drm_atomic_helper_commit+0x160/0x390
> [   66.811051]  drm_atomic_commit+0x4c/0x60
> [   66.815082]  drm_atomic_helper_disable_all+0x1f4/0x210
> [   66.820355]  drm_atomic_helper_shutdown+0x80/0x130
> [   66.825276]  msm_pdev_shutdown+0x14/0x20
> [   66.829303]  platform_shutdown+0x28/0x40
> [   66.833330]  device_shutdown+0x158/0x330
> [   66.837357]  kernel_restart+0x40/0xa0
> [   66.841122]  __do_sys_reboot+0x228/0x250
> [   66.845148]  __arm64_sys_reboot+0x28/0x34
> [   66.849264]  el0_svc_common.constprop.0+0x74/0x190
> [   66.854187]  do_el0_svc+0x24/0x90
> [   66.857595]  el0_svc+0x14/0x20
> [   66.860739]  el0_sync_handler+0x1a4/0x1b0
> [   66.864858]  el0_sync+0x174/0x180
> [   66.868269] Code: 1ac020a0 2a000273 eb02007f 54ffff01 (f9400285)
> [   66.874525] ---[ end trace 20dedb2a3229fec8 ]---
> 
> Fixes: 9d5cbf5fe46e ("drm/msm: add shutdown support for display
> platform_driver")
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Abhinav Kumar <abhinavk@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/msm_atomic.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_atomic.c 
> b/drivers/gpu/drm/msm/msm_atomic.c
> index 6a326761dc4a..2fd0cf6421ad 100644
> --- a/drivers/gpu/drm/msm/msm_atomic.c
> +++ b/drivers/gpu/drm/msm/msm_atomic.c
> @@ -207,7 +207,12 @@ void msm_atomic_commit_tail(struct 
> drm_atomic_state *state)
>  	struct msm_kms *kms = priv->kms;
>  	struct drm_crtc *async_crtc = NULL;
>  	unsigned crtc_mask = get_crtc_mask(state);
> -	bool async = kms->funcs->vsync_time &&
> +	bool async;
> +
> +	if (!kms)
> +		return;
> +
> +	async = kms->funcs->vsync_time &&
>  			can_do_async(state, &async_crtc);
> 
>  	trace_msm_atomic_commit_tail_start(async, crtc_mask);
Rob Clark March 17, 2021, 4:25 p.m. UTC | #2
On Mon, Mar 1, 2021 at 1:41 PM Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> if GPU components have failed to bind, shutdown callback would fail with
> the following backtrace. Add safeguard check to stop that oops from
> happening and allow the board to reboot.
>
> [   66.617046] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [   66.626066] Mem abort info:
> [   66.628939]   ESR = 0x96000006
> [   66.632088]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   66.637542]   SET = 0, FnV = 0
> [   66.640688]   EA = 0, S1PTW = 0
> [   66.643924] Data abort info:
> [   66.646889]   ISV = 0, ISS = 0x00000006
> [   66.650832]   CM = 0, WnR = 0
> [   66.653890] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000107f81000
> [   66.660505] [0000000000000000] pgd=0000000100bb2003, p4d=0000000100bb2003, pud=0000000100897003, pmd=0000000000000000
> [   66.671398] Internal error: Oops: 96000006 [#1] PREEMPT SMP
> [   66.677115] Modules linked in:
> [   66.680261] CPU: 6 PID: 352 Comm: reboot Not tainted 5.11.0-rc2-00309-g79e3faa756b2 #38
> [   66.688473] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
> [   66.695347] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> [   66.701507] pc : msm_atomic_commit_tail+0x78/0x4e0
> [   66.706437] lr : commit_tail+0xa4/0x184
> [   66.710381] sp : ffff8000108f3af0
> [   66.713791] x29: ffff8000108f3af0 x28: ffff418c44337000
> [   66.719242] x27: 0000000000000000 x26: ffff418c40a24490
> [   66.724693] x25: ffffd3a842a4f1a0 x24: 0000000000000008
> [   66.730146] x23: ffffd3a84313f030 x22: ffff418c444ce000
> [   66.735598] x21: ffff418c408a4980 x20: 0000000000000000
> [   66.741049] x19: 0000000000000000 x18: ffff800010710fbc
> [   66.746500] x17: 000000000000000c x16: 0000000000000001
> [   66.751954] x15: 0000000000010008 x14: 0000000000000068
> [   66.757405] x13: 0000000000000001 x12: 0000000000000000
> [   66.762855] x11: 0000000000000001 x10: 00000000000009b0
> [   66.768306] x9 : ffffd3a843192000 x8 : ffff418c44337000
> [   66.773757] x7 : 0000000000000000 x6 : 00000000a401b34e
> [   66.779210] x5 : 00ffffffffffffff x4 : 0000000000000000
> [   66.784660] x3 : 0000000000000000 x2 : ffff418c444ce000
> [   66.790111] x1 : ffffd3a841dce530 x0 : ffff418c444cf000
> [   66.795563] Call trace:
> [   66.798075]  msm_atomic_commit_tail+0x78/0x4e0
> [   66.802633]  commit_tail+0xa4/0x184
> [   66.806217]  drm_atomic_helper_commit+0x160/0x390
> [   66.811051]  drm_atomic_commit+0x4c/0x60
> [   66.815082]  drm_atomic_helper_disable_all+0x1f4/0x210
> [   66.820355]  drm_atomic_helper_shutdown+0x80/0x130
> [   66.825276]  msm_pdev_shutdown+0x14/0x20
> [   66.829303]  platform_shutdown+0x28/0x40
> [   66.833330]  device_shutdown+0x158/0x330
> [   66.837357]  kernel_restart+0x40/0xa0
> [   66.841122]  __do_sys_reboot+0x228/0x250
> [   66.845148]  __arm64_sys_reboot+0x28/0x34
> [   66.849264]  el0_svc_common.constprop.0+0x74/0x190
> [   66.854187]  do_el0_svc+0x24/0x90
> [   66.857595]  el0_svc+0x14/0x20
> [   66.860739]  el0_sync_handler+0x1a4/0x1b0
> [   66.864858]  el0_sync+0x174/0x180
> [   66.868269] Code: 1ac020a0 2a000273 eb02007f 54ffff01 (f9400285)
> [   66.874525] ---[ end trace 20dedb2a3229fec8 ]---
>
> Fixes: 9d5cbf5fe46e ("drm/msm: add shutdown support for display platform_driver")
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>  drivers/gpu/drm/msm/msm_atomic.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> index 6a326761dc4a..2fd0cf6421ad 100644
> --- a/drivers/gpu/drm/msm/msm_atomic.c
> +++ b/drivers/gpu/drm/msm/msm_atomic.c
> @@ -207,7 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state)
>         struct msm_kms *kms = priv->kms;
>         struct drm_crtc *async_crtc = NULL;
>         unsigned crtc_mask = get_crtc_mask(state);
> -       bool async = kms->funcs->vsync_time &&
> +       bool async;
> +
> +       if (!kms)
> +               return;

I think we could instead just check for null priv->kms in
msm_pdev_shutdown() and not call drm_atomic_helper_shutdown()?

BR,
-R

> +
> +       async = kms->funcs->vsync_time &&
>                         can_do_async(state, &async_crtc);
>
>         trace_msm_atomic_commit_tail_start(async, crtc_mask);
> --
> 2.30.1
>
Dmitry Baryshkov March 18, 2021, 8:05 p.m. UTC | #3
On 17/03/2021 19:25, Rob Clark wrote:
> On Mon, Mar 1, 2021 at 1:41 PM Dmitry Baryshkov
> <dmitry.baryshkov@linaro.org> wrote:
>>
>> if GPU components have failed to bind, shutdown callback would fail with
>> the following backtrace. Add safeguard check to stop that oops from
>> happening and allow the board to reboot.

[skipped]

>> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
>> index 6a326761dc4a..2fd0cf6421ad 100644
>> --- a/drivers/gpu/drm/msm/msm_atomic.c
>> +++ b/drivers/gpu/drm/msm/msm_atomic.c
>> @@ -207,7 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state)
>>          struct msm_kms *kms = priv->kms;
>>          struct drm_crtc *async_crtc = NULL;
>>          unsigned crtc_mask = get_crtc_mask(state);
>> -       bool async = kms->funcs->vsync_time &&
>> +       bool async;
>> +
>> +       if (!kms)
>> +               return;
> 
> I think we could instead just check for null priv->kms in
> msm_pdev_shutdown() and not call drm_atomic_helper_shutdown()?


Good idea. Sending v2.

> 
> BR,
> -R
> 
>> +
>> +       async = kms->funcs->vsync_time &&
>>                          can_do_async(state, &async_crtc);
>>
>>          trace_msm_atomic_commit_tail_start(async, crtc_mask);
>> --
>> 2.30.1
>>
Fabio Estevam March 19, 2021, 12:09 p.m. UTC | #4
Hi Dmitry,

On Mon, Mar 1, 2021 at 6:41 PM Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:

> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> index 6a326761dc4a..2fd0cf6421ad 100644
> --- a/drivers/gpu/drm/msm/msm_atomic.c
> +++ b/drivers/gpu/drm/msm/msm_atomic.c
> @@ -207,7 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state)
>         struct msm_kms *kms = priv->kms;
>         struct drm_crtc *async_crtc = NULL;
>         unsigned crtc_mask = get_crtc_mask(state);
> -       bool async = kms->funcs->vsync_time &&
> +       bool async;
> +
> +       if (!kms)
> +               return;
> +
> +       async = kms->funcs->vsync_time &&
>                         can_do_async(state, &async_crtc);

I also see the same issue on a i.MX53:
https://lists.freedesktop.org/archives/freedreno/2021-January/009369.html

Then I got a different suggestion from Rob. Please check:

https://www.spinics.net/lists/dri-devel/msg286648.html
and
https://www.spinics.net/lists/dri-devel/msg286649.html

Does this series fix the issue in your platform too?
Rob Clark March 19, 2021, 2:47 p.m. UTC | #5
On Fri, Mar 19, 2021 at 5:09 AM Fabio Estevam <festevam@gmail.com> wrote:
>
> Hi Dmitry,
>
> On Mon, Mar 1, 2021 at 6:41 PM Dmitry Baryshkov
> <dmitry.baryshkov@linaro.org> wrote:
>
> > diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
> > index 6a326761dc4a..2fd0cf6421ad 100644
> > --- a/drivers/gpu/drm/msm/msm_atomic.c
> > +++ b/drivers/gpu/drm/msm/msm_atomic.c
> > @@ -207,7 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state)
> >         struct msm_kms *kms = priv->kms;
> >         struct drm_crtc *async_crtc = NULL;
> >         unsigned crtc_mask = get_crtc_mask(state);
> > -       bool async = kms->funcs->vsync_time &&
> > +       bool async;
> > +
> > +       if (!kms)
> > +               return;
> > +
> > +       async = kms->funcs->vsync_time &&
> >                         can_do_async(state, &async_crtc);
>
> I also see the same issue on a i.MX53:
> https://lists.freedesktop.org/archives/freedreno/2021-January/009369.html
>
> Then I got a different suggestion from Rob. Please check:
>
> https://www.spinics.net/lists/dri-devel/msg286648.html
> and
> https://www.spinics.net/lists/dri-devel/msg286649.html
>
> Does this series fix the issue in your platform too?

I think that might not help if something fails to probe due to (for
example) a missing dependency, so !priv->kms is probably a better
check to cover both cases.  But the 2nd patch makes a good point, that
the suspend/resume path probably needs the same treatment

BR,
-R
Fabio Estevam March 19, 2021, 3:13 p.m. UTC | #6
Hi Rob,

On Fri, Mar 19, 2021 at 11:44 AM Rob Clark <robdclark@gmail.com> wrote:

> I think that might not help if something fails to probe due to (for
> example) a missing dependency, so !priv->kms is probably a better
> check to cover both cases.  But the 2nd patch makes a good point, that
> the suspend/resume path probably needs the same treatment

Thanks for the feedback.
I will follow the same approach for fixing the suspend/resume path then.

Let me test it and then I will re-submit Dmitry's patch and the one
for suspend/resume as part of a patch series.

Thanks
Rob Clark March 19, 2021, 3:30 p.m. UTC | #7
On Fri, Mar 19, 2021 at 8:13 AM Fabio Estevam <festevam@gmail.com> wrote:
>
> Hi Rob,
>
> On Fri, Mar 19, 2021 at 11:44 AM Rob Clark <robdclark@gmail.com> wrote:
>
> > I think that might not help if something fails to probe due to (for
> > example) a missing dependency, so !priv->kms is probably a better
> > check to cover both cases.  But the 2nd patch makes a good point, that
> > the suspend/resume path probably needs the same treatment
>
> Thanks for the feedback.
> I will follow the same approach for fixing the suspend/resume path then.
>
> Let me test it and then I will re-submit Dmitry's patch and the one
> for suspend/resume as part of a patch series.

Thanks,

BR,
-R
Fabio Estevam March 19, 2021, 4:24 p.m. UTC | #8
On Fri, Mar 19, 2021 at 12:13 PM Fabio Estevam <festevam@gmail.com> wrote:

> Thanks for the feedback.
> I will follow the same approach for fixing the suspend/resume path then.
>
> Let me test it and then I will re-submit Dmitry's patch and the one
> for suspend/resume as part of a patch series.

This approach works here for the suspend/resume path too.

I have just submitted the series, thanks.
Dmitry Baryshkov March 22, 2021, 10:12 a.m. UTC | #9
On Fri, 19 Mar 2021 at 19:25, Fabio Estevam <festevam@gmail.com> wrote:
>
> On Fri, Mar 19, 2021 at 12:13 PM Fabio Estevam <festevam@gmail.com> wrote:
>
> > Thanks for the feedback.
> > I will follow the same approach for fixing the suspend/resume path then.
> >
> > Let me test it and then I will re-submit Dmitry's patch and the one
> > for suspend/resume as part of a patch series.
>
> This approach works here for the suspend/resume path too.
>
> I have just submitted the series, thanks.

Thank you!
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
index 6a326761dc4a..2fd0cf6421ad 100644
--- a/drivers/gpu/drm/msm/msm_atomic.c
+++ b/drivers/gpu/drm/msm/msm_atomic.c
@@ -207,7 +207,12 @@  void msm_atomic_commit_tail(struct drm_atomic_state *state)
 	struct msm_kms *kms = priv->kms;
 	struct drm_crtc *async_crtc = NULL;
 	unsigned crtc_mask = get_crtc_mask(state);
-	bool async = kms->funcs->vsync_time &&
+	bool async;
+
+	if (!kms)
+		return;
+
+	async = kms->funcs->vsync_time &&
 			can_do_async(state, &async_crtc);
 
 	trace_msm_atomic_commit_tail_start(async, crtc_mask);