diff mbox series

drm/rockchip: Allow driver to be shutdown on reboot/kexec

Message ID 20180805124807.18169-1-marc.zyngier@arm.com (mailing list archive)
State New, archived
Headers show
Series drm/rockchip: Allow driver to be shutdown on reboot/kexec | expand

Commit Message

Marc Zyngier Aug. 5, 2018, 12:48 p.m. UTC
Leaving the DRM driver enabled on reboot or kexec has the annoying
effect of leaving the display generating transactions whilst the
IOMMU has been shut down.

In turn, the IOMMU driver (which shares its interrupt line with
the VOP) starts warning either on shutdown or when entering the
secondary kernel in the kexec case (nothing is expected on that
front).

A cheap way of ensuring that things are nicely shut down is to
register a shutdown callback in the platform driver.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Heiko Stübner Sept. 10, 2018, 9:07 a.m. UTC | #1
Am Sonntag, 5. August 2018, 14:48:07 CEST schrieb Marc Zyngier:
> Leaving the DRM driver enabled on reboot or kexec has the annoying
> effect of leaving the display generating transactions whilst the
> IOMMU has been shut down.
> 
> In turn, the IOMMU driver (which shares its interrupt line with
> the VOP) starts warning either on shutdown or when entering the
> secondary kernel in the kexec case (nothing is expected on that
> front).
> 
> A cheap way of ensuring that things are nicely shut down is to
> register a shutdown callback in the platform driver.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

applied to drm-misc-next
with Vicentes Tested-tag and also a Cc: stable tag added.

sorry for the holdup
Heiko
Brian Norris Dec. 5, 2018, 3:01 a.m. UTC | #2
+ others

Hi,

On Sun, Aug 05, 2018 at 01:48:07PM +0100, Marc Zyngier wrote:
> Leaving the DRM driver enabled on reboot or kexec has the annoying
> effect of leaving the display generating transactions whilst the
> IOMMU has been shut down.
> 
> In turn, the IOMMU driver (which shares its interrupt line with
> the VOP) starts warning either on shutdown or when entering the
> secondary kernel in the kexec case (nothing is expected on that
> front).
> 
> A cheap way of ensuring that things are nicely shut down is to
> register a shutdown callback in the platform driver.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---

This patch made it into 4.20-rc1 as well as -stable, and it has caused
regressions for me, on the Kevin and Scarlet [1] RK3399 platforms. On
shutdown/reboot, I see this:

[   94.742559] WARNING: CPU: 4 PID: 2035 at drivers/gpu/drm/drm_mode_config.c:477 drm_mode_config_cleanup+0x1c4/0x294
...
[   94.775904] CPU: 4 PID: 2035 Comm: reboot Tainted: G        W         4.20.0-rc5+ #83
[   94.784651] Hardware name: Google Scarlet (DT)
[   94.789611] pstate: 20000005 (nzCv daif -PAN -UAO)
[   94.794959] pc : drm_mode_config_cleanup+0x1c4/0x294
[   94.800500] lr : drm_mode_config_cleanup+0x108/0x294
...
[   94.898683] Call trace:
[   94.901410]  drm_mode_config_cleanup+0x1c4/0x294
[   94.906565]  rockchip_drm_unbind+0x4c/0x8c
[   94.911138]  component_master_del+0x88/0xb8
[   94.915807]  rockchip_drm_platform_remove+0x2c/0x44
[   94.921243]  rockchip_drm_platform_shutdown+0x20/0x2c
[   94.926881]  platform_drv_shutdown+0x2c/0x38
[   94.931647]  device_shutdown+0x164/0x1b8
[   94.936016]  kernel_restart_prepare+0x40/0x48
[   94.940878]  kernel_restart+0x20/0x68
[   94.944964]  __se_sys_reboot+0x1ac/0x204
[   94.949331]  __arm64_sys_reboot+0x2c/0x38
[   94.953806]  el0_svc_common+0xa4/0xec
[   94.957891]  el0_svc_compat_handler+0x30/0x3c
[   94.962753]  el0_svc_compat+0x8/0x18
[   94.966740] ---[ end trace b9ba2e701f4fb233 ]---
[   95.255169] Memory manager not clean during takedown.
[   95.260824] WARNING: CPU: 4 PID: 2035 at drivers/gpu/drm/drm_mm.c:950 drm_mm_takedown+0x34/0x44
...
[   95.292314] CPU: 4 PID: 2035 Comm: reboot Tainted: G        W         4.20.0-rc5+ #83
[   95.301061] Hardware name: Google Scarlet (DT)
[   95.306020] pstate: 60000005 (nZCv daif -PAN -UAO)
[   95.311369] pc : drm_mm_takedown+0x34/0x44
[   95.315940] lr : drm_mm_takedown+0x34/0x44
...
[   95.415857]  drm_mm_takedown+0x34/0x44
[   95.420042]  rockchip_drm_unbind+0x64/0x8c
[   95.424613]  component_master_del+0x88/0xb8
[   95.429283]  rockchip_drm_platform_remove+0x2c/0x44
[   95.434728]  rockchip_drm_platform_shutdown+0x20/0x2c
[   95.440360]  platform_drv_shutdown+0x2c/0x38
[   95.445127]  device_shutdown+0x164/0x1b8
[   95.449504]  kernel_restart_prepare+0x40/0x48
[   95.454358]  kernel_restart+0x20/0x68
[   95.458436]  __se_sys_reboot+0x1ac/0x204
[   95.462812]  __arm64_sys_reboot+0x2c/0x38
[   95.467287]  el0_svc_common+0xa4/0xec
[   95.471373]  el0_svc_compat_handler+0x30/0x3c
[   95.476235]  el0_svc_compat+0x8/0x18
[   95.480215] ---[ end trace b9ba2e701f4fb234 ]---

It's especially bad on -stable kernels, where I believe the remove()
paths were even worse. This triggers a variety of OOPSes, and it's not
clear if those are simply because of backports (e.g., RK3399 did not
have support in 4.4.y, but our downstream has merged all sorts of
backports to make it work).

Anyway, the above warnings occur on v4.20-rc, which I think is
justification enough for a revert.

I plan to submit a revert which I hope can go to 4.20 as well as
-stable. I'd hope the remove()/shutdown() paths should be fixed before
this gets applied again, and that it does not get shipped to -stable
kernels.

Brian

[1] Technically Scarlet needed a few patches from -next to work at all,
    but Kevin is a similar platform that has been working for several
    releases.

>  drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> index f814d37b1db2..05368fa4f956 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> @@ -442,6 +442,11 @@ static int rockchip_drm_platform_remove(struct platform_device *pdev)
>  	return 0;
>  }
>  
> +static void rockchip_drm_platform_shutdown(struct platform_device *pdev)
> +{
> +	rockchip_drm_platform_remove(pdev);
> +}
> +
>  static const struct of_device_id rockchip_drm_dt_ids[] = {
>  	{ .compatible = "rockchip,display-subsystem", },
>  	{ /* sentinel */ },
> @@ -451,6 +456,7 @@ MODULE_DEVICE_TABLE(of, rockchip_drm_dt_ids);
>  static struct platform_driver rockchip_drm_platform_driver = {
>  	.probe = rockchip_drm_platform_probe,
>  	.remove = rockchip_drm_platform_remove,
> +	.shutdown = rockchip_drm_platform_shutdown,
>  	.driver = {
>  		.name = "rockchip-drm",
>  		.of_match_table = rockchip_drm_dt_ids,
Heiko Stübner Dec. 5, 2018, 2:11 p.m. UTC | #3
Hi Brian,

Am Mittwoch, 5. Dezember 2018, 04:01:34 CET schrieb Brian Norris:
> + others
> 
> Hi,
> 
> On Sun, Aug 05, 2018 at 01:48:07PM +0100, Marc Zyngier wrote:
> > Leaving the DRM driver enabled on reboot or kexec has the annoying
> > effect of leaving the display generating transactions whilst the
> > IOMMU has been shut down.
> > 
> > In turn, the IOMMU driver (which shares its interrupt line with
> > the VOP) starts warning either on shutdown or when entering the
> > secondary kernel in the kexec case (nothing is expected on that
> > front).
> > 
> > A cheap way of ensuring that things are nicely shut down is to
> > register a shutdown callback in the platform driver.
> > 
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> 
> This patch made it into 4.20-rc1 as well as -stable, and it has caused
> regressions for me, on the Kevin and Scarlet [1] RK3399 platforms.
>
> On
> shutdown/reboot, I see this:
> 
> [   94.742559] WARNING: CPU: 4 PID: 2035 at
> drivers/gpu/drm/drm_mode_config.c:477 drm_mode_config_cleanup+0x1c4/0x294
> ...
> [   94.775904] CPU: 4 PID: 2035 Comm: reboot Tainted: G        W        
> 4.20.0-rc5+ #83 [   94.784651] Hardware name: Google Scarlet (DT)
> [   94.789611] pstate: 20000005 (nzCv daif -PAN -UAO)
> [   94.794959] pc : drm_mode_config_cleanup+0x1c4/0x294
> [   94.800500] lr : drm_mode_config_cleanup+0x108/0x294
> ...
> [   94.898683] Call trace:
> [   94.901410]  drm_mode_config_cleanup+0x1c4/0x294
> [   94.906565]  rockchip_drm_unbind+0x4c/0x8c
> [   94.911138]  component_master_del+0x88/0xb8
> [   94.915807]  rockchip_drm_platform_remove+0x2c/0x44
> [   94.921243]  rockchip_drm_platform_shutdown+0x20/0x2c
> [   94.926881]  platform_drv_shutdown+0x2c/0x38
> [   94.931647]  device_shutdown+0x164/0x1b8
> [   94.936016]  kernel_restart_prepare+0x40/0x48
> [   94.940878]  kernel_restart+0x20/0x68
> [   94.944964]  __se_sys_reboot+0x1ac/0x204
> [   94.949331]  __arm64_sys_reboot+0x2c/0x38
> [   94.953806]  el0_svc_common+0xa4/0xec
> [   94.957891]  el0_svc_compat_handler+0x30/0x3c
> [   94.962753]  el0_svc_compat+0x8/0x18
> [   94.966740] ---[ end trace b9ba2e701f4fb233 ]---
> [   95.255169] Memory manager not clean during takedown.
> [   95.260824] WARNING: CPU: 4 PID: 2035 at drivers/gpu/drm/drm_mm.c:950
> drm_mm_takedown+0x34/0x44 ...
> [   95.292314] CPU: 4 PID: 2035 Comm: reboot Tainted: G        W        
> 4.20.0-rc5+ #83 [   95.301061] Hardware name: Google Scarlet (DT)
> [   95.306020] pstate: 60000005 (nZCv daif -PAN -UAO)
> [   95.311369] pc : drm_mm_takedown+0x34/0x44
> [   95.315940] lr : drm_mm_takedown+0x34/0x44
> ...
> [   95.415857]  drm_mm_takedown+0x34/0x44
> [   95.420042]  rockchip_drm_unbind+0x64/0x8c
> [   95.424613]  component_master_del+0x88/0xb8
> [   95.429283]  rockchip_drm_platform_remove+0x2c/0x44
> [   95.434728]  rockchip_drm_platform_shutdown+0x20/0x2c
> [   95.440360]  platform_drv_shutdown+0x2c/0x38
> [   95.445127]  device_shutdown+0x164/0x1b8
> [   95.449504]  kernel_restart_prepare+0x40/0x48
> [   95.454358]  kernel_restart+0x20/0x68
> [   95.458436]  __se_sys_reboot+0x1ac/0x204
> [   95.462812]  __arm64_sys_reboot+0x2c/0x38
> [   95.467287]  el0_svc_common+0xa4/0xec
> [   95.471373]  el0_svc_compat_handler+0x30/0x3c
> [   95.476235]  el0_svc_compat+0x8/0x18
> [   95.480215] ---[ end trace b9ba2e701f4fb234 ]---
> 
> It's especially bad on -stable kernels, where I believe the remove()
> paths were even worse. This triggers a variety of OOPSes, and it's not
> clear if those are simply because of backports (e.g., RK3399 did not
> have support in 4.4.y, but our downstream has merged all sorts of
> backports to make it work).
> 
> Anyway, the above warnings occur on v4.20-rc, which I think is
> justification enough for a revert.

That's strange. I remember testing quite a number of shutdown/reboot
cycles before applying that patch. And for good measure did the same
again right now.

- Kevin, with netboot firmware, booting into Debian+console only
- Bob, with stock firmware, booting into Debian+KDE Plasma
- Scarlet, with stock firmware, booting into Debian+KDE Plasma

With some random number of reboot and shutdowns on each I didn't
see any warnings at all.

> I plan to submit a revert which I hope can go to 4.20 as well as
> -stable. I'd hope the remove()/shutdown() paths should be fixed before
> this gets applied again, and that it does not get shipped to -stable
> kernels.

But judging by the fact that the warning indicates that somthing is still
holding onto a framebuffer and a rmmod rockchipdrm is not possible
at runrtime for likely the same reason, I guess we really might be creating
a problem with that shutdown.

Can you maybe give "drm/rockchip: shutdown drm subsystem on shutdown" [2]
a try? When the underlying issue of rebooting surfaced we had 2 competing 
solutions, so we at least don't reopen the issue, that people have problems
rebooting?

drm-drivers seem to be short on shutdown handlers to peek at but this
second variant mimics what tinydrm does in its shutdown
(calling drm_atomic_helper_shutdown), so at least shouldn't be completely
wrong.


[2] https://patchwork.kernel.org/patch/10556151/


Heiko


> [1] Technically Scarlet needed a few patches from -next to work at all,
>     but Kevin is a similar platform that has been working for several
>     releases.
> 
> >  drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c index
> > f814d37b1db2..05368fa4f956 100644
> > --- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > +++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > @@ -442,6 +442,11 @@ static int rockchip_drm_platform_remove(struct
> > platform_device *pdev)> 
> >  	return 0;
> >  
> >  }
> > 
> > +static void rockchip_drm_platform_shutdown(struct platform_device *pdev)
> > +{
> > +	rockchip_drm_platform_remove(pdev);
> > +}
> > +
> > 
> >  static const struct of_device_id rockchip_drm_dt_ids[] = {
> >  
> >  	{ .compatible = "rockchip,display-subsystem", },
> >  	{ /* sentinel */ },
> > 
> > @@ -451,6 +456,7 @@ MODULE_DEVICE_TABLE(of, rockchip_drm_dt_ids);
> > 
> >  static struct platform_driver rockchip_drm_platform_driver = {
> >  
> >  	.probe = rockchip_drm_platform_probe,
> >  	.remove = rockchip_drm_platform_remove,
> > 
> > +	.shutdown = rockchip_drm_platform_shutdown,
> > 
> >  	.driver = {
> >  	
> >  		.name = "rockchip-drm",
> >  		.of_match_table = rockchip_drm_dt_ids,
Marc Zyngier Dec. 5, 2018, 2:28 p.m. UTC | #4
Hi all,

On 05/12/2018 14:11, Heiko Stübner wrote:
> Hi Brian,
> 
> Am Mittwoch, 5. Dezember 2018, 04:01:34 CET schrieb Brian Norris:
>> + others
>>
>> Hi,
>>
>> On Sun, Aug 05, 2018 at 01:48:07PM +0100, Marc Zyngier wrote:
>>> Leaving the DRM driver enabled on reboot or kexec has the annoying
>>> effect of leaving the display generating transactions whilst the
>>> IOMMU has been shut down.
>>>
>>> In turn, the IOMMU driver (which shares its interrupt line with
>>> the VOP) starts warning either on shutdown or when entering the
>>> secondary kernel in the kexec case (nothing is expected on that
>>> front).
>>>
>>> A cheap way of ensuring that things are nicely shut down is to
>>> register a shutdown callback in the platform driver.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>
>> This patch made it into 4.20-rc1 as well as -stable, and it has caused
>> regressions for me, on the Kevin and Scarlet [1] RK3399 platforms.
>>
>> On
>> shutdown/reboot, I see this:
>>
>> [   94.742559] WARNING: CPU: 4 PID: 2035 at
>> drivers/gpu/drm/drm_mode_config.c:477 drm_mode_config_cleanup+0x1c4/0x294
>> ...
>> [   94.775904] CPU: 4 PID: 2035 Comm: reboot Tainted: G        W        
>> 4.20.0-rc5+ #83 [   94.784651] Hardware name: Google Scarlet (DT)
>> [   94.789611] pstate: 20000005 (nzCv daif -PAN -UAO)
>> [   94.794959] pc : drm_mode_config_cleanup+0x1c4/0x294
>> [   94.800500] lr : drm_mode_config_cleanup+0x108/0x294
>> ...
>> [   94.898683] Call trace:
>> [   94.901410]  drm_mode_config_cleanup+0x1c4/0x294
>> [   94.906565]  rockchip_drm_unbind+0x4c/0x8c
>> [   94.911138]  component_master_del+0x88/0xb8
>> [   94.915807]  rockchip_drm_platform_remove+0x2c/0x44
>> [   94.921243]  rockchip_drm_platform_shutdown+0x20/0x2c
>> [   94.926881]  platform_drv_shutdown+0x2c/0x38
>> [   94.931647]  device_shutdown+0x164/0x1b8
>> [   94.936016]  kernel_restart_prepare+0x40/0x48
>> [   94.940878]  kernel_restart+0x20/0x68
>> [   94.944964]  __se_sys_reboot+0x1ac/0x204
>> [   94.949331]  __arm64_sys_reboot+0x2c/0x38
>> [   94.953806]  el0_svc_common+0xa4/0xec
>> [   94.957891]  el0_svc_compat_handler+0x30/0x3c
>> [   94.962753]  el0_svc_compat+0x8/0x18
>> [   94.966740] ---[ end trace b9ba2e701f4fb233 ]---
>> [   95.255169] Memory manager not clean during takedown.
>> [   95.260824] WARNING: CPU: 4 PID: 2035 at drivers/gpu/drm/drm_mm.c:950
>> drm_mm_takedown+0x34/0x44 ...
>> [   95.292314] CPU: 4 PID: 2035 Comm: reboot Tainted: G        W        
>> 4.20.0-rc5+ #83 [   95.301061] Hardware name: Google Scarlet (DT)
>> [   95.306020] pstate: 60000005 (nZCv daif -PAN -UAO)
>> [   95.311369] pc : drm_mm_takedown+0x34/0x44
>> [   95.315940] lr : drm_mm_takedown+0x34/0x44
>> ...
>> [   95.415857]  drm_mm_takedown+0x34/0x44
>> [   95.420042]  rockchip_drm_unbind+0x64/0x8c
>> [   95.424613]  component_master_del+0x88/0xb8
>> [   95.429283]  rockchip_drm_platform_remove+0x2c/0x44
>> [   95.434728]  rockchip_drm_platform_shutdown+0x20/0x2c
>> [   95.440360]  platform_drv_shutdown+0x2c/0x38
>> [   95.445127]  device_shutdown+0x164/0x1b8
>> [   95.449504]  kernel_restart_prepare+0x40/0x48
>> [   95.454358]  kernel_restart+0x20/0x68
>> [   95.458436]  __se_sys_reboot+0x1ac/0x204
>> [   95.462812]  __arm64_sys_reboot+0x2c/0x38
>> [   95.467287]  el0_svc_common+0xa4/0xec
>> [   95.471373]  el0_svc_compat_handler+0x30/0x3c
>> [   95.476235]  el0_svc_compat+0x8/0x18
>> [   95.480215] ---[ end trace b9ba2e701f4fb234 ]---
>>
>> It's especially bad on -stable kernels, where I believe the remove()
>> paths were even worse. This triggers a variety of OOPSes, and it's not
>> clear if those are simply because of backports (e.g., RK3399 did not
>> have support in 4.4.y, but our downstream has merged all sorts of
>> backports to make it work).
>>
>> Anyway, the above warnings occur on v4.20-rc, which I think is
>> justification enough for a revert.
> 
> That's strange. I remember testing quite a number of shutdown/reboot
> cycles before applying that patch. And for good measure did the same
> again right now.
> 
> - Kevin, with netboot firmware, booting into Debian+console only
> - Bob, with stock firmware, booting into Debian+KDE Plasma
> - Scarlet, with stock firmware, booting into Debian+KDE Plasma
> 
> With some random number of reboot and shutdowns on each I didn't
> see any warnings at all.

And I've been using this very patch for quite a while now.

Before suggesting a revert, I'd rather we understand what is going on,
and why is the DRM layer crapping itself that badly for a legitimate
operation (it is certainly better to have a shutdown than to let the VOP
scan out crap once the IOMMU has been shut down). In short, don't shoot
the messenger.

> 
>> I plan to submit a revert which I hope can go to 4.20 as well as
>> -stable. I'd hope the remove()/shutdown() paths should be fixed before
>> this gets applied again, and that it does not get shipped to -stable
>> kernels.
> 
> But judging by the fact that the warning indicates that somthing is still
> holding onto a framebuffer and a rmmod rockchipdrm is not possible
> at runrtime for likely the same reason, I guess we really might be creating
> a problem with that shutdown.

That's a potential root cause.

> 
> Can you maybe give "drm/rockchip: shutdown drm subsystem on shutdown" [2]
> a try? When the underlying issue of rebooting surfaced we had 2 competing 
> solutions, so we at least don't reopen the issue, that people have problems
> rebooting?

kexec working is certainly something I need. And I'd like to understand
why Brian sees this and nobody else.

Thanks,

	M.
Brian Norris Dec. 5, 2018, 5:42 p.m. UTC | #5
Hi,

On Wed, Dec 05, 2018 at 02:28:48PM +0000, Marc Zyngier wrote:
> On 05/12/2018 14:11, Heiko Stübner wrote:
> > Am Mittwoch, 5. Dezember 2018, 04:01:34 CET schrieb Brian Norris:
> >> On Sun, Aug 05, 2018 at 01:48:07PM +0100, Marc Zyngier wrote:
> >>> Leaving the DRM driver enabled on reboot or kexec has the annoying
> >>> effect of leaving the display generating transactions whilst the
> >>> IOMMU has been shut down.
> >>>
> >>> In turn, the IOMMU driver (which shares its interrupt line with
> >>> the VOP) starts warning either on shutdown or when entering the
> >>> secondary kernel in the kexec case (nothing is expected on that
> >>> front).
> >>>
> >>> A cheap way of ensuring that things are nicely shut down is to
> >>> register a shutdown callback in the platform driver.
> >>>
> >>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >>> ---
> >>
> >> This patch made it into 4.20-rc1 as well as -stable, and it has caused
> >> regressions for me, on the Kevin and Scarlet [1] RK3399 platforms.
> >>
> >> On
> >> shutdown/reboot, I see this:
> >>
> >> [   94.742559] WARNING: CPU: 4 PID: 2035 at
> >> drivers/gpu/drm/drm_mode_config.c:477 drm_mode_config_cleanup+0x1c4/0x294
> >> ...
...
> >> Anyway, the above warnings occur on v4.20-rc, which I think is
> >> justification enough for a revert.
> > 
> > That's strange. I remember testing quite a number of shutdown/reboot
> > cycles before applying that patch. And for good measure did the same
> > again right now.
> > 
> > - Kevin, with netboot firmware, booting into Debian+console only
> > - Bob, with stock firmware, booting into Debian+KDE Plasma
> > - Scarlet, with stock firmware, booting into Debian+KDE Plasma
> > 
> > With some random number of reboot and shutdowns on each I didn't
> > see any warnings at all.
> 
> And I've been using this very patch for quite a while now.
> 
> Before suggesting a revert, I'd rather we understand what is going on,
> and why is the DRM layer crapping itself that badly for a legitimate
> operation (it is certainly better to have a shutdown than to let the VOP
> scan out crap once the IOMMU has been shut down). In short, don't shoot
> the messenger.

I honestly don't know much at all about DRM. But I do see this problem
on 4.19.y also (and probably 4.14.y), now that this patch was included
there.

I'm fine with trying to "fix forward" in mainline, but unfortunately,
it's usually quite difficult to get Greg to drop things from -stable,
especially when the regression is already pushed to a release. That's
why I'd propose a revert first, which can be sent back to -stable while
things are figured out.

I'm also willing to test any updates, if you have better suggestions.

> >> I plan to submit a revert which I hope can go to 4.20 as well as
> >> -stable. I'd hope the remove()/shutdown() paths should be fixed before
> >> this gets applied again, and that it does not get shipped to -stable
> >> kernels.
> > 
> > But judging by the fact that the warning indicates that somthing is still
> > holding onto a framebuffer and a rmmod rockchipdrm is not possible
> > at runrtime for likely the same reason, I guess we really might be creating
> > a problem with that shutdown.
> 
> That's a potential root cause.
> 
> > 
> > Can you maybe give "drm/rockchip: shutdown drm subsystem on shutdown" [2]
> > a try? When the underlying issue of rebooting surfaced we had 2 competing 
> > solutions, so we at least don't reopen the issue, that people have problems
> > rebooting?

I'll try to give that a spin.

> kexec working is certainly something I need. And I'd like to understand
> why Brian sees this and nobody else.

For one, I'm actually running Chrome OS. My tests currently don't have
the full Chrome UI working, since Chrome OS has some basic graphics API
requirements and there's no Mali GPU driver upstream (so I get relegated
to our splash screen and console manager, frecon, instead). But some
people have a software-rendered llvmpipe backend working, and they
likely would see the same problem.

Maybe common Linux distros treat "no GPU" too simplistically and don't
really exercise the DRM framework much. I dunno.

Brian
Brian Norris Dec. 5, 2018, 6 p.m. UTC | #6
On Wed, Dec 05, 2018 at 03:11:03PM +0100, Heiko Stuebner wrote:
> Can you maybe give "drm/rockchip: shutdown drm subsystem on shutdown" [2]
> a try? When the underlying issue of rebooting surfaced we had 2 competing 
> solutions, so we at least don't reopen the issue, that people have problems
> rebooting?
> 
> drm-drivers seem to be short on shutdown handlers to peek at but this
> second variant mimics what tinydrm does in its shutdown
> (calling drm_atomic_helper_shutdown), so at least shouldn't be completely
> wrong.
> 
> 
> [2] https://patchwork.kernel.org/patch/10556151/

I tried the linked patch on Kevin and Scarlet with 4.20-rc5-ish, with
$subject reverted; I saw no warnings/complaints, and things worked OK.
Same with Kevin on 4.19.6.

I can't test on 4.4.y, since drm_atomic_helper_shutdown() doesn't exist
there.

I can provide a Tested-by to the original patch if you'd like. (I won't
provide it here, lest patchwork apply it to the $subject patch.)

Brian
diff mbox series

Patch

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
index f814d37b1db2..05368fa4f956 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
@@ -442,6 +442,11 @@  static int rockchip_drm_platform_remove(struct platform_device *pdev)
 	return 0;
 }
 
+static void rockchip_drm_platform_shutdown(struct platform_device *pdev)
+{
+	rockchip_drm_platform_remove(pdev);
+}
+
 static const struct of_device_id rockchip_drm_dt_ids[] = {
 	{ .compatible = "rockchip,display-subsystem", },
 	{ /* sentinel */ },
@@ -451,6 +456,7 @@  MODULE_DEVICE_TABLE(of, rockchip_drm_dt_ids);
 static struct platform_driver rockchip_drm_platform_driver = {
 	.probe = rockchip_drm_platform_probe,
 	.remove = rockchip_drm_platform_remove,
+	.shutdown = rockchip_drm_platform_shutdown,
 	.driver = {
 		.name = "rockchip-drm",
 		.of_match_table = rockchip_drm_dt_ids,