[9/9] drm: Turn off crtc before tearing down its data structure

Message ID e83cf2e628a8e0299e029e7a1c3d4f183ce1a2af.1464103767.git.lukas@wunner.de
State New

Commit Message

Lukas Wunner May 24, 2016, 4:03 p.m. UTC
When a drm_crtc structure is destroyed with drm_crtc_cleanup(), the DRM
core does not turn off the crtc first and neither do the drivers. With
nouveau, radeon and amdgpu, this causes a runtime pm ref to be leaked on
driver unload if at least one crtc was enabled.

(See usage of have_disp_power_ref in nouveau_crtc_set_config(),
radeon_crtc_set_config() and amdgpu_crtc_set_config()).

Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
Cc: Dave Airlie <airlied@redhat.com>
Tested-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/gpu/drm/drm_crtc.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
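
The leak described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the toy_* names are hypothetical and are not DRM types or functions, and the bookkeeping is a simplification of what the have_disp_power_ref handling in the drivers' set_config hooks appears to do (one shared ref while any crtc is enabled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CRTCS 4

/* Toy stand-ins for the kernel objects; none of these are real DRM types. */
struct toy_device {
	int  rpm_usage;            /* models the runtime pm usage count */
	bool have_disp_power_ref;  /* one ref shared by all enabled crtcs */
	bool crtc_enabled[MAX_CRTCS];
};

static bool any_crtc_enabled(const struct toy_device *dev)
{
	for (size_t i = 0; i < MAX_CRTCS; i++)
		if (dev->crtc_enabled[i])
			return true;
	return false;
}

/* Models the ref bookkeeping done at the end of a set_config hook. */
void toy_set_config(struct toy_device *dev, size_t crtc, bool enable)
{
	assert(crtc < MAX_CRTCS);
	dev->crtc_enabled[crtc] = enable;

	if (any_crtc_enabled(dev) && !dev->have_disp_power_ref) {
		dev->have_disp_power_ref = true;
		dev->rpm_usage++;          /* take the single display ref */
	} else if (!any_crtc_enabled(dev) && dev->have_disp_power_ref) {
		dev->have_disp_power_ref = false;
		dev->rpm_usage--;          /* drop it once everything is off */
	}
}

/* Teardown without the patch: the crtc goes away, set_config never runs. */
void toy_crtc_cleanup_unpatched(struct toy_device *dev, size_t crtc)
{
	assert(crtc < MAX_CRTCS);
	dev->crtc_enabled[crtc] = false;
}

/* Teardown as proposed in the patch: disable through set_config first. */
void toy_crtc_cleanup_patched(struct toy_device *dev, size_t crtc)
{
	assert(crtc < MAX_CRTCS);
	if (dev->crtc_enabled[crtc])
		toy_set_config(dev, crtc, false);
}
```

In this model, enabling one crtc and then calling toy_crtc_cleanup_unpatched() leaves rpm_usage at 1, which corresponds to the leaked ref; the patched variant brings it back to 0 once the last crtc is cleaned up.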

Comments

Daniel Vetter May 24, 2016, 9:30 p.m. UTC | #1
On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> When a drm_crtc structure is destroyed with drm_crtc_cleanup(), the DRM
> core does not turn off the crtc first and neither do the drivers. With
> nouveau, radeon and amdgpu, this causes a runtime pm ref to be leaked on
> driver unload if at least one crtc was enabled.
> 
> (See usage of have_disp_power_ref in nouveau_crtc_set_config(),
> radeon_crtc_set_config() and amdgpu_crtc_set_config()).
> 
> Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> Cc: Dave Airlie <airlied@redhat.com>
> Tested-by: Karol Herbst <nouveau@karolherbst.de>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

This is a core regression, and we have fixed it again. Previously, when
unreferencing drm_planes, the core made sure that they were no longer in
use, which had the side effect of shutting everything off on module unload.

For a bunch of reasons we've stopped doing that, but that turned out to be
a mistake. It's fixed since

commit f2d580b9a8149735cbc4b59c4a8df60173658140
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date:   Wed May 4 14:38:26 2016 +0200

    drm/core: Do not preserve framebuffer on rmfb, v4.

Your patch shouldn't be needed with that any more. If it still is, it's
most likely the fbdev cleanup being done too late, but you /should/ get a
big WARNING splat in that case from drm_mode_config_cleanup().
-Daniel

> ---
>  drivers/gpu/drm/drm_crtc.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index d2a6d95..0cd6f00 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -716,12 +716,23 @@ EXPORT_SYMBOL(drm_crtc_init_with_planes);
>   *
>   * This function cleans up @crtc and removes it from the DRM mode setting
>   * core. Note that the function does *not* free the crtc structure itself,
> - * this is the responsibility of the caller.
> + * this is the responsibility of the caller. If @crtc is currently enabled,
> + * it is turned off first.
>   */
>  void drm_crtc_cleanup(struct drm_crtc *crtc)
>  {
>  	struct drm_device *dev = crtc->dev;
>  
> +	if (crtc->enabled) {
> +		struct drm_mode_set modeset = {
> +			.crtc = crtc,
> +		};
> +
> +		drm_modeset_lock_all(dev);
> +		drm_mode_set_config_internal(&modeset);
> +		drm_modeset_unlock_all(dev);
> +	}
> +
>  	kfree(crtc->gamma_store);
>  	crtc->gamma_store = NULL;
>  
> -- 
> 2.8.1
> 
> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
Lukas Wunner May 24, 2016, 10:07 p.m. UTC | #2
Good evening Daniel,

On Tue, May 24, 2016 at 11:30:42PM +0200, Daniel Vetter wrote:
> On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> > When a drm_crtc structure is destroyed with drm_crtc_cleanup(), the DRM
> > core does not turn off the crtc first and neither do the drivers. With
> > nouveau, radeon and amdgpu, this causes a runtime pm ref to be leaked on
> > driver unload if at least one crtc was enabled.
> > 
> > (See usage of have_disp_power_ref in nouveau_crtc_set_config(),
> > radeon_crtc_set_config() and amdgpu_crtc_set_config()).
> > 
> > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> > Cc: Dave Airlie <airlied@redhat.com>
> > Tested-by: Karol Herbst <nouveau@karolherbst.de>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> 
> This is a core regression, we fixed it again. Previously when unreference
> drm_planes the core made sure that it's not longer in use, which had the
> side effect of shutting everything off in module unload.
> 
> For a bunch of reasons we've stopped doing that, but that turned out to be
> a mistake. It's fixed since
> 
> commit f2d580b9a8149735cbc4b59c4a8df60173658140
> Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Date:   Wed May 4 14:38:26 2016 +0200
> 
>     drm/core: Do not preserve framebuffer on rmfb, v4.

Okay, I will test it.

May I ask you a question while you have this topic swapped into your
brain: nouveau, radeon and amdgpu currently hold one runtime pm ref
if any crtc is turned on. I'm pondering how to make this work for
muxed dual GPU laptops. When switching GPUs, the now inactive GPU
should turn off the crtc used by the panel to save power (if it's
*only* used by the panel) and release that runtime pm ref. Likewise,
the now active GPU needs to turn on the crtc used by the panel and
take a runtime pm ref.

The whole thing becomes a bit complicated because MacBook Pros with
Thunderbolt can only drive external displays with their discrete GPU.
So when switching to the integrated GPU, the discrete GPU may turn
off the crtc for the panel but the crtc for the external display needs
to stay alive if it's in use and the GPU may not suspend.

What I have in mind is to change the scheme nouveau/radeon/amdgpu are
currently using by taking a runtime pm ref when enabling a crtc and
releasing it when disabling the crtc. This is different from the status
quo where only a *single* runtime pm ref is taken if *any* crtc is enabled.

Upon switching, the runtime pm ref for the crtc previously used by the
panel is released and the crtc should be turned off. If it was the only
active crtc the GPU automatically goes to sleep.
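
The per-crtc scheme described above can be sketched as a user-space model. The toy_* names are hypothetical and only illustrate the counting; a real implementation would use the runtime pm API and would have to deal with the ordering issues of the modeset helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_CRTCS 2

/* Toy model of one GPU; not a real DRM or runtime pm structure. */
struct toy_gpu {
	int  rpm_usage;          /* models the runtime pm usage count */
	bool crtc_on[N_CRTCS];
};

/* Take one ref for each crtc that goes from off to on. */
void toy_crtc_enable(struct toy_gpu *gpu, size_t crtc)
{
	assert(crtc < N_CRTCS);
	if (!gpu->crtc_on[crtc]) {
		gpu->crtc_on[crtc] = true;
		gpu->rpm_usage++;
	}
}

/* Release that ref when the crtc goes from on to off. */
void toy_crtc_disable(struct toy_gpu *gpu, size_t crtc)
{
	assert(crtc < N_CRTCS);
	if (gpu->crtc_on[crtc]) {
		gpu->crtc_on[crtc] = false;
		gpu->rpm_usage--;
	}
}

/* The GPU may only go to sleep once no crtc holds a ref. */
bool toy_gpu_may_suspend(const struct toy_gpu *gpu)
{
	return gpu->rpm_usage == 0;
}
```

With crtc 0 standing for the panel and crtc 1 for an external display on the discrete GPU, disabling crtc 0 on a switch leaves the count at 1, so the GPU stays awake until the external display is turned off as well.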

I'm thinking about putting the pm_runtime_get() and pm_runtime_put()
in the DRM core. (Actually I already have a commit to do just that in
my local repo.) That way we're using less code in nouveau/radeon/amdgpu
because we're doing the runtime pm handling centrally in the core.
I'm wondering if that would impact other DRM drivers negatively.

Basically the idea is to harmonize runtime pm handling among DRM drivers.
What do you think about that?

Thanks,

Lukas

Daniel Vetter May 24, 2016, 10:30 p.m. UTC | #3
On Wed, May 25, 2016 at 12:07 AM, Lukas Wunner <lukas@wunner.de> wrote:
> Good evening Daniel,
>
> On Tue, May 24, 2016 at 11:30:42PM +0200, Daniel Vetter wrote:
>> On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
>> > When a drm_crtc structure is destroyed with drm_crtc_cleanup(), the DRM
>> > core does not turn off the crtc first and neither do the drivers. With
>> > nouveau, radeon and amdgpu, this causes a runtime pm ref to be leaked on
>> > driver unload if at least one crtc was enabled.
>> >
>> > (See usage of have_disp_power_ref in nouveau_crtc_set_config(),
>> > radeon_crtc_set_config() and amdgpu_crtc_set_config()).
>> >
>> > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
>> > Cc: Dave Airlie <airlied@redhat.com>
>> > Tested-by: Karol Herbst <nouveau@karolherbst.de>
>> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
>>
>> This is a core regression, we fixed it again. Previously when unreference
>> drm_planes the core made sure that it's not longer in use, which had the
>> side effect of shutting everything off in module unload.
>>
>> For a bunch of reasons we've stopped doing that, but that turned out to be
>> a mistake. It's fixed since
>>
>> commit f2d580b9a8149735cbc4b59c4a8df60173658140
>> Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Date:   Wed May 4 14:38:26 2016 +0200
>>
>>     drm/core: Do not preserve framebuffer on rmfb, v4.
>
> Okay, I will test it.
>
> May I ask you a question while you have this topic swapped into your
> brain: nouveau, radeon and amdgpu currently hold one runtime pm ref
> if any crtc is turned on. I'm pondering how to make this work for
> muxed dual GPU laptops. When switching GPUs, the now inactive GPU
> should turn off the crtc used by the panel to save power (if it's
> *only* used by the panel) and release that runtime pm ref. Likewise,
> the now active GPU needs to turn on the crtc used by the panel and
> take a runtime pm ref.
>
> The whole thing becomes a bit complicated because MacBook Pros with
> Thunderbolt can only drive external displays with their discrete GPU.
> So when switching to the integrated GPU, the discrete GPU may turn
> off the crtc for the panel but the crtc for the external display needs
> to stay alive if it's in use and the GPU may not suspend.
>
> What I have in mind is to change the scheme nouveau/radeon/amdgpu are
> currently using by taking a runtime pm ref when enabling a crtc and
> releasing it when disabling the crtc. This is different from the status
> quo where only a *single* runtime pm ref is taken if *any* crtc is enabled.
>
> Upon switching, the runtime pm ref for the crtc previously used by the
> panel is released and the crtc should be turned off. If it was the only
> active crtc the GPU automatically goes to sleep.
>
> I'm thinking about putting the pm_runtime_get() and pm_runtime_put()
> in the DRM core. (Actually I already have a commit to do just that in
> my local repo.) That way we're using less code in nouveau/radeon/amdgpu
> because we're doing the runtime pm handling centrally in the core.
> I'm wondering if that would impact other DRM drivers negatively.
>
> Basically the idea is to harmonize runtime pm handling among DRM drivers.
> What do you think about that?

Great idea and should work well with atomic helpers. Kerneldoc even
explains the suggested way to do it, but doesn't go into all details
since e.g. on arm-soc you might have a platform device for each crtc
and each encoder. Lots of the arm drivers do full-blown runtime pm
with atomic, so there's plenty of examples.

It will be a fireworks show with legacy drivers (and hence amdgpu & nouveau),
unfortunately, because with legacy crtc helpers the ordering of crtc
enable/disable isn't as well-defined, and you might end up accessing
hw without an rpm reference. It may be possible with a lot of swearing and
some hacks, but "make rpm easy" was a big reason for an entirely
revamped helper design for atomic (among a few other reasons ofc).

Putting rpm get/put calls into the drm core otoh is a complete no-go,
that's a perfect example of the midlayer mistake. The long-term goal
is to entirely decouple the drm core from underlying devices. All the
new arm drivers don't even use the drm_platform.c stuff, it's just
that there's a big pile of existing drivers which will be somewhat
painful to convert. And for pci it's probably impossible due to old
crap like dri1 and agp :(

Cheers, Daniel


Lukas Wunner May 25, 2016, 10:51 a.m. UTC | #4
Hi Daniel,

On Tue, May 24, 2016 at 11:30:42PM +0200, Daniel Vetter wrote:
> On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> > When a drm_crtc structure is destroyed with drm_crtc_cleanup(), the DRM
> > core does not turn off the crtc first and neither do the drivers. With
> > nouveau, radeon and amdgpu, this causes a runtime pm ref to be leaked on
> > driver unload if at least one crtc was enabled.
> > 
> > (See usage of have_disp_power_ref in nouveau_crtc_set_config(),
> > radeon_crtc_set_config() and amdgpu_crtc_set_config()).
> > 
> > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> > Cc: Dave Airlie <airlied@redhat.com>
> > Tested-by: Karol Herbst <nouveau@karolherbst.de>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> 
> This is a core regression, we fixed it again. Previously when unreference
> drm_planes the core made sure that it's not longer in use, which had the
> side effect of shutting everything off in module unload.
> 
> For a bunch of reasons we've stopped doing that, but that turned out to be
> a mistake. It's fixed since
> 
> commit f2d580b9a8149735cbc4b59c4a8df60173658140
> Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Date:   Wed May 4 14:38:26 2016 +0200
> 
>     drm/core: Do not preserve framebuffer on rmfb, v4.
> 
> Your patch shouldn't be needed with that any more. If it still is it's
> most likely the fbdev cleanup done too late, but you /should/ get a big
> WARNING splat in that case from drm_mode_config_cleanup().

I tested it and at least with nouveau, the above-mentioned commit does *not*
solve the issue, so patch [9/9] of this series is still needed. I do not get
a WARN splat when unloading nouveau.

Best regards,

Lukas

Daniel Vetter May 25, 2016, 1:43 p.m. UTC | #5
On Wed, May 25, 2016 at 12:51 PM, Lukas Wunner <lukas@wunner.de> wrote:
>
> On Tue, May 24, 2016 at 11:30:42PM +0200, Daniel Vetter wrote:
>> On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
>> > When a drm_crtc structure is destroyed with drm_crtc_cleanup(), the DRM
>> > core does not turn off the crtc first and neither do the drivers. With
>> > nouveau, radeon and amdgpu, this causes a runtime pm ref to be leaked on
>> > driver unload if at least one crtc was enabled.
>> >
>> > (See usage of have_disp_power_ref in nouveau_crtc_set_config(),
>> > radeon_crtc_set_config() and amdgpu_crtc_set_config()).
>> >
>> > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
>> > Cc: Dave Airlie <airlied@redhat.com>
>> > Tested-by: Karol Herbst <nouveau@karolherbst.de>
>> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
>>
>> This is a core regression, we fixed it again. Previously when unreference
>> drm_planes the core made sure that it's not longer in use, which had the
>> side effect of shutting everything off in module unload.
>>
>> For a bunch of reasons we've stopped doing that, but that turned out to be
>> a mistake. It's fixed since
>>
>> commit f2d580b9a8149735cbc4b59c4a8df60173658140
>> Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Date:   Wed May 4 14:38:26 2016 +0200
>>
>>     drm/core: Do not preserve framebuffer on rmfb, v4.
>>
>> Your patch shouldn't be needed with that any more. If it still is it's
>> most likely the fbdev cleanup done too late, but you /should/ get a big
>> WARNING splat in that case from drm_mode_config_cleanup().
>
> I tested it and at least with nouveau, the above-mentioned commit does *not*
> solve the issue, so patch [9/9] of this series is still needed. I do not get
> a WARN splat when unloading nouveau.

With legacy kms the only way to keep a crtc enabled is to display a
drm_framebuffer on it. And drm_mode_config_cleanup has a WARN_ON if
framebuffers are left behind. There's a bunch of options:
- nouveau somehow manages to keep the crtc on without a framebuffer
- nouveau somehow leaks a drm_framebuffer, but removes it from the fb_list
- something else

There's still no need to forcefully shut down crtc at cleanup time in
the core, this is still a driver bug. So yes your patch might be
needed, but it's not the right fix.
-Daniel
Lukas Wunner June 1, 2016, 12:36 p.m. UTC | #6
On Wed, May 25, 2016 at 03:43:42PM +0200, Daniel Vetter wrote:
> On Wed, May 25, 2016 at 12:51 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > On Tue, May 24, 2016 at 11:30:42PM +0200, Daniel Vetter wrote:
> >> On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> >> > When a drm_crtc structure is destroyed with drm_crtc_cleanup(), the DRM
> >> > core does not turn off the crtc first and neither do the drivers. With
> >> > nouveau, radeon and amdgpu, this causes a runtime pm ref to be leaked on
> >> > driver unload if at least one crtc was enabled.
> >> >
> >> > (See usage of have_disp_power_ref in nouveau_crtc_set_config(),
> >> > radeon_crtc_set_config() and amdgpu_crtc_set_config()).
> >> >
> >> > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> >> > Cc: Dave Airlie <airlied@redhat.com>
> >> > Tested-by: Karol Herbst <nouveau@karolherbst.de>
> >> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> >>
> >> This is a core regression, we fixed it again. Previously when unreference
> >> drm_planes the core made sure that it's not longer in use, which had the
> >> side effect of shutting everything off in module unload.
> >>
> >> For a bunch of reasons we've stopped doing that, but that turned out to be
> >> a mistake. It's fixed since
> >>
> >> commit f2d580b9a8149735cbc4b59c4a8df60173658140
> >> Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >> Date:   Wed May 4 14:38:26 2016 +0200
> >>
> >>     drm/core: Do not preserve framebuffer on rmfb, v4.
> >>
> >> Your patch shouldn't be needed with that any more. If it still is it's
> >> most likely the fbdev cleanup done too late, but you /should/ get a big
> >> WARNING splat in that case from drm_mode_config_cleanup().
> >
> > I tested it and at least with nouveau, the above-mentioned commit does
> > *not* solve the issue, so patch [9/9] of this series is still needed.
> > I do not get a WARN splat when unloading nouveau.
> 
> With legacy kms the only way to keep a crtc enabled is to display a
> drm_framebuffer on it. And drm_mode_config_cleanup has a WARN_ON if
> framebuffers are left behind. There's a bunch of options:
> - nouveau somehow manages to keep the crtc on without a framebuffer
> - nouveau somehow leaks a drm_framebuffer, but removes it from the fb_list
> - something else

Found it. nouveau_fbcon_destroy() doesn't call drm_framebuffer_remove().
If I add that, the crtc gets properly disabled on unload.

It does call drm_framebuffer_cleanup(). That's why there was no WARN,
drm_mode_config_cleanup() only WARNs if a framebuffer was left on the
mode_config.fb_list.
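
Why the WARN stays silent can be sketched with a toy model of the two paths. The toy_* names are hypothetical, and the model only mirrors the behavior as described in this thread: the cleanup path takes the framebuffer off the list, while the remove path also turns off the crtc scanning out from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are real DRM structures. */
struct toy_fb {
	bool on_fb_list;
};

struct toy_crtc {
	bool enabled;
	struct toy_fb *fb;   /* framebuffer being scanned out, or NULL */
};

struct toy_mode_config {
	int num_fb;          /* number of framebuffers on the list */
	struct toy_crtc crtc;
};

void toy_fb_init(struct toy_mode_config *cfg, struct toy_fb *fb)
{
	assert(cfg != NULL && fb != NULL);
	fb->on_fb_list = true;
	cfg->num_fb++;
}

void toy_crtc_show(struct toy_mode_config *cfg, struct toy_fb *fb)
{
	cfg->crtc.enabled = true;
	cfg->crtc.fb = fb;
}

/* Cleanup path: only takes the framebuffer off the list. */
void toy_fb_cleanup(struct toy_mode_config *cfg, struct toy_fb *fb)
{
	if (fb->on_fb_list) {
		fb->on_fb_list = false;
		cfg->num_fb--;
	}
}

/* Remove path: turns off the crtc using the framebuffer, then unlists it. */
void toy_fb_remove(struct toy_mode_config *cfg, struct toy_fb *fb)
{
	if (cfg->crtc.fb == fb) {
		cfg->crtc.enabled = false;
		cfg->crtc.fb = NULL;
	}
	toy_fb_cleanup(cfg, fb);
}

/* Existing check: warns only about framebuffers left on the list. */
bool toy_config_cleanup_warns(const struct toy_mode_config *cfg)
{
	return cfg->num_fb != 0;
}

/* A check on the crtc state itself, in the spirit of WARN_ON(crtc->enabled). */
bool toy_crtc_cleanup_warns(const struct toy_mode_config *cfg)
{
	return cfg->crtc.enabled;
}
```

After toy_fb_cleanup() the list-based check reports nothing although the crtc is still enabled, whereas a check on the crtc state would catch it.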

radeon and amdgpu have the same problem. In fact there are very few
drivers that call drm_framebuffer_remove(): tegra, msm, exynos, omapdrm
and i915 (since Imre Deak's 9d6612516da0).

Should we add a WARN to prevent this? How about WARN_ON(crtc->enabled)
in drm_crtc_cleanup()?

Also, i915 calls drm_framebuffer_unregister_private() before it calls
drm_framebuffer_remove(). This ordering has the unfortunate side effect
that the drm_framebuffer has ID 0 in log messages emitted by
drm_framebuffer_remove():

[   39.680874] [drm:drm_mode_object_unreference] OBJ ID: 0 (3)
[   39.680878] [drm:drm_mode_object_unreference] OBJ ID: 0 (2)
[   39.680884] [drm:drm_mode_object_unreference] OBJ ID: 0 (1)

Best regards,

Lukas

Daniel Vetter June 1, 2016, 2:40 p.m. UTC | #7
On Wed, Jun 01, 2016 at 02:36:41PM +0200, Lukas Wunner wrote:
> On Wed, May 25, 2016 at 03:43:42PM +0200, Daniel Vetter wrote:
> > On Wed, May 25, 2016 at 12:51 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > On Tue, May 24, 2016 at 11:30:42PM +0200, Daniel Vetter wrote:
> > >> On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> > >> > When a drm_crtc structure is destroyed with drm_crtc_cleanup(), the DRM
> > >> > core does not turn off the crtc first and neither do the drivers. With
> > >> > nouveau, radeon and amdgpu, this causes a runtime pm ref to be leaked on
> > >> > driver unload if at least one crtc was enabled.
> > >> >
> > >> > (See usage of have_disp_power_ref in nouveau_crtc_set_config(),
> > >> > radeon_crtc_set_config() and amdgpu_crtc_set_config()).
> > >> >
> > >> > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> > >> > Cc: Dave Airlie <airlied@redhat.com>
> > >> > Tested-by: Karol Herbst <nouveau@karolherbst.de>
> > >> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > >>
> > >> This is a core regression, we fixed it again. Previously when unreference
> > >> drm_planes the core made sure that it's not longer in use, which had the
> > >> side effect of shutting everything off in module unload.
> > >>
> > >> For a bunch of reasons we've stopped doing that, but that turned out to be
> > >> a mistake. It's fixed since
> > >>
> > >> commit f2d580b9a8149735cbc4b59c4a8df60173658140
> > >> Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > >> Date:   Wed May 4 14:38:26 2016 +0200
> > >>
> > >>     drm/core: Do not preserve framebuffer on rmfb, v4.
> > >>
> > >> Your patch shouldn't be needed with that any more. If it still is it's
> > >> most likely the fbdev cleanup done too late, but you /should/ get a big
> > >> WARNING splat in that case from drm_mode_config_cleanup().
> > >
> > > I tested it and at least with nouveau, the above-mentioned commit does
> > > *not* solve the issue, so patch [9/9] of this series is still needed.
> > > I do not get a WARN splat when unloading nouveau.
> > 
> > With legacy kms the only way to keep a crtc enabled is to display a
> > drm_framebuffer on it. And drm_mode_config_cleanup has a WARN_ON if
> > framebuffers are left behind. There's a bunch of options:
> > - nouveau somehow manages to keep the crtc on without a framebuffer
> > - nouveau somehow leaks a drm_framebuffer, but removes it from the fb_list
> > - something else
> 
> Found it. nouveau_fbcon_destroy() doesn't call drm_framebuffer_remove().
> If I add that, the crtc gets properly disabled on unload.
> 
> It does call drm_framebuffer_cleanup(). That's why there was no WARN,
> drm_mode_config_cleanup() only WARNs if a framebuffer was left on the
> mode_config.fb_list.
> 
> radeon and amdgpu have the same problem. In fact there are very few
> drivers that call drm_framebuffer_remove(): tegra, msm, exynos, omapdrm
> and i915 (since Imre Deak's 9d6612516da0).
> 
> Should we add a WARN to prevent this? How about WARN_ON(crtc->enabled)
> in drm_crtc_cleanup()?
> 
> Also, i915 calls drm_framebuffer_unregister_private() before it calls
> drm_framebuffer_remove(). This ordering has the unfortunate side effect
> that the drm_framebuffer has ID 0 in log messages emitted by
> drm_framebuffer_remove():
> 
> [   39.680874] [drm:drm_mode_object_unreference] OBJ ID: 0 (3)
> [   39.680878] [drm:drm_mode_object_unreference] OBJ ID: 0 (2)
> [   39.680884] [drm:drm_mode_object_unreference] OBJ ID: 0 (1)

Well we must first unregister it before we can remove it, so this is
unavoidable.

Wrt switching from _cleanup to _remove, iirc there were troubles with the
latter calling into the fb->funcs->destroy hook. Many drivers have
their fbdev fb embedded into some struct (instead of a pointer like i915),
and then things go sideways badly. That's why you can't just blindly
replace them.
-Daniel
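
The embedded-versus-pointer problem mentioned above can be sketched in plain C. Everything here is a hypothetical model (the toy_* names are not DRM API): it shows a destroy hook that frees a separately allocated framebuffer, and why a hook for an embedded framebuffer must not free it, because the pointer is not the start of an allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_fb {
	void (*destroy)(struct toy_fb *fb);
};

/* Pointer layout: the fbdev struct refers to a separately allocated fb. */
struct toy_fbdev_ptr {
	struct toy_fb *fb;
};

/* Embedded layout: the fb lives inside the fbdev struct. */
struct toy_fbdev_embedded {
	int other_state;
	int fb_torn_down;
	struct toy_fb fb;
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

int toy_frees;   /* counts how often a destroy hook freed memory */

/* Fine for the pointer layout: fb is the start of its own allocation. */
void toy_fb_destroy_standalone(struct toy_fb *fb)
{
	toy_frees++;
	free(fb);
}

/*
 * For the embedded layout, free(fb) would pass an interior pointer to
 * free(), which is undefined behavior. The hook may only tear down fb
 * state and must leave the memory to whoever owns the container.
 */
void toy_fb_destroy_embedded(struct toy_fb *fb)
{
	struct toy_fbdev_embedded *fbdev =
		toy_container_of(fb, struct toy_fbdev_embedded, fb);

	fbdev->fb_torn_down = 1;
}

/* Models a remove path that ends by calling the destroy hook. */
void toy_fb_remove(struct toy_fb *fb)
{
	assert(fb != NULL && fb->destroy != NULL);
	fb->destroy(fb);
}
```

A driver whose destroy hook assumes the standalone layout cannot simply be switched to a remove-style call if its fbdev framebuffer is embedded; the hook or the layout has to change first.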
Lukas Wunner June 3, 2016, 7:30 a.m. UTC | #8
On Wed, Jun 01, 2016 at 04:40:12PM +0200, Daniel Vetter wrote:
> On Wed, Jun 01, 2016 at 02:36:41PM +0200, Lukas Wunner wrote:
> > On Wed, May 25, 2016 at 03:43:42PM +0200, Daniel Vetter wrote:
> > > On Wed, May 25, 2016 at 12:51 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > On Tue, May 24, 2016 at 11:30:42PM +0200, Daniel Vetter wrote:
> > > > > On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> > > > > > When a drm_crtc structure is destroyed with drm_crtc_cleanup(), the DRM
> > > > > > core does not turn off the crtc first and neither do the drivers. With
> > > > > > nouveau, radeon and amdgpu, this causes a runtime pm ref to be leaked on
> > > > > > driver unload if at least one crtc was enabled.
> > > > > >
> > > > > > (See usage of have_disp_power_ref in nouveau_crtc_set_config(),
> > > > > > radeon_crtc_set_config() and amdgpu_crtc_set_config()).
> > > > > >
> > > > > > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> > > > > > Cc: Dave Airlie <airlied@redhat.com>
> > > > > > Tested-by: Karol Herbst <nouveau@karolherbst.de>
> > > > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > 
> > > With legacy kms the only way to keep a crtc enabled is to display a
> > > drm_framebuffer on it. And drm_mode_config_cleanup has a WARN_ON if
> > > framebuffers are left behind. There's a bunch of options:
> > > - nouveau somehow manages to keep the crtc on without a framebuffer
> > > - nouveau somehow leaks a drm_framebuffer, but removes it from the fb_list
> > > - something else
> > 
> > Found it. nouveau_fbcon_destroy() doesn't call drm_framebuffer_remove().
> > If I add that, the crtc gets properly disabled on unload.
> > 
> > It does call drm_framebuffer_cleanup(). That's why there was no WARN,
> > drm_mode_config_cleanup() only WARNs if a framebuffer was left on the
> > mode_config.fb_list.
> > 
> > radeon and amdgpu have the same problem. In fact there are very few
> > drivers that call drm_framebuffer_remove(): tegra, msm, exynos, omapdrm
> > and i915 (since Imre Deak's 9d6612516da0).
> > 
> > Should we add a WARN to prevent this? How about WARN_ON(crtc->enabled)
> > in drm_crtc_cleanup()?
> > 
> > Also, i915 calls drm_framebuffer_unregister_private() before it calls
> > drm_framebuffer_remove(). This ordering has the unfortunate side effect
> > that the drm_framebuffer has ID 0 in log messages emitted by
> > drm_framebuffer_remove():
> > 
> > [   39.680874] [drm:drm_mode_object_unreference] OBJ ID: 0 (3)
> > [   39.680878] [drm:drm_mode_object_unreference] OBJ ID: 0 (2)
> > [   39.680884] [drm:drm_mode_object_unreference] OBJ ID: 0 (1)
> 
> Well we must first unregister it before we can remove it, so this is
> unavoidable.

Yes but drm_framebuffer_free() calls drm_mode_object_unregister()
and is invoked by drm_framebuffer_remove(), so the additional call to
drm_framebuffer_unregister_private() in intel_fbdev_destroy() seems
superfluous. Or is there some reason I'm missing that this needs to
be called before intel_unpin_fb_obj()?


> Wrt switching from _cleanup to _remove, iirc there was troubles with the
> later calling into the fb->funcs->destroy hook. But many drivers have
> their fbdev fb embedded into some struct (instead of a pointer like i915),
> and then things go sideways badly. That's why you can't just blindly
> replace them.

So the options seem to be:

(1) Refactor nouveau, radeon and amdgpu to not embed their framebuffer
    struct in their fbdev struct, so that drm_framebuffer_remove() can
    be used.

(2) Amend each of them to turn off crtcs which are using the fbdev
    framebuffer, duplicating the code in drm_framebuffer_remove().

(3) Split drm_framebuffer_remove(), move the portion to turn off crtcs
    into a separate helper, say, drm_framebuffer_deactivate(), call that
    from nouveau, radeon and amdgpu.

(4) Go back to square one and use patch [9/9] of this series.

Which one would be most preferred? Is there another solution I've missed?
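
Option (3) could look roughly like the following user-space sketch. It is only an illustration: toy_fb_deactivate() is a hypothetical stand-in for the proposed drm_framebuffer_deactivate() helper, which does not exist at the time of this thread, and the structs are toy types rather than DRM ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_fb {
	int id;
};

struct toy_crtc {
	bool enabled;
	const struct toy_fb *fb;   /* framebuffer being scanned out, or NULL */
};

/*
 * Turn off every crtc that is scanning out from fb and touch nothing
 * else: no unlisting, no destroy hook, no freeing. Returns the number
 * of crtcs that were turned off.
 */
size_t toy_fb_deactivate(struct toy_crtc *crtcs, size_t n,
			 const struct toy_fb *fb)
{
	size_t turned_off = 0;

	assert(crtcs != NULL && fb != NULL);
	for (size_t i = 0; i < n; i++) {
		if (crtcs[i].enabled && crtcs[i].fb == fb) {
			crtcs[i].enabled = false;
			crtcs[i].fb = NULL;
			turned_off++;
		}
	}
	return turned_off;
}
```

Because such a helper never calls the destroy hook, a driver with an embedded fbdev framebuffer could call it and then continue with its existing cleanup path.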

Thanks,

Lukas
Daniel Vetter June 3, 2016, 6:21 p.m. UTC | #9
On Fri, Jun 03, 2016 at 09:30:06AM +0200, Lukas Wunner wrote:
> On Wed, Jun 01, 2016 at 04:40:12PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 01, 2016 at 02:36:41PM +0200, Lukas Wunner wrote:
> > > On Wed, May 25, 2016 at 03:43:42PM +0200, Daniel Vetter wrote:
> > > > On Wed, May 25, 2016 at 12:51 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > > On Tue, May 24, 2016 at 11:30:42PM +0200, Daniel Vetter wrote:
> > > > > > On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> > > > > > > When a drm_crtc structure is destroyed with drm_crtc_cleanup(), the DRM
> > > > > > > core does not turn off the crtc first and neither do the drivers. With
> > > > > > > nouveau, radeon and amdgpu, this causes a runtime pm ref to be leaked on
> > > > > > > driver unload if at least one crtc was enabled.
> > > > > > >
> > > > > > > (See usage of have_disp_power_ref in nouveau_crtc_set_config(),
> > > > > > > radeon_crtc_set_config() and amdgpu_crtc_set_config()).
> > > > > > >
> > > > > > > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> > > > > > > Cc: Dave Airlie <airlied@redhat.com>
> > > > > > > Tested-by: Karol Herbst <nouveau@karolherbst.de>
> > > > > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > > 
> > > > With legacy kms the only way to keep a crtc enabled is to display a
> > > > drm_framebuffer on it. And drm_mode_config_cleanup has a WARN_ON if
> > > > framebuffers are left behind. There's a bunch of options:
> > > > - nouveau somehow manages to keep the crtc on without a framebuffer
> > > > - nouveau somehow leaks a drm_framebuffer, but removes it from the fb_list
> > > > - something else
> > > 
> > > Found it. nouveau_fbcon_destroy() doesn't call drm_framebuffer_remove().
> > > If I add that, the crtc gets properly disabled on unload.
> > > 
> > > It does call drm_framebuffer_cleanup(). That's why there was no WARN,
> > > drm_mode_config_cleanup() only WARNs if a framebuffer was left on the
> > > mode_config.fb_list.
> > > 
> > > radeon and amdgpu have the same problem. In fact there are very few
> > > drivers that call drm_framebuffer_remove(): tegra, msm, exynos, omapdrm
> > > and i915 (since Imre Deak's 9d6612516da0).
> > > 
> > > Should we add a WARN to prevent this? How about WARN_ON(crtc->enabled)
> > > in drm_crtc_cleanup()?
> > > 
> > > Also, i915 calls drm_framebuffer_unregister_private() before it calls
> > > drm_framebuffer_remove(). This ordering has the unfortunate side effect
> > > that the drm_framebuffer has ID 0 in log messages emitted by
> > > drm_framebuffer_remove():
> > > 
> > > [   39.680874] [drm:drm_mode_object_unreference] OBJ ID: 0 (3)
> > > [   39.680878] [drm:drm_mode_object_unreference] OBJ ID: 0 (2)
> > > [   39.680884] [drm:drm_mode_object_unreference] OBJ ID: 0 (1)
> > 
> > Well we must first unregister it before we can remove it, so this is
> > unavoidable.
> 
> Yes but drm_framebuffer_free() calls drm_mode_object_unregister()
> and is invoked by drm_framebuffer_remove(), so the additional call to
> drm_framebuffer_unregister_private() in intel_fbdev_destroy() seems
> superfluous. Or is there some reason I'm missing that this needs to
> be called before intel_unpin_fb_obj()?
> 
> 
> > Wrt switching from _cleanup to _remove, iirc there were troubles with the
> > latter calling into the fb->funcs->destroy hook. But many drivers have
> > their fbdev fb embedded into some struct (instead of a pointer like i915),
> > and then things go sideways badly. That's why you can't just blindly
> > replace them.
> 
> So the options seem to be:
> 
> (1) Refactor nouveau, radeon and amdgpu to not embed their framebuffer
>     struct in their fbdev struct, so that drm_framebuffer_remove() can
>     be used.
> 
> (2) Amend each of them to turn off crtcs which are using the fbdev
>     framebuffer, duplicating the code in drm_framebuffer_remove().
> 
> (3) Split drm_framebuffer_remove(), move the portion to turn off crtcs
>     into a separate helper, say, drm_framebuffer_deactivate(), call that
>     from nouveau, radeon and amdgpu.
> 
> (4) Go back to square one and use patch [9/9] of this series.
> 
> Which one would be most preferred? Is there another solution I've missed?

I think a dedicated turn_off_everything helper would be best. We'd need an
atomic and a legacy version (because hooray), but that would work in all
cases. Relying on the implicit behaviour to turn off everything (strictly
speaking you only need to turn off all the planes, you can leave crtcs on,
and that's what most atomic drivers want really under normal
circumstances) is a bit fragile, and it's also possible to disable fbdev
emulation. If your driver needs everything to be off in module unload, then
it's imo best to explicitly enforce that.

So "(5) Write dedicated helper to turn off everything" is imo the right
fix.
-Daniel
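
As a rough illustration of option (5), the following userspace model shows why an explicit turn-off-everything step releases the display power reference. It is a sketch, not kernel code: all `model_*` names are hypothetical, and the bookkeeping is a simplified guess at the `have_disp_power_ref` pattern mentioned in the commit message (one power reference held while any crtc is enabled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the DRM structs. */
struct model_crtc {
	bool enabled;
};

struct model_device {
	struct model_crtc *crtcs;
	size_t num_crtcs;
	int pm_usage_count;		/* models the runtime pm usage counter */
	bool have_disp_power_ref;	/* true while the display holds one ref */
};

/*
 * Simplified model of the set_config bookkeeping: take one power reference
 * when the first crtc becomes active, drop it when the last one goes off.
 */
static void model_crtc_set_enabled(struct model_device *dev,
				   struct model_crtc *crtc, bool enable)
{
	bool active = false;
	size_t i;

	assert(dev != NULL && crtc != NULL);
	crtc->enabled = enable;

	for (i = 0; i < dev->num_crtcs; i++)
		if (dev->crtcs[i].enabled)
			active = true;

	if (active && !dev->have_disp_power_ref) {
		dev->have_disp_power_ref = true;
		dev->pm_usage_count++;
	} else if (!active && dev->have_disp_power_ref) {
		dev->have_disp_power_ref = false;
		dev->pm_usage_count--;
	}
}

/* Hypothetical helper: explicitly disable every crtc before teardown. */
static void model_turn_off_everything(struct model_device *dev)
{
	size_t i;

	for (i = 0; i < dev->num_crtcs; i++)
		if (dev->crtcs[i].enabled)
			model_crtc_set_enabled(dev, &dev->crtcs[i], false);
}
```

In this model, tearing down the device without calling the helper leaves `pm_usage_count` at 1, which corresponds to the leaked reference described in the commit message; calling it first brings the count back to 0.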
Lukas Wunner June 8, 2016, 4:55 p.m. UTC | #10
On Fri, Jun 03, 2016 at 08:21:50PM +0200, Daniel Vetter wrote:
> On Fri, Jun 03, 2016 at 09:30:06AM +0200, Lukas Wunner wrote:
> > On Wed, Jun 01, 2016 at 04:40:12PM +0200, Daniel Vetter wrote:
> > > On Wed, Jun 01, 2016 at 02:36:41PM +0200, Lukas Wunner wrote:
> > > > On Wed, May 25, 2016 at 03:43:42PM +0200, Daniel Vetter wrote:
> > > > > On Wed, May 25, 2016 at 12:51 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > > > On Tue, May 24, 2016 at 11:30:42PM +0200, Daniel Vetter wrote:
> > > > > > > On Tue, May 24, 2016 at 06:03:27PM +0200, Lukas Wunner wrote:
> > > > > > > > When a drm_crtc structure is destroyed with drm_crtc_cleanup(), the DRM
> > > > > > > > core does not turn off the crtc first and neither do the drivers. With
> > > > > > > > nouveau, radeon and amdgpu, this causes a runtime pm ref to be leaked on
> > > > > > > > driver unload if at least one crtc was enabled.
> > > > > > > >
> > > > > > > > (See usage of have_disp_power_ref in nouveau_crtc_set_config(),
> > > > > > > > radeon_crtc_set_config() and amdgpu_crtc_set_config()).
> > > > > > > >
> > > > > > > > Fixes: 5addcf0a5f0f ("nouveau: add runtime PM support (v0.9)")
> > > > > > > > Cc: Dave Airlie <airlied@redhat.com>
> > > > > > > > Tested-by: Karol Herbst <nouveau@karolherbst.de>
> > > > > > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > > > 
> > > > > With legacy kms the only way to keep a crtc enabled is to display a
> > > > > drm_framebuffer on it. And drm_mode_config_cleanup has a WARN_ON if
> > > > > framebuffers are left behind. There's a bunch of options:
> > > > > - nouveau somehow manages to keep the crtc on without a framebuffer
> > > > > - nouveau somehow leaks a drm_framebuffer, but removes it from the fb_list
> > > > > - something else
> > > > 
> > > > Found it. nouveau_fbcon_destroy() doesn't call drm_framebuffer_remove().
> > > > If I add that, the crtc gets properly disabled on unload.
> > > > 
> > > > It does call drm_framebuffer_cleanup(). That's why there was no WARN,
> > > > drm_mode_config_cleanup() only WARNs if a framebuffer was left on the
> > > > mode_config.fb_list.
> > > > 
> > > > radeon and amdgpu have the same problem. In fact there are very few
> > > > drivers that call drm_framebuffer_remove(): tegra, msm, exynos, omapdrm
> > > > and i915 (since Imre Deak's 9d6612516da0).
> > > > 
> > > > Should we add a WARN to prevent this? How about WARN_ON(crtc->enabled)
> > > > in drm_crtc_cleanup()?
> > > > 
> > > > Also, i915 calls drm_framebuffer_unregister_private() before it calls
> > > > drm_framebuffer_remove(). This ordering has the unfortunate side effect
> > > > that the drm_framebuffer has ID 0 in log messages emitted by
> > > > drm_framebuffer_remove():
> > > > 
> > > > [   39.680874] [drm:drm_mode_object_unreference] OBJ ID: 0 (3)
> > > > [   39.680878] [drm:drm_mode_object_unreference] OBJ ID: 0 (2)
> > > > [   39.680884] [drm:drm_mode_object_unreference] OBJ ID: 0 (1)
> > > 
> > > Well we must first unregister it before we can remove it, so this is
> > > unavoidable.
> > 
> > Yes but drm_framebuffer_free() calls drm_mode_object_unregister()
> > and is invoked by drm_framebuffer_remove(), so the additional call to
> > drm_framebuffer_unregister_private() in intel_fbdev_destroy() seems
> > superfluous. Or is there some reason I'm missing that this needs to
> > be called before intel_unpin_fb_obj()?
> > 
> > 
> > > Wrt switching from _cleanup to _remove, iirc there were troubles with the
> > > latter calling into the fb->funcs->destroy hook. But many drivers have
> > > their fbdev fb embedded into some struct (instead of a pointer like i915),
> > > and then things go sideways badly. That's why you can't just blindly
> > > replace them.
> > 
> > So the options seem to be:
> > 
> > (1) Refactor nouveau, radeon and amdgpu to not embed their framebuffer
> >     struct in their fbdev struct, so that drm_framebuffer_remove() can
> >     be used.
> > 
> > (2) Amend each of them to turn off crtcs which are using the fbdev
> >     framebuffer, duplicating the code in drm_framebuffer_remove().
> > 
> > (3) Split drm_framebuffer_remove(), move the portion to turn off crtcs
> >     into a separate helper, say, drm_framebuffer_deactivate(), call that
> >     from nouveau, radeon and amdgpu.
> > 
> > (4) Go back to square one and use patch [9/9] of this series.
> > 
> > Which one would be most preferred? Is there another solution I've missed?
> 
> I think a dedicated turn_off_everything helper would be best. We'd need an
> atomic and a legacy version (because hooray), but that would work in all
> cases. Relying on the implicit behaviour to turn off everything (strictly
> speaking you only need to turn off all the planes, you can leave crtcs on,
> and that's what most atomic drivers want really under normal
> circumstances) is a bit fragile, and it's also possible to disable fbdev
> emulation. If your driver needs everything to be off in module unload, then
> it's imo best to explicitly enforce that.
> 
> So "(5) Write dedicated helper to turn off everything" is imo the right
> fix.

Okay I did that and just posted it as v2. Hope I've understood correctly
what you suggested, if not please let me know and I'll rectify in a v3.

Thanks,

Lukas

Patch

diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index d2a6d95..0cd6f00 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -716,12 +716,23 @@  EXPORT_SYMBOL(drm_crtc_init_with_planes);
  *
  * This function cleans up @crtc and removes it from the DRM mode setting
  * core. Note that the function does *not* free the crtc structure itself,
- * this is the responsibility of the caller.
+ * this is the responsibility of the caller. If @crtc is currently enabled,
+ * it is turned off first.
  */
 void drm_crtc_cleanup(struct drm_crtc *crtc)
 {
 	struct drm_device *dev = crtc->dev;
 
+	if (crtc->enabled) {
+		struct drm_mode_set modeset = {
+			.crtc = crtc,
+		};
+
+		drm_modeset_lock_all(dev);
+		drm_mode_set_config_internal(&modeset);
+		drm_modeset_unlock_all(dev);
+	}
+
 	kfree(crtc->gamma_store);
 	crtc->gamma_store = NULL;