Patchwork [1/3] drm/atomic-helper: Fix leak in disable_all

login
register
mail settings
Submitter Daniel Vetter
Date July 14, 2017, 10:46 p.m.
Message ID <20170714224656.6431-1-daniel.vetter@ffwll.ch>
Download mbox | patch
Permalink /patch/9841783/
State New
Headers show

Comments

Daniel Vetter - July 14, 2017, 10:46 p.m.
The legacy plane->fb pointer is refcounted by calling
drm_atomic_clean_old_fb().

In practice this isn't a real problem because:
- The caller in the i915 gpu reset code restores the original state
  again, which means the plane->fb pointer won't change, hence can't
  leak.
- Drivers using drm_atomic_helper_shutdown call the fbdev cleanup
  first, and that usually cleans up the fb through
  drm_remove_framebuffer, which does this correctly.
- Without fbdev the only framebuffers are from userspace, and those
  get cleaned up (again using drm_remove_framebuffer) befor the driver
  can even be unloaded.

But in i915 I've switched the cleanup sequence around so that the
_shutdown() calls happens after the drm_remove_framebuffer(), which is
how I discovered this issue.

v2: My analysis why the current code was ok for gpu reset and
suspend/resume was correct, but then I totally failed to realize that
we better keep this symmetric. Thanksfully CI noticed that for
balance, a refcounting bug must exist at 2 places if previously there
was no issue ...

Cc: martin.peres@free.fr
Cc: chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/drm_atomic_helper.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
Chris Wilson - July 15, 2017, 10:10 a.m.
Quoting Daniel Vetter (2017-07-14 23:46:54)
> @@ -2902,6 +2907,8 @@ int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state,
>         struct drm_connector_state *new_conn_state;
>         struct drm_crtc *crtc;
>         struct drm_crtc_state *new_crtc_state;
> +       struct drm_device *dev = state->dev;
> +       int ret;
>  
>         state->acquire_ctx = ctx;
>  
> @@ -2914,7 +2921,10 @@ int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state,
>         for_each_new_connector_in_state(state, connector, new_conn_state, i)
>                 state->connectors[i].old_state = connector->state;
>  
> -       return drm_atomic_commit(state);
> +       ret = drm_atomic_commit(state);
> +       drm_atomic_clean_old_fb(dev, ~0U, ret);

I have no idea what the rules should be here. Or how it should interact
with error. Should we just try the "drm: Track framebuffer references at
the point of assignment" approach to simplify the rules (at least from
my perspective)? The problem with that patch is sorting out the state
duplication done in a couple of drivers and figuring out if they are
transferring ownership or not.
-Chris
Daniel Vetter - July 15, 2017, 11:03 a.m.
On Sat, Jul 15, 2017 at 11:10:07AM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-07-14 23:46:54)
> > @@ -2902,6 +2907,8 @@ int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state,
> >         struct drm_connector_state *new_conn_state;
> >         struct drm_crtc *crtc;
> >         struct drm_crtc_state *new_crtc_state;
> > +       struct drm_device *dev = state->dev;
> > +       int ret;
> >  
> >         state->acquire_ctx = ctx;
> >  
> > @@ -2914,7 +2921,10 @@ int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state,
> >         for_each_new_connector_in_state(state, connector, new_conn_state, i)
> >                 state->connectors[i].old_state = connector->state;
> >  
> > -       return drm_atomic_commit(state);
> > +       ret = drm_atomic_commit(state);
> > +       drm_atomic_clean_old_fb(dev, ~0U, ret);
> 
> I have no idea what the rules should be here. Or how it should interact
> with error. Should we just try the "drm: Track framebuffer references at
> the point of assignment" approach to simplify the rules (at least from
> my perspective)? The problem with that patch is sorting out the state
> duplication done in a couple of drivers and figuring out if they are
> transferring ownership or not.

Yeah this is a decent mess. I think a first incremental step would be to
move the ->old_fb and ->fb refcount updating into drm_atomic_commit -
clean_old_fb is a superbly misnamed function, what it really does is
update the legacy framebuffer pointer refcounts. The kernel-doc it has
also just serves to increase the confusion :-/

The only problem is that for an drm_atomic_commit in one of the legacy
callbacks we must _not_ do that, because the core already takes care fo
that. But having a drm_atomic_commit_legacy for that would be a lot
cleaner I think.

Once we have that I guess we could try and figure out what to do with the
refcounting of the legacy pointers at large.

Meanwhile the rules are:
- If you do an atomic commit, you must call clean_old_fb. Everywhere. We
  got away in a few cases because we've made nice symmetric commits (like
  suspend/resume) or commits where we know the fb doesn't get updated.

- Exception: When called from ->plane_update/disable, ->set_config or
  ->page_flip, the core does it for you.

It's a mess, but I'd like to decouple fixing that mess from fixing the
unload bug we have. I'll try to type up the drm_atomic_commit/_legacy
patch next week.
-Daniel

Patch

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index b07fc30372d3..3a04619d3bec 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -2726,6 +2726,7 @@  int drm_atomic_helper_disable_all(struct drm_device *dev,
 	struct drm_plane *plane;
 	struct drm_crtc_state *crtc_state;
 	struct drm_crtc *crtc;
+	unsigned plane_mask = 0;
 	int ret, i;
 
 	state = drm_atomic_state_alloc(dev);
@@ -2768,10 +2769,14 @@  int drm_atomic_helper_disable_all(struct drm_device *dev,
 			goto free;
 
 		drm_atomic_set_fb_for_plane(plane_state, NULL);
+		plane_mask |= BIT(drm_plane_index(plane));
+		plane->old_fb = plane->fb;
 	}
 
 	ret = drm_atomic_commit(state);
 free:
+	if (plane_mask)
+		drm_atomic_clean_old_fb(dev, plane_mask, ret);
 	drm_atomic_state_put(state);
 	return ret;
 }
@@ -2902,6 +2907,8 @@  int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state,
 	struct drm_connector_state *new_conn_state;
 	struct drm_crtc *crtc;
 	struct drm_crtc_state *new_crtc_state;
+	struct drm_device *dev = state->dev;
+	int ret;
 
 	state->acquire_ctx = ctx;
 
@@ -2914,7 +2921,10 @@  int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state,
 	for_each_new_connector_in_state(state, connector, new_conn_state, i)
 		state->connectors[i].old_state = connector->state;
 
-	return drm_atomic_commit(state);
+	ret = drm_atomic_commit(state);
+	drm_atomic_clean_old_fb(dev, ~0U, ret);
+
+	return ret;
 }
 EXPORT_SYMBOL(drm_atomic_helper_commit_duplicated_state);