diff mbox

[1/3] drm/atomic-helper: Fix leak in disable_all

Message ID 20170714191439.31169-1-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Headers show

Commit Message

Daniel Vetter July 14, 2017, 7:14 p.m. UTC
The legacy plane->fb pointer is refcounted by calling
drm_atomic_clean_old_fb().

In practice this isn't a real problem because:
- The caller in the i915 gpu reset code restores the original state
  again, which means the plane->fb pointer won't change, hence can't
  leak.
- Drivers using drm_atomic_helper_shutdown call the fbdev cleanup
  first, and that usually cleans up the fb through
  drm_remove_framebuffer, which does this correctly.
- Without fbdev the only framebuffers are from userspace, and those
  get cleaned up (again using drm_remove_framebuffer) befor the driver
  can even be unloaded.

But in i915 I've switched the cleanup sequence around so that the
_shutdown() calls happens after the drm_remove_framebuffer(), which is
how I discovered this issue.

Cc: martin.peres@free.fr
Cc: chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/drm_atomic_helper.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Chris Wilson July 14, 2017, 7:27 p.m. UTC | #1
Quoting Daniel Vetter (2017-07-14 20:14:37)
> The legacy plane->fb pointer is refcounted by calling
> drm_atomic_clean_old_fb().
> 
> In practice this isn't a real problem because:
> - The caller in the i915 gpu reset code restores the original state
>   again, which means the plane->fb pointer won't change, hence can't
>   leak.
> - Drivers using drm_atomic_helper_shutdown call the fbdev cleanup
>   first, and that usually cleans up the fb through
>   drm_remove_framebuffer, which does this correctly.
> - Without fbdev the only framebuffers are from userspace, and those
>   get cleaned up (again using drm_remove_framebuffer) befor the driver
>   can even be unloaded.
> 
> But in i915 I've switched the cleanup sequence around so that the
> _shutdown() calls happens after the drm_remove_framebuffer(), which is
> how I discovered this issue.

If only we have refcounted fb now and didn't need every caller manually
tracking the old pointers. </s>

Given the requirement for the caller to cleanup old_fb around
drm_atomic_commit(), this looks correct to me.

> Cc: martin.peres@free.fr
> Cc: chris@chris-wilson.co.uk
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Chris Wilson July 14, 2017, 8:32 p.m. UTC | #2
Quoting Patchwork (2017-07-14 21:18:04)
> == Series Details ==
> 
> Series: series starting with [1/3] drm/atomic-helper: Fix leak in disable_all
> URL   : https://patchwork.freedesktop.org/series/27327/
> State : failure
> 
> == Summary ==
> 
> Series 27327v1 Series without cover letter
> https://patchwork.freedesktop.org/api/1.0/series/27327/revisions/1/mbox/
> 
> Test gem_exec_fence:
>         Subgroup await-hang-default:
>                 pass       -> INCOMPLETE (fi-blb-e6850)
>                 pass       -> INCOMPLETE (fi-pnv-d510)
> Test gem_exec_flush:
>         Subgroup basic-batch-kernel-default-uc:
>                 fail       -> PASS       (fi-snb-2600) fdo#100007
> Test kms_busy:
>         Subgroup basic-flip-default-a:
>                 pass       -> INCOMPLETE (fi-ilk-650)
>                 pass       -> INCOMPLETE (fi-snb-2520m)
>                 pass       -> INCOMPLETE (fi-snb-2600)
>                 pass       -> INCOMPLETE (fi-ivb-3520m)
>                 pass       -> INCOMPLETE (fi-ivb-3770)
>                 pass       -> INCOMPLETE (fi-byt-j1900)
>                 pass       -> INCOMPLETE (fi-byt-n2820)
>                 pass       -> INCOMPLETE (fi-hsw-4770)
>                 pass       -> INCOMPLETE (fi-hsw-4770r)
>                 pass       -> INCOMPLETE (fi-bdw-5557u)
>                 pass       -> INCOMPLETE (fi-bdw-gvtdvm)
>                 pass       -> INCOMPLETE (fi-skl-6260u)
>                 pass       -> INCOMPLETE (fi-skl-6700hq) fdo#101144
>                 pass       -> INCOMPLETE (fi-skl-6700k)
>                 pass       -> INCOMPLETE (fi-skl-6770hq)
>                 pass       -> INCOMPLETE (fi-skl-gvtdvm)
>                 none       -> INCOMPLETE (fi-skl-x1585l)
>                 pass       -> INCOMPLETE (fi-bxt-j4205)
>                 pass       -> INCOMPLETE (fi-kbl-7500u)
>                 pass       -> INCOMPLETE (fi-kbl-7560u)
>                 pass       -> INCOMPLETE (fi-kbl-r)
>                 pass       -> INCOMPLETE (fi-glk-2a)
>         Subgroup basic-flip-default-c:
>                 pass       -> INCOMPLETE (fi-bsw-n3050)

Looks like we decremented a refcount once too often (after it was freed,
poison 0x6b--). That clean up plane->old_fb?
-Chris
Daniel Vetter July 14, 2017, 9:03 p.m. UTC | #3
On Fri, Jul 14, 2017 at 09:32:13PM +0100, Chris Wilson wrote:
> Quoting Patchwork (2017-07-14 21:18:04)
> > == Series Details ==
> > 
> > Series: series starting with [1/3] drm/atomic-helper: Fix leak in disable_all
> > URL   : https://patchwork.freedesktop.org/series/27327/
> > State : failure
> > 
> > == Summary ==
> > 
> > Series 27327v1 Series without cover letter
> > https://patchwork.freedesktop.org/api/1.0/series/27327/revisions/1/mbox/
> > 
> > Test gem_exec_fence:
> >         Subgroup await-hang-default:
> >                 pass       -> INCOMPLETE (fi-blb-e6850)
> >                 pass       -> INCOMPLETE (fi-pnv-d510)
> > Test gem_exec_flush:
> >         Subgroup basic-batch-kernel-default-uc:
> >                 fail       -> PASS       (fi-snb-2600) fdo#100007
> > Test kms_busy:
> >         Subgroup basic-flip-default-a:
> >                 pass       -> INCOMPLETE (fi-ilk-650)
> >                 pass       -> INCOMPLETE (fi-snb-2520m)
> >                 pass       -> INCOMPLETE (fi-snb-2600)
> >                 pass       -> INCOMPLETE (fi-ivb-3520m)
> >                 pass       -> INCOMPLETE (fi-ivb-3770)
> >                 pass       -> INCOMPLETE (fi-byt-j1900)
> >                 pass       -> INCOMPLETE (fi-byt-n2820)
> >                 pass       -> INCOMPLETE (fi-hsw-4770)
> >                 pass       -> INCOMPLETE (fi-hsw-4770r)
> >                 pass       -> INCOMPLETE (fi-bdw-5557u)
> >                 pass       -> INCOMPLETE (fi-bdw-gvtdvm)
> >                 pass       -> INCOMPLETE (fi-skl-6260u)
> >                 pass       -> INCOMPLETE (fi-skl-6700hq) fdo#101144
> >                 pass       -> INCOMPLETE (fi-skl-6700k)
> >                 pass       -> INCOMPLETE (fi-skl-6770hq)
> >                 pass       -> INCOMPLETE (fi-skl-gvtdvm)
> >                 none       -> INCOMPLETE (fi-skl-x1585l)
> >                 pass       -> INCOMPLETE (fi-bxt-j4205)
> >                 pass       -> INCOMPLETE (fi-kbl-7500u)
> >                 pass       -> INCOMPLETE (fi-kbl-7560u)
> >                 pass       -> INCOMPLETE (fi-kbl-r)
> >                 pass       -> INCOMPLETE (fi-glk-2a)
> >         Subgroup basic-flip-default-c:
> >                 pass       -> INCOMPLETE (fi-bsw-n3050)
> 
> Looks like we decremented a refcount once too often (after it was freed,
> poison 0x6b--). That clean up plane->old_fb?

I can't even run that testcase without my gpu reset fixes, epic win :(

I'll assemble an even bigger pile and see what happens.
-Daniel
Chris Wilson July 14, 2017, 9:17 p.m. UTC | #4
Quoting Daniel Vetter (2017-07-14 22:03:05)
> On Fri, Jul 14, 2017 at 09:32:13PM +0100, Chris Wilson wrote:
> > Quoting Patchwork (2017-07-14 21:18:04)
> > > == Series Details ==
> > > 
> > > Series: series starting with [1/3] drm/atomic-helper: Fix leak in disable_all
> > > URL   : https://patchwork.freedesktop.org/series/27327/
> > > State : failure
> > > 
> > > == Summary ==
> > > 
> > > Series 27327v1 Series without cover letter
> > > https://patchwork.freedesktop.org/api/1.0/series/27327/revisions/1/mbox/
> > > 
> > > Test gem_exec_fence:
> > >         Subgroup await-hang-default:
> > >                 pass       -> INCOMPLETE (fi-blb-e6850)
> > >                 pass       -> INCOMPLETE (fi-pnv-d510)
> > > Test gem_exec_flush:
> > >         Subgroup basic-batch-kernel-default-uc:
> > >                 fail       -> PASS       (fi-snb-2600) fdo#100007
> > > Test kms_busy:
> > >         Subgroup basic-flip-default-a:
> > >                 pass       -> INCOMPLETE (fi-ilk-650)
> > >                 pass       -> INCOMPLETE (fi-snb-2520m)
> > >                 pass       -> INCOMPLETE (fi-snb-2600)
> > >                 pass       -> INCOMPLETE (fi-ivb-3520m)
> > >                 pass       -> INCOMPLETE (fi-ivb-3770)
> > >                 pass       -> INCOMPLETE (fi-byt-j1900)
> > >                 pass       -> INCOMPLETE (fi-byt-n2820)
> > >                 pass       -> INCOMPLETE (fi-hsw-4770)
> > >                 pass       -> INCOMPLETE (fi-hsw-4770r)
> > >                 pass       -> INCOMPLETE (fi-bdw-5557u)
> > >                 pass       -> INCOMPLETE (fi-bdw-gvtdvm)
> > >                 pass       -> INCOMPLETE (fi-skl-6260u)
> > >                 pass       -> INCOMPLETE (fi-skl-6700hq) fdo#101144
> > >                 pass       -> INCOMPLETE (fi-skl-6700k)
> > >                 pass       -> INCOMPLETE (fi-skl-6770hq)
> > >                 pass       -> INCOMPLETE (fi-skl-gvtdvm)
> > >                 none       -> INCOMPLETE (fi-skl-x1585l)
> > >                 pass       -> INCOMPLETE (fi-bxt-j4205)
> > >                 pass       -> INCOMPLETE (fi-kbl-7500u)
> > >                 pass       -> INCOMPLETE (fi-kbl-7560u)
> > >                 pass       -> INCOMPLETE (fi-kbl-r)
> > >                 pass       -> INCOMPLETE (fi-glk-2a)
> > >         Subgroup basic-flip-default-c:
> > >                 pass       -> INCOMPLETE (fi-bsw-n3050)
> > 
> > Looks like we decremented a refcount once too often (after it was freed,
> > poison 0x6b--). That clean up plane->old_fb?
> 
> I can't even run that testcase without my gpu reset fixes, epic win :(

It's hard to tell, but considering that the error it is detecting is slab
corruption, it is more likely an earlier test that is doing the dirty
deed. This is kasan territory. Tomi!
-Chris
Daniel Vetter July 14, 2017, 9:50 p.m. UTC | #5
On Fri, Jul 14, 2017 at 10:17:52PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-07-14 22:03:05)
> > On Fri, Jul 14, 2017 at 09:32:13PM +0100, Chris Wilson wrote:
> > > Quoting Patchwork (2017-07-14 21:18:04)
> > > > == Series Details ==
> > > > 
> > > > Series: series starting with [1/3] drm/atomic-helper: Fix leak in disable_all
> > > > URL   : https://patchwork.freedesktop.org/series/27327/
> > > > State : failure
> > > > 
> > > > == Summary ==
> > > > 
> > > > Series 27327v1 Series without cover letter
> > > > https://patchwork.freedesktop.org/api/1.0/series/27327/revisions/1/mbox/
> > > > 
> > > > Test gem_exec_fence:
> > > >         Subgroup await-hang-default:
> > > >                 pass       -> INCOMPLETE (fi-blb-e6850)
> > > >                 pass       -> INCOMPLETE (fi-pnv-d510)
> > > > Test gem_exec_flush:
> > > >         Subgroup basic-batch-kernel-default-uc:
> > > >                 fail       -> PASS       (fi-snb-2600) fdo#100007
> > > > Test kms_busy:
> > > >         Subgroup basic-flip-default-a:
> > > >                 pass       -> INCOMPLETE (fi-ilk-650)
> > > >                 pass       -> INCOMPLETE (fi-snb-2520m)
> > > >                 pass       -> INCOMPLETE (fi-snb-2600)
> > > >                 pass       -> INCOMPLETE (fi-ivb-3520m)
> > > >                 pass       -> INCOMPLETE (fi-ivb-3770)
> > > >                 pass       -> INCOMPLETE (fi-byt-j1900)
> > > >                 pass       -> INCOMPLETE (fi-byt-n2820)
> > > >                 pass       -> INCOMPLETE (fi-hsw-4770)
> > > >                 pass       -> INCOMPLETE (fi-hsw-4770r)
> > > >                 pass       -> INCOMPLETE (fi-bdw-5557u)
> > > >                 pass       -> INCOMPLETE (fi-bdw-gvtdvm)
> > > >                 pass       -> INCOMPLETE (fi-skl-6260u)
> > > >                 pass       -> INCOMPLETE (fi-skl-6700hq) fdo#101144
> > > >                 pass       -> INCOMPLETE (fi-skl-6700k)
> > > >                 pass       -> INCOMPLETE (fi-skl-6770hq)
> > > >                 pass       -> INCOMPLETE (fi-skl-gvtdvm)
> > > >                 none       -> INCOMPLETE (fi-skl-x1585l)
> > > >                 pass       -> INCOMPLETE (fi-bxt-j4205)
> > > >                 pass       -> INCOMPLETE (fi-kbl-7500u)
> > > >                 pass       -> INCOMPLETE (fi-kbl-7560u)
> > > >                 pass       -> INCOMPLETE (fi-kbl-r)
> > > >                 pass       -> INCOMPLETE (fi-glk-2a)
> > > >         Subgroup basic-flip-default-c:
> > > >                 pass       -> INCOMPLETE (fi-bsw-n3050)
> > > 
> > > Looks like we decremented a refcount once too often (after it was freed,
> > > poison 0x6b--). That clean up plane->old_fb?
> > 
> > I can't even run that testcase without my gpu reset fixes, epic win :(
> 
> It's hard to tell, but considering that the error it is detecting is slab
> corruption, it is more likely an earlier test that is doing the dirty
> deed. This is kasan territory. Tomi!

I did fire up my full pile of stuff, and it indeed goes somehow horribly
wrong in the modeset path for the reset code. Looks like my
innocently-looking change to disable_all (we call that somewhere in there)
to properly refcount ->fb wasn't all that innocent :(

It's getting late, I'll try to figure out what's going tomorrow or on Mon.
Promises to be another intriguing wild goose chase at least. kasan didn't
say anything special, but somehow the atomic commit machinery died
completely.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index b07fc30372d3..71b5f71e61e0 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -2726,6 +2726,7 @@  int drm_atomic_helper_disable_all(struct drm_device *dev,
 	struct drm_plane *plane;
 	struct drm_crtc_state *crtc_state;
 	struct drm_crtc *crtc;
+	unsigned plane_mask = 0;
 	int ret, i;
 
 	state = drm_atomic_state_alloc(dev);
@@ -2768,10 +2769,14 @@  int drm_atomic_helper_disable_all(struct drm_device *dev,
 			goto free;
 
 		drm_atomic_set_fb_for_plane(plane_state, NULL);
+		plane_mask |= BIT(drm_plane_index(plane));
+		plane->old_fb = plane->fb;
 	}
 
 	ret = drm_atomic_commit(state);
 free:
+	if (plane_mask)
+		drm_atomic_clean_old_fb(dev, plane_mask, ret);
 	drm_atomic_state_put(state);
 	return ret;
 }