diff mbox

drm/gem: fix up flink name create race

Message ID 1374645842-18538-1-git-send-email-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Headers show

Commit Message

Daniel Vetter July 24, 2013, 6:04 a.m. UTC
This is the 2nd attempt, I've always been a bit dissatisified with the
tricky nature of the first one:

http://lists.freedesktop.org/archives/dri-devel/2012-July/025451.html

The issue is that the flink ioctl can race with calling gem_close on
the last gem handle. In that case we'll end up with a zero handle
count, but an flink name (and it's corresponding reference). Which
results in a neat space leak.

In my first attempt I've solved this by rechecking the handle count.
But fundamentally the issue is that ->handle_count isn't your usual
refcount - it can be resurrected from 0 among other things.

For those special beasts atomic_t often suggest way more ordering that
it actually guarantees. To prevent being tricked by those hairy
semantics take the easy way out and simply protect the handle with the
existing dev->object_name_lock.

With that change implemented it's dead easy to fix the flink vs. gem
close reace: When we try to create the name we simply have to check
whether there's still officially a gem handle around and if not refuse
to create the flink name. Since the handle count decrement and flink
name destruction is now also protected by that lock the reace is gone
and we can't ever leak the flink reference again.

Outside of the drm core only the exynos driver looks at the handle
count, and tbh I have no idea why (it's just for debug dmesg output
luckily).

I've considered inlining the drm_gem_object_handle_free, but I plan to
add more name-like things (like the exported dma_buf) to this scheme,
so it's clearer to leave the handle freeing in its own function.

This is exercised by the new gem_flink_race i-g-t testcase, which on
my snb leaks gem objects at a rate of roughly 1k objects/s.

v2: Fix up the error path handling in handle_create and make it more
robust by simply calling object_handle_unreference.

v3: Fix up the handle_unreference logic bug - atomic_dec_and_test
retursn 1 for 0. Oops.

v4: Squash in inlining of drm_gem_object_handle_reference as suggested
by Dave Airlie and add a note that we now have a testcase.

Cc: Dave Airlie <airlied@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/drm_gem.c               | 33 +++++++++++++++++++++------------
 drivers/gpu/drm/drm_info.c              |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.c |  2 +-
 include/drm/drmP.h                      | 19 ++++++++++---------
 4 files changed, 33 insertions(+), 23 deletions(-)

Comments

Daniel Vetter July 24, 2013, 9:02 a.m. UTC | #1
On Wed, Jul 24, 2013 at 8:04 AM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> This is the 2nd attempt, I've always been a bit dissatisified with the
> tricky nature of the first one:
>
> http://lists.freedesktop.org/archives/dri-devel/2012-July/025451.html
>
> The issue is that the flink ioctl can race with calling gem_close on
> the last gem handle. In that case we'll end up with a zero handle
> count, but an flink name (and it's corresponding reference). Which
> results in a neat space leak.
>
> In my first attempt I've solved this by rechecking the handle count.
> But fundamentally the issue is that ->handle_count isn't your usual
> refcount - it can be resurrected from 0 among other things.
>
> For those special beasts atomic_t often suggest way more ordering that
> it actually guarantees. To prevent being tricked by those hairy
> semantics take the easy way out and simply protect the handle with the
> existing dev->object_name_lock.
>
> With that change implemented it's dead easy to fix the flink vs. gem
> close reace: When we try to create the name we simply have to check
> whether there's still officially a gem handle around and if not refuse
> to create the flink name. Since the handle count decrement and flink
> name destruction is now also protected by that lock the reace is gone
> and we can't ever leak the flink reference again.
>
> Outside of the drm core only the exynos driver looks at the handle
> count, and tbh I have no idea why (it's just for debug dmesg output
> luckily).
>
> I've considered inlining the drm_gem_object_handle_free, but I plan to
> add more name-like things (like the exported dma_buf) to this scheme,
> so it's clearer to leave the handle freeing in its own function.
>
> This is exercised by the new gem_flink_race i-g-t testcase, which on
> my snb leaks gem objects at a rate of roughly 1k objects/s.

That's actually incorrect since the leak I've found is just a race in
the drm/i915 object tracking. So I need to go back to the drawing
board and figure out which are the ghosts and which the dragons here.

I've turned that testcase into an exercise for "drm/gem: completely
close gem_open vs. gem_close races", but that race only results in
userspace seeing different flink names for the same object. And that
only happens if userspace is racy already.

For this patch here I still think there's an issue, but I seriously
need to restart my brain first and flush out the bogons with some
coffee before I try again ;-)
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter July 24, 2013, 12:13 p.m. UTC | #2
On Wed, Jul 24, 2013 at 11:02:02AM +0200, Daniel Vetter wrote:
> On Wed, Jul 24, 2013 at 8:04 AM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > This is the 2nd attempt, I've always been a bit dissatisified with the
> > tricky nature of the first one:
> >
> > http://lists.freedesktop.org/archives/dri-devel/2012-July/025451.html
> >
> > The issue is that the flink ioctl can race with calling gem_close on
> > the last gem handle. In that case we'll end up with a zero handle
> > count, but an flink name (and it's corresponding reference). Which
> > results in a neat space leak.
> >
> > In my first attempt I've solved this by rechecking the handle count.
> > But fundamentally the issue is that ->handle_count isn't your usual
> > refcount - it can be resurrected from 0 among other things.
> >
> > For those special beasts atomic_t often suggest way more ordering that
> > it actually guarantees. To prevent being tricked by those hairy
> > semantics take the easy way out and simply protect the handle with the
> > existing dev->object_name_lock.
> >
> > With that change implemented it's dead easy to fix the flink vs. gem
> > close reace: When we try to create the name we simply have to check
> > whether there's still officially a gem handle around and if not refuse
> > to create the flink name. Since the handle count decrement and flink
> > name destruction is now also protected by that lock the reace is gone
> > and we can't ever leak the flink reference again.
> >
> > Outside of the drm core only the exynos driver looks at the handle
> > count, and tbh I have no idea why (it's just for debug dmesg output
> > luckily).
> >
> > I've considered inlining the drm_gem_object_handle_free, but I plan to
> > add more name-like things (like the exported dma_buf) to this scheme,
> > so it's clearer to leave the handle freeing in its own function.
> >
> > This is exercised by the new gem_flink_race i-g-t testcase, which on
> > my snb leaks gem objects at a rate of roughly 1k objects/s.
> 
> That's actually incorrect since the leak I've found is just a race in
> the drm/i915 object tracking. So I need to go back to the drawing
> board and figure out which are the ghosts and which the dragons here.
> 
> I've turned that testcase into an exercise for "drm/gem: completely
> close gem_open vs. gem_close races", but that race only results in
> userspace seeing different flink names for the same object. And that
> only happens if userspace is racy already.
> 
> For this patch here I still think there's an issue, but I seriously
> need to restart my brain first and flush out the bogons with some
> coffee before I try again ;-)

Ok, I've written a second subtest now, and the race is indeed there and
I've managed to leak a few objects. It's rather hard to hit though, I get
a leak for roughly ever 1M attempts to provoke the race. But I'm rather
convinced now that the leak is indeed there ;-)

So I think the patch as-is stands as correct and required to block off
evil usespace.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index e7d2b7f..14c70b5 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -140,7 +140,7 @@  int drm_gem_object_init(struct drm_device *dev,
 		return PTR_ERR(obj->filp);
 
 	kref_init(&obj->refcount);
-	atomic_set(&obj->handle_count, 0);
+	obj->handle_count = 0;
 	obj->size = size;
 
 	return 0;
@@ -161,7 +161,7 @@  int drm_gem_private_object_init(struct drm_device *dev,
 	obj->filp = NULL;
 
 	kref_init(&obj->refcount);
-	atomic_set(&obj->handle_count, 0);
+	obj->handle_count = 0;
 	obj->size = size;
 
 	return 0;
@@ -227,11 +227,9 @@  static void drm_gem_object_handle_free(struct drm_gem_object *obj)
 	struct drm_device *dev = obj->dev;
 
 	/* Remove any name for this object */
-	spin_lock(&dev->object_name_lock);
 	if (obj->name) {
 		idr_remove(&dev->object_name_idr, obj->name);
 		obj->name = 0;
-		spin_unlock(&dev->object_name_lock);
 		/*
 		 * The object name held a reference to this object, drop
 		 * that now.
@@ -239,15 +237,13 @@  static void drm_gem_object_handle_free(struct drm_gem_object *obj)
 		* This cannot be the last reference, since the handle holds one too.
 		 */
 		kref_put(&obj->refcount, drm_gem_object_ref_bug);
-	} else
-		spin_unlock(&dev->object_name_lock);
-
+	}
 }
 
 void
 drm_gem_object_handle_unreference_unlocked(struct drm_gem_object *obj)
 {
-	if (WARN_ON(atomic_read(&obj->handle_count) == 0))
+	if (WARN_ON(obj->handle_count == 0))
 		return;
 
 	/*
@@ -256,8 +252,11 @@  drm_gem_object_handle_unreference_unlocked(struct drm_gem_object *obj)
 	* checked for a name
 	*/
 
-	if (atomic_dec_and_test(&obj->handle_count))
+	spin_lock(&obj->dev->object_name_lock);
+	if (--obj->handle_count == 0)
 		drm_gem_object_handle_free(obj);
+	spin_unlock(&obj->dev->object_name_lock);
+
 	drm_gem_object_unreference_unlocked(obj);
 }
 
@@ -321,17 +320,21 @@  drm_gem_handle_create(struct drm_file *file_priv,
 	 * allocation under our spinlock.
 	 */
 	idr_preload(GFP_KERNEL);
+	spin_lock(&dev->object_name_lock);
 	spin_lock(&file_priv->table_lock);
 
 	ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
-
+	drm_gem_object_reference(obj);
+	obj->handle_count++;
 	spin_unlock(&file_priv->table_lock);
+	spin_unlock(&dev->object_name_lock);
 	idr_preload_end();
-	if (ret < 0)
+	if (ret < 0) {
+		drm_gem_object_handle_unreference_unlocked(obj);
 		return ret;
+	}
 	*handlep = ret;
 
-	drm_gem_object_handle_reference(obj);
 
 	if (dev->driver->gem_open_object) {
 		ret = dev->driver->gem_open_object(obj, file_priv);
@@ -498,6 +501,12 @@  drm_gem_flink_ioctl(struct drm_device *dev, void *data,
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&dev->object_name_lock);
+	/* prevent races with concurrent gem_close. */
+	if (obj->handle_count == 0) {
+		ret = -ENOENT;
+		goto err;
+	}
+
 	if (!obj->name) {
 		ret = idr_alloc(&dev->object_name_idr, obj, 1, 0, GFP_NOWAIT);
 		if (ret < 0)
diff --git a/drivers/gpu/drm/drm_info.c b/drivers/gpu/drm/drm_info.c
index d4b20ce..f4b348c 100644
--- a/drivers/gpu/drm/drm_info.c
+++ b/drivers/gpu/drm/drm_info.c
@@ -207,7 +207,7 @@  static int drm_gem_one_name_info(int id, void *ptr, void *data)
 
 	seq_printf(m, "%6d %8zd %7d %8d\n",
 		   obj->name, obj->size,
-		   atomic_read(&obj->handle_count),
+		   obj->handle_count,
 		   atomic_read(&obj->refcount.refcount));
 	return 0;
 }
diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index 24c22a8..16963ca 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -135,7 +135,7 @@  void exynos_drm_gem_destroy(struct exynos_drm_gem_obj *exynos_gem_obj)
 	obj = &exynos_gem_obj->base;
 	buf = exynos_gem_obj->buffer;
 
-	DRM_DEBUG_KMS("handle count = %d\n", atomic_read(&obj->handle_count));
+	DRM_DEBUG_KMS("handle count = %d\n", obj->handle_count);
 
 	/*
 	 * do not release memory region from exporter.
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 301a6d6..25da8e0 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -634,8 +634,16 @@  struct drm_gem_object {
 	/** Reference count of this object */
 	struct kref refcount;
 
-	/** Handle count of this object. Each handle also holds a reference */
-	atomic_t handle_count; /* number of handles on this object */
+	/**
+	 * handle_count - gem file_priv handle count of this object
+	 *
+	 * Each handle also holds a reference. Note that when the handle_count
+	 * drops to 0 any global names (e.g. the id in the flink namespace) will
+	 * be cleared.
+	 *
+	 * Protected by dev->object_name_lock.
+	 * */
+	unsigned handle_count;
 
 	/** Related drm device */
 	struct drm_device *dev;
@@ -1662,13 +1670,6 @@  int drm_gem_handle_create(struct drm_file *file_priv,
 			  u32 *handlep);
 int drm_gem_handle_delete(struct drm_file *filp, u32 handle);
 
-static inline void
-drm_gem_object_handle_reference(struct drm_gem_object *obj)
-{
-	drm_gem_object_reference(obj);
-	atomic_inc(&obj->handle_count);
-}
-
 void drm_gem_object_handle_unreference_unlocked(struct drm_gem_object *obj);
 
 void drm_gem_free_mmap_offset(struct drm_gem_object *obj);