drm/i915: drop the coditional mutex

Message ID	1364909458-59282-1-git-send-email-bigeasy@linutronix.de (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> To: Daniel Vetter <daniel.vetter@ffwll.ch> Subject: [PATCH] drm/i915: drop the coditional mutex Date: Tue, 2 Apr 2013 15:30:58 +0200 Message-Id: <1364909458-59282-1-git-send-email-bigeasy@linutronix.de> Cc: intel-gfx@lists.freedesktop.org, Sebastian Andrzej Siewior <bigeasy@linutronix.de>, tglx@linutronix.de, dri-devel@lists.freedesktop.org Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org Errors-To: dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org

Message ID

1364909458-59282-1-git-send-email-bigeasy@linutronix.de (mailing list archive)

State

New, archived

Headers

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: [PATCH] drm/i915: drop the coditional mutex
Date: Tue,  2 Apr 2013 15:30:58 +0200
Message-Id: <1364909458-59282-1-git-send-email-bigeasy@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	tglx@linutronix.de, dri-devel@lists.freedesktop.org
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org
Errors-To: dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org

Commit Message

Sebastian Sewior April 2, 2013, 1:30 p.m. UTC

mutex_is_locked_by() checks the owner of the lock against current. This
is done by accessing a private member of struct mutex which works on
mainline but does not on RT.
I did not figure out, why this "lock-owner-check" and the "lock stealing
flag" is required. If the lock can not be acquire the lock (because it
would deadlock) then it can return -1.
The lock stealing makes actually no sense to me. If
shrinker_no_lock_stealing is true then the same functions
(i915_gem_purge(), i915_gem_shrink_all()) are called from
i915_gem_object_create_mmap_offset() as from i915_gem_inactive_shrink().
I haven't found a path in which i915_gem_inactive_shrink() is invoked
from i915_gem_object_create_mmap_offset() that means there is no way
shrinker_no_lock_stealing is true _and_ the lock is owned by the current
process.
Since I don't see why the i915 needs this hack while all other user do
not I recommend to remove this hack and return -1 in case of a dead
lock. Here is the patch.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/i915_drv.h |    1 -
 drivers/gpu/drm/i915/i915_gem.c |   32 +++-----------------------------
 2 files changed, 3 insertions(+), 30 deletions(-)

Comments

Sebastian Sewior April 8, 2013, 6:58 p.m. UTC | #1

* Daniel Vetter | 2013-04-02 15:47:02 [+0200]:

>On Tue, Apr 02, 2013 at 03:30:58PM +0200, Sebastian Andrzej Siewior wrote:
>> mutex_is_locked_by() checks the owner of the lock against current. This
>> is done by accessing a private member of struct mutex which works on
>> mainline but does not on RT.
>> I did not figure out, why this "lock-owner-check" and the "lock stealing
>> flag" is required. If the lock can not be acquire the lock (because it
>> would deadlock) then it can return -1.
>> The lock stealing makes actually no sense to me. If
>> shrinker_no_lock_stealing is true then the same functions
>> (i915_gem_purge(), i915_gem_shrink_all()) are called from
>> i915_gem_object_create_mmap_offset() as from i915_gem_inactive_shrink().
>> I haven't found a path in which i915_gem_inactive_shrink() is invoked
>> from i915_gem_object_create_mmap_offset() that means there is no way
>> shrinker_no_lock_stealing is true _and_ the lock is owned by the current
>> process.
>
>Every invocation of a memory allocation function can potentially recourse
>into the shrinker callbacks. Since we currently protect almost all our gpu
>buffer object state with one single lock, and we can also easily hold onto
>tons of memory, we can easily OOM without that lock stealing trick.

But this only occures when you allocate memory from within the driver or
is this mutex so contended that it is almost always taken?

>The patch you're reverting here was added to work around spurious OOM when
>trying to allocate sg tables, so this is real.

Couldn't you simply call the two shrink functions before allocating the
sg table?

>I fully realize that this is an awful hack and I'll burn on a pretty big
>pyre for merging it. I also know that it doesn't really work if someone
>else is trying to allocate while another thread is hogging our single
>mutex. But without a replacement fix, this is nacked.

I'm sorry to point it out but this "fix" has staging quality.

>The no_lock_stealing hack is probably even worse, we use it to protect
>critical section which are touching the gpu lru.

Maybe another reason to have a few spots where you call shirnk function
manually…

>
>Long term we're working towards that goal (also for a bunch of other
>reasons) and aim for more fine-grained locking, so that we can free memory
>in smaller chunks. But that's gonna be a lot of work.
Could you please define "long term" in time units? I would really like
to see the owner check gone asap.

>Cheers, Daniel

Sebastian

Daniel Vetter April 8, 2013, 7:37 p.m. UTC | #2

On Mon, Apr 8, 2013 at 8:58 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> * Daniel Vetter | 2013-04-02 15:47:02 [+0200]:
>>On Tue, Apr 02, 2013 at 03:30:58PM +0200, Sebastian Andrzej Siewior wrote:
>>> mutex_is_locked_by() checks the owner of the lock against current. This
>>> is done by accessing a private member of struct mutex which works on
>>> mainline but does not on RT.
>>> I did not figure out, why this "lock-owner-check" and the "lock stealing
>>> flag" is required. If the lock can not be acquire the lock (because it
>>> would deadlock) then it can return -1.
>>> The lock stealing makes actually no sense to me. If
>>> shrinker_no_lock_stealing is true then the same functions
>>> (i915_gem_purge(), i915_gem_shrink_all()) are called from
>>> i915_gem_object_create_mmap_offset() as from i915_gem_inactive_shrink().
>>> I haven't found a path in which i915_gem_inactive_shrink() is invoked
>>> from i915_gem_object_create_mmap_offset() that means there is no way
>>> shrinker_no_lock_stealing is true _and_ the lock is owned by the current
>>> process.
>>
>>Every invocation of a memory allocation function can potentially recourse
>>into the shrinker callbacks. Since we currently protect almost all our gpu
>>buffer object state with one single lock, and we can also easily hold onto
>>tons of memory, we can easily OOM without that lock stealing trick.
>
> But this only occures when you allocate memory from within the driver or
> is this mutex so contended that it is almost always taken?

It protects the entire gem state atm. So yes, it's grabbed everywhere.
On top of that it's interleaved with tons of legacy drm core code
locking, which almost exclusively relies on the same mutex. I advice
you to bring along your dragon-grade broad sword when venturing in
this area.

>>The patch you're reverting here was added to work around spurious OOM when
>>trying to allocate sg tables, so this is real.
>
> Couldn't you simply call the two shrink functions before allocating the
> sg table?

We'd need to hand-roll our own sg table allocator. We already have
such hand-rolled code for a lot of these cases and it started to get
_really_ ugly.

>>I fully realize that this is an awful hack and I'll burn on a pretty big
>>pyre for merging it. I also know that it doesn't really work if someone
>>else is trying to allocate while another thread is hogging our single
>>mutex. But without a replacement fix, this is nacked.
>
> I'm sorry to point it out but this "fix" has staging quality.

I never denied that, but laid out our plans to eventually fix this for
real. Unfortunately to fix this for real you have to rework 10 years
of horrible drm locking, all more-or-less interacting in surprising
ways with crazy ioctl interfaces. I'm working on this, slowly, but
steadily.

Patches welcome.

>>The no_lock_stealing hack is probably even worse, we use it to protect
>>critical section which are touching the gpu lru.
>
> Maybe another reason to have a few spots where you call shirnk function
> manually…

Manual shrink handling already makes up more than 50% of our logic in
some cases. It's really gotten out of hand, so I've opted for this
hack to fix up the remaining parts until we have a better solution.

>>Long term we're working towards that goal (also for a bunch of other
>>reasons) and aim for more fine-grained locking, so that we can free memory
>>in smaller chunks. But that's gonna be a lot of work.
> Could you please define "long term" in time units? I would really like
> to see the owner check gone asap.

The baseline we need first is the ww_mutex work Maarten Lankhorst is
working on. Then converting gem and our entire driver codebase over to
the new locking pradigm. Plus fixing up anything which needs fixing in
drm, plus any drivers those drm core changes affect. All while doing
our usual pace of new hw enabling, other feature work and normal
bugfixing. If no one chips in, I'd say a few kernel releases at least.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 01769e2..47f28ee 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -724,7 +724,6 @@  struct i915_gem_mm {
 	struct i915_hw_ppgtt *aliasing_ppgtt;
 
 	struct shrinker inactive_shrinker;
-	bool shrinker_no_lock_stealing;
 
 	/**
 	 * List of objects currently involved in rendering.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0e207e6..7949517 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1500,8 +1500,6 @@  static int i915_gem_object_create_mmap_offset(struct drm_i915_gem_object *obj)
 	if (obj->base.map_list.map)
 		return 0;
 
-	dev_priv->mm.shrinker_no_lock_stealing = true;
-
 	ret = drm_gem_create_mmap_offset(&obj->base);
 	if (ret != -ENOSPC)
 		goto out;
@@ -1521,8 +1519,6 @@  static int i915_gem_object_create_mmap_offset(struct drm_i915_gem_object *obj)
 	i915_gem_shrink_all(dev_priv);
 	ret = drm_gem_create_mmap_offset(&obj->base);
 out:
-	dev_priv->mm.shrinker_no_lock_stealing = false;
-
 	return ret;
 }
 
@@ -4368,19 +4364,6 @@  void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 	spin_unlock(&file_priv->mm.lock);
 }
 
-static bool mutex_is_locked_by(struct mutex *mutex, struct task_struct *task)
-{
-	if (!mutex_is_locked(mutex))
-		return false;
-
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_MUTEXES)
-	return mutex->owner == task;
-#else
-	/* Since UP may be pre-empted, we cannot assume that we own the lock */
-	return false;
-#endif
-}
-
 static int
 i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc)
 {
@@ -4391,18 +4374,10 @@  i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc)
 	struct drm_device *dev = dev_priv->dev;
 	struct drm_i915_gem_object *obj;
 	int nr_to_scan = sc->nr_to_scan;
-	bool unlock = true;
 	int cnt;
 
-	if (!mutex_trylock(&dev->struct_mutex)) {
-		if (!mutex_is_locked_by(&dev->struct_mutex, current))
-			return 0;
-
-		if (dev_priv->mm.shrinker_no_lock_stealing)
-			return 0;
-
-		unlock = false;
-	}
+	if (!mutex_trylock(&dev->struct_mutex))
+		return -1;
 
 	if (nr_to_scan) {
 		nr_to_scan -= i915_gem_purge(dev_priv, nr_to_scan);
@@ -4421,7 +4396,6 @@  i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc)
 		if (obj->pin_count == 0 && obj->pages_pin_count == 0)
 			cnt += obj->base.size >> PAGE_SHIFT;
 
-	if (unlock)
-		mutex_unlock(&dev->struct_mutex);
+	mutex_unlock(&dev->struct_mutex);
 	return cnt;
 }