Message ID | 1364909458-59282-1-git-send-email-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
* Daniel Vetter | 2013-04-02 15:47:02 [+0200]: >On Tue, Apr 02, 2013 at 03:30:58PM +0200, Sebastian Andrzej Siewior wrote: >> mutex_is_locked_by() checks the owner of the lock against current. This >> is done by accessing a private member of struct mutex which works on >> mainline but does not on RT. >> I did not figure out, why this "lock-owner-check" and the "lock stealing >> flag" is required. If the lock can not be acquire the lock (because it >> would deadlock) then it can return -1. >> The lock stealing makes actually no sense to me. If >> shrinker_no_lock_stealing is true then the same functions >> (i915_gem_purge(), i915_gem_shrink_all()) are called from >> i915_gem_object_create_mmap_offset() as from i915_gem_inactive_shrink(). >> I haven't found a path in which i915_gem_inactive_shrink() is invoked >> from i915_gem_object_create_mmap_offset() that means there is no way >> shrinker_no_lock_stealing is true _and_ the lock is owned by the current >> process. > >Every invocation of a memory allocation function can potentially recourse >into the shrinker callbacks. Since we currently protect almost all our gpu >buffer object state with one single lock, and we can also easily hold onto >tons of memory, we can easily OOM without that lock stealing trick. But this only occures when you allocate memory from within the driver or is this mutex so contended that it is almost always taken? >The patch you're reverting here was added to work around spurious OOM when >trying to allocate sg tables, so this is real. Couldn't you simply call the two shrink functions before allocating the sg table? >I fully realize that this is an awful hack and I'll burn on a pretty big >pyre for merging it. I also know that it doesn't really work if someone >else is trying to allocate while another thread is hogging our single >mutex. But without a replacement fix, this is nacked. I'm sorry to point it out but this "fix" has staging quality. >The no_lock_stealing hack is probably even worse, we use it to protect >critical section which are touching the gpu lru. Maybe another reason to have a few spots where you call shirnk function manually… > >Long term we're working towards that goal (also for a bunch of other >reasons) and aim for more fine-grained locking, so that we can free memory >in smaller chunks. But that's gonna be a lot of work. Could you please define "long term" in time units? I would really like to see the owner check gone asap. >Cheers, Daniel Sebastian
On Mon, Apr 8, 2013 at 8:58 PM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > * Daniel Vetter | 2013-04-02 15:47:02 [+0200]: >>On Tue, Apr 02, 2013 at 03:30:58PM +0200, Sebastian Andrzej Siewior wrote: >>> mutex_is_locked_by() checks the owner of the lock against current. This >>> is done by accessing a private member of struct mutex which works on >>> mainline but does not on RT. >>> I did not figure out, why this "lock-owner-check" and the "lock stealing >>> flag" is required. If the lock can not be acquire the lock (because it >>> would deadlock) then it can return -1. >>> The lock stealing makes actually no sense to me. If >>> shrinker_no_lock_stealing is true then the same functions >>> (i915_gem_purge(), i915_gem_shrink_all()) are called from >>> i915_gem_object_create_mmap_offset() as from i915_gem_inactive_shrink(). >>> I haven't found a path in which i915_gem_inactive_shrink() is invoked >>> from i915_gem_object_create_mmap_offset() that means there is no way >>> shrinker_no_lock_stealing is true _and_ the lock is owned by the current >>> process. >> >>Every invocation of a memory allocation function can potentially recourse >>into the shrinker callbacks. Since we currently protect almost all our gpu >>buffer object state with one single lock, and we can also easily hold onto >>tons of memory, we can easily OOM without that lock stealing trick. > > But this only occures when you allocate memory from within the driver or > is this mutex so contended that it is almost always taken? It protects the entire gem state atm. So yes, it's grabbed everywhere. On top of that it's interleaved with tons of legacy drm core code locking, which almost exclusively relies on the same mutex. I advice you to bring along your dragon-grade broad sword when venturing in this area. >>The patch you're reverting here was added to work around spurious OOM when >>trying to allocate sg tables, so this is real. > > Couldn't you simply call the two shrink functions before allocating the > sg table? We'd need to hand-roll our own sg table allocator. We already have such hand-rolled code for a lot of these cases and it started to get _really_ ugly. >>I fully realize that this is an awful hack and I'll burn on a pretty big >>pyre for merging it. I also know that it doesn't really work if someone >>else is trying to allocate while another thread is hogging our single >>mutex. But without a replacement fix, this is nacked. > > I'm sorry to point it out but this "fix" has staging quality. I never denied that, but laid out our plans to eventually fix this for real. Unfortunately to fix this for real you have to rework 10 years of horrible drm locking, all more-or-less interacting in surprising ways with crazy ioctl interfaces. I'm working on this, slowly, but steadily. Patches welcome. >>The no_lock_stealing hack is probably even worse, we use it to protect >>critical section which are touching the gpu lru. > > Maybe another reason to have a few spots where you call shirnk function > manually… Manual shrink handling already makes up more than 50% of our logic in some cases. It's really gotten out of hand, so I've opted for this hack to fix up the remaining parts until we have a better solution. >>Long term we're working towards that goal (also for a bunch of other >>reasons) and aim for more fine-grained locking, so that we can free memory >>in smaller chunks. But that's gonna be a lot of work. > Could you please define "long term" in time units? I would really like > to see the owner check gone asap. The baseline we need first is the ww_mutex work Maarten Lankhorst is working on. Then converting gem and our entire driver codebase over to the new locking pradigm. Plus fixing up anything which needs fixing in drm, plus any drivers those drm core changes affect. All while doing our usual pace of new hw enabling, other feature work and normal bugfixing. If no one chips in, I'd say a few kernel releases at least. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 01769e2..47f28ee 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -724,7 +724,6 @@ struct i915_gem_mm { struct i915_hw_ppgtt *aliasing_ppgtt; struct shrinker inactive_shrinker; - bool shrinker_no_lock_stealing; /** * List of objects currently involved in rendering. diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0e207e6..7949517 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1500,8 +1500,6 @@ static int i915_gem_object_create_mmap_offset(struct drm_i915_gem_object *obj) if (obj->base.map_list.map) return 0; - dev_priv->mm.shrinker_no_lock_stealing = true; - ret = drm_gem_create_mmap_offset(&obj->base); if (ret != -ENOSPC) goto out; @@ -1521,8 +1519,6 @@ static int i915_gem_object_create_mmap_offset(struct drm_i915_gem_object *obj) i915_gem_shrink_all(dev_priv); ret = drm_gem_create_mmap_offset(&obj->base); out: - dev_priv->mm.shrinker_no_lock_stealing = false; - return ret; } @@ -4368,19 +4364,6 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file) spin_unlock(&file_priv->mm.lock); } -static bool mutex_is_locked_by(struct mutex *mutex, struct task_struct *task) -{ - if (!mutex_is_locked(mutex)) - return false; - -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_MUTEXES) - return mutex->owner == task; -#else - /* Since UP may be pre-empted, we cannot assume that we own the lock */ - return false; -#endif -} - static int i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc) { @@ -4391,18 +4374,10 @@ i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc) struct drm_device *dev = dev_priv->dev; struct drm_i915_gem_object *obj; int nr_to_scan = sc->nr_to_scan; - bool unlock = true; int cnt; - if (!mutex_trylock(&dev->struct_mutex)) { - if (!mutex_is_locked_by(&dev->struct_mutex, current)) - return 0; - - if (dev_priv->mm.shrinker_no_lock_stealing) - return 0; - - unlock = false; - } + if (!mutex_trylock(&dev->struct_mutex)) + return -1; if (nr_to_scan) { nr_to_scan -= i915_gem_purge(dev_priv, nr_to_scan); @@ -4421,7 +4396,6 @@ i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc) if (obj->pin_count == 0 && obj->pages_pin_count == 0) cnt += obj->base.size >> PAGE_SHIFT; - if (unlock) - mutex_unlock(&dev->struct_mutex); + mutex_unlock(&dev->struct_mutex); return cnt; }
mutex_is_locked_by() checks the owner of the lock against current. This is done by accessing a private member of struct mutex which works on mainline but does not on RT. I did not figure out, why this "lock-owner-check" and the "lock stealing flag" is required. If the lock can not be acquire the lock (because it would deadlock) then it can return -1. The lock stealing makes actually no sense to me. If shrinker_no_lock_stealing is true then the same functions (i915_gem_purge(), i915_gem_shrink_all()) are called from i915_gem_object_create_mmap_offset() as from i915_gem_inactive_shrink(). I haven't found a path in which i915_gem_inactive_shrink() is invoked from i915_gem_object_create_mmap_offset() that means there is no way shrinker_no_lock_stealing is true _and_ the lock is owned by the current process. Since I don't see why the i915 needs this hack while all other user do not I recommend to remove this hack and return -1 in case of a dead lock. Here is the patch. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- drivers/gpu/drm/i915/i915_drv.h | 1 - drivers/gpu/drm/i915/i915_gem.c | 32 +++----------------------------- 2 files changed, 3 insertions(+), 30 deletions(-)