[2/2] drm/i915: Fix timeout handling in i915_gem_shrinker_vmap
diff mbox series

Message ID 20190109141247.32166-2-tvrtko.ursulin@linux.intel.com
State New
Headers show
Series
  • [1/2] drm/i915: Reduce recursive mutex locking from the shrinker
Related show

Commit Message

Tvrtko Ursulin Jan. 9, 2019, 2:12 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The code tries to grab struct mutex for 5ms every time the unlocked GPU
idle wait succeeds. But the GPU idle wait itself is practically unbound
which means the 5ms timeout might not be honoured.

Cap the GPU idle wait to 5ms as well to fix this.

v2:
 * Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Chris Wilson Jan. 9, 2019, 2:23 p.m. UTC | #1
Quoting Tvrtko Ursulin (2019-01-09 14:12:47)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> The code tries to grab struct mutex for 5ms every time the unlocked GPU
> idle wait succeeds. But the GPU idle wait itself is practically unbound
> which means the 5ms timeout might not be honoured.

The arbitrary timeout is merely advisory. If we hang, we get to recover
everything we can!
-Chris
Chris Wilson Jan. 9, 2019, 2:31 p.m. UTC | #2
Quoting Tvrtko Ursulin (2019-01-09 14:12:47)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> The code tries to grab struct mutex for 5ms every time the unlocked GPU
> idle wait succeeds. But the GPU idle wait itself is practically unbound
> which means the 5ms timeout might not be honoured.
> 
> Cap the GPU idle wait to 5ms as well to fix this.
> 
> v2:
>  * Rebase.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_shrinker.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index ce539d38461c..4ee393028a93 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -402,13 +402,12 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
>  
>  static bool
>  shrinker_lock_uninterruptible(struct drm_i915_private *i915, bool *unlock,
> -                             int timeout_ms)
> +                             unsigned long timeout)
>  {
> -       unsigned long timeout = jiffies + msecs_to_jiffies_timeout(timeout_ms);
> +       const unsigned long timeout_end = jiffies + timeout;
>  
>         do {
> -               if (i915_gem_wait_for_idle(i915,
> -                                          0, MAX_SCHEDULE_TIMEOUT) == 0 &&
> +               if (i915_gem_wait_for_idle(i915, 0, timeout) == 0 &&

Hmm, I wonder if this code could be hitting i915_request_wait():

#if IS_ENABLED(CONFIG_LOCKDEP)
        GEM_BUG_ON(debug_locks &&
                   !!lockdep_is_held(&rq->i915->drm.struct_mutex) !=
                   !!(flags & I915_WAIT_LOCKED));
#endif

as i915_gem_shrinker_vmap() could conceivably be called under
struct_mutex, e.g. i915_gem_object_pin_map(), and I'm sure I've seen the
vmap arena used for setting pte bits.

Onwards to removing that restriction!
-Chris
Tvrtko Ursulin Jan. 9, 2019, 3:07 p.m. UTC | #3
On 09/01/2019 14:23, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-09 14:12:47)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> The code tries to grab struct mutex for 5ms every time the unlocked GPU
>> idle wait succeeds. But the GPU idle wait itself is practically unbound
>> which means the 5ms timeout might not be honoured.
> 
> The arbitrary timeout is merely advisory. If we hang, we get to recover
> everything we can!

You mean if we hang, eg. take more than 5ms to idle, the function bails 
out on the 5ms timeout check, failing to take the lock and free anything? ;)

Regards,

Tvrtko
Chris Wilson Jan. 9, 2019, 3:13 p.m. UTC | #4
Quoting Tvrtko Ursulin (2019-01-09 15:07:34)
> 
> On 09/01/2019 14:23, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-01-09 14:12:47)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> The code tries to grab struct mutex for 5ms every time the unlocked GPU
> >> idle wait succeeds. But the GPU idle wait itself is practically unbound
> >> which means the 5ms timeout might not be honoured.
> > 
> > The arbitrary timeout is merely advisory. If we hang, we get to recover
> > everything we can!
> 
> You mean if we hang, eg. take more than 5ms to idle, the function bails 
> out on the 5ms timeout check, failing to take the lock and free anything? ;)

No, just that I picked 5 _seconds_ as arbitrary timeout. As it now hits
the mutex_lock() path, the whole polling + timeout is kaput. What I
think we should do is use mutex_lock_killable instead.
-Chris

Patch
diff mbox series

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index ce539d38461c..4ee393028a93 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -402,13 +402,12 @@  i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 
 static bool
 shrinker_lock_uninterruptible(struct drm_i915_private *i915, bool *unlock,
-			      int timeout_ms)
+			      unsigned long timeout)
 {
-	unsigned long timeout = jiffies + msecs_to_jiffies_timeout(timeout_ms);
+	const unsigned long timeout_end = jiffies + timeout;
 
 	do {
-		if (i915_gem_wait_for_idle(i915,
-					   0, MAX_SCHEDULE_TIMEOUT) == 0 &&
+		if (i915_gem_wait_for_idle(i915, 0, timeout) == 0 &&
 		    shrinker_lock(i915, 0, unlock))
 			break;
 
@@ -416,7 +415,7 @@  shrinker_lock_uninterruptible(struct drm_i915_private *i915, bool *unlock,
 		if (fatal_signal_pending(current))
 			return false;
 
-		if (time_after(jiffies, timeout)) {
+		if (time_after(jiffies, timeout_end)) {
 			pr_err("Unable to lock GPU to purge memory.\n");
 			return false;
 		}
@@ -474,11 +473,12 @@  i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
 	struct drm_i915_private *i915 =
 		container_of(nb, struct drm_i915_private, mm.vmap_notifier);
 	struct i915_vma *vma, *next;
+	const unsigned long timeout = msecs_to_jiffies_timeout(5000);
 	unsigned long freed_pages = 0;
 	bool unlock;
 	int ret;
 
-	if (!shrinker_lock_uninterruptible(i915, &unlock, 5000))
+	if (!shrinker_lock_uninterruptible(i915, &unlock, timeout))
 		return NOTIFY_DONE;
 
 	/* Force everything onto the inactive lists */