diff mbox

[01/62] drm/i915: Only start retire worker when idle

Message ID 20160608110602.GV32344@nuc-i3427.alporthouse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson June 8, 2016, 11:06 a.m. UTC
On Wed, Jun 08, 2016 at 11:53:15AM +0100, Chris Wilson wrote:
> On Tue, Jun 07, 2016 at 02:31:07PM +0300, Joonas Lahtinen wrote:
> > On pe, 2016-06-03 at 17:36 +0100, Chris Wilson wrote:
> > >  i915_gem_idle_work_handler(struct work_struct *work)
> > >  {
> > >  	struct drm_i915_private *dev_priv =
> > > -		container_of(work, typeof(*dev_priv), mm.idle_work.work);
> > > +		container_of(work, typeof(*dev_priv), gt.idle_work.work);
> > >  	struct drm_device *dev = dev_priv->dev;
> > >  	struct intel_engine_cs *engine;
> > >  
> > > -	for_each_engine(engine, dev_priv)
> > > -		if (!list_empty(&engine->request_list))
> > > -			return;
> > > +	if (!READ_ONCE(dev_priv->gt.awake))
> > > +		return;
> > >  
> > > -	/* we probably should sync with hangcheck here, using cancel_work_sync.
> > > -	 * Also locking seems to be fubar here, engine->request_list is protected
> > > -	 * by dev->struct_mutex. */
> > > +	mutex_lock(&dev->struct_mutex);
> > > +	if (dev_priv->gt.active_engines)
> > > +		goto out;
> > >  
> > > -	intel_mark_idle(dev_priv);
> > > +	for_each_engine(engine, dev_priv)
> > > +		i915_gem_batch_pool_fini(&engine->batch_pool);
> > >  
> > > -	if (mutex_trylock(&dev->struct_mutex)) {
> > > -		for_each_engine(engine, dev_priv)
> > > -			i915_gem_batch_pool_fini(&engine->batch_pool);
> > > +	GEM_BUG_ON(!dev_priv->gt.awake);
> > > +	dev_priv->gt.awake = false;
> > >  
> > > -		mutex_unlock(&dev->struct_mutex);
> > > +	if (INTEL_INFO(dev_priv)->gen >= 6)
> > > +		gen6_rps_idle(dev_priv);
> > > +	intel_runtime_pm_put(dev_priv);
> > > +out:
> > > +	mutex_unlock(&dev->struct_mutex);
> > > +
> > > +	if (!dev_priv->gt.awake &&
> > 
> > No READ_ONCE here, even we just unlocked the mutex. So lacks some
> > consistency.
> > 
> > Also, this assumes we might be pre-empted between unlocking mutex and
> > making this test, so I'm a little bit confused. Do you want to optimize
> > by avoiding calling cancel_delayed_work_sync?
> 
> General principle to never call work_sync functions with locks held. I
> had actually thought I had fixed this up (but realized that I just
> rewrote hangcheck later on instead ;)
> 
> Ok, what I think is safer here is
> 
> 	bool hangcheck = cancel_delayed_work_sync(hangcheck_work)
> 
> 	mutex_lock()
> 	if (actually_idle()) {
> 		awake = false;
> 		missed_irq_rings |= intel_kick_waiters();
> 	}
> 	mutex_unlock();
> 
> 	if (awake && hangcheck)
> 		queue_hangcheck()
> 	
> So always kick the hangcheck and re-enable if we tried to idle too early.
> This will potentially delay hangcheck by one full hangcheck period if we
> do encounter that race. But we shouldn't be hitting this race that
> often, or hanging the GPU for that matter.

Actual delta:

Comments

Joonas Lahtinen June 8, 2016, 12:07 p.m. UTC | #1
On ke, 2016-06-08 at 12:06 +0100, Chris Wilson wrote:
> On Wed, Jun 08, 2016 at 11:53:15AM +0100, Chris Wilson wrote:
> > 
> > On Tue, Jun 07, 2016 at 02:31:07PM +0300, Joonas Lahtinen wrote:
> > > 
> > > On pe, 2016-06-03 at 17:36 +0100, Chris Wilson wrote:
> > > > 
> > > >  i915_gem_idle_work_handler(struct work_struct *work)
> > > >  {
> > > >  	struct drm_i915_private *dev_priv =
> > > > -		container_of(work, typeof(*dev_priv), mm.idle_work.work);
> > > > +		container_of(work, typeof(*dev_priv), gt.idle_work.work);
> > > >  	struct drm_device *dev = dev_priv->dev;
> > > >  	struct intel_engine_cs *engine;
> > > >  
> > > > -	for_each_engine(engine, dev_priv)
> > > > -		if (!list_empty(&engine->request_list))
> > > > -			return;
> > > > +	if (!READ_ONCE(dev_priv->gt.awake))
> > > > +		return;
> > > >  
> > > > -	/* we probably should sync with hangcheck here, using cancel_work_sync.
> > > > -	 * Also locking seems to be fubar here, engine->request_list is protected
> > > > -	 * by dev->struct_mutex. */
> > > > +	mutex_lock(&dev->struct_mutex);
> > > > +	if (dev_priv->gt.active_engines)
> > > > +		goto out;
> > > >  
> > > > -	intel_mark_idle(dev_priv);
> > > > +	for_each_engine(engine, dev_priv)
> > > > +		i915_gem_batch_pool_fini(&engine->batch_pool);
> > > >  
> > > > -	if (mutex_trylock(&dev->struct_mutex)) {
> > > > -		for_each_engine(engine, dev_priv)
> > > > -			i915_gem_batch_pool_fini(&engine->batch_pool);
> > > > +	GEM_BUG_ON(!dev_priv->gt.awake);
> > > > +	dev_priv->gt.awake = false;
> > > >  
> > > > -		mutex_unlock(&dev->struct_mutex);
> > > > +	if (INTEL_INFO(dev_priv)->gen >= 6)
> > > > +		gen6_rps_idle(dev_priv);
> > > > +	intel_runtime_pm_put(dev_priv);
> > > > +out:
> > > > +	mutex_unlock(&dev->struct_mutex);
> > > > +
> > > > +	if (!dev_priv->gt.awake &&
> > > No READ_ONCE here, even we just unlocked the mutex. So lacks some
> > > consistency.
> > > 
> > > Also, this assumes we might be pre-empted between unlocking mutex and
> > > making this test, so I'm a little bit confused. Do you want to optimize
> > > by avoiding calling cancel_delayed_work_sync?
> > General principle to never call work_sync functions with locks held. I
> > had actually thought I had fixed this up (but realized that I just
> > rewrote hangcheck later on instead ;)
> > 
> > Ok, what I think is safer here is
> > 
> > 	bool hangcheck = cancel_delayed_work_sync(hangcheck_work)
> > 
> > 	mutex_lock()
> > 	if (actually_idle()) {
> > 		awake = false;
> > 		missed_irq_rings |= intel_kick_waiters();
> > 	}
> > 	mutex_unlock();
> > 
> > 	if (awake && hangcheck)
> > 		queue_hangcheck()
> > 	
> > So always kick the hangcheck and re-enable if we tried to idle too early.
> > This will potentially delay hangcheck by one full hangcheck period if we
> > do encounter that race. But we shouldn't be hitting this race that
> > often, or hanging the GPU for that matter.
> Actual delta:
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 406046f66e36..856da4036fb3 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3066,10 +3066,15 @@ i915_gem_idle_work_handler(struct work_struct *work)
>                 container_of(work, typeof(*dev_priv), gt.idle_work.work);
>         struct drm_device *dev = dev_priv->dev;
>         struct intel_engine_cs *engine;
> +       unsigned stuck_engines;
> +       bool rearm_hangcheck;
>  
>         if (!READ_ONCE(dev_priv->gt.awake))
>                 return;
>  
> +       rearm_hangcheck =
> +               cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> +
>         mutex_lock(&dev->struct_mutex);
>         if (dev_priv->gt.active_engines)
>                 goto out;
> @@ -3079,6 +3084,13 @@ i915_gem_idle_work_handler(struct work_struct *work)
>  
>         GEM_BUG_ON(!dev_priv->gt.awake);
>         dev_priv->gt.awake = false;
> +       rearm_hangcheck = false;
> +
> +       stuck_engines = intel_kick_waiters(dev_priv);
> +       if (unlikely(stuck_engines)) {
> +               DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
> +               dev_priv->gpu_error.missed_irq_rings |= stuck_engines;
> +       }
>  
>         if (INTEL_INFO(dev_priv)->gen >= 6)
>                 gen6_rps_idle(dev_priv);
> @@ -3086,14 +3098,8 @@ i915_gem_idle_work_handler(struct work_struct *work)
>  out:
>         mutex_unlock(&dev->struct_mutex);
>  
> -       if (!dev_priv->gt.awake &&
> -           cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work)) {
> -               unsigned stuck = intel_kick_waiters(dev_priv);
> -               if (unlikely(stuck)) {
> -                       DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
> -                       dev_priv->gpu_error.missed_irq_rings |= stuck;
> -               }
> -       }
> +       if (rearm_hangcheck)
> +               i915_queue_hangcheck(dev_priv);

As discussed on IRC, this should not race, so with the above hunk;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

>  }
> -Chris
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 406046f66e36..856da4036fb3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3066,10 +3066,15 @@  i915_gem_idle_work_handler(struct work_struct *work)
                container_of(work, typeof(*dev_priv), gt.idle_work.work);
        struct drm_device *dev = dev_priv->dev;
        struct intel_engine_cs *engine;
+       unsigned stuck_engines;
+       bool rearm_hangcheck;
 
        if (!READ_ONCE(dev_priv->gt.awake))
                return;
 
+       rearm_hangcheck =
+               cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
+
        mutex_lock(&dev->struct_mutex);
        if (dev_priv->gt.active_engines)
                goto out;
@@ -3079,6 +3084,13 @@  i915_gem_idle_work_handler(struct work_struct *work)
 
        GEM_BUG_ON(!dev_priv->gt.awake);
        dev_priv->gt.awake = false;
+       rearm_hangcheck = false;
+
+       stuck_engines = intel_kick_waiters(dev_priv);
+       if (unlikely(stuck_engines)) {
+               DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
+               dev_priv->gpu_error.missed_irq_rings |= stuck_engines;
+       }
 
        if (INTEL_INFO(dev_priv)->gen >= 6)
                gen6_rps_idle(dev_priv);
@@ -3086,14 +3098,8 @@  i915_gem_idle_work_handler(struct work_struct *work)
 out:
        mutex_unlock(&dev->struct_mutex);
 
-       if (!dev_priv->gt.awake &&
-           cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work)) {
-               unsigned stuck = intel_kick_waiters(dev_priv);
-               if (unlikely(stuck)) {
-                       DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
-                       dev_priv->gpu_error.missed_irq_rings |= stuck;
-               }
-       }
+       if (rearm_hangcheck)
+               i915_queue_hangcheck(dev_priv);
 }
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre