From patchwork Wed Jun 8 11:06:02 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 9164299 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id ECBF260832 for ; Wed, 8 Jun 2016 11:06:10 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DDD7625819 for ; Wed, 8 Jun 2016 11:06:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D27EE265F9; Wed, 8 Jun 2016 11:06:10 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DDA7525819 for ; Wed, 8 Jun 2016 11:06:09 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9BE096E3F8; Wed, 8 Jun 2016 11:06:07 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [87.106.93.118]) by gabe.freedesktop.org (Postfix) with ESMTP id 205EE6E98E for ; Wed, 8 Jun 2016 11:06:05 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from nuc-i3427.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 60658434-1500048 for multiple; Wed, 08 Jun 2016 12:06:16 +0100 Received: by nuc-i3427.alporthouse.com (sSMTP sendmail emulation); Wed, 08 Jun 2016 12:06:02 +0100 Date: Wed, 8 Jun 2016 12:06:02 +0100 From: Chris Wilson To: Joonas Lahtinen , intel-gfx@lists.freedesktop.org Message-ID: <20160608110602.GV32344@nuc-i3427.alporthouse.com> Mail-Followup-To: Chris Wilson , Joonas Lahtinen , intel-gfx@lists.freedesktop.org References: <1464971847-15809-1-git-send-email-chris@chris-wilson.co.uk> <1464971847-15809-2-git-send-email-chris@chris-wilson.co.uk> <1465299067.5626.15.camel@linux.intel.com> <20160608105315.GT32344@nuc-i3427.alporthouse.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20160608105315.GT32344@nuc-i3427.alporthouse.com> User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: [Intel-gfx] [PATCH 01/62] drm/i915: Only start retire worker when idle X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP On Wed, Jun 08, 2016 at 11:53:15AM +0100, Chris Wilson wrote: > On Tue, Jun 07, 2016 at 02:31:07PM +0300, Joonas Lahtinen wrote: > > On pe, 2016-06-03 at 17:36 +0100, Chris Wilson wrote: > > >  i915_gem_idle_work_handler(struct work_struct *work) > > >  { > > >   struct drm_i915_private *dev_priv = > > > - container_of(work, typeof(*dev_priv), mm.idle_work.work); > > > + container_of(work, typeof(*dev_priv), gt.idle_work.work); > > >   struct drm_device *dev = dev_priv->dev; > > >   struct intel_engine_cs *engine; > > >   > > > - for_each_engine(engine, dev_priv) > > > - if (!list_empty(&engine->request_list)) > > > - return; > > > + if (!READ_ONCE(dev_priv->gt.awake)) > > > + return; > > >   > > > - /* we probably should sync with hangcheck here, using cancel_work_sync. > > > -  * Also locking seems to be fubar here, engine->request_list is protected > > > -  * by dev->struct_mutex. */ > > > + mutex_lock(&dev->struct_mutex); > > > + if (dev_priv->gt.active_engines) > > > + goto out; > > >   > > > - intel_mark_idle(dev_priv); > > > + for_each_engine(engine, dev_priv) > > > + i915_gem_batch_pool_fini(&engine->batch_pool); > > >   > > > - if (mutex_trylock(&dev->struct_mutex)) { > > > - for_each_engine(engine, dev_priv) > > > - i915_gem_batch_pool_fini(&engine->batch_pool); > > > + GEM_BUG_ON(!dev_priv->gt.awake); > > > + dev_priv->gt.awake = false; > > >   > > > - mutex_unlock(&dev->struct_mutex); > > > + if (INTEL_INFO(dev_priv)->gen >= 6) > > > + gen6_rps_idle(dev_priv); > > > + intel_runtime_pm_put(dev_priv); > > > +out: > > > + mutex_unlock(&dev->struct_mutex); > > > + > > > + if (!dev_priv->gt.awake && > > > > No READ_ONCE here, even we just unlocked the mutex. So lacks some > > consistency. > > > > Also, this assumes we might be pre-empted between unlocking mutex and > > making this test, so I'm little bit confused. Do you want to optimize > > by avoiding calling cancel_delayed_work_sync? > > General principle to never call work_sync functions with locks held. I > had actually thought I had fixed this up (but realized that I just > rewrote hangcheck later on instead ;) > > Ok, what I think is safer here is > > bool hangcheck = cancel_delay_work_sync(hangcheck_work) > > mutex_lock() > if (actually_idle()) { > awake = false; > missed_irq_rings |= intel_kick_waiters(); > } > mutex_unlock(); > > if (awake && hangcheck) > queue_hangcheck() > > So always kick the hangcheck and reeanble if we tried to idle too early. > This will potentially delay hangcheck by one full hangcheck period if we > do encounter that race. But we shouldn't be hitting this race that > often, or hanging the GPU for that mterr. Actual delta: Reviewed-by: Joonas Lahtinen diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 406046f66e36..856da4036fb3 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3066,10 +3066,15 @@ i915_gem_idle_work_handler(struct work_struct *work) container_of(work, typeof(*dev_priv), gt.idle_work.work); struct drm_device *dev = dev_priv->dev; struct intel_engine_cs *engine; + unsigned stuck_engines; + bool rearm_hangcheck; if (!READ_ONCE(dev_priv->gt.awake)) return; + rearm_hangcheck = + cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); + mutex_lock(&dev->struct_mutex); if (dev_priv->gt.active_engines) goto out; @@ -3079,6 +3084,13 @@ i915_gem_idle_work_handler(struct work_struct *work) GEM_BUG_ON(!dev_priv->gt.awake); dev_priv->gt.awake = false; + rearm_hangcheck = false; + + stuck_engines = intel_kick_waiters(dev_priv); + if (unlikely(stuck_engines)) { + DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n"); + dev_priv->gpu_error.missed_irq_rings |= stuck_engines; + } if (INTEL_INFO(dev_priv)->gen >= 6) gen6_rps_idle(dev_priv); @@ -3086,14 +3098,8 @@ i915_gem_idle_work_handler(struct work_struct *work) out: mutex_unlock(&dev->struct_mutex); - if (!dev_priv->gt.awake && - cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work)) { - unsigned stuck = intel_kick_waiters(dev_priv); - if (unlikely(stuck)) { - DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n"); - dev_priv->gpu_error.missed_irq_rings |= stuck; - } - } + if (rearm_hangcheck) + i915_queue_hangcheck(dev_priv); } -Chris -- Chris Wilson, Intel Open Source Technology Centre