From patchwork Fri Jan 23 12:44:07 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mika Kuoppala X-Patchwork-Id: 5693821 Return-Path: X-Original-To: patchwork-intel-gfx@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id E82549F358 for ; Fri, 23 Jan 2015 12:44:17 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id E4DC420225 for ; Fri, 23 Jan 2015 12:44:16 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id B3440201F5 for ; Fri, 23 Jan 2015 12:44:15 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0B3EF6E8CC; Fri, 23 Jan 2015 04:44:15 -0800 (PST) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTP id A6D396E8CC for ; Fri, 23 Jan 2015 04:44:13 -0800 (PST) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga101.fm.intel.com with ESMTP; 23 Jan 2015 04:44:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.97,862,1389772800"; d="scan'208";a="444177560" Received: from rosetta.fi.intel.com (HELO rosetta) ([10.237.72.102]) by FMSMGA003.fm.intel.com with ESMTP; 23 Jan 2015 04:30:44 -0800 Received: by rosetta (Postfix, from userid 1000) id E04E280052; Fri, 23 Jan 2015 14:44:10 +0200 (EET) From: Mika Kuoppala To: intel-gfx@lists.freedesktop.org Date: Fri, 23 Jan 2015 14:44:07 +0200 Message-Id: <1422017048-4792-1-git-send-email-mika.kuoppala@intel.com> X-Mailer: git-send-email 1.9.1 Cc: Jani Nikula , Daniel Vetter , miku@iki.fi Subject: [Intel-gfx] [PATCH 1/2] drm/i915: Convert hangcheck from a timer into a delayed work item X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Chris Wilson When run as a timer, i915_hangcheck_elapsed() must adhere to all the rules of running in a softirq context. This is advantageous to us as we want to minimise the risk that a driver bug will prevent us from detecting a hung GPU. However, that is irrelevant if the driver bug prevents us from resetting and recovering. Still it is prudent not to rely on mutexes inside the checker, but given the coarseness of dev->struct_mutex doing so is extremely hard. Give in and run from a work queue, i.e. outside of softirq. v2: The conversion does have one significant change, from the use of mod_timer to schedule_delayed_work, means that the time that we execute the first hangcheck is fixed and not continually deferred by later work. This has the advantage of not allowing userspace to fill the ring before hangcheck can finally run. At the same time, it removes the ability for the interrupt to defer the hangcheck as well. This is sensible for that an interrupt is only for a single engine, whereas we perform hangcheck globally, so whilst one ring may have hung, the other could be running normally and preventing the hangcheck from firing. Cc: Jani Nikula Cc: Daniel Vetter Signed-off-by: Chris Wilson (v2) Signed-off-by: Mika Kuoppala --- drivers/gpu/drm/i915/i915_dma.c | 2 +- drivers/gpu/drm/i915/i915_drv.c | 2 +- drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem.c | 2 +- drivers/gpu/drm/i915/i915_irq.c | 23 +++++++++-------------- 5 files changed, 13 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 51e8fe5..7c64669 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -934,7 +934,7 @@ int i915_driver_unload(struct drm_device *dev) } /* Free error state after interrupts are fully disabled. */ - del_timer_sync(&dev_priv->gpu_error.hangcheck_timer); + cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); cancel_work_sync(&dev_priv->gpu_error.work); i915_destroy_error_state(dev); diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 66c72bd..cb1468d 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1396,7 +1396,7 @@ static int intel_runtime_suspend(struct device *device) return ret; } - del_timer_sync(&dev_priv->gpu_error.hangcheck_timer); + cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); intel_uncore_forcewake_reset(dev, false); dev_priv->pm.suspended = true; diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 0d67b17..ce2acdd 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1345,7 +1345,7 @@ struct i915_gpu_error { /* Hang gpu twice in this window and your context gets banned */ #define DRM_I915_CTX_BAN_PERIOD DIV_ROUND_UP(8*DRM_I915_HANGCHECK_PERIOD, 1000) - struct timer_list hangcheck_timer; + struct delayed_work hangcheck_work; /* For reset and error_state handling. */ spinlock_t lock; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index fc81889..8a178cd 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4615,7 +4615,7 @@ i915_gem_suspend(struct drm_device *dev) i915_gem_stop_ringbuffers(dev); mutex_unlock(&dev->struct_mutex); - del_timer_sync(&dev_priv->gpu_error.hangcheck_timer); + cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); cancel_delayed_work_sync(&dev_priv->mm.retire_work); flush_delayed_work(&dev_priv->mm.idle_work); diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 25b2299..a188c7d 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2980,10 +2980,12 @@ ring_stuck(struct intel_engine_cs *ring, u64 acthd) * we kick the ring. If we see no progress on three subsequent calls * we assume chip is wedged and try to fix it by resetting the chip. */ -static void i915_hangcheck_elapsed(unsigned long data) +static void i915_hangcheck_elapsed(struct work_struct *work) { - struct drm_device *dev = (struct drm_device *)data; - struct drm_i915_private *dev_priv = dev->dev_private; + struct drm_i915_private *dev_priv = + container_of(work, typeof(*dev_priv), + gpu_error.hangcheck_work.work); + struct drm_device *dev = dev_priv->dev; struct intel_engine_cs *ring; int i; int busy_count = 0, rings_hung = 0; @@ -3097,17 +3099,11 @@ static void i915_hangcheck_elapsed(unsigned long data) void i915_queue_hangcheck(struct drm_device *dev) { - struct drm_i915_private *dev_priv = dev->dev_private; - struct timer_list *timer = &dev_priv->gpu_error.hangcheck_timer; - if (!i915.enable_hangcheck) return; - /* Don't continually defer the hangcheck, but make sure it is active */ - if (timer_pending(timer)) - return; - mod_timer(timer, - round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES)); + schedule_delayed_work(&to_i915(dev)->gpu_error.hangcheck_work, + round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES)); } static void ibx_irq_reset(struct drm_device *dev) @@ -4351,9 +4347,8 @@ void intel_irq_init(struct drm_i915_private *dev_priv) else dev_priv->pm_rps_events = GEN6_PM_RPS_EVENTS; - setup_timer(&dev_priv->gpu_error.hangcheck_timer, - i915_hangcheck_elapsed, - (unsigned long) dev); + INIT_DELAYED_WORK(&dev_priv->gpu_error.hangcheck_work, + i915_hangcheck_elapsed); INIT_DELAYED_WORK(&dev_priv->hotplug_reenable_work, intel_hpd_irq_reenable_work);