From patchwork Fri Sep 9 07:20:58 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 9322513
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Fri, 9 Sep 2016 08:20:58 +0100
Message-Id: <20160909072107.18861-12-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.9.3
In-Reply-To: <20160909072107.18861-1-chris@chris-wilson.co.uk>
References: <20160909072107.18861-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [CI 12/21] drm/i915: Replace wait-on-mutex with wait-on-bit in reset worker
List-Id: Intel graphics driver community testing & development

Since we have a cooperative mode now with a direct reset, we can avoid
the contention on struct_mutex and instead try the lock and then sleep
on the I915_RESET_IN_PROGRESS bit. If the mutex is held and that bit is
cleared, all is fine. Otherwise, we sleep for a bit and try again. In
the worst case we sleep for an extra second waiting for the mutex to be
released (no one touching the GPU is allowed to hold struct_mutex whilst
the I915_RESET_IN_PROGRESS bit is set). But when we have a direct reset,
this allows us to clean up the reset worker faster.

v2: Remember to call wake_up_bit() after changing (for the faster wakeup
as promised)

Signed-off-by: Chris Wilson
Reviewed-by: Mika Kuoppala
---
 drivers/gpu/drm/i915/i915_drv.c | 14 ++++++++------
 drivers/gpu/drm/i915/i915_drv.h |  2 +-
 drivers/gpu/drm/i915/i915_irq.c | 31 ++++++++++++++++++-------------
 3 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index ff4173e6e298..f2614b2f59f7 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1726,8 +1726,8 @@ int i915_resume_switcheroo(struct drm_device *dev)
  * i915_reset - reset chip after a hang
  * @dev: drm device to reset
  *
- * Reset the chip. Useful if a hang is detected. Returns zero on successful
- * reset or otherwise an error code.
+ * Reset the chip. Useful if a hang is detected. Marks the device as wedged
+ * on failure.
  *
  * Caller must hold the struct_mutex.
  *
@@ -1739,7 +1739,7 @@ int i915_resume_switcheroo(struct drm_device *dev)
  *   - re-init interrupt state
  *   - re-init display
  */
-int i915_reset(struct drm_i915_private *dev_priv)
+void i915_reset(struct drm_i915_private *dev_priv)
 {
 	struct drm_device *dev = &dev_priv->drm;
 	struct i915_gpu_error *error = &dev_priv->gpu_error;
@@ -1748,7 +1748,7 @@ int i915_reset(struct drm_i915_private *dev_priv)
 	lockdep_assert_held(&dev->struct_mutex);
 
 	if (!test_and_clear_bit(I915_RESET_IN_PROGRESS, &error->flags))
-		return test_bit(I915_WEDGED, &error->flags) ? -EIO : 0;
+		return;
 
 	/* Clear any previous failed attempts at recovery. Time to try again. */
 	__clear_bit(I915_WEDGED, &error->flags);
@@ -1798,11 +1798,13 @@ int i915_reset(struct drm_i915_private *dev_priv)
 	intel_sanitize_gt_powersave(dev_priv);
 	intel_autoenable_gt_powersave(dev_priv);
 
-	return 0;
+wakeup:
+	wake_up_bit(&error->flags, I915_RESET_IN_PROGRESS);
+	return;
 
 error:
 	set_bit(I915_WEDGED, &error->flags);
-	return ret;
+	goto wakeup;
 }
 
 static int i915_pm_suspend(struct device *kdev)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 15f1977e356a..9a9f07f3574c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2884,7 +2884,7 @@ extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
 #endif
 extern int intel_gpu_reset(struct drm_i915_private *dev_priv, u32 engine_mask);
 extern bool intel_has_gpu_reset(struct drm_i915_private *dev_priv);
-extern int i915_reset(struct drm_i915_private *dev_priv);
+extern void i915_reset(struct drm_i915_private *dev_priv);
 extern int intel_guc_reset(struct drm_i915_private *dev_priv);
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
 extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2c7cb5041511..ef2d40278191 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2497,7 +2497,6 @@ static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
 	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
 	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
 	char *reset_done_event[] = { I915_ERROR_UEVENT "=0", NULL };
-	int ret;
 
 	kobject_uevent_env(kobj, KOBJ_CHANGE, error_event);
 
@@ -2512,24 +2511,30 @@ static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
 	 * simulated reset via debugs, so get an RPM reference.
 	 */
 	intel_runtime_pm_get(dev_priv);
-
 	intel_prepare_reset(dev_priv);
 
-	/*
-	 * All state reset _must_ be completed before we update the
-	 * reset counter, for otherwise waiters might miss the reset
-	 * pending state and not properly drop locks, resulting in
-	 * deadlocks with the reset work.
-	 */
-	mutex_lock(&dev_priv->drm.struct_mutex);
-	ret = i915_reset(dev_priv);
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	do {
+		/*
+		 * All state reset _must_ be completed before we update the
+		 * reset counter, for otherwise waiters might miss the reset
+		 * pending state and not properly drop locks, resulting in
+		 * deadlocks with the reset work.
+		 */
+		if (mutex_trylock(&dev_priv->drm.struct_mutex)) {
+			i915_reset(dev_priv);
+			mutex_unlock(&dev_priv->drm.struct_mutex);
+		}
 
-	intel_finish_reset(dev_priv);
+		/* We need to wait for anyone holding the lock to wakeup */
+	} while (wait_on_bit_timeout(&dev_priv->gpu_error.flags,
+				     I915_RESET_IN_PROGRESS,
+				     TASK_UNINTERRUPTIBLE,
+				     HZ));
 
+	intel_finish_reset(dev_priv);
 	intel_runtime_pm_put(dev_priv);
 
-	if (ret == 0)
+	if (!test_bit(I915_WEDGED, &dev_priv->gpu_error.flags))
 		kobject_uevent_env(kobj,
 				   KOBJ_CHANGE, reset_done_event);
 
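
For readers who want to experiment with the trylock-then-wait-on-bit loop outside the kernel, the following is a rough userspace sketch only, not part of the patch. It assumes POSIX threads, and every name in it (reset_state, do_reset, wait_flag_timeout, reset_worker) is hypothetical. pthread_mutex_trylock() stands in for mutex_trylock(), a condition variable with pthread_cond_timedwait() stands in for wait_on_bit_timeout(), and pthread_cond_broadcast() stands in for wake_up_bit().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
struct reset_state {
	pthread_mutex_t big_lock;    /* plays the role of struct_mutex */
	pthread_mutex_t flag_lock;   /* protects reset_pending */
	pthread_cond_t flag_cleared; /* broadcast when reset_pending is cleared */
	bool reset_pending;          /* plays the role of I915_RESET_IN_PROGRESS */
	int resets_done;
};

/* Analogue of i915_reset(): caller holds big_lock; clears the flag and wakes waiters. */
static void do_reset(struct reset_state *s)
{
	pthread_mutex_lock(&s->flag_lock);
	if (s->reset_pending) {
		s->reset_pending = false;
		s->resets_done++;
		pthread_cond_broadcast(&s->flag_cleared);
	}
	pthread_mutex_unlock(&s->flag_lock);
}

/* Sleep until the flag clears or timeout_ms elapses; returns true if still pending. */
static bool wait_flag_timeout(struct reset_state *s, long timeout_ms)
{
	struct timespec deadline;
	bool pending;

	assert(timeout_ms >= 0);
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&s->flag_lock);
	while (s->reset_pending) {
		/* A non-zero return here means the deadline passed. */
		if (pthread_cond_timedwait(&s->flag_cleared, &s->flag_lock,
					   &deadline) != 0)
			break;
	}
	pending = s->reset_pending;
	pthread_mutex_unlock(&s->flag_lock);
	return pending;
}

/* Analogue of the worker loop: try the lock, otherwise sleep on the flag and retry. */
static int reset_worker(struct reset_state *s, long timeout_ms)
{
	int attempts = 0;

	do {
		attempts++;
		if (pthread_mutex_trylock(&s->big_lock) == 0) {
			do_reset(s);
			pthread_mutex_unlock(&s->big_lock);
		}
	} while (wait_flag_timeout(s, timeout_ms));

	return attempts;
}
```

With big_lock uncontended, reset_worker() completes on its first pass. If another thread holds big_lock and clears the flag itself, the worker wakes on the broadcast rather than waiting for the lock, which is the faster cleanup the commit message describes.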