From patchwork Fri Dec 8 01:17:20 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10101437 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 5DDE56056F for ; Fri, 8 Dec 2017 01:18:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4E35928902 for ; Fri, 8 Dec 2017 01:18:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 425C82897C; Fri, 8 Dec 2017 01:18:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id B32BD28902 for ; Fri, 8 Dec 2017 01:18:19 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7CE1B6E08B; Fri, 8 Dec 2017 01:18:18 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6C8376E08B for ; Fri, 8 Dec 2017 01:18:16 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 9911829-1500050 for multiple; Fri, 08 Dec 2017 01:17:22 +0000 Received: by haswell.alporthouse.com (sSMTP sendmail emulation); Fri, 08 Dec 2017 01:17:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Fri, 8 Dec 2017 01:17:20 +0000 Message-Id: <20171208011720.5553-1-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20171207235625.9135-1-chris@chris-wilson.co.uk> References: <20171207235625.9135-1-chris@chris-wilson.co.uk> X-Originating-IP: 78.156.65.138 X-Country: code=GB country="United Kingdom" ip=78.156.65.138 Subject: [Intel-gfx] [PATCH v2] drm/i915: Unwind i915_gem_init() failure X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Since Michal introduced new errors other than -EIO during i915_gem_init(), we need to actually unwind on the error path as we have to abort the module load (and we expect to do so cleanly!). As we now teardown key state and then mark the driver as wedged (on EIO), we have to be careful to not allow ourselves to resume and unwedge, thus attempting to use the uninitialised driver. v2: Try not to free driver state for the suppressed EIO References: 8620eb1dbbf2 ("drm/i915/uc: Don't use -EIO to report missing firmware") Signed-off-by: Chris Wilson Cc: Michal Wajdeczko Cc: Joonas Lahtinen Cc: Sagar Arun Kamble Signed-off-by: Chris Wilson Reviewed-by: MichaƂ Winiarski --- drivers/gpu/drm/i915/i915_gem.c | 82 +++++++++++++++++++++++++++++++++-------- 1 file changed, 67 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index c7b5db78fbb4..ee243e1ef706 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3235,7 +3235,12 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915) if (!test_bit(I915_WEDGED, &i915->gpu_error.flags)) return true; - /* Before unwedging, make sure that all pending operations + /* Never successfully initialised, so can not unwedge? */ + if (!i915->kernel_context) + return false; + + /* + * Before unwedging, make sure that all pending operations * are flushed and errored out - we may have requests waiting upon * third party fences. We marked all inflight requests as EIO, and * every execbuf since returned EIO, for consistency we want all @@ -4853,7 +4858,8 @@ void i915_gem_resume(struct drm_i915_private *i915) i915_gem_restore_gtt_mappings(i915); i915_gem_restore_fences(i915); - /* As we didn't flush the kernel context before suspend, we cannot + /* + * As we didn't flush the kernel context before suspend, we cannot * guarantee that the context image is complete. So let's just reset * it and start again. */ @@ -4874,8 +4880,10 @@ void i915_gem_resume(struct drm_i915_private *i915) return; err_wedged: - DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n"); - i915_gem_set_wedged(i915); + if (!i915_terminally_wedged(&i915->gpu_error)) { + DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n"); + i915_gem_set_wedged(i915); + } goto out_unlock; } @@ -5158,22 +5166,28 @@ int i915_gem_init(struct drm_i915_private *dev_priv) intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); ret = i915_gem_init_ggtt(dev_priv); - if (ret) - goto out_unlock; + if (ret) { + GEM_BUG_ON(ret == -EIO); + goto err_unlock; + } ret = i915_gem_contexts_init(dev_priv); - if (ret) - goto out_unlock; + if (ret) { + GEM_BUG_ON(ret == -EIO); + goto err_ggtt; + } ret = intel_engines_init(dev_priv); - if (ret) - goto out_unlock; + if (ret) { + GEM_BUG_ON(ret == -EIO); + goto err_context; + } intel_init_gt_powersave(dev_priv); ret = i915_gem_init_hw(dev_priv); if (ret) - goto out_unlock; + goto err_pm; /* * Despite its name intel_init_clock_gating applies both display @@ -5187,9 +5201,48 @@ int i915_gem_init(struct drm_i915_private *dev_priv) intel_init_clock_gating(dev_priv); ret = __intel_engines_record_defaults(dev_priv); -out_unlock: + if (ret) + goto err_init_hw; + + if (i915_inject_load_failure()) { + ret = -EIO; + goto err_init_hw; + } + + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); + mutex_unlock(&dev_priv->drm.struct_mutex); + + return 0; + + /* + * Unwinding is complicated by that we want to handle -EIO to mean + * disable GPU submission but keep KMS alive. We want to mark the + * HW as irrevisibly wedged, but keep enough state around that the + * driver doesn't explode during runtime. + */ +err_init_hw: + i915_gem_wait_for_idle(dev_priv, I915_WAIT_LOCKED); + i915_gem_contexts_lost(dev_priv); + intel_uc_fini_hw(dev_priv); +err_pm: + if (ret != -EIO) { + intel_cleanup_gt_powersave(dev_priv); + i915_gem_cleanup_engines(dev_priv); + } +err_context: + if (ret != -EIO) + i915_gem_contexts_fini(dev_priv); +err_ggtt: +err_unlock: + intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); + mutex_unlock(&dev_priv->drm.struct_mutex); + + if (ret != -EIO) + i915_gem_cleanup_userptr(dev_priv); + if (ret == -EIO) { - /* Allow engine initialisation to fail by marking the GPU as + /* + * Allow engine initialisation to fail by marking the GPU as * wedged. But we only want to do this where the GPU is angry, * for all other failure, such as an allocation failure, bail. */ @@ -5199,9 +5252,8 @@ int i915_gem_init(struct drm_i915_private *dev_priv) } ret = 0; } - intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); - mutex_unlock(&dev_priv->drm.struct_mutex); + i915_gem_drain_freed_objects(dev_priv); return ret; }