From patchwork Fri Aug 10 14:00:35 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mika Kuoppala X-Patchwork-Id: 10562719 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 338F2157B for ; Fri, 10 Aug 2018 14:01:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 182F12BBE4 for ; Fri, 10 Aug 2018 14:01:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0B5432BBEC; Fri, 10 Aug 2018 14:01:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8D7052BBE4 for ; Fri, 10 Aug 2018 14:01:42 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 24A256E93C; Fri, 10 Aug 2018 14:01:41 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTPS id 88BDB6E940 for ; Fri, 10 Aug 2018 14:01:39 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Aug 2018 07:01:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,219,1531810800"; d="scan'208";a="79229232" Received: from rosetta.fi.intel.com ([10.237.72.186]) by fmsmga004.fm.intel.com with ESMTP; 10 Aug 2018 07:01:37 -0700 Received: by rosetta.fi.intel.com (Postfix, from userid 1000) id BCEE2840480; Fri, 10 Aug 2018 17:00:38 +0300 (EEST) From: Mika Kuoppala To: intel-gfx@lists.freedesktop.org Date: Fri, 10 Aug 2018 17:00:35 +0300 Message-Id: <20180810140036.24240-1-mika.kuoppala@linux.intel.com> X-Mailer: git-send-email 2.17.1 Subject: [Intel-gfx] [PATCH 1/2] drm/i915: Expose retry count to per gen reset logic X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP There is a possibility for per gen reset logic to be more nasty if the softer approach on resetting does not bear fruit. Expose retry count to per gen reset logic if it wants to take such tough measures. Cc: Chris Wilson Signed-off-by: Mika Kuoppala Reviewed-by: Chris Wilson --- drivers/gpu/drm/i915/intel_uncore.c | 40 +++++++++++++++++++---------- 1 file changed, 26 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index c2fcb51fc58a..027d14574bfa 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1739,7 +1739,7 @@ static void gen3_stop_engine(struct intel_engine_cs *engine) } static void i915_stop_engines(struct drm_i915_private *dev_priv, - unsigned engine_mask) + unsigned int engine_mask) { struct intel_engine_cs *engine; enum intel_engine_id id; @@ -1759,7 +1759,9 @@ static bool i915_in_reset(struct pci_dev *pdev) return gdrst & GRDOM_RESET_STATUS; } -static int i915_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) +static int i915_do_reset(struct drm_i915_private *dev_priv, + unsigned int engine_mask, + unsigned int retry) { struct pci_dev *pdev = dev_priv->drm.pdev; int err; @@ -1786,7 +1788,9 @@ static bool g4x_reset_complete(struct pci_dev *pdev) return (gdrst & GRDOM_RESET_ENABLE) == 0; } -static int g33_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) +static int g33_do_reset(struct drm_i915_private *dev_priv, + unsigned int engine_mask, + unsigned int retry) { struct pci_dev *pdev = dev_priv->drm.pdev; @@ -1794,7 +1798,9 @@ static int g33_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) return wait_for(g4x_reset_complete(pdev), 500); } -static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) +static int g4x_do_reset(struct drm_i915_private *dev_priv, + unsigned int engine_mask, + unsigned int retry) { struct pci_dev *pdev = dev_priv->drm.pdev; int ret; @@ -1831,7 +1837,8 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) } static int ironlake_do_reset(struct drm_i915_private *dev_priv, - unsigned engine_mask) + unsigned int engine_mask, + unsigned int retry) { int ret; @@ -1887,6 +1894,7 @@ static int gen6_hw_domain_reset(struct drm_i915_private *dev_priv, * gen6_reset_engines - reset individual engines * @dev_priv: i915 device * @engine_mask: mask of intel_ring_flag() engines or ALL_ENGINES for full reset + * @retry: the count of of previous attempts to reset. * * This function will reset the individual engines that are set in engine_mask. * If you provide ALL_ENGINES as mask, full global domain reset will be issued. @@ -1897,7 +1905,8 @@ static int gen6_hw_domain_reset(struct drm_i915_private *dev_priv, * Returns 0 on success, nonzero on error. */ static int gen6_reset_engines(struct drm_i915_private *dev_priv, - unsigned engine_mask) + unsigned int engine_mask, + unsigned int retry) { struct intel_engine_cs *engine; const u32 hw_engine_mask[I915_NUM_ENGINES] = { @@ -1936,7 +1945,7 @@ static int gen6_reset_engines(struct drm_i915_private *dev_priv, * Returns 0 on success, nonzero on error. */ static int gen11_reset_engines(struct drm_i915_private *dev_priv, - unsigned engine_mask) + unsigned int engine_mask) { struct intel_engine_cs *engine; const u32 hw_engine_mask[I915_NUM_ENGINES] = { @@ -2105,7 +2114,8 @@ static void gen8_reset_engine_cancel(struct intel_engine_cs *engine) } static int gen8_reset_engines(struct drm_i915_private *dev_priv, - unsigned engine_mask) + unsigned int engine_mask, + unsigned int retry) { struct intel_engine_cs *engine; unsigned int tmp; @@ -2121,7 +2131,7 @@ static int gen8_reset_engines(struct drm_i915_private *dev_priv, if (INTEL_GEN(dev_priv) >= 11) ret = gen11_reset_engines(dev_priv, engine_mask); else - ret = gen6_reset_engines(dev_priv, engine_mask); + ret = gen6_reset_engines(dev_priv, engine_mask, retry); not_ready: for_each_engine_masked(engine, dev_priv, engine_mask, tmp) @@ -2130,7 +2140,8 @@ static int gen8_reset_engines(struct drm_i915_private *dev_priv, return ret; } -typedef int (*reset_func)(struct drm_i915_private *, unsigned engine_mask); +typedef int (*reset_func)(struct drm_i915_private *, + unsigned int engine_mask, unsigned int retry); static reset_func intel_get_gpu_reset(struct drm_i915_private *dev_priv) { @@ -2153,10 +2164,10 @@ static reset_func intel_get_gpu_reset(struct drm_i915_private *dev_priv) return NULL; } -int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) +int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned int engine_mask) { reset_func reset = intel_get_gpu_reset(dev_priv); - int retry; + unsigned int retry; int ret; /* @@ -2200,8 +2211,9 @@ int intel_gpu_reset(struct drm_i915_private *dev_priv, unsigned engine_mask) ret = -ENODEV; if (reset) { - GEM_TRACE("engine_mask=%x\n", engine_mask); - ret = reset(dev_priv, engine_mask); + ret = reset(dev_priv, engine_mask, retry); + GEM_TRACE("engine_mask=%x, ret=%d, retry=%d\n", + engine_mask, ret, retry); } if (ret != -ETIMEDOUT || engine_mask != ALL_ENGINES) break; From patchwork Fri Aug 10 14:00:36 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mika Kuoppala X-Patchwork-Id: 10562721 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F197714C0 for ; Fri, 10 Aug 2018 14:01:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E19792BBE4 for ; Fri, 10 Aug 2018 14:01:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D5ACC2BBEC; Fri, 10 Aug 2018 14:01:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 884122BBE4 for ; Fri, 10 Aug 2018 14:01:43 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4D5CD6E940; Fri, 10 Aug 2018 14:01:41 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8BCCB6E941 for ; Fri, 10 Aug 2018 14:01:39 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Aug 2018 07:01:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,219,1531810800"; d="scan'208";a="65125499" Received: from rosetta.fi.intel.com ([10.237.72.186]) by orsmga006.jf.intel.com with ESMTP; 10 Aug 2018 07:01:38 -0700 Received: by rosetta.fi.intel.com (Postfix, from userid 1000) id BECC384047E; Fri, 10 Aug 2018 17:00:38 +0300 (EEST) From: Mika Kuoppala To: intel-gfx@lists.freedesktop.org Date: Fri, 10 Aug 2018 17:00:36 +0300 Message-Id: <20180810140036.24240-2-mika.kuoppala@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180810140036.24240-1-mika.kuoppala@linux.intel.com> References: <20180810140036.24240-1-mika.kuoppala@linux.intel.com> Subject: [Intel-gfx] [PATCH 2/2] drm/i915: Force reset on unready engine X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP If engine reports that it is not ready for reset, we give up. Evidence shows that forcing a per engine reset on an engine which is not reporting to be ready for reset, can bring it back into a working order. There is risk that we corrupt the context image currently executing on that engine. But that is a risk worth taking as if we unblock the engine, we prevent a whole device wedging in a case of full gpu reset. Reset individual engine even if it reports that it is not prepared for reset, but only if we aim for full gpu reset and not on first reset attempt. v2: force reset only on later attempts, readability (Chris) Cc: Chris Wilson Signed-off-by: Mika Kuoppala --- drivers/gpu/drm/i915/intel_uncore.c | 49 +++++++++++++++++++++++------ 1 file changed, 39 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 027d14574bfa..d24026839b17 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -2099,9 +2099,6 @@ static int gen8_reset_engine_start(struct intel_engine_cs *engine) RESET_CTL_READY_TO_RESET, 700, 0, NULL); - if (ret) - DRM_ERROR("%s: reset request timeout\n", engine->name); - return ret; } @@ -2113,6 +2110,42 @@ static void gen8_reset_engine_cancel(struct intel_engine_cs *engine) _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET)); } +static int reset_engines(struct drm_i915_private *i915, + unsigned int engine_mask, + unsigned int retry) +{ + if (INTEL_GEN(i915) >= 11) + return gen11_reset_engines(i915, engine_mask); + else + return gen6_reset_engines(i915, engine_mask, retry); +} + +static int gen8_prepare_engine_for_reset(struct intel_engine_cs *engine, + unsigned int retry) +{ + const bool force_reset_on_non_ready = retry >= 1; + int ret; + + ret = gen8_reset_engine_start(engine); + + if (ret && force_reset_on_non_ready) { + /* + * Try to unblock a single non-ready engine by risking + * context corruption. + */ + ret = reset_engines(engine->i915, + intel_engine_flag(engine), + retry); + if (!ret) + ret = gen8_reset_engine_start(engine); + + DRM_ERROR("%s: reset request timeout, forcing reset (%d)\n", + engine->name, ret); + } + + return ret; +} + static int gen8_reset_engines(struct drm_i915_private *dev_priv, unsigned int engine_mask, unsigned int retry) @@ -2122,16 +2155,12 @@ static int gen8_reset_engines(struct drm_i915_private *dev_priv, int ret; for_each_engine_masked(engine, dev_priv, engine_mask, tmp) { - if (gen8_reset_engine_start(engine)) { - ret = -EIO; + ret = gen8_prepare_engine_for_reset(engine, retry); + if (ret) goto not_ready; - } } - if (INTEL_GEN(dev_priv) >= 11) - ret = gen11_reset_engines(dev_priv, engine_mask); - else - ret = gen6_reset_engines(dev_priv, engine_mask, retry); + ret = reset_engines(dev_priv, engine_mask, retry); not_ready: for_each_engine_masked(engine, dev_priv, engine_mask, tmp)