From patchwork Tue Aug 14 14:41:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10565771 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3377D15A6 for ; Tue, 14 Aug 2018 14:41:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 285D22A030 for ; Tue, 14 Aug 2018 14:41:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 266DF2A01C; Tue, 14 Aug 2018 14:41:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 80CEF2A072 for ; Tue, 14 Aug 2018 14:41:33 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C6E336E281; Tue, 14 Aug 2018 14:41:32 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id CDBB26E281 for ; Tue, 14 Aug 2018 14:41:30 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 12902316-1500050 for multiple; Tue, 14 Aug 2018 15:41:23 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 14 Aug 2018 15:41:19 +0100 Message-Id: <20180814144120.18714-1-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.18.0 MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Clear STOP_RING bit before restoring the context X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Before a reset, we set the STOP_RING bit of RING_MI_MODE to freeze the engine. However, sometimes we observe that upon restart, the engine freezes again with the STOP_RING bit still asserted. By inspection, we know that the register itself is cleared by the GPU reset, so that bit must be preserved inside the context image and reloaded from there. A simple fix (as the RING_MI_MODE is at a fixed offset in a valid context) is to clobber the STOP_RING bit inside the image before replaying the request. Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Cc: Michel Thierry Cc: MichaƂ Winiarski Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++-- drivers/gpu/drm/i915/intel_lrc_reg.h | 2 ++ 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 3f90c74038ef..37fe842de639 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1918,6 +1918,20 @@ static void execlists_reset(struct intel_engine_cs *engine, spin_unlock_irqrestore(&engine->timeline.lock, flags); + if (!request) + return; + + regs = request->hw_context->lrc_reg_state; + + /* + * After a reset, the context may have preserved the STOP bit + * of RING_MI_MODE we used to freeze the active engine before + * the reset. If that bit is restored the ring stops instead + * of being executed. + */ + regs[CTX_MI_MODE + 1] |= STOP_RING << 16; + regs[CTX_MI_MODE + 1] &= ~STOP_RING; + /* * If the request was innocent, we leave the request in the ELSP * and will try to replay it on restarting. The context image may @@ -1929,7 +1943,7 @@ static void execlists_reset(struct intel_engine_cs *engine, * and have to at least restore the RING register in the context * image back to the expected values to skip over the guilty request. */ - if (!request || request->fence.error != -EIO) + if (request->fence.error != -EIO) return; /* @@ -1940,7 +1954,6 @@ static void execlists_reset(struct intel_engine_cs *engine, * future request will be after userspace has had the opportunity * to recreate its own state. */ - regs = request->hw_context->lrc_reg_state; if (engine->pinned_default_state) { memcpy(regs, /* skip restoring the vanilla PPHWSP */ engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE, diff --git a/drivers/gpu/drm/i915/intel_lrc_reg.h b/drivers/gpu/drm/i915/intel_lrc_reg.h index 5ef932d810a7..3b155ecbfa92 100644 --- a/drivers/gpu/drm/i915/intel_lrc_reg.h +++ b/drivers/gpu/drm/i915/intel_lrc_reg.h @@ -39,6 +39,8 @@ #define CTX_R_PWR_CLK_STATE 0x42 #define CTX_END 0x44 +#define CTX_MI_MODE 0x54 + #define CTX_REG(reg_state, pos, reg, val) do { \ u32 *reg_state__ = (reg_state); \ const u32 pos__ = (pos); \ From patchwork Tue Aug 14 14:41:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10565769 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3194415A6 for ; Tue, 14 Aug 2018 14:41:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 26D792A0DC for ; Tue, 14 Aug 2018 14:41:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 23CA52A060; Tue, 14 Aug 2018 14:41:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id BE88D2A0BC for ; Tue, 14 Aug 2018 14:41:31 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 30E0E6E282; Tue, 14 Aug 2018 14:41:31 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 058626E281 for ; Tue, 14 Aug 2018 14:41:29 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 12902317-1500050 for multiple; Tue, 14 Aug 2018 15:41:23 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 14 Aug 2018 15:41:20 +0100 Message-Id: <20180814144120.18714-2-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180814144120.18714-1-chris@chris-wilson.co.uk> References: <20180814144120.18714-1-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 2/2] drm/i915/execlists: Refind the active request before resetting X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP When resetting the context image after a GPU reset, it is vital that we do inspect the context image that was active at the time of the hang. Even a 'pardoned' context may still have some residual corruption (e.g. the STOP_RING bit) from issuing the GPU reset that we need to fixup before continuing. Signed-off-by: Chris Wilson Cc: Mika Kuoppala --- drivers/gpu/drm/i915/intel_lrc.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 37fe842de639..246daacb545e 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -322,9 +322,10 @@ static void unwind_wa_tail(struct i915_request *rq) assert_ring_tail_valid(rq->ring, rq->tail); } -static void __unwind_incomplete_requests(struct intel_engine_cs *engine) +static struct i915_request * +__unwind_incomplete_requests(struct intel_engine_cs *engine) { - struct i915_request *rq, *rn; + struct i915_request *rq, *rn, *active = NULL; struct i915_priolist *uninitialized_var(p); int last_prio = I915_PRIORITY_INVALID; @@ -334,7 +335,7 @@ static void __unwind_incomplete_requests(struct intel_engine_cs *engine) &engine->timeline.requests, link) { if (i915_request_completed(rq)) - return; + break; __i915_request_unsubmit(rq); unwind_wa_tail(rq); @@ -347,7 +348,11 @@ static void __unwind_incomplete_requests(struct intel_engine_cs *engine) GEM_BUG_ON(p->priority != rq_prio(rq)); list_add(&rq->sched.link, &p->requests); + + active = rq; } + + return active; } void @@ -1911,7 +1916,7 @@ static void execlists_reset(struct intel_engine_cs *engine, execlists_cancel_port_requests(execlists); /* Push back any incomplete requests for replay after the reset. */ - __unwind_incomplete_requests(engine); + request = __unwind_incomplete_requests(engine); /* Following the reset, we need to reload the CSB read/write pointers */ reset_csb_pointers(&engine->execlists);