From patchwork Fri Apr 27 20:29:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10369823 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 46F576038F for ; Fri, 27 Apr 2018 20:30:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2DB0829272 for ; Fri, 27 Apr 2018 20:30:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 22D8229287; Fri, 27 Apr 2018 20:30:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 99B2D29274 for ; Fri, 27 Apr 2018 20:30:02 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A56BB89F53; Fri, 27 Apr 2018 20:30:01 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id E345A89F53 for ; Fri, 27 Apr 2018 20:29:59 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 11523668-1500050 for multiple; Fri, 27 Apr 2018 21:29:24 +0100 Received: by haswell.alporthouse.com (sSMTP sendmail emulation); Fri, 27 Apr 2018 21:29:21 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Fri, 27 Apr 2018 21:29:20 +0100 Message-Id: <20180427202920.19440-1-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.17.0 In-Reply-To: <20180427193224.11628-1-chris@chris-wilson.co.uk> References: <20180427193224.11628-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 X-Originating-IP: 78.156.65.138 X-Country: code=GB country="United Kingdom" ip=78.156.65.138 Subject: [Intel-gfx] [PATCH v3] drm/i915/lrc: Scrub the GPU state of the guilty hanging request X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Previously, we just reset the ring register in the context image such that we could skip over the broken batch and emit the closing breadcrumb. However, on resume the context image and GPU state would be reloaded, which may have been left in an inconsistent state by the reset. The presumption was that at worst it would just cause another reset and skip again until it recovered, however it seems just as likely to cause an unrecoverable hang. Instead of risking loading an incomplete context image, restore it back to the default state. v2: Fix up off-by-one from including the ppHSWP in with the register state. v3: Use a ring local to compact a few lines. Signed-off-by: Chris Wilson Cc: Mika Kuoppala Cc: MichaƂ Winiarski Cc: Michel Thierry Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/intel_lrc.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index ce23d5116482..bbca79bf19cc 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1804,8 +1804,9 @@ static void reset_common_ring(struct intel_engine_cs *engine, struct i915_request *request) { struct intel_engine_execlists * const execlists = &engine->execlists; - struct intel_context *ce; + struct intel_ring * const ring = request->ring; unsigned long flags; + u32 *regs; GEM_TRACE("%s request global=%x, current=%d\n", engine->name, request ? request->global_seqno : 0, @@ -1855,17 +1856,27 @@ static void reset_common_ring(struct intel_engine_cs *engine, * future request will be after userspace has had the opportunity * to recreate its own state. */ - ce = &request->ctx->engine[engine->id]; - execlists_init_reg_state(ce->lrc_reg_state, - request->ctx, engine, ce->ring); + regs = request->ctx->engine[engine->id].lrc_reg_state; + if (engine->default_state) { + void *defaults; + + defaults = i915_gem_object_pin_map(engine->default_state, + I915_MAP_WB); + if (!IS_ERR(defaults)) { + memcpy(regs, /* skip restoring to the vanilla PPHWSP */ + defaults + LRC_STATE_PN * PAGE_SIZE, + engine->context_size - PAGE_SIZE); + i915_gem_object_unpin_map(engine->default_state); + } + } + execlists_init_reg_state(regs, request->ctx, engine, ring); /* Move the RING_HEAD onto the breadcrumb, past the hanging batch */ - ce->lrc_reg_state[CTX_RING_BUFFER_START+1] = - i915_ggtt_offset(ce->ring->vma); - ce->lrc_reg_state[CTX_RING_HEAD+1] = request->postfix; + regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(ring->vma); + regs[CTX_RING_HEAD + 1] = request->postfix; - request->ring->head = request->postfix; - intel_ring_update_space(request->ring); + ring->head = request->postfix; + intel_ring_update_space(ring); /* Reset WaIdleLiteRestore:bdw,skl as well */ unwind_wa_tail(request);