From patchwork Fri Feb 8 15:37:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10803299 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 853AF186D for ; Fri, 8 Feb 2019 15:37:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 757522E922 for ; Fri, 8 Feb 2019 15:37:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 69C852E932; Fri, 8 Feb 2019 15:37:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 03B342E922 for ; Fri, 8 Feb 2019 15:37:16 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6EF4E6EE2E; Fri, 8 Feb 2019 15:37:14 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 10A726EE25 for ; Fri, 8 Feb 2019 15:37:12 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 15511803-1500050 for ; Fri, 08 Feb 2019 15:37:09 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Fri, 8 Feb 2019 15:37:08 +0000 Message-Id: <20190208153708.20023-7-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190208153708.20023-1-chris@chris-wilson.co.uk> References: <20190208153708.20023-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [CI 7/7] drm/i915: Don't claim an unstarted request was guilty X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP If we haven't even begun executing the payload of the stalled request, then we should not claim that its userspace context was guilty of submitting a hanging batch. v2: Check for context corruption before trying to restart. v3: Preserve semaphores on skipping requests (need to keep the timelines intact). Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/intel_lrc.c | 42 +++++++++++++++++-- drivers/gpu/drm/i915/selftests/igt_spinner.c | 7 ++++ .../gpu/drm/i915/selftests/intel_hangcheck.c | 6 +++ 3 files changed, 52 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 5e98fd79bd9d..1b567a3f006a 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1387,6 +1387,10 @@ static int gen8_emit_init_breadcrumb(struct i915_request *rq) *cs++ = rq->fence.seqno - 1; intel_ring_advance(rq, cs); + + /* Record the updated position of the request's payload */ + rq->infix = intel_ring_offset(rq, cs); + return 0; } @@ -1878,6 +1882,23 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) spin_unlock_irqrestore(&engine->timeline.lock, flags); } +static bool lrc_regs_ok(const struct i915_request *rq) +{ + const struct intel_ring *ring = rq->ring; + const u32 *regs = rq->hw_context->lrc_reg_state; + + /* Quick spot check for the common signs of context corruption */ + + if (regs[CTX_RING_BUFFER_CONTROL + 1] != + (RING_CTL_SIZE(ring->size) | RING_VALID)) + return false; + + if (regs[CTX_RING_BUFFER_START + 1] != i915_ggtt_offset(ring->vma)) + return false; + + return true; +} + static void execlists_reset(struct intel_engine_cs *engine, bool stalled) { struct intel_engine_execlists * const execlists = &engine->execlists; @@ -1912,6 +1933,21 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled) if (!rq) goto out_unlock; + /* + * If this request hasn't started yet, e.g. it is waiting on a + * semaphore, we need to avoid skipping the request or else we + * break the signaling chain. However, if the context is corrupt + * the request will not restart and we will be stuck with a wedged + * device. It is quite often the case that if we issue a reset + * while the GPU is loading the context image, that the context + * image becomes corrupt. + * + * Otherwise, if we have not started yet, the request should replay + * perfectly and we do not need to flag the result as being erroneous. + */ + if (!i915_request_started(rq) && lrc_regs_ok(rq)) + goto out_unlock; + /* * If the request was innocent, we leave the request in the ELSP * and will try to replay it on restarting. The context image may @@ -1924,7 +1960,7 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled) * image back to the expected values to skip over the guilty request. */ i915_reset_request(rq, stalled); - if (!stalled) + if (!stalled && lrc_regs_ok(rq)) goto out_unlock; /* @@ -1942,8 +1978,8 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled) engine->context_size - PAGE_SIZE); } - /* Move the RING_HEAD onto the breadcrumb, past the hanging batch */ - rq->ring->head = intel_ring_wrap(rq->ring, rq->postfix); + /* Rerun the request; its payload has been neutered (if guilty). */ + rq->ring->head = intel_ring_wrap(rq->ring, rq->head); intel_ring_update_space(rq->ring); execlists_init_reg_state(regs, rq->gem_context, engine, rq->ring); diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c index 9ebd9225684e..d0b93a3fbc54 100644 --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c @@ -144,6 +144,13 @@ igt_spinner_create_request(struct igt_spinner *spin, i915_gem_chipset_flush(spin->i915); + if (engine->emit_init_breadcrumb && + rq->timeline->has_initial_breadcrumb) { + err = engine->emit_init_breadcrumb(rq); + if (err) + goto cancel_rq; + } + err = engine->emit_bb_start(rq, vma->node.start, PAGE_SIZE, 0); cancel_rq: diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c index 4886fac12628..92475596ff40 100644 --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c @@ -242,6 +242,12 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) *batch++ = MI_BATCH_BUFFER_END; /* not reached */ i915_gem_chipset_flush(h->i915); + if (rq->engine->emit_init_breadcrumb) { + err = rq->engine->emit_init_breadcrumb(rq); + if (err) + goto cancel_rq; + } + flags = 0; if (INTEL_GEN(vm->i915) <= 5) flags |= I915_DISPATCH_SECURE;