From patchwork Tue Jan 21 11:50:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11343627 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 433CB14B4 for ; Tue, 21 Jan 2020 11:51:11 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2BC7D20882 for ; Tue, 21 Jan 2020 11:51:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2BC7D20882 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 462D96EC91; Tue, 21 Jan 2020 11:51:10 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 82BA56EC8B for ; Tue, 21 Jan 2020 11:51:08 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 19957161-1500050 for multiple; Tue, 21 Jan 2020 11:50:54 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 21 Jan 2020 11:50:53 +0000 Message-Id: <20200121115053.217888-1-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.25.0 In-Reply-To: <20200121100927.114886-1-chris@chris-wilson.co.uk> References: <20200121100927.114886-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH] drm/i915/execlists: Reclaim the hanging virtual request X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" If we encounter a hang on a virtual engine, as we process the hang the request may already have been moved back to the virtual engine (we are processing the hang on the physical engine). We need to reclaim the request from the virtual engine so that the locking is consistent and local to the real engine on which we will hold the request for error state capturing. v2: Pull the reclamation into execlists_hold() and assert that cannot be called from outside of the reset (i.e. with the tasklet disabled). Fixes: 748317386afb ("drm/i915/execlists: Offline error capture") Testcase: igt/gem_exec_balancer/hang Signed-off-by: Chris Wilson Cc: Mika Kuoppala Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_lrc.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 3a30767ff0c4..11dfee4fe9ea 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -2396,6 +2396,35 @@ static void __execlists_hold(struct i915_request *rq) static void execlists_hold(struct intel_engine_cs *engine, struct i915_request *rq) { + if (rq->engine != engine) { /* preempted virtual engine */ + struct virtual_engine *ve = to_virtual_engine(rq->engine); + unsigned long flags; + + /* + * intel_context_inflight() is only protected by virtue + * of process_csb() being called only by the tasklet (or + * directly from inside reset while the tasklet is suspended). + * Assert that neither of those are allowed to run while we + * poke at the request queues. + */ + GEM_BUG_ON(!reset_in_progress(&engine->execlists)); + + /* + * An unsubmitted request along a virtual engine will + * remain on the active (this) engine until we are able + * to process the context switch away (and so mark the + * context as no longer in flight). That cannot have happened + * yet, otherwise we would not be hanging! + */ + spin_lock_irqsave(&ve->base.active.lock, flags); + GEM_BUG_ON(intel_context_inflight(rq->context) != engine); + GEM_BUG_ON(ve->request != rq); + ve->request = NULL; + spin_unlock_irqrestore(&ve->base.active.lock, flags); + + rq->engine = engine; + } + spin_lock_irq(&engine->active.lock); /*