From patchwork Mon Feb 4 13:21:53 2019
X-Patchwork-Id: 10795627
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Feb 2019 13:21:53 +0000
Message-Id: <20190204132214.9459-2-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
References: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 01/22] drm/i915/execlists: Suppress mere WAIT preemption

WAIT is occasionally suppressed by virtue of preempted requests being
promoted to NEWCLIENT if they have not already received that boost. Make
this consistent for all WAIT boosts: they are not allowed to preempt
executing contexts and are merely granted the right to be at the front
of the queue for the next execution slot. This is in keeping with the
desire that the WAIT boost be a minor tweak that does not give excessive
promotion to its user or open us up to trivial abuse.

The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, and we end up relying on the exact execution
order being consistent across engines (e.g. using HW semaphores to
coordinate parallel execution).

v2: Also protect GuC submission from false preemption loops.
v3: Build bug safeguards and better debug messages for the selftest.

Signed-off-by: Chris Wilson
Cc: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_request.c          |  12 ++
 drivers/gpu/drm/i915/i915_scheduler.h        |   2 +
 drivers/gpu/drm/i915/intel_lrc.c             |   9 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c |   9 +-
 drivers/gpu/drm/i915/selftests/intel_lrc.c   | 160 +++++++++++++++++++
 5 files changed, 190 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9ed5baf157a3..d14a1b225f47 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -377,12 +377,24 @@ void __i915_request_submit(struct i915_request *request)

 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
+	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
+
 	request->global_seqno = seqno;
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
 	    !i915_request_enable_breadcrumb(request))
 		intel_engine_queue_breadcrumbs(engine);
+
+	/*
+	 * As we do not allow WAIT to preempt inflight requests,
+	 * once we have executed a request, along with triggering
+	 * any execution callbacks, we must preserve its ordering
+	 * within the non-preemptible FIFO.
+	 */
+	BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
+	request->sched.attr.priority |= __NO_PREEMPTION;
+
 	spin_unlock(&request->lock);

 	engine->emit_fini_breadcrumb(request,

diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index dbe9cb7ecd82..54bd6c89817e 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -33,6 +33,8 @@ enum {
 #define I915_PRIORITY_WAIT	((u8)BIT(0))
 #define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))

+#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
+
 struct i915_sched_attr {
 	/**
 	 * @priority: execution and service priority

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a9eb0211ce77..773df0bd685b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -188,6 +188,12 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }

+static int effective_prio(const struct i915_request *rq)
+{
+	/* Restrict mere WAIT boosts from triggering preemption */
+	return rq_prio(rq) | __NO_PREEMPTION;
+}
+
 static int queue_prio(const struct intel_engine_execlists *execlists)
 {
 	struct i915_priolist *p;
@@ -208,7 +214,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 static inline bool need_preempt(const struct intel_engine_cs *engine,
 				const struct i915_request *rq)
 {
-	const int last_prio = rq_prio(rq);
+	int last_prio;

 	if (!intel_engine_has_preemption(engine))
 		return false;
@@ -228,6 +234,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	 * preempt. If that hint is stale or we may be trying to preempt
 	 * ourselves, ignore the request.
 	 */
+	last_prio = effective_prio(rq);
 	if (!__execlists_need_preempt(engine->execlists.queue_priority_hint,
 				      last_prio))
 		return false;

diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 9ebd9225684e..86354e51bdd3 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -142,10 +142,17 @@ igt_spinner_create_request(struct igt_spinner *spin,
 	*batch++ = upper_32_bits(vma->node.start);
 	*batch++ = MI_BATCH_BUFFER_END; /* not reached */

-	i915_gem_chipset_flush(spin->i915);
+	if (engine->emit_init_breadcrumb &&
+	    rq->timeline->has_initial_breadcrumb) {
+		err = engine->emit_init_breadcrumb(rq);
+		if (err)
+			goto cancel_rq;
+	}

 	err = engine->emit_bb_start(rq, vma->node.start, PAGE_SIZE, 0);

+	i915_gem_chipset_flush(spin->i915);
+
+cancel_rq:
 	if (err) {
 		i915_request_skip(rq, err);

diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index fb35f53c9ce3..16037a841146 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -405,6 +405,165 @@ static int live_suppress_self_preempt(void *arg)
 	goto err_client_b;
 }

+static int __i915_sw_fence_call
+dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+{
+	return NOTIFY_DONE;
+}
+
+static struct i915_request *dummy_request(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+
+	rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
+	if (!rq)
+		return NULL;
+
+	INIT_LIST_HEAD(&rq->active_list);
+	rq->engine = engine;
+
+	i915_sched_node_init(&rq->sched);
+
+	/* mark this request as permanently incomplete */
+	rq->fence.seqno = 1;
+	BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
+	rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
+
+	i915_sw_fence_init(&rq->submit, dummy_notify);
+	i915_sw_fence_commit(&rq->submit);
+
+	return rq;
+}
+
+static void dummy_request_free(struct i915_request *dummy)
+{
+	i915_request_mark_complete(dummy);
+	i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
+	kfree(dummy);
+}
+
+static int live_suppress_wait_preempt(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct preempt_client client[4];
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = -ENOMEM;
+	int i;
+
+	/*
+	 * Waiters are given a little priority nudge, but not enough
+	 * to actually cause any preemption. Double check that we do
+	 * not needlessly generate preempt-to-idle cycles.
+	 */
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	if (preempt_client_init(i915, &client[0])) /* ELSP[0] */
+		goto err_unlock;
+	if (preempt_client_init(i915, &client[1])) /* ELSP[1] */
+		goto err_client_0;
+	if (preempt_client_init(i915, &client[2])) /* head of queue */
+		goto err_client_1;
+	if (preempt_client_init(i915, &client[3])) /* bystander */
+		goto err_client_2;
+
+	for_each_engine(engine, i915, id) {
+		int depth;
+
+		if (!engine->emit_init_breadcrumb)
+			continue;
+
+		for (depth = 0; depth < ARRAY_SIZE(client); depth++) {
+			struct i915_request *rq[ARRAY_SIZE(client)];
+			struct i915_request *dummy;
+
+			engine->execlists.preempt_hang.count = 0;
+
+			dummy = dummy_request(engine);
+			if (!dummy)
+				goto err_client_3;
+
+			for (i = 0; i < ARRAY_SIZE(client); i++) {
+				rq[i] = igt_spinner_create_request(&client[i].spin,
+								   client[i].ctx, engine,
+								   MI_NOOP);
+				if (IS_ERR(rq[i])) {
+					err = PTR_ERR(rq[i]);
+					goto err_wedged;
+				}
+
+				/* Disable NEWCLIENT promotion */
+				i915_gem_active_set(&rq[i]->timeline->last_request,
+						    dummy);
+				i915_request_add(rq[i]);
+			}
+
+			dummy_request_free(dummy);
+
+			GEM_BUG_ON(i915_request_completed(rq[0]));
+			if (!igt_wait_for_spinner(&client[0].spin, rq[0])) {
+				pr_err("%s: First client failed to start\n",
+				       engine->name);
+				goto err_wedged;
+			}
+			GEM_BUG_ON(!i915_request_started(rq[0]));
+
+			if (i915_request_wait(rq[depth],
+					      I915_WAIT_LOCKED |
+					      I915_WAIT_PRIORITY,
+					      1) != -ETIME) {
+				pr_err("%s: Waiter depth:%d completed!\n",
+				       engine->name, depth);
+				goto err_wedged;
+			}
+
+			for (i = 0; i < ARRAY_SIZE(client); i++)
+				igt_spinner_end(&client[i].spin);
+
+			if (igt_flush_test(i915, I915_WAIT_LOCKED))
+				goto err_wedged;
+
+			if (engine->execlists.preempt_hang.count) {
+				pr_err("%s: Preemption recorded x%d, depth %d; should have been suppressed!\n",
+				       engine->name,
+				       engine->execlists.preempt_hang.count,
+				       depth);
+				err = -EINVAL;
+				goto err_client_3;
+			}
+		}
+	}
+
+	err = 0;
+err_client_3:
+	preempt_client_fini(&client[3]);
+err_client_2:
+	preempt_client_fini(&client[2]);
+err_client_1:
+	preempt_client_fini(&client[1]);
+err_client_0:
+	preempt_client_fini(&client[0]);
+err_unlock:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+
+err_wedged:
+	for (i = 0; i < ARRAY_SIZE(client); i++)
+		igt_spinner_end(&client[i].spin);
+	i915_gem_set_wedged(i915);
+	err = -EIO;
+	goto err_client_3;
+}
+
 static int live_preempt_hang(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -785,6 +944,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
 		SUBTEST(live_suppress_self_preempt),
+		SUBTEST(live_suppress_wait_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
 	};
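To illustrate the arithmetic: below is a minimal userspace model of the
need_preempt() comparison, not the driver's code. The bit layout (two
internal bits below an assumed user-priority shift of 2) is an
illustrative assumption mirroring i915_scheduler.h of this period. It
shows that a mere WAIT boost can no longer cross the preemption
threshold, while a NEWCLIENT boost or a higher user priority still can:

    #include <stdbool.h>
    #include <stdio.h>

    /* Assumed layout: two internal bits below the user priority shift */
    #define PRIO_WAIT      (1 << 0)        /* I915_PRIORITY_WAIT */
    #define PRIO_NEWCLIENT (1 << 1)        /* I915_PRIORITY_NEWCLIENT */
    #define USER_PRIO(x)   ((x) << 2)      /* assumed shift for illustration */
    #define NO_PREEMPTION  PRIO_WAIT       /* __NO_PREEMPTION */

    /* The queue's priority hint must strictly exceed the running request */
    static bool need_preempt(int hint, int active)
    {
            /* effective_prio(): OR the WAIT bit into the running request */
            return hint > (active | NO_PREEMPTION);
    }

    int main(void)
    {
            int active = USER_PRIO(0);

            /* Same user priority + WAIT boost: suppressed (prints 0) */
            printf("%d\n", need_preempt(active | PRIO_WAIT, active));
            /* NEWCLIENT boost: still preempts (prints 1) */
            printf("%d\n", need_preempt(active | PRIO_NEWCLIENT, active));
            /* Genuinely higher user priority: always preempts (prints 1) */
            printf("%d\n", need_preempt(USER_PRIO(1), active));
            return 0;
    }

The waiter is still moved to the head of the queue, so it runs in the
very next execution slot; it just cannot force a preempt-to-idle cycle
on its own.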
From patchwork Mon Feb 4 13:21:54 2019
X-Patchwork-Id: 10795599
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Feb 2019 13:21:54 +0000
Message-Id: <20190204132214.9459-3-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
References: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 02/22] drm/i915/execlists: Suppress redundant preemption

On unwinding the active request we give it a small (limited to internal
priority levels) boost to prevent it from being gazumped a second time.
However, this means that it can be promoted to above the request that
triggered the preemption request, causing a preempt-to-idle cycle for no
change. We can avoid this if we take the boost into account when
checking if the preemption request is valid.

v2: After preemption the active request will be after the preemptee if
they end up with equal priority.

v3: Tvrtko pointed out that this, the existing logic, makes
I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!

v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.

v5: Except not all priorities were made equal, and the WAIT not
preempting is only if we start off as !NEWCLIENT.
Signed-off-by: Chris Wilson
Cc: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 773df0bd685b..9b6b3acb9070 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,6 +164,8 @@
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)

+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
 					    struct intel_context *ce);
@@ -190,8 +192,30 @@ static inline int rq_prio(const struct i915_request *rq)

 static int effective_prio(const struct i915_request *rq)
 {
+	int prio = rq_prio(rq);
+
+	/*
+	 * On unwinding the active request, we give it a priority bump
+	 * equivalent to a freshly submitted request. This protects it from
+	 * being gazumped again, but it would be preferable if we didn't
+	 * let it be gazumped in the first place!
+	 *
+	 * See __unwind_incomplete_requests()
+	 */
+	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
+		/*
+		 * After preemption, we insert the active request at the
+		 * end of the new priority level. This means that we will be
+		 * _lower_ priority than the preemptee all things equal (and
+		 * so the preemption is valid), so adjust our comparison
+		 * accordingly.
+		 */
+		prio |= ACTIVE_PRIORITY;
+		prio--;
+	}
+
 	/* Restrict mere WAIT boosts from triggering preemption */
-	return rq_prio(rq) | __NO_PREEMPTION;
+	return prio | __NO_PREEMPTION;
 }

 static int queue_prio(const struct intel_engine_execlists *execlists)
@@ -360,7 +384,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
 	struct i915_request *rq, *rn, *active = NULL;
 	struct list_head *uninitialized_var(pl);
-	int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
+	int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;

 	lockdep_assert_held(&engine->timeline.lock);

@@ -391,9 +415,15 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	 * The active request is now effectively the start of a new client
 	 * stream, so give it the equivalent small priority bump to prevent
 	 * it being gazumped a second time by another peer.
+	 *
+	 * One consequence of this preemption boost is that we may jump
+	 * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
+	 * making those priorities non-preemptible. They will be moved forward
+	 * in the priority queue, but they will not gain immediate access to
+	 * the GPU.
 	 */
-	if (!(prio & I915_PRIORITY_NEWCLIENT)) {
-		prio |= I915_PRIORITY_NEWCLIENT;
+	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {
+		prio |= ACTIVE_PRIORITY;
 		active->sched.attr.priority = prio;
 		list_move_tail(&active->sched.link,
 			       i915_sched_lookup_priolist(engine, prio));
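A worked model of the adjusted comparison may help (same assumed bit
layout as the sketch after patch 01; not the driver's literal code). An
in-flight request that has not yet received the unwind boost compares as
one step below ACTIVE_PRIORITY, because after an unwind it would end up
at that level but queued behind the preemptee; so an equal-priority
waiter still preempts (the reorder is real), whereas a request already
carrying the boost suppresses the redundant preempt-to-idle cycle:

    #include <stdbool.h>
    #include <stdio.h>

    #define PRIO_WAIT      (1 << 0)  /* __NO_PREEMPTION (assumed layout) */
    #define PRIO_NEWCLIENT (1 << 1)  /* ACTIVE_PRIORITY */

    static int effective(int prio, bool started)
    {
            if ((~prio & PRIO_NEWCLIENT) && started) {
                    prio |= PRIO_NEWCLIENT; /* what the unwind would grant */
                    prio--;                 /* but queued after the preemptee */
            }
            return prio | PRIO_WAIT;
    }

    int main(void)
    {
            int hint = PRIO_NEWCLIENT; /* fresh client waiting in the queue */

            /* Running request not yet boosted: preemption reorders,
             * so allow it (2 > 1, prints 1). */
            printf("%d\n", hint > effective(0, true));
            /* Already carries the unwind boost: nothing would change,
             * so suppress (2 > 3 is false, prints 0). */
            printf("%d\n", hint > effective(PRIO_NEWCLIENT, true));
            return 0;
    }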
From patchwork Mon Feb 4 13:21:55 2019
X-Patchwork-Id: 10795631
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Feb 2019 13:21:55 +0000
Message-Id: <20190204132214.9459-4-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
References: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 03/22] drm/i915/selftests: Exercise some AB...BA preemption chains

Build a chain using 2 contexts (A, B) then request a preemption such
that a later A request runs before the spinner in B.

Signed-off-by: Chris Wilson
Reviewed-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 103 +++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 16037a841146..967cefa118ee 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -4,6 +4,8 @@
  * Copyright © 2018 Intel Corporation
  */

+#include <linux/prime_numbers.h>
+
 #include "../i915_reset.h"
 #include "../i915_selftest.h"

@@ -564,6 +566,106 @@ static int live_suppress_wait_preempt(void *arg)
 	goto err_client_3;
 }

+static int live_chain_preempt(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	struct preempt_client hi, lo;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = -ENOMEM;
+
+	/*
+	 * Build a chain AB...BA between two contexts (A, B) and request
+	 * preemption of the last request. It should then complete before
+	 * the previously submitted spinner in B.
+	 */
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	if (preempt_client_init(i915, &hi))
+		goto err_unlock;
+
+	if (preempt_client_init(i915, &lo))
+		goto err_client_hi;
+
+	for_each_engine(engine, i915, id) {
+		struct i915_sched_attr attr = {
+			.priority = I915_USER_PRIORITY(I915_PRIORITY_MAX),
+		};
+		int count, i;
+
+		for_each_prime_number_from(count, 1, 32) { /* must fit ring! */
+			struct i915_request *rq;
+
+			rq = igt_spinner_create_request(&hi.spin,
+							hi.ctx, engine,
+							MI_ARB_CHECK);
+			if (IS_ERR(rq))
+				goto err_wedged;
+			i915_request_add(rq);
+			if (!igt_wait_for_spinner(&hi.spin, rq))
+				goto err_wedged;
+
+			rq = igt_spinner_create_request(&lo.spin,
+							lo.ctx, engine,
+							MI_ARB_CHECK);
+			if (IS_ERR(rq))
+				goto err_wedged;
+			i915_request_add(rq);
+
+			for (i = 0; i < count; i++) {
+				rq = i915_request_alloc(engine, lo.ctx);
+				if (IS_ERR(rq))
+					goto err_wedged;
+				i915_request_add(rq);
+			}
+
+			rq = i915_request_alloc(engine, hi.ctx);
+			if (IS_ERR(rq))
+				goto err_wedged;
+			i915_request_add(rq);
+			engine->schedule(rq, &attr);
+
+			igt_spinner_end(&hi.spin);
+			if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
+				struct drm_printer p =
+					drm_info_printer(i915->drm.dev);

+				pr_err("Failed to preempt over chain of %d\n",
+				       count);
+				intel_engine_dump(engine, &p,
+						  "%s\n", engine->name);
+				goto err_wedged;
+			}
+			igt_spinner_end(&lo.spin);
+		}
+	}
+
+	err = 0;
+err_client_lo:
+	preempt_client_fini(&lo);
+err_client_hi:
+	preempt_client_fini(&hi);
+err_unlock:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+
+err_wedged:
+	igt_spinner_end(&hi.spin);
+	igt_spinner_end(&lo.spin);
+	i915_gem_set_wedged(i915);
+	err = -EIO;
+	goto err_client_lo;
+}
+
 static int live_preempt_hang(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -945,6 +1047,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_late_preempt),
 		SUBTEST(live_suppress_self_preempt),
 		SUBTEST(live_suppress_wait_preempt),
+		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
 	};
From patchwork Mon Feb 4 13:21:56 2019
X-Patchwork-Id: 10795633
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Feb 2019 13:21:56 +0000
Message-Id: <20190204132214.9459-5-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
References: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 04/22] drm/i915: Trim NEWCLIENT boosting

Limit the NEWCLIENT boost to fresh clients only, i.e. to requests that
have no dependencies at all.

Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/i915_request.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d14a1b225f47..04c65e6d83b9 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -980,7 +980,7 @@ void i915_request_add(struct i915_request *request)
 		 * Allow interactive/synchronous clients to jump ahead of
 		 * the bulk clients. (FQ_CODEL)
 		 */
-		if (!prev || i915_request_completed(prev))
+		if (list_empty(&request->sched.signalers_list))
 			attr.priority |= I915_PRIORITY_NEWCLIENT;

 		engine->schedule(request, &attr);
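The change reduces to swapping one predicate for another; a toy
comparison of the two policies (the struct fields are hypothetical
stand-ins for the request state each predicate consults, not the
driver's structures):

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical stand-ins for the state consulted by each policy */
    struct req {
            bool prev_completed; /* timeline was idle when this was queued */
            int  nr_signalers;   /* entries on sched.signalers_list */
    };

    /* Old policy: first request on an idle timeline got the boost */
    static bool boost_old(const struct req *r) { return r->prev_completed; }

    /* New policy: only boost requests with no dependencies at all */
    static bool boost_new(const struct req *r) { return r->nr_signalers == 0; }

    int main(void)
    {
            struct req fresh   = { .prev_completed = true, .nr_signalers = 0 };
            struct req chained = { .prev_completed = true, .nr_signalers = 1 };

            /* Independent request: boosted under both policies */
            printf("fresh:   old=%d new=%d\n",
                   boost_old(&fresh), boost_new(&fresh));
            /* Idle timeline but a cross-context dependency: trimmed */
            printf("chained: old=%d new=%d\n",
                   boost_old(&chained), boost_new(&chained));
            return 0;
    }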
From patchwork Mon Feb 4 13:21:57 2019
X-Patchwork-Id: 10795635
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Feb 2019 13:21:57 +0000
Message-Id: <20190204132214.9459-6-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
References: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 05/22] drm/i915: Show support for accurate sw PMU busyness tracking

Expose whether or not we support the PMU software tracking in our
scheduler capabilities, so userspace can query at runtime.

Signed-off-by: Chris Wilson
Cc: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_gem.c         |  2 ++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 38 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        |  6 ----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 include/uapi/drm/i915_drm.h             |  1 +
 5 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e802af64d628..bc7d1338b69a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4757,6 +4757,8 @@ static int __i915_gem_restart_engines(void *data)
 		}
 	}

+	intel_engines_set_scheduler_caps(i915);
+
 	return 0;
 }

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 71c01eb13af1..ec2cbbe070a4 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -614,6 +614,44 @@ int intel_engine_setup_common(struct intel_engine_cs *engine)
 	return err;
 }

+void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
+{
+	static const struct {
+		u32 engine_flag;
+		u32 sched_cap;
+	} map[] = {
+		{ I915_ENGINE_HAS_PREEMPTION, I915_SCHEDULER_CAP_PREEMPTION },
+		{ I915_ENGINE_SUPPORTS_STATS, I915_SCHEDULER_CAP_PMU },
+	};
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	u32 enabled, disabled;
+
+	enabled = 0;
+	disabled = 0;
+	for_each_engine(engine, i915, id) { /* all engines must agree! */
+		int i;
+
+		if (engine->schedule)
+			enabled |= (I915_SCHEDULER_CAP_ENABLED |
+				    I915_SCHEDULER_CAP_PRIORITY);
+		else
+			disabled |= (I915_SCHEDULER_CAP_ENABLED |
+				     I915_SCHEDULER_CAP_PRIORITY);
+
+		for (i = 0; i < ARRAY_SIZE(map); i++) {
+			if (engine->flags & map[i].engine_flag)
+				enabled |= map[i].sched_cap;
+			else
+				disabled |= map[i].sched_cap;
+		}
+	}
+
+	i915->caps.scheduler = enabled & ~disabled;
+	if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_ENABLED))
+		i915->caps.scheduler = 0;
+}
+
 static void __intel_context_unpin(struct i915_gem_context *ctx,
 				  struct intel_engine_cs *engine)
 {

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9b6b3acb9070..0869a4fd20c7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2299,12 +2299,6 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
 		engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 	if (engine->i915->preempt_context)
 		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
-
-	engine->i915->caps.scheduler =
-		I915_SCHEDULER_CAP_ENABLED |
-		I915_SCHEDULER_CAP_PRIORITY;
-	if (intel_engine_has_preemption(engine))
-		engine->i915->caps.scheduler |= I915_SCHEDULER_CAP_PREEMPTION;
 }

 static void

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1398eb81dee6..8183d3441907 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -590,6 +590,8 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
 }

+void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
+
 static inline bool __execlists_need_preempt(int prio, int last)
 {
 	/*

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 298b2e197744..d8ac7f105734 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -476,6 +476,7 @@ typedef struct drm_i915_irq_wait {
 #define   I915_SCHEDULER_CAP_ENABLED	(1ul << 0)
 #define   I915_SCHEDULER_CAP_PRIORITY	(1ul << 1)
 #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
+#define   I915_SCHEDULER_CAP_PMU	(1ul << 3)

 #define I915_PARAM_HUC_STATUS		 42
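With the cap exposed, userspace can probe it through the existing
I915_PARAM_HAS_SCHEDULER getparam. A minimal sketch using libdrm follows
(the device path is an assumption for illustration, and
I915_SCHEDULER_CAP_PMU only exists in headers that include this patch;
build against libdrm, e.g. with pkg-config --cflags --libs libdrm):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <xf86drm.h>
    #include <i915_drm.h>

    int main(void)
    {
            struct drm_i915_getparam gp;
            int caps = 0;
            int fd = open("/dev/dri/renderD128", O_RDWR); /* path assumed */

            if (fd < 0)
                    return 1;

            /* One ioctl returns the whole scheduler capability mask */
            gp.param = I915_PARAM_HAS_SCHEDULER;
            gp.value = &caps;
            if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) == 0)
                    printf("pmu busyness tracking: %s\n",
                           caps & I915_SCHEDULER_CAP_PMU ? "yes" : "no");

            close(fd);
            return 0;
    }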
From patchwork Mon Feb 4 13:21:58 2019
X-Patchwork-Id: 10795629
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Cc: Mika Kuoppala
Date: Mon, 4 Feb 2019 13:21:58 +0000
Message-Id: <20190204132214.9459-7-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
References: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 06/22] drm/i915: Revoke mmaps and prevent access to fence registers across reset

Previously, we were able to rely on the recursive properties of
struct_mutex to allow us to serialise revoking mmaps and reacquiring the
FENCE registers with them being clobbered over a global device reset.
I then proceeded to throw out the baby with the bath water in order to
pursue a struct_mutex-less reset.

Perusing LWN for alternative strategies, the dilemma of how to serialise
access to a global resource was answered by
https://lwn.net/Articles/202847/ -- Sleepable RCU:

    int readside(void)
    {
            int idx;

            rcu_read_lock();
            if (nomoresrcu) {
                    rcu_read_unlock();
                    return -EINVAL;
            }
            idx = srcu_read_lock(&ss);
            rcu_read_unlock();
            /* SRCU read-side critical section. */
            srcu_read_unlock(&ss, idx);
            return 0;
    }

    void cleanup(void)
    {
            nomoresrcu = 1;
            synchronize_rcu();
            synchronize_srcu(&ss);
            cleanup_srcu_struct(&ss);
    }

No more worrying about stop_machine, just an uber-complex mutex,
optimised for reads, with the overhead pushed to the rare reset path.

However, we do run the risk of a deadlock as we allocate underneath the
SRCU read lock, and the allocation may require a GPU reset, causing a
dependency cycle via the in-flight requests. We resolve that by
declaring the driver wedged and cancelling all in-flight rendering.

v2: Use expedited rcu barriers to match our earlier timing
characteristics.
v3: Try to annotate locking contexts for sparse.
v4: Reduce selftest lock duration to avoid a reset deadlock with fences.

Testcase: igt/gem_mmap_gtt/hang
Fixes: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex")
Signed-off-by: Chris Wilson
Cc: Mika Kuoppala
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  12 +-
 drivers/gpu/drm/i915/i915_drv.h               |  18 +--
 drivers/gpu/drm/i915/i915_gem.c               |  56 +++------
 drivers/gpu/drm/i915/i915_gem_fence_reg.c     |  31 +----
 drivers/gpu/drm/i915/i915_gpu_error.h         |  12 +-
 drivers/gpu/drm/i915/i915_reset.c             | 107 +++++++++++-------
 drivers/gpu/drm/i915/i915_reset.h             |   4 +
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |   5 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |   1 +
 9 files changed, 109 insertions(+), 137 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index fa2c226fc779..2cea263b4d79 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1281,14 +1281,11 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	intel_wakeref_t wakeref;
 	enum intel_engine_id id;

+	seq_printf(m, "Reset flags: %lx\n", dev_priv->gpu_error.flags);
 	if (test_bit(I915_WEDGED, &dev_priv->gpu_error.flags))
-		seq_puts(m, "Wedged\n");
+		seq_puts(m, "\tWedged\n");
 	if (test_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags))
-		seq_puts(m, "Reset in progress: struct_mutex backoff\n");
-	if (waitqueue_active(&dev_priv->gpu_error.wait_queue))
-		seq_puts(m, "Waiter holding struct mutex\n");
-	if (waitqueue_active(&dev_priv->gpu_error.reset_queue))
-		seq_puts(m, "struct_mutex blocked for reset\n");
+		seq_puts(m, "\tDevice (global) reset in progress\n");

 	if (!i915_modparams.enable_hangcheck) {
 		seq_puts(m, "Hangcheck disabled\n");
@@ -3885,9 +3882,6 @@ i915_wedged_set(void *data, u64 val)
 	 * while it is writing to 'i915_wedged'
 	 */

-	if (i915_reset_backoff(&i915->gpu_error))
-		return -EAGAIN;
-
 	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
 			  "Manually set wedged engine mask = %llx", val);
 	return 0;

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 534e52e3a8da..3e4538ce5276 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2989,7 +2989,12 @@ i915_gem_obj_finish_shmem_access(struct drm_i915_gem_object *obj)
 	i915_gem_object_unpin_pages(obj);
 }

-int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
+static inline int __must_check
+i915_mutex_lock_interruptible(struct drm_device *dev)
+{
+	return mutex_lock_interruptible(&dev->struct_mutex);
+}
+
 int i915_gem_dumb_create(struct drm_file *file_priv,
 			 struct drm_device *dev,
 			 struct drm_mode_create_dumb *args);
@@ -3006,21 +3011,11 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
 struct i915_request *
 i915_gem_find_active_request(struct intel_engine_cs *engine);

-static inline bool i915_reset_backoff(struct i915_gpu_error *error)
-{
-	return unlikely(test_bit(I915_RESET_BACKOFF, &error->flags));
-}
-
 static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 {
 	return unlikely(test_bit(I915_WEDGED, &error->flags));
 }

-static inline bool i915_reset_backoff_or_wedged(struct i915_gpu_error *error)
-{
-	return i915_reset_backoff(error) | i915_terminally_wedged(error);
-}
-
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
 	return READ_ONCE(error->reset_count);
@@ -3093,7 +3088,6 @@ struct drm_i915_fence_reg *
 i915_reserve_fence(struct drm_i915_private *dev_priv);
 void i915_unreserve_fence(struct drm_i915_fence_reg *fence);

-void i915_gem_revoke_fences(struct drm_i915_private *dev_priv);
 void i915_gem_restore_fences(struct drm_i915_private *dev_priv);

 void i915_gem_detect_bit_6_swizzle(struct drm_i915_private *dev_priv);

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bc7d1338b69a..2c6161c89cc7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -100,47 +100,6 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
 	spin_unlock(&dev_priv->mm.object_stat_lock);
 }

-static int
-i915_gem_wait_for_error(struct i915_gpu_error *error)
-{
-	int ret;
-
-	might_sleep();
-
-	/*
-	 * Only wait 10 seconds for the gpu reset to complete to avoid hanging
-	 * userspace. If it takes that long something really bad is going on and
-	 * we should simply try to bail out and fail as gracefully as possible.
-	 */
-	ret = wait_event_interruptible_timeout(error->reset_queue,
-					       !i915_reset_backoff(error),
-					       I915_RESET_TIMEOUT);
-	if (ret == 0) {
-		DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
-		return -EIO;
-	} else if (ret < 0) {
-		return ret;
-	} else {
-		return 0;
-	}
-}
-
-int i915_mutex_lock_interruptible(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	int ret;
-
-	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
-	if (ret)
-		return ret;
-
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
-
-	return 0;
-}
-
 static u32 __i915_gem_park(struct drm_i915_private *i915)
 {
 	intel_wakeref_t wakeref;
@@ -1869,6 +1828,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 	intel_wakeref_t wakeref;
 	struct i915_vma *vma;
 	pgoff_t page_offset;
+	int srcu;
 	int ret;

 	/* Sanity check that we allow writing into this object */
@@ -1908,7 +1868,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		goto err_unlock;
 	}

-
 	/* Now pin it into the GTT as needed */
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
 				       PIN_MAPPABLE |
@@ -1946,9 +1905,15 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 	if (ret)
 		goto err_unpin;

+	srcu = i915_reset_trylock(dev_priv);
+	if (srcu < 0) {
+		ret = srcu;
+		goto err_unpin;
+	}
+
 	ret = i915_vma_pin_fence(vma);
 	if (ret)
-		goto err_unpin;
+		goto err_reset;

 	/* Finally, remap it using the new GTT offset */
 	ret = remap_io_mapping(area,
@@ -1969,6 +1934,8 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 err_fence:
 	i915_vma_unpin_fence(vma);
+err_reset:
+	i915_reset_unlock(dev_priv, srcu);
 err_unpin:
 	__i915_vma_unpin(vma);
 err_unlock:
@@ -5326,6 +5293,7 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 	init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
 	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
 	mutex_init(&dev_priv->gpu_error.wedge_mutex);
+	init_srcu_struct(&dev_priv->gpu_error.srcu);

 	atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);

@@ -5358,6 +5326,8 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
 	WARN_ON(dev_priv->mm.object_count);

+	cleanup_srcu_struct(&dev_priv->gpu_error.srcu);
+
 	kmem_cache_destroy(dev_priv->priorities);
 	kmem_cache_destroy(dev_priv->dependencies);
 	kmem_cache_destroy(dev_priv->requests);

diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index 46e259661294..bd0d5b8d6c96 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -240,6 +240,10 @@ static int fence_update(struct drm_i915_fence_reg *fence,
 		i915_vma_flush_writes(old);
 	}

+	ret = i915_reset_trylock(fence->i915);
+	if (ret < 0)
+		return ret;
+
 	if (fence->vma && fence->vma != vma) {
 		/* Ensure that all userspace CPU access is completed before
 		 * stealing the fence.
@@ -272,6 +276,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
 		list_move_tail(&fence->link, &fence->i915->mm.fence_list);
 	}

+	i915_reset_unlock(fence->i915, ret);
 	return 0;
 }

@@ -435,32 +440,6 @@ void i915_unreserve_fence(struct drm_i915_fence_reg *fence)
 	list_add(&fence->link, &fence->i915->mm.fence_list);
 }

-/**
- * i915_gem_revoke_fences - revoke fence state
- * @dev_priv: i915 device private
- *
- * Removes all GTT mmappings via the fence registers. This forces any user
- * of the fence to reacquire that fence before continuing with their access.
- * One use is during GPU reset where the fence register is lost and we need to
- * revoke concurrent userspace access via GTT mmaps until the hardware has been
- * reset and the fence registers have been restored.
- */
-void i915_gem_revoke_fences(struct drm_i915_private *dev_priv)
-{
-	int i;
-
-	lockdep_assert_held(&dev_priv->drm.struct_mutex);
-
-	for (i = 0; i < dev_priv->num_fence_regs; i++) {
-		struct drm_i915_fence_reg *fence = &dev_priv->fence_regs[i];
-
-		GEM_BUG_ON(fence->vma && fence->vma->fence != fence);
-
-		if (fence->vma)
-			i915_vma_revoke_mmap(fence->vma);
-	}
-}
-
 /**
  * i915_gem_restore_fences - restore fence state
  * @dev_priv: i915 device private

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 53b1f22dd365..4e797c552b96 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -231,12 +231,10 @@ struct i915_gpu_error {
 	/**
 	 * flags: Control various stages of the GPU reset
 	 *
-	 * #I915_RESET_BACKOFF - When we start a reset, we want to stop any
-	 * other users acquiring the struct_mutex. To do this we set the
-	 * #I915_RESET_BACKOFF bit in the error flags when we detect a reset
-	 * and then check for that bit before acquiring the struct_mutex (in
-	 * i915_mutex_lock_interruptible()?). I915_RESET_BACKOFF serves a
-	 * secondary role in preventing two concurrent global reset attempts.
+	 * #I915_RESET_BACKOFF - When we start a global reset, we need to
+	 * serialise with any other users attempting to do the same, and
+	 * any global resources that may be clobbered by the reset (such as
+	 * FENCE registers).
 	 *
 	 * #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
 	 * acquire the struct_mutex to reset an engine, we need an explicit
@@ -272,6 +270,8 @@ struct i915_gpu_error {
 	 */
 	wait_queue_head_t reset_queue;

+	struct srcu_struct srcu;
+
 	struct i915_gpu_restart *restart;
 };

diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 4462007a681c..f58fae457ec6 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -639,6 +639,31 @@ static void reset_prepare_engine(struct intel_engine_cs *engine)
 	engine->reset.prepare(engine);
 }

+static void revoke_mmaps(struct drm_i915_private *i915)
+{
+	int i;
+
+	for (i = 0; i < i915->num_fence_regs; i++) {
+		struct i915_vma *vma = i915->fence_regs[i].vma;
+		struct drm_vma_offset_node *node;
+		u64 vma_offset;
+
+		if (!vma)
+			continue;
+
+		GEM_BUG_ON(vma->fence != &i915->fence_regs[i]);
+		if (!i915_vma_has_userfault(vma))
+			continue;
+
+		node = &vma->obj->base.vma_node;
+		vma_offset = vma->ggtt_view.partial.offset << PAGE_SHIFT;
+		unmap_mapping_range(i915->drm.anon_inode->i_mapping,
+				    drm_vma_node_offset_addr(node) + vma_offset,
+				    vma->size,
+				    1);
+	}
+}
+
 static void reset_prepare(struct drm_i915_private *i915)
 {
 	struct intel_engine_cs *engine;
@@ -648,6 +673,7 @@ static void reset_prepare(struct drm_i915_private *i915)
 		reset_prepare_engine(engine);

 	intel_uc_sanitize(i915);
+	revoke_mmaps(i915);
 }

 static int gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
@@ -911,50 +937,22 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	return ret;
 }

-struct __i915_reset {
-	struct drm_i915_private *i915;
-	unsigned int stalled_mask;
-};
-
-static int __i915_reset__BKL(void *data)
-{
-	struct __i915_reset *arg = data;
-	int err;
-
-	err = intel_gpu_reset(arg->i915, ALL_ENGINES);
-	if (err)
-		return err;
-
-	return gt_reset(arg->i915, arg->stalled_mask);
-}
-
-#if RESET_UNDER_STOP_MACHINE
-/*
- * XXX An alternative to using stop_machine would be to park only the
- * processes that have a GGTT mmap. By remote parking the threads (SIGSTOP)
- * we should be able to prevent their memmory accesses via the lost fence
- * registers over the course of the reset without the potential recursive
- * of mutexes between the pagefault handler and reset.
- *
- * See igt/gem_mmap_gtt/hang
- */
-#define __do_reset(fn, arg) stop_machine(fn, arg, NULL)
-#else
-#define __do_reset(fn, arg) fn(arg)
-#endif
-
 static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
 {
-	struct __i915_reset arg = { i915, stalled_mask };
 	int err, i;

-	err = __do_reset(__i915_reset__BKL, &arg);
+	/* Flush everyone currently using a resource about to be clobbered */
+	synchronize_srcu(&i915->gpu_error.srcu);
+
+	err = intel_gpu_reset(i915, ALL_ENGINES);
 	for (i = 0; err && i < RESET_MAX_RETRIES; i++) {
-		msleep(100);
-		err = __do_reset(__i915_reset__BKL, &arg);
+		msleep(10 * (i + 1));
+		err = intel_gpu_reset(i915, ALL_ENGINES);
 	}
+	if (err)
+		return err;

-	return err;
+	return gt_reset(i915, stalled_mask);
 }

 /**
@@ -1274,9 +1272,12 @@ void i915_handle_error(struct drm_i915_private *i915,
 		wait_event(i915->gpu_error.reset_queue,
 			   !test_bit(I915_RESET_BACKOFF,
 				     &i915->gpu_error.flags));
-		goto out;
+		goto out; /* piggy-back on the other reset */
 	}

+	/* Make sure i915_reset_trylock() sees the I915_RESET_BACKOFF */
+	synchronize_rcu_expedited();
+
 	/* Prevent any other reset-engine attempt. */
 	for_each_engine(engine, i915, tmp) {
 		while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
@@ -1300,6 +1301,36 @@ void i915_handle_error(struct drm_i915_private *i915,
 	intel_runtime_pm_put(i915, wakeref);
 }

+int i915_reset_trylock(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+	int srcu;
+
+	rcu_read_lock();
+	while (test_bit(I915_RESET_BACKOFF, &error->flags)) {
+		rcu_read_unlock();
+
+		if (wait_event_interruptible(error->reset_queue,
+					     !test_bit(I915_RESET_BACKOFF,
+						       &error->flags)))
+			return -EINTR;
+
+		rcu_read_lock();
+	}
+	srcu = srcu_read_lock(&error->srcu);
+	rcu_read_unlock();
+
+	return srcu;
+}
+
+void i915_reset_unlock(struct drm_i915_private *i915, int tag)
+__releases(&i915->gpu_error.srcu)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+
+	srcu_read_unlock(&error->srcu, tag);
+}
+
 bool i915_reset_flush(struct drm_i915_private *i915)
 {
 	int err;

diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index f2d347f319df..893c5d1c2eb8 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -9,6 +9,7 @@

 #include <linux/compiler.h>
 #include <linux/types.h>
+#include <linux/srcu.h>

 struct drm_i915_private;
 struct intel_engine_cs;
@@ -32,6 +33,9 @@ int i915_reset_engine(struct intel_engine_cs *engine,
 void i915_reset_request(struct i915_request *rq, bool guilty);
 bool i915_reset_flush(struct drm_i915_private *i915);

+int __must_check i915_reset_trylock(struct drm_i915_private *i915);
+void i915_reset_unlock(struct drm_i915_private *i915, int tag);
+
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);

diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 7b6f3bea9ef8..4886fac12628 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1039,8 +1039,6 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,

 	/* Check that we can recover an unbind stuck on a hanging request */

-	igt_global_reset_lock(i915);
-
 	mutex_lock(&i915->drm.struct_mutex);
 	err = hang_init(&h, i915);
 	if (err)
@@ -1138,7 +1136,9 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 	}

 out_reset:
+	igt_global_reset_lock(i915);
 	fake_hangcheck(rq->i915, intel_engine_flag(rq->engine));
+	igt_global_reset_unlock(i915);

 	if (tsk) {
 		struct igt_wedge_me w;
@@ -1159,7 +1159,6 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 	hang_fini(&h);
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
-	igt_global_reset_unlock(i915);

 	if (i915_terminally_wedged(&i915->gpu_error))
 		return -EIO;

diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 14ae46fda49f..074a0d9cbf26 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -189,6 +189,7 @@ struct drm_i915_private *mock_gem_device(void)

 	init_waitqueue_head(&i915->gpu_error.wait_queue);
 	init_waitqueue_head(&i915->gpu_error.reset_queue);
+	init_srcu_struct(&i915->gpu_error.srcu);
 	mutex_init(&i915->gpu_error.wedge_mutex);

 	i915->wq = alloc_ordered_workqueue("mock", 0);
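Reduced to its essentials, the new scheme pairs a cheap SRCU read-side
section in the fault and fence paths with a single synchronize_srcu() on
the rare reset path. A condensed sketch of that pattern, using the
patch's own i915_reset_trylock()/i915_reset_unlock() API but with
touch_fence_registers() as a hypothetical placeholder for the guarded
work (this is not the driver's literal code):

    /* Reader, e.g. the pagefault handler about to program a fence */
    static int use_fence(struct drm_i915_private *i915)
    {
            int srcu, err;

            /* Waits out I915_RESET_BACKOFF, then enters the SRCU section */
            srcu = i915_reset_trylock(i915);
            if (srcu < 0)
                    return srcu; /* -EINTR: signalled while waiting */

            err = touch_fence_registers(i915); /* hypothetical guarded work */

            i915_reset_unlock(i915, srcu);
            return err;
    }

    /* Writer: the global reset, after setting I915_RESET_BACKOFF */
    static void global_reset(struct drm_i915_private *i915)
    {
            revoke_mmaps(i915);                      /* zap userspace PTEs */
            synchronize_srcu(&i915->gpu_error.srcu); /* drain readers */
            /* ... now safe to clobber and restore the fence registers ... */
    }

The read side is close to free in the common case, which is exactly the
trade-off the commit message describes: all of the cost lands on the
reset path.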
From patchwork Mon Feb 4 13:21:59 2019
X-Patchwork-Id: 10795615
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Cc: Mika Kuoppala
Date: Mon, 4 Feb 2019 13:21:59 +0000
Message-Id: <20190204132214.9459-8-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
References: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 07/22] drm/i915: Force the GPU reset upon wedging

When declaring the GPU wedged, we do need to hit the GPU with the reset
hammer so that its state matches our presumed state during cleanup. If
the reset fails, it fails, and we may be unhappy but wedged. However, if
we are testing our wedge/unwedge handling, the desync carries over into
the next test and promptly explodes.
References: https://bugs.freedesktop.org/show_bug.cgi?id=106702 Signed-off-by: Chris Wilson Cc: Mika Kuoppala --- drivers/gpu/drm/i915/i915_reset.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c index f58fae457ec6..c6f6400f95b4 100644 --- a/drivers/gpu/drm/i915/i915_reset.c +++ b/drivers/gpu/drm/i915/i915_reset.c @@ -532,9 +532,6 @@ typedef int (*reset_func)(struct drm_i915_private *, static reset_func intel_get_gpu_reset(struct drm_i915_private *i915) { - if (!i915_modparams.reset) - return NULL; - if (INTEL_GEN(i915) >= 8) return gen8_reset_engines; else if (INTEL_GEN(i915) >= 6) @@ -599,6 +596,9 @@ bool intel_has_gpu_reset(struct drm_i915_private *i915) if (USES_GUC(i915)) return false; + if (!i915_modparams.reset) + return false; + return intel_get_gpu_reset(i915); } @@ -823,7 +823,7 @@ void i915_gem_set_wedged(struct drm_i915_private *i915) reset_prepare_engine(engine); /* Even if the GPU reset fails, it should still stop the engines */ - if (INTEL_GEN(i915) >= 5) + if (!INTEL_INFO(i915)->gpu_reset_clobbers_display) intel_gpu_reset(i915, ALL_ENGINES); for_each_engine(engine, i915, id) {
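Aside: after this patch the split of responsibilities is mechanism versus policy: intel_get_gpu_reset() only answers "which reset backend does this hardware have", while intel_has_gpu_reset() answers "may we use it right now". A condensed view (illustrative only, with the return value bool-ified as in the hunk above):

/* mechanism: pick the reset backend for this hardware */
static reset_func intel_get_gpu_reset(struct drm_i915_private *i915);

/* policy: are we allowed to use it at the moment? */
bool intel_has_gpu_reset(struct drm_i915_private *i915)
{
	if (USES_GUC(i915))
		return false;	/* resets are routed through the GuC */

	if (!i915_modparams.reset)
		return false;	/* disabled by the user via modparam */

	return intel_get_gpu_reset(i915);
}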
From patchwork Mon Feb 4 13:22:00 2019
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Feb 2019 13:22:00 +0000
Message-Id: <20190204132214.9459-9-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 08/22] drm/i915: Uninterruptibly drain the timelines on unwedging

On wedging, we mark all executing requests as complete, and complete all pending requests as soon as they are ready. Before unwedging, though, we wish to flush those pending requests prior to restoring default execution, and so we must wait. Do so uninterruptibly, as we do not propagate the EINTR gracefully back to userspace in this case but would instead be stuck in the permanently wedged state without restarting the syscall.

Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_reset.c | 28 ++++++++-------------------- 1 file changed, 8 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c index c6f6400f95b4..7fc86b44d872 100644 --- a/drivers/gpu/drm/i915/i915_reset.c +++ b/drivers/gpu/drm/i915/i915_reset.c @@ -861,7 +861,6 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915) { struct i915_gpu_error *error = &i915->gpu_error; struct i915_timeline *tl; - bool ret = false; if (!test_bit(I915_WEDGED, &error->flags)) return true; @@ -886,30 +885,20 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915) mutex_lock(&i915->gt.timelines.mutex); list_for_each_entry(tl, &i915->gt.timelines.active_list, link) { struct i915_request *rq; - long timeout; rq = i915_gem_active_get_unlocked(&tl->last_request); if (!rq) continue; /* - * We can't use our normal waiter as we want to - * avoid recursively trying to handle the current - * reset. The basic dma_fence_default_wait() installs - * a callback for dma_fence_signal(), which is - * triggered by our nop handler (indirectly, the - * callback enables the signaler thread which is - * woken by the nop_submit_request() advancing the seqno - * and when the seqno passes the fence, the signaler - * then signals the fence waking us up). + * All internal dependencies (i915_requests) will have + * been flushed by the set-wedge, but we may be stuck waiting + * for external fences. These should all be capped to 10s + * (I915_FENCE_TIMEOUT) so this wait should not be unbounded + * in the worst case.
*/ - timeout = dma_fence_default_wait(&rq->fence, true, - MAX_SCHEDULE_TIMEOUT); + dma_fence_default_wait(&rq->fence, false, MAX_SCHEDULE_TIMEOUT); i915_request_put(rq); - if (timeout < 0) { - mutex_unlock(&i915->gt.timelines.mutex); - goto unlock; - } } mutex_unlock(&i915->gt.timelines.mutex); @@ -930,11 +919,10 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915) smp_mb__before_atomic(); /* complete takeover before enabling execbuf */ clear_bit(I915_WEDGED, &i915->gpu_error.flags); - ret = true; -unlock: + mutex_unlock(&i915->gpu_error.wedge_mutex); - return ret; + return true; } static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
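Aside: extracted from the hunks above, the whole drain now reduces to an uninterruptible walk over the active timelines; a standalone helper written out for illustration (hypothetical name, same calls as the patch):

static void drain_active_timelines(struct drm_i915_private *i915)
{
	struct i915_timeline *tl;

	mutex_lock(&i915->gt.timelines.mutex);
	list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
		struct i915_request *rq;

		rq = i915_gem_active_get_unlocked(&tl->last_request);
		if (!rq)
			continue;

		/*
		 * intr=false: EINTR cannot be reported from here, so sleep
		 * uninterruptibly; external fences are capped by
		 * I915_FENCE_TIMEOUT, bounding the worst case.
		 */
		dma_fence_default_wait(&rq->fence, false, MAX_SCHEDULE_TIMEOUT);
		i915_request_put(rq);
	}
	mutex_unlock(&i915->gt.timelines.mutex);
}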
From patchwork Mon Feb 4 13:22:01 2019
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Feb 2019 13:22:01 +0000
Message-Id: <20190204132214.9459-10-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 09/22] drm/i915: Wait for old resets before applying debugfs/i915_wedged

Since we use the debugfs to recover the device after modifying the i915.reset parameter, we need to be sure that we apply the reset and not piggy-back onto a concurrent one in order for the parameter to take effect.

Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 2cea263b4d79..54e426883529 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -3874,13 +3874,9 @@ i915_wedged_set(void *data, u64 val) { struct drm_i915_private *i915 = data; - /* - * There is no safeguard against this debugfs entry colliding - * with the hangcheck calling same i915_handle_error() in - * parallel, causing an explosion. For now we assume that the - * test harness is responsible enough not to inject gpu hangs - * while it is writing to 'i915_wedged' - */ + /* Flush any previous reset before applying for a new one */ + wait_event(i915->gpu_error.reset_queue, + !test_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags)); i915_handle_error(i915, val, I915_ERROR_CAPTURE, "Manually set wedged engine mask = %llx", val);
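Aside: this is the standard flush-before-claim pattern on a wait queue; the whole debugfs write now behaves like the sketch below (hypothetical helper name, same calls as the hunk above):

/* Flush any reset already in flight, then raise our own. */
static void i915_wedged_set_sync(struct drm_i915_private *i915, u64 val)
{
	wait_event(i915->gpu_error.reset_queue,
		   !test_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags));

	/*
	 * Any reset we trigger now is a fresh one, honouring the current
	 * i915.reset parameter rather than piggy-backing on a concurrent
	 * reset started under the old value.
	 */
	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
			  "Manually set wedged engine mask = %llx", val);
}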
From patchwork Mon Feb 4 13:22:02 2019
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Feb 2019 13:22:02 +0000
Message-Id: <20190204132214.9459-11-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 10/22] drm/i915: Serialise resets with wedging

Prevent concurrent set-wedge with ongoing resets (and vice versa) by taking the same wedge_mutex around both operations.

Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_reset.c | 68 ++++++++++++++++++------------- 1 file changed, 40 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c index 7fc86b44d872..ca19fcf29c5b 100644 --- a/drivers/gpu/drm/i915/i915_reset.c +++ b/drivers/gpu/drm/i915/i915_reset.c @@ -793,17 +793,14 @@ static void nop_submit_request(struct i915_request *request) intel_engine_queue_breadcrumbs(engine); } -void i915_gem_set_wedged(struct drm_i915_private *i915) +static void __i915_gem_set_wedged(struct drm_i915_private *i915) { struct i915_gpu_error *error = &i915->gpu_error; struct intel_engine_cs *engine; enum intel_engine_id id; - mutex_lock(&error->wedge_mutex); - if (test_bit(I915_WEDGED, &error->flags)) { - mutex_unlock(&error->wedge_mutex); + if (test_bit(I915_WEDGED, &error->flags)) return; - } if (GEM_SHOW_DEBUG() && !intel_engines_are_idle(i915)) { struct drm_printer p = drm_debug_printer(__func__); @@ -852,12 +849,18 @@ void i915_gem_set_wedged(struct drm_i915_private *i915) set_bit(I915_WEDGED, &error->flags); GEM_TRACE("end\n"); - mutex_unlock(&error->wedge_mutex); +} - wake_up_all(&error->reset_queue); +void i915_gem_set_wedged(struct drm_i915_private *i915) +{ + struct i915_gpu_error *error = &i915->gpu_error; + + mutex_lock(&error->wedge_mutex); + __i915_gem_set_wedged(i915); + mutex_unlock(&error->wedge_mutex); } -bool i915_gem_unset_wedged(struct drm_i915_private *i915) +static bool __i915_gem_unset_wedged(struct drm_i915_private *i915) { struct i915_gpu_error *error = &i915->gpu_error; struct i915_timeline *tl; @@ -868,8 +871,6 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915) if (!i915->gt.scratch) /* Never full initialised, recovery impossible */ return false; - mutex_lock(&error->wedge_mutex); - GEM_TRACE("start\n"); /* @@ -920,11 +921,21 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915) smp_mb__before_atomic(); /* complete takeover before enabling execbuf */ clear_bit(I915_WEDGED, &i915->gpu_error.flags); - mutex_unlock(&i915->gpu_error.wedge_mutex); - return true; } +bool i915_gem_unset_wedged(struct drm_i915_private *i915) +{ + struct i915_gpu_error *error = &i915->gpu_error; + bool result; + + mutex_lock(&error->wedge_mutex); + result = __i915_gem_unset_wedged(i915); + mutex_unlock(&error->wedge_mutex); + + return result; +} + static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask) { int err, i; @@ -976,7 +987,7 @@ void i915_reset(struct drm_i915_private *i915, GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags)); /* Clear any previous failed attempts at recovery. Time to try again.
*/ - if (!i915_gem_unset_wedged(i915)) + if (!__i915_gem_unset_wedged(i915)) return; if (reason) @@ -1038,7 +1049,7 @@ void i915_reset(struct drm_i915_private *i915, */ add_taint(TAINT_WARN, LOCKDEP_STILL_OK); error: - i915_gem_set_wedged(i915); + __i915_gem_set_wedged(i915); goto finish; } @@ -1130,7 +1141,9 @@ static void i915_reset_device(struct drm_i915_private *i915, i915_wedge_on_timeout(&w, i915, 5 * HZ) { intel_prepare_reset(i915); + mutex_lock(&error->wedge_mutex); i915_reset(i915, engine_mask, reason); + mutex_unlock(&error->wedge_mutex); intel_finish_reset(i915); } @@ -1198,6 +1211,7 @@ void i915_handle_error(struct drm_i915_private *i915, unsigned long flags, const char *fmt, ...) { + struct i915_gpu_error *error = &i915->gpu_error; struct intel_engine_cs *engine; intel_wakeref_t wakeref; unsigned int tmp; @@ -1234,20 +1248,19 @@ void i915_handle_error(struct drm_i915_private *i915, * Try engine reset when available. We fall back to full reset if * single reset fails. */ - if (intel_has_reset_engine(i915) && - !i915_terminally_wedged(&i915->gpu_error)) { + if (intel_has_reset_engine(i915) && !i915_terminally_wedged(error)) { for_each_engine_masked(engine, i915, engine_mask, tmp) { BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE); if (test_and_set_bit(I915_RESET_ENGINE + engine->id, - &i915->gpu_error.flags)) + &error->flags)) continue; if (i915_reset_engine(engine, msg) == 0) engine_mask &= ~intel_engine_flag(engine); clear_bit(I915_RESET_ENGINE + engine->id, - &i915->gpu_error.flags); - wake_up_bit(&i915->gpu_error.flags, + &error->flags); + wake_up_bit(&error->flags, I915_RESET_ENGINE + engine->id); } } @@ -1256,10 +1269,9 @@ void i915_handle_error(struct drm_i915_private *i915, goto out; /* Full reset needs the mutex, stop any other user trying to do so. */ - if (test_and_set_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags)) { - wait_event(i915->gpu_error.reset_queue, - !test_bit(I915_RESET_BACKOFF, - &i915->gpu_error.flags)); + if (test_and_set_bit(I915_RESET_BACKOFF, &error->flags)) { + wait_event(error->reset_queue, + !test_bit(I915_RESET_BACKOFF, &error->flags)); goto out; /* piggy-back on the other reset */ } @@ -1269,8 +1281,8 @@ void i915_handle_error(struct drm_i915_private *i915, /* Prevent any other reset-engine attempt. 
*/ for_each_engine(engine, i915, tmp) { while (test_and_set_bit(I915_RESET_ENGINE + engine->id, - &i915->gpu_error.flags)) - wait_on_bit(&i915->gpu_error.flags, + &error->flags)) + wait_on_bit(&error->flags, I915_RESET_ENGINE + engine->id, TASK_UNINTERRUPTIBLE); } @@ -1279,11 +1291,11 @@ for_each_engine(engine, i915, tmp) { clear_bit(I915_RESET_ENGINE + engine->id, - &i915->gpu_error.flags); + &error->flags); } - clear_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags); - wake_up_all(&i915->gpu_error.reset_queue); + clear_bit(I915_RESET_BACKOFF, &error->flags); + wake_up_all(&error->reset_queue); out: intel_runtime_pm_put(i915, wakeref);
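Aside: this is the usual locked-wrapper/__unlocked-worker split. The __ variants exist so that i915_reset(), which i915_reset_device() now calls with wedge_mutex already held, can flip the wedge state without recursing on the lock. The resulting shape, condensed from the hunks above:

/*
 * Lock nesting after this patch (schematic):
 *
 *   i915_reset_device()
 *     mutex_lock(&error->wedge_mutex)
 *       i915_reset()
 *         __i915_gem_unset_wedged()  -- lock-free variant, no deadlock
 *         __i915_gem_set_wedged()    -- ditto, on the error path
 *     mutex_unlock(&error->wedge_mutex)
 */
void i915_gem_set_wedged(struct drm_i915_private *i915)
{
	struct i915_gpu_error *error = &i915->gpu_error;

	mutex_lock(&error->wedge_mutex);
	__i915_gem_set_wedged(i915);	/* __ variant: caller holds the lock */
	mutex_unlock(&error->wedge_mutex);
}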
From patchwork Mon Feb 4 13:22:03 2019
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Feb 2019 13:22:03 +0000
Message-Id: <20190204132214.9459-12-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 11/22] drm/i915: Don't claim an unstarted request was guilty

If we haven't even begun executing the payload of the stalled request, then we should not claim that its userspace context was guilty of submitting a hanging batch.

Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/intel_lrc.c | 2 +- drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 6 ++++++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 0869a4fd20c7..8e301f19036b 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1947,7 +1947,7 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled) rq ? rq->global_seqno : 0, intel_engine_get_seqno(engine), yesno(stalled)); - if (!rq) + if (!rq || !i915_request_started(rq)) goto out_unlock; /* diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c index 4886fac12628..36c17bfe05a7 100644 --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c @@ -246,6 +246,12 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) if (INTEL_GEN(vm->i915) <= 5) flags |= I915_DISPATCH_SECURE; + if (rq->engine->emit_init_breadcrumb) { + err = rq->engine->emit_init_breadcrumb(rq); + if (err) + goto cancel_rq; + } + err = rq->engine->emit_bb_start(rq, vma->node.start, PAGE_SIZE, flags); cancel_rq:
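Aside: i915_request_started() keys off the init breadcrumb written ahead of the batch, which is why the hangcheck selftest above must now emit one too. A hypothetical helper condensing the guilt test (illustrative, not in the patch):

static bool request_is_guilty(const struct i915_request *rq)
{
	/*
	 * Only a request whose payload has actually begun executing on
	 * the GPU can be to blame for the hang; an unstarted request was
	 * merely an innocent bystander in the queue.
	 */
	return rq && i915_request_started(rq);
}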
From patchwork Mon Feb 4 13:22:04 2019
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Feb 2019 13:22:04 +0000
Message-Id: <20190204132214.9459-13-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 12/22] drm/i915: Generalise GPU activity tracking

We currently track GPU memory usage inside VMA, such that we never release memory used by the GPU until after it has finished accessing it. However, we may want to track other resources aside from VMA, or we may want to split a VMA into multiple independent regions and track each separately. For this purpose, generalise our request tracking (akin to struct reservation_object) so that we can embed it into other objects.

Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/Makefile | 4 +- drivers/gpu/drm/i915/i915_active.c | 228 ++++++++++++++++++ drivers/gpu/drm/i915/i915_active.h | 69 ++++++ drivers/gpu/drm/i915/i915_active_types.h | 26 ++ drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +- drivers/gpu/drm/i915/i915_vma.c | 173 +++---------- drivers/gpu/drm/i915/i915_vma.h | 9 +- drivers/gpu/drm/i915/selftests/i915_active.c | 158 ++++++++++++ .../drm/i915/selftests/i915_live_selftests.h | 3 +- 9 files changed, 519 insertions(+), 154 deletions(-) create mode 100644 drivers/gpu/drm/i915/i915_active.c create mode 100644 drivers/gpu/drm/i915/i915_active.h create mode 100644 drivers/gpu/drm/i915/i915_active_types.h create mode 100644 drivers/gpu/drm/i915/selftests/i915_active.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 210d0e8777b6..1787e1299b1b 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -57,7 +57,9 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o # GEM code -i915-y += i915_cmd_parser.o \ +i915-y += \ + i915_active.o \ + i915_cmd_parser.o \ i915_gem_batch_pool.o \ i915_gem_clflush.o \ i915_gem_context.o \ diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c new file mode 100644 index 000000000000..91950d778cab --- /dev/null +++ b/drivers/gpu/drm/i915/i915_active.c @@ -0,0 +1,228 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2019 Intel Corporation + */ + +#include "i915_drv.h" +#include "i915_active.h" + +#define BKL(ref) (&(ref)->i915->drm.struct_mutex) + +struct active_node { + struct i915_gem_active base; + struct i915_active *ref; + struct rb_node node; + u64 timeline; +}; + +static void +__active_retire(struct i915_active *ref) +{ + GEM_BUG_ON(!ref->count); + if (!--ref->count) + ref->retire(ref); +} + +static void +node_retire(struct i915_gem_active *base, struct i915_request *rq) +{ + __active_retire(container_of(base, struct active_node, base)->ref); +} + +static void +last_retire(struct i915_gem_active *base, struct i915_request *rq) +{ + __active_retire(container_of(base, struct i915_active, last)); +} + +static struct i915_gem_active * +active_instance(struct i915_active *ref, u64 idx) +{ + struct active_node *node; + struct rb_node **p, *parent; + struct i915_request *old; + + /* + * We track the most recently used timeline to skip a rbtree search + * for the common case, under typical loads we never need the rbtree + * at all. We can reuse the last slot if it is empty, that is + * after the previous activity has been retired, or if it matches the + * current timeline.
+ * + * Note that we allow the timeline to be active simultaneously in + * the rbtree and the last cache. We do this to avoid having + * to search and replace the rbtree element for a new timeline, with + * the cost being that we must be aware that the ref may be retired + * twice for the same timeline (as the older rbtree element will be + * retired before the new request added to last). + */ + old = i915_gem_active_raw(&ref->last, BKL(ref)); + if (!old || old->fence.context == idx) + goto out; + + /* Move the currently active fence into the rbtree */ + idx = old->fence.context; + + parent = NULL; + p = &ref->tree.rb_node; + while (*p) { + parent = *p; + + node = rb_entry(parent, struct active_node, node); + if (node->timeline == idx) + goto replace; + + if (node->timeline < idx) + p = &parent->rb_right; + else + p = &parent->rb_left; + } + + node = kmalloc(sizeof(*node), GFP_KERNEL); + + /* kmalloc may retire the ref->last (thanks shrinker)! */ + if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) { + kfree(node); + goto out; + } + + if (unlikely(!node)) + return ERR_PTR(-ENOMEM); + + init_request_active(&node->base, node_retire); + node->ref = ref; + node->timeline = idx; + + rb_link_node(&node->node, parent, p); + rb_insert_color(&node->node, &ref->tree); + +replace: + /* + * Overwrite the previous active slot in the rbtree with last, + * leaving last zeroed. If the previous slot is still active, + * we must be careful as we now only expect to receive one retire + * callback not two, and so must undo the active counting for the + * overwritten slot. + */ + if (i915_gem_active_isset(&node->base)) { + /* Retire ourselves from the old rq->active_list */ + __list_del_entry(&node->base.link); + ref->count--; + GEM_BUG_ON(!ref->count); + } + GEM_BUG_ON(list_empty(&ref->last.link)); + list_replace_init(&ref->last.link, &node->base.link); + node->base.request = fetch_and_zero(&ref->last.request); + +out: + return &ref->last; +} + +void i915_active_init(struct drm_i915_private *i915, + struct i915_active *ref, + void (*retire)(struct i915_active *ref)) +{ + ref->i915 = i915; + ref->retire = retire; + ref->tree = RB_ROOT; + init_request_active(&ref->last, last_retire); + ref->count = 0; +} + +int i915_active_ref(struct i915_active *ref, + u64 timeline, + struct i915_request *rq) +{ + struct i915_gem_active *active; + + active = active_instance(ref, timeline); + if (IS_ERR(active)) + return PTR_ERR(active); + + if (!i915_gem_active_isset(active)) + ref->count++; + i915_gem_active_set(active, rq); + + GEM_BUG_ON(!ref->count); + return 0; +} + +bool i915_active_acquire(struct i915_active *ref) +{ + lockdep_assert_held(BKL(ref)); + return !ref->count++; +} + +void i915_active_release(struct i915_active *ref) +{ + lockdep_assert_held(BKL(ref)); + __active_retire(ref); +} + +int i915_active_wait(struct i915_active *ref) +{ + struct active_node *it, *n; + int ret = 0; + + if (i915_active_acquire(ref)) + goto out_release; + + ret = i915_gem_active_retire(&ref->last, BKL(ref)); + if (ret) + goto out_release; + + rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { + ret = i915_gem_active_retire(&it->base, BKL(ref)); + if (ret) + break; + } + +out_release: + i915_active_release(ref); + return ret; +} + +static int __i915_request_await_active(struct i915_request *rq, + struct i915_gem_active *active) +{ + struct i915_request *barrier = + i915_gem_active_raw(active, &rq->i915->drm.struct_mutex); + + return barrier ?
i915_request_await_dma_fence(rq, &barrier->fence) : 0; +} + +int i915_request_await_active(struct i915_request *rq, struct i915_active *ref) +{ + struct active_node *it, *n; + int ret; + + ret = __i915_request_await_active(rq, &ref->last); + if (ret) + return ret; + + rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { + ret = __i915_request_await_active(rq, &it->base); + if (ret) + return ret; + } + + return 0; +} + +void i915_active_fini(struct i915_active *ref) +{ + struct active_node *it, *n; + + GEM_BUG_ON(i915_gem_active_isset(&ref->last)); + + rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { + GEM_BUG_ON(i915_gem_active_isset(&it->base)); + kfree(it); + } + ref->tree = RB_ROOT; +} + +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) +#include "selftests/i915_active.c" +#endif diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h new file mode 100644 index 000000000000..0aa2628ea734 --- /dev/null +++ b/drivers/gpu/drm/i915/i915_active.h @@ -0,0 +1,69 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2019 Intel Corporation + */ + +#ifndef _I915_ACTIVE_H_ +#define _I915_ACTIVE_H_ + +#include "i915_active_types.h" + +/* + * GPU activity tracking + * + * Each set of commands submitted to the GPU comprises a single request that + * signals a fence upon completion. struct i915_request combines the + * command submission, scheduling and fence signaling roles. If we want to see + * if a particular task is complete, we need to grab the fence (struct + * i915_request) for that task and check or wait for it to be signaled. More + * often though we want to track the status of a bunch of tasks, for example + * to wait for the GPU to finish accessing some memory across a variety of + * different command pipelines from different clients. We could choose to + * track every single request associated with the task, but knowing that + * each request belongs to an ordered timeline (later requests within a + * timeline must wait for earlier requests), we need only track the + * latest request in each timeline to determine the overall status of the + * task. + * + * struct i915_active provides this tracking across timelines. It builds a + * composite shared-fence, and is updated as new work is submitted to the task, + * forming a snapshot of the current status. It should be embedded into the + * different resources that need to track their associated GPU activity to + * provide a callback when that GPU activity has ceased, or otherwise to + * provide a serialisation point either for request submission or for CPU + * synchronisation.
+ */ + +void i915_active_init(struct drm_i915_private *i915, + struct i915_active *ref, + void (*retire)(struct i915_active *ref)); + +int i915_active_ref(struct i915_active *ref, + u64 timeline, + struct i915_request *rq); + +int i915_active_wait(struct i915_active *ref); + +int i915_request_await_active(struct i915_request *rq, + struct i915_active *ref); + +bool i915_active_acquire(struct i915_active *ref); + +static inline void i915_active_cancel(struct i915_active *ref) +{ + GEM_BUG_ON(ref->count != 1); + ref->count = 0; +} + +void i915_active_release(struct i915_active *ref); + +static inline bool +i915_active_is_idle(const struct i915_active *ref) +{ + return !ref->count; +} + +void i915_active_fini(struct i915_active *ref); + +#endif /* _I915_ACTIVE_H_ */ diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h new file mode 100644 index 000000000000..411e502ed8dd --- /dev/null +++ b/drivers/gpu/drm/i915/i915_active_types.h @@ -0,0 +1,26 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2019 Intel Corporation + */ + +#ifndef _I915_ACTIVE_TYPES_H_ +#define _I915_ACTIVE_TYPES_H_ + +#include <linux/rbtree.h> + +#include "i915_request.h" + +struct drm_i915_private; + +struct i915_active { + struct drm_i915_private *i915; + + struct rb_root tree; + struct i915_gem_active last; + unsigned int count; + + void (*retire)(struct i915_active *ref); +}; + +#endif /* _I915_ACTIVE_TYPES_H_ */ diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 49b00996a15e..e625659c03a2 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1917,14 +1917,13 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size) if (!vma) return ERR_PTR(-ENOMEM); + i915_active_init(i915, &vma->active, NULL); init_request_active(&vma->last_fence, NULL); vma->vm = &ggtt->vm; vma->ops = &pd_vma_ops; vma->private = ppgtt; - vma->active = RB_ROOT; - vma->size = size; vma->fence_size = size; vma->flags = I915_VMA_GGTT; diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index d83b8ad5f859..d4772061e642 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -63,22 +63,23 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason) #endif -struct i915_vma_active { - struct i915_gem_active base; - struct i915_vma *vma; - struct rb_node node; - u64 timeline; -}; +static void obj_bump_mru(struct drm_i915_gem_object *obj) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); -static void -__i915_vma_retire(struct i915_vma *vma, struct i915_request *rq) + spin_lock(&i915->mm.obj_lock); + if (obj->bind_count) + list_move_tail(&obj->mm.link, &i915->mm.bound_list); + spin_unlock(&i915->mm.obj_lock); + + obj->mm.dirty = true; /* be paranoid */ +} + +static void __i915_vma_retire(struct i915_active *ref) { + struct i915_vma *vma = container_of(ref, typeof(*vma), active); struct drm_i915_gem_object *obj = vma->obj; - GEM_BUG_ON(!i915_vma_is_active(vma)); - if (--vma->active_count) - return; - GEM_BUG_ON(!i915_gem_object_is_active(obj)); if (--obj->active_count) return; @@ -90,16 +91,12 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq) reservation_object_unlock(obj->resv); } - /* Bump our place on the bound list to keep it roughly in LRU order + /* + * Bump our place on the bound list to keep it roughly in LRU order * so that we don't steal from recently used but inactive objects * (unless we are forced to ofc!)
*/ - spin_lock(&rq->i915->mm.obj_lock); - if (obj->bind_count) - list_move_tail(&obj->mm.link, &rq->i915->mm.bound_list); - spin_unlock(&rq->i915->mm.obj_lock); - - obj->mm.dirty = true; /* be paranoid */ + obj_bump_mru(obj); if (i915_gem_object_has_active_reference(obj)) { i915_gem_object_clear_active_reference(obj); @@ -107,21 +104,6 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq) } } -static void -i915_vma_retire(struct i915_gem_active *base, struct i915_request *rq) -{ - struct i915_vma_active *active = - container_of(base, typeof(*active), base); - - __i915_vma_retire(active->vma, rq); -} - -static void -i915_vma_last_retire(struct i915_gem_active *base, struct i915_request *rq) -{ - __i915_vma_retire(container_of(base, struct i915_vma, last_active), rq); -} - static struct i915_vma * vma_create(struct drm_i915_gem_object *obj, struct i915_address_space *vm, @@ -137,10 +119,9 @@ vma_create(struct drm_i915_gem_object *obj, if (vma == NULL) return ERR_PTR(-ENOMEM); - vma->active = RB_ROOT; - - init_request_active(&vma->last_active, i915_vma_last_retire); + i915_active_init(vm->i915, &vma->active, __i915_vma_retire); init_request_active(&vma->last_fence, NULL); + vma->vm = vm; vma->ops = &vm->vma_ops; vma->obj = obj; @@ -823,7 +804,6 @@ void i915_vma_reopen(struct i915_vma *vma) static void __i915_vma_destroy(struct i915_vma *vma) { struct drm_i915_private *i915 = vma->vm->i915; - struct i915_vma_active *iter, *n; GEM_BUG_ON(vma->node.allocated); GEM_BUG_ON(vma->fence); @@ -843,10 +823,7 @@ static void __i915_vma_destroy(struct i915_vma *vma) spin_unlock(&obj->vma.lock); } - rbtree_postorder_for_each_entry_safe(iter, n, &vma->active, node) { - GEM_BUG_ON(i915_gem_active_isset(&iter->base)); - kfree(iter); - } + i915_active_fini(&vma->active); kmem_cache_free(i915->vmas, vma); } @@ -931,104 +908,15 @@ static void export_fence(struct i915_vma *vma, reservation_object_unlock(resv); } -static struct i915_gem_active *active_instance(struct i915_vma *vma, u64 idx) -{ - struct i915_vma_active *active; - struct rb_node **p, *parent; - struct i915_request *old; - - /* - * We track the most recently used timeline to skip a rbtree search - * for the common case, under typical loads we never need the rbtree - * at all. We can reuse the last_active slot if it is empty, that is - * after the previous activity has been retired, or if the active - * matches the current timeline. - * - * Note that we allow the timeline to be active simultaneously in - * the rbtree and the last_active cache. We do this to avoid having - * to search and replace the rbtree element for a new timeline, with - * the cost being that we must be aware that the vma may be retired - * twice for the same timeline (as the older rbtree element will be - * retired before the new request added to last_active). - */ - old = i915_gem_active_raw(&vma->last_active, - &vma->vm->i915->drm.struct_mutex); - if (!old || old->fence.context == idx) - goto out; - - /* Move the currently active fence into the rbtree */ - idx = old->fence.context; - - parent = NULL; - p = &vma->active.rb_node; - while (*p) { - parent = *p; - - active = rb_entry(parent, struct i915_vma_active, node); - if (active->timeline == idx) - goto replace; - - if (active->timeline < idx) - p = &parent->rb_right; - else - p = &parent->rb_left; - } - - active = kmalloc(sizeof(*active), GFP_KERNEL); - - /* kmalloc may retire the vma->last_active request (thanks shrinker)! 
*/ - if (unlikely(!i915_gem_active_raw(&vma->last_active, - &vma->vm->i915->drm.struct_mutex))) { - kfree(active); - goto out; - } - - if (unlikely(!active)) - return ERR_PTR(-ENOMEM); - - init_request_active(&active->base, i915_vma_retire); - active->vma = vma; - active->timeline = idx; - - rb_link_node(&active->node, parent, p); - rb_insert_color(&active->node, &vma->active); - -replace: - /* - * Overwrite the previous active slot in the rbtree with last_active, - * leaving last_active zeroed. If the previous slot is still active, - * we must be careful as we now only expect to receive one retire - * callback not two, and so much undo the active counting for the - * overwritten slot. - */ - if (i915_gem_active_isset(&active->base)) { - /* Retire ourselves from the old rq->active_list */ - __list_del_entry(&active->base.link); - vma->active_count--; - GEM_BUG_ON(!vma->active_count); - } - GEM_BUG_ON(list_empty(&vma->last_active.link)); - list_replace_init(&vma->last_active.link, &active->base.link); - active->base.request = fetch_and_zero(&vma->last_active.request); - -out: - return &vma->last_active; -} - int i915_vma_move_to_active(struct i915_vma *vma, struct i915_request *rq, unsigned int flags) { struct drm_i915_gem_object *obj = vma->obj; - struct i915_gem_active *active; lockdep_assert_held(&rq->i915->drm.struct_mutex); GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); - active = active_instance(vma, rq->fence.context); - if (IS_ERR(active)) - return PTR_ERR(active); - /* * Add a reference if we're newly entering the active list. * The order in which we add operations to the retirement queue is @@ -1037,9 +925,15 @@ int i915_vma_move_to_active(struct i915_vma *vma, * add the active reference first and queue for it to be dropped * *last*. */ - if (!i915_gem_active_isset(active) && !vma->active_count++) + if (!vma->active.count) obj->active_count++; - i915_gem_active_set(active, rq); + + if (unlikely(i915_active_ref(&vma->active, rq->fence.context, rq))) { + if (!vma->active.count) + obj->active_count--; + return -ENOMEM; + } + GEM_BUG_ON(!i915_vma_is_active(vma)); GEM_BUG_ON(!obj->active_count); @@ -1073,8 +967,6 @@ int i915_vma_unbind(struct i915_vma *vma) */ might_sleep(); if (i915_vma_is_active(vma)) { - struct i915_vma_active *active, *n; - /* * When a closed VMA is retired, it is unbound - eek. 
* In order to prevent it from being recursively closed, @@ -1090,19 +982,10 @@ int i915_vma_unbind(struct i915_vma *vma) */ __i915_vma_pin(vma); - ret = i915_gem_active_retire(&vma->last_active, - &vma->vm->i915->drm.struct_mutex); + ret = i915_active_wait(&vma->active); if (ret) goto unpin; - rbtree_postorder_for_each_entry_safe(active, n, - &vma->active, node) { - ret = i915_gem_active_retire(&active->base, - &vma->vm->i915->drm.struct_mutex); - if (ret) - goto unpin; - } - ret = i915_gem_active_retire(&vma->last_fence, &vma->vm->i915->drm.struct_mutex); unpin: diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 5793abe509a2..3c03d4569481 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -34,6 +34,7 @@ #include "i915_gem_fence_reg.h" #include "i915_gem_object.h" +#include "i915_active.h" #include "i915_request.h" enum i915_cache_level; @@ -108,9 +109,7 @@ struct i915_vma { #define I915_VMA_USERFAULT BIT(I915_VMA_USERFAULT_BIT) #define I915_VMA_GGTT_WRITE BIT(15) - unsigned int active_count; - struct rb_root active; - struct i915_gem_active last_active; + struct i915_active active; struct i915_gem_active last_fence; /** @@ -154,9 +153,9 @@ i915_vma_instance(struct drm_i915_gem_object *obj, void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags); #define I915_VMA_RELEASE_MAP BIT(0) -static inline bool i915_vma_is_active(struct i915_vma *vma) +static inline bool i915_vma_is_active(const struct i915_vma *vma) { - return vma->active_count; + return !i915_active_is_idle(&vma->active); } int __must_check i915_vma_move_to_active(struct i915_vma *vma, diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c new file mode 100644 index 000000000000..c05ca366729a --- /dev/null +++ b/drivers/gpu/drm/i915/selftests/i915_active.c @@ -0,0 +1,158 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2018 Intel Corporation + */ + +#include "../i915_selftest.h" + +#include "igt_flush_test.h" +#include "lib_sw_fence.h" + +struct live_active { + struct i915_active base; + bool retired; +}; + +static void __live_active_retire(struct i915_active *base) +{ + struct live_active *active = container_of(base, typeof(*active), base); + + active->retired = true; +} + +static int __live_active_setup(struct drm_i915_private *i915, + struct live_active *active) +{ + struct intel_engine_cs *engine; + struct i915_sw_fence *submit; + enum intel_engine_id id; + unsigned int count = 0; + int err = 0; + + i915_active_init(i915, &active->base, __live_active_retire); + active->retired = false; + + if (!i915_active_acquire(&active->base)) { + pr_err("First i915_active_acquire should report being idle\n"); + return -EINVAL; + } + + submit = heap_fence_create(GFP_KERNEL); + + for_each_engine(engine, i915, id) { + struct i915_request *rq; + + rq = i915_request_alloc(engine, i915->kernel_context); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + break; + } + + err = i915_sw_fence_await_sw_fence_gfp(&rq->submit, + submit, + GFP_KERNEL); + if (err < 0) { + pr_err("Failed to allocate submission fence!\n"); + i915_request_add(rq); + break; + } + + err = i915_active_ref(&active->base, rq->fence.context, rq); + if (err) { + pr_err("Failed to track active ref!\n"); + i915_request_add(rq); + break; + } + + i915_request_add(rq); + count++; + } + + i915_active_release(&active->base); + if (active->retired) { + pr_err("i915_active retired before submission!\n"); + err = -EINVAL; + } + if (active->base.count 
!= count) { + pr_err("i915_active not tracking all requests, found %d, expected %d\n", + active->base.count, count); + err = -EINVAL; + } + + i915_sw_fence_commit(submit); + heap_fence_put(submit); + + return err; +} + +static int live_active_wait(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct live_active active; + intel_wakeref_t wakeref; + int err; + + /* Check that we get a callback when requests retire upon waiting */ + + mutex_lock(&i915->drm.struct_mutex); + wakeref = intel_runtime_pm_get(i915); + + err = __live_active_setup(i915, &active); + + i915_active_wait(&active.base); + if (!active.retired) { + pr_err("i915_active not retired after waiting!\n"); + err = -EINVAL; + } + + i915_active_fini(&active.base); + if (igt_flush_test(i915, I915_WAIT_LOCKED)) + err = -EIO; + + intel_runtime_pm_put(i915, wakeref); + mutex_unlock(&i915->drm.struct_mutex); + return err; +} + +static int live_active_retire(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct live_active active; + intel_wakeref_t wakeref; + int err; + + /* Check that we get a callback when requests are indirectly retired */ + + mutex_lock(&i915->drm.struct_mutex); + wakeref = intel_runtime_pm_get(i915); + + err = __live_active_setup(i915, &active); + + /* waits for & retires all requests */ + if (igt_flush_test(i915, I915_WAIT_LOCKED)) + err = -EIO; + + if (!active.retired) { + pr_err("i915_active not retired after flushing!\n"); + err = -EINVAL; + } + + i915_active_fini(&active.base); + intel_runtime_pm_put(i915, wakeref); + mutex_unlock(&i915->drm.struct_mutex); + return err; +} + +int i915_active_live_selftests(struct drm_i915_private *i915) +{ + static const struct i915_subtest tests[] = { + SUBTEST(live_active_wait), + SUBTEST(live_active_retire), + }; + + if (i915_terminally_wedged(&i915->gpu_error)) + return 0; + + return i915_subtests(tests, i915); +} diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h index 76b4f87fc853..6d766925ad04 100644 --- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h +++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h @@ -12,8 +12,9 @@ selftest(sanitycheck, i915_live_sanitycheck) /* keep first (igt selfcheck) */ selftest(uncore, intel_uncore_live_selftests) selftest(workarounds, intel_workarounds_live_selftests) -selftest(requests, i915_request_live_selftests) selftest(timelines, i915_timeline_live_selftests) +selftest(requests, i915_request_live_selftests) +selftest(active, i915_active_live_selftests) selftest(objects, i915_gem_object_live_selftests) selftest(dmabuf, i915_gem_dmabuf_live_selftests) selftest(coherency, i915_gem_coherency_live_selftests)
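Aside: the selftests above double as usage documentation for the new API. Embedding a tracker in some other object looks like the sketch below (hypothetical names, mirroring what i915_vma now does with vma->active):

struct my_tracker {
	struct i915_active active;	/* embedded, as the commit intends */
};

static void my_tracker_retire(struct i915_active *ref)
{
	struct my_tracker *t = container_of(ref, struct my_tracker, active);

	/* the last request on every tracked timeline has now retired */
}

static void my_tracker_init(struct drm_i915_private *i915,
			    struct my_tracker *t)
{
	i915_active_init(i915, &t->active, my_tracker_retire);
}

static int my_tracker_track(struct my_tracker *t, struct i915_request *rq)
{
	/* one slot per timeline; rq->fence.context identifies the timeline */
	return i915_active_ref(&t->active, rq->fence.context, rq);
}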
From patchwork Mon Feb 4 13:22:05 2019
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 4 Feb 2019 13:22:05 +0000
Message-Id: <20190204132214.9459-14-chris@chris-wilson.co.uk>
In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 13/22] drm/i915: Release the active tracker tree upon idling

As soon as we detect that the active tracker is idle and we prepare to call the retire callback, release the storage for our tree of per-timeline nodes. We expect these to be infrequently used and quick to allocate, so there is little benefit in keeping the tree cached and we would prefer to return the pages back to the system in a timely fashion. This also means that by the time we finalize the struct as a whole, the activity tracker must be idle and so the tree has already been released. Indeed, we can reduce i915_active_fini() to just the assertions that there is nothing to do.
Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_active.c | 33 +++++++++++++++++++++--------- drivers/gpu/drm/i915/i915_active.h | 4 ++++ 2 files changed, 27 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index 91950d778cab..b1fefe98f9a6 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -16,12 +16,29 @@ struct active_node { u64 timeline; }; +static void +__active_park(struct i915_active *ref) +{ + struct active_node *it, *n; + + rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { + GEM_BUG_ON(i915_gem_active_isset(&it->base)); + kfree(it); + } + ref->tree = RB_ROOT; +} + static void __active_retire(struct i915_active *ref) { GEM_BUG_ON(!ref->count); - if (!--ref->count) - ref->retire(ref); + if (--ref->count) + return; + + /* return the unused nodes to our slabcache */ + __active_park(ref); + + ref->retire(ref); } static void @@ -210,18 +227,14 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref) return 0; } +#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) void i915_active_fini(struct i915_active *ref) { - struct active_node *it, *n; - GEM_BUG_ON(i915_gem_active_isset(&ref->last)); - - rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { - GEM_BUG_ON(i915_gem_active_isset(&it->base)); - kfree(it); - } - ref->tree = RB_ROOT; + GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree)); + GEM_BUG_ON(ref->count); } +#endif #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/i915_active.c" diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h index 0aa2628ea734..ec4b66efd9a7 100644 --- a/drivers/gpu/drm/i915/i915_active.h +++ b/drivers/gpu/drm/i915/i915_active.h @@ -64,6 +64,10 @@ i915_active_is_idle(const struct i915_active *ref) return !ref->count; } +#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) void i915_active_fini(struct i915_active *ref); +#else +static inline void i915_active_fini(struct i915_active *ref) { } +#endif #endif /* _I915_ACTIVE_H_ */
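Aside: the resulting lifetime rule is "idle implies parked": once the final retire has run, the tracker owns no nodes, which is exactly what the DEBUG_GEM-only fini now asserts. Expressed as a selftest-style check (hypothetical, not in the patch):

static int assert_parked_when_idle(struct i915_active *ref)
{
	/*
	 * Invariant after this patch: if the tracker is idle, its rbtree
	 * of per-timeline nodes has already been returned to the system
	 * by __active_park(), leaving fini() nothing to free.
	 */
	if (i915_active_is_idle(ref) && !RB_EMPTY_ROOT(&ref->tree))
		return -EINVAL;

	return 0;
}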
Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9D9D56E4DD for ; Mon, 4 Feb 2019 13:22:43 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 15458866-1500050 for multiple; Mon, 04 Feb 2019 13:22:27 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 4 Feb 2019 13:22:06 +0000 Message-Id: <20190204132214.9459-15-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk> References: <20190204132214.9459-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 14/22] drm/i915: Allocate active tracking nodes from a slabcache X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Wrap the active tracking for GPU references in a slabcache for faster allocations, and hopefully better fragmentation reduction. v3: Nothing device-specific left; it's just a slabcache that we can make global. v4: Include i915_active.h and don't put the initfunc under DEBUG_GEM Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_active.c | 31 +++++++++++++++++++++++++++--- drivers/gpu/drm/i915/i915_active.h | 3 +++ drivers/gpu/drm/i915/i915_pci.c | 4 ++++ 3 files changed, 35 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index b1fefe98f9a6..64661c41532b 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -9,6 +9,17 @@ #define BKL(ref) (&(ref)->i915->drm.struct_mutex) +/* + * Active refs memory management + * + * To be more economical with memory, we reap all the i915_active trees as + * they idle (when we know the active requests are inactive) and allocate the + * nodes from a local slab cache to hopefully reduce the fragmentation. + */ +static struct i915_global_active { + struct kmem_cache *slab_cache; +} global; + struct active_node { struct i915_gem_active base; struct i915_active *ref; @@ -23,7 +34,7 @@ __active_park(struct i915_active *ref) rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { GEM_BUG_ON(i915_gem_active_isset(&it->base)); - kfree(it); + kmem_cache_free(global.slab_cache, it); } ref->tree = RB_ROOT; } @@ -96,11 +107,11 @@ active_instance(struct i915_active *ref, u64 idx) p = &parent->rb_left; } - node = kmalloc(sizeof(*node), GFP_KERNEL); + node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL); /* kmalloc may retire the ref->last (thanks shrinker)!
*/ if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) { - kfree(node); + kmem_cache_free(global.slab_cache, node); goto out; } @@ -239,3 +250,17 @@ void i915_active_fini(struct i915_active *ref) #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/i915_active.c" #endif + +int __init i915_global_active_init(void) +{ + global.slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN); + if (!global.slab_cache) + return -ENOMEM; + + return 0; +} + +void __exit i915_global_active_exit(void) +{ + kmem_cache_destroy(global.slab_cache); +} diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h index ec4b66efd9a7..179b47aeec33 100644 --- a/drivers/gpu/drm/i915/i915_active.h +++ b/drivers/gpu/drm/i915/i915_active.h @@ -70,4 +70,7 @@ void i915_active_fini(struct i915_active *ref); static inline void i915_active_fini(struct i915_active *ref) { } #endif +int i915_global_active_init(void); +void i915_global_active_exit(void); + #endif /* _I915_ACTIVE_H_ */ diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 5d05572c9ff4..852b6b4e8ed8 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -28,6 +28,7 @@ #include +#include "i915_active.h" #include "i915_drv.h" #include "i915_selftest.h" @@ -800,6 +801,8 @@ static int __init i915_init(void) bool use_kms = true; int err; + i915_global_active_init(); + err = i915_mock_selftests(); if (err) return err > 0 ? 0 : err; @@ -831,6 +834,7 @@ static void __exit i915_exit(void) return; pci_unregister_driver(&i915_pci_driver); + i915_global_active_exit(); } module_init(i915_init); From patchwork Mon Feb 4 13:22:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10795613 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 650A713B5 for ; Mon, 4 Feb 2019 13:22:53 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 514792B497 for ; Mon, 4 Feb 2019 13:22:53 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 453612B49A; Mon, 4 Feb 2019 13:22:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id BAA8A2B497 for ; Mon, 4 Feb 2019 13:22:51 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 171126E4E7; Mon, 4 Feb 2019 13:22:45 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 189CE6E4DF for ; Mon, 4 Feb 2019 13:22:42 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 15458867-1500050 for multiple; Mon, 04 Feb 
2019 13:22:27 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 4 Feb 2019 13:22:07 +0000 Message-Id: <20190204132214.9459-16-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk> References: <20190204132214.9459-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 15/22] drm/i915: Make request allocation caches global X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP As kmem_caches share the same properties (size, allocation/free behaviour) for all potential devices, we can use global caches. While this potentially has worse fragmentation behaviour (one can argue that different devices would have different activity lifetimes, but you can also argue that activity is temporal across the system) it is the default behaviour of the system at large to amalgamate matching caches. The benefit for us is much reduced pointer dancing along the frequent allocation paths. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_active.c | 9 ++- drivers/gpu/drm/i915/i915_active.h | 1 + drivers/gpu/drm/i915/i915_drv.h | 3 - drivers/gpu/drm/i915/i915_gem.c | 32 +-------- drivers/gpu/drm/i915/i915_globals.c | 49 ++++++++++++++ drivers/gpu/drm/i915/i915_globals.h | 14 ++++ drivers/gpu/drm/i915/i915_pci.c | 8 ++- drivers/gpu/drm/i915/i915_request.c | 53 ++++++++++++--- drivers/gpu/drm/i915/i915_request.h | 10 +++ drivers/gpu/drm/i915/i915_scheduler.c | 66 +++++++++++++++---- drivers/gpu/drm/i915/i915_scheduler.h | 34 ++++++++-- drivers/gpu/drm/i915/intel_guc_submission.c | 3 +- drivers/gpu/drm/i915/intel_lrc.c | 6 +- drivers/gpu/drm/i915/intel_ringbuffer.h | 17 ----- drivers/gpu/drm/i915/selftests/intel_lrc.c | 2 +- drivers/gpu/drm/i915/selftests/mock_engine.c | 48 +++++++------- .../gpu/drm/i915/selftests/mock_gem_device.c | 26 -------- drivers/gpu/drm/i915/selftests/mock_request.c | 12 ++-- drivers/gpu/drm/i915/selftests/mock_request.h | 7 -- 20 files changed, 248 insertions(+), 153 deletions(-) create mode 100644 drivers/gpu/drm/i915/i915_globals.c create mode 100644 drivers/gpu/drm/i915/i915_globals.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 1787e1299b1b..a1d834068765 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -77,6 +77,7 @@ i915-y += \ i915_gem_tiling.o \ i915_gem_userptr.o \ i915_gemfs.o \ + i915_globals.o \ i915_query.o \ i915_request.o \ i915_scheduler.o \ diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index 64661c41532b..d23092d8c89f 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -251,7 +251,7 @@ void i915_active_fini(struct i915_active *ref) #include "selftests/i915_active.c" #endif -int __init i915_global_active_init(void) +int i915_global_active_init(void) { global.slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN); if (!global.slab_cache) @@ -260,7 +260,12 @@ int __init i915_global_active_init(void) return 0; } -void __exit i915_global_active_exit(void) +void i915_global_active_shrink(void) +{ + kmem_cache_shrink(global.slab_cache); +} + +void i915_global_active_exit(void) {
kmem_cache_destroy(global.slab_cache); } diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h index 179b47aeec33..6c56d10b1f59 100644 --- a/drivers/gpu/drm/i915/i915_active.h +++ b/drivers/gpu/drm/i915/i915_active.h @@ -71,6 +71,7 @@ static inline void i915_active_fini(struct i915_active *ref) { } #endif int i915_global_active_init(void); +void i915_global_active_shrink(void); void i915_global_active_exit(void); #endif /* _I915_ACTIVE_H_ */ diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 3e4538ce5276..e48e3c228d9c 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1459,9 +1459,6 @@ struct drm_i915_private { struct kmem_cache *objects; struct kmem_cache *vmas; struct kmem_cache *luts; - struct kmem_cache *requests; - struct kmem_cache *dependencies; - struct kmem_cache *priorities; const struct intel_device_info __info; /* Use INTEL_INFO() to access. */ struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */ diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 2c6161c89cc7..d82e4f990586 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -42,6 +42,7 @@ #include "i915_drv.h" #include "i915_gem_clflush.h" #include "i915_gemfs.h" +#include "i915_globals.h" #include "i915_reset.h" #include "i915_trace.h" #include "i915_vgpu.h" @@ -2916,12 +2917,11 @@ static void shrink_caches(struct drm_i915_private *i915) * filled slabs to prioritise allocating from the mostly full slabs, * with the aim of reducing fragmentation. */ - kmem_cache_shrink(i915->priorities); - kmem_cache_shrink(i915->dependencies); - kmem_cache_shrink(i915->requests); kmem_cache_shrink(i915->luts); kmem_cache_shrink(i915->vmas); kmem_cache_shrink(i915->objects); + + i915_globals_shrink(); } struct sleep_rcu_work { @@ -5264,23 +5264,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv) if (!dev_priv->luts) goto err_vmas; - dev_priv->requests = KMEM_CACHE(i915_request, - SLAB_HWCACHE_ALIGN | - SLAB_RECLAIM_ACCOUNT | - SLAB_TYPESAFE_BY_RCU); - if (!dev_priv->requests) - goto err_luts; - - dev_priv->dependencies = KMEM_CACHE(i915_dependency, - SLAB_HWCACHE_ALIGN | - SLAB_RECLAIM_ACCOUNT); - if (!dev_priv->dependencies) - goto err_requests; - - dev_priv->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN); - if (!dev_priv->priorities) - goto err_dependencies; - INIT_LIST_HEAD(&dev_priv->gt.active_rings); INIT_LIST_HEAD(&dev_priv->gt.closed_vma); @@ -5305,12 +5288,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv) return 0; -err_dependencies: - kmem_cache_destroy(dev_priv->dependencies); -err_requests: - kmem_cache_destroy(dev_priv->requests); -err_luts: - kmem_cache_destroy(dev_priv->luts); err_vmas: kmem_cache_destroy(dev_priv->vmas); err_objects: @@ -5328,9 +5305,6 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv) cleanup_srcu_struct(&dev_priv->gpu_error.srcu); - kmem_cache_destroy(dev_priv->priorities); - kmem_cache_destroy(dev_priv->dependencies); - kmem_cache_destroy(dev_priv->requests); kmem_cache_destroy(dev_priv->luts); kmem_cache_destroy(dev_priv->vmas); kmem_cache_destroy(dev_priv->objects); diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c new file mode 100644 index 000000000000..2ecf9897fd16 --- /dev/null +++ b/drivers/gpu/drm/i915/i915_globals.c @@ -0,0 +1,49 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2019 Intel Corporation + */ + +#include 
"i915_active.h" +#include "i915_globals.h" +#include "i915_request.h" +#include "i915_scheduler.h" + +int __init i915_globals_init(void) +{ + int err; + + err = i915_global_active_init(); + if (err) + return err; + + err = i915_global_request_init(); + if (err) + goto err_active; + + err = i915_global_scheduler_init(); + if (err) + goto err_request; + + return 0; + +err_request: + i915_global_request_exit(); +err_active: + i915_global_active_exit(); + return err; +} + +void i915_globals_shrink(void) +{ + i915_global_active_shrink(); + i915_global_request_shrink(); + i915_global_scheduler_shrink(); +} + +void __exit i915_globals_exit(void) +{ + i915_global_scheduler_exit(); + i915_global_request_exit(); + i915_global_active_exit(); +} diff --git a/drivers/gpu/drm/i915/i915_globals.h b/drivers/gpu/drm/i915/i915_globals.h new file mode 100644 index 000000000000..903f52c0a1d2 --- /dev/null +++ b/drivers/gpu/drm/i915/i915_globals.h @@ -0,0 +1,14 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2019 Intel Corporation + */ + +#ifndef _I915_GLOBALS_H_ +#define _I915_GLOBALS_H_ + +int i915_globals_init(void); +void i915_globals_shrink(void); +void i915_globals_exit(void); + +#endif /* _I915_GLOBALS_H_ */ diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 852b6b4e8ed8..0d684eb530c3 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -28,8 +28,8 @@ #include -#include "i915_active.h" #include "i915_drv.h" +#include "i915_globals.h" #include "i915_selftest.h" #define PLATFORM(x) .platform = (x), .platform_mask = BIT(x) @@ -801,7 +801,9 @@ static int __init i915_init(void) bool use_kms = true; int err; - i915_global_active_init(); + err = i915_globals_init(); + if (err) + return err; err = i915_mock_selftests(); if (err) @@ -834,7 +836,7 @@ static void __exit i915_exit(void) return; pci_unregister_driver(&i915_pci_driver); - i915_global_active_exit(); + i915_globals_exit(); } module_init(i915_init); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 04c65e6d83b9..3bb4840ba761 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -31,6 +31,11 @@ #include "i915_drv.h" #include "i915_reset.h" +static struct i915_global_request { + struct kmem_cache *slab_requests; + struct kmem_cache *slab_dependencies; +} global; + static const char *i915_fence_get_driver_name(struct dma_fence *fence) { return "i915"; @@ -83,7 +88,7 @@ static void i915_fence_release(struct dma_fence *fence) */ i915_sw_fence_fini(&rq->submit); - kmem_cache_free(rq->i915->requests, rq); + kmem_cache_free(global.slab_requests, rq); } const struct dma_fence_ops i915_fence_ops = { @@ -301,7 +306,7 @@ static void i915_request_retire(struct i915_request *request) unreserve_gt(request->i915); - i915_sched_node_fini(request->i915, &request->sched); + i915_sched_node_fini(&request->sched); i915_request_put(request); } @@ -535,7 +540,7 @@ i915_request_alloc_slow(struct intel_context *ce) ring_retire_requests(ring); out: - return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL); + return kmem_cache_alloc(global.slab_requests, GFP_KERNEL); } /** @@ -617,7 +622,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) * * Do not use kmem_cache_zalloc() here! 
*/ - rq = kmem_cache_alloc(i915->requests, + rq = kmem_cache_alloc(global.slab_requests, GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN); if (unlikely(!rq)) { rq = i915_request_alloc_slow(ce); @@ -701,7 +706,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) GEM_BUG_ON(!list_empty(&rq->sched.signalers_list)); GEM_BUG_ON(!list_empty(&rq->sched.waiters_list)); - kmem_cache_free(i915->requests, rq); + kmem_cache_free(global.slab_requests, rq); err_unreserve: unreserve_gt(i915); intel_context_unpin(ce); @@ -720,9 +725,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from) return 0; if (to->engine->schedule) { - ret = i915_sched_node_add_dependency(to->i915, - &to->sched, - &from->sched); + ret = i915_sched_node_add_dependency(&to->sched, &from->sched); if (ret < 0) return ret; } @@ -1195,3 +1198,37 @@ void i915_retire_requests(struct drm_i915_private *i915) #include "selftests/mock_request.c" #include "selftests/i915_request.c" #endif + +int i915_global_request_init(void) +{ + global.slab_requests = KMEM_CACHE(i915_request, + SLAB_HWCACHE_ALIGN | + SLAB_RECLAIM_ACCOUNT | + SLAB_TYPESAFE_BY_RCU); + if (!global.slab_requests) + return -ENOMEM; + + global.slab_dependencies = KMEM_CACHE(i915_dependency, + SLAB_HWCACHE_ALIGN | + SLAB_RECLAIM_ACCOUNT); + if (!global.slab_dependencies) + goto err_requests; + + return 0; + +err_requests: + kmem_cache_destroy(global.slab_requests); + return -ENOMEM; +} + +void i915_global_request_shrink(void) +{ + kmem_cache_shrink(global.slab_dependencies); + kmem_cache_shrink(global.slab_requests); +} + +void i915_global_request_exit(void) +{ + kmem_cache_destroy(global.slab_dependencies); + kmem_cache_destroy(global.slab_requests); +} diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 3cffb96203b9..054bd300984b 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -29,6 +29,7 @@ #include "i915_gem.h" #include "i915_scheduler.h" +#include "i915_selftest.h" #include "i915_sw_fence.h" #include @@ -204,6 +205,11 @@ struct i915_request { struct drm_i915_file_private *file_priv; /** file_priv list entry for this request */ struct list_head client_link; + + I915_SELFTEST_DECLARE(struct { + struct list_head link; + unsigned long delay; + } mock;) }; #define I915_FENCE_GFP (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN) @@ -786,4 +792,8 @@ i915_gem_active_retire(struct i915_gem_active *active, #define for_each_active(mask, idx) \ for (; mask ? 
idx = ffs(mask) - 1, 1 : 0; mask &= ~BIT(idx)) +int i915_global_request_init(void); +void i915_global_request_shrink(void); +void i915_global_request_exit(void); + #endif /* I915_REQUEST_H */ diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index d01683167c77..7c1d9ef98374 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -10,6 +10,11 @@ #include "i915_request.h" #include "i915_scheduler.h" +static struct i915_global_scheduler { + struct kmem_cache *slab_dependencies; + struct kmem_cache *slab_priorities; +} global; + static DEFINE_SPINLOCK(schedule_lock); static const struct i915_request * @@ -32,16 +37,15 @@ void i915_sched_node_init(struct i915_sched_node *node) } static struct i915_dependency * -i915_dependency_alloc(struct drm_i915_private *i915) +i915_dependency_alloc(void) { - return kmem_cache_alloc(i915->dependencies, GFP_KERNEL); + return kmem_cache_alloc(global.slab_dependencies, GFP_KERNEL); } static void -i915_dependency_free(struct drm_i915_private *i915, - struct i915_dependency *dep) +i915_dependency_free(struct i915_dependency *dep) { - kmem_cache_free(i915->dependencies, dep); + kmem_cache_free(global.slab_dependencies, dep); } bool __i915_sched_node_add_dependency(struct i915_sched_node *node, @@ -68,25 +72,23 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node, return ret; } -int i915_sched_node_add_dependency(struct drm_i915_private *i915, - struct i915_sched_node *node, +int i915_sched_node_add_dependency(struct i915_sched_node *node, struct i915_sched_node *signal) { struct i915_dependency *dep; - dep = i915_dependency_alloc(i915); + dep = i915_dependency_alloc(); if (!dep) return -ENOMEM; if (!__i915_sched_node_add_dependency(node, signal, dep, I915_DEPENDENCY_ALLOC)) - i915_dependency_free(i915, dep); + i915_dependency_free(dep); return 0; } -void i915_sched_node_fini(struct drm_i915_private *i915, - struct i915_sched_node *node) +void i915_sched_node_fini(struct i915_sched_node *node) { struct i915_dependency *dep, *tmp; @@ -106,7 +108,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915, list_del(&dep->wait_link); if (dep->flags & I915_DEPENDENCY_ALLOC) - i915_dependency_free(i915, dep); + i915_dependency_free(dep); } /* Remove ourselves from everyone who depends upon us */ @@ -116,7 +118,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915, list_del(&dep->signal_link); if (dep->flags & I915_DEPENDENCY_ALLOC) - i915_dependency_free(i915, dep); + i915_dependency_free(dep); } spin_unlock(&schedule_lock); @@ -193,7 +195,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio) if (prio == I915_PRIORITY_NORMAL) { p = &execlists->default_priolist; } else { - p = kmem_cache_alloc(engine->i915->priorities, GFP_ATOMIC); + p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC); /* Convert an allocation failure to a priority bump */ if (unlikely(!p)) { prio = I915_PRIORITY_NORMAL; /* recurses just once */ @@ -408,3 +410,39 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump) spin_unlock_bh(&schedule_lock); } + +void __i915_priolist_free(struct i915_priolist *p) +{ + kmem_cache_free(global.slab_priorities, p); +} + +int i915_global_scheduler_init(void) +{ + global.slab_dependencies = KMEM_CACHE(i915_dependency, + SLAB_HWCACHE_ALIGN); + if (!global.slab_dependencies) + return -ENOMEM; + + global.slab_priorities = KMEM_CACHE(i915_priolist, + SLAB_HWCACHE_ALIGN); + if (!global.slab_priorities) + goto 
err_priorities; + + return 0; + +err_priorities: + kmem_cache_destroy(global.slab_dependencies); + return -ENOMEM; +} + +void i915_global_scheduler_shrink(void) +{ + kmem_cache_shrink(global.slab_dependencies); + kmem_cache_shrink(global.slab_priorities); +} + +void i915_global_scheduler_exit(void) +{ + kmem_cache_destroy(global.slab_dependencies); + kmem_cache_destroy(global.slab_priorities); +} diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 54bd6c89817e..5196ce07b6c2 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -85,6 +85,23 @@ struct i915_dependency { #define I915_DEPENDENCY_ALLOC BIT(0) }; +struct i915_priolist { + struct list_head requests[I915_PRIORITY_COUNT]; + struct rb_node node; + unsigned long used; + int priority; +}; + +#define priolist_for_each_request(it, plist, idx) \ + for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \ + list_for_each_entry(it, &(plist)->requests[idx], sched.link) + +#define priolist_for_each_request_consume(it, n, plist, idx) \ + for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \ + list_for_each_entry_safe(it, n, \ + &(plist)->requests[idx - 1], \ + sched.link) + void i915_sched_node_init(struct i915_sched_node *node); bool __i915_sched_node_add_dependency(struct i915_sched_node *node, @@ -92,12 +109,10 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node, struct i915_dependency *dep, unsigned long flags); -int i915_sched_node_add_dependency(struct drm_i915_private *i915, - struct i915_sched_node *node, +int i915_sched_node_add_dependency(struct i915_sched_node *node, struct i915_sched_node *signal); -void i915_sched_node_fini(struct drm_i915_private *i915, - struct i915_sched_node *node); +void i915_sched_node_fini(struct i915_sched_node *node); void i915_schedule(struct i915_request *request, const struct i915_sched_attr *attr); @@ -107,4 +122,15 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump); struct list_head * i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio); +void __i915_priolist_free(struct i915_priolist *p); +static inline void i915_priolist_free(struct i915_priolist *p) +{ + if (p->priority != I915_PRIORITY_NORMAL) + __i915_priolist_free(p); +} + +int i915_global_scheduler_init(void); +void i915_global_scheduler_shrink(void); +void i915_global_scheduler_exit(void); + #endif /* _I915_SCHEDULER_H_ */ diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c index 8bc8aa54aa35..4cf94513615d 100644 --- a/drivers/gpu/drm/i915/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/intel_guc_submission.c @@ -781,8 +781,7 @@ static bool __guc_dequeue(struct intel_engine_cs *engine) } rb_erase_cached(&p->node, &execlists->queue); - if (p->priority != I915_PRIORITY_NORMAL) - kmem_cache_free(engine->i915->priorities, p); + i915_priolist_free(p); } done: execlists->queue_priority_hint = diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 8e301f19036b..e37f207afb5a 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -806,8 +806,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) } rb_erase_cached(&p->node, &execlists->queue); - if (p->priority != I915_PRIORITY_NORMAL) - kmem_cache_free(engine->i915->priorities, p); + i915_priolist_free(p); } done: @@ -966,8 +965,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine) }
rb_erase_cached(&p->node, &execlists->queue); - if (p->priority != I915_PRIORITY_NORMAL) - kmem_cache_free(engine->i915->priorities, p); + i915_priolist_free(p); } intel_write_status_page(engine, diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 8183d3441907..5dffccb6740e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -185,23 +185,6 @@ enum intel_engine_id { #define _VECS(n) (VECS + (n)) }; -struct i915_priolist { - struct list_head requests[I915_PRIORITY_COUNT]; - struct rb_node node; - unsigned long used; - int priority; -}; - -#define priolist_for_each_request(it, plist, idx) \ - for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \ - list_for_each_entry(it, &(plist)->requests[idx], sched.link) - -#define priolist_for_each_request_consume(it, n, plist, idx) \ - for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \ - list_for_each_entry_safe(it, n, \ - &(plist)->requests[idx - 1], \ - sched.link) - struct st_preempt_hang { struct completion completion; unsigned int count; diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c index 967cefa118ee..30ab0e04a674 100644 --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c @@ -440,7 +440,7 @@ static struct i915_request *dummy_request(struct intel_engine_cs *engine) static void dummy_request_free(struct i915_request *dummy) { i915_request_mark_complete(dummy); - i915_sched_node_fini(dummy->engine->i915, &dummy->sched); + i915_sched_node_fini(&dummy->sched); kfree(dummy); } diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c index 08f0cab02e0f..0d35af07867b 100644 --- a/drivers/gpu/drm/i915/selftests/mock_engine.c +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c @@ -76,28 +76,27 @@ static void mock_ring_free(struct intel_ring *base) kfree(ring); } -static struct mock_request *first_request(struct mock_engine *engine) +static struct i915_request *first_request(struct mock_engine *engine) { return list_first_entry_or_null(&engine->hw_queue, - struct mock_request, - link); + struct i915_request, + mock.link); } -static void advance(struct mock_request *request) +static void advance(struct i915_request *request) { - list_del_init(&request->link); - intel_engine_write_global_seqno(request->base.engine, - request->base.global_seqno); - i915_request_mark_complete(&request->base); - GEM_BUG_ON(!i915_request_completed(&request->base)); + list_del_init(&request->mock.link); + intel_engine_write_global_seqno(request->engine, request->global_seqno); + i915_request_mark_complete(request); + GEM_BUG_ON(!i915_request_completed(request)); - intel_engine_queue_breadcrumbs(request->base.engine); + intel_engine_queue_breadcrumbs(request->engine); } static void hw_delay_complete(struct timer_list *t) { struct mock_engine *engine = from_timer(engine, t, hw_delay); - struct mock_request *request; + struct i915_request *request; unsigned long flags; spin_lock_irqsave(&engine->hw_lock, flags); @@ -112,8 +111,9 @@ static void hw_delay_complete(struct timer_list *t) * requeue the timer for the next delayed request. 
*/ while ((request = first_request(engine))) { - if (request->delay) { - mod_timer(&engine->hw_delay, jiffies + request->delay); + if (request->mock.delay) { + mod_timer(&engine->hw_delay, + jiffies + request->mock.delay); break; } @@ -171,10 +171,8 @@ mock_context_pin(struct intel_engine_cs *engine, static int mock_request_alloc(struct i915_request *request) { - struct mock_request *mock = container_of(request, typeof(*mock), base); - - INIT_LIST_HEAD(&mock->link); - mock->delay = 0; + INIT_LIST_HEAD(&request->mock.link); + request->mock.delay = 0; return 0; } @@ -192,7 +190,6 @@ static u32 *mock_emit_breadcrumb(struct i915_request *request, u32 *cs) static void mock_submit_request(struct i915_request *request) { - struct mock_request *mock = container_of(request, typeof(*mock), base); struct mock_engine *engine = container_of(request->engine, typeof(*engine), base); unsigned long flags; @@ -201,12 +198,13 @@ static void mock_submit_request(struct i915_request *request) GEM_BUG_ON(!request->global_seqno); spin_lock_irqsave(&engine->hw_lock, flags); - list_add_tail(&mock->link, &engine->hw_queue); - if (mock->link.prev == &engine->hw_queue) { - if (mock->delay) - mod_timer(&engine->hw_delay, jiffies + mock->delay); + list_add_tail(&request->mock.link, &engine->hw_queue); + if (list_is_first(&request->mock.link, &engine->hw_queue)) { + if (request->mock.delay) + mod_timer(&engine->hw_delay, + jiffies + request->mock.delay); else - advance(mock); + advance(request); } spin_unlock_irqrestore(&engine->hw_lock, flags); } @@ -266,12 +264,12 @@ void mock_engine_flush(struct intel_engine_cs *engine) { struct mock_engine *mock = container_of(engine, typeof(*mock), base); - struct mock_request *request, *rn; + struct i915_request *request, *rn; del_timer_sync(&mock->hw_delay); spin_lock_irq(&mock->hw_lock); - list_for_each_entry_safe(request, rn, &mock->hw_queue, link) + list_for_each_entry_safe(request, rn, &mock->hw_queue, mock.link) advance(request); spin_unlock_irq(&mock->hw_lock); } diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c index 074a0d9cbf26..17915a2d94fa 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c @@ -79,9 +79,6 @@ static void mock_device_release(struct drm_device *dev) destroy_workqueue(i915->wq); - kmem_cache_destroy(i915->priorities); - kmem_cache_destroy(i915->dependencies); - kmem_cache_destroy(i915->requests); kmem_cache_destroy(i915->vmas); kmem_cache_destroy(i915->objects); @@ -211,23 +208,6 @@ struct drm_i915_private *mock_gem_device(void) if (!i915->vmas) goto err_objects; - i915->requests = KMEM_CACHE(mock_request, - SLAB_HWCACHE_ALIGN | - SLAB_RECLAIM_ACCOUNT | - SLAB_TYPESAFE_BY_RCU); - if (!i915->requests) - goto err_vmas; - - i915->dependencies = KMEM_CACHE(i915_dependency, - SLAB_HWCACHE_ALIGN | - SLAB_RECLAIM_ACCOUNT); - if (!i915->dependencies) - goto err_requests; - - i915->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN); - if (!i915->priorities) - goto err_dependencies; - i915_timelines_init(i915); INIT_LIST_HEAD(&i915->gt.active_rings); @@ -257,12 +237,6 @@ struct drm_i915_private *mock_gem_device(void) err_unlock: mutex_unlock(&i915->drm.struct_mutex); i915_timelines_fini(i915); - kmem_cache_destroy(i915->priorities); -err_dependencies: - kmem_cache_destroy(i915->dependencies); -err_requests: - kmem_cache_destroy(i915->requests); -err_vmas: kmem_cache_destroy(i915->vmas); err_objects: kmem_cache_destroy(i915->objects); 
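For reference, the slab lifecycle now centralised behind i915_globals_init(), i915_globals_shrink() and i915_globals_exit() is, in outline (a sketch using hypothetical names, not code from this patch):

#include <linux/slab.h>

struct thing {
	unsigned long payload;
};

static struct kmem_cache *slab_things;

static int things_init(void)
{
	/* One cache per type; the system may merge compatible caches. */
	slab_things = KMEM_CACHE(thing, SLAB_HWCACHE_ALIGN);
	return slab_things ? 0 : -ENOMEM;
}

static void things_shrink(void)
{
	/* On memory pressure, hand empty slabs back to the page allocator. */
	kmem_cache_shrink(slab_things);
}

static void things_exit(void)
{
	kmem_cache_destroy(slab_things);
}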
diff --git a/drivers/gpu/drm/i915/selftests/mock_request.c b/drivers/gpu/drm/i915/selftests/mock_request.c index 0dc29e242597..d1a7c9608712 100644 --- a/drivers/gpu/drm/i915/selftests/mock_request.c +++ b/drivers/gpu/drm/i915/selftests/mock_request.c @@ -31,29 +31,25 @@ mock_request(struct intel_engine_cs *engine, unsigned long delay) { struct i915_request *request; - struct mock_request *mock; /* NB the i915->requests slab cache is enlarged to fit mock_request */ request = i915_request_alloc(engine, context); if (IS_ERR(request)) return NULL; - mock = container_of(request, typeof(*mock), base); - mock->delay = delay; - - return &mock->base; + request->mock.delay = delay; + return request; } bool mock_cancel_request(struct i915_request *request) { - struct mock_request *mock = container_of(request, typeof(*mock), base); struct mock_engine *engine = container_of(request->engine, typeof(*engine), base); bool was_queued; spin_lock_irq(&engine->hw_lock); - was_queued = !list_empty(&mock->link); - list_del_init(&mock->link); + was_queued = !list_empty(&request->mock.link); + list_del_init(&request->mock.link); spin_unlock_irq(&engine->hw_lock); if (was_queued) diff --git a/drivers/gpu/drm/i915/selftests/mock_request.h b/drivers/gpu/drm/i915/selftests/mock_request.h index 995fb728380c..4acf0211df20 100644 --- a/drivers/gpu/drm/i915/selftests/mock_request.h +++ b/drivers/gpu/drm/i915/selftests/mock_request.h @@ -29,13 +29,6 @@ #include "../i915_request.h" -struct mock_request { - struct i915_request base; - - struct list_head link; - unsigned long delay; -}; - struct i915_request * mock_request(struct intel_engine_cs *engine, struct i915_gem_context *context, From patchwork Mon Feb 4 13:22:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10795669 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 72D361805 for ; Mon, 4 Feb 2019 13:40:33 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 600242A868 for ; Mon, 4 Feb 2019 13:40:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 546772AF5E; Mon, 4 Feb 2019 13:40:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D71C12AF58 for ; Mon, 4 Feb 2019 13:40:32 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id F3B7E6E360; Mon, 4 Feb 2019 13:40:29 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3A8AA6E360 for ; Mon, 4 Feb 2019 13:40:28 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP 
id 15458869-1500050 for multiple; Mon, 04 Feb 2019 13:22:27 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 4 Feb 2019 13:22:08 +0000 Message-Id: <20190204132214.9459-17-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk> References: <20190204132214.9459-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 16/22] drm/i915: Add timeline barrier support X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP From: Tvrtko Ursulin Timeline barrier allows serialization between different timelines. After calling i915_timeline_set_barrier with a request, all following submissions on this timeline will be set up as depending on this request, or barrier. Once the barrier has been completed it automatically gets cleared and things continue as normal. This facility will be used by the upcoming context SSEU code. v2: * Assert barrier has been retired on timeline_fini. (Chris Wilson) * Fix mock_timeline. v3: * Improved comment language. (Chris Wilson) v4: * Maintain ordering with previous barriers set on the timeline. Signed-off-by: Tvrtko Ursulin Suggested-by: Chris Wilson Cc: Chris Wilson Reviewed-by: Chris Wilson --- drivers/gpu/drm/i915/i915_request.c | 17 ++++++++++++++ drivers/gpu/drm/i915/i915_timeline.c | 21 ++++++++++++++++++ drivers/gpu/drm/i915/i915_timeline.h | 22 +++++++++++++++++++ .../gpu/drm/i915/selftests/mock_timeline.c | 1 + 4 files changed, 61 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 3bb4840ba761..f5b2c95125ba 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -543,6 +543,19 @@ i915_request_alloc_slow(struct intel_context *ce) return kmem_cache_alloc(global.slab_requests, GFP_KERNEL); } +static int add_barrier(struct i915_request *rq, struct i915_gem_active *active) +{ + struct i915_request *barrier = + i915_gem_active_raw(active, &rq->i915->drm.struct_mutex); + + return barrier ? 
i915_request_await_dma_fence(rq, &barrier->fence) : 0; +} + +static int add_timeline_barrier(struct i915_request *rq) +{ + return add_barrier(rq, &rq->timeline->barrier); +} + /** * i915_request_alloc - allocate a request structure * @@ -685,6 +698,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) */ rq->head = rq->ring->emit; + ret = add_timeline_barrier(rq); + if (ret) + goto err_unwind; + ret = engine->request_alloc(rq); if (ret) goto err_unwind; diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c index 5ea3af393ffe..b354843a5040 100644 --- a/drivers/gpu/drm/i915/i915_timeline.c +++ b/drivers/gpu/drm/i915/i915_timeline.c @@ -163,6 +163,7 @@ int i915_timeline_init(struct drm_i915_private *i915, spin_lock_init(&timeline->lock); + init_request_active(&timeline->barrier, NULL); init_request_active(&timeline->last_request, NULL); INIT_LIST_HEAD(&timeline->requests); @@ -235,6 +236,7 @@ void i915_timeline_fini(struct i915_timeline *timeline) { GEM_BUG_ON(timeline->pin_count); GEM_BUG_ON(!list_empty(&timeline->requests)); + GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier)); i915_syncmap_free(&timeline->sync); hwsp_free(timeline); @@ -266,6 +268,25 @@ i915_timeline_create(struct drm_i915_private *i915, return timeline; } +int i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq) +{ + struct i915_request *old; + int err; + + lockdep_assert_held(&rq->i915->drm.struct_mutex); + + /* Must maintain ordering wrt existing barriers */ + old = i915_gem_active_raw(&tl->barrier, &rq->i915->drm.struct_mutex); + if (old) { + err = i915_request_await_dma_fence(rq, &old->fence); + if (err) + return err; + } + + i915_gem_active_set(&tl->barrier, rq); + return 0; +} + int i915_timeline_pin(struct i915_timeline *tl) { int err; diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h index 8caeb66d1cd5..d167e04073c5 100644 --- a/drivers/gpu/drm/i915/i915_timeline.h +++ b/drivers/gpu/drm/i915/i915_timeline.h @@ -74,6 +74,16 @@ struct i915_timeline { */ struct i915_syncmap *sync; + /** + * Barrier provides the ability to serialize ordering between different + * timelines. + * + * Users can call i915_timeline_set_barrier which will make all + * subsequent submissions to this timeline be executed only after the + * barrier has been completed. + */ + struct i915_gem_active barrier; + struct list_head link; const char *name; struct drm_i915_private *i915; @@ -155,4 +165,16 @@ void i915_timelines_init(struct drm_i915_private *i915); void i915_timelines_park(struct drm_i915_private *i915); void i915_timelines_fini(struct drm_i915_private *i915); +/** + * i915_timeline_set_barrier - orders submission between different timelines + * @timeline: timeline to set the barrier on + * @rq: request after which new submissions can proceed + * + * Sets the passed in request as the serialization point for all subsequent + * submissions on @timeline. Subsequent requests will not be submitted to GPU + * until the barrier has been completed. 
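+ *
+ * A hypothetical call site (an illustrative sketch, not part of this patch):
+ *
+ *	err = i915_timeline_set_barrier(timeline, barrier_rq);
+ *	if (err)
+ *		return err;
+ *
+ * after which i915_request_alloc() orders every new request on @timeline
+ * behind barrier_rq via add_timeline_barrier().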
+ */ +int i915_timeline_set_barrier(struct i915_timeline *timeline, + struct i915_request *rq); + #endif diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c index cf39ccd9fc05..e5659aaa856d 100644 --- a/drivers/gpu/drm/i915/selftests/mock_timeline.c +++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c @@ -15,6 +15,7 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context) spin_lock_init(&timeline->lock); + init_request_active(&timeline->barrier, NULL); init_request_active(&timeline->last_request, NULL); INIT_LIST_HEAD(&timeline->requests); From patchwork Mon Feb 4 13:22:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10795673 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 83E85922 for ; Mon, 4 Feb 2019 13:40:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6FC082ABDD for ; Mon, 4 Feb 2019 13:40:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6DB4F2AEE9; Mon, 4 Feb 2019 13:40:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8D38F2AF72 for ; Mon, 4 Feb 2019 13:40:34 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2C6A06E4FD; Mon, 4 Feb 2019 13:40:32 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 603946E360 for ; Mon, 4 Feb 2019 13:40:27 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 15458870-1500050 for multiple; Mon, 04 Feb 2019 13:22:27 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 4 Feb 2019 13:22:09 +0000 Message-Id: <20190204132214.9459-18-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190204132214.9459-1-chris@chris-wilson.co.uk> References: <20190204132214.9459-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 17/22] drm/i915: Pull i915_gem_active into the i915_active family X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Looking forward, we need to break the struct_mutex dependency on i915_gem_active. 
In the meantime, external use of i915_gem_active is quite beguiling, little do new users suspect that it implies a barrier as each request it tracks must be ordered wrt the previous one. As one of many, it can be used to track activity across multiple timelines, a shared fence, which fits our unordered request submission much better. We need to steer external users away from the singular, exclusive fence imposed by i915_gem_active to i915_active instead. As part of that process, we move i915_gem_active out of i915_request.c into i915_active.c to start separating the two concepts, and rename it to i915_active_request (both to tie it to the concept of tracking just one request, and to give it a longer, less appealing name). Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_active.c | 62 ++- drivers/gpu/drm/i915/i915_active.h | 349 ++++++++++++++++ drivers/gpu/drm/i915/i915_active_types.h | 16 +- drivers/gpu/drm/i915/i915_debugfs.c | 2 +- drivers/gpu/drm/i915/i915_gem.c | 10 +- drivers/gpu/drm/i915/i915_gem_context.c | 4 +- drivers/gpu/drm/i915/i915_gem_fence_reg.c | 4 +- drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +- drivers/gpu/drm/i915/i915_gem_object.h | 2 +- drivers/gpu/drm/i915/i915_gpu_error.c | 10 +- drivers/gpu/drm/i915/i915_request.c | 35 +- drivers/gpu/drm/i915/i915_request.h | 383 ------------------ drivers/gpu/drm/i915/i915_reset.c | 2 +- drivers/gpu/drm/i915/i915_timeline.c | 25 +- drivers/gpu/drm/i915/i915_timeline.h | 14 +- drivers/gpu/drm/i915/i915_vma.c | 12 +- drivers/gpu/drm/i915/i915_vma.h | 2 +- drivers/gpu/drm/i915/intel_engine_cs.c | 2 +- drivers/gpu/drm/i915/intel_overlay.c | 33 +- drivers/gpu/drm/i915/selftests/intel_lrc.c | 4 +- .../gpu/drm/i915/selftests/mock_timeline.c | 4 +- 21 files changed, 474 insertions(+), 503 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index d23092d8c89f..846900535d10 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -21,7 +21,7 @@ static struct i915_global_active { } global; struct active_node { - struct i915_gem_active base; + struct i915_active_request base; struct i915_active *ref; struct rb_node node; u64 timeline; @@ -33,7 +33,7 @@ __active_park(struct i915_active *ref) struct active_node *it, *n; rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { - GEM_BUG_ON(i915_gem_active_isset(&it->base)); + GEM_BUG_ON(i915_active_request_isset(&it->base)); kmem_cache_free(global.slab_cache, it); } ref->tree = RB_ROOT; @@ -53,18 +53,18 @@ __active_retire(struct i915_active *ref) } static void -node_retire(struct i915_gem_active *base, struct i915_request *rq) +node_retire(struct i915_active_request *base, struct i915_request *rq) { __active_retire(container_of(base, struct active_node, base)->ref); } static void -last_retire(struct i915_gem_active *base, struct i915_request *rq) +last_retire(struct i915_active_request *base, struct i915_request *rq) { __active_retire(container_of(base, struct i915_active, last)); } -static struct i915_gem_active * +static struct i915_active_request * active_instance(struct i915_active *ref, u64 idx) { struct active_node *node; @@ -85,7 +85,7 @@ active_instance(struct i915_active *ref, u64 idx) * twice for the same timeline (as the older rbtree element will be * retired before the new request added to last). 
*/ - old = i915_gem_active_raw(&ref->last, BKL(ref)); + old = i915_active_request_raw(&ref->last, BKL(ref)); if (!old || old->fence.context == idx) goto out; @@ -110,7 +110,7 @@ active_instance(struct i915_active *ref, u64 idx) node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL); /* kmalloc may retire the ref->last (thanks shrinker)! */ - if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) { + if (unlikely(!i915_active_request_raw(&ref->last, BKL(ref)))) { kmem_cache_free(global.slab_cache, node); goto out; } @@ -118,7 +118,7 @@ active_instance(struct i915_active *ref, u64 idx) if (unlikely(!node)) return ERR_PTR(-ENOMEM); - init_request_active(&node->base, node_retire); + i915_active_request_init(&node->base, NULL, node_retire); node->ref = ref; node->timeline = idx; @@ -133,7 +133,7 @@ active_instance(struct i915_active *ref, u64 idx) * callback not two, and so much undo the active counting for the * overwritten slot. */ - if (i915_gem_active_isset(&node->base)) { + if (i915_active_request_isset(&node->base)) { /* Retire ourselves from the old rq->active_list */ __list_del_entry(&node->base.link); ref->count--; @@ -154,7 +154,7 @@ void i915_active_init(struct drm_i915_private *i915, ref->i915 = i915; ref->retire = retire; ref->tree = RB_ROOT; - init_request_active(&ref->last, last_retire); + i915_active_request_init(&ref->last, NULL, last_retire); ref->count = 0; } @@ -162,15 +162,15 @@ int i915_active_ref(struct i915_active *ref, u64 timeline, struct i915_request *rq) { - struct i915_gem_active *active; + struct i915_active_request *active; active = active_instance(ref, timeline); if (IS_ERR(active)) return PTR_ERR(active); - if (!i915_gem_active_isset(active)) + if (!i915_active_request_isset(active)) ref->count++; - i915_gem_active_set(active, rq); + __i915_active_request_set(active, rq); GEM_BUG_ON(!ref->count); return 0; @@ -196,12 +196,12 @@ int i915_active_wait(struct i915_active *ref) if (i915_active_acquire(ref)) goto out_release; - ret = i915_gem_active_retire(&ref->last, BKL(ref)); + ret = i915_active_request_retire(&ref->last, BKL(ref)); if (ret) goto out_release; rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { - ret = i915_gem_active_retire(&it->base, BKL(ref)); + ret = i915_active_request_retire(&it->base, BKL(ref)); if (ret) break; } @@ -211,11 +211,11 @@ int i915_active_wait(struct i915_active *ref) return ret; } -static int __i915_request_await_active(struct i915_request *rq, - struct i915_gem_active *active) +int i915_request_await_active_request(struct i915_request *rq, + struct i915_active_request *active) { struct i915_request *barrier = - i915_gem_active_raw(active, &rq->i915->drm.struct_mutex); + i915_active_request_raw(active, &rq->i915->drm.struct_mutex); return barrier ? 
i915_request_await_dma_fence(rq, &barrier->fence) : 0; } @@ -225,12 +225,12 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref) struct active_node *it, *n; int ret; - ret = __i915_request_await_active(rq, &ref->last); + ret = i915_request_await_active_request(rq, &ref->last); if (ret) return ret; rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { - ret = __i915_request_await_active(rq, &it->base); + ret = i915_request_await_active_request(rq, &it->base); if (ret) return ret; } @@ -241,12 +241,32 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref) #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) void i915_active_fini(struct i915_active *ref) { - GEM_BUG_ON(i915_gem_active_isset(&ref->last)); + GEM_BUG_ON(i915_active_request_isset(&ref->last)); GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree)); GEM_BUG_ON(ref->count); } #endif +int i915_active_request_set(struct i915_active_request *active, + struct i915_request *rq) +{ + int err; + + /* Must maintain ordering wrt previous active requests */ + err = i915_request_await_active_request(rq, active); + if (err) + return err; + + __i915_active_request_set(active, rq); + return 0; +} + +void i915_active_retire_noop(struct i915_active_request *active, + struct i915_request *request) +{ + /* Space left intentionally blank */ +} + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/i915_active.c" #endif diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h index 6c56d10b1f59..5fbd9102384b 100644 --- a/drivers/gpu/drm/i915/i915_active.h +++ b/drivers/gpu/drm/i915/i915_active.h @@ -7,7 +7,354 @@ #ifndef _I915_ACTIVE_H_ #define _I915_ACTIVE_H_ +#include + #include "i915_active_types.h" +#include "i915_request.h" + +/* + * We treat requests as fences. This is not to be confused with our + * "fence registers" but pipeline synchronisation objects ala GL_ARB_sync. + * We use the fences to synchronize access from the CPU with activity on the + * GPU, for example, we should not rewrite an object's PTE whilst the GPU + * is reading them. We also track fences at a higher level to provide + * implicit synchronisation around GEM objects, e.g. set-domain will wait + * for outstanding GPU rendering before marking the object ready for CPU + * access, or a pageflip will wait until the GPU is complete before showing + * the frame on the scanout. + * + * In order to use a fence, the object must track the fence it needs to + * serialise with. For example, GEM objects want to track both read and + * write access so that we can perform concurrent read operations between + * the CPU and GPU engines, as well as waiting for all rendering to + * complete, or waiting for the last GPU user of a "fence register". The + * object then embeds a #i915_active_request to track the most recent (in + * retirement order) request relevant for the desired mode of access. + * The #i915_active_request is updated with i915_active_request_set() to + * track the most recent fence request, typically this is done as part of + * i915_vma_move_to_active(). + * + * When the #i915_active_request completes (is retired), it will + * signal its completion to the owner through a callback as well as mark + * itself as idle (i915_active_request.request == NULL). The owner + * can then perform any action, such as delayed freeing of an active + * resource including itself.
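+ *
+ * As a hypothetical sketch (not part of this patch), an owner tracking its
+ * last write might do:
+ *
+ *	i915_active_request_init(&obj->last_write, NULL, obj_retire);
+ *	...
+ *	err = i915_active_request_set(&obj->last_write, rq);
+ *
+ * with obj_retire() invoked once rq is retired, after which
+ * i915_active_request_isset(&obj->last_write) reports idle again.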
+ */ + +void i915_active_retire_noop(struct i915_active_request *active, + struct i915_request *request); + +/** + * i915_active_request_init - prepares the activity tracker for use + * @active - the active tracker + * @rq - initial request to track, can be NULL + * @retire - a callback invoked when the tracker is retired (becomes idle), + * can be NULL + * + * i915_active_request_init() prepares the embedded @active struct for use as + * an activity tracker, that is for tracking the last known active request + * associated with it. When the last request becomes idle (when it is retired + * after completion), the optional callback @retire is invoked. + */ +static inline void +i915_active_request_init(struct i915_active_request *active, + struct i915_request *rq, + i915_active_retire_fn retire) +{ + RCU_INIT_POINTER(active->request, rq); + INIT_LIST_HEAD(&active->link); + active->retire = retire ?: i915_active_retire_noop; +} + +#define INIT_ACTIVE_REQUEST(name) i915_active_request_init((name), NULL, NULL) + +/** + * __i915_active_request_set - updates the tracker to watch the current request + * @active - the active tracker + * @request - the request to watch + * + * __i915_active_request_set() watches the given @request for completion. Whilst + * that @request is busy, the @active reports busy. When that @request is + * retired, the @active tracker is updated to report idle. + */ +static inline void +__i915_active_request_set(struct i915_active_request *active, + struct i915_request *request) +{ + list_move(&active->link, &request->active_list); + rcu_assign_pointer(active->request, request); +} + +int __must_check +i915_active_request_set(struct i915_active_request *active, + struct i915_request *rq); + +/** + * i915_active_request_set_retire_fn - updates the retirement callback + * @active - the active tracker + * @fn - the routine called when the request is retired + * @mutex - struct_mutex used to guard retirements + * + * i915_active_request_set_retire_fn() updates the function pointer that + * is called when the final request associated with the @active tracker + * is retired. + */ +static inline void +i915_active_request_set_retire_fn(struct i915_active_request *active, + i915_active_retire_fn fn, + struct mutex *mutex) +{ + lockdep_assert_held(mutex); + active->retire = fn ?: i915_active_retire_noop; +} + +static inline struct i915_request * +__i915_active_request_peek(const struct i915_active_request *active) +{ + /* + * Inside the error capture (running with the driver in an unknown + * state), we want to bend the rules slightly (a lot). + * + * Work is in progress to make it safer, in the meantime this keeps + * the known issue from spamming the logs. + */ + return rcu_dereference_protected(active->request, 1); +} + +/** + * i915_active_request_raw - return the active request + * @active - the active tracker + * + * i915_active_request_raw() returns the current request being tracked, or NULL. + * It does not obtain a reference on the request for the caller, so the caller + * must hold struct_mutex. + */ +static inline struct i915_request * +i915_active_request_raw(const struct i915_active_request *active, + struct mutex *mutex) +{ + return rcu_dereference_protected(active->request, + lockdep_is_held(mutex)); +} + +/** + * i915_active_request_peek - report the active request being monitored + * @active - the active tracker + * + * i915_active_request_peek() returns the current request being tracked if + * still active, or NULL.
It does not obtain a reference on the request + * for the caller, so the caller must hold struct_mutex. + */ +static inline struct i915_request * +i915_active_request_peek(const struct i915_active_request *active, + struct mutex *mutex) +{ + struct i915_request *request; + + request = i915_active_request_raw(active, mutex); + if (!request || i915_request_completed(request)) + return NULL; + + return request; +} + +/** + * i915_active_request_get - return a reference to the active request + * @active - the active tracker + * + * i915_active_request_get() returns a reference to the active request, or NULL + * if the active tracker is idle. The caller must hold struct_mutex. + */ +static inline struct i915_request * +i915_active_request_get(const struct i915_active_request *active, + struct mutex *mutex) +{ + return i915_request_get(i915_active_request_peek(active, mutex)); +} + +/** + * __i915_active_request_get_rcu - return a reference to the active request + * @active - the active tracker + * + * __i915_active_request_get_rcu() returns a reference to the active request, + * or NULL if the active tracker is idle. The caller must hold the RCU read + * lock, but the returned pointer is safe to use outside of RCU. + */ +static inline struct i915_request * +__i915_active_request_get_rcu(const struct i915_active_request *active) +{ + /* + * Performing a lockless retrieval of the active request is super + * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing + * slab of request objects will not be freed whilst we hold the + * RCU read lock. It does not guarantee that the request itself + * will not be freed and then *reused*. Viz, + * + * Thread A Thread B + * + * rq = active.request + * retire(rq) -> free(rq); + * (rq is now first on the slab freelist) + * active.request = NULL + * + * rq = new submission on a new object + * ref(rq) + * + * To prevent the request from being reused whilst the caller + * uses it, we take a reference like normal. Whilst acquiring + * the reference we check that it is not in a destroyed state + * (refcnt == 0). That prevents the request being reallocated + * whilst the caller holds on to it. To check that the request + * was not reallocated as we acquired the reference we have to + * check that our request remains the active request across + * the lookup, in the same manner as a seqlock. The visibility + * of the pointer versus the reference counting is controlled + * by using RCU barriers (rcu_dereference and rcu_assign_pointer). + * + * In the middle of all that, we inspect whether the request is + * complete. Retiring is lazy so the request may be completed long + * before the active tracker is updated. Querying whether the + * request is complete is far cheaper (as it involves no locked + * instructions setting cachelines to exclusive) than acquiring + * the reference, so we do it first. The RCU read lock ensures the + * pointer dereference is valid, but does not ensure that the + * seqno or the HWS is the right one! However, if the request was + * reallocated, that means the active tracker's request was complete. + * If the new request is also complete, then both are and we can + * just report the active tracker is idle. If the new request is + * incomplete, then we acquire a reference on it and check that + * it remained the active request. + * + * It is then imperative that we do not zero the request on + * reallocation, so that we can chase the dangling pointers! + * See i915_request_alloc().
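The do/while loop that follows implements exactly this acquire-and-recheck dance. As a compressed, runnable userspace model, with C11 atomics standing in for both RCU and the SLAB_TYPESAFE_BY_RCU refcount, and with all names invented for illustration:

#include <stdatomic.h>
#include <stdbool.h>

struct request {
	atomic_int refcount;		/* 0 == destroyed, slab may reuse */
	atomic_int completed;
};

struct tracker { _Atomic(struct request *) request; };

/* like atomic_inc_not_zero(): fail once the object is dead */
static bool request_get_unless_zero(struct request *rq)
{
	int ref = atomic_load(&rq->refcount);

	while (ref)
		if (atomic_compare_exchange_weak(&rq->refcount, &ref, ref + 1))
			return true;
	return false;
}

static void request_put(struct request *rq)
{
	atomic_fetch_sub(&rq->refcount, 1);
}

static struct request *tracker_get(struct tracker *t)
{
	struct request *rq;

	do {
		rq = atomic_load(&t->request);
		if (!rq || atomic_load(&rq->completed))
			return NULL;	/* idle (or lazily retired) */

		if (!request_get_unless_zero(rq))
			continue;	/* raced with free; reload */

		/* seqlock-style recheck: only hand out the reference if
		 * the tracker still points at the request we pinned */
		if (rq == atomic_load(&t->request))
			return rq;

		request_put(rq);	/* tracker moved on; retry */
	} while (1);
}

int main(void)
{
	struct request rq = { .refcount = 1 };
	struct tracker t = { .request = &rq };
	struct request *got = tracker_get(&t);

	if (got)
		request_put(got);
	return 0;
}

The re-load after taking the reference is what makes reuse safe: if the tracker changed underneath us while we were acquiring, we drop the speculative reference and retry.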
+ */ + do { + struct i915_request *request; + + request = rcu_dereference(active->request); + if (!request || i915_request_completed(request)) + return NULL; + + /* + * An especially silly compiler could decide to recompute the + * result of i915_request_completed, more specifically + * re-emit the load for request->fence.seqno. A race would catch + * a later seqno value, which could flip the result from true to + * false. Which means part of the instructions below might not + * be executed, while later on instructions are executed. Due to + * barriers within the refcounting the inconsistency can't reach + * past the call to i915_request_get_rcu, but not executing + * that while still executing i915_request_put() creates + * havoc enough. Prevent this with a compiler barrier. + */ + barrier(); + + request = i915_request_get_rcu(request); + + /* + * What stops the following rcu_access_pointer() from occurring + * before the above i915_request_get_rcu()? If we were + * to read the value before pausing to get the reference to + * the request, we may not notice a change in the active + * tracker. + * + * The rcu_access_pointer() is a mere compiler barrier, which + * means both the CPU and compiler are free to perform the + * memory read without constraint. The compiler only has to + * ensure that any operations after the rcu_access_pointer() + * occur afterwards in program order. This means the read may + * be performed earlier by an out-of-order CPU, or adventurous + * compiler. + * + * The atomic operation at the heart of + * i915_request_get_rcu(), see dma_fence_get_rcu(), is + * atomic_inc_not_zero() which is only a full memory barrier + * when successful. That is, if i915_request_get_rcu() + * returns the request (and so with the reference counted + * incremented) then the following read for rcu_access_pointer() + * must occur after the atomic operation and so confirm + * that this request is the one currently being tracked. + * + * The corresponding write barrier is part of + * rcu_assign_pointer(). + */ + if (!request || request == rcu_access_pointer(active->request)) + return rcu_pointer_handoff(request); + + i915_request_put(request); + } while (1); +} + +/** + * i915_active_request_get_unlocked - return a reference to the active request + * @active - the active tracker + * + * i915_active_request_get_unlocked() returns a reference to the active request, + * or NULL if the active tracker is idle. The reference is obtained under RCU, + * so no locking is required by the caller. + * + * The reference should be freed with i915_request_put(). + */ +static inline struct i915_request * +i915_active_request_get_unlocked(const struct i915_active_request *active) +{ + struct i915_request *request; + + rcu_read_lock(); + request = __i915_active_request_get_rcu(active); + rcu_read_unlock(); + + return request; +} + +/** + * i915_active_request_isset - report whether the active tracker is assigned + * @active - the active tracker + * + * i915_active_request_isset() returns true if the active tracker is currently + * assigned to a request. Due to the lazy retiring, that request may be idle + * and this may report stale information. 
+ */ +static inline bool +i915_active_request_isset(const struct i915_active_request *active) +{ + return rcu_access_pointer(active->request); +} + +/** + * i915_active_request_retire - waits until the request is retired + * @active - the active request on which to wait + * + * i915_active_request_retire() waits until the request is completed, + * and then ensures that at least the retirement handler for this + * @active tracker is called before returning. If the @active + * tracker is idle, the function returns immediately. + */ +static inline int __must_check +i915_active_request_retire(struct i915_active_request *active, + struct mutex *mutex) +{ + struct i915_request *request; + long ret; + + request = i915_active_request_raw(active, mutex); + if (!request) + return 0; + + ret = i915_request_wait(request, + I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED, + MAX_SCHEDULE_TIMEOUT); + if (ret < 0) + return ret; + + list_del_init(&active->link); + RCU_INIT_POINTER(active->request, NULL); + + active->retire(active, request); + + return 0; +} /* * GPU activity tracking @@ -47,6 +394,8 @@ int i915_active_wait(struct i915_active *ref); int i915_request_await_active(struct i915_request *rq, struct i915_active *ref); +int i915_request_await_active_request(struct i915_request *rq, + struct i915_active_request *active); bool i915_active_acquire(struct i915_active *ref); diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h index 411e502ed8dd..b679253b53a5 100644 --- a/drivers/gpu/drm/i915/i915_active_types.h +++ b/drivers/gpu/drm/i915/i915_active_types.h @@ -8,16 +8,26 @@ #define _I915_ACTIVE_TYPES_H_ #include - -#include "i915_request.h" +#include struct drm_i915_private; +struct i915_active_request; +struct i915_request; + +typedef void (*i915_active_retire_fn)(struct i915_active_request *, + struct i915_request *); + +struct i915_active_request { + struct i915_request __rcu *request; + struct list_head link; + i915_active_retire_fn retire; +}; struct i915_active { struct drm_i915_private *i915; struct rb_root tree; - struct i915_gem_active last; + struct i915_active_request last; unsigned int count; void (*retire)(struct i915_active *ref); diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 54e426883529..a270af18404f 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -207,7 +207,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) if (vma->fence) seq_printf(m, " , fence: %d%s", vma->fence->id, - i915_gem_active_isset(&vma->last_fence) ? "*" : ""); + i915_active_request_isset(&vma->last_fence) ? 
"*" : ""); seq_puts(m, ")"); } if (obj->stolen) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index d82e4f990586..81aa37508bc4 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2987,7 +2987,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915) GEM_BUG_ON(i915->gt.active_requests); for_each_engine(engine, i915, id) { - GEM_BUG_ON(__i915_gem_active_peek(&engine->timeline.last_request)); + GEM_BUG_ON(__i915_active_request_peek(&engine->timeline.last_request)); GEM_BUG_ON(engine->last_retired_context != to_intel_context(i915->kernel_context, engine)); } @@ -3233,7 +3233,7 @@ wait_for_timelines(struct drm_i915_private *i915, list_for_each_entry(tl, >->active_list, link) { struct i915_request *rq; - rq = i915_gem_active_get_unlocked(&tl->last_request); + rq = i915_active_request_get_unlocked(&tl->last_request); if (!rq) continue; @@ -4134,7 +4134,8 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data, } static void -frontbuffer_retire(struct i915_gem_active *active, struct i915_request *request) +frontbuffer_retire(struct i915_active_request *active, + struct i915_request *request) { struct drm_i915_gem_object *obj = container_of(active, typeof(*obj), frontbuffer_write); @@ -4161,7 +4162,8 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, obj->resv = &obj->__builtin_resv; obj->frontbuffer_ggtt_origin = ORIGIN_GTT; - init_request_active(&obj->frontbuffer_write, frontbuffer_retire); + i915_active_request_init(&obj->frontbuffer_write, + NULL, frontbuffer_retire); obj->mm.madv = I915_MADV_WILLNEED; INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN); diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 6faf1f6faab5..ea8e818d22bf 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -653,8 +653,8 @@ last_request_on_engine(struct i915_timeline *timeline, GEM_BUG_ON(timeline == &engine->timeline); - rq = i915_gem_active_raw(&timeline->last_request, - &engine->i915->drm.struct_mutex); + rq = i915_active_request_raw(&timeline->last_request, + &engine->i915->drm.struct_mutex); if (rq && rq->engine == engine) { GEM_TRACE("last request for %s on engine %s: %llx:%llu\n", timeline->name, engine->name, diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c index bd0d5b8d6c96..36d548fa3aa2 100644 --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c @@ -223,7 +223,7 @@ static int fence_update(struct drm_i915_fence_reg *fence, i915_gem_object_get_tiling(vma->obj))) return -EINVAL; - ret = i915_gem_active_retire(&vma->last_fence, + ret = i915_active_request_retire(&vma->last_fence, &vma->obj->base.dev->struct_mutex); if (ret) return ret; @@ -232,7 +232,7 @@ static int fence_update(struct drm_i915_fence_reg *fence, if (fence->vma) { struct i915_vma *old = fence->vma; - ret = i915_gem_active_retire(&old->last_fence, + ret = i915_active_request_retire(&old->last_fence, &old->obj->base.dev->struct_mutex); if (ret) return ret; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index e625659c03a2..d646d37eec2f 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1918,7 +1918,7 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size) return ERR_PTR(-ENOMEM); i915_active_init(i915, &vma->active, NULL); - 
init_request_active(&vma->last_fence, NULL); + INIT_ACTIVE_REQUEST(&vma->last_fence); vma->vm = &ggtt->vm; vma->ops = &pd_vma_ops; diff --git a/drivers/gpu/drm/i915/i915_gem_object.h b/drivers/gpu/drm/i915/i915_gem_object.h index 73fec917d097..fab040331cdb 100644 --- a/drivers/gpu/drm/i915/i915_gem_object.h +++ b/drivers/gpu/drm/i915/i915_gem_object.h @@ -175,7 +175,7 @@ struct drm_i915_gem_object { atomic_t frontbuffer_bits; unsigned int frontbuffer_ggtt_origin; /* write once */ - struct i915_gem_active frontbuffer_write; + struct i915_active_request frontbuffer_write; /** Current tiling stride for the object, if it's tiled. */ unsigned int tiling_and_stride; diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 6e2e5ed2bd0a..9a65341fec09 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1062,23 +1062,23 @@ i915_error_object_create(struct drm_i915_private *i915, } /* The error capture is special as tries to run underneath the normal - * locking rules - so we use the raw version of the i915_gem_active lookup. + * locking rules - so we use the raw version of the i915_active_request lookup. */ static inline u32 -__active_get_seqno(struct i915_gem_active *active) +__active_get_seqno(struct i915_active_request *active) { struct i915_request *request; - request = __i915_gem_active_peek(active); + request = __i915_active_request_peek(active); return request ? request->global_seqno : 0; } static inline int -__active_get_engine_id(struct i915_gem_active *active) +__active_get_engine_id(struct i915_active_request *active) { struct i915_request *request; - request = __i915_gem_active_peek(active); + request = __i915_active_request_peek(active); return request ? request->engine->id : -1; } diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index f5b2c95125ba..ed9f16bca4fe 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -29,6 +29,7 @@ #include #include "i915_drv.h" +#include "i915_active.h" #include "i915_reset.h" static struct i915_global_request { @@ -130,12 +131,6 @@ static void unreserve_gt(struct drm_i915_private *i915) i915_gem_park(i915); } -void i915_gem_retire_noop(struct i915_gem_active *active, - struct i915_request *request) -{ - /* Space left intentionally blank */ -} - static void advance_ring(struct i915_request *request) { struct intel_ring *ring = request->ring; @@ -249,7 +244,7 @@ static void __retire_engine_upto(struct intel_engine_cs *engine, static void i915_request_retire(struct i915_request *request) { - struct i915_gem_active *active, *next; + struct i915_active_request *active, *next; GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n", request->engine->name, @@ -283,10 +278,10 @@ static void i915_request_retire(struct i915_request *request) * we may spend an inordinate amount of time simply handling * the retirement of requests and processing their callbacks. * Of which, this loop itself is particularly hot due to the - * cache misses when jumping around the list of i915_gem_active. - * So we try to keep this loop as streamlined as possible and - * also prefetch the next i915_gem_active to try and hide - * the likely cache miss. + * cache misses when jumping around the list of + * i915_active_request. So we try to keep this loop as + * streamlined as possible and also prefetch the next + * i915_active_request to try and hide the likely cache miss. 
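The prefetchw(next) that follows is the payoff of this comment: pull the next node towards the cache before running the current retire callback, so the miss overlaps with useful work. A freestanding sketch of the idiom (hypothetical types, not the driver code):

#include <stddef.h>

struct node {
	struct node *next;
	void (*retire)(struct node *);
};

static inline void prefetchw_hint(const void *p)
{
#if defined(__GNUC__)
	__builtin_prefetch(p, 1, 3);	/* prefetch for write, high locality */
#else
	(void)p;
#endif
}

static void retire_all(struct node *head)
{
	struct node *it = head, *next;

	while (it) {
		next = it->next;	/* read before the node is recycled */
		prefetchw_hint(next);	/* hide the miss behind the callback */
		it->retire(it);
		it = next;
	}
}

static void noop_retire(struct node *n) { (void)n; }

int main(void)
{
	struct node b = { NULL, noop_retire };
	struct node a = { &b, noop_retire };

	retire_all(&a);
	return 0;
}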
*/ prefetchw(next); @@ -543,17 +538,9 @@ i915_request_alloc_slow(struct intel_context *ce) return kmem_cache_alloc(global.slab_requests, GFP_KERNEL); } -static int add_barrier(struct i915_request *rq, struct i915_gem_active *active) -{ - struct i915_request *barrier = - i915_gem_active_raw(active, &rq->i915->drm.struct_mutex); - - return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0; -} - static int add_timeline_barrier(struct i915_request *rq) { - return add_barrier(rq, &rq->timeline->barrier); + return i915_request_await_active_request(rq, &rq->timeline->barrier); } /** @@ -612,7 +599,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) * We use RCU to look up requests in flight. The lookups may * race with the request being allocated from the slab freelist. * That is the request we are writing to here, may be in the process - * of being read by __i915_gem_active_get_rcu(). As such, + * of being read by __i915_active_request_get_rcu(). As such, * we have to be very careful when overwriting the contents. During * the RCU lookup, we change chase the request->engine pointer, * read the request->global_seqno and increment the reference count. @@ -952,8 +939,8 @@ void i915_request_add(struct i915_request *request) * see a more recent value in the hws than we are tracking. */ - prev = i915_gem_active_raw(&timeline->last_request, - &request->i915->drm.struct_mutex); + prev = i915_active_request_raw(&timeline->last_request, + &request->i915->drm.struct_mutex); if (prev && !i915_request_completed(prev)) { i915_sw_fence_await_sw_fence(&request->submit, &prev->submit, &request->submitq); @@ -969,7 +956,7 @@ void i915_request_add(struct i915_request *request) spin_unlock_irq(&timeline->lock); GEM_BUG_ON(timeline->seqno != request->fence.seqno); - i915_gem_active_set(&timeline->last_request, request); + __i915_active_request_set(&timeline->last_request, request); list_add_tail(&request->ring_link, &ring->request_list); if (list_is_first(&request->ring_link, &ring->request_list)) { diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 054bd300984b..071ff1064579 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -409,389 +409,6 @@ static inline void i915_request_mark_complete(struct i915_request *rq) void i915_retire_requests(struct drm_i915_private *i915); -/* - * We treat requests as fences. This is not be to confused with our - * "fence registers" but pipeline synchronisation objects ala GL_ARB_sync. - * We use the fences to synchronize access from the CPU with activity on the - * GPU, for example, we should not rewrite an object's PTE whilst the GPU - * is reading them. We also track fences at a higher level to provide - * implicit synchronisation around GEM objects, e.g. set-domain will wait - * for outstanding GPU rendering before marking the object ready for CPU - * access, or a pageflip will wait until the GPU is complete before showing - * the frame on the scanout. - * - * In order to use a fence, the object must track the fence it needs to - * serialise with. For example, GEM objects want to track both read and - * write access so that we can perform concurrent read operations between - * the CPU and GPU engines, as well as waiting for all rendering to - * complete, or waiting for the last GPU user of a "fence register". 
The - * object then embeds a #i915_gem_active to track the most recent (in - * retirement order) request relevant for the desired mode of access. - * The #i915_gem_active is updated with i915_gem_active_set() to track the - * most recent fence request, typically this is done as part of - * i915_vma_move_to_active(). - * - * When the #i915_gem_active completes (is retired), it will - * signal its completion to the owner through a callback as well as mark - * itself as idle (i915_gem_active.request == NULL). The owner - * can then perform any action, such as delayed freeing of an active - * resource including itself. - */ -struct i915_gem_active; - -typedef void (*i915_gem_retire_fn)(struct i915_gem_active *, - struct i915_request *); - -struct i915_gem_active { - struct i915_request __rcu *request; - struct list_head link; - i915_gem_retire_fn retire; -}; - -void i915_gem_retire_noop(struct i915_gem_active *, - struct i915_request *request); - -/** - * init_request_active - prepares the activity tracker for use - * @active - the active tracker - * @func - a callback when then the tracker is retired (becomes idle), - * can be NULL - * - * init_request_active() prepares the embedded @active struct for use as - * an activity tracker, that is for tracking the last known active request - * associated with it. When the last request becomes idle, when it is retired - * after completion, the optional callback @func is invoked. - */ -static inline void -init_request_active(struct i915_gem_active *active, - i915_gem_retire_fn retire) -{ - RCU_INIT_POINTER(active->request, NULL); - INIT_LIST_HEAD(&active->link); - active->retire = retire ?: i915_gem_retire_noop; -} - -/** - * i915_gem_active_set - updates the tracker to watch the current request - * @active - the active tracker - * @request - the request to watch - * - * i915_gem_active_set() watches the given @request for completion. Whilst - * that @request is busy, the @active reports busy. When that @request is - * retired, the @active tracker is updated to report idle. - */ -static inline void -i915_gem_active_set(struct i915_gem_active *active, - struct i915_request *request) -{ - list_move(&active->link, &request->active_list); - rcu_assign_pointer(active->request, request); -} - -/** - * i915_gem_active_set_retire_fn - updates the retirement callback - * @active - the active tracker - * @fn - the routine called when the request is retired - * @mutex - struct_mutex used to guard retirements - * - * i915_gem_active_set_retire_fn() updates the function pointer that - * is called when the final request associated with the @active tracker - * is retired. - */ -static inline void -i915_gem_active_set_retire_fn(struct i915_gem_active *active, - i915_gem_retire_fn fn, - struct mutex *mutex) -{ - lockdep_assert_held(mutex); - active->retire = fn ?: i915_gem_retire_noop; -} - -static inline struct i915_request * -__i915_gem_active_peek(const struct i915_gem_active *active) -{ - /* - * Inside the error capture (running with the driver in an unknown - * state), we want to bend the rules slightly (a lot). - * - * Work is in progress to make it safer, in the meantime this keeps - * the known issue from spamming the logs. - */ - return rcu_dereference_protected(active->request, 1); -} - -/** - * i915_gem_active_raw - return the active request - * @active - the active tracker - * - * i915_gem_active_raw() returns the current request being tracked, or NULL. 
- * It does not obtain a reference on the request for the caller, so the caller - * must hold struct_mutex. - */ -static inline struct i915_request * -i915_gem_active_raw(const struct i915_gem_active *active, struct mutex *mutex) -{ - return rcu_dereference_protected(active->request, - lockdep_is_held(mutex)); -} - -/** - * i915_gem_active_peek - report the active request being monitored - * @active - the active tracker - * - * i915_gem_active_peek() returns the current request being tracked if - * still active, or NULL. It does not obtain a reference on the request - * for the caller, so the caller must hold struct_mutex. - */ -static inline struct i915_request * -i915_gem_active_peek(const struct i915_gem_active *active, struct mutex *mutex) -{ - struct i915_request *request; - - request = i915_gem_active_raw(active, mutex); - if (!request || i915_request_completed(request)) - return NULL; - - return request; -} - -/** - * i915_gem_active_get - return a reference to the active request - * @active - the active tracker - * - * i915_gem_active_get() returns a reference to the active request, or NULL - * if the active tracker is idle. The caller must hold struct_mutex. - */ -static inline struct i915_request * -i915_gem_active_get(const struct i915_gem_active *active, struct mutex *mutex) -{ - return i915_request_get(i915_gem_active_peek(active, mutex)); -} - -/** - * __i915_gem_active_get_rcu - return a reference to the active request - * @active - the active tracker - * - * __i915_gem_active_get() returns a reference to the active request, or NULL - * if the active tracker is idle. The caller must hold the RCU read lock, but - * the returned pointer is safe to use outside of RCU. - */ -static inline struct i915_request * -__i915_gem_active_get_rcu(const struct i915_gem_active *active) -{ - /* - * Performing a lockless retrieval of the active request is super - * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing - * slab of request objects will not be freed whilst we hold the - * RCU read lock. It does not guarantee that the request itself - * will not be freed and then *reused*. Viz, - * - * Thread A Thread B - * - * rq = active.request - * retire(rq) -> free(rq); - * (rq is now first on the slab freelist) - * active.request = NULL - * - * rq = new submission on a new object - * ref(rq) - * - * To prevent the request from being reused whilst the caller - * uses it, we take a reference like normal. Whilst acquiring - * the reference we check that it is not in a destroyed state - * (refcnt == 0). That prevents the request being reallocated - * whilst the caller holds on to it. To check that the request - * was not reallocated as we acquired the reference we have to - * check that our request remains the active request across - * the lookup, in the same manner as a seqlock. The visibility - * of the pointer versus the reference counting is controlled - * by using RCU barriers (rcu_dereference and rcu_assign_pointer). - * - * In the middle of all that, we inspect whether the request is - * complete. Retiring is lazy so the request may be completed long - * before the active tracker is updated. Querying whether the - * request is complete is far cheaper (as it involves no locked - * instructions setting cachelines to exclusive) than acquiring - * the reference, so we do it first. The RCU read lock ensures the - * pointer dereference is valid, but does not ensure that the - * seqno nor HWS is the right one! 
However, if the request was - * reallocated, that means the active tracker's request was complete. - * If the new request is also complete, then both are and we can - * just report the active tracker is idle. If the new request is - * incomplete, then we acquire a reference on it and check that - * it remained the active request. - * - * It is then imperative that we do not zero the request on - * reallocation, so that we can chase the dangling pointers! - * See i915_request_alloc(). - */ - do { - struct i915_request *request; - - request = rcu_dereference(active->request); - if (!request || i915_request_completed(request)) - return NULL; - - /* - * An especially silly compiler could decide to recompute the - * result of i915_request_completed, more specifically - * re-emit the load for request->fence.seqno. A race would catch - * a later seqno value, which could flip the result from true to - * false. Which means part of the instructions below might not - * be executed, while later on instructions are executed. Due to - * barriers within the refcounting the inconsistency can't reach - * past the call to i915_request_get_rcu, but not executing - * that while still executing i915_request_put() creates - * havoc enough. Prevent this with a compiler barrier. - */ - barrier(); - - request = i915_request_get_rcu(request); - - /* - * What stops the following rcu_access_pointer() from occurring - * before the above i915_request_get_rcu()? If we were - * to read the value before pausing to get the reference to - * the request, we may not notice a change in the active - * tracker. - * - * The rcu_access_pointer() is a mere compiler barrier, which - * means both the CPU and compiler are free to perform the - * memory read without constraint. The compiler only has to - * ensure that any operations after the rcu_access_pointer() - * occur afterwards in program order. This means the read may - * be performed earlier by an out-of-order CPU, or adventurous - * compiler. - * - * The atomic operation at the heart of - * i915_request_get_rcu(), see dma_fence_get_rcu(), is - * atomic_inc_not_zero() which is only a full memory barrier - * when successful. That is, if i915_request_get_rcu() - * returns the request (and so with the reference counted - * incremented) then the following read for rcu_access_pointer() - * must occur after the atomic operation and so confirm - * that this request is the one currently being tracked. - * - * The corresponding write barrier is part of - * rcu_assign_pointer(). - */ - if (!request || request == rcu_access_pointer(active->request)) - return rcu_pointer_handoff(request); - - i915_request_put(request); - } while (1); -} - -/** - * i915_gem_active_get_unlocked - return a reference to the active request - * @active - the active tracker - * - * i915_gem_active_get_unlocked() returns a reference to the active request, - * or NULL if the active tracker is idle. The reference is obtained under RCU, - * so no locking is required by the caller. - * - * The reference should be freed with i915_request_put(). - */ -static inline struct i915_request * -i915_gem_active_get_unlocked(const struct i915_gem_active *active) -{ - struct i915_request *request; - - rcu_read_lock(); - request = __i915_gem_active_get_rcu(active); - rcu_read_unlock(); - - return request; -} - -/** - * i915_gem_active_isset - report whether the active tracker is assigned - * @active - the active tracker - * - * i915_gem_active_isset() returns true if the active tracker is currently - * assigned to a request. 
Due to the lazy retiring, that request may be idle - * and this may report stale information. - */ -static inline bool -i915_gem_active_isset(const struct i915_gem_active *active) -{ - return rcu_access_pointer(active->request); -} - -/** - * i915_gem_active_wait - waits until the request is completed - * @active - the active request on which to wait - * @flags - how to wait - * @timeout - how long to wait at most - * @rps - userspace client to charge for a waitboost - * - * i915_gem_active_wait() waits until the request is completed before - * returning, without requiring any locks to be held. Note that it does not - * retire any requests before returning. - * - * This function relies on RCU in order to acquire the reference to the active - * request without holding any locks. See __i915_gem_active_get_rcu() for the - * glory details on how that is managed. Once the reference is acquired, we - * can then wait upon the request, and afterwards release our reference, - * free of any locking. - * - * This function wraps i915_request_wait(), see it for the full details on - * the arguments. - * - * Returns 0 if successful, or a negative error code. - */ -static inline int -i915_gem_active_wait(const struct i915_gem_active *active, unsigned int flags) -{ - struct i915_request *request; - long ret = 0; - - request = i915_gem_active_get_unlocked(active); - if (request) { - ret = i915_request_wait(request, flags, MAX_SCHEDULE_TIMEOUT); - i915_request_put(request); - } - - return ret < 0 ? ret : 0; -} - -/** - * i915_gem_active_retire - waits until the request is retired - * @active - the active request on which to wait - * - * i915_gem_active_retire() waits until the request is completed, - * and then ensures that at least the retirement handler for this - * @active tracker is called before returning. If the @active - * tracker is idle, the function returns immediately. - */ -static inline int __must_check -i915_gem_active_retire(struct i915_gem_active *active, - struct mutex *mutex) -{ - struct i915_request *request; - long ret; - - request = i915_gem_active_raw(active, mutex); - if (!request) - return 0; - - ret = i915_request_wait(request, - I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED, - MAX_SCHEDULE_TIMEOUT); - if (ret < 0) - return ret; - - list_del_init(&active->link); - RCU_INIT_POINTER(active->request, NULL); - - active->retire(active, request); - - return 0; -} - -#define for_each_active(mask, idx) \ - for (; mask ? 
idx = ffs(mask) - 1, 1 : 0; mask &= ~BIT(idx)) - int i915_global_request_init(void); void i915_global_request_shrink(void); void i915_global_request_exit(void); diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c index ca19fcf29c5b..86d9c46aef18 100644 --- a/drivers/gpu/drm/i915/i915_reset.c +++ b/drivers/gpu/drm/i915/i915_reset.c @@ -887,7 +887,7 @@ static bool __i915_gem_unset_wedged(struct drm_i915_private *i915) list_for_each_entry(tl, &i915->gt.timelines.active_list, link) { struct i915_request *rq; - rq = i915_gem_active_get_unlocked(&tl->last_request); + rq = i915_active_request_get_unlocked(&tl->last_request); if (!rq) continue; diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c index b354843a5040..b2202d2e58a2 100644 --- a/drivers/gpu/drm/i915/i915_timeline.c +++ b/drivers/gpu/drm/i915/i915_timeline.c @@ -163,8 +163,8 @@ int i915_timeline_init(struct drm_i915_private *i915, spin_lock_init(&timeline->lock); - init_request_active(&timeline->barrier, NULL); - init_request_active(&timeline->last_request, NULL); + INIT_ACTIVE_REQUEST(&timeline->barrier); + INIT_ACTIVE_REQUEST(&timeline->last_request); INIT_LIST_HEAD(&timeline->requests); i915_syncmap_init(&timeline->sync); @@ -236,7 +236,7 @@ void i915_timeline_fini(struct i915_timeline *timeline) { GEM_BUG_ON(timeline->pin_count); GEM_BUG_ON(!list_empty(&timeline->requests)); - GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier)); + GEM_BUG_ON(i915_active_request_isset(&timeline->barrier)); i915_syncmap_free(&timeline->sync); hwsp_free(timeline); @@ -268,25 +268,6 @@ i915_timeline_create(struct drm_i915_private *i915, return timeline; } -int i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq) -{ - struct i915_request *old; - int err; - - lockdep_assert_held(&rq->i915->drm.struct_mutex); - - /* Must maintain ordering wrt existing barriers */ - old = i915_gem_active_raw(&tl->barrier, &rq->i915->drm.struct_mutex); - if (old) { - err = i915_request_await_dma_fence(rq, &old->fence); - if (err) - return err; - } - - i915_gem_active_set(&tl->barrier, rq); - return 0; -} - int i915_timeline_pin(struct i915_timeline *tl) { int err; diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h index d167e04073c5..7bec7d2e45bf 100644 --- a/drivers/gpu/drm/i915/i915_timeline.h +++ b/drivers/gpu/drm/i915/i915_timeline.h @@ -28,6 +28,7 @@ #include #include +#include "i915_active.h" #include "i915_request.h" #include "i915_syncmap.h" #include "i915_utils.h" @@ -58,10 +59,10 @@ struct i915_timeline { /* Contains an RCU guarded pointer to the last request. No reference is * held to the request, users must carefully acquire a reference to - * the request using i915_gem_active_get_request_rcu(), or hold the + * the request using i915_active_request_get_unlocked(), or hold the * struct_mutex. */ - struct i915_gem_active last_request; + struct i915_active_request last_request; /** * We track the most recent seqno that we wait on in every context so * we can summarise recent usage for the busy-ioctl. @@ -82,7 +83,7 @@ struct i915_timeline { * subsequent submissions to this timeline be executed only after the * barrier has been completed. */ - struct i915_gem_active barrier; + struct i915_active_request barrier; struct list_head link; const char *name; @@ -174,7 +175,10 @@ void i915_timelines_fini(struct drm_i915_private *i915); * submissions on @timeline. Subsequent requests will not be submitted to GPU * until the barrier has been completed. 
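The ordering promised here is provided by i915_active_request_set() from the first hunk of this patch: it awaits the previously tracked request before installing the new one, so successive barriers chain rather than replace each other. A freestanding sketch of that shape, all names invented for illustration:

#include <stddef.h>

struct request { struct request *ordered_after; };
struct tracker { struct request *request; };

/* stand-in for awaiting the old barrier; a NULL barrier is a no-op */
static int await(struct request *rq, struct request *prev)
{
	rq->ordered_after = prev;
	return 0;
}

static int set_barrier(struct tracker *t, struct request *rq)
{
	int err = await(rq, t->request);	/* chain after the old barrier */

	if (err)
		return err;

	t->request = rq;			/* rq becomes the new barrier */
	return 0;
}

int main(void)
{
	struct tracker barrier = { NULL };
	struct request rq1 = { NULL }, rq2 = { NULL };

	set_barrier(&barrier, &rq1);
	set_barrier(&barrier, &rq2);	/* rq2 now ordered after rq1 */
	return !(barrier.request == &rq2 && rq2.ordered_after == &rq1);
}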
*/ -int i915_timeline_set_barrier(struct i915_timeline *timeline, - struct i915_request *rq); +static inline int +i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq) +{ + return i915_active_request_set(&tl->barrier, rq); +} #endif diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index d4772061e642..b713bed20c38 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -120,7 +120,7 @@ vma_create(struct drm_i915_gem_object *obj, return ERR_PTR(-ENOMEM); i915_active_init(vm->i915, &vma->active, __i915_vma_retire); - init_request_active(&vma->last_fence, NULL); + INIT_ACTIVE_REQUEST(&vma->last_fence); vma->vm = vm; vma->ops = &vm->vma_ops; @@ -808,7 +808,7 @@ static void __i915_vma_destroy(struct i915_vma *vma) GEM_BUG_ON(vma->node.allocated); GEM_BUG_ON(vma->fence); - GEM_BUG_ON(i915_gem_active_isset(&vma->last_fence)); + GEM_BUG_ON(i915_active_request_isset(&vma->last_fence)); mutex_lock(&vma->vm->mutex); list_del(&vma->vm_link); @@ -942,14 +942,14 @@ int i915_vma_move_to_active(struct i915_vma *vma, obj->write_domain = I915_GEM_DOMAIN_RENDER; if (intel_fb_obj_invalidate(obj, ORIGIN_CS)) - i915_gem_active_set(&obj->frontbuffer_write, rq); + __i915_active_request_set(&obj->frontbuffer_write, rq); obj->read_domains = 0; } obj->read_domains |= I915_GEM_GPU_DOMAINS; if (flags & EXEC_OBJECT_NEEDS_FENCE) - i915_gem_active_set(&vma->last_fence, rq); + __i915_active_request_set(&vma->last_fence, rq); export_fence(vma, rq, flags); return 0; @@ -986,8 +986,8 @@ int i915_vma_unbind(struct i915_vma *vma) if (ret) goto unpin; - ret = i915_gem_active_retire(&vma->last_fence, - &vma->vm->i915->drm.struct_mutex); + ret = i915_active_request_retire(&vma->last_fence, + &vma->vm->i915->drm.struct_mutex); unpin: __i915_vma_unpin(vma); if (ret) diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 3c03d4569481..7c742027f866 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -110,7 +110,7 @@ struct i915_vma { #define I915_VMA_GGTT_WRITE BIT(15) struct i915_active active; - struct i915_gem_active last_fence; + struct i915_active_request last_fence; /** * Support different GGTT views into the same object. diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index ec2cbbe070a4..0dbd6d7c1693 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -1124,7 +1124,7 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine) * the last request that remains in the timeline. When idle, it is * the last executed context as tracked by retirement. 
*/ - rq = __i915_gem_active_peek(&engine->timeline.last_request); + rq = __i915_active_request_peek(&engine->timeline.last_request); if (rq) return rq->hw_context == kernel_context; else diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c index a9238fd07e30..c0df1dbb0069 100644 --- a/drivers/gpu/drm/i915/intel_overlay.c +++ b/drivers/gpu/drm/i915/intel_overlay.c @@ -186,7 +186,7 @@ struct intel_overlay { struct overlay_registers __iomem *regs; u32 flip_addr; /* flip handling */ - struct i915_gem_active last_flip; + struct i915_active_request last_flip; }; static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv, @@ -214,23 +214,23 @@ static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv, static void intel_overlay_submit_request(struct intel_overlay *overlay, struct i915_request *rq, - i915_gem_retire_fn retire) + i915_active_retire_fn retire) { - GEM_BUG_ON(i915_gem_active_peek(&overlay->last_flip, - &overlay->i915->drm.struct_mutex)); - i915_gem_active_set_retire_fn(&overlay->last_flip, retire, - &overlay->i915->drm.struct_mutex); - i915_gem_active_set(&overlay->last_flip, rq); + GEM_BUG_ON(i915_active_request_peek(&overlay->last_flip, + &overlay->i915->drm.struct_mutex)); + i915_active_request_set_retire_fn(&overlay->last_flip, retire, + &overlay->i915->drm.struct_mutex); + __i915_active_request_set(&overlay->last_flip, rq); i915_request_add(rq); } static int intel_overlay_do_wait_request(struct intel_overlay *overlay, struct i915_request *rq, - i915_gem_retire_fn retire) + i915_active_retire_fn retire) { intel_overlay_submit_request(overlay, rq, retire); - return i915_gem_active_retire(&overlay->last_flip, - &overlay->i915->drm.struct_mutex); + return i915_active_request_retire(&overlay->last_flip, + &overlay->i915->drm.struct_mutex); } static struct i915_request *alloc_request(struct intel_overlay *overlay) @@ -351,8 +351,9 @@ static void intel_overlay_release_old_vma(struct intel_overlay *overlay) i915_vma_put(vma); } -static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active, - struct i915_request *rq) +static void +intel_overlay_release_old_vid_tail(struct i915_active_request *active, + struct i915_request *rq) { struct intel_overlay *overlay = container_of(active, typeof(*overlay), last_flip); @@ -360,7 +361,7 @@ static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active, intel_overlay_release_old_vma(overlay); } -static void intel_overlay_off_tail(struct i915_gem_active *active, +static void intel_overlay_off_tail(struct i915_active_request *active, struct i915_request *rq) { struct intel_overlay *overlay = @@ -423,8 +424,8 @@ static int intel_overlay_off(struct intel_overlay *overlay) * We have to be careful not to repeat work forever an make forward progess. */ static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay) { - return i915_gem_active_retire(&overlay->last_flip, - &overlay->i915->drm.struct_mutex); + return i915_active_request_retire(&overlay->last_flip, + &overlay->i915->drm.struct_mutex); } /* Wait for pending overlay flip and release old frame. 
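The retire callbacks above recover their intel_overlay from the embedded last_flip tracker with container_of(). For reference, the idiom in isolation (hypothetical types, not the driver code):

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct tracker { void *request; };

struct overlay {
	unsigned int flip_addr;
	struct tracker last_flip;	/* embedded tracker */
};

/* the callback only receives the tracker, so step back to the overlay */
static void release_old_vid_tail(struct tracker *active)
{
	struct overlay *overlay =
		container_of(active, struct overlay, last_flip);

	printf("overlay %p (flip_addr %#x): old frame released\n",
	       (void *)overlay, overlay->flip_addr);
}

int main(void)
{
	struct overlay ov = { .flip_addr = 0x1000 };

	release_old_vid_tail(&ov.last_flip);
	return 0;
}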
@@ -1357,7 +1358,7 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv) overlay->contrast = 75; overlay->saturation = 146; - init_request_active(&overlay->last_flip, NULL); + INIT_ACTIVE_REQUEST(&overlay->last_flip); mutex_lock(&dev_priv->drm.struct_mutex); diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c index 30ab0e04a674..72151aab208e 100644 --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c @@ -501,8 +501,8 @@ static int live_suppress_wait_preempt(void *arg) } /* Disable NEWCLIENT promotion */ - i915_gem_active_set(&rq[i]->timeline->last_request, - dummy); + __i915_active_request_set(&rq[i]->timeline->last_request, + dummy); i915_request_add(rq[i]); } diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c index e5659aaa856d..d2de9ece2118 100644 --- a/drivers/gpu/drm/i915/selftests/mock_timeline.c +++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c @@ -15,8 +15,8 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context) spin_lock_init(&timeline->lock); - init_request_active(&timeline->barrier, NULL); - init_request_active(&timeline->last_request, NULL); + INIT_ACTIVE_REQUEST(&timeline->barrier); + INIT_ACTIVE_REQUEST(&timeline->last_request); INIT_LIST_HEAD(&timeline->requests); i915_syncmap_init(&timeline->sync); From patchwork Mon Feb 4 13:22:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10795603 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4352513A4 for ; Mon, 4 Feb 2019 13:22:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 30D722B497 for ; Mon, 4 Feb 2019 13:22:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 24CBF2B49F; Mon, 4 Feb 2019 13:22:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 24EFB2B497 for ; Mon, 4 Feb 2019 13:22:47 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4D7826E4E3; Mon, 4 Feb 2019 13:22:44 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1626E6E4DE for ; Mon, 4 Feb 2019 13:22:42 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 15458871-1500050 for multiple; Mon, 04 Feb 2019 13:22:27 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 4 Feb 2019 13:22:10 +0000 Message-Id: <20190204132214.9459-19-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: 
<20190204132214.9459-1-chris@chris-wilson.co.uk> References: <20190204132214.9459-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 18/22] drm/i915: Keep timeline HWSP allocated until idle across the system X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP In preparation for enabling HW semaphores, we need to keep the in-flight timeline HWSP alive until its use across the entire system has completed, as any other timeline active on the GPU may still refer back to the already retired timeline. We have to delay both recycling available cachelines and unpinning the old HWSP until the next idle point. An easy option would be to simply keep all used HWSP until the system as a whole was idle, i.e. we could release them all at once on parking. However, on a busy system, we may never see a global idle point, essentially meaning the resource will be leaked until we are forced to do a GC pass. We already employ a fine-grained idle detection mechanism for vma, which we can reuse here so that each cacheline can be freed immediately after the last request using it is retired. v3: Keep track of the activity of each cacheline. v4: cacheline_free() on canceling the seqno tracking v5: Finally with a testcase to exercise wraparound Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_request.c | 30 +- drivers/gpu/drm/i915/i915_timeline.c | 264 ++++++++++++++++-- drivers/gpu/drm/i915/i915_timeline.h | 9 +- .../gpu/drm/i915/selftests/i915_timeline.c | 110 ++++++++ 4 files changed, 374 insertions(+), 39 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index ed9f16bca4fe..057bffa56700 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -331,11 +331,6 @@ void i915_request_retire_upto(struct i915_request *rq) } while (tmp != rq); } -static u32 timeline_get_seqno(struct i915_timeline *tl) -{ - return tl->seqno += 1 + tl->has_initial_breadcrumb; -} - static void move_to_timeline(struct i915_request *request, struct i915_timeline *timeline) { @@ -556,8 +551,10 @@ struct i915_request * i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) { struct drm_i915_private *i915 = engine->i915; - struct i915_request *rq; struct intel_context *ce; + struct i915_timeline *tl; + struct i915_request *rq; + u32 seqno; int ret; lockdep_assert_held(&i915->drm.struct_mutex); @@ -632,24 +629,26 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) } } - rq->rcustate = get_state_synchronize_rcu(); - INIT_LIST_HEAD(&rq->active_list); + + tl = ce->ring->timeline; + ret = i915_timeline_get_seqno(tl, rq, &seqno); + if (ret) + goto err_free; + rq->i915 = i915; rq->engine = engine; rq->gem_context = ctx; rq->hw_context = ce; rq->ring = ce->ring; - rq->timeline = ce->ring->timeline; + rq->timeline = tl; GEM_BUG_ON(rq->timeline == &engine->timeline); - rq->hwsp_seqno = rq->timeline->hwsp_seqno; + rq->hwsp_seqno = tl->hwsp_seqno; + rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */ spin_lock_init(&rq->lock); - 
dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock, + tl->fence_context, seqno); /* We bump the ref for the fence chain */ i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify); @@ -710,6 +709,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) GEM_BUG_ON(!list_empty(&rq->sched.signalers_list)); GEM_BUG_ON(!list_empty(&rq->sched.waiters_list)); +err_free: kmem_cache_free(global.slab_requests, rq); err_unreserve: unreserve_gt(i915); diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c index b2202d2e58a2..3608e544012f 100644 --- a/drivers/gpu/drm/i915/i915_timeline.c +++ b/drivers/gpu/drm/i915/i915_timeline.c @@ -6,19 +6,29 @@ #include "i915_drv.h" -#include "i915_timeline.h" +#include "i915_active.h" #include "i915_syncmap.h" +#include "i915_timeline.h" struct i915_timeline_hwsp { - struct i915_vma *vma; + struct i915_gt_timelines *gt; struct list_head free_link; + struct i915_vma *vma; u64 free_bitmap; }; -static inline struct i915_timeline_hwsp * -i915_timeline_hwsp(const struct i915_timeline *tl) +struct i915_timeline_cacheline { + struct i915_active active; + struct i915_timeline_hwsp *hwsp; + void *vaddr; + unsigned int cacheline : 6; + unsigned int free : 1; +}; + +static inline struct drm_i915_private * +hwsp_to_i915(struct i915_timeline_hwsp *hwsp) { - return tl->hwsp_ggtt->private; + return container_of(hwsp->gt, struct drm_i915_private, gt.timelines); } static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915) @@ -71,6 +81,7 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline) vma->private = hwsp; hwsp->vma = vma; hwsp->free_bitmap = ~0ull; + hwsp->gt = gt; spin_lock(>->hwsp_lock); list_add(&hwsp->free_link, >->hwsp_free_list); @@ -88,14 +99,9 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline) return hwsp->vma; } -static void hwsp_free(struct i915_timeline *timeline) +static void __idle_hwsp_free(struct i915_timeline_hwsp *hwsp, int cacheline) { - struct i915_gt_timelines *gt = &timeline->i915->gt.timelines; - struct i915_timeline_hwsp *hwsp; - - hwsp = i915_timeline_hwsp(timeline); - if (!hwsp) /* leave global HWSP alone! 
*/ - return; + struct i915_gt_timelines *gt = hwsp->gt; spin_lock(>->hwsp_lock); @@ -103,7 +109,8 @@ static void hwsp_free(struct i915_timeline *timeline) if (!hwsp->free_bitmap) list_add_tail(&hwsp->free_link, >->hwsp_free_list); - hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES); + GEM_BUG_ON(cacheline >= BITS_PER_TYPE(hwsp->free_bitmap)); + hwsp->free_bitmap |= BIT_ULL(cacheline); /* And if no one is left using it, give the page back to the system */ if (hwsp->free_bitmap == ~0ull) { @@ -115,6 +122,78 @@ static void hwsp_free(struct i915_timeline *timeline) spin_unlock(>->hwsp_lock); } +static void __idle_cacheline_free(struct i915_timeline_cacheline *cl) +{ + GEM_BUG_ON(!i915_active_is_idle(&cl->active)); + + i915_gem_object_unpin_map(cl->hwsp->vma->obj); + i915_vma_put(cl->hwsp->vma); + __idle_hwsp_free(cl->hwsp, cl->cacheline); + + i915_active_fini(&cl->active); + kfree(cl); +} + +static void __cacheline_retire(struct i915_active *active) +{ + struct i915_timeline_cacheline *cl = + container_of(active, typeof(*cl), active); + + i915_vma_unpin(cl->hwsp->vma); + if (cl->free) + __idle_cacheline_free(cl); +} + +static struct i915_timeline_cacheline * +cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline) +{ + struct i915_timeline_cacheline *cl; + void *vaddr; + + GEM_BUG_ON(cacheline >= 64); + + cl = kmalloc(sizeof(*cl), GFP_KERNEL); + if (!cl) + return ERR_PTR(-ENOMEM); + + vaddr = i915_gem_object_pin_map(hwsp->vma->obj, I915_MAP_WB); + if (IS_ERR(vaddr)) { + kfree(cl); + return ERR_CAST(vaddr); + } + + i915_vma_get(hwsp->vma); + cl->hwsp = hwsp; + cl->vaddr = vaddr; + cl->cacheline = cacheline; + cl->free = false; + + i915_active_init(hwsp_to_i915(hwsp), &cl->active, __cacheline_retire); + + return cl; +} + +static void cacheline_acquire(struct i915_timeline_cacheline *cl) +{ + if (cl && i915_active_acquire(&cl->active)) + __i915_vma_pin(cl->hwsp->vma); +} + +static void cacheline_release(struct i915_timeline_cacheline *cl) +{ + if (cl) + i915_active_release(&cl->active); +} + +static void cacheline_free(struct i915_timeline_cacheline *cl) +{ + GEM_BUG_ON(cl->free); + cl->free = true; + + if (i915_active_is_idle(&cl->active)) + __idle_cacheline_free(cl); +} + int i915_timeline_init(struct drm_i915_private *i915, struct i915_timeline *timeline, const char *name, @@ -136,29 +215,40 @@ int i915_timeline_init(struct drm_i915_private *i915, timeline->name = name; timeline->pin_count = 0; timeline->has_initial_breadcrumb = !hwsp; + timeline->hwsp_cacheline = NULL; - timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR; if (!hwsp) { + struct i915_timeline_cacheline *cl; unsigned int cacheline; hwsp = hwsp_alloc(timeline, &cacheline); if (IS_ERR(hwsp)) return PTR_ERR(hwsp); + cl = cacheline_alloc(hwsp->private, cacheline); + if (IS_ERR(cl)) { + __idle_hwsp_free(hwsp->private, cacheline); + return PTR_ERR(cl); + } + + timeline->hwsp_cacheline = cl; timeline->hwsp_offset = cacheline * CACHELINE_BYTES; - } - timeline->hwsp_ggtt = i915_vma_get(hwsp); - vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB); - if (IS_ERR(vaddr)) { - hwsp_free(timeline); - i915_vma_put(hwsp); - return PTR_ERR(vaddr); + vaddr = cl->vaddr; + } else { + timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR; + + vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB); + if (IS_ERR(vaddr)) + return PTR_ERR(vaddr); } timeline->hwsp_seqno = memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES); + timeline->hwsp_ggtt = i915_vma_get(hwsp); + GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size); 
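For reference, the suballocator that __idle_hwsp_free() feeds is a one-bit-per-cacheline bitmap: PAGE_SIZE / CACHELINE_BYTES = 4096 / 64 = 64 cachelines, so a single u64 covers a whole HWSP page. A userspace model of the alloc/free pair, with invented names and the compiler's ffs builtin standing in for the kernel helpers:

#include <assert.h>
#include <stdint.h>

#define CACHELINE_BYTES 64

struct hwsp_page { uint64_t free_bitmap; };	/* bit set => cacheline free */

static int hwsp_alloc_cacheline(struct hwsp_page *hwsp)
{
	int cacheline;

	assert(hwsp->free_bitmap);		/* caller checks for space */
	cacheline = __builtin_ffsll(hwsp->free_bitmap) - 1;
	hwsp->free_bitmap &= ~(1ull << cacheline);
	return cacheline * CACHELINE_BYTES;	/* byte offset into the page */
}

static void hwsp_free_cacheline(struct hwsp_page *hwsp, int offset)
{
	hwsp->free_bitmap |= 1ull << (offset / CACHELINE_BYTES);
	/* free_bitmap == ~0ull means the whole page can now be released */
}

int main(void)
{
	struct hwsp_page page = { .free_bitmap = ~0ull };
	int offset = hwsp_alloc_cacheline(&page);	/* lowest free: 0 */

	hwsp_free_cacheline(&page, offset);
	assert(page.free_bitmap == ~0ull);
	return 0;
}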
+ timeline->fence_context = dma_fence_context_alloc(1); spin_lock_init(&timeline->lock); @@ -239,9 +329,12 @@ void i915_timeline_fini(struct i915_timeline *timeline) GEM_BUG_ON(i915_active_request_isset(&timeline->barrier)); i915_syncmap_free(&timeline->sync); - hwsp_free(timeline); - i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj); + if (timeline->hwsp_cacheline) + cacheline_free(timeline->hwsp_cacheline); + else + i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj); + i915_vma_put(timeline->hwsp_ggtt); } @@ -284,6 +377,7 @@ int i915_timeline_pin(struct i915_timeline *tl) i915_ggtt_offset(tl->hwsp_ggtt) + offset_in_page(tl->hwsp_offset); + cacheline_acquire(tl->hwsp_cacheline); timeline_add_to_active(tl); return 0; @@ -293,6 +387,129 @@ int i915_timeline_pin(struct i915_timeline *tl) return err; } +static u32 timeline_advance(struct i915_timeline *tl) +{ + GEM_BUG_ON(!tl->pin_count); + GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb); + + return tl->seqno += 1 + tl->has_initial_breadcrumb; +} + +static void timeline_rollback(struct i915_timeline *tl) +{ + tl->seqno -= 1 + tl->has_initial_breadcrumb; +} + +static noinline int +__i915_timeline_get_seqno(struct i915_timeline *tl, + struct i915_request *rq, + u32 *seqno) +{ + struct i915_timeline_cacheline *cl; + struct i915_vma *vma; + unsigned int cacheline; + int err; + + /* + * If there is an outstanding GPU reference to this cacheline, + * such as it being sampled by a HW semaphore on another timeline, + * we cannot wraparound our seqno value (the HW semaphore does + * a strict greater-than-or-equals compare, not i915_seqno_passed). + * So if the cacheline is still busy, we must detach ourselves + * from it and leave it inflight alongside its users. + * + * However, if nobody is watching and we can guarantee that nobody + * will, we could simply reuse the same cacheline. + * + * if (i915_active_request_is_signaled(&tl->last_request) && + * i915_active_is_signaled(&tl->hwsp_cacheline->active)) + * return 0; + * + * That seems unlikely for a busy timeline that needed to wrap in + * the first place, so just replace the cacheline. + */ + + vma = hwsp_alloc(tl, &cacheline); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto err_rollback; + } + + err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH); + if (err) { + __idle_hwsp_free(vma->private, cacheline); + goto err_rollback; + } + + cl = cacheline_alloc(vma->private, cacheline); + if (IS_ERR(cl)) { + err = PTR_ERR(cl); + __idle_hwsp_free(vma->private, cacheline); + goto err_unpin; + } + GEM_BUG_ON(cl->hwsp->vma != vma); + + /* + * Attach the old cacheline to the current request, so that we only + * free it after the current request is retired, which ensures that + * all writes into the cacheline from previous requests are complete. 
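The wraparound hazard called out above deserves a concrete illustration: the driver compares seqnos with wrap-safe signed-delta arithmetic, but a HW semaphore performs a strict unsigned greater-or-equal, so a wrapped timeline must move to a fresh cacheline instead of reusing one a semaphore may still be watching. A standalone demonstration (seqno_passed() below is modelled on the described semantics of i915_seqno_passed()):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* wrap-safe: true if @current is at or beyond @target, modulo 2^32 */
static bool seqno_passed(uint32_t current, uint32_t target)
{
	return (int32_t)(current - target) >= 0;
}

int main(void)
{
	uint32_t before_wrap = 0xfffffffeu, after_wrap = 2;

	/* signed delta: 2 is "after" 0xfffffffe across the wrap */
	assert(seqno_passed(after_wrap, before_wrap));

	/* strict unsigned >=, as a HW semaphore does: waits forever */
	assert(!(after_wrap >= before_wrap));
	return 0;
}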
+ */ + err = i915_active_ref(&tl->hwsp_cacheline->active, + tl->fence_context, rq); + if (err) + goto err_cacheline; + + cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */ + cacheline_free(tl->hwsp_cacheline); + + i915_vma_unpin(tl->hwsp_ggtt); /* binding kept alive by old cacheline */ + i915_vma_put(tl->hwsp_ggtt); + + tl->hwsp_ggtt = i915_vma_get(vma); + + tl->hwsp_offset = cacheline * CACHELINE_BYTES; + tl->hwsp_seqno = + memset(cl->vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES); + + tl->hwsp_offset += i915_ggtt_offset(vma); + + cacheline_acquire(cl); + tl->hwsp_cacheline = cl; + + *seqno = timeline_advance(tl); + GEM_BUG_ON(i915_seqno_passed(*tl->hwsp_seqno, *seqno)); + return 0; + +err_cacheline: + cacheline_free(cl); +err_unpin: + i915_vma_unpin(vma); +err_rollback: + timeline_rollback(tl); + return err; +} + +int i915_timeline_get_seqno(struct i915_timeline *tl, + struct i915_request *rq, + u32 *seqno) +{ + *seqno = timeline_advance(tl); + + /* Replace the HWSP on wraparound for HW semaphores */ + if (unlikely(!*seqno && tl->hwsp_cacheline)) + return __i915_timeline_get_seqno(tl, rq, seqno); + + return 0; +} + +int i915_timeline_read_lock(struct i915_timeline *tl, struct i915_request *rq) +{ + GEM_BUG_ON(!tl->pin_count); + GEM_BUG_ON(!tl->hwsp_cacheline); + return i915_active_ref(&tl->hwsp_cacheline->active, + rq->fence.context, rq); +} + void i915_timeline_unpin(struct i915_timeline *tl) { GEM_BUG_ON(!tl->pin_count); @@ -300,6 +517,7 @@ void i915_timeline_unpin(struct i915_timeline *tl) return; timeline_remove_from_active(tl); + cacheline_release(tl->hwsp_cacheline); /* * Since this timeline is idle, all barriers upon which we were waiting diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h index 7bec7d2e45bf..d78ec6fbc000 100644 --- a/drivers/gpu/drm/i915/i915_timeline.h +++ b/drivers/gpu/drm/i915/i915_timeline.h @@ -34,7 +34,7 @@ #include "i915_utils.h" struct i915_vma; -struct i915_timeline_hwsp; +struct i915_timeline_cacheline; struct i915_timeline { u64 fence_context; @@ -49,6 +49,8 @@ struct i915_timeline { struct i915_vma *hwsp_ggtt; u32 hwsp_offset; + struct i915_timeline_cacheline *hwsp_cacheline; + bool has_initial_breadcrumb; /** @@ -160,6 +162,11 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl, } int i915_timeline_pin(struct i915_timeline *tl); +int i915_timeline_get_seqno(struct i915_timeline *tl, + struct i915_request *rq, + u32 *seqno); +int i915_timeline_read_lock(struct i915_timeline *tl, + struct i915_request *rq); void i915_timeline_unpin(struct i915_timeline *tl); void i915_timelines_init(struct drm_i915_private *i915); diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c index 12ea69b1a1e5..9e0126867634 100644 --- a/drivers/gpu/drm/i915/selftests/i915_timeline.c +++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c @@ -641,6 +641,115 @@ static int live_hwsp_alternate(void *arg) #undef NUM_TIMELINES } +static int live_hwsp_wrap(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct intel_engine_cs *engine; + struct i915_timeline *tl; + enum intel_engine_id id; + intel_wakeref_t wakeref; + int err = 0; + + /* + * Across a seqno wrap, we need to keep the old cacheline alive for + * foreign GPU references.
+ */ + + mutex_lock(&i915->drm.struct_mutex); + wakeref = intel_runtime_pm_get(i915); + + tl = i915_timeline_create(i915, __func__, NULL); + if (IS_ERR(tl)) { + err = PTR_ERR(tl); + goto out_rpm; + } + if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline) + goto out_free; + + err = i915_timeline_pin(tl); + if (err) + goto out_free; + + for_each_engine(engine, i915, id) { + const u32 *hwsp_seqno[2]; + struct i915_request *rq; + u32 seqno[2]; + + rq = i915_request_alloc(engine, i915->kernel_context); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + goto out; + } + + tl->seqno = -4u; + + err = i915_timeline_get_seqno(tl, rq, &seqno[0]); + if (err) { + i915_request_add(rq); + goto out; + } + pr_debug("seqno[0]:%08x, hwsp_offset:%08x\n", + seqno[0], tl->hwsp_offset); + + err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[0]); + if (err) { + i915_request_add(rq); + goto out; + } + hwsp_seqno[0] = tl->hwsp_seqno; + + err = i915_timeline_get_seqno(tl, rq, &seqno[1]); + if (err) { + i915_request_add(rq); + goto out; + } + pr_debug("seqno[1]:%08x, hwsp_offset:%08x\n", + seqno[1], tl->hwsp_offset); + + err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[1]); + if (err) { + i915_request_add(rq); + goto out; + } + hwsp_seqno[1] = tl->hwsp_seqno; + + /* With wrap should come a new hwsp */ + GEM_BUG_ON(seqno[1] >= seqno[0]); + GEM_BUG_ON(hwsp_seqno[0] == hwsp_seqno[1]); + + i915_request_add(rq); + + if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) { + pr_err("Wait for timeline writes timed out!\n"); + err = -EIO; + goto out; + } + + if (*hwsp_seqno[0] != seqno[0] || *hwsp_seqno[1] != seqno[1]) { + pr_err("Bad timeline values: found (%x, %x), expected (%x, %x)\n", + *hwsp_seqno[0], *hwsp_seqno[1], + seqno[0], seqno[1]); + err = -EINVAL; + goto out; + } + + i915_retire_requests(i915); /* recycle HWSP */ + } + +out: + if (igt_flush_test(i915, I915_WAIT_LOCKED)) + err = -EIO; + + i915_timeline_unpin(tl); +out_free: + i915_timeline_put(tl); +out_rpm: + intel_runtime_pm_put(i915, wakeref); + mutex_unlock(&i915->drm.struct_mutex); + + return err; +} + static int live_hwsp_recycle(void *arg) { struct drm_i915_private *i915 = arg; @@ -723,6 +832,7 @@ int i915_timeline_live_selftests(struct drm_i915_private *i915) SUBTEST(live_hwsp_recycle), SUBTEST(live_hwsp_engine), SUBTEST(live_hwsp_alternate), + SUBTEST(live_hwsp_wrap), }; return i915_subtests(tests, i915);
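The wraparound hazard that live_hwsp_wrap exercises is easiest to see side by side. Below is a minimal standalone sketch in plain C, not driver code: seqno_passed() and hw_semaphore_done() are made-up names standing in for the driver's i915_seqno_passed() and for the compare that MI_SEMAPHORE_WAIT with SAD_GTE_SDD performs in hardware.

	#include <stdbool.h>
	#include <stdint.h>

	/* Wrap-safe, matching the semantics of the driver's i915_seqno_passed(). */
	static bool seqno_passed(uint32_t seq, uint32_t target)
	{
		return (int32_t)(seq - target) >= 0;
	}

	/* What the HW semaphore effectively evaluates at the polled address. */
	static bool hw_semaphore_done(uint32_t seq, uint32_t target)
	{
		return seq >= target;
	}

	/*
	 * Across a wrap, e.g. seq = 2 and target = 0xfffffffe:
	 *   seqno_passed(2, 0xfffffffe)      -> true  (2 is 4 steps past target)
	 *   hw_semaphore_done(2, 0xfffffffe) -> false (the busywait never ends)
	 * Hence __i915_timeline_get_seqno() moves the timeline onto a fresh
	 * HWSP cacheline on wrap, so old waiters keep polling values that only
	 * ever grow.
	 */

The driver-side compare is done in modular (two's-complement) arithmetic and so survives a wrap; the hardware compare is absolute and unsigned, which is the whole reason the old cacheline must be kept alive and never reused while a foreign MI_SEMAPHORE_WAIT may still be sampling it.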
From patchwork Mon Feb 4 13:22:11 2019 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 4 Feb 2019 13:22:11 +0000 Message-Id: <20190204132214.9459-20-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 19/22] drm/i915/execlists: Refactor out can_merge_rq() In the next patch, we add another user that wants to check whether requests can be merged into a single HW execution, and in the future we want to add more conditions under which requests from the same context cannot be merged. In preparation, extract out can_merge_rq(). v2: Reorder tests to decide if we can continue filling ELSP and bonus comments. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/intel_lrc.c | 35 ++++++++++++++++++++++---------- 1 file changed, 24 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index e37f207afb5a..66d465708bc6 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -285,12 +285,11 @@ static inline bool need_preempt(const struct intel_engine_cs *engine, } __maybe_unused static inline bool -assert_priority_queue(const struct intel_engine_execlists *execlists, - const struct i915_request *prev, +assert_priority_queue(const struct i915_request *prev, const struct i915_request *next) { - if (!prev) - return true; + const struct intel_engine_execlists *execlists = + &prev->engine->execlists; /* * Without preemption, the prev may refer to the still active element @@ -601,6 +600,17 @@ static bool can_merge_ctx(const struct intel_context *prev, return true; } +static bool can_merge_rq(const struct i915_request *prev, + const struct i915_request *next) +{ + GEM_BUG_ON(!assert_priority_queue(prev, next)); + + if (!can_merge_ctx(prev->hw_context, next->hw_context)) + return false; + + return true; +} + static void port_assign(struct execlist_port *port, struct i915_request *rq) { GEM_BUG_ON(rq == port_request(port)); @@ -753,8 +763,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine) int i; priolist_for_each_request_consume(rq, rn, p, i) { - GEM_BUG_ON(!assert_priority_queue(execlists, last, rq)); - /* * Can we combine this request with the current port?
* It has to be the same context/ringbuffer and not @@ -766,8 +774,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * second request, and so we never need to tell the * hardware about the first. */ - if (last && - !can_merge_ctx(rq->hw_context, last->hw_context)) { + if (last && !can_merge_rq(last, rq)) { /* * If we are on the second port and cannot * combine this request with the last, then we @@ -776,6 +783,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (port == last_port) goto done; + /* + * We must not populate both ELSP[] with the + * same LRCA, i.e. we must submit 2 different + * contexts if we submit 2 ELSP. + */ + if (last->hw_context == rq->hw_context) + goto done; + /* * If GVT overrides us we only ever submit * port[0], leaving port[1] empty. Note that we @@ -787,7 +802,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine) ctx_single_port_submission(rq->hw_context)) goto done; - GEM_BUG_ON(last->hw_context == rq->hw_context); if (submit) port_assign(port, last); @@ -826,8 +840,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * request triggering preemption on the next dequeue (or subsequent * interrupt for secondary ports). */ - execlists->queue_priority_hint = - port != execlists->port ? rq_prio(last) : INT_MIN; + execlists->queue_priority_hint = queue_prio(execlists); if (submit) { port_assign(port, last);
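For reference, a toy model of the coalescing rule execlists_dequeue() applies above. The types toy_request/toy_context and the no_merge flag are invented for illustration; the point is why the explicit same-LRCA check becomes necessary once can_merge_rq() is allowed to refuse requests from the same context.

	#include <stdbool.h>
	#include <stddef.h>

	struct toy_context { int lrca; };
	struct toy_request {
		struct toy_context *ctx;
		bool no_merge;	/* future same-context merge refusal */
	};

	static bool can_merge(const struct toy_request *prev,
			      const struct toy_request *next)
	{
		if (prev->ctx != next->ctx)
			return false;	/* different LRCA: needs a new port */
		if (next->no_merge)
			return false;	/* same context, still unmergeable */
		return true;
	}

	/* Fill up to nports ELSP slots from a priority-ordered queue. Only
	 * the last request per port is tracked, as in the real dequeue. */
	static int fill_elsp(struct toy_request **q, int nreq,
			     struct toy_request **port, int nports)
	{
		struct toy_request *last = NULL;
		int used = 0, i;

		for (i = 0; i < nreq; i++) {
			struct toy_request *rq = q[i];

			if (last && !can_merge(last, rq)) {
				if (used + 1 == nports)
					break;	/* out of ports */
				/* Two ELSP entries must never carry the
				 * same LRCA, so stop rather than split a
				 * context across both ports. */
				if (last->ctx == rq->ctx)
					break;
				port[used++] = last;
			}
			last = rq;
		}
		if (last)
			port[used++] = last;
		return used;	/* number of ELSP entries to submit */
	}

With only can_merge_ctx() the same-LRCA case was impossible after a failed merge, hence the old GEM_BUG_ON; once same-context refusals exist, the check turns into live control flow, which is what the hunk above prepares for.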
From patchwork Mon Feb 4 13:22:12 2019 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 4 Feb 2019 13:22:12 +0000 Message-Id: <20190204132214.9459-21-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 20/22] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Having introduced per-context seqno, we now have a means to identify progress across the system without fear of rollback, as befell the global_seqno. That is, we can program an MI_SEMAPHORE_WAIT operation in advance of submission, safe in the knowledge that our target seqno and address are stable. However, since we are telling the GPU to busy-spin on the target address until it matches the signaling seqno, we only want to do so when we are sure that the busy-spin will complete quickly. To achieve this we only submit the request to HW once the signaler is itself executing (modulo preemption causing us to wait longer), and we only do so for default and above priority requests (so that idle priority tasks never themselves hog the GPU waiting for others). v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway. v4: Tell the world and include it as part of scheduler caps. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.c | 2 +- drivers/gpu/drm/i915/i915_request.c | 136 +++++++++++++++++++++- drivers/gpu/drm/i915/i915_request.h | 1 + drivers/gpu/drm/i915/i915_sw_fence.c | 4 +- drivers/gpu/drm/i915/i915_sw_fence.h | 3 + drivers/gpu/drm/i915/intel_engine_cs.c | 1 + drivers/gpu/drm/i915/intel_gpu_commands.h | 5 + drivers/gpu/drm/i915/intel_lrc.c | 1 + drivers/gpu/drm/i915/intel_ringbuffer.h | 7 ++ include/uapi/drm/i915_drm.h | 1 + 10 files changed, 156 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index a7aaa1ac4c99..7e38e2b61a2e 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -349,7 +349,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data, value = min_t(int, INTEL_PPGTT(dev_priv), I915_GEM_PPGTT_FULL); break; case I915_PARAM_HAS_SEMAPHORES: - value = 0; + value = !!(dev_priv->caps.scheduler & I915_SCHEDULER_CAP_SEMAPHORES); break; case I915_PARAM_HAS_SECURE_BATCHES: value = capable(CAP_SYS_ADMIN); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 057bffa56700..116bd9648db7 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -22,8 +22,9 @@ * */ -#include <linux/prefetch.h> #include <linux/dma-fence-array.h> +#include <linux/irq_work.h> +#include <linux/prefetch.h> #include <linux/sched.h> #include <linux/sched/clock.h> #include <linux/sched/signal.h> @@ -32,9 +33,16 @@ #include "i915_active.h" #include "i915_reset.h" +struct execute_cb { + struct list_head link; + struct irq_work work; + struct i915_sw_fence *fence; +}; + static struct i915_global_request { struct kmem_cache *slab_requests; struct kmem_cache *slab_dependencies; + struct kmem_cache *slab_execute_cbs; } global; static const char *i915_fence_get_driver_name(struct dma_fence *fence) @@ -331,6 +339,69 @@ void i915_request_retire_upto(struct i915_request *rq) } while (tmp != rq); } +static void irq_execute_cb(struct irq_work *wrk) +{ + struct execute_cb *cb = container_of(wrk, typeof(*cb), work); + + i915_sw_fence_complete(cb->fence); + kmem_cache_free(global.slab_execute_cbs, cb); +} + +static void
__notify_execute_cb(struct i915_request *rq) +{ + struct execute_cb *cb; + + lockdep_assert_held(&rq->lock); + + if (list_empty(&rq->execute_cb)) + return; + + list_for_each_entry(cb, &rq->execute_cb, link) + irq_work_queue(&cb->work); + + /* + * XXX Rollback on __i915_request_unsubmit() + * + * In the future, perhaps when we have an active time-slicing scheduler, + * it will be interesting to unsubmit parallel execution and remove + * busywaits from the GPU until their master is restarted. This is + * quite hairy, we have to carefully rollback the fence and do a + * preempt-to-idle cycle on the target engine, all the while the + * master execute_cb may refire. + */ + INIT_LIST_HEAD(&rq->execute_cb); +} + +static int +i915_request_await_execution(struct i915_request *rq, + struct i915_request *signal, + gfp_t gfp) +{ + struct execute_cb *cb; + + if (i915_request_is_active(signal)) + return 0; + + cb = kmem_cache_alloc(global.slab_execute_cbs, gfp); + if (!cb) + return -ENOMEM; + + cb->fence = &rq->submit; + i915_sw_fence_await(cb->fence); + init_irq_work(&cb->work, irq_execute_cb); + + spin_lock_irq(&signal->lock); + if (i915_request_is_active(signal)) { + i915_sw_fence_complete(cb->fence); + kmem_cache_free(global.slab_execute_cbs, cb); + } else { + list_add_tail(&cb->link, &signal->execute_cb); + } + spin_unlock_irq(&signal->lock); + + return 0; +} + static void move_to_timeline(struct i915_request *request, struct i915_timeline *timeline) { @@ -389,6 +460,7 @@ void __i915_request_submit(struct i915_request *request) */ BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */ request->sched.attr.priority |= __NO_PREEMPTION; + __notify_execute_cb(request); spin_unlock(&request->lock); @@ -630,6 +702,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) } INIT_LIST_HEAD(&rq->active_list); + INIT_LIST_HEAD(&rq->execute_cb); tl = ce->ring->timeline; ret = i915_timeline_get_seqno(tl, rq, &seqno); @@ -717,6 +790,51 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) return ERR_PTR(ret); } +static int +emit_semaphore_wait(struct i915_request *to, + struct i915_request *from, + gfp_t gfp) +{ + u32 *cs; + int err; + + GEM_BUG_ON(!from->timeline->has_initial_breadcrumb); + GEM_BUG_ON(INTEL_GEN(to->i915) < 8); + + /* We need to pin the signaler's HWSP until we are finished reading. */ + err = i915_timeline_read_lock(from->timeline, to); + if (err) + return err; + + /* Only submit our spinner after the signaler is running! */ + err = i915_request_await_execution(to, from, gfp); + if (err) + return err; + + cs = intel_ring_begin(to, 4); + if (IS_ERR(cs)) + return PTR_ERR(cs); + + /* + * Using greater-than-or-equal here means we have to worry + * about seqno wraparound. To side step that issue, we swap + * the timeline HWSP upon wrapping, so that everyone listening + * for the old (pre-wrap) values do not see the much smaller + * (post-wrap) values than they were expecting (and so wait + * forever). 
+ */ + *cs++ = MI_SEMAPHORE_WAIT | + MI_SEMAPHORE_GLOBAL_GTT | + MI_SEMAPHORE_POLL | + MI_SEMAPHORE_SAD_GTE_SDD; + *cs++ = from->fence.seqno; + *cs++ = from->timeline->hwsp_offset; + *cs++ = 0; + + intel_ring_advance(to, cs); + return 0; +} + static int i915_request_await_request(struct i915_request *to, struct i915_request *from) { @@ -738,6 +856,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from) ret = i915_sw_fence_await_sw_fence_gfp(&to->submit, &from->submit, I915_FENCE_GFP); + } else if (intel_engine_has_semaphores(to->engine) && + to->gem_context->sched.priority >= I915_PRIORITY_NORMAL) { + ret = emit_semaphore_wait(to, from, I915_FENCE_GFP); } else { ret = i915_sw_fence_await_dma_fence(&to->submit, &from->fence, 0, @@ -1212,14 +1333,23 @@ int i915_global_request_init(void) if (!global.slab_requests) return -ENOMEM; + global.slab_execute_cbs = KMEM_CACHE(execute_cb, + SLAB_HWCACHE_ALIGN | + SLAB_RECLAIM_ACCOUNT | + SLAB_TYPESAFE_BY_RCU); + if (!global.slab_execute_cbs) + goto err_requests; + global.slab_dependencies = KMEM_CACHE(i915_dependency, SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT); if (!global.slab_dependencies) - goto err_requests; + goto err_execute_cbs; return 0; +err_execute_cbs: + kmem_cache_destroy(global.slab_execute_cbs); err_requests: kmem_cache_destroy(global.slab_requests); return -ENOMEM; @@ -1228,11 +1358,13 @@ int i915_global_request_init(void) void i915_global_request_shrink(void) { kmem_cache_shrink(global.slab_dependencies); + kmem_cache_shrink(global.slab_execute_cbs); kmem_cache_shrink(global.slab_requests); } void i915_global_request_exit(void) { kmem_cache_destroy(global.slab_dependencies); + kmem_cache_destroy(global.slab_execute_cbs); kmem_cache_destroy(global.slab_requests); } diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 071ff1064579..df52776b26cf 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -128,6 +128,7 @@ struct i915_request { */ struct i915_sw_fence submit; wait_queue_entry_t submitq; + struct list_head execute_cb; /* * A list of everyone we wait upon, and everyone who waits upon us. 
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index 7c58b049ecb5..8d1400d378d7 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -192,7 +192,7 @@ static void __i915_sw_fence_complete(struct i915_sw_fence *fence, __i915_sw_fence_notify(fence, FENCE_FREE); } -static void i915_sw_fence_complete(struct i915_sw_fence *fence) +void i915_sw_fence_complete(struct i915_sw_fence *fence) { debug_fence_assert(fence); @@ -202,7 +202,7 @@ static void i915_sw_fence_complete(struct i915_sw_fence *fence) __i915_sw_fence_complete(fence, NULL); } -static void i915_sw_fence_await(struct i915_sw_fence *fence) +void i915_sw_fence_await(struct i915_sw_fence *fence) { debug_fence_assert(fence); WARN_ON(atomic_inc_return(&fence->pending) <= 1); diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h index 0e055ea0179f..6dec9e1d1102 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.h +++ b/drivers/gpu/drm/i915/i915_sw_fence.h @@ -79,6 +79,9 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence, unsigned long timeout, gfp_t gfp); +void i915_sw_fence_await(struct i915_sw_fence *fence); +void i915_sw_fence_complete(struct i915_sw_fence *fence); + static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence) { return atomic_read(&fence->pending) <= 0; diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index 0dbd6d7c1693..30a308ebbc89 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -621,6 +621,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915) u32 sched_cap; } map[] = { { I915_ENGINE_HAS_PREEMPTION, I915_SCHEDULER_CAP_PREEMPTION }, + { I915_ENGINE_HAS_SEMAPHORES, I915_SCHEDULER_CAP_SEMAPHORES }, { I915_ENGINE_SUPPORTS_STATS, I915_SCHEDULER_CAP_PMU }, }; struct intel_engine_cs *engine; diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h index b96a31bc1080..0efaadd3bc32 100644 --- a/drivers/gpu/drm/i915/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/intel_gpu_commands.h @@ -106,7 +106,12 @@ #define MI_SEMAPHORE_TARGET(engine) ((engine)<<15) #define MI_SEMAPHORE_WAIT MI_INSTR(0x1c, 2) /* GEN8+ */ #define MI_SEMAPHORE_POLL (1<<15) +#define MI_SEMAPHORE_SAD_GT_SDD (0<<12) #define MI_SEMAPHORE_SAD_GTE_SDD (1<<12) +#define MI_SEMAPHORE_SAD_LT_SDD (2<<12) +#define MI_SEMAPHORE_SAD_LTE_SDD (3<<12) +#define MI_SEMAPHORE_SAD_EQ_SDD (4<<12) +#define MI_SEMAPHORE_SAD_NEQ_SDD (5<<12) #define MI_STORE_DWORD_IMM MI_INSTR(0x20, 1) #define MI_STORE_DWORD_IMM_GEN4 MI_INSTR(0x20, 2) #define MI_MEM_VIRTUAL (1 << 22) /* 945,g33,965 */ diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 66d465708bc6..ea813f88fbb3 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -2307,6 +2307,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine) engine->park = NULL; engine->unpark = NULL; + engine->flags |= I915_ENGINE_HAS_SEMAPHORES; engine->flags |= I915_ENGINE_SUPPORTS_STATS; if (engine->i915->preempt_context) engine->flags |= I915_ENGINE_HAS_PREEMPTION; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 5dffccb6740e..c9cd60444987 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -496,6 +496,7 @@ struct intel_engine_cs { #define 
I915_ENGINE_NEEDS_CMD_PARSER BIT(0) #define I915_ENGINE_SUPPORTS_STATS BIT(1) #define I915_ENGINE_HAS_PREEMPTION BIT(2) +#define I915_ENGINE_HAS_SEMAPHORES BIT(3) unsigned int flags; /* @@ -573,6 +574,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine) return engine->flags & I915_ENGINE_HAS_PREEMPTION; } +static inline bool +intel_engine_has_semaphores(const struct intel_engine_cs *engine) +{ + return engine->flags & I915_ENGINE_HAS_SEMAPHORES; +} + void intel_engines_set_scheduler_caps(struct drm_i915_private *i915); static inline bool __execlists_need_preempt(int prio, int last) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index d8ac7f105734..4b8a07d774b4 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -477,6 +477,7 @@ typedef struct drm_i915_irq_wait { #define I915_SCHEDULER_CAP_PRIORITY (1ul << 1) #define I915_SCHEDULER_CAP_PREEMPTION (1ul << 2) #define I915_SCHEDULER_CAP_PMU (1ul << 3) +#define I915_SCHEDULER_CAP_SEMAPHORES (1ul << 4) #define I915_PARAM_HUC_STATUS 42
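With the parameter now reporting real support, userspace can probe for semaphores before relying on them. A minimal sketch assuming libdrm and an already-open i915 fd; i915_has_semaphores() is a made-up helper name, while the ioctl, type and param come from the patch above.

	#include <stdbool.h>
	#include <string.h>
	#include <xf86drm.h>
	#include <drm/i915_drm.h>

	static bool i915_has_semaphores(int fd)
	{
		int value = 0;
		drm_i915_getparam_t gp;

		memset(&gp, 0, sizeof(gp));
		gp.param = I915_PARAM_HAS_SEMAPHORES;
		gp.value = &value;

		/* Older kernels reject the param; treat any error as "no". */
		if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
			return false;

		return value != 0;
	}

The same information is exported as the I915_SCHEDULER_CAP_SEMAPHORES bit in the scheduler caps, so consumers that already query the scheduler capabilities need not issue a second getparam.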
From patchwork Mon Feb 4 13:22:13 2019 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 4 Feb 2019 13:22:13 +0000 Message-Id: <20190204132214.9459-22-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 21/22] drm/i915: Prioritise non-busywait semaphore workloads We don't want to busywait on the GPU if we have other work to do. If we give non-busywaiting workloads higher (initial) priority than workloads that require a busywait, we will prioritise work that is ready to run immediately. We then also have to be careful that we don't give earlier semaphores an accidental boost because later work doesn't wait on other rings, hence we keep a history of semaphore usage along the dependency chain. Testcase: igt/gem_exec_schedule/semaphore Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_request.c | 16 ++++++++++++++++ drivers/gpu/drm/i915/i915_scheduler.c | 10 ++++++++++ drivers/gpu/drm/i915/i915_scheduler.h | 9 ++++++--- drivers/gpu/drm/i915/intel_lrc.c | 2 +- 4 files changed, 33 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 116bd9648db7..91dcaf07958f 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -832,6 +832,7 @@ emit_semaphore_wait(struct i915_request *to, *cs++ = 0; intel_ring_advance(to, cs); + to->sched.semaphore |= I915_SCHED_HAS_SEMAPHORE; return 0; } @@ -1102,6 +1103,21 @@ void i915_request_add(struct i915_request *request) if (engine->schedule) { struct i915_sched_attr attr = request->gem_context->sched; + /* + * Boost actual workloads past semaphores! + * + * With semaphores we spin on one engine waiting for another, + * simply to reduce the latency of starting our work when + * the signaler completes. However, if there is any other + * work that we could be doing on this engine instead, that + * is better utilisation and will reduce the overall duration + * of the current work. To avoid PI boosting a semaphore + * far in the distant past over useful work, we keep a history + * of any semaphore use along our dependency chain. + */ + if (!request->sched.semaphore) + attr.priority |= I915_PRIORITY_NOSEMAPHORE; + /* * Boost priorities to new clients (new request flows).
* diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 7c1d9ef98374..9675ead24f75 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -28,12 +28,18 @@ static inline bool node_signaled(const struct i915_sched_node *node) return i915_request_completed(node_to_request(node)); } +static inline bool node_started(const struct i915_sched_node *node) +{ + return i915_request_started(node_to_request(node)); +} + void i915_sched_node_init(struct i915_sched_node *node) { INIT_LIST_HEAD(&node->signalers_list); INIT_LIST_HEAD(&node->waiters_list); INIT_LIST_HEAD(&node->link); node->attr.priority = I915_PRIORITY_INVALID; + node->semaphore = 0; } static struct i915_dependency * @@ -64,6 +70,10 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node, dep->signaler = signal; dep->flags = flags; + /* Keep track of whether anyone on this chain has a semaphore */ + if (signal->semaphore && !node_started(signal)) + node->semaphore |= signal->semaphore << 1; + ret = true; } diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 5196ce07b6c2..24c2c027fd2c 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -24,14 +24,15 @@ enum { I915_PRIORITY_INVALID = INT_MIN }; -#define I915_USER_PRIORITY_SHIFT 2 +#define I915_USER_PRIORITY_SHIFT 3 #define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT) #define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT) #define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1) -#define I915_PRIORITY_WAIT ((u8)BIT(0)) -#define I915_PRIORITY_NEWCLIENT ((u8)BIT(1)) +#define I915_PRIORITY_WAIT ((u8)BIT(0)) +#define I915_PRIORITY_NEWCLIENT ((u8)BIT(1)) +#define I915_PRIORITY_NOSEMAPHORE ((u8)BIT(2)) #define __NO_PREEMPTION (I915_PRIORITY_WAIT) @@ -74,6 +75,8 @@ struct i915_sched_node { struct list_head waiters_list; /* those after us, they depend upon us */ struct list_head link; struct i915_sched_attr attr; + unsigned long semaphore; +#define I915_SCHED_HAS_SEMAPHORE BIT(0) }; struct i915_dependency { diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index ea813f88fbb3..ae90ce034252 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -164,7 +164,7 @@ #define WA_TAIL_DWORDS 2 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS) -#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT) +#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE) static int execlists_context_deferred_alloc(struct i915_gem_context *ctx, struct intel_engine_cs *engine,
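The shift in __i915_sched_node_add_dependency() makes sched.semaphore a positional history: bit n set means a busywait sits n hops up the dependency chain. A standalone sketch of just that propagation, with toy types invented for illustration:

	#include <stdbool.h>
	#include <stdio.h>

	#define HAS_SEMAPHORE 0x1ul	/* mirrors I915_SCHED_HAS_SEMAPHORE */

	struct toy_node {
		unsigned long semaphore;	/* bit n: busywait n hops upstream */
		bool started;
	};

	/* Mirrors the hunk above: inherit the signaler's history, pushed one
	 * hop further away, unless the signaler is already executing. */
	static void add_dependency(struct toy_node *waiter,
				   const struct toy_node *signaler)
	{
		if (signaler->semaphore && !signaler->started)
			waiter->semaphore |= signaler->semaphore << 1;
	}

	int main(void)
	{
		struct toy_node a = { HAS_SEMAPHORE, false };	/* emits a busywait */
		struct toy_node b = { 0, false };
		struct toy_node c = { 0, false };

		add_dependency(&b, &a);	/* b inherits 0x2 */
		add_dependency(&c, &b);	/* c inherits 0x4 */

		/* Only a request with an empty history gets the
		 * I915_PRIORITY_NOSEMAPHORE boost in i915_request_add(). */
		printf("a:%lx b:%lx c:%lx\n", a.semaphore, b.semaphore, c.semaphore);
		return 0;
	}

Because the history is inherited only from signalers that have not yet started, a long-retired semaphore far upstream stops tainting new work, which is exactly the "accidental boost" the commit message is guarding against.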
From patchwork Mon Feb 4 13:22:14 2019 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 4 Feb 2019 13:22:14 +0000 Message-Id: <20190204132214.9459-23-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 22/22] semaphore-no-stats SW PMU reports semaphore time as busy, HW PMU reports semaphore time as idle. Who is correct? --- drivers/gpu/drm/i915/intel_lrc.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index ae90ce034252..d00b268ed6ee 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -2308,7 +2308,6 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine) engine->unpark = NULL; engine->flags |= I915_ENGINE_HAS_SEMAPHORES; - engine->flags |= I915_ENGINE_SUPPORTS_STATS; if (engine->i915->preempt_context) engine->flags |= I915_ENGINE_HAS_PREEMPTION; }
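The disagreement is mechanical rather than philosophical. The SW busy-stats are derived from ELSP occupancy, roughly as in the simplified sketch below (toy types; cf. the intel_engine_context_in/out accounting in intel_ringbuffer.h), so a context busy-spinning in MI_SEMAPHORE_WAIT still counts as busy there, while HW counters that watch the execution units see the spin as idle.

	#include <stdint.h>

	struct toy_engine_stats {
		unsigned int active;	/* requests resident in the ELSP ports */
		uint64_t start_ns;	/* when the engine last became non-idle */
		uint64_t total_ns;	/* accumulated "busy" time */
	};

	/* Called as contexts enter/leave the ELSP. */
	static void context_in(struct toy_engine_stats *s, uint64_t now_ns)
	{
		if (s->active++ == 0)
			s->start_ns = now_ns;
	}

	static void context_out(struct toy_engine_stats *s, uint64_t now_ns)
	{
		if (--s->active == 0)
			s->total_ns += now_ns - s->start_ns;
	}

	/*
	 * A request parked in MI_SEMAPHORE_WAIT never leaves the ELSP, so
	 * active stays non-zero and the whole busywait is accounted as busy
	 * time here, while the HW PMU reports the same interval as idle.
	 * Dropping I915_ENGINE_SUPPORTS_STATS avoids exporting two
	 * contradictory answers for the same engine.
	 */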