From patchwork Sun May 6 22:47:28 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10383049 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id B6F5260467 for ; Sun, 6 May 2018 22:47:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9EC9D28A15 for ; Sun, 6 May 2018 22:47:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9365728A48; Sun, 6 May 2018 22:47:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2149428A15 for ; Sun, 6 May 2018 22:47:47 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5F7DF6E1FB; Sun, 6 May 2018 22:47:44 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6AD5F6E11E for ; Sun, 6 May 2018 22:47:42 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 11616630-1500050 for multiple; Sun, 06 May 2018 23:47:34 +0100 Received: by haswell.alporthouse.com (sSMTP sendmail emulation); Sun, 06 May 2018 23:47:34 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Sun, 6 May 2018 23:47:28 +0100 Message-Id: <20180506224728.25450-3-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.17.0 In-Reply-To: <20180506224728.25450-1-chris@chris-wilson.co.uk> References: <20180506224728.25450-1-chris@chris-wilson.co.uk> X-Originating-IP: 78.156.65.138 X-Country: code=GB country="United Kingdom" ip=78.156.65.138 Subject: [Intel-gfx] [PATCH 3/3] drm/i915/execlists: Direct submit onto idle engines X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Bypass using the tasklet to submit the first request to HW, as the tasklet may be deferred unto ksoftirqd and at a minimum will add in excess of 10us (and maybe tens of milliseconds) to our execution latency. This latency reduction is most notable when execution flows between engines. Suggested-by: Tvrtko Ursulin Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/intel_guc_submission.c | 1 + drivers/gpu/drm/i915/intel_lrc.c | 64 +++++++++++++++++---- drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++ 3 files changed, 60 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c index 62828e39ee26..d899a2e6fa7d 100644 --- a/drivers/gpu/drm/i915/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/intel_guc_submission.c @@ -1255,6 +1255,7 @@ int intel_guc_submission_enable(struct intel_guc *guc) engine->unpark = guc_submission_unpark; engine->flags &= ~I915_ENGINE_SUPPORTS_STATS; + engine->flags |= I915_ENGINE_HAS_GUC; } return 0; diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index f9f4064dec0e..8a1ed31c4fc3 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -554,7 +554,7 @@ static void inject_preempt_context(struct intel_engine_cs *engine) execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT); } -static void execlists_dequeue(struct intel_engine_cs *engine) +static bool __execlists_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; struct execlist_port *port = execlists->port; @@ -564,6 +564,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine) struct rb_node *rb; bool submit = false; + lockdep_assert_held(&engine->timeline.lock); + /* Hardware submission is through 2 ports. Conceptually each port * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is * static for a context, and unique to each, so we only execute @@ -585,7 +587,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * and context switches) submission. */ - spin_lock_irq(&engine->timeline.lock); rb = execlists->first; GEM_BUG_ON(rb_first(&execlists->queue) != rb); @@ -598,6 +599,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine) */ GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_USER)); + GEM_BUG_ON(execlists_is_active(execlists, + EXECLISTS_ACTIVE_PREEMPT)); GEM_BUG_ON(!port_count(&port[0])); if (port_count(&port[0]) > 1) goto unlock; @@ -745,12 +748,23 @@ static void execlists_dequeue(struct intel_engine_cs *engine) GEM_BUG_ON(execlists->first && !port_isset(execlists->port)); unlock: + if (last) + execlists_user_begin(execlists, execlists->port); + + return submit; +} + +static void execlists_dequeue(struct intel_engine_cs *engine) +{ + struct intel_engine_execlists * const execlists = &engine->execlists; + bool submit; + + spin_lock_irq(&engine->timeline.lock); + submit = __execlists_dequeue(engine); spin_unlock_irq(&engine->timeline.lock); - if (submit) { - execlists_user_begin(execlists, execlists->port); + if (submit) execlists_submit_ports(engine); - } GEM_BUG_ON(port_isset(execlists->port) && !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER)); @@ -1147,16 +1161,41 @@ static void queue_request(struct intel_engine_cs *engine, &lookup_priolist(engine, node, prio)->requests); } -static void __submit_queue(struct intel_engine_cs *engine, int prio) +static void __wakeup_queue(struct intel_engine_cs *engine, int prio) { engine->execlists.queue_priority = prio; +} + +static void __schedule_queue(struct intel_engine_cs *engine) +{ tasklet_hi_schedule(&engine->execlists.tasklet); } +static void __submit_queue(struct intel_engine_cs *engine) +{ + struct intel_engine_execlists * const execlists = &engine->execlists; + + GEM_BUG_ON(!engine->i915->gt.awake); + + /* Directly submit the first request to reduce the initial latency */ + if (!intel_engine_has_guc(engine) && + !port_isset(execlists->port) && + tasklet_trylock(&execlists->tasklet)) { + if (__execlists_dequeue(engine)) + execlists_submit_ports(engine); + tasklet_unlock(&execlists->tasklet); + return; + } + + __schedule_queue(engine); +} + static void submit_queue(struct intel_engine_cs *engine, int prio) { - if (prio > engine->execlists.queue_priority) - __submit_queue(engine, prio); + if (prio > engine->execlists.queue_priority) { + __wakeup_queue(engine, prio); + __submit_queue(engine); + } } static void execlists_submit_request(struct i915_request *request) @@ -1168,10 +1207,9 @@ static void execlists_submit_request(struct i915_request *request) spin_lock_irqsave(&engine->timeline.lock, flags); queue_request(engine, &request->sched, rq_prio(request)); - submit_queue(engine, rq_prio(request)); - GEM_BUG_ON(!engine->execlists.first); GEM_BUG_ON(list_empty(&request->sched.link)); + submit_queue(engine, rq_prio(request)); spin_unlock_irqrestore(&engine->timeline.lock, flags); } @@ -1293,8 +1331,10 @@ static void execlists_schedule(struct i915_request *request, } if (prio > engine->execlists.queue_priority && - i915_sw_fence_done(&sched_to_request(node)->submit)) - __submit_queue(engine, prio); + i915_sw_fence_done(&sched_to_request(node)->submit)) { + __wakeup_queue(engine, prio); + __schedule_queue(engine); + } } spin_unlock_irq(&engine->timeline.lock); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 010750e8ee44..3d13835d4a87 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -569,6 +569,7 @@ struct intel_engine_cs { #define I915_ENGINE_NEEDS_CMD_PARSER BIT(0) #define I915_ENGINE_SUPPORTS_STATS BIT(1) #define I915_ENGINE_HAS_PREEMPTION BIT(2) +#define I915_ENGINE_HAS_GUC BIT(3) unsigned int flags; /* @@ -646,6 +647,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine) return engine->flags & I915_ENGINE_HAS_PREEMPTION; } +static inline bool +intel_engine_has_guc(const struct intel_engine_cs *engine) +{ + return engine->flags & I915_ENGINE_HAS_GUC; +} + static inline bool __execlists_need_preempt(int prio, int last) { return prio > max(0, last);