From patchwork Tue Feb 14 11:44:12 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 9571789
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Tue, 14 Feb 2017 11:44:12 +0000
Message-Id: <20170214114412.8841-2-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170214114412.8841-1-chris@chris-wilson.co.uk>
References: <20170214114412.8841-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 2/2] drm/i915/scheduler: emulate a scheduler for guc
List-Id: Intel graphics driver community testing & development

This emulates execlists on top of the GuC in order to defer submission of
requests to the hardware. This deferral allows time for high priority
requests to gazump their way to the head of the queue, however it nerfs
the GuC by converting it back into a simple execlist (where the CPU has
to wake up after every request to feed new commands into the GuC).

v2: Drop hack status - though iirc there is still a lockdep inversion
between fence and engine->timeline->lock (which is impossible as the
nesting only occurs on different fences - hopefully just requires some
judicious lockdep annotation)
v3: Apply lockdep nesting to enabling signaling on the request, using
the pattern we already have in __i915_gem_request_submit();
v4: Replaying requests after a hang also now needs the timeline
spinlock, to disable the interrupts at least
v5: Hold wq lock for completeness, and emit a tracepoint for enabling signal
v6: Reorder interrupt checking for a happier gcc.
v7: Only signal the tasklet after a user-interrupt if using guc scheduling
v8: Restore lost update of rq through the i915_guc_irq_handler (Tvrtko)
v9: Avoid re-initialising the engine->irq_tasklet from inside a reset

Signed-off-by: Chris Wilson
Cc: Tvrtko Ursulin
Reviewed-by: Tvrtko Ursulin
Acked-by: Joonas Lahtinen
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 113 +++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_irq.c            |  13 +++-
 drivers/gpu/drm/i915/intel_lrc.c           |   5 +-
 3 files changed, 117 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 8ced9e26f075..5860691aaa48 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -25,6 +25,8 @@
 #include "i915_drv.h"
 #include "intel_uc.h"
 
+#include
+
 /**
  * DOC: GuC-based command submission
  *
@@ -348,7 +350,7 @@ int i915_guc_wq_reserve(struct drm_i915_gem_request *request)
 	u32 freespace;
 	int ret;
 
-	spin_lock(&client->wq_lock);
+	spin_lock_irq(&client->wq_lock);
 	freespace = CIRC_SPACE(client->wq_tail, desc->head, client->wq_size);
 	freespace -= client->wq_rsvd;
 	if (likely(freespace >= wqi_size)) {
@@ -358,7 +360,7 @@ int i915_guc_wq_reserve(struct drm_i915_gem_request *request)
 		client->no_wq_space++;
 		ret = -EAGAIN;
 	}
-	spin_unlock(&client->wq_lock);
+	spin_unlock_irq(&client->wq_lock);
 
 	return ret;
 }
@@ -370,9 +372,9 @@ void i915_guc_wq_unreserve(struct drm_i915_gem_request *request)
 
 	GEM_BUG_ON(READ_ONCE(client->wq_rsvd) < wqi_size);
 
-	spin_lock(&client->wq_lock);
+	spin_lock_irq(&client->wq_lock);
 	client->wq_rsvd -= wqi_size;
-	spin_unlock(&client->wq_lock);
+	spin_unlock_irq(&client->wq_lock);
 }
 
 /* Construct a Work Item and append it to the GuC's Work Queue */
@@ -532,10 +534,97 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq)
 
 static void i915_guc_submit(struct drm_i915_gem_request *rq)
 {
-	i915_gem_request_submit(rq);
+	__i915_gem_request_submit(rq);
 	__i915_guc_submit(rq);
 }
 
+static void nested_enable_signaling(struct drm_i915_gem_request *rq)
+{
+	/* If we use dma_fence_enable_sw_signaling() directly, lockdep
+	 * detects an ordering issue between the fence lockclass and the
+	 * global_timeline. This circular dependency can only occur via 2
+	 * different fences (but same fence lockclass), so we use the nesting
+	 * annotation here to prevent the warn, equivalent to the nesting
+	 * inside i915_gem_request_submit() for when we also enable the
+	 * signaler.
+	 */
+
+	if (test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+			     &rq->fence.flags))
+		return;
+
+	GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags));
+	trace_dma_fence_enable_signal(&rq->fence);
+
+	spin_lock_nested(&rq->lock, SINGLE_DEPTH_NESTING);
+	intel_engine_enable_signaling(rq);
+	spin_unlock(&rq->lock);
+}
+
+static bool i915_guc_dequeue(struct intel_engine_cs *engine)
+{
+	struct execlist_port *port = engine->execlist_port;
+	struct drm_i915_gem_request *last = port[0].request;
+	unsigned long flags;
+	struct rb_node *rb;
+	bool submit = false;
+
+	spin_lock_irqsave(&engine->timeline->lock, flags);
+	rb = engine->execlist_first;
+	while (rb) {
+		struct drm_i915_gem_request *cursor =
+			rb_entry(rb, typeof(*cursor), priotree.node);
+
+		if (last && cursor->ctx != last->ctx) {
+			if (port != engine->execlist_port)
+				break;
+
+			i915_gem_request_assign(&port->request, last);
+			nested_enable_signaling(last);
+			port++;
+		}
+
+		rb = rb_next(rb);
+		rb_erase(&cursor->priotree.node, &engine->execlist_queue);
+		RB_CLEAR_NODE(&cursor->priotree.node);
+		cursor->priotree.priority = INT_MAX;
+
+		i915_guc_submit(cursor);
+		last = cursor;
+		submit = true;
+	}
+	if (submit) {
+		i915_gem_request_assign(&port->request, last);
+		nested_enable_signaling(last);
+		engine->execlist_first = rb;
+	}
+	spin_unlock_irqrestore(&engine->timeline->lock, flags);
+
+	return submit;
+}
+
+static void i915_guc_irq_handler(unsigned long data)
+{
+	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
+	struct execlist_port *port = engine->execlist_port;
+	struct drm_i915_gem_request *rq;
+	bool submit;
+
+	do {
+		rq = port[0].request;
+		while (rq && i915_gem_request_completed(rq)) {
+			i915_gem_request_put(rq);
+			port[0].request = port[1].request;
+			port[1].request = NULL;
+			rq = port[0].request;
+		}
+
+		submit = false;
+		if (!port[1].request)
+			submit = i915_guc_dequeue(engine);
+	} while (submit);
+}
+
 /*
  * Everything below here is concerned with setup & teardown, and is
  * therefore not part of the somewhat time-critical batch-submission
@@ -944,15 +1033,25 @@ int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
 	/* Take over from manual control of ELSP (execlists) */
 	for_each_engine(engine, dev_priv, id) {
 		struct drm_i915_gem_request *rq;
+		unsigned long flags;
 
-		engine->submit_request = i915_guc_submit;
-		engine->schedule = NULL;
+		/* The tasklet was initialised by execlists, and may be in
+		 * a state of flux (across a reset) and so we just want to
+		 * take over the callback without changing any other state
+		 * in the tasklet.
+		 */
+		engine->irq_tasklet.func = i915_guc_irq_handler;
 
 		/* Replay the current set of previously submitted requests */
+		spin_lock_irqsave(&engine->timeline->lock, flags);
 		list_for_each_entry(rq, &engine->timeline->requests, link) {
+			spin_lock(&client->wq_lock);
 			client->wq_rsvd += sizeof(struct guc_wq_item);
+			spin_unlock(&client->wq_lock);
+
 			__i915_guc_submit(rq);
 		}
+		spin_unlock_irqrestore(&engine->timeline->lock, flags);
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index cdc7da60d37a..aa886b5fb2cd 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1350,13 +1350,20 @@ static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 static __always_inline void
 gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir, int test_shift)
 {
-	if (iir & (GT_RENDER_USER_INTERRUPT << test_shift))
-		notify_ring(engine);
+	bool tasklet = false;
 
 	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift)) {
 		set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-		tasklet_hi_schedule(&engine->irq_tasklet);
+		tasklet = true;
 	}
+
+	if (iir & (GT_RENDER_USER_INTERRUPT << test_shift)) {
+		notify_ring(engine);
+		tasklet |= i915.enable_guc_submission;
+	}
+
+	if (tasklet)
+		tasklet_hi_schedule(&engine->irq_tasklet);
 }
 
 static irqreturn_t gen8_gt_irq_ack(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index af717e1356a9..09e57755c325 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1300,7 +1300,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
 
 	/* After a GPU reset, we may have requests to replay */
 	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-	if (!execlists_elsp_idle(engine)) {
+	if (!i915.enable_guc_submission && !execlists_elsp_idle(engine)) {
 		DRM_DEBUG_DRIVER("Restarting %s from requests [0x%x, 0x%x]\n",
 				 engine->name,
 				 port_seqno(&engine->execlist_port[0]),
@@ -1385,9 +1385,6 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 	request->ring->last_retired_head = -1;
 	intel_ring_update_space(request->ring);
 
-	if (i915.enable_guc_submission)
-		return;
-
 	/* Catch up with any missed context-switch interrupts */
 	if (request->ctx != port[0].request->ctx) {
 		i915_gem_request_put(port[0].request);