From patchwork Thu Mar 3 10:39:32 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tvrtko Ursulin
X-Patchwork-Id: 8491051
From: Tvrtko Ursulin
To: Intel-gfx@lists.freedesktop.org
Date: Thu, 3 Mar 2016 10:39:32 +0000
Message-Id: <1457001572-31051-1-git-send-email-tvrtko.ursulin@linux.intel.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1456838737-6669-1-git-send-email-tvrtko.ursulin@linux.intel.com>
References: <1456838737-6669-1-git-send-email-tvrtko.ursulin@linux.intel.com>
Subject: [Intel-gfx] [PATCH v2] drm/i915: Move CSB MMIO reads
 out of the execlists lock
List-Id: Intel graphics driver community testing & development

From: Tvrtko Ursulin

By reading the CSB (slow MMIO accesses) into a temporary local
buffer we can decrease the duration of holding the execlist
lock.

Main advantage is that during heavy batch buffer submission we
reduce the execlist lock contention, which should decrease the
latency and CPU usage between the submitting userspace process
and interrupt handling.

Downside is that we need to grab and release the forcewake twice,
but as the below numbers will show this is completely hidden
by the primary gains.

Testing with "gem_latency -n 100" (submit batch buffers with a
hundred nops each) shows more than doubling of the throughput
and more than halving of the dispatch latency, overall latency
and CPU time spent in the submitting process.

Submitting empty batches ("gem_latency -n 0") does not seem
significantly affected by this change, with throughput and CPU
time improving by half a percent, and overall latency worsening
by the same amount.

Above tests were done in a hundred runs on a big core Broadwell.

v2:
  * Overflow protection to local CSB buffer.
  * Use closer dev_priv in execlists_submit_requests.
    (Chris Wilson)

Signed-off-by: Tvrtko Ursulin
Cc: Chris Wilson
---
 drivers/gpu/drm/i915/intel_lrc.c | 75 ++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 27c9ee3f7372..4f19ca7490c4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -416,15 +416,23 @@ static void execlists_update_context(struct drm_i915_gem_request *rq)
 static void execlists_submit_requests(struct drm_i915_gem_request *rq0,
 				      struct drm_i915_gem_request *rq1)
 {
+	struct drm_i915_private *dev_priv = rq0->i915;
+
 	execlists_update_context(rq0);
 
 	if (rq1)
 		execlists_update_context(rq1);
 
+	spin_lock(&dev_priv->uncore.lock);
+	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
+
 	execlists_elsp_write(rq0, rq1);
+
+	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
+	spin_unlock(&dev_priv->uncore.lock);
 }
 
-static void execlists_context_unqueue__locked(struct intel_engine_cs *ring)
+static void execlists_context_unqueue(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
 	struct drm_i915_gem_request *cursor, *tmp;
@@ -478,19 +486,6 @@ static void execlists_context_unqueue__locked(struct intel_engine_cs *ring)
 	execlists_submit_requests(req0, req1);
 }
 
-static void execlists_context_unqueue(struct intel_engine_cs *ring)
-{
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-
-	spin_lock(&dev_priv->uncore.lock);
-	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
-
-	execlists_context_unqueue__locked(ring);
-
-	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
-	spin_unlock(&dev_priv->uncore.lock);
-}
-
 static unsigned int
 execlists_check_remove_request(struct intel_engine_cs *ring, u32 request_id)
 {
@@ -551,12 +546,10 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	u32 status_pointer;
 	unsigned int read_pointer, write_pointer;
-	u32 status = 0;
-	u32 status_id;
+	u32 csb[GEN8_CSB_ENTRIES][2];
+	unsigned int csb_read = 0, i;
 	unsigned int submit_contexts = 0;
 
-	spin_lock(&ring->execlist_lock);
-
 	spin_lock(&dev_priv->uncore.lock);
 	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
 
@@ -568,39 +561,47 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 		write_pointer += GEN8_CSB_ENTRIES;
 
 	while (read_pointer < write_pointer) {
-		status = get_context_status(ring, ++read_pointer, &status_id);
+		if (WARN_ON_ONCE(csb_read == GEN8_CSB_ENTRIES))
+			break;
+		csb[csb_read][0] = get_context_status(ring, ++read_pointer,
+						      &csb[csb_read][1]);
+		csb_read++;
+	}
 
-		if (unlikely(status & GEN8_CTX_STATUS_PREEMPTED)) {
-			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
-				if (execlists_check_remove_request(ring, status_id))
+	ring->next_context_status_buffer = write_pointer % GEN8_CSB_ENTRIES;
+
+	/* Update the read pointer to the old write pointer. Manual ringbuffer
+	 * management ftw */
+	I915_WRITE_FW(RING_CONTEXT_STATUS_PTR(ring),
+		      _MASKED_FIELD(GEN8_CSB_READ_PTR_MASK,
+				    ring->next_context_status_buffer << 8));
+
+	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
+	spin_unlock(&dev_priv->uncore.lock);
+
+	spin_lock(&ring->execlist_lock);
+
+	for (i = 0; i < csb_read; i++) {
+		if (unlikely(csb[i][0] & GEN8_CTX_STATUS_PREEMPTED)) {
+			if (csb[i][0] & GEN8_CTX_STATUS_LITE_RESTORE) {
+				if (execlists_check_remove_request(ring, csb[i][1]))
 					WARN(1, "Lite Restored request removed from queue\n");
 			} else
 				WARN(1, "Preemption without Lite Restore\n");
 		}
 
-		if (status & (GEN8_CTX_STATUS_ACTIVE_IDLE |
+		if (csb[i][0] & (GEN8_CTX_STATUS_ACTIVE_IDLE |
 		    GEN8_CTX_STATUS_ELEMENT_SWITCH))
 			submit_contexts +=
-				execlists_check_remove_request(ring, status_id);
+				execlists_check_remove_request(ring, csb[i][1]);
 	}
 
 	if (submit_contexts) {
 		if (!ring->disable_lite_restore_wa ||
-		    (status & GEN8_CTX_STATUS_ACTIVE_IDLE))
-			execlists_context_unqueue__locked(ring);
+		    (csb[csb_read - 1][0] & GEN8_CTX_STATUS_ACTIVE_IDLE))
+			execlists_context_unqueue(ring);
 	}
 
-	ring->next_context_status_buffer = write_pointer % GEN8_CSB_ENTRIES;
-
-	/* Update the read pointer to the old write pointer. Manual ringbuffer
-	 * management ftw */
-	I915_WRITE_FW(RING_CONTEXT_STATUS_PTR(ring),
-		      _MASKED_FIELD(GEN8_CSB_READ_PTR_MASK,
-				    ring->next_context_status_buffer << 8));
-
-	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
-	spin_unlock(&dev_priv->uncore.lock);
-
 	spin_unlock(&ring->execlist_lock);
 
 	if (unlikely(submit_contexts > 2))
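
The two-phase pattern the commit message describes (copy the slow-to-read
status entries into a local buffer first, then take the queue lock only to act
on the copy) can be illustrated outside the kernel. The following is a minimal
userspace sketch, not the driver code: fake_engine, slow_read, csb_snapshot,
csb_process, CSB_ENTRIES and STATUS_COMPLETE are all hypothetical names
invented for this illustration, the "lock" is a plain flag rather than a real
spinlock, and the "MMIO" is an ordinary array. It only aims to show the pointer
unwrapping, the overflow guard on the local copy, and that no slow read happens
while the queue lock is held.

```c
#include <assert.h>
#include <stdint.h>

#define CSB_ENTRIES 6			/* hypothetical ring size */
#define STATUS_COMPLETE (1u << 0)	/* hypothetical "context complete" bit */

/* Hypothetical stand-in for an engine; nothing here is i915 API. */
struct fake_engine {
	uint32_t hw_csb[CSB_ENTRIES][2];	/* [0] = status, [1] = context id */
	unsigned int read_pointer;		/* last entry already consumed */
	unsigned int write_pointer;		/* last entry the "hardware" wrote */
	int queue_locked;			/* plays the role of execlist_lock */
	unsigned int slow_reads_under_lock;	/* slow reads seen with lock held */
	unsigned int completed;			/* state guarded by the queue lock */
};

/* Simulated slow register read; records whether the queue lock was held. */
static uint32_t slow_read(struct fake_engine *e, unsigned int idx, int word)
{
	if (e->queue_locked)
		e->slow_reads_under_lock++;
	return e->hw_csb[idx % CSB_ENTRIES][word];
}

/* Phase 1: copy pending entries into a local buffer, queue lock not held. */
static unsigned int csb_snapshot(struct fake_engine *e,
				 uint32_t csb[CSB_ENTRIES][2])
{
	unsigned int rp = e->read_pointer, wp = e->write_pointer, n = 0;

	if (rp > wp)
		wp += CSB_ENTRIES;	/* unwrap so rp < wp also covers the wrap */

	while (rp < wp) {
		if (n == CSB_ENTRIES)	/* overflow protection for the local copy */
			break;
		++rp;
		csb[n][0] = slow_read(e, rp, 0);
		csb[n][1] = slow_read(e, rp, 1);
		n++;
	}

	e->read_pointer = wp % CSB_ENTRIES;
	return n;
}

/* Phase 2: take the queue lock only to act on the already-copied entries. */
static unsigned int csb_process(struct fake_engine *e,
				uint32_t csb[CSB_ENTRIES][2], unsigned int n)
{
	unsigned int i, done = 0;

	assert(!e->queue_locked);	/* the flag "lock" must not be nested */
	e->queue_locked = 1;
	for (i = 0; i < n; i++)
		if (csb[i][0] & STATUS_COMPLETE)
			done++;
	e->completed += done;
	e->queue_locked = 0;

	return done;
}
```

A caller would run csb_snapshot() and then pass the returned count to
csb_process(). In this sketch slow_reads_under_lock staying at zero is the
property the patch is after: the expensive reads no longer extend the time the
queue lock is held.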