From patchwork Thu May  4 13:08:46 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson <chris@chris-wilson.co.uk>
X-Patchwork-Id: 9711935
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	2A49C6020B for <patchwork-intel-gfx@patchwork.kernel.org>;
	Thu,  4 May 2017 13:08:58 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 21E382867A
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Thu,  4 May 2017 13:08:58 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 16F832869C; Thu,  4 May 2017 13:08:58 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
	autolearn=ham version=3.3.1
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id ABDA82867E
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Thu,  4 May 2017 13:08:57 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id EB2736E1D8;
	Thu,  4 May 2017 13:08:55 +0000 (UTC)
X-Original-To: intel-gfx@lists.freedesktop.org
Delivered-To: intel-gfx@lists.freedesktop.org
Received: from fireflyinternet.com (mail.fireflyinternet.com
	[109.228.58.192])
	by gabe.freedesktop.org (Postfix) with ESMTPS id 3246F6E4D5
	for <intel-gfx@lists.freedesktop.org>;
	Thu,  4 May 2017 13:08:55 +0000 (UTC)
X-Default-Received-SPF: pass (skip=forwardok (res=PASS))
	x-ip-name=78.156.65.138;
Received: from haswell.alporthouse.com (unverified [78.156.65.138])
	by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id
	6663706-1500050
	for <intel-gfx@lists.freedesktop.org>; Thu, 04 May 2017 14:08:52 +0100
Received: by haswell.alporthouse.com (sSMTP sendmail emulation);
	Thu, 04 May 2017 14:08:51 +0100
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Thu,  4 May 2017 14:08:46 +0100
Message-Id: <20170504130846.4807-3-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170504130846.4807-1-chris@chris-wilson.co.uk>
References: <20170504130846.4807-1-chris@chris-wilson.co.uk>
X-Originating-IP: 78.156.65.138
X-Country: code=GB country="United Kingdom" ip=78.156.65.138
Subject: [Intel-gfx] [CI 3/3] drm/i915: Micro-optimise hotpath through
	intel_ring_begin()
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Intel graphics driver community testing & development
	<intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
MIME-Version: 1.0
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
X-Virus-Scanned: ClamAV using ClamSMTP

Typically, there is space available within the ring and if not we have
to wait (by definition a slow path). Rearrange the code to reduce the
number of branches and stack size for the hotpath, accomodating a slight
growth for the wait.

v2: Fix the new assert that packets are not larger than the actual ring.
v3: Make the parameters unsigned as well to make usage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 67 ++++++++++++++++++---------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +-
 2 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 47f144b1e3fa..8b427a6151b2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1655,7 +1655,8 @@ static int ring_request_alloc(struct drm_i915_gem_request *request)
 	return 0;
 }
 
-static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
+static noinline int wait_for_space(struct drm_i915_gem_request *req,
+				   unsigned int bytes)
 {
 	struct intel_ring *ring = req->ring;
 	struct drm_i915_gem_request *target;
@@ -1700,52 +1701,56 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 	return 0;
 }
 
-u32 *intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
+u32 *intel_ring_begin(struct drm_i915_gem_request *req,
+		      unsigned int num_dwords)
 {
 	struct intel_ring *ring = req->ring;
-	int remain_actual = ring->size - ring->emit;
-	int remain_usable = ring->effective_size - ring->emit;
-	int bytes = num_dwords * sizeof(u32);
-	int total_bytes, wait_bytes;
-	bool need_wrap = false;
+	const unsigned int remain_usable = ring->effective_size - ring->emit;
+	const unsigned int bytes = num_dwords * sizeof(u32);
+	unsigned int need_wrap = 0;
+	unsigned int total_bytes;
 	u32 *cs;
 
 	total_bytes = bytes + req->reserved_space;
+	GEM_BUG_ON(total_bytes > ring->effective_size);
 
-	if (unlikely(bytes > remain_usable)) {
-		/*
-		 * Not enough space for the basic request. So need to flush
-		 * out the remainder and then wait for base + reserved.
-		 */
-		wait_bytes = remain_actual + total_bytes;
-		need_wrap = true;
-	} else if (unlikely(total_bytes > remain_usable)) {
-		/*
-		 * The base request will fit but the reserved space
-		 * falls off the end. So we don't need an immediate wrap
-		 * and only need to effectively wait for the reserved
-		 * size space from the start of ringbuffer.
-		 */
-		wait_bytes = remain_actual + req->reserved_space;
-	} else {
-		/* No wrapping required, just waiting. */
-		wait_bytes = total_bytes;
+	if (unlikely(total_bytes > remain_usable)) {
+		const int remain_actual = ring->size - ring->emit;
+
+		if (bytes > remain_usable) {
+			/*
+			 * Not enough space for the basic request. So need to
+			 * flush out the remainder and then wait for
+			 * base + reserved.
+			 */
+			total_bytes += remain_actual;
+			need_wrap = remain_actual | 1;
+		} else  {
+			/*
+			 * The base request will fit but the reserved space
+			 * falls off the end. So we don't need an immediate
+			 * wrap and only need to effectively wait for the
+			 * reserved size from the start of ringbuffer.
+			 */
+			total_bytes = req->reserved_space + remain_actual;
+		}
 	}
 
-	if (wait_bytes > ring->space) {
-		int ret = wait_for_space(req, wait_bytes);
+	if (unlikely(total_bytes > ring->space)) {
+		int ret = wait_for_space(req, total_bytes);
 		if (unlikely(ret))
 			return ERR_PTR(ret);
 	}
 
 	if (unlikely(need_wrap)) {
-		GEM_BUG_ON(remain_actual > ring->space);
-		GEM_BUG_ON(ring->emit + remain_actual > ring->size);
+		need_wrap &= ~1;
+		GEM_BUG_ON(need_wrap > ring->space);
+		GEM_BUG_ON(ring->emit + need_wrap > ring->size);
 
 		/* Fill the tail with MI_NOOP */
-		memset(ring->vaddr + ring->emit, 0, remain_actual);
+		memset(ring->vaddr + ring->emit, 0, need_wrap);
 		ring->emit = 0;
-		ring->space -= remain_actual;
+		ring->space -= need_wrap;
 	}
 
 	GEM_BUG_ON(ring->emit > ring->size - bytes);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 3e343b09eeb6..ec16fb6fde62 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -497,7 +497,8 @@ void intel_legacy_submission_resume(struct drm_i915_private *dev_priv);
 
 int __must_check intel_ring_cacheline_align(struct drm_i915_gem_request *req);
 
-u32 __must_check *intel_ring_begin(struct drm_i915_gem_request *req, int n);
+u32 __must_check *intel_ring_begin(struct drm_i915_gem_request *req,
+				   unsigned int n);
 
 static inline void
 intel_ring_advance(struct drm_i915_gem_request *req, u32 *cs)