From patchwork Fri Jun  3 16:08:44 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson <chris@chris-wilson.co.uk>
X-Patchwork-Id: 9153489
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	84C6D60221 for <patchwork-intel-gfx@patchwork.kernel.org>;
	Fri,  3 Jun 2016 16:09:41 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 756C325EF7
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Fri,  3 Jun 2016 16:09:41 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 696EE28309; Fri,  3 Jun 2016 16:09:41 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00,DKIM_SIGNED,
	RCVD_IN_DNSWL_MED,T_DKIM_INVALID autolearn=ham version=3.3.1
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D3E1B25EF7
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Fri,  3 Jun 2016 16:09:40 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id B34676EE16;
	Fri,  3 Jun 2016 16:09:30 +0000 (UTC)
X-Original-To: intel-gfx@lists.freedesktop.org
Delivered-To: intel-gfx@lists.freedesktop.org
Received: from mail-wm0-x244.google.com (mail-wm0-x244.google.com
	[IPv6:2a00:1450:400c:c09::244])
	by gabe.freedesktop.org (Postfix) with ESMTPS id 34E0E6EE0A
	for <intel-gfx@lists.freedesktop.org>;
	Fri,  3 Jun 2016 16:09:14 +0000 (UTC)
Received: by mail-wm0-x244.google.com with SMTP id a20so428558wma.3
	for <intel-gfx@lists.freedesktop.org>;
	Fri, 03 Jun 2016 09:09:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=sender:from:to:cc:subject:date:message-id:in-reply-to:references;
	bh=29ZdsvcDI+jdCnQDsb4Bt36IF7LWD0aypzaO5CIuArg=;
	b=pK22pw6yu1P7UBLKiC83CeaoAkbvXVrfz+pRWoW4Q0R71DeZcfFiPwIHGizEedFx31
	0VZIqqDIph/YI5NthJ7GTFCtMiIc73Rj2Dtxi1Yxn8rdOLbeXx122lhF0OIEDHA7IVMl
	JuoAbXv44ys7KCrVkv6HuAQNONKBMqri+/ZnY9lcI4zbQ/BGOiuIUd9N9PzSU/G6w37a
	Tip+OQPc/LX+hTSo+ue2qDXee2A2s0DSLKGsag8nMEakJ9cAFNQZ3P+gmA4ABSzU1cf/
	hQUhx3Y3kQazQWEG123ZRRV/1ds2k5fbnJoIKx8MYPvlld3jT1UzqIcgnKlTuuxyyQSk
	Cf5g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20130820;
	h=x-gm-message-state:sender:from:to:cc:subject:date:message-id
	:in-reply-to:references;
	bh=29ZdsvcDI+jdCnQDsb4Bt36IF7LWD0aypzaO5CIuArg=;
	b=cNuBQiUhHgBwnluehZV9gmslotnbSZncJLEbFnwijLGRP2kFRiG+YlaPWyjDe/scyr
	WBo7OfUYnghDb1FbFuGE4QUQwnSehBr9g2D9++G3EuDRlaa3LqfxyZgmNcez1x1MjG0i
	JEF9vmxAdUOcRlwKfUY8CPH5CMtEMOAPQd79zAvofTauDrGRCmZ36wdvbaMnRU0EGxI0
	n3qb1cexG7k2mlaaT6UCpNX+gYuyS46UOMc0Y26dhIu8nqwM2v9Vzx23BX1MbSpp/RVF
	En50XKHImRVNEcYXNA+0f2j5ssq5hveAkk39HIUH/W177hpRDSZDVv5QgMl7ZqRBf63r
	p4hQ==
X-Gm-Message-State: 
 ALyK8tIj10/Ws/8c7ZQPauShge6CnpeuxVzs3NhPn6cux+wWMNT83m6EFyo8JlX9c9sYMQ==
X-Received: by 10.28.228.137 with SMTP id b131mr243407wmh.43.1464970152380;
	Fri, 03 Jun 2016 09:09:12 -0700 (PDT)
Received: from haswell.alporthouse.com ([78.156.65.138])
	by smtp.gmail.com with ESMTPSA id
	a195sm283343wma.2.2016.06.03.09.09.11
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Fri, 03 Jun 2016 09:09:11 -0700 (PDT)
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Fri,  3 Jun 2016 17:08:44 +0100
Message-Id: <1464970133-29859-13-git-send-email-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.8.1
In-Reply-To: <1464970133-29859-1-git-send-email-chris@chris-wilson.co.uk>
References: <1464970133-29859-1-git-send-email-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 12/21] drm/i915: Add a delay between interrupt
	and inspecting the final seqno (ilk)
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Intel graphics driver community testing & development
	<intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
MIME-Version: 1.0
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
X-Virus-Scanned: ClamAV using ClamSMTP

On Ironlake, there is no command nor register to ensure that the write
from a MI_STORE command is completed (and coherent on the CPU) before the
command parser continues. This means that the ordering between the seqno
write and the subsequent user interrupt is undefined (like gen6+). So to
ensure that the seqno write is completed after the final user interrupt
we need to delay the read sufficiently to allow the write to complete.
This delay is undefined by the bspec, and empirically requires 75us even
though a register read combined with a clflush is less than 500ns. Hence,
the delay is due to an on-chip buffer rather than the latency of the write
to memory.

Note that the render ring controls this by filling the PIPE_CONTROL fifo
with stalling commands that force the earliest pipe-control with the
seqno to be completed before the command parser continues. Given that we
need a barrier operation for BSD, we may as well forgo the extra
per-batch latency by using a common per-interrupt barrier.

Studying the impact of adding the usleep shows that in both sequences of
and individual synchronous no-op batches is negligible for the media
engine (where the write now is unordered with the interrupt). Converting
the render engine over from the current glutton of pie-controls over to
the per-interrupt delays speeds up both the sequential and individual
synchronous no-ops by 20% and 60%, respectively. This speed up holds
even when looking at the throughput of small copies (4KiB->4MiB), both
serial and synchronous, by about 20%. This is because despite adding a
significant delay to the interrupt, in all likelihood we will see the
seqno write without having to apply the barrier (only in the rare corner
cases where the write is delayed on the last required is the delay
necessary).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94307
Testcase: igt/gem_sync #ilk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c         | 10 ++---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 80 ++++++++-------------------------
 2 files changed, 21 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 4013ad92cdc6..c14eb57b5807 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1264,8 +1264,7 @@ static void ivybridge_parity_error_irq_handler(struct drm_i915_private *dev_priv
 static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
-	if (gt_iir &
-	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
+	if (gt_iir & GT_RENDER_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[RCS]);
 	if (gt_iir & ILK_BSD_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[VCS]);
@@ -1274,9 +1273,7 @@ static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv,
 static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
-
-	if (gt_iir &
-	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
+	if (gt_iir & GT_RENDER_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[RCS]);
 	if (gt_iir & GT_BSD_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[VCS]);
@@ -3600,8 +3597,7 @@ static void gen5_gt_irq_postinstall(struct drm_device *dev)
 
 	gt_irqs |= GT_RENDER_USER_INTERRUPT;
 	if (IS_GEN5(dev)) {
-		gt_irqs |= GT_RENDER_PIPECTL_NOTIFY_INTERRUPT |
-			   ILK_BSD_USER_INTERRUPT;
+		gt_irqs |= ILK_BSD_USER_INTERRUPT;
 	} else {
 		gt_irqs |= GT_BLT_USER_INTERRUPT | GT_BSD_USER_INTERRUPT;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ca2e59405998..30e400d77d23 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1508,67 +1508,22 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
 	return 0;
 }
 
-#define PIPE_CONTROL_FLUSH(ring__, addr__)					\
-do {									\
-	intel_ring_emit(ring__, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |		\
-		 PIPE_CONTROL_DEPTH_STALL);				\
-	intel_ring_emit(ring__, (addr__) | PIPE_CONTROL_GLOBAL_GTT);			\
-	intel_ring_emit(ring__, 0);							\
-	intel_ring_emit(ring__, 0);							\
-} while (0)
-
-static int
-pc_render_add_request(struct drm_i915_gem_request *req)
+static void
+gen5_seqno_barrier(struct intel_engine_cs *ring)
 {
-	struct intel_engine_cs *engine = req->engine;
-	u32 addr = engine->status_page.gfx_addr +
-		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	u32 scratch_addr = addr;
-	int ret;
-
-	/* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently
-	 * incoherent with writes to memory, i.e. completely fubar,
-	 * so we need to use PIPE_NOTIFY instead.
+	/* MI_STORE are internally buffered by the GPU and not flushed
+	 * either by MI_FLUSH or SyncFlush or any other combination of
+	 * MI commands.
 	 *
-	 * However, we also need to workaround the qword write
-	 * incoherence by flushing the 6 PIPE_NOTIFY buffers out to
-	 * memory before requesting an interrupt.
+	 * "Only the submission of the store operation is guaranteed.
+	 * The write result will be complete (coherent) some time later
+	 * (this is practically a finite period but there is no guaranteed
+	 * latency)."
+	 *
+	 * Empirically, we observe that we need a delay of at least 75us to
+	 * be sure that the seqno write is visible by the CPU.
 	 */
-	ret = intel_ring_begin(req, 32);
-	if (ret)
-		return ret;
-
-	intel_ring_emit(engine,
-			GFX_OP_PIPE_CONTROL(4) |
-			PIPE_CONTROL_QW_WRITE |
-			PIPE_CONTROL_WRITE_FLUSH |
-			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
-	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(engine, req->seqno);
-	intel_ring_emit(engine, 0);
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-
-	intel_ring_emit(engine,
-		       	GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
-			PIPE_CONTROL_WRITE_FLUSH |
-			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
-			PIPE_CONTROL_NOTIFY);
-	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(engine, req->seqno);
-	intel_ring_emit(engine, 0);
-	__intel_ring_advance(engine);
-
-	return 0;
+	usleep_range(75, 250);
 }
 
 static void
@@ -2825,12 +2780,12 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 			engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
 		}
 	} else if (IS_GEN5(dev_priv)) {
-		engine->add_request = pc_render_add_request;
+		engine->add_request = i9xx_add_request;
 		engine->flush = gen4_render_ring_flush;
 		engine->irq_get = gen5_ring_get_irq;
 		engine->irq_put = gen5_ring_put_irq;
-		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT |
-					GT_RENDER_PIPECTL_NOTIFY_INTERRUPT;
+		engine->irq_seqno_barrier = gen5_seqno_barrier;
+		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 	} else {
 		engine->add_request = i9xx_add_request;
 		if (INTEL_GEN(dev_priv) < 4)
@@ -2867,7 +2822,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	if (ret)
 		return ret;
 
-	if (INTEL_GEN(dev_priv) >= 5) {
+	if (INTEL_GEN(dev_priv) >= 6) {
 		ret = intel_init_pipe_control(engine, 4096);
 		if (ret)
 			return ret;
@@ -2940,6 +2895,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
 			engine->irq_get = gen5_ring_get_irq;
 			engine->irq_put = gen5_ring_put_irq;
+			engine->irq_seqno_barrier = gen5_seqno_barrier;
 		} else {
 			engine->irq_enable_mask = I915_BSD_USER_INTERRUPT;
 			engine->irq_get = i9xx_ring_get_irq;