From patchwork Fri Jun 3 16:08:44 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 9153489 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 84C6D60221 for ; Fri, 3 Jun 2016 16:09:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 756C325EF7 for ; Fri, 3 Jun 2016 16:09:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 696EE28309; Fri, 3 Jun 2016 16:09:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_MED,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D3E1B25EF7 for ; Fri, 3 Jun 2016 16:09:40 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B34676EE16; Fri, 3 Jun 2016 16:09:30 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mail-wm0-x244.google.com (mail-wm0-x244.google.com [IPv6:2a00:1450:400c:c09::244]) by gabe.freedesktop.org (Postfix) with ESMTPS id 34E0E6EE0A for ; Fri, 3 Jun 2016 16:09:14 +0000 (UTC) Received: by mail-wm0-x244.google.com with SMTP id a20so428558wma.3 for ; Fri, 03 Jun 2016 09:09:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=29ZdsvcDI+jdCnQDsb4Bt36IF7LWD0aypzaO5CIuArg=; b=pK22pw6yu1P7UBLKiC83CeaoAkbvXVrfz+pRWoW4Q0R71DeZcfFiPwIHGizEedFx31 0VZIqqDIph/YI5NthJ7GTFCtMiIc73Rj2Dtxi1Yxn8rdOLbeXx122lhF0OIEDHA7IVMl JuoAbXv44ys7KCrVkv6HuAQNONKBMqri+/ZnY9lcI4zbQ/BGOiuIUd9N9PzSU/G6w37a Tip+OQPc/LX+hTSo+ue2qDXee2A2s0DSLKGsag8nMEakJ9cAFNQZ3P+gmA4ABSzU1cf/ hQUhx3Y3kQazQWEG123ZRRV/1ds2k5fbnJoIKx8MYPvlld3jT1UzqIcgnKlTuuxyyQSk Cf5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=29ZdsvcDI+jdCnQDsb4Bt36IF7LWD0aypzaO5CIuArg=; b=cNuBQiUhHgBwnluehZV9gmslotnbSZncJLEbFnwijLGRP2kFRiG+YlaPWyjDe/scyr WBo7OfUYnghDb1FbFuGE4QUQwnSehBr9g2D9++G3EuDRlaa3LqfxyZgmNcez1x1MjG0i JEF9vmxAdUOcRlwKfUY8CPH5CMtEMOAPQd79zAvofTauDrGRCmZ36wdvbaMnRU0EGxI0 n3qb1cexG7k2mlaaT6UCpNX+gYuyS46UOMc0Y26dhIu8nqwM2v9Vzx23BX1MbSpp/RVF En50XKHImRVNEcYXNA+0f2j5ssq5hveAkk39HIUH/W177hpRDSZDVv5QgMl7ZqRBf63r p4hQ== X-Gm-Message-State: ALyK8tIj10/Ws/8c7ZQPauShge6CnpeuxVzs3NhPn6cux+wWMNT83m6EFyo8JlX9c9sYMQ== X-Received: by 10.28.228.137 with SMTP id b131mr243407wmh.43.1464970152380; Fri, 03 Jun 2016 09:09:12 -0700 (PDT) Received: from haswell.alporthouse.com ([78.156.65.138]) by smtp.gmail.com with ESMTPSA id a195sm283343wma.2.2016.06.03.09.09.11 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 03 Jun 2016 09:09:11 -0700 (PDT) From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Fri, 3 Jun 2016 17:08:44 +0100 Message-Id: <1464970133-29859-13-git-send-email-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.8.1 In-Reply-To: <1464970133-29859-1-git-send-email-chris@chris-wilson.co.uk> References: <1464970133-29859-1-git-send-email-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 12/21] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP On Ironlake, there is no command nor register to ensure that the write from a MI_STORE command is completed (and coherent on the CPU) before the command parser continues. This means that the ordering between the seqno write and the subsequent user interrupt is undefined (like gen6+). So to ensure that the seqno write is completed after the final user interrupt we need to delay the read sufficiently to allow the write to complete. This delay is undefined by the bspec, and empirically requires 75us even though a register read combined with a clflush is less than 500ns. Hence, the delay is due to an on-chip buffer rather than the latency of the write to memory. Note that the render ring controls this by filling the PIPE_CONTROL fifo with stalling commands that force the earliest pipe-control with the seqno to be completed before the command parser continues. Given that we need a barrier operation for BSD, we may as well forgo the extra per-batch latency by using a common per-interrupt barrier. Studying the impact of adding the usleep shows that in both sequences of and individual synchronous no-op batches is negligible for the media engine (where the write now is unordered with the interrupt). Converting the render engine over from the current glutton of pie-controls over to the per-interrupt delays speeds up both the sequential and individual synchronous no-ops by 20% and 60%, respectively. This speed up holds even when looking at the throughput of small copies (4KiB->4MiB), both serial and synchronous, by about 20%. This is because despite adding a significant delay to the interrupt, in all likelihood we will see the seqno write without having to apply the barrier (only in the rare corner cases where the write is delayed on the last required is the delay necessary). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94307 Testcase: igt/gem_sync #ilk Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_irq.c | 10 ++--- drivers/gpu/drm/i915/intel_ringbuffer.c | 80 ++++++++------------------------- 2 files changed, 21 insertions(+), 69 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 4013ad92cdc6..c14eb57b5807 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1264,8 +1264,7 @@ static void ivybridge_parity_error_irq_handler(struct drm_i915_private *dev_priv static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv, u32 gt_iir) { - if (gt_iir & - (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT)) + if (gt_iir & GT_RENDER_USER_INTERRUPT) notify_ring(&dev_priv->engine[RCS]); if (gt_iir & ILK_BSD_USER_INTERRUPT) notify_ring(&dev_priv->engine[VCS]); @@ -1274,9 +1273,7 @@ static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv, static void snb_gt_irq_handler(struct drm_i915_private *dev_priv, u32 gt_iir) { - - if (gt_iir & - (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT)) + if (gt_iir & GT_RENDER_USER_INTERRUPT) notify_ring(&dev_priv->engine[RCS]); if (gt_iir & GT_BSD_USER_INTERRUPT) notify_ring(&dev_priv->engine[VCS]); @@ -3600,8 +3597,7 @@ static void gen5_gt_irq_postinstall(struct drm_device *dev) gt_irqs |= GT_RENDER_USER_INTERRUPT; if (IS_GEN5(dev)) { - gt_irqs |= GT_RENDER_PIPECTL_NOTIFY_INTERRUPT | - ILK_BSD_USER_INTERRUPT; + gt_irqs |= ILK_BSD_USER_INTERRUPT; } else { gt_irqs |= GT_BLT_USER_INTERRUPT | GT_BSD_USER_INTERRUPT; } diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index ca2e59405998..30e400d77d23 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1508,67 +1508,22 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req, return 0; } -#define PIPE_CONTROL_FLUSH(ring__, addr__) \ -do { \ - intel_ring_emit(ring__, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE | \ - PIPE_CONTROL_DEPTH_STALL); \ - intel_ring_emit(ring__, (addr__) | PIPE_CONTROL_GLOBAL_GTT); \ - intel_ring_emit(ring__, 0); \ - intel_ring_emit(ring__, 0); \ -} while (0) - -static int -pc_render_add_request(struct drm_i915_gem_request *req) +static void +gen5_seqno_barrier(struct intel_engine_cs *ring) { - struct intel_engine_cs *engine = req->engine; - u32 addr = engine->status_page.gfx_addr + - (I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT); - u32 scratch_addr = addr; - int ret; - - /* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently - * incoherent with writes to memory, i.e. completely fubar, - * so we need to use PIPE_NOTIFY instead. + /* MI_STORE are internally buffered by the GPU and not flushed + * either by MI_FLUSH or SyncFlush or any other combination of + * MI commands. * - * However, we also need to workaround the qword write - * incoherence by flushing the 6 PIPE_NOTIFY buffers out to - * memory before requesting an interrupt. + * "Only the submission of the store operation is guaranteed. + * The write result will be complete (coherent) some time later + * (this is practically a finite period but there is no guaranteed + * latency)." + * + * Empirically, we observe that we need a delay of at least 75us to + * be sure that the seqno write is visible by the CPU. */ - ret = intel_ring_begin(req, 32); - if (ret) - return ret; - - intel_ring_emit(engine, - GFX_OP_PIPE_CONTROL(4) | - PIPE_CONTROL_QW_WRITE | - PIPE_CONTROL_WRITE_FLUSH | - PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE); - intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT); - intel_ring_emit(engine, req->seqno); - intel_ring_emit(engine, 0); - PIPE_CONTROL_FLUSH(engine, scratch_addr); - scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */ - PIPE_CONTROL_FLUSH(engine, scratch_addr); - scratch_addr += 2 * CACHELINE_BYTES; - PIPE_CONTROL_FLUSH(engine, scratch_addr); - scratch_addr += 2 * CACHELINE_BYTES; - PIPE_CONTROL_FLUSH(engine, scratch_addr); - scratch_addr += 2 * CACHELINE_BYTES; - PIPE_CONTROL_FLUSH(engine, scratch_addr); - scratch_addr += 2 * CACHELINE_BYTES; - PIPE_CONTROL_FLUSH(engine, scratch_addr); - - intel_ring_emit(engine, - GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE | - PIPE_CONTROL_WRITE_FLUSH | - PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | - PIPE_CONTROL_NOTIFY); - intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT); - intel_ring_emit(engine, req->seqno); - intel_ring_emit(engine, 0); - __intel_ring_advance(engine); - - return 0; + usleep_range(75, 250); } static void @@ -2825,12 +2780,12 @@ int intel_init_render_ring_buffer(struct drm_device *dev) engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC; } } else if (IS_GEN5(dev_priv)) { - engine->add_request = pc_render_add_request; + engine->add_request = i9xx_add_request; engine->flush = gen4_render_ring_flush; engine->irq_get = gen5_ring_get_irq; engine->irq_put = gen5_ring_put_irq; - engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT | - GT_RENDER_PIPECTL_NOTIFY_INTERRUPT; + engine->irq_seqno_barrier = gen5_seqno_barrier; + engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT; } else { engine->add_request = i9xx_add_request; if (INTEL_GEN(dev_priv) < 4) @@ -2867,7 +2822,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev) if (ret) return ret; - if (INTEL_GEN(dev_priv) >= 5) { + if (INTEL_GEN(dev_priv) >= 6) { ret = intel_init_pipe_control(engine, 4096); if (ret) return ret; @@ -2940,6 +2895,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev) engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT; engine->irq_get = gen5_ring_get_irq; engine->irq_put = gen5_ring_put_irq; + engine->irq_seqno_barrier = gen5_seqno_barrier; } else { engine->irq_enable_mask = I915_BSD_USER_INTERRUPT; engine->irq_get = i9xx_ring_get_irq;