From patchwork Wed Dec 19 14:57:45 2018
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 10737393
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Wed, 19 Dec 2018 14:57:45 +0000
Message-Id: <20181219145747.19835-7-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.20.0
In-Reply-To: <20181219145747.19835-1-chris@chris-wilson.co.uk>
References: <20181219145747.19835-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 7/9] drm/i915/ringbuffer: Move irq seqno barrier to the GPU for gen7

The irq_seqno_barrier is a tradeoff between doing work on every request
(on the GPU) and doing work after every interrupt (on the CPU). We
presume we have many more requests than interrupts! However, the
current w/a for Ivybridge is an implicit delay that already fails
sporadically, and fails consistently if we move the w/a into the irq
handler itself. This makes the CPU barrier untenable for upcoming
interrupt handler changes, so we need to replace it with a delay on the
GPU before we send the MI_USER_INTERRUPT. As it turns out, that delay
is 32x MI_STORE_DWORD_IMM, or about 0.6us per request! Quite nasty, but
the lesser of two evils looking to the future.
Testcase: igt/gem_sync
Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 80 ++++++++++++++-----------
 1 file changed, 44 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 2b8932ab007f..40797ee6a956 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -443,6 +443,34 @@ static void gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 }
 static const int gen6_xcs_emit_breadcrumb_sz = 4;
 
+#define GEN7_XCS_WA 32
+static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
+{
+	int i;
+
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
+	*cs++ = intel_hws_seqno_address(rq->engine) | MI_FLUSH_DW_USE_GTT;
+	*cs++ = rq->global_seqno;
+
+	for (i = 0; i < GEN7_XCS_WA; i++) {
+		*cs++ = MI_STORE_DWORD_INDEX;
+		*cs++ = I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT;
+		*cs++ = rq->global_seqno;
+	}
+
+	*cs++ = MI_FLUSH_DW;
+	*cs++ = 0;
+	*cs++ = 0;
+
+	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
+
+	rq->tail = intel_ring_offset(rq, cs);
+	assert_ring_tail_valid(rq->ring, rq->tail);
+}
+static const int gen7_xcs_emit_breadcrumb_sz = 8 + GEN7_XCS_WA * 3;
+#undef GEN7_XCS_WA
+
 static void set_hwstam(struct intel_engine_cs *engine, u32 mask)
 {
 	/*
@@ -874,31 +902,6 @@ gen5_seqno_barrier(struct intel_engine_cs *engine)
 	usleep_range(125, 250);
 }
 
-static void
-gen6_seqno_barrier(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *dev_priv = engine->i915;
-
-	/* Workaround to force correct ordering between irq and seqno writes on
-	 * ivb (and maybe also on snb) by reading from a CS register (like
-	 * ACTHD) before reading the status page.
-	 *
-	 * Note that this effectively stalls the read by the time it takes to
-	 * do a memory transaction, which more or less ensures that the write
-	 * from the GPU has sufficient time to invalidate the CPU cacheline.
-	 * Alternatively we could delay the interrupt from the CS ring to give
-	 * the write time to land, but that would incur a delay after every
-	 * batch i.e. much more frequent than a delay when waiting for the
-	 * interrupt (with the same net latency).
-	 *
-	 * Also note that to prevent whole machine hangs on gen7, we have to
-	 * take the spinlock to guard against concurrent cacheline access.
-	 */
-	spin_lock_irq(&dev_priv->uncore.lock);
-	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
-	spin_unlock_irq(&dev_priv->uncore.lock);
-}
-
 static void
 gen5_irq_enable(struct intel_engine_cs *engine)
 {
@@ -2218,10 +2221,13 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine)
 		engine->emit_flush = gen6_bsd_ring_flush;
 		engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
 
-		engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
-		engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
-		if (!IS_GEN(dev_priv, 6))
-			engine->irq_seqno_barrier = gen6_seqno_barrier;
+		if (IS_GEN(dev_priv, 6)) {
+			engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
+			engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
+		} else {
+			engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
+			engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
+		}
 	} else {
 		engine->emit_flush = bsd_ring_flush;
 		if (IS_GEN(dev_priv, 5))
@@ -2244,10 +2250,13 @@ int intel_init_blt_ring_buffer(struct intel_engine_cs *engine)
 	engine->emit_flush = gen6_ring_flush;
 	engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
 
-	engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
-	engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
-	if (!IS_GEN(dev_priv, 6))
-		engine->irq_seqno_barrier = gen6_seqno_barrier;
+	if (IS_GEN(dev_priv, 6)) {
+		engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
+		engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
+	} else {
+		engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
+		engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
+	}
 
 	return intel_init_ring_buffer(engine);
 }
@@ -2265,9 +2274,8 @@ int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine)
 	engine->irq_enable = hsw_vebox_irq_enable;
 	engine->irq_disable = hsw_vebox_irq_disable;
 
-	engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
-	engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
-	engine->irq_seqno_barrier = gen6_seqno_barrier;
+	engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
+	engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
 
 	return intel_init_ring_buffer(engine);
 }
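
[Editorial note, not part of the patch.] A minimal standalone sketch, in
plain userspace C, of the dword budget behind gen7_xcs_emit_breadcrumb_sz:
the fixed cost is 8 dwords (the seqno MI_FLUSH_DW store, the trailing
MI_FLUSH_DW, MI_USER_INTERRUPT and MI_NOOP), plus 3 dwords for each of the
GEN7_XCS_WA dummy stores. Only GEN7_XCS_WA and the per-command dword counts
come from the patch; the variable names below are made up for illustration.

/*
 * Illustrative only (not kernel code): reproduce the size arithmetic of
 * gen7_xcs_emit_breadcrumb_sz = 8 + GEN7_XCS_WA * 3 and report the ring
 * space consumed per request. The ~0.6us/request cost is the estimate
 * quoted in the commit message, not something computed here.
 */
#include <stdio.h>

#define GEN7_XCS_WA 32

int main(void)
{
	int fixed_dwords = 8;			/* flush + tail flush + interrupt + noop */
	int wa_dwords = GEN7_XCS_WA * 3;	/* 32x MI_STORE_DWORD_INDEX, 3 dwords each */
	int total = fixed_dwords + wa_dwords;

	printf("gen7 breadcrumb: %d dwords (%d bytes) per request\n",
	       total, total * 4);
	return 0;
}

Running it prints 104 dwords (416 bytes), the extra ring space each gen7
xcs request now carries so that the interrupt is delayed on the GPU rather
than barriered on the CPU.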