From patchwork Tue Aug 28 17:04:29 2018
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 10578849
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Tue, 28 Aug 2018 18:04:29 +0100
Message-Id: <20180828170429.28492-2-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180828170429.28492-1-chris@chris-wilson.co.uk>
References: <20180828170429.28492-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH] drm/i915/ringbuffer: Delay after invalidating gen6+ xcs

During stress testing of full-ppgtt (on Baytrail at least), we found that the
invalidation around a context/mm switch was insufficient (writes would go
astray). Adding a second MI_FLUSH_DW barrier prevents this, but it is unclear
whether this is merely a delaying tactic or whether it truly serialises with
the TLB invalidation. Either way, it is empirically required.
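
To illustrate the shape of the workaround outside the driver, here is a
minimal standalone sketch of the emission order (the emit_flush() helper and
the FLAG_* values are illustrative stand-ins, not the real MI_FLUSH_DW
encoding or i915 API):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FLAG_POST_SYNC   0x1u  /* stand-in for STORE_INDEX | OP_STOREDW */
#define FLAG_INVALIDATE  0x2u  /* stand-in for MI_INVALIDATE_TLB (| _BSD) */

/* Stand-in for writing one MI_FLUSH_DW packet into the ring. */
static void emit_flush(uint32_t flags)
{
	printf("MI_FLUSH_DW flags=%#x\n", flags);
}

/*
 * Emit one post-sync flush; when invalidating the TLBs, follow it with a
 * second, plain flush so the invalidation has settled before the mm switch
 * that follows.
 */
static void flush(bool invalidate)
{
	uint32_t flags = FLAG_POST_SYNC;

	if (invalidate)
		flags |= FLAG_INVALIDATE;
	emit_flush(flags);

	if (invalidate)
		emit_flush(FLAG_POST_SYNC);
}

int main(void)
{
	flush(true);   /* invalidating flush -> two MI_FLUSH_DW packets */
	flush(false);  /* plain flush -> one MI_FLUSH_DW packet */
	return 0;
}

The patch below implements the same idea as a do {} while loop inside
emit_mi_flush_dw(), clearing EMIT_INVALIDATE after the first pass so that the
second pass emits a plain post-sync flush.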
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107715
Signed-off-by: Chris Wilson
Cc: Joonas Lahtinen
Cc: Mika Kuoppala
Cc: Matthew Auld
Cc: Tvrtko Ursulin
Reviewed-by: Ville Syrjälä
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 101 +++++++++++-------------
 1 file changed, 47 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d40f55a8dc34..952b6269bab0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1944,40 +1944,62 @@ static void gen6_bsd_submit_request(struct i915_request *request)
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
 
-static int gen6_bsd_ring_flush(struct i915_request *rq, u32 mode)
+static int emit_mi_flush_dw(struct i915_request *rq, u32 mode, u32 invflags)
 {
 	u32 cmd, *cs;
 
-	cs = intel_ring_begin(rq, 4);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
+	do {
+		cs = intel_ring_begin(rq, 4);
+		if (IS_ERR(cs))
+			return PTR_ERR(cs);
 
-	cmd = MI_FLUSH_DW;
+		cmd = MI_FLUSH_DW;
 
-	/* We always require a command barrier so that subsequent
-	 * commands, such as breadcrumb interrupts, are strictly ordered
-	 * wrt the contents of the write cache being flushed to memory
-	 * (and thus being coherent from the CPU).
-	 */
-	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+		/*
+		 * We always require a command barrier so that subsequent
+		 * commands, such as breadcrumb interrupts, are strictly ordered
+		 * wrt the contents of the write cache being flushed to memory
+		 * (and thus being coherent from the CPU).
+		 */
+		cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
 
-	/*
-	 * Bspec vol 1c.5 - video engine command streamer:
-	 * "If ENABLED, all TLBs will be invalidated once the flush
-	 * operation is complete. This bit is only valid when the
-	 * Post-Sync Operation field is a value of 1h or 3h."
-	 */
-	if (mode & EMIT_INVALIDATE)
-		cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD;
+		/*
+		 * Bspec vol 1c.3 - blitter engine command streamer:
+		 * "If ENABLED, all TLBs will be invalidated once the flush
+		 * operation is complete. This bit is only valid when the
+		 * Post-Sync Operation field is a value of 1h or 3h."
+		 */
+		if (mode & EMIT_INVALIDATE)
+			cmd |= invflags;
 
+		*cs++ = cmd;
+		*cs++ = I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
+		*cs++ = 0;
+		*cs++ = MI_NOOP;
+		intel_ring_advance(rq, cs);
+
+		/*
+		 * Not only do we need a full barrier (post-sync write) after
+		 * invalidating the TLBs, but we need to wait a little bit
+		 * longer. Whether this is merely delaying us, or the
+		 * subsequent flush is a key part of serialising with the
+		 * post-sync op, this extra pass appears vital before a
+		 * mm switch!
+		 */
+		if (!(mode & EMIT_INVALIDATE))
+			break;
+
+		mode &= ~EMIT_INVALIDATE;
+	} while (1);
 
-	*cs++ = cmd;
-	*cs++ = I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
-	*cs++ = 0;
-	*cs++ = MI_NOOP;
-	intel_ring_advance(rq, cs);
 	return 0;
 }
 
+static int gen6_bsd_ring_flush(struct i915_request *rq, u32 mode)
+{
+	return emit_mi_flush_dw(rq, mode,
+				MI_INVALIDATE_TLB | MI_INVALIDATE_BSD);
+}
+
 static int hsw_emit_bb_start(struct i915_request *rq,
 			     u64 offset, u32 len,
@@ -2022,36 +2044,7 @@ gen6_emit_bb_start(struct i915_request *rq,
 
 static int gen6_ring_flush(struct i915_request *rq, u32 mode)
 {
-	u32 cmd, *cs;
-
-	cs = intel_ring_begin(rq, 4);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	cmd = MI_FLUSH_DW;
-
-	/* We always require a command barrier so that subsequent
-	 * commands, such as breadcrumb interrupts, are strictly ordered
-	 * wrt the contents of the write cache being flushed to memory
-	 * (and thus being coherent from the CPU).
-	 */
-	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
-
-	/*
-	 * Bspec vol 1c.3 - blitter engine command streamer:
-	 * "If ENABLED, all TLBs will be invalidated once the flush
-	 * operation is complete. This bit is only valid when the
-	 * Post-Sync Operation field is a value of 1h or 3h."
-	 */
-	if (mode & EMIT_INVALIDATE)
-		cmd |= MI_INVALIDATE_TLB;
-	*cs++ = cmd;
-	*cs++ = I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
-	*cs++ = 0;
-	*cs++ = MI_NOOP;
-	intel_ring_advance(rq, cs);
-
-	return 0;
+	return emit_mi_flush_dw(rq, mode, MI_INVALIDATE_TLB);
 }
 
 static void intel_ring_init_semaphores(struct drm_i915_private *dev_priv,