From patchwork Fri Aug 17 12:24:10 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Chris Wilson <chris@chris-wilson.co.uk>
X-Patchwork-Id: 10568699
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
 [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F0989109C
	for <patchwork-intel-gfx@patchwork.kernel.org>;
 Fri, 17 Aug 2018 12:24:24 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DF7452AF9F
	for <patchwork-intel-gfx@patchwork.kernel.org>;
 Fri, 17 Aug 2018 12:24:24 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id D2CC82AFB5; Fri, 17 Aug 2018 12:24:24 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 762412AF9F
	for <patchwork-intel-gfx@patchwork.kernel.org>;
 Fri, 17 Aug 2018 12:24:24 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id AEBB96E13E;
	Fri, 17 Aug 2018 12:24:22 +0000 (UTC)
X-Original-To: intel-gfx@lists.freedesktop.org
Delivered-To: intel-gfx@lists.freedesktop.org
Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 5760A6E13E
 for <intel-gfx@lists.freedesktop.org>; Fri, 17 Aug 2018 12:24:21 +0000 (UTC)
X-Default-Received-SPF: pass (skip=forwardok (res=PASS))
 x-ip-name=78.156.65.138;
Received: from haswell.alporthouse.com (unverified [78.156.65.138])
 by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 12984574-1500050
 for multiple; Fri, 17 Aug 2018 13:24:14 +0100
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Fri, 17 Aug 2018 13:24:10 +0100
Message-Id: <20180817122410.8437-1-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.18.0
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH] drm/i915/execlists: Micro-optimise "idle"
 context switch
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Intel graphics driver community testing & development
 <intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-gfx>,
 <mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
 <mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
X-Virus-Scanned: ClamAV using ClamSMTP

On gen9, we see an effect where when we perform an element switch just
as the first context completes execution that switch takes twice as
long, as if it first reloads the completed context. That is we observe
the cost of

	context1 -> idle -> context1 ->  context2

as being twice the cost of the same operation as on gen8. The impact of
this is incredibly rare outside of microbenchmarks that are focused on
assessing the throughput of context switches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
I think is a microbenchmark too far, as there is no real world impact of
this as both the likelihood of submission at that precise point of time,
and the context switch being a significant fraction of the batch
runtime make the effect miniscule in practise.

It is also not foolproof for even gem_ctx_switch:
kbl ctx1 -> idle -> ctx2: ~25us;
    ctx1 -> idle -> ctx1 -> ctx2 (unpatched): ~53us
    ctx1 -> idle -> ctx1 -> ctx2 (patched): 30-40us

bxt ctx1 -> idle -> ctx2: ~40us
    ctx1 -> idle -> ctx1 -> ctx2 (unpatched): ~80
    ctx1 -> idle -> ctx1 -> ctx2 (patched): 60-70us

So consider this as more of a plea for ideas; why does bdw behaviour
better? Are we missing a flag, a fox or a chicken?
-Chris
---
 drivers/gpu/drm/i915/intel_lrc.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 36050f085071..682268d4249d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -711,6 +711,24 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 				GEM_BUG_ON(last->hw_context == rq->hw_context);
 
+				/*
+				 * Avoid reloading the previous context if we
+				 * know it has just completed and we want
+				 * to switch over to a new context. The CS
+				 * interrupt is likely waiting for us to
+				 * release the local irq lock and so we will
+				 * proceed with the submission momentarily,
+				 * which is quicker than reloading the context
+				 * on the gpu.
+				 */
+				if (!submit &&
+				    intel_engine_signaled(engine,
+							  last->global_seqno)) {
+					GEM_BUG_ON(!list_is_first(&rq->sched.link,
+								  &p->requests));
+					return;
+				}
+
 				if (submit)
 					port_assign(port, last);
 				port++;