From patchwork Fri Aug 17 12:24:10 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 10568699
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
 by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F0989109C
 for ; Fri, 17 Aug 2018 12:24:24 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DF7452AF9F
 for ; Fri, 17 Aug 2018 12:24:24 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id D2CC82AFB5; Fri, 17 Aug 2018 12:24:24 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
 RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 762412AF9F
 for ; Fri, 17 Aug 2018 12:24:24 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id AEBB96E13E;
 Fri, 17 Aug 2018 12:24:22 +0000 (UTC)
X-Original-To: intel-gfx@lists.freedesktop.org
Delivered-To: intel-gfx@lists.freedesktop.org
Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 5760A6E13E
 for ; Fri, 17 Aug 2018 12:24:21 +0000 (UTC)
X-Default-Received-SPF: pass (skip=forwardok (res=PASS))
 x-ip-name=78.156.65.138;
Received: from haswell.alporthouse.com (unverified [78.156.65.138])
 by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 12984574-1500050
 for multiple; Fri, 17 Aug 2018 13:24:14 +0100
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Fri, 17 Aug 2018 13:24:10 +0100
Message-Id: <20180817122410.8437-1-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.18.0
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH] drm/i915/execlists: Micro-optimise "idle" context switch
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Intel graphics driver community testing & development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"
X-Virus-Scanned: ClamAV using ClamSMTP

On gen9, when we perform an element switch just as the first context
completes execution, that switch takes twice as long, as if the hardware
first reloads the completed context. That is, we observe the cost of

	context1 -> idle -> context1 -> context2

as being twice the cost of the same operation on gen8. The effect is
incredibly rare outside of microbenchmarks that focus on assessing the
throughput of context switches.

Signed-off-by: Chris Wilson
Cc: Tvrtko Ursulin
Cc: Michał Winiarski
---
I think this is a microbenchmark too far, as there is no real-world
impact: both the likelihood of submission at that precise point in time,
and the context switch being a significant fraction of the batch
runtime, make the effect minuscule in practice. It is also not
foolproof, even for gem_ctx_switch:

kbl
	ctx1 -> idle -> ctx2:                     ~25us
	ctx1 -> idle -> ctx1 -> ctx2 (unpatched): ~53us
	ctx1 -> idle -> ctx1 -> ctx2 (patched):   30-40us

bxt
	ctx1 -> idle -> ctx2:                     ~40us
	ctx1 -> idle -> ctx1 -> ctx2 (unpatched): ~80us
	ctx1 -> idle -> ctx1 -> ctx2 (patched):   60-70us

So consider this as more of a plea for ideas: why does bdw behave
better? Are we missing a flag, a fox or a chicken?
-Chris
---
 drivers/gpu/drm/i915/intel_lrc.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 36050f085071..682268d4249d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -711,6 +711,24 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 				GEM_BUG_ON(last->hw_context == rq->hw_context);
 
+				/*
+				 * Avoid reloading the previous context if we
+				 * know it has just completed and we want
+				 * to switch over to a new context. The CS
+				 * interrupt is likely waiting for us to
+				 * release the local irq lock and so we will
+				 * proceed with the submission momentarily,
+				 * which is quicker than reloading the context
+				 * on the gpu.
+				 */
+				if (!submit &&
+				    intel_engine_signaled(engine,
+							  last->global_seqno)) {
+					GEM_BUG_ON(!list_is_first(&rq->sched.link,
+								  &p->requests));
+					return;
+				}
+
 				if (submit)
 					port_assign(port, last);
 				port++;
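
For readers following the hunk above, the early-return condition can be
sketched in isolation as standalone C. This is an illustrative model under
stated assumptions, not driver code: seqno_passed(), enum dequeue_action
and on_context_change() are hypothetical names invented for this sketch,
and the wrap-safe comparison is only assumed to resemble what
intel_engine_signaled() checks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the engine's "has this seqno completed?"
 * test. Assumption: seqnos are 32-bit and may wrap, so compare the
 * signed difference rather than the raw values.
 */
static bool seqno_passed(uint32_t hw_seqno, uint32_t seqno)
{
	return (int32_t)(hw_seqno - seqno) >= 0;
}

/* Hypothetical outcomes at the point where the patch adds its check. */
enum dequeue_action {
	DEQUEUE_DEFER,     /* return early; let the CS interrupt run first */
	DEQUEUE_NEXT_PORT, /* carry on and fill the second port as before */
};

/*
 * Model of the added condition: if nothing new has been queued behind
 * `last` in this pass (!submit) and `last` has already completed, back
 * off instead of asking the hardware to switch away from a finished
 * context.
 */
static enum dequeue_action
on_context_change(bool submit, uint32_t hw_seqno, uint32_t last_seqno)
{
	if (!submit && seqno_passed(hw_seqno, last_seqno))
		return DEQUEUE_DEFER;

	return DEQUEUE_NEXT_PORT;
}
```

The deferral is only a bet that the pending CS interrupt will free the
port and trigger a fresh dequeue almost immediately; the patched timings
quoted above suggest that bet does not always pay off.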