From patchwork Sun Nov 29 08:48:00 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 7718291 Return-Path: X-Original-To: patchwork-intel-gfx@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 29EFE9F39B for ; Sun, 29 Nov 2015 08:48:42 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 252762057F for ; Sun, 29 Nov 2015 08:48:41 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id A7EFA2054B for ; Sun, 29 Nov 2015 08:48:39 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1D8F86E436; Sun, 29 Nov 2015 00:48:39 -0800 (PST) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [87.106.93.118]) by gabe.freedesktop.org (Postfix) with ESMTP id 207026E435 for ; Sun, 29 Nov 2015 00:48:37 -0800 (PST) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 49019813-1500048 for multiple; Sun, 29 Nov 2015 08:48:23 +0000 Received: by haswell.alporthouse.com (sSMTP sendmail emulation); Sun, 29 Nov 2015 08:48:20 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Sun, 29 Nov 2015 08:48:00 +0000 Message-Id: <1448786893-2522-3-git-send-email-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.6.2 In-Reply-To: <1448786893-2522-1-git-send-email-chris@chris-wilson.co.uk> References: <1448786893-2522-1-git-send-email-chris@chris-wilson.co.uk> X-Originating-IP: 78.156.65.138 X-Country: code=GB country="United Kingdom" ip=78.156.65.138 Cc: Daniel Vetter , Eero Tamminen , stable@kernel.vger.org, "Rantala, Valtteri" Subject: [Intel-gfx] [PATCH 02/15] drm/i915: Limit the busy wait on requests to 10us not 10ms! X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When waiting for high frequency requests, the finite amount of time required to set up the irq and wait upon it limits the response rate. By busywaiting on the request completion for a short while we can service the high frequency waits as quick as possible. However, if it is a slow request, we want to sleep as quickly as possible. The tradeoff between waiting and sleeping is roughly the time it takes to sleep on a request, on the order of a microsecond. Based on measurements of synchronous workloads from across big core and little atom, I have set the limit for busywaiting as 10 microseconds. In most of the synchronous cases, we can reduce the limit down to as little as 2 miscroseconds, but that leaves quite a few test cases regressing by factors of 3 and more. The code currently uses the jiffie clock, but that is far too coarse (on the order of 10 milliseconds) and results in poor interactivity as the CPU ends up being hogged by slow requests. To get microsecond resolution we need to use a high resolution timer. The cheapest of which is polling local_clock(), but that is only valid on the same CPU. If we switch CPUs because the task was preempted, we can also use that as an indicator that the system is too busy to waste cycles on spinning and we should sleep instead. __i915_spin_request was introduced in commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2] Author: Chris Wilson Date: Tue Apr 7 16:20:41 2015 +0100 drm/i915: Optimistically spin for the request completion v2: Drop full u64 for unsigned long - the timer is 32bit wraparound safe, so we can use native register sizes on smaller architectures. Mention the approximate microseconds units for elapsed time and add some extra comments describing the reason for busywaiting. v3: Raise the limit to 10us Reported-by: Jens Axboe Link: https://lkml.org/lkml/2015/11/12/621 Cc; "Rogozhkin, Dmitry V" Cc: Daniel Vetter Cc: Tvrtko Ursulin Cc: Eero Tamminen Cc: "Rantala, Valtteri" Cc: stable@kernel.vger.org Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_gem.c | 47 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 87fc34f5899f..bad112abb16b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1146,14 +1146,57 @@ static bool missed_irq(struct drm_i915_private *dev_priv, return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings); } +static unsigned long local_clock_us(unsigned *cpu) +{ + unsigned long t; + + /* Cheaply and approximately convert from nanoseconds to microseconds. + * The result and subsequent calculations are also defined in the same + * approximate microseconds units. The principal source of timing + * error here is from the simple truncation. + * + * Note that local_clock() is only defined wrt to the current CPU; + * the comparisons are no longer valid if we switch CPUs. Instead of + * blocking preemption for the entire busywait, we can detect the CPU + * switch and use that as indicator of system load and a reason to + * stop busywaiting, see busywait_stop(). + */ + *cpu = get_cpu(); + t = local_clock() >> 10; + put_cpu(); + + return t; +} + +static bool busywait_stop(unsigned long timeout, unsigned cpu) +{ + unsigned this_cpu; + + if (time_after(local_clock_us(&this_cpu), timeout)) + return true; + + return this_cpu != cpu; +} + static int __i915_spin_request(struct drm_i915_gem_request *req, int state) { unsigned long timeout; + unsigned cpu; + + /* When waiting for high frequency requests, e.g. during synchronous + * rendering split between the CPU and GPU, the finite amount of time + * required to set up the irq and wait upon it limits the response + * rate. By busywaiting on the request completion for a short while we + * can service the high frequency waits as quick as possible. However, + * if it is a slow request, we want to sleep as quickly as possible. + * The tradeoff between waiting and sleeping is roughly the time it + * takes to sleep on a request, on the order of a microsecond. + */ if (i915_gem_request_get_ring(req)->irq_refcount) return -EBUSY; - timeout = jiffies + 1; + timeout = local_clock_us(&cpu) + 10; while (!need_resched()) { if (i915_gem_request_completed(req, true)) return 0; @@ -1161,7 +1204,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state) if (signal_pending_state(state, current)) break; - if (time_after_eq(jiffies, timeout)) + if (busywait_stop(timeout, cpu)) break; cpu_relax_lowlatency();