From patchwork Fri Sep 14 08:00:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10600505 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E34F4112B for ; Fri, 14 Sep 2018 09:16:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C1D9B2B0EF for ; Fri, 14 Sep 2018 09:16:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B5DD42A45B; Fri, 14 Sep 2018 09:16:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 511A52A45B for ; Fri, 14 Sep 2018 09:16:44 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id F01266E771; Fri, 14 Sep 2018 09:10:49 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id BB7B06E008 for ; Fri, 14 Sep 2018 08:00:36 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 13764966-1500050 for multiple; Fri, 14 Sep 2018 09:00:14 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Fri, 14 Sep 2018 09:00:15 +0100 Message-Id: <20180914080017.30308-1-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.19.0 MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 1/3] drm/i915: Limit the backpressure for i915_request allocation X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Vetter Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP If we try and fail to allocate a i915_request, we apply some backpressure on the clients to throttle the memory allocations coming from i915.ko. Currently, we wait until completely idle, but this is far too heavy and leads to some situations where the only escape is to declare a client hung and reset the GPU. The intent is to only ratelimit the allocation requests and to allow ourselves to recycle requests and memory from any long queues built up by a client hog. Although the system memory is inherently a global resources, we don't want to overly penalize an unlucky client to pay the price of reaping a hog. To reduce the influence of one client on another, we can instead of waiting for the entire GPU to idle, impose a barrier on the local client. (One end goal for request allocation is for scalability to many concurrent allocators; simultaneous execbufs.) To prevent ourselves from getting caught out by long running requests (requests that may never finish without userspace intervention, whom we are blocking) we need to impose a finite timeout, ideally shorter than hangcheck. A long time ago Paul McKenney suggested that RCU users should ratelimit themselves using judicious use of cond_synchronize_rcu(). This gives us the opportunity to reduce our indefinite wait for the GPU to idle to a wait for the RCU grace period of the previous allocation along this timeline to expire, satisfying both the local and finite properties we desire for our ratelimiting. There are still a few global steps (reclaim not least amongst those!) when we exhaust the immediate slab pool, at least now the wait is itself decoupled from struct_mutex for our glorious highly parallel future! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680 Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Cc: Joonas Lahtinen Cc: Daniel Vetter Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_request.c | 14 ++++++++------ drivers/gpu/drm/i915/i915_request.h | 8 ++++++++ 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 09ed48833b54..a492385b2089 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -732,13 +732,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) rq = kmem_cache_alloc(i915->requests, GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN); if (unlikely(!rq)) { + i915_retire_requests(i915); + /* Ratelimit ourselves to prevent oom from malicious clients */ - ret = i915_gem_wait_for_idle(i915, - I915_WAIT_LOCKED | - I915_WAIT_INTERRUPTIBLE, - MAX_SCHEDULE_TIMEOUT); - if (ret) - goto err_unreserve; + rq = i915_gem_active_raw(&ce->ring->timeline->last_request, + &i915->drm.struct_mutex); + if (rq) + cond_synchronize_rcu(rq->rcustate); /* * We've forced the client to stall and catch up with whatever @@ -758,6 +758,8 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) } } + rq->rcustate = get_state_synchronize_rcu(); + INIT_LIST_HEAD(&rq->active_list); rq->i915 = i915; rq->engine = engine; diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 9898301ab7ef..7fa94b024968 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -100,6 +100,14 @@ struct i915_request { struct i915_timeline *timeline; struct intel_signal_node signaling; + /* + * The rcu epoch of when this request was allocated. Used to judiciously + * apply backpressure on future allocations to ensure that under + * mempressure there is sufficient RCU ticks for us to reclaim our + * RCU protected slabs. + */ + unsigned long rcustate; + /* * Fences for the various phases in the request's lifetime. * From patchwork Fri Sep 14 08:00:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10600539 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EF2A0112B for ; Fri, 14 Sep 2018 09:21:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CB2092ADCB for ; Fri, 14 Sep 2018 09:21:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BF6E22ADEC; Fri, 14 Sep 2018 09:21:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 84BD52ADCB for ; Fri, 14 Sep 2018 09:21:04 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D3B006E5DF; Fri, 14 Sep 2018 09:11:51 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id AB2B66E008 for ; Fri, 14 Sep 2018 08:01:59 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 13764967-1500050 for multiple; Fri, 14 Sep 2018 09:00:14 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Fri, 14 Sep 2018 09:00:16 +0100 Message-Id: <20180914080017.30308-2-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.19.0 In-Reply-To: <20180914080017.30308-1-chris@chris-wilson.co.uk> References: <20180914080017.30308-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 2/3] drm/i915: Flush the tasklet when checking for idle X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP In order to reduce latency when checking for idle we kick the tasklet directly. Sometimes this is not enough as it is queued on another cpu and so to improve the accuracy of this idle-check (and so to reduce latency overall by avoiding another pass, or worse declaring a timeout!) wait for the tasklet to complete. References: https://bugs.freedesktop.org/show_bug.cgi?id=107916 Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Cc: Mika Kuoppala Cc: Michel Thierry Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/intel_engine_cs.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index 10cd051ba29e..217ed3ee1cab 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -990,6 +990,9 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) } local_bh_enable(); + /* Otherwise flush the tasklet if it was on another cpu */ + tasklet_unlock_wait(t); + if (READ_ONCE(engine->execlists.active)) return false; } From patchwork Fri Sep 14 08:00:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10600501 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 838F6112B for ; Fri, 14 Sep 2018 09:16:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6227C2A45B for ; Fri, 14 Sep 2018 09:16:15 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 552402B28B; Fri, 14 Sep 2018 09:16:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 092DE2A45B for ; Fri, 14 Sep 2018 09:16:15 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B48826E7CC; Fri, 14 Sep 2018 09:10:44 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id D70506E008 for ; Fri, 14 Sep 2018 08:00:35 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 13764968-1500050 for multiple; Fri, 14 Sep 2018 09:00:15 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Fri, 14 Sep 2018 09:00:17 +0100 Message-Id: <20180914080017.30308-3-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.19.0 In-Reply-To: <20180914080017.30308-1-chris@chris-wilson.co.uk> References: <20180914080017.30308-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 3/3] drm/i915/execlists: Reset CSB pointers on canceling requests (wedging) X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP The prior assumption was that we did not need to reset the CSB on wedging when cancelling the outstanding requests as it would be cleaned up in the subsequent reset prior to restarting the GPU. However, what was not accounted for was that in performing the reset, we would try to process the outstanding CSB entries. If the GPU happened to complete a CS event just as we were performing the cancellation of requests, that event would be kept in the CSB until the reset -- but our bookkeeping was cleared, causing confusion when trying to complete the CS event. v2: Use a sanitize on unwedge to avoid interfering with eio suspend (where we intentionally disable GPU reset). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107925 Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Cc: Joonas Lahtinen Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/i915_gem.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index d6f2bbd6a0dc..c8020719bcfb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3438,6 +3438,9 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915) i915_retire_requests(i915); GEM_BUG_ON(i915->gt.active_requests); + if (!intel_gpu_reset(i915, ALL_ENGINES)) + intel_engines_sanitize(i915); + /* * Undo nop_submit_request. We prevent all new i915 requests from * being queued (by disallowing execbuf whilst wedged) so having