From patchwork Mon Jan 11 10:44:57 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson <chris@chris-wilson.co.uk>
X-Patchwork-Id: 8001591
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
X-Original-To: patchwork-intel-gfx@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.136])
	by patchwork2.web.kernel.org (Postfix) with ESMTP id 3128CBEEE5
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Mon, 11 Jan 2016 10:50:15 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 20FCC2028D
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Mon, 11 Jan 2016 10:50:14 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	by mail.kernel.org (Postfix) with ESMTP id 143F12024D
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Mon, 11 Jan 2016 10:50:13 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 0D8856E3F5;
	Mon, 11 Jan 2016 02:50:10 -0800 (PST)
X-Original-To: intel-gfx@lists.freedesktop.org
Delivered-To: intel-gfx@lists.freedesktop.org
Received: from mail-wm0-f67.google.com (mail-wm0-f67.google.com
	[74.125.82.67])
	by gabe.freedesktop.org (Postfix) with ESMTPS id E7D7C6E2CB
	for <intel-gfx@lists.freedesktop.org>;
	Mon, 11 Jan 2016 02:46:57 -0800 (PST)
Received: by mail-wm0-f67.google.com with SMTP id f206so25662064wmf.2
	for <intel-gfx@lists.freedesktop.org>;
	Mon, 11 Jan 2016 02:46:57 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=sender:from:to:cc:subject:date:message-id:in-reply-to:references;
	bh=4ZvFnl4stoAiipUfe6Yxd5rt9elfElTzeD76XiicjLE=;
	b=AJsXulEknRKLYDsUFMmOkdmBn2FdojjDKko0F/wBdXwi+Stgc9Hbe8DqfceQijEpEy
	31fahLS1UBqzTAx0NALcgFVtk+2EgrjlTN1cAq+MyC/6PY1LfI/4FIivAgqP+l7m8RtF
	LlpnWQApN9sPvVzP1p890fHpKp6uY81ag9MHVxv5wt2XRbzdG/b7Se5mwcHBqXcDPD3l
	QaUo9aUPB7wWe4ui+m+wADSgCNUdyrH5FDHimqjnmw2VEiqMVPe57dDTZy4t2xE2C8O/
	B1TWup84iBUMybk9mK+KvnrCcRERw1reQ7k/NhN/fDrLSo3ol07vhrQW3IWOoEohJ/j0
	E10A==
X-Received: by 10.194.133.164 with SMTP id pd4mr2605577wjb.133.1452509216748;
	Mon, 11 Jan 2016 02:46:56 -0800 (PST)
Received: from haswell.alporthouse.com ([78.156.65.138])
	by smtp.gmail.com with ESMTPSA id
	t3sm118879383wjz.11.2016.01.11.02.46.55
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
	Mon, 11 Jan 2016 02:46:55 -0800 (PST)
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Mon, 11 Jan 2016 10:44:57 +0000
Message-Id: <1452509174-16671-27-git-send-email-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.7.0.rc3
In-Reply-To: <1452509174-16671-1-git-send-email-chris@chris-wilson.co.uk>
References: <1452503961-14837-1-git-send-email-chris@chris-wilson.co.uk>
	<1452509174-16671-1-git-send-email-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 113/190] drm/i915: Enable lockless lookup of
	request tracking via RCU
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Intel graphics driver community testing & development
	<intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
MIME-Version: 1.0
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	RCVD_IN_DNSWL_MED,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY
	autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.

However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second - with a quick insertion of a synchronize_rcu() inside our
shrinker callback, that issue disappears.

v2: Currently, it is our responsibility to handle reclaim i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests - both are nondeterministic.)

Paul E. McKenney wrote:

Another approach is synchronize_rcu() after some largish number of
requests.  The advantage of this approach is that it throttles the
production of callbacks at the source.  The corresponding disadvantage
is that it slows things up.

Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it.  Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair.  The
idea is to do something like this:

        cond_synchronize_rcu(cookie);
        cookie = get_state_synchronize_rcu();

You would of course do an initial get_state_synchronize_rcu() to
get things going.  This would not block unless there was less than
one grace period's worth of time between invocations.  But this
assumes a busy system, where there is almost always a grace period
in flight.  But you can make that happen as follows:

        cond_synchronize_rcu(cookie);
        cookie = get_state_synchronize_rcu();
        call_rcu(&my_rcu_head, noop_function);

Note that you need additional code to make sure that the old callback
has completed before doing a new one.  Setting and clearing a flag
with appropriate memory ordering control suffices (e.g,. smp_load_acquire()
and smp_store_release()).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c          |  3 ++-
 drivers/gpu/drm/i915/i915_gem_request.c  |  2 +-
 drivers/gpu/drm/i915/i915_gem_request.h  | 24 +++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 15 +++++++++++----
 4 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6712ecf1239b..ee715558ecea 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4215,7 +4215,8 @@ i915_gem_load(struct drm_device *dev)
 	dev_priv->requests =
 		kmem_cache_create("i915_gem_request",
 				  sizeof(struct drm_i915_gem_request), 0,
-				  SLAB_HWCACHE_ALIGN,
+				  SLAB_HWCACHE_ALIGN |
+				  SLAB_DESTROY_BY_RCU,
 				  NULL);
 
 	INIT_LIST_HEAD(&dev_priv->context_list);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 61be8dda4a14..be24bde2e602 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -336,7 +336,7 @@ static void __i915_gem_request_retire_active(struct drm_i915_gem_request *req)
 	 */
 	list_for_each_entry_safe(active, next, &req->active_list, link) {
 		INIT_LIST_HEAD(&active->link);
-		active->request = NULL;
+		rcu_assign_pointer(active->request, NULL);
 
 		active->retire(active, req);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index c2e83584f8a2..f035db7c97cd 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -138,6 +138,12 @@ i915_gem_request_get(struct drm_i915_gem_request *req)
 	return to_request(fence_get(&req->fence));
 }
 
+static inline struct drm_i915_gem_request *
+i915_gem_request_get_rcu(struct drm_i915_gem_request *req)
+{
+	return to_request(fence_get_rcu(&req->fence));
+}
+
 static inline void
 i915_gem_request_put(struct drm_i915_gem_request *req)
 {
@@ -242,7 +248,23 @@ i915_gem_request_mark_active(struct drm_i915_gem_request *request,
 			     struct i915_gem_active *active)
 {
 	list_move(&active->link, &request->active_list);
-	active->request = request;
+	rcu_assign_pointer(active->request, request);
+}
+
+static inline struct drm_i915_gem_request *
+i915_gem_active_get_request_rcu(struct i915_gem_active *active)
+{
+	do {
+		struct drm_i915_gem_request *request;
+
+		request = rcu_dereference(active->request);
+		if (request == NULL)
+			return NULL;
+
+		request = i915_gem_request_get_rcu(request);
+		if (request)
+			return request;
+	} while (1);
 }
 
 #endif /* I915_GEM_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 4d44def8fb03..ee689c373a91 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -171,6 +171,8 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 	}
 
 	i915_gem_retire_requests(dev_priv->dev);
+	/* expedite the RCU grace period to free some request slabs */
+	synchronize_rcu_expedited();
 
 	return count;
 }
@@ -191,10 +193,15 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
  */
 unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv)
 {
-	return i915_gem_shrink(dev_priv, -1UL,
-			       I915_SHRINK_BOUND |
-			       I915_SHRINK_UNBOUND |
-			       I915_SHRINK_ACTIVE);
+	unsigned long freed;
+
+	freed = i915_gem_shrink(dev_priv, -1UL,
+				I915_SHRINK_BOUND |
+				I915_SHRINK_UNBOUND |
+				I915_SHRINK_ACTIVE);
+	rcu_barrier(); /* wait until our RCU delayed slab frees are completed */
+
+	return freed;
 }
 
 static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)