
RFC drm/i915: Remove (struct_mutex) locking for wait-ioctl

Message ID 1450869563-23892-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Dec. 23, 2015, 11:19 a.m. UTC
With a bit of care (and leniency) we can iterate over the object and
wait for previous rendering to complete with judicious use of atomic
reference counting. The ABI requires us to ensure that an active object
is eventually flushed (like the busy-ioctl), which is guaranteed by our
management of requests (i.e. everything that is submitted to hardware is
flushed in the same request). All we have to do is ensure that we can
detect when the requests are complete, so that we can report the object
as idle (without triggering ETIME) - this is handled by
__i915_wait_request.
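
Concretely, the wait path then reduces to roughly the following (an
annotated sketch of the first hunk of the patch below; the handle lookup
and the early-out for an idle object are omitted, and the comments are
mine rather than part of the patch):

	for (i = 0; i < I915_NUM_RINGS; i++) {
		struct drm_i915_gem_request *req;

reload:
		/* Racy snapshot of this engine's last read request. */
		req = READ_ONCE(obj->last_read[i].request);
		if (req == NULL)
			continue; /* this engine is already idle */

		/* Pin the request; this fails if it is already being
		 * freed, in which case re-read the updated pointer.
		 */
		req = i915_gem_request_get_rcu(req);
		if (req == NULL)
			goto reload;

		/* Returns immediately if the request has already
		 * completed, otherwise sleeps for up to the remaining
		 * user-supplied timeout.
		 */
		ret = __i915_wait_request(req, true,
					  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
					  to_rps_client(file));
		i915_gem_request_put(req);
		if (ret)
			break;
	}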

The biggest danger in the code is walking the object without holding any
locks. We iterate over the set of last requests and carefully grab a
reference to each. (If the set is changing beneath us, that is the usual
userspace race, and even with locking you get the same indeterminate
results.) If a request is unreferenced beneath us, it will be disposed
of into the request cache - so we have to carefully order the retrieval
of the request pointer against its removal.
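
The write side of that ordering is what the i915_gem_request.c/.h hunks
below change - the tracked pointer is now updated with smp_store_mb()
and only after the list bookkeeping - roughly (again, comments mine):

	/* Retire (under struct_mutex): clear the tracked pointer, with a
	 * full barrier after the store, before the request goes on to be
	 * unreferenced and recycled into the request cache.
	 */
	INIT_LIST_HEAD(&active->link);
	smp_store_mb(active->request, NULL);

	/* Mark active (under struct_mutex): move the tracking link first,
	 * then publish the pointer that the lockless reader snapshots.
	 */
	list_move(&active->link, &request->active_list);
	smp_store_mb(active->request, request);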

The impact of this is actually quite small - the return to userspace
following the wait was already lockless. What we achieve here is
completing an already finished wait without hitting the struct_mutex;
our hold on it is quite short, so we are typically just a victim of
contention rather than a cause.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         | 53 +++++++++++----------------------
 drivers/gpu/drm/i915/i915_gem_request.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_request.h |  8 ++++-
 3 files changed, 26 insertions(+), 37 deletions(-)

Comments

Chris Wilson Dec. 23, 2015, 11:32 a.m. UTC | #1

Some food for thought. I would especially like someone to poke holes in
the racy pointer lookup and check the store mb() versus the
rcu-reference.

The same technique would also apply to the busy-ioctl, and that is about
the limit of its applicability. Still, isolating those two ioctls from
struct_mutex contention, and from having to do a flush-active, should be
beneficial system-wide.
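
For reference, fence_get_rcu() - which the new i915_gem_request_get_rcu()
below wraps - is just a kref_get_unless_zero() on the fence refcount,
i.e. roughly the following (per upstream include/linux/fence.h), so it
returns NULL once the request has dropped its last reference instead of
resurrecting it; that is the property the reload loop in the wait-ioctl
relies on:

	static inline struct fence *fence_get_rcu(struct fence *fence)
	{
		if (kref_get_unless_zero(&fence->refcount))
			return fence;
		else
			return NULL;
	}
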
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c169574758d5..33de69eff93a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2414,57 +2414,40 @@  i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_i915_gem_wait *args = data;
 	struct drm_i915_gem_object *obj;
-	struct drm_i915_gem_request *req[I915_NUM_RINGS];
-	int i, n = 0;
-	int ret;
+	int i, ret = 0;
 
 	if (args->flags != 0)
 		return -EINVAL;
 
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
-
 	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->bo_handle));
-	if (&obj->base == NULL) {
-		mutex_unlock(&dev->struct_mutex);
+	if (&obj->base == NULL)
 		return -ENOENT;
-	}
-
-	/* Need to make sure the object gets inactive eventually. */
-	ret = i915_gem_object_flush_active(obj);
-	if (ret)
-		goto out;
 
 	if (!obj->active)
 		goto out;
 
-	/* Do this after OLR check to make sure we make forward progress polling
-	 * on this IOCTL with a timeout == 0 (like busy ioctl)
-	 */
-	if (args->timeout_ns == 0) {
-		ret = -ETIME;
-		goto out;
-	}
-
 	for (i = 0; i < I915_NUM_RINGS; i++) {
-		if (obj->last_read[i].request == NULL)
+		struct drm_i915_gem_request *req;
+
+reload:
+		req = READ_ONCE(obj->last_read[i].request);
+		if (req == NULL)
 			continue;
 
-		req[n++] = i915_gem_request_get(obj->last_read[i].request);
+		req = i915_gem_request_get_rcu(req);
+		if (req == NULL)
+			goto reload;
+
+		ret = __i915_wait_request(req, true,
+					  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
+					  to_rps_client(file));
+		i915_gem_request_put(req);
+		if (ret)
+			goto out;
 	}
 
 out:
-	drm_gem_object_unreference(&obj->base);
-	mutex_unlock(&dev->struct_mutex);
-
-	for (i = 0; i < n; i++) {
-		if (ret == 0)
-			ret = __i915_wait_request(req[i], true,
-						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
-						  to_rps_client(file));
-		i915_gem_request_put(req[i]);
-	}
+	drm_gem_object_unreference_unlocked(&obj->base);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 4e087143b1d2..a1efaf3063f2 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -335,7 +335,7 @@  static void __i915_gem_request_retire_active(struct drm_i915_gem_request *req)
 		 * and pass along the auxiliary information.
 		 */
 		INIT_LIST_HEAD(&active->link);
-		active->request = NULL;
+		smp_store_mb(active->request, NULL);
 
 		active->retire(active, req);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index fed2abaa906e..5ab91cd32042 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -139,6 +139,12 @@  i915_gem_request_get(struct drm_i915_gem_request *req)
 	return to_request(fence_get(&req->fence));
 }
 
+static inline struct drm_i915_gem_request *
+i915_gem_request_get_rcu(struct drm_i915_gem_request *req)
+{
+	return to_request(fence_get_rcu(&req->fence));
+}
+
 static inline void
 i915_gem_request_put(struct drm_i915_gem_request *req)
 {
@@ -242,8 +248,8 @@  static inline void
 i915_gem_request_mark_active(struct drm_i915_gem_request *request,
 			     struct i915_gem_active *active)
 {
-	active->request = request;
 	list_move(&active->link, &request->active_list);
+	smp_store_mb(active->request, request);
 }
 
 #endif /* I915_GEM_REQUEST_H */