
[3/3] drm/i915: add fences to the request struct

Message ID 1417636147-29664-4-git-send-email-jbarnes@virtuousgeek.org (mailing list archive)
State New, archived

Commit Message

Jesse Barnes Dec. 3, 2014, 7:49 p.m. UTC
This simplifies the sync code quite a bit.  I don't think we'll be able
to get away with using the core fence code's seqno support, since we'll
be moving away from simple seqno comparisons with the scheduler and
preemption, but the additional code is pretty minimal anyway, and lets
us add additional debugging as needed, so it's probably fine to keep
either way.

We still need to add support for other rings here; we ought to be able
to do that with the timeline field of the ioctl (which will include
other "rings" like the display flip queue for example).

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_drv.h  |  6 +++
 drivers/gpu/drm/i915/i915_sync.c | 89 +++++++++++++++-------------------------
 2 files changed, 40 insertions(+), 55 deletions(-)

Comments

Chris Wilson Dec. 4, 2014, 9:13 a.m. UTC | #1
On Wed, Dec 03, 2014 at 11:49:07AM -0800, Jesse Barnes wrote:
> This simplifies the sync code quite a bit.  I don't think we'll be able
> to get away with using the core fence code's seqno support, since we'll
> be moving away from simple seqno comparisons with the scheduler and
> preemption, but the additional code is pretty minimal anyway, and lets
> us add additional debugging as needed, so it's probably fine to keep
> either way.
> 
> We still need to add support for other rings here; we ought to be able
> to do that with the timeline field of the ioctl (which will include
> other "rings" like the display flip queue for example).

I am ambivalent about this. I don't think migrating i915_request over to
use the heavyweight fence primitives is a good idea, so I see little value
in embedding the struct fence inside i915_request vs a small bookkeeping
struct referencing i915_request.
-Chris
Daniel Vetter Dec. 4, 2014, 11:05 a.m. UTC | #2
On Thu, Dec 04, 2014 at 09:13:21AM +0000, Chris Wilson wrote:
> On Wed, Dec 03, 2014 at 11:49:07AM -0800, Jesse Barnes wrote:
> > This simplifies the sync code quite a bit.  I don't think we'll be able
> > to get away with using the core fence code's seqno support, since we'll
> > be moving away from simple seqno comparisons with the scheduler and
> > preemption, but the additional code is pretty minimal anyway, and lets
> > us add additional debugging as needed, so it's probably fine to keep
> > either way.
> > 
> > We still need to add support for other rings here; we ought to be able
> > to do that with the timeline field of the ioctl (which will include
> > other "rings" like the display flip queue for example).
> 
> I am ambivalent about this. I don't think migrating i915_request over to
> use the heavyweight fence primitives is a good idea, so I see little value
> in embedding the struct fence inside i915_request vs a small bookkeeping
> struct referencing i915_request.

Which part of struct fence is too heavyweight? And I guess we could always
add a slab or some cache if the allocation overhead is too much. I really
like this conceptually so if there's a concern with overhead I want solid
data for it.
-Daniel
Chris Wilson Dec. 4, 2014, 11:29 a.m. UTC | #3
On Thu, Dec 04, 2014 at 12:05:34PM +0100, Daniel Vetter wrote:
> On Thu, Dec 04, 2014 at 09:13:21AM +0000, Chris Wilson wrote:
> > On Wed, Dec 03, 2014 at 11:49:07AM -0800, Jesse Barnes wrote:
> > > This simplifies the sync code quite a bit.  I don't think we'll be able
> > > to get away with using the core fence code's seqno support, since we'll
> > > be moving away from simple seqno comparisions with the scheduler and
> > > preemption, but the additional code is pretty minimal anyway, and lets
> > > us add additional debugging as needed, so it's probably fine to keep
> > > either way.
> > > 
> > > We still need to add support for other rings here; we ought to be able
> > > to do that with the timeline field of the ioctl (which will include
> > > other "rings" like the display flip queue for example).
> > 
> > I am ambivalent about this. I don't think migrating i915_request over to
> > use the heavyweight fence primitives is a good idea, so I see little value
> > in embedding the struct fence inside i915_request vs a small bookkeeping
> > struct referencing i915_request.
> 
> Which part of struct fence is too heavyweight? And I guess we could always
> add a slab or some cache if the allocation overhead is too much. I really
> like this conceptually so if there's a concern with overhead I want solid
> data for it.

It uses locked atomic operations, which are unnecessary for the very
frequent is-complete checks (due to the nice ordering constraints of x86).

You have me confused. You keep applying patches without solid data to
back up your claims.
-Chris
Daniel Vetter Dec. 4, 2014, 12:58 p.m. UTC | #4
On Thu, Dec 4, 2014 at 12:29 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> It uses locked atomic operations, which are unnecessary for the very
> frequent is-complete checks (due to the nice ordering constraints of x86).

So let's look at the fastpaths:
- fence already signaled: Just a test_bit which amounts to a normal read on x86.
- fence not yet signaled: Just a call into the driver backend.
Obviously we want to keep the lazy coherency trick for i915, but
struct fence already has the concept of lazy coherency: By default
they only signal "somewhen", you need to call ->enable_signalling for
tight guarantees. We can just reuse the book-keeping bit struct fence
already has to compute lazy_coherency. So no additional atomics
compared to our current fastpath.
- signalling a fence: This one indeed did grow a test_and_set_bit, so
a regression in the fastpath (locks will only be taken when the fence
is already in the accurate slowpath mode). But this is only once per
fence so hopefully doesn't hurt. If it does we can fix it by trading
speed for memory and putting the signalling bit into its own word,
resulting in a simple store.

So it all looks good or fixable. The only real issue is that the
slowpath cost is borne by all threads if just one thread goes into the
slowpath. So it's a notch less fair in that case. But in the normal
case, where everyone just checks busyness to manage caches, this
shouldn't happen.

Or did I miss something? In that case we should really try to fix
fences first, since they're meant to be fast in the fastpaths at least
and used universally.

> You have me confused. You keep applying patches without solid data to
> back up your claims.

I guess you mean the ggtt vma first, which imo really just reverts a
premature optimization. But the point of the separate patch (and me
merging it quickly) is to catch regressions and issues. So if we really
need this I'll happily revert the patch (so that we can add the perf
data the first attempt lacked and preserve it in perpetuity).
-Daniel

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 367d95a..2725243 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -49,6 +49,7 @@ 
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
+#include <linux/fence.h>
 
 /* General customization:
  */
@@ -2025,6 +2026,11 @@  struct drm_i915_gem_request {
 	struct drm_i915_file_private *file_priv;
 	/** file_priv list entry for this request */
 	struct list_head client_list;
+
+	/* core fence obj for this request, may be exported */
+	struct fence fence;
+
+	wait_queue_t wait;
 };
 
 void i915_gem_request_free(struct kref *req_ref);
diff --git a/drivers/gpu/drm/i915/i915_sync.c b/drivers/gpu/drm/i915/i915_sync.c
index 4741b0c..ea54b34 100644
--- a/drivers/gpu/drm/i915/i915_sync.c
+++ b/drivers/gpu/drm/i915/i915_sync.c
@@ -50,25 +50,10 @@  static spinlock_t fence_lock;
  * with other Android timelines and aggregated into sync_fences, etc.
  *
  * TODO:
- *   rebase on top of Chris's seqno/request stuff and use requests
  *   allow non-RCS fences (need ring/context association)
  */
 
-struct i915_fence {
-	struct fence base;
-	struct intel_engine_cs *ring;
-	struct intel_context *ctx;
-	wait_queue_t wait;
-	struct drm_i915_gem_request *request;
-};
-
-#define to_intel_fence(x) container_of(x, struct i915_fence, base)
-
-int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
-		 unsigned reset_counter,
-		 bool interruptible,
-		 struct timespec *timeout,
-		 struct drm_i915_file_private *file_priv);
+#define to_i915_request(x) container_of(x, struct drm_i915_gem_request, fence)
 
 static const char *i915_fence_get_driver_name(struct fence *fence)
 {
@@ -77,24 +62,24 @@  static const char *i915_fence_get_driver_name(struct fence *fence)
 
 static const char *i915_fence_get_timeline_name(struct fence *fence)
 {
-	struct i915_fence *intel_fence = to_intel_fence(fence);
+	struct drm_i915_gem_request *req = to_i915_request(fence);
 
-	return intel_fence->ring->name;
+	return req->ring->name;
 }
 
 static int i915_fence_check(wait_queue_t *wait, unsigned mode, int flags,
 			    void *key)
 {
-	struct i915_fence *intel_fence = wait->private;
-	struct intel_engine_cs *ring = intel_fence->ring;
+	struct drm_i915_gem_request *req = wait->private;
+	struct intel_engine_cs *ring = req->ring;
 
-	if (!i915_gem_request_completed(intel_fence->request, false))
+	if (!i915_gem_request_completed(req, false))
 		return 0;
 
-	fence_signal_locked(&intel_fence->base);
+	fence_signal_locked(&req->fence);
 
 	__remove_wait_queue(&ring->irq_queue, wait);
-	fence_put(&intel_fence->base);
+	fence_put(&req->fence);
 	ring->irq_put(ring);
 
 	return 0;
@@ -102,26 +87,26 @@  static int i915_fence_check(wait_queue_t *wait, unsigned mode, int flags,
 
 static bool i915_fence_enable_signaling(struct fence *fence)
 {
-	struct i915_fence *intel_fence = to_intel_fence(fence);
-	struct intel_engine_cs *ring = intel_fence->ring;
+	struct drm_i915_gem_request *req = to_i915_request(fence);
+	struct intel_engine_cs *ring = req->ring;
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	wait_queue_t *wait = &intel_fence->wait;
+	wait_queue_t *wait = &req->wait;
 
 	/* queue fence wait queue on irq queue and get fence */
-	if (i915_gem_request_completed(intel_fence->request, false) ||
+	if (i915_gem_request_completed(req, false) ||
 	    i915_terminally_wedged(&dev_priv->gpu_error))
 		return false;
 
 	if (!ring->irq_get(ring))
 		return false;
 
-	if (i915_gem_request_completed(intel_fence->request, false)) {
+	if (i915_gem_request_completed(req, false)) {
 		ring->irq_put(ring);
 		return true;
 	}
 
 	wait->flags = 0;
-	wait->private = intel_fence;
+	wait->private = req;
 	wait->func = i915_fence_check;
 
 	__add_wait_queue(&ring->irq_queue, wait);
@@ -132,22 +117,22 @@  static bool i915_fence_enable_signaling(struct fence *fence)
 
 static bool i915_fence_signaled(struct fence *fence)
 {
-	struct i915_fence *intel_fence = to_intel_fence(fence);
+	struct drm_i915_gem_request *req = to_i915_request(fence);
 
-	return i915_gem_request_completed(intel_fence->request, false);
+	return i915_gem_request_completed(req, false);
 }
 
 static signed long i915_fence_wait(struct fence *fence, bool intr,
 				   signed long timeout_js)
 {
-	struct i915_fence *intel_fence = to_intel_fence(fence);
-	struct drm_i915_private *dev_priv = intel_fence->ring->dev->dev_private;
+	struct drm_i915_gem_request *req = to_i915_request(fence);
+	struct drm_i915_private *dev_priv = req->ring->dev->dev_private;
 	int ret;
 	s64 timeout;
 
 	timeout = jiffies_to_nsecs(timeout_js);
 
-	ret = __i915_wait_request(intel_fence->request,
+	ret = __i915_wait_request(req,
 				  atomic_read(&dev_priv->gpu_error.reset_counter),
 				  intr, &timeout, NULL);
 	if (ret == -ETIME)
@@ -159,29 +144,29 @@  static signed long i915_fence_wait(struct fence *fence, bool intr,
 static int i915_fence_fill_driver_data(struct fence *fence, void *data,
 				      int size)
 {
-	struct i915_fence *intel_fence = to_intel_fence(fence);
+	struct drm_i915_gem_request *req = to_i915_request(fence);
 
-	if (size < sizeof(intel_fence->request->seqno))
+	if (size < sizeof(req->seqno))
 		return -ENOMEM;
 
-	memcpy(data, &intel_fence->request->seqno,
-	       sizeof(intel_fence->request->seqno));
+	memcpy(data, &req->seqno,
+	       sizeof(req->seqno));
 
-	return sizeof(intel_fence->request->seqno);
+	return sizeof(req->seqno);
 }
 
 static void i915_fence_value_str(struct fence *fence, char *str, int size)
 {
-	struct i915_fence *intel_fence = to_intel_fence(fence);
+	struct drm_i915_gem_request *req = to_i915_request(fence);
 
-	snprintf(str, size, "%u", intel_fence->request->seqno);
+	snprintf(str, size, "%u", req->seqno);
 }
 
 static void i915_fence_timeline_value_str(struct fence *fence, char *str,
 					  int size)
 {
-	struct i915_fence *intel_fence = to_intel_fence(fence);
-	struct intel_engine_cs *ring = intel_fence->ring;
+	struct drm_i915_gem_request *req = to_i915_request(fence);
+	struct intel_engine_cs *ring = req->ring;
 
 	snprintf(str, size, "%u", ring->get_seqno(ring, false));
 }
@@ -200,27 +185,21 @@  static struct fence_ops i915_fence_ops = {
 static struct fence *i915_fence_create(struct intel_engine_cs *ring,
 				       struct intel_context *ctx)
 {
-	struct i915_fence *fence;
+	struct drm_i915_gem_request *req;
 	int ret;
 
-	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
-	if (!fence)
-		return NULL;
-
 	ret = ring->add_request(ring);
 	if (ret) {
 		DRM_ERROR("add_request failed\n");
-		fence_free((struct fence *)fence);
 		return NULL;
 	}
 
-	fence->ring = ring;
-	fence->ctx = ctx;
-	fence->request = ring->outstanding_lazy_request;
-	fence_init(&fence->base, &i915_fence_ops, &fence_lock, ctx->user_handle,
-		   fence->request->seqno);
+	req = ring->outstanding_lazy_request;
+
+	fence_init(&req->fence, &i915_fence_ops, &fence_lock,
+		   ctx->user_handle, req->seqno);
 
-	return &fence->base;
+	return &req->fence;
 }
 
 /**