
[4/5] drm/i915: Subsume intel_ctx_submit_request into drm_i915_gem_request

Message ID 1415789607-10243-5-git-send-email-nicholas.hoath@intel.com (mailing list archive)
State New, archived

Commit Message

Nick Hoath Nov. 12, 2014, 10:53 a.m. UTC
Move all remaining elements that were unique to execlists queue items
into the associated request.

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Issue: VIZ-4274
---
 drivers/gpu/drm/i915/i915_debugfs.c |  8 +++----
 drivers/gpu/drm/i915/i915_drv.h     | 22 +++++++++++++++++
 drivers/gpu/drm/i915/i915_gem.c     |  6 ++---
 drivers/gpu/drm/i915/intel_lrc.c    | 47 +++++++++++++++++--------------------
 drivers/gpu/drm/i915/intel_lrc.h    | 28 ----------------------
 5 files changed, 50 insertions(+), 61 deletions(-)

Comments

Chris Wilson Nov. 12, 2014, 11:24 a.m. UTC | #1
On Wed, Nov 12, 2014 at 10:53:26AM +0000, Nick Hoath wrote:
  		seq_putc(m, '\n');
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index afa9c35..0fe238c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2027,6 +2027,28 @@ struct drm_i915_gem_request {
>  	struct list_head free_list;
>  
>  	uint32_t uniq;
> +
> +	/**
> +	 * The ELSP only accepts two elements at a time, so we queue context/tail
> +	 * pairs on a given queue (ring->execlist_queue) until the hardware is
> +	 * available. The queue serves a double purpose: we also use it to keep track
> +	 * of the up to 2 contexts currently in the hardware (usually one in execution
> +	 * and the other queued up by the GPU): We only remove elements from the head
> +	 * of the queue when the hardware informs us that an element has been
> +	 * completed.
> +	 *
> +	 * All accesses to the queue are mediated by a spinlock (ring->execlist_lock).
> +	 */
> +
> +	/** Execlist link in the submission queue.*/
> +	struct list_head execlist_link;

This is redundant. The request should only be on one of the pending or active
lists at any time.
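The double-purpose queue described in the quoted kerneldoc comment, where the head is removed only on a completion event so the first (up to) two entries double as the "in hardware" set, can be sketched in plain userspace C. This is a toy model with hypothetical names, not the driver's actual API:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUE 8

/* Toy stand-in for a queued context/tail pair. */
struct submit { int ctx_id; unsigned tail; int in_hw; };

static struct submit queue[MAX_QUEUE];
static size_t q_len;

/* Software submission: append a context/tail pair to the queue. */
static void ctx_queue(int ctx_id, unsigned tail)
{
	assert(q_len < MAX_QUEUE);
	queue[q_len].ctx_id = ctx_id;
	queue[q_len].tail = tail;
	queue[q_len].in_hw = 0;
	q_len++;
}

/* Write the first (up to) two queue entries to the pretend ELSP. */
static void ctx_unqueue(void)
{
	for (size_t i = 0; i < q_len && i < 2; i++)
		queue[i].in_hw = 1;
}

/* Completion interrupt: only now is the head actually removed, and the
 * ELSP is refilled from whatever is next in the queue. */
static void ctx_complete(void)
{
	assert(q_len > 0 && queue[0].in_hw);
	for (size_t i = 1; i < q_len; i++)
		queue[i - 1] = queue[i];
	q_len--;
	ctx_unqueue();
}

static size_t in_hw_count(void)
{
	size_t n = 0;
	for (size_t i = 0; i < q_len; i++)
		n += queue[i].in_hw;
	return n;
}
```

Queued-but-not-submitted entries stay behind the two in-hardware ones, which is exactly why one list can track both states.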

> +	/** Execlists workqueue for processing this request in a bottom half */
> +	struct work_struct work;

For what purpose? This is not needed.

> +	/** Execlists no. of times this request has been sent to the ELSP */
> +	int elsp_submitted;

A request can only be submitted exactly once at any time. This
bookkeeping is not part of the request.

Still not detangled I am afraid.
-Chris
Nick Hoath Nov. 12, 2014, 12:13 p.m. UTC | #2
On 12/11/2014 11:24, Chris Wilson wrote:
> On Wed, Nov 12, 2014 at 10:53:26AM +0000, Nick Hoath wrote:
>    		seq_putc(m, '\n');
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index afa9c35..0fe238c 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2027,6 +2027,28 @@ struct drm_i915_gem_request {
>>   	struct list_head free_list;
>>
>>   	uint32_t uniq;
>> +
>> +	/**
>> +	 * The ELSP only accepts two elements at a time, so we queue context/tail
>> +	 * pairs on a given queue (ring->execlist_queue) until the hardware is
>> +	 * available. The queue serves a double purpose: we also use it to keep track
>> +	 * of the up to 2 contexts currently in the hardware (usually one in execution
>> +	 * and the other queued up by the GPU): We only remove elements from the head
>> +	 * of the queue when the hardware informs us that an element has been
>> +	 * completed.
>> +	 *
>> +	 * All accesses to the queue are mediated by a spinlock (ring->execlist_lock).
>> +	 */
>> +
>> +	/** Execlist link in the submission queue.*/
>> +	struct list_head execlist_link;
>
> This is redundant. The request should only be on one of the pending or active
> lists at any time.
>
This is used by the pending execlist requests list owned by the 
intel_engine_cs. The request isn't in both the active and pending 
execlist engine lists.
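Nick's point, that execlist_link only ever places the request on the engine's pending execlist queue, relies on the kernel's intrusive-list pattern: the link is embedded in the request, and container_of recovers the request from it. A minimal userspace re-implementation (simplified stand-ins, not the kernel's <linux/list.h>):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's intrusive list machinery. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Recover the containing struct from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The request embeds its link, as drm_i915_gem_request.execlist_link does. */
struct request {
	int uniq;
	struct list_head execlist_link;
};

/* Equivalent of the list_first_entry_or_null() calls in the patch. */
static struct request *first_request(struct list_head *queue)
{
	if (queue->next == queue)
		return NULL;
	return container_of(queue->next, struct request, execlist_link);
}
```

Because the link lives inside the request, membership in one queue at a time costs a single pair of pointers and no separate wrapper allocation, which is the point of folding intel_ctx_submit_request into the request.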
>> +	/** Execlists workqueue for processing this request in a bottom half */
>> +	struct work_struct work;
>
> For what purpose? This is not needed.
This worker is currently used to free up execlist requests. This goes 
away when Thomas Daniel's patchset is merged.
I have spotted a bug in the cleanup handler with the merged 
requests/execlists cleanup though.
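The role of that worker, deferring the unreference/free out of the completion interrupt, where struct_mutex cannot be taken, can be modelled with a toy work queue. All names here are hypothetical simplifications of schedule_work() and execlists_free_request_task(), not the kernel's API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy deferred-work item: a callback queued by the interrupt handler
 * and run later in process context (the "bottom half"). */
struct work { void (*fn)(struct work *); struct work *next; };

static struct work *pending;	/* the pretend workqueue */
static int mutex_held;		/* toy struct_mutex */
static int ctx_refs = 1, req_refs = 1;

static void schedule_work(struct work *w) { w->next = pending; pending = w; }

static void run_pending_work(void)	/* bottom-half execution */
{
	while (pending) {
		struct work *w = pending;
		pending = w->next;
		w->fn(w);
	}
}

/* What the free task does: take the mutex (legal here, since we are no
 * longer in the interrupt handler) and drop the references. */
static void free_request_task(struct work *w)
{
	(void)w;
	mutex_held = 1;
	ctx_refs--;
	req_refs--;
	mutex_held = 0;
}

static struct work free_work = { free_request_task, NULL };

/* Completion interrupt: cannot take a sleeping lock, so defer the free. */
static void completion_irq(void)
{
	assert(!mutex_held);
	schedule_work(&free_work);
}
```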
>
>> +	/** Execlists no. of times this request has been sent to the ELSP */
>> +	int elsp_submitted;
>
> A request can only be submitted exactly once at any time. This
> bookkeeping is not part of the request.
This is a refcount to preserve the request if it has been resubmitted
due to preemption or TDR: there is a race between the HW finishing with
the item and the cleanup/resubmission. Have a look at
e1fee72c2ea2e9c0c6e6743d32a6832f21337d6c which contains a much better
description of why this exists.
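The elsp_submitted bookkeeping described here can be sketched as a submission counter: one completion event is expected per past ELSP submission, and only the last event may retire the entry. A toy model with hypothetical names, not the driver's code:

```c
#include <assert.h>

/* A resubmitted request (preemption/TDR) generates one completion event
 * per submission; retiring it on the first event would race with the
 * hardware still referencing it. */
struct req { int elsp_submitted; int retired; };

static void submit_to_elsp(struct req *r)
{
	r->elsp_submitted++;
}

/* Completion handler: drain outstanding submissions; only the last
 * event removes the request from the queue. */
static void completion_event(struct req *r)
{
	assert(r->elsp_submitted > 0);
	if (--r->elsp_submitted == 0)
		r->retired = 1;
}
```

This mirrors the WARN(head_req->elsp_submitted == 0, ...) check in the patch: a completion event for a head request that was never submitted indicates broken bookkeeping.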
>
> Still not detangled I am afraid.
> -Chris
>
Daniel Vetter Dec. 10, 2014, 4:25 p.m. UTC | #3
On Wed, Nov 12, 2014 at 12:13:03PM +0000, Nick Hoath wrote:
> On 12/11/2014 11:24, Chris Wilson wrote:
> >On Wed, Nov 12, 2014 at 10:53:26AM +0000, Nick Hoath wrote:
> >   		seq_putc(m, '\n');
> >>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>index afa9c35..0fe238c 100644
> >>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>@@ -2027,6 +2027,28 @@ struct drm_i915_gem_request {
> >>  	struct list_head free_list;
> >>
> >>  	uint32_t uniq;
> >>+
> >>+	/**
> >>+	 * The ELSP only accepts two elements at a time, so we queue context/tail
> >>+	 * pairs on a given queue (ring->execlist_queue) until the hardware is
> >>+	 * available. The queue serves a double purpose: we also use it to keep track
> >>+	 * of the up to 2 contexts currently in the hardware (usually one in execution
> >>+	 * and the other queued up by the GPU): We only remove elements from the head
> >>+	 * of the queue when the hardware informs us that an element has been
> >>+	 * completed.
> >>+	 *
> >>+	 * All accesses to the queue are mediated by a spinlock (ring->execlist_lock).
> >>+	 */
> >>+
> >>+	/** Execlist link in the submission queue.*/
> >>+	struct list_head execlist_link;
> >
> >This is redundant. The request should only be on one of the pending or active
> >lists at any time.
> >
> This is used by the pending execlist requests list owned by the
> intel_engine_cs. The request isn't in both the active and pending execlist
> engine lists.

I guess we'll eventually need to subsume this into the scheduler's list of
ctxs. Or reuse it there. Imo ok for now, might just need a rename
longterm.

> >>+	/** Execlists workqueue for processing this request in a bottom half */
> >>+	struct work_struct work;
> >
> >For what purpose? This is not needed.
> This worker is currently used to free up execlist requests. This goes away
> when Thomas Daniel's patchset is merged.
> I have spotted a bug in the cleanup handler with the merged
> requests/execlists cleanup though.

I guess this will drop out on a rebase now that everything has landed?

> >>+	/** Execlists no. of times this request has been sent to the ELSP */
> >>+	int elsp_submitted;
> >
> >A request can only be submitted exactly once at any time. This
> >bookkeeping is not part of the request.
> This is a refcount to preserve the request if it has been resubmitted due to
> preemption or TDR: there is a race between the HW finishing with the
> item and the cleanup/resubmission. Have a look at
> e1fee72c2ea2e9c0c6e6743d32a6832f21337d6c which contains a much better
> description of why this exists.

Yeah this one is still a bit crazy but guess that's just how it is ;-)
-Daniel

Patch

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 45da79e..9ce9a02 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1851,7 +1851,7 @@  static int i915_execlists(struct seq_file *m, void *data)
 	intel_runtime_pm_get(dev_priv);
 
 	for_each_ring(ring, dev_priv, ring_id) {
-		struct intel_ctx_submit_request *head_req = NULL;
+		struct drm_i915_gem_request *head_req = NULL;
 		int count = 0;
 		unsigned long flags;
 
@@ -1884,18 +1884,18 @@  static int i915_execlists(struct seq_file *m, void *data)
 		list_for_each(cursor, &ring->execlist_queue)
 			count++;
 		head_req = list_first_entry_or_null(&ring->execlist_queue,
-				struct intel_ctx_submit_request, execlist_link);
+				struct drm_i915_gem_request, execlist_link);
 		spin_unlock_irqrestore(&ring->execlist_lock, flags);
 
 		seq_printf(m, "\t%d requests in queue\n", count);
 		if (head_req) {
 			struct drm_i915_gem_object *ctx_obj;
 
-			ctx_obj = head_req->request->ctx->engine[ring_id].state;
+			ctx_obj = head_req->ctx->engine[ring_id].state;
 			seq_printf(m, "\tHead request id: %u\n",
 				   intel_execlists_ctx_id(ctx_obj));
 			seq_printf(m, "\tHead request tail: %u\n",
-				   head_req->request->tail);
+				   head_req->tail);
 		}
 
 		seq_putc(m, '\n');
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index afa9c35..0fe238c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2027,6 +2027,28 @@  struct drm_i915_gem_request {
 	struct list_head free_list;
 
 	uint32_t uniq;
+
+	/**
+	 * The ELSP only accepts two elements at a time, so we queue context/tail
+	 * pairs on a given queue (ring->execlist_queue) until the hardware is
+	 * available. The queue serves a double purpose: we also use it to keep track
+	 * of the up to 2 contexts currently in the hardware (usually one in execution
+	 * and the other queued up by the GPU): We only remove elements from the head
+	 * of the queue when the hardware informs us that an element has been
+	 * completed.
+	 *
+	 * All accesses to the queue are mediated by a spinlock (ring->execlist_lock).
+	 */
+
+	/** Execlist link in the submission queue.*/
+	struct list_head execlist_link;
+
+	/** Execlists workqueue for processing this request in a bottom half */
+	struct work_struct work;
+
+	/** Execlists no. of times this request has been sent to the ELSP */
+	int elsp_submitted;
+
 };
 
 void i915_gem_request_free(struct kref *req_ref);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bd5a1e2..4d2d2e5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2696,14 +2696,14 @@  static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	}
 
 	while (!list_empty(&ring->execlist_queue)) {
-		struct intel_ctx_submit_request *submit_req;
+		struct drm_i915_gem_request *submit_req;
 
 		submit_req = list_first_entry(&ring->execlist_queue,
-				struct intel_ctx_submit_request,
+				struct drm_i915_gem_request,
 				execlist_link);
 		list_del(&submit_req->execlist_link);
 		intel_runtime_pm_put(dev_priv);
-		i915_gem_context_unreference(submit_req->request->ctx);
+		i915_gem_context_unreference(submit_req->ctx);
 		kfree(submit_req);
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4bd9572..b6ec012 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -382,8 +382,8 @@  static void execlists_submit_contexts(struct intel_engine_cs *ring,
 
 static void execlists_context_unqueue(struct intel_engine_cs *ring)
 {
-	struct intel_ctx_submit_request *req0 = NULL, *req1 = NULL;
-	struct intel_ctx_submit_request *cursor = NULL, *tmp = NULL;
+	struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
+	struct drm_i915_gem_request *cursor = NULL, *tmp = NULL;
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 
 	assert_spin_locked(&ring->execlist_lock);
@@ -396,7 +396,7 @@  static void execlists_context_unqueue(struct intel_engine_cs *ring)
 				 execlist_link) {
 		if (!req0) {
 			req0 = cursor;
-		} else if (req0->request->ctx == cursor->request->ctx) {
+		} else if (req0->ctx == cursor->ctx) {
 			/* Same ctx: ignore first request, as second request
 			 * will update tail past first request's workload */
 			cursor->elsp_submitted = req0->elsp_submitted;
@@ -411,9 +411,9 @@  static void execlists_context_unqueue(struct intel_engine_cs *ring)
 
 	WARN_ON(req1 && req1->elsp_submitted);
 
-	execlists_submit_contexts(ring, req0->request->ctx, req0->request->tail,
-				  req1 ? req1->request->ctx : NULL,
-				  req1 ? req1->request->tail : 0);
+	execlists_submit_contexts(ring, req0->ctx, req0->tail,
+				  req1 ? req1->ctx : NULL,
+				  req1 ? req1->tail : 0);
 
 	req0->elsp_submitted++;
 	if (req1)
@@ -424,17 +424,17 @@  static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 					   u32 request_id)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	struct intel_ctx_submit_request *head_req;
+	struct drm_i915_gem_request *head_req;
 
 	assert_spin_locked(&ring->execlist_lock);
 
 	head_req = list_first_entry_or_null(&ring->execlist_queue,
-					    struct intel_ctx_submit_request,
+					    struct drm_i915_gem_request,
 					    execlist_link);
 
 	if (head_req != NULL) {
 		struct drm_i915_gem_object *ctx_obj =
-				head_req->request->ctx->engine[ring->id].state;
+				head_req->ctx->engine[ring->id].state;
 		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
 			WARN(head_req->elsp_submitted == 0,
 			     "Never submitted head request\n");
@@ -512,16 +512,16 @@  void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
 
 static void execlists_free_request_task(struct work_struct *work)
 {
-	struct intel_ctx_submit_request *req =
-		container_of(work, struct intel_ctx_submit_request, work);
-	struct drm_device *dev = req->request->ring->dev;
+	struct drm_i915_gem_request *req =
+		container_of(work, struct drm_i915_gem_request, work);
+	struct drm_device *dev = req->ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	intel_runtime_pm_put(dev_priv);
 
 	mutex_lock(&dev->struct_mutex);
-	i915_gem_context_unreference(req->request->ctx);
-	i915_gem_request_unreference(req->request);
+	i915_gem_context_unreference(req->ctx);
+	i915_gem_request_unreference(req);
 	mutex_unlock(&dev->struct_mutex);
 
 	kfree(req);
@@ -532,16 +532,11 @@  static int execlists_context_queue(struct intel_engine_cs *ring,
 				   u32 tail,
 				   struct drm_i915_gem_request *request)
 {
-	struct intel_ctx_submit_request *req = NULL, *cursor;
+	struct drm_i915_gem_request *cursor;
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	unsigned long flags;
 	int num_elements = 0;
 
-	req = kzalloc(sizeof(*req), GFP_KERNEL);
-	if (req == NULL)
-		return -ENOMEM;
-	INIT_WORK(&req->work, execlists_free_request_task);
-
 	if(!request)
 	{
 		/*
@@ -559,9 +554,9 @@  static int execlists_context_queue(struct intel_engine_cs *ring,
 	{
 		WARN_ON(to != request->ctx);
 	}
-	req->request = request;
+	INIT_WORK(&request->work, execlists_free_request_task);
 	i915_gem_request_reference(request);
-	i915_gem_context_reference(req->request->ctx);
+	i915_gem_context_reference(request->ctx);
 
 	intel_runtime_pm_get(dev_priv);
 
@@ -572,13 +567,13 @@  static int execlists_context_queue(struct intel_engine_cs *ring,
 			break;
 
 	if (num_elements > 2) {
-		struct intel_ctx_submit_request *tail_req;
+		struct drm_i915_gem_request *tail_req;
 
 		tail_req = list_last_entry(&ring->execlist_queue,
-					   struct intel_ctx_submit_request,
+					   struct drm_i915_gem_request,
 					   execlist_link);
 
-		if (to == tail_req->request->ctx) {
+		if (to == tail_req->ctx) {
 			WARN(tail_req->elsp_submitted != 0,
 			     "More than 2 already-submitted reqs queued\n");
 			list_del(&tail_req->execlist_link);
@@ -586,7 +581,7 @@  static int execlists_context_queue(struct intel_engine_cs *ring,
 		}
 	}
 
-	list_add_tail(&req->execlist_link, &ring->execlist_queue);
+	list_add_tail(&request->execlist_link, &ring->execlist_queue);
 	if (num_elements == 0)
 		execlists_context_unqueue(ring);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index bc6ff0d..94fcd5c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -84,34 +84,6 @@  int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       u64 exec_start, u32 flags);
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
-/**
- * struct intel_ctx_submit_request - queued context submission request
- * @ctx: Context to submit to the ELSP.
- * @ring: Engine to submit it to.
- * @tail: how far in the context's ringbuffer this request goes to.
- * @execlist_link: link in the submission queue.
- * @work: workqueue for processing this request in a bottom half.
- * @elsp_submitted: no. of times this request has been sent to the ELSP.
- *
- * The ELSP only accepts two elements at a time, so we queue context/tail
- * pairs on a given queue (ring->execlist_queue) until the hardware is
- * available. The queue serves a double purpose: we also use it to keep track
- * of the up to 2 contexts currently in the hardware (usually one in execution
- * and the other queued up by the GPU): We only remove elements from the head
- * of the queue when the hardware informs us that an element has been
- * completed.
- *
- * All accesses to the queue are mediated by a spinlock (ring->execlist_lock).
- */
-struct intel_ctx_submit_request {
-	struct list_head execlist_link;
-	struct work_struct work;
-
-	int elsp_submitted;
-
-	struct drm_i915_gem_request *request;
-};
-
 void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
 
 #endif /* _INTEL_LRC_H_ */