[4/4] drm/i915: Remove the spin-request during execbuf await_request

Message ID 20170605102619.4679-4-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson June 5, 2017, 10:26 a.m. UTC
Originally we would enable and disable the breadcrumb interrupt
immediately on demand. This was slow enough to have a large impact
(>30%) on tasks that hopped between engines. However, by using a shadow
to keep the irq alive for an extra interrupt (see commit 67b807a89230
("drm/i915: Delay disabling the user interrupt for breadcrumbs")) and
by recently reducing the cost in adding ourselves to the signal tree, we
no longer need to spin-request during await_request to avoid delays in
throughput tests. Without the earlier patches to stop the wakeup when
signaling if the irq was already active, we saw no improvement in
execbuf overhead (and corresponding contention in other clients) despite
the removal of the spinner in a simple test like glxgears. This means
that will be scenarios where now we spend longer enabling the interrupt
than we would have spent spinning, but these are not likely to have as
noticeable an impact as the high frequency test cases (where there
should not be any regression).

Ulterior motive: generalising the engine->sync_to to handle different
types of semaphores and non-semaphores.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)
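
To make the behavioural change concrete, here is a minimal user-space sketch of the decision taken for engines without semaphore.sync_to, before and after this patch. It is not the driver code: struct model_request, polls_until_done, await_old() and await_new() are hypothetical names invented for illustration, and the bounded busy-wait of __i915_spin_request() is modelled as a simple poll budget.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request; not struct drm_i915_gem_request. */
struct model_request {
	bool started;                  /* has the modelled hardware begun executing it? */
	unsigned int polls_until_done; /* polls needed before it reads as complete */
};

enum await_result {
	AWAIT_NOTHING, /* request seen complete while spinning, nothing to wait on */
	AWAIT_FENCE,   /* fall through to the interrupt-driven dma-fence wait */
};

/*
 * Old behaviour (assumed model): if the request has started, poll it for a
 * short, bounded time, and only set up the fence wait if that fails.
 */
static enum await_result await_old(struct model_request *from,
				   unsigned int spin_budget)
{
	if (!from->started)
		return AWAIT_FENCE;

	while (spin_budget--) {
		if (from->polls_until_done == 0)
			return AWAIT_NOTHING;
		from->polls_until_done--;
	}

	return AWAIT_FENCE;
}

/* New behaviour: never spin, always use the fence (and thus the interrupt). */
static enum await_result await_new(struct model_request *from)
{
	(void)from;
	return AWAIT_FENCE;
}
```

In this model a request that completes within the budget used to cost only a few polls, whereas it now always pays for the fence wait. That is the cost being moved from one place to another, not removed.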

Comments

Tvrtko Ursulin June 7, 2017, 10:22 a.m. UTC | #1
On 05/06/2017 11:26, Chris Wilson wrote:
> Originally we would enable and disable the breadcrumb interrupt
> immediately on demand. This was slow enough to have a large impact
> (>30%) on tasks that hopped between engines. However, by using a shadow
> to keep the irq alive for an extra interrupt (see commit 67b807a89230
> ("drm/i915: Delay disabling the user interrupt for breadcrumbs")) and
> by recently reducing the cost in adding ourselves to the signal tree, we
> no longer need to spin-request during await_request to avoid delays in
> throughput tests. Without the earlier patches to stop the wakeup when
> signaling if the irq was already active, we saw no improvement in
> execbuf overhead (and corresponding contention in other clients) despite
> the removal of the spinner in a simple test like glxgears. This means
> that will be scenarios where now we spend longer enabling the interrupt

"There will be" I guess?

> than we would have spent spinning, but these are not likely to have as
> noticeable an impact as the high frequency test cases (where there
> should not be any regression).
> 
> Ulterior motive: generalising the engine->sync_to to handle different
> types of semaphores and non-semaphores.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Oscar Mateo <oscar.mateo@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_request.c | 18 ++++++------------
>   1 file changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 46d869e26b4d..8c59c79cbd8b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -683,7 +683,6 @@ static int
>   i915_gem_request_await_request(struct drm_i915_gem_request *to,
>   			       struct drm_i915_gem_request *from)
>   {
> -	u32 seqno;
>   	int ret;
>   
>   	GEM_BUG_ON(to == from);
> @@ -707,18 +706,14 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
>   		return ret < 0 ? ret : 0;
>   	}
>   
> -	seqno = i915_gem_request_global_seqno(from);
> -	if (!seqno)
> -		goto await_dma_fence;
> +	if (to->engine->semaphore.sync_to) {
> +		u32 seqno;
>   
> -	if (!to->engine->semaphore.sync_to) {
> -		if (!__i915_gem_request_started(from, seqno))
> -			goto await_dma_fence;
> +		GEM_BUG_ON(!from->engine->semaphore.signal);
>   
> -		if (!__i915_spin_request(from, seqno, TASK_INTERRUPTIBLE, 2))
> +		seqno = i915_gem_request_global_seqno(from);
> +		if (!seqno)
>   			goto await_dma_fence;
> -	} else {
> -		GEM_BUG_ON(!from->engine->semaphore.signal);
>   
>   		if (seqno <= to->timeline->global_sync[from->engine->id])
>   			return 0;
> @@ -729,10 +724,9 @@ i915_gem_request_await_request(struct drm_i915_gem_request *to,
>   			return ret;
>   
>   		to->timeline->global_sync[from->engine->id] = seqno;
> +		return 0;
>   	}
>   
> -	return 0;
> -
>   await_dma_fence:
>   	ret = i915_sw_fence_await_dma_fence(&to->submit,
>   					    &from->fence, 0,
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Joonas Lahtinen June 8, 2017, 10:02 a.m. UTC | #2
On Mon, 2017-06-05 at 11:26 +0100, Chris Wilson wrote:
> Originally we would enable and disable the breadcrumb interrupt
> immediately on demand. This was slow enough to have a large impact
> (>30%) on tasks that hopped between engines. However, by using a shadow
> to keep the irq alive for an extra interrupt (see commit 67b807a89230
> ("drm/i915: Delay disabling the user interrupt for breadcrumbs")) and
> by recently reducing the cost in adding ourselves to the signal tree, we
> no longer need to spin-request during await_request to avoid delays in
> throughput tests. Without the earlier patches to stop the wakeup when
> signaling if the irq was already active, we saw no improvement in
> execbuf overhead (and corresponding contention in other clients) despite
> the removal of the spinner in a simple test like glxgears. This means
> that will be scenarios where now we spend longer enabling the interrupt

      ^ there ?          "now where we" ? 

> than we would have spent spinning, but these are not likely to have as
> noticeable an impact as the high frequency test cases (where there
> should not be any regression).
> 
> Ulterior motive: generalising the engine->sync_to to handle different
> types of semaphores and non-semaphores.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Oscar Mateo <oscar.mateo@intel.com>

Does what is described, so code itself is:

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Some Testcase:'s would be cool; without those it's a bit handwavy. Maybe
an Ack from Tvrtko.

Regards, Joonas
Chris Wilson June 8, 2017, 10:07 a.m. UTC | #3
Quoting Joonas Lahtinen (2017-06-08 11:02:40)
> On Mon, 2017-06-05 at 11:26 +0100, Chris Wilson wrote:
> > Originally we would enable and disable the breadcrumb interrupt
> > immediately on demand. This was slow enough to have a large impact
> > (>30%) on tasks that hopped between engines. However, by using a shadow
> > to keep the irq alive for an extra interrupt (see commit 67b807a89230
> > ("drm/i915: Delay disabling the user interrupt for breadcrumbs")) and
> > by recently reducing the cost in adding ourselves to the signal tree, we
> > no longer need to spin-request during await_request to avoid delays in
> > throughput tests. Without the earlier patches to stop the wakeup when
> > signaling if the irq was already active, we saw no improvement in
> > execbuf overhead (and corresponding contention in other clients) despite
> > the removal of the spinner in a simple test like glxgears. This means
> > that will be scenarios where now we spend longer enabling the interrupt
> 
>       ^ there ?          "now where we" ? 
> 
> > than we would have spent spinning, but these are not likely to have as
> > noticeable an impact as the high frequency test cases (where there
> > should not be any regression).
> > 
> > Ulterior motive: generalising the engine->sync_to to handle different
> > types of semaphores and non-semaphores.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Oscar Mateo <oscar.mateo@intel.com>
> 
> Does what is described, so code itself is:
> 
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> 
> Some Testcase:'s would be cool; without those it's a bit handwavy. Maybe
> an Ack from Tvrtko.

Testcase for what though? It doesn't fix or improve anything, it just
moves the overhead of the breadcrumb from one column to another.

And we are sorely lacking performance tests in igt. Correctness-wise,
there are many that cover this await_request.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 46d869e26b4d..8c59c79cbd8b 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -683,7 +683,6 @@  static int
 i915_gem_request_await_request(struct drm_i915_gem_request *to,
 			       struct drm_i915_gem_request *from)
 {
-	u32 seqno;
 	int ret;
 
 	GEM_BUG_ON(to == from);
@@ -707,18 +706,14 @@  i915_gem_request_await_request(struct drm_i915_gem_request *to,
 		return ret < 0 ? ret : 0;
 	}
 
-	seqno = i915_gem_request_global_seqno(from);
-	if (!seqno)
-		goto await_dma_fence;
+	if (to->engine->semaphore.sync_to) {
+		u32 seqno;
 
-	if (!to->engine->semaphore.sync_to) {
-		if (!__i915_gem_request_started(from, seqno))
-			goto await_dma_fence;
+		GEM_BUG_ON(!from->engine->semaphore.signal);
 
-		if (!__i915_spin_request(from, seqno, TASK_INTERRUPTIBLE, 2))
+		seqno = i915_gem_request_global_seqno(from);
+		if (!seqno)
 			goto await_dma_fence;
-	} else {
-		GEM_BUG_ON(!from->engine->semaphore.signal);
 
 		if (seqno <= to->timeline->global_sync[from->engine->id])
 			return 0;
@@ -729,10 +724,9 @@  i915_gem_request_await_request(struct drm_i915_gem_request *to,
 			return ret;
 
 		to->timeline->global_sync[from->engine->id] = seqno;
+		return 0;
 	}
 
-	return 0;
-
 await_dma_fence:
 	ret = i915_sw_fence_await_dma_fence(&to->submit,
 					    &from->fence, 0,
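
The semaphore branch kept by this patch skips the sync when the target timeline has already synchronised with an equal or later seqno from the same source engine. A minimal user-space sketch of that bookkeeping follows. struct model_timeline, MODEL_NUM_ENGINES and model_sync_to() are hypothetical names for illustration only; the real code first sends a zero seqno to await_dma_fence, and its sync_to() call can fail, neither of which is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_ENGINES 4 /* assumption: arbitrary engine count for the sketch */

/* Hypothetical stand-in for the per-timeline tracking used in the patch. */
struct model_timeline {
	/* Highest seqno from each source engine we have already waited upon. */
	uint32_t global_sync[MODEL_NUM_ENGINES];
};

/*
 * Returns true if a new semaphore wait would have to be emitted, false if an
 * earlier wait on the same engine already covers this seqno.
 */
static bool model_sync_to(struct model_timeline *tl,
			  unsigned int from_engine, uint32_t seqno)
{
	assert(from_engine < MODEL_NUM_ENGINES);

	/* Same plain comparison as in the diff: already ordered after seqno. */
	if (seqno <= tl->global_sync[from_engine])
		return false;

	/* The driver would emit the semaphore wait here before recording it. */
	tl->global_sync[from_engine] = seqno;
	return true;
}
```

Waiting on seqno 10 from one engine therefore makes later waits on seqno 7 or 10 from that engine free, while a wait on another engine is tracked separately.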