diff mbox

[1/2] drm/i915: Only defer freeing of fence callback when also using the timer

Message ID 20180115090643.26696-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson Jan. 15, 2018, 9:06 a.m. UTC
Without an accompanying timer (for internal fences), we can free the
fence callback immediately as we do not need to employ the RCU barrier
to serialise with the timer. By avoiding the RCU delay, we can avoid the
extra mempressure under heavy inter-engine request utilisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_sw_fence.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Comments

Tvrtko Ursulin Jan. 15, 2018, 10 a.m. UTC | #1
On 15/01/2018 09:06, Chris Wilson wrote:
> Without an accompanying timer (for internal fences), we can free the
> fence callback immediately as we do not need to employ the RCU barrier
> to serialise with the timer. By avoiding the RCU delay, we can avoid the
> extra mempressure under heavy inter-engine request utilisation.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_sw_fence.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 3669f5eeb91e..13021326d777 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -398,7 +398,12 @@ static void dma_i915_sw_fence_wake(struct dma_fence *dma,
>   	if (fence)
>   		i915_sw_fence_complete(fence);
>   
> -	irq_work_queue(&cb->work);
> +	if (cb->dma) {
> +		irq_work_queue(&cb->work);
> +		return;
> +	}
> +
> +	kfree(cb);
>   }
>   
>   static void irq_i915_sw_fence_work(struct irq_work *wrk)
> @@ -437,10 +442,12 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>   	i915_sw_fence_await(fence);
>   
>   	cb->dma = NULL;
> -	timer_setup(&cb->timer, timer_i915_sw_fence_wake, TIMER_IRQSAFE);
> -	init_irq_work(&cb->work, irq_i915_sw_fence_work);
>   	if (timeout) {
>   		cb->dma = dma_fence_get(dma);
> +		init_irq_work(&cb->work, irq_i915_sw_fence_work);
> +
> +		timer_setup(&cb->timer,
> +			    timer_i915_sw_fence_wake, TIMER_IRQSAFE);
>   		mod_timer(&cb->timer, round_jiffies_up(jiffies + timeout));
>   	}
>   
> 

Looks straightforward enough and I can't spot any problems with it. It 
would be good to have mentioned in the commit under which 
tests/benchmarks can the improvement be observed and how.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Chris Wilson Jan. 15, 2018, 10:04 a.m. UTC | #2
Quoting Tvrtko Ursulin (2018-01-15 10:00:48)
> 
> On 15/01/2018 09:06, Chris Wilson wrote:
> > Without an accompanying timer (for internal fences), we can free the
> > fence callback immediately as we do not need to employ the RCU barrier
> > to serialise with the timer. By avoiding the RCU delay, we can avoid the
> > extra mempressure under heavy inter-engine request utilisation.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_sw_fence.c | 13 ++++++++++---
> >   1 file changed, 10 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> > index 3669f5eeb91e..13021326d777 100644
> > --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> > @@ -398,7 +398,12 @@ static void dma_i915_sw_fence_wake(struct dma_fence *dma,
> >       if (fence)
> >               i915_sw_fence_complete(fence);
> >   
> > -     irq_work_queue(&cb->work);
> > +     if (cb->dma) {
> > +             irq_work_queue(&cb->work);
> > +             return;
> > +     }
> > +
> > +     kfree(cb);
> >   }
> >   
> >   static void irq_i915_sw_fence_work(struct irq_work *wrk)
> > @@ -437,10 +442,12 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
> >       i915_sw_fence_await(fence);
> >   
> >       cb->dma = NULL;
> > -     timer_setup(&cb->timer, timer_i915_sw_fence_wake, TIMER_IRQSAFE);
> > -     init_irq_work(&cb->work, irq_i915_sw_fence_work);
> >       if (timeout) {
> >               cb->dma = dma_fence_get(dma);
> > +             init_irq_work(&cb->work, irq_i915_sw_fence_work);
> > +
> > +             timer_setup(&cb->timer,
> > +                         timer_i915_sw_fence_wake, TIMER_IRQSAFE);
> >               mod_timer(&cb->timer, round_jiffies_up(jiffies + timeout));
> >       }
> >   
> > 
> 
> Looks straightforward enough and I can't spot any problems with it. It 
> would be good to have mentioned in the commit under which 
> tests/benchmarks can the improvement be observed and how.

gem_exec_nop/sequential !ivb

(I honestly thought that "inter-engine request utilisation" was
descriptive enough to work out what scenarios were affected.)
-Chris
Chris Wilson Jan. 15, 2018, 12:26 p.m. UTC | #3
Quoting Patchwork (2018-01-15 10:59:15)
> == Series Details ==
> 
> Series: series starting with [1/2] drm/i915: Only defer freeing of fence callback when also using the timer
> URL   : https://patchwork.freedesktop.org/series/36470/
> State : success
> 
> == Summary ==
> 
> Test gem_tiled_swapping:
>         Subgroup non-threaded:
>                 incomplete -> PASS       (shard-hsw) fdo#104218
> Test kms_flip:
>         Subgroup vblank-vs-dpms-suspend-interruptible:
>                 pass       -> INCOMPLETE (shard-hsw) fdo#103540
> Test kms_frontbuffer_tracking:
>         Subgroup fbc-1p-offscren-pri-shrfb-draw-blt:
>                 fail       -> PASS       (shard-snb) fdo#101623
> Test drv_suspend:
>         Subgroup fence-restore-untiled-hibernate:
>                 fail       -> SKIP       (shard-snb) fdo#103375

I have some work to do to improve selftest coverage here, but pushed
nevertheless. Thanks for the review,
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 3669f5eeb91e..13021326d777 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -398,7 +398,12 @@  static void dma_i915_sw_fence_wake(struct dma_fence *dma,
 	if (fence)
 		i915_sw_fence_complete(fence);
 
-	irq_work_queue(&cb->work);
+	if (cb->dma) {
+		irq_work_queue(&cb->work);
+		return;
+	}
+
+	kfree(cb);
 }
 
 static void irq_i915_sw_fence_work(struct irq_work *wrk)
@@ -437,10 +442,12 @@  int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 	i915_sw_fence_await(fence);
 
 	cb->dma = NULL;
-	timer_setup(&cb->timer, timer_i915_sw_fence_wake, TIMER_IRQSAFE);
-	init_irq_work(&cb->work, irq_i915_sw_fence_work);
 	if (timeout) {
 		cb->dma = dma_fence_get(dma);
+		init_irq_work(&cb->work, irq_i915_sw_fence_work);
+
+		timer_setup(&cb->timer,
+			    timer_i915_sw_fence_wake, TIMER_IRQSAFE);
 		mod_timer(&cb->timer, round_jiffies_up(jiffies + timeout));
 	}