Message ID | 20180115090643.26696-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 15/01/2018 09:06, Chris Wilson wrote: > Without an accompanying timer (for internal fences), we can free the > fence callback immediately as we do not need to employ the RCU barrier > to serialise with the timer. By avoiding the RCU delay, we can avoid the > extra mempressure under heavy inter-engine request utilisation. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > --- > drivers/gpu/drm/i915/i915_sw_fence.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c > index 3669f5eeb91e..13021326d777 100644 > --- a/drivers/gpu/drm/i915/i915_sw_fence.c > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c > @@ -398,7 +398,12 @@ static void dma_i915_sw_fence_wake(struct dma_fence *dma, > if (fence) > i915_sw_fence_complete(fence); > > - irq_work_queue(&cb->work); > + if (cb->dma) { > + irq_work_queue(&cb->work); > + return; > + } > + > + kfree(cb); > } > > static void irq_i915_sw_fence_work(struct irq_work *wrk) > @@ -437,10 +442,12 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence, > i915_sw_fence_await(fence); > > cb->dma = NULL; > - timer_setup(&cb->timer, timer_i915_sw_fence_wake, TIMER_IRQSAFE); > - init_irq_work(&cb->work, irq_i915_sw_fence_work); > if (timeout) { > cb->dma = dma_fence_get(dma); > + init_irq_work(&cb->work, irq_i915_sw_fence_work); > + > + timer_setup(&cb->timer, > + timer_i915_sw_fence_wake, TIMER_IRQSAFE); > mod_timer(&cb->timer, round_jiffies_up(jiffies + timeout)); > } > > Looks straightforward enough and I can't spot any problems with it. It would be good to have mentioned in the commit under which tests/benchmarks can the improvement be observed and how. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Regards, Tvrtko
Quoting Tvrtko Ursulin (2018-01-15 10:00:48) > > On 15/01/2018 09:06, Chris Wilson wrote: > > Without an accompanying timer (for internal fences), we can free the > > fence callback immediately as we do not need to employ the RCU barrier > > to serialise with the timer. By avoiding the RCU delay, we can avoid the > > extra mempressure under heavy inter-engine request utilisation. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > --- > > drivers/gpu/drm/i915/i915_sw_fence.c | 13 ++++++++++--- > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c > > index 3669f5eeb91e..13021326d777 100644 > > --- a/drivers/gpu/drm/i915/i915_sw_fence.c > > +++ b/drivers/gpu/drm/i915/i915_sw_fence.c > > @@ -398,7 +398,12 @@ static void dma_i915_sw_fence_wake(struct dma_fence *dma, > > if (fence) > > i915_sw_fence_complete(fence); > > > > - irq_work_queue(&cb->work); > > + if (cb->dma) { > > + irq_work_queue(&cb->work); > > + return; > > + } > > + > > + kfree(cb); > > } > > > > static void irq_i915_sw_fence_work(struct irq_work *wrk) > > @@ -437,10 +442,12 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence, > > i915_sw_fence_await(fence); > > > > cb->dma = NULL; > > - timer_setup(&cb->timer, timer_i915_sw_fence_wake, TIMER_IRQSAFE); > > - init_irq_work(&cb->work, irq_i915_sw_fence_work); > > if (timeout) { > > cb->dma = dma_fence_get(dma); > > + init_irq_work(&cb->work, irq_i915_sw_fence_work); > > + > > + timer_setup(&cb->timer, > > + timer_i915_sw_fence_wake, TIMER_IRQSAFE); > > mod_timer(&cb->timer, round_jiffies_up(jiffies + timeout)); > > } > > > > > > Looks straightforward enough and I can't spot any problems with it. It > would be good to have mentioned in the commit under which > tests/benchmarks can the improvement be observed and how. gem_exec_nop/sequential !ivb (I honestly thought that "inter-engine request utilisation" was descriptive enough to work out what scenarios were affected.) -Chris
Quoting Patchwork (2018-01-15 10:59:15) > == Series Details == > > Series: series starting with [1/2] drm/i915: Only defer freeing of fence callback when also using the timer > URL : https://patchwork.freedesktop.org/series/36470/ > State : success > > == Summary == > > Test gem_tiled_swapping: > Subgroup non-threaded: > incomplete -> PASS (shard-hsw) fdo#104218 > Test kms_flip: > Subgroup vblank-vs-dpms-suspend-interruptible: > pass -> INCOMPLETE (shard-hsw) fdo#103540 > Test kms_frontbuffer_tracking: > Subgroup fbc-1p-offscren-pri-shrfb-draw-blt: > fail -> PASS (shard-snb) fdo#101623 > Test drv_suspend: > Subgroup fence-restore-untiled-hibernate: > fail -> SKIP (shard-snb) fdo#103375 I have some work to do to improve selftest coverage here, but pushed nevertheless. Thanks for the review, -Chris
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index 3669f5eeb91e..13021326d777 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -398,7 +398,12 @@ static void dma_i915_sw_fence_wake(struct dma_fence *dma, if (fence) i915_sw_fence_complete(fence); - irq_work_queue(&cb->work); + if (cb->dma) { + irq_work_queue(&cb->work); + return; + } + + kfree(cb); } static void irq_i915_sw_fence_work(struct irq_work *wrk) @@ -437,10 +442,12 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence, i915_sw_fence_await(fence); cb->dma = NULL; - timer_setup(&cb->timer, timer_i915_sw_fence_wake, TIMER_IRQSAFE); - init_irq_work(&cb->work, irq_i915_sw_fence_work); if (timeout) { cb->dma = dma_fence_get(dma); + init_irq_work(&cb->work, irq_i915_sw_fence_work); + + timer_setup(&cb->timer, + timer_i915_sw_fence_wake, TIMER_IRQSAFE); mod_timer(&cb->timer, round_jiffies_up(jiffies + timeout)); }
Without an accompanying timer (for internal fences), we can free the fence callback immediately as we do not need to employ the RCU barrier to serialise with the timer. By avoiding the RCU delay, we can avoid the extra mempressure under heavy inter-engine request utilisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> --- drivers/gpu/drm/i915/i915_sw_fence.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)