Message ID | 1418312494-22920-1-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Dec 11, 2014 at 03:41:34PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Eliminate six needless spin lock/unlock pairs when writing ELSP.
>
> RFC for now with some #define copy and paste.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Dave Gordon <david.s.gordon@intel.com>

Yeah makes sense. I'm on the fence whether we should do an all-uppercase
conversion of the raw mmio macros, would be a notch more consistent. And
some perf data for this patch would be good, too.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index a82020e..f2f4a28 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -276,6 +276,10 @@ static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj)
>  	return desc;
>  }
>  
> +#define __raw_i915_read32(dev_priv__, reg__) readl((dev_priv__)->regs + (reg__))
> +#define __raw_i915_write32(dev_priv__, reg__, val__) writel(val__, (dev_priv__)->regs + (reg__))
> +#define __raw_posting_read(dev_priv__, reg__) (void)__raw_i915_read32(dev_priv__, reg__)
> +
>  static void execlists_elsp_write(struct intel_engine_cs *ring,
>  				 struct drm_i915_gem_object *ctx_obj0,
>  				 struct drm_i915_gem_object *ctx_obj1)
> @@ -323,19 +327,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  			dev_priv->uncore.funcs.force_wake_get(dev_priv,
>  							      FORCEWAKE_ALL);
>  	}
> -	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
>  
> -	I915_WRITE(RING_ELSP(ring), desc[1]);
> -	I915_WRITE(RING_ELSP(ring), desc[0]);
> -	I915_WRITE(RING_ELSP(ring), desc[3]);
> +	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[1]);
> +	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[0]);
> +	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[3]);
>  	/* The context is automatically loaded after the following */
> -	I915_WRITE(RING_ELSP(ring), desc[2]);
> +	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[2]);
>  
>  	/* ELSP is a wo register, so use another nearby reg for posting instead */
> -	POSTING_READ(RING_EXECLIST_STATUS(ring));
> +	__raw_posting_read(dev_priv, RING_EXECLIST_STATUS(ring));
>  
>  	/* Release Force Wakeup (see the big comment above). */
> -	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
>  	if (IS_CHERRYVIEW(dev) || INTEL_INFO(dev)->gen >= 9) {
>  		if (--dev_priv->uncore.fw_rendercount == 0)
>  			dev_priv->uncore.funcs.force_wake_put(dev_priv,
> -- 
> 2.1.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 12/15/2014 01:06 PM, Daniel Vetter wrote:
> On Thu, Dec 11, 2014 at 03:41:34PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Eliminate six needless spin lock/unlock pairs when writing ELSP.
>>
>> RFC for now with some #define copy and paste.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Dave Gordon <david.s.gordon@intel.com>
>
> Yeah makes sense. I'm on the fence whether we should do an all-uppercase
> conversion of the raw mmio macros, would be a notch more consistent. And
> some perf data for this patch would be good, too.

I know perf data would be good, but I had no time to set up a suitable
platform for testing. This was more of a drive-by, since the code is not
pretty and it annoyed me.

Will see if I can get back to this in the near future.

Regards,

Tvrtko
On 12/15/2014 01:06 PM, Daniel Vetter wrote:
> On Thu, Dec 11, 2014 at 03:41:34PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Eliminate six needless spin lock/unlock pairs when writing ELSP.
>>
>> RFC for now with some #define copy and paste.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Dave Gordon <david.s.gordon@intel.com>
>
> Yeah makes sense. I'm on the fence whether we should do an all-uppercase
> conversion of the raw mmio macros, would be a notch more consistent. And
> some perf data for this patch would be good, too.

With regards to perf data, Ben Widawsky was kind enough to give this patch a
spin on his perf test bed (CHV), on a range of OGL benchmarks.

Apparently only two results have a "confidence t-score" > 95% (statistics is
not my area): bench_OglBatch4 and bench_OglDeferred, which show 0.51% and
0.73% gains respectively.

Looking at just those two, I'd say the patch is worth cleaning up, since it
is a good gain for such a simple change.

Other results range from a 4.29% slowdown (!*) (bench_OglTexFilterAniso) to
a 7.08% gain (bench_OglMultithreaded). The average across all benchmarks is
a 0.38% gain.

Thoughts?

Regards,

Tvrtko

* I can't really understand regressions for some tests?!
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a82020e..f2f4a28 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -276,6 +276,10 @@ static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj)
 	return desc;
 }
 
+#define __raw_i915_read32(dev_priv__, reg__) readl((dev_priv__)->regs + (reg__))
+#define __raw_i915_write32(dev_priv__, reg__, val__) writel(val__, (dev_priv__)->regs + (reg__))
+#define __raw_posting_read(dev_priv__, reg__) (void)__raw_i915_read32(dev_priv__, reg__)
+
 static void execlists_elsp_write(struct intel_engine_cs *ring,
 				 struct drm_i915_gem_object *ctx_obj0,
 				 struct drm_i915_gem_object *ctx_obj1)
@@ -323,19 +327,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 			dev_priv->uncore.funcs.force_wake_get(dev_priv,
 							      FORCEWAKE_ALL);
 	}
-	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
 
-	I915_WRITE(RING_ELSP(ring), desc[1]);
-	I915_WRITE(RING_ELSP(ring), desc[0]);
-	I915_WRITE(RING_ELSP(ring), desc[3]);
+	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[1]);
+	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[0]);
+	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[3]);
 	/* The context is automatically loaded after the following */
-	I915_WRITE(RING_ELSP(ring), desc[2]);
+	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[2]);
 
 	/* ELSP is a wo register, so use another nearby reg for posting instead */
-	POSTING_READ(RING_EXECLIST_STATUS(ring));
+	__raw_posting_read(dev_priv, RING_EXECLIST_STATUS(ring));
 
 	/* Release Force Wakeup (see the big comment above). */
-	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
 	if (IS_CHERRYVIEW(dev) || INTEL_INFO(dev)->gen >= 9) {
 		if (--dev_priv->uncore.fw_rendercount == 0)
 			dev_priv->uncore.funcs.force_wake_put(dev_priv,