[RFC] drm/i915: Reduce locking in command submission

Message ID 1418312494-22920-1-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Tvrtko Ursulin Dec. 11, 2014, 3:41 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Eliminate six needless spin lock/unlock pairs when writing ELSP.

RFC for now with some #define copy and paste.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)
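The count of six can be checked with a small user-space model. The sketch below is not driver code: `struct mock_dev`, `mock_lock()`, `raw_write32()` and the other names are hypothetical stand-ins, and it assumes (as the patch implies) that each I915_WRITE / POSTING_READ takes and drops the uncore lock internally. It only counts lock acquisitions on the two paths.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register indices; not real i915 register offsets. */
enum { MOCK_ELSP = 0, MOCK_STATUS = 1, MOCK_NUM_REGS = 2 };

/* Hypothetical stand-in for dev_priv: a fake lock, a counter, fake registers. */
struct mock_dev {
	int locked;               /* models dev_priv->uncore.lock being held */
	unsigned int lock_count;  /* number of lock acquisitions so far */
	uint32_t regs[MOCK_NUM_REGS];
};

static void mock_lock(struct mock_dev *dev)
{
	assert(!dev->locked);     /* a spinlock must not be taken recursively */
	dev->locked = 1;
	dev->lock_count++;
}

static void mock_unlock(struct mock_dev *dev)
{
	assert(dev->locked);
	dev->locked = 0;
}

/* Raw accessors: no locking of their own, the caller must hold the lock. */
static void raw_write32(struct mock_dev *dev, unsigned int reg, uint32_t val)
{
	assert(dev->locked);
	dev->regs[reg] = val;
}

static uint32_t raw_read32(struct mock_dev *dev, unsigned int reg)
{
	assert(dev->locked);
	return dev->regs[reg];
}

/* Locked accessors: one lock/unlock pair per register access. */
static void locked_write32(struct mock_dev *dev, unsigned int reg, uint32_t val)
{
	mock_lock(dev);
	raw_write32(dev, reg, val);
	mock_unlock(dev);
}

static uint32_t locked_read32(struct mock_dev *dev, unsigned int reg)
{
	uint32_t val;

	mock_lock(dev);
	val = raw_read32(dev, reg);
	mock_unlock(dev);
	return val;
}

/* Shape of the code before the patch; returns the lock acquisitions used. */
unsigned int submit_before(struct mock_dev *dev, const uint32_t desc[4])
{
	dev->lock_count = 0;

	mock_lock(dev);           /* forcewake get would happen here */
	mock_unlock(dev);

	locked_write32(dev, MOCK_ELSP, desc[1]);
	locked_write32(dev, MOCK_ELSP, desc[0]);
	locked_write32(dev, MOCK_ELSP, desc[3]);
	locked_write32(dev, MOCK_ELSP, desc[2]);
	(void)locked_read32(dev, MOCK_STATUS);   /* posting read */

	mock_lock(dev);           /* forcewake put would happen here */
	mock_unlock(dev);

	return dev->lock_count;
}

/* Shape of the code after the patch: one critical section, raw accessors. */
unsigned int submit_after(struct mock_dev *dev, const uint32_t desc[4])
{
	dev->lock_count = 0;

	mock_lock(dev);           /* forcewake get would happen here */
	raw_write32(dev, MOCK_ELSP, desc[1]);
	raw_write32(dev, MOCK_ELSP, desc[0]);
	raw_write32(dev, MOCK_ELSP, desc[3]);
	raw_write32(dev, MOCK_ELSP, desc[2]);
	(void)raw_read32(dev, MOCK_STATUS);      /* posting read */
	/* forcewake put would happen here, still under the same lock */
	mock_unlock(dev);

	return dev->lock_count;
}
```

In this model the old path takes the lock seven times (forcewake get, four writes, one posting read, forcewake put) and the new path takes it once, which is the six pairs the commit message refers to. The trade-off is that the lock is held, with interrupts disabled in the real driver, across all five register accesses.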

Comments

Daniel Vetter Dec. 15, 2014, 1:06 p.m. UTC | #1
On Thu, Dec 11, 2014 at 03:41:34PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Eliminate six needless spin lock/unlock pairs when writing ELSP.
> 
> RFC for now with some #define copy and paste.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Dave Gordon <david.s.gordon@intel.com>

Yeah makes sense. I'm on the fence whether we should do an all-uppercase
conversion of the raw mmio macros, would be a notch more consistent. And
some perf data for this patch would be good, too.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index a82020e..f2f4a28 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -276,6 +276,10 @@ static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj)
>  	return desc;
>  }
>  
> +#define __raw_i915_read32(dev_priv__, reg__) readl((dev_priv__)->regs + (reg__))
> +#define __raw_i915_write32(dev_priv__, reg__, val__) writel(val__, (dev_priv__)->regs + (reg__))
> +#define __raw_posting_read(dev_priv__, reg__) (void)__raw_i915_read32(dev_priv__, reg__)
> +
>  static void execlists_elsp_write(struct intel_engine_cs *ring,
>  				 struct drm_i915_gem_object *ctx_obj0,
>  				 struct drm_i915_gem_object *ctx_obj1)
> @@ -323,19 +327,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  			dev_priv->uncore.funcs.force_wake_get(dev_priv,
>  							      FORCEWAKE_ALL);
>  	}
> -	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
>  
> -	I915_WRITE(RING_ELSP(ring), desc[1]);
> -	I915_WRITE(RING_ELSP(ring), desc[0]);
> -	I915_WRITE(RING_ELSP(ring), desc[3]);
> +	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[1]);
> +	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[0]);
> +	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[3]);
>  	/* The context is automatically loaded after the following */
> -	I915_WRITE(RING_ELSP(ring), desc[2]);
> +	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[2]);
>  
>  	/* ELSP is a wo register, so use another nearby reg for posting instead */
> -	POSTING_READ(RING_EXECLIST_STATUS(ring));
> +	__raw_posting_read(dev_priv, RING_EXECLIST_STATUS(ring));
>  
>  	/* Release Force Wakeup (see the big comment above). */
> -	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
>  	if (IS_CHERRYVIEW(dev) || INTEL_INFO(dev)->gen >= 9) {
>  		if (--dev_priv->uncore.fw_rendercount == 0)
>  			dev_priv->uncore.funcs.force_wake_put(dev_priv,
> -- 
> 2.1.1
> 
Tvrtko Ursulin Dec. 16, 2014, 1:34 p.m. UTC | #2
On 12/15/2014 01:06 PM, Daniel Vetter wrote:
> On Thu, Dec 11, 2014 at 03:41:34PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Eliminate six needless spin lock/unlock pairs when writing ELSP.
>>
>> RFC for now with some #define copy and paste.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Dave Gordon <david.s.gordon@intel.com>
>
> Yeah makes sense. I'm on the fence whether we should do an all-uppercase
> conversion of the raw mmio macros, would be a notch more consistent. And
> some perf data for this patch would be good, too.

I know perf data would be good, but I had no time to set up a suitable
platform for testing. This was more of a drive-by: the code is not
pretty, so it annoyed me. I will see if I can get back to this in the
near future.

Regards,

Tvrtko
Tvrtko Ursulin Jan. 14, 2015, 10:13 a.m. UTC | #3
On 12/15/2014 01:06 PM, Daniel Vetter wrote:
> On Thu, Dec 11, 2014 at 03:41:34PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Eliminate six needless spin lock/unlock pairs when writing ELSP.
>>
>> RFC for now with some #define copy and paste.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Dave Gordon <david.s.gordon@intel.com>
>
> Yeah makes sense. I'm on the fence whether we should do an all-uppercase
> conversion of the raw mmio macros, would be a notch more consistent. And
> some perf data for this patch would be good, too.

With regards to perf data, Ben Widawsky was kind enough to give this 
patch a spin on his perf test bed (CHV), on a range of OGL benchmarks.

Apparently only two results have a "confidence t-score" > 95% (statistics
is not my area): bench_OglBatch4 and bench_OglDeferred, which show gains
of 0.51% and 0.73% respectively.

Looking just on the basis of those two, I'd say the patch is worth 
cleaning up since it is a good gain for such a simple change.

Other results show anything from a 4.29% slowdown (!*)
(bench_OglTexFilterAniso) to a 7.08% gain (bench_OglMultithreaded).

Average across all benchmarks is a 0.38% gain.

Thoughts?

Regards,

Tvrtko

* I can't really understand regressions for some tests?!
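
One plausible reading of the outliers is run-to-run noise: a difference between two benchmark means only counts if it is large relative to the spread of the individual runs. The sketch below shows one common way to check that, Welch's t statistic. It is an assumption that something similar sits behind the "confidence t-score" above; the thread does not say how the test bed computes it. All function names are hypothetical. To stay free of libm, the code returns t squared rather than t.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n >= 1). */
double sample_mean(const double *x, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n >= 1);
	for (i = 0; i < n; i++)
		sum += x[i];
	return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1), so n must be >= 2. */
double sample_variance(const double *x, size_t n)
{
	double mean, acc = 0.0;
	size_t i;

	assert(n >= 2);
	mean = sample_mean(x, n);
	for (i = 0; i < n; i++)
		acc += (x[i] - mean) * (x[i] - mean);
	return acc / (double)(n - 1);
}

/*
 * Square of Welch's t statistic for two independent sample sets:
 *   t^2 = (mean_a - mean_b)^2 / (var_a / na + var_b / nb)
 * The statistic is undefined when both variances are zero; this sketch
 * reports 0.0 in that case.
 */
double welch_t_squared(const double *a, size_t na, const double *b, size_t nb)
{
	double diff = sample_mean(a, na) - sample_mean(b, nb);
	double se2 = sample_variance(a, na) / (double)na +
		     sample_variance(b, nb) / (double)nb;

	if (se2 == 0.0)
		return 0.0;
	return diff * diff / se2;
}

/* Relative change in percent; positive when 'after' is higher than 'before'. */
double percent_gain(double before, double after)
{
	return 100.0 * (after - before) / before;
}
```

As a made-up illustration (not data from this thread): five baseline runs scoring 100, 102, 98, 101, 99 and five patched runs scoring 101, 103, 99, 102, 100 give a 1% gain in the means but t squared of only 1.0. With roughly 8 degrees of freedom the two-sided 95% threshold is about |t| > 2.31, i.e. t squared above about 5.3, so a difference of that size would not be distinguishable from noise.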
Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a82020e..f2f4a28 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -276,6 +276,10 @@ static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_object *ctx_obj)
 	return desc;
 }
 
+#define __raw_i915_read32(dev_priv__, reg__) readl((dev_priv__)->regs + (reg__))
+#define __raw_i915_write32(dev_priv__, reg__, val__) writel(val__, (dev_priv__)->regs + (reg__))
+#define __raw_posting_read(dev_priv__, reg__) (void)__raw_i915_read32(dev_priv__, reg__)
+
 static void execlists_elsp_write(struct intel_engine_cs *ring,
 				 struct drm_i915_gem_object *ctx_obj0,
 				 struct drm_i915_gem_object *ctx_obj1)
@@ -323,19 +327,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 			dev_priv->uncore.funcs.force_wake_get(dev_priv,
 							      FORCEWAKE_ALL);
 	}
-	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
 
-	I915_WRITE(RING_ELSP(ring), desc[1]);
-	I915_WRITE(RING_ELSP(ring), desc[0]);
-	I915_WRITE(RING_ELSP(ring), desc[3]);
+	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[1]);
+	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[0]);
+	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[3]);
 	/* The context is automatically loaded after the following */
-	I915_WRITE(RING_ELSP(ring), desc[2]);
+	__raw_i915_write32(dev_priv, RING_ELSP(ring), desc[2]);
 
 	/* ELSP is a wo register, so use another nearby reg for posting instead */
-	POSTING_READ(RING_EXECLIST_STATUS(ring));
+	__raw_posting_read(dev_priv, RING_EXECLIST_STATUS(ring));
 
 	/* Release Force Wakeup (see the big comment above). */
-	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
 	if (IS_CHERRYVIEW(dev) || INTEL_INFO(dev)->gen >= 9) {
 		if (--dev_priv->uncore.fw_rendercount == 0)
 			dev_priv->uncore.funcs.force_wake_put(dev_priv,