[1/2] drm/i915: Cache elsp submit register
diff mbox

Message ID 1458667013-13944-1-git-send-email-tvrtko.ursulin@linux.intel.com
State New
Headers show

Commit Message

Tvrtko Ursulin March 22, 2016, 5:16 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Since we write four times to the same register, caching
the mmio register saves a tiny amount of generated code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Ville Syrjälä March 22, 2016, 5:29 p.m. UTC | #1
On Tue, Mar 22, 2016 at 05:16:52PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Since we write four times to the same register, caching
> the mmio register saves a tiny amount of generated code.

The compiler can't figure this out on its own?

> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e733795b57e0..6916991bdceb 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -362,6 +362,7 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
>  
>  	struct intel_engine_cs *engine = rq0->engine;
>  	struct drm_i915_private *dev_priv = rq0->i915;
> +	i915_reg_t elsp_reg = RING_ELSP(engine);
>  	uint64_t desc[2];
>  
>  	if (rq1) {
> @@ -375,12 +376,12 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
>  	rq0->elsp_submitted++;
>  
>  	/* You must always write both descriptors in the order below. */
> -	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[1]));
> -	I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[1]));
> +	I915_WRITE_FW(elsp_reg, upper_32_bits(desc[1]));
> +	I915_WRITE_FW(elsp_reg, lower_32_bits(desc[1]));
>  
> -	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[0]));
> +	I915_WRITE_FW(elsp_reg, upper_32_bits(desc[0]));
>  	/* The context is automatically loaded after the following */
> -	I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[0]));
> +	I915_WRITE_FW(elsp_reg, lower_32_bits(desc[0]));
>  
>  	/* ELSP is a wo register, use another nearby reg for posting */
>  	POSTING_READ_FW(RING_EXECLIST_STATUS_LO(engine));
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Tvrtko Ursulin March 22, 2016, 5:39 p.m. UTC | #2
On 22/03/16 17:29, Ville Syrjälä wrote:
> On Tue, Mar 22, 2016 at 05:16:52PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Since we write four times to the same register, caching
>> the mmio register saves a tiny amount of generated code.
>
> The compiler can't figure this out on its own?

Nope, at least gcc 4.84 I am running here can't. :(

And this only solves one part of the things it can't figure out in that 
code. It still recalculates one part, can't remember which one is which 
now without revisiting the generated assembly. It used to be for times 
in a row: load register, add 0x230, displace 0x78, store[0-4]. This only 
solves the add 0x230 redundancy. But working around that would possibly 
be a bit too low level.

Regards,

Tvrtko
Dave Gordon March 30, 2016, 3:05 p.m. UTC | #3
On 22/03/16 17:39, Tvrtko Ursulin wrote:
>
> On 22/03/16 17:29, Ville Syrjälä wrote:
>> On Tue, Mar 22, 2016 at 05:16:52PM +0000, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> Since we write four times to the same register, caching
>>> the mmio register saves a tiny amount of generated code.
>>
>> The compiler can't figure this out on its own?
>
> Nope, at least gcc 4.84 I am running here can't. :(
>
> And this only solves one part of the things it can't figure out in that
> code. It still recalculates one part, can't remember which one is which
> now without revisiting the generated assembly. It used to be for times
> in a row: load register, add 0x230, displace 0x78, store[0-4]. This only
> solves the add 0x230 redundancy. But working around that would possibly
> be a bit too low level.
>
> Regards,
> Tvrtko

Compiler's probably assuming aliasing.

RING_ELSP(engine) is actually (engine->mmio_base+0x230).

I915_WRITE_FW(reg, val) is actually __raw_i915_write32(dev_priv, 
(reg__), (val__)) which ultimately translates to a store to some address.

The compiler can't be sure that this store isn't actually to 
(engine->mmio_base), so it refetches it and adds the 0x230 again. Saving 
the (struct-valued) result of the RING_ELSP() macro means the compiler 
knows it isn't aliased, so can reuse it four times.

We could try adding __restrict to various key pointers, starting with 
dev_priv and all pointers-to-engines?

.Dave.

Patch
diff mbox

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e733795b57e0..6916991bdceb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -362,6 +362,7 @@  static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
 
 	struct intel_engine_cs *engine = rq0->engine;
 	struct drm_i915_private *dev_priv = rq0->i915;
+	i915_reg_t elsp_reg = RING_ELSP(engine);
 	uint64_t desc[2];
 
 	if (rq1) {
@@ -375,12 +376,12 @@  static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
 	rq0->elsp_submitted++;
 
 	/* You must always write both descriptors in the order below. */
-	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[1]));
-	I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[1]));
+	I915_WRITE_FW(elsp_reg, upper_32_bits(desc[1]));
+	I915_WRITE_FW(elsp_reg, lower_32_bits(desc[1]));
 
-	I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[0]));
+	I915_WRITE_FW(elsp_reg, upper_32_bits(desc[0]));
 	/* The context is automatically loaded after the following */
-	I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[0]));
+	I915_WRITE_FW(elsp_reg, lower_32_bits(desc[0]));
 
 	/* ELSP is a wo register, use another nearby reg for posting */
 	POSTING_READ_FW(RING_EXECLIST_STATUS_LO(engine));