Message ID | 1458667013-13944-1-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Mar 22, 2016 at 05:16:52PM +0000, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Since we write four times to the same register, caching > the mmio register saves a tiny amount of generated code. The compiler can't figure this out on its own? > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > --- > drivers/gpu/drm/i915/intel_lrc.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index e733795b57e0..6916991bdceb 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -362,6 +362,7 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0, > > struct intel_engine_cs *engine = rq0->engine; > struct drm_i915_private *dev_priv = rq0->i915; > + i915_reg_t elsp_reg = RING_ELSP(engine); > uint64_t desc[2]; > > if (rq1) { > @@ -375,12 +376,12 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0, > rq0->elsp_submitted++; > > /* You must always write both descriptors in the order below. */ > - I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[1])); > - I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[1])); > + I915_WRITE_FW(elsp_reg, upper_32_bits(desc[1])); > + I915_WRITE_FW(elsp_reg, lower_32_bits(desc[1])); > > - I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[0])); > + I915_WRITE_FW(elsp_reg, upper_32_bits(desc[0])); > /* The context is automatically loaded after the following */ > - I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[0])); > + I915_WRITE_FW(elsp_reg, lower_32_bits(desc[0])); > > /* ELSP is a wo register, use another nearby reg for posting */ > POSTING_READ_FW(RING_EXECLIST_STATUS_LO(engine)); > -- > 1.9.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 22/03/16 17:29, Ville Syrjälä wrote: > On Tue, Mar 22, 2016 at 05:16:52PM +0000, Tvrtko Ursulin wrote: >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> Since we write four times to the same register, caching >> the mmio register saves a tiny amount of generated code. > > The compiler can't figure this out on its own? Nope, at least gcc 4.84 I am running here can't. :( And this only solves one part of the things it can't figure out in that code. It still recalculates one part, can't remember which one is which now without revisiting the generated assembly. It used to be for times in a row: load register, add 0x230, displace 0x78, store[0-4]. This only solves the add 0x230 redundancy. But working around that would possibly be a bit too low level. Regards, Tvrtko
On 22/03/16 17:39, Tvrtko Ursulin wrote: > > On 22/03/16 17:29, Ville Syrjälä wrote: >> On Tue, Mar 22, 2016 at 05:16:52PM +0000, Tvrtko Ursulin wrote: >>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> >>> Since we write four times to the same register, caching >>> the mmio register saves a tiny amount of generated code. >> >> The compiler can't figure this out on its own? > > Nope, at least gcc 4.84 I am running here can't. :( > > And this only solves one part of the things it can't figure out in that > code. It still recalculates one part, can't remember which one is which > now without revisiting the generated assembly. It used to be for times > in a row: load register, add 0x230, displace 0x78, store[0-4]. This only > solves the add 0x230 redundancy. But working around that would possibly > be a bit too low level. > > Regards, > Tvrtko Compiler's probably assuming aliasing. RING_ELSP(engine) is actually (engine->mmio_base+0x230). I915_WRITE_FW(reg, val) is actually __raw_i915_write32(dev_priv, (reg__), (val__)) which ultimately translates to a store to some address. The compiler can't be sure that this store isn't actually to (engine->mmio_base), so it refetches it and adds the 0x230 again. Saving the (struct-valued) result of the RING_ELSP() macro means the compiler knows it isn't aliased, so can reuse it four times. We could try adding __restrict to various key pointers, starting with dev_priv and all pointers-to-engines? .Dave.
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index e733795b57e0..6916991bdceb 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -362,6 +362,7 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0, struct intel_engine_cs *engine = rq0->engine; struct drm_i915_private *dev_priv = rq0->i915; + i915_reg_t elsp_reg = RING_ELSP(engine); uint64_t desc[2]; if (rq1) { @@ -375,12 +376,12 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0, rq0->elsp_submitted++; /* You must always write both descriptors in the order below. */ - I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[1])); - I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[1])); + I915_WRITE_FW(elsp_reg, upper_32_bits(desc[1])); + I915_WRITE_FW(elsp_reg, lower_32_bits(desc[1])); - I915_WRITE_FW(RING_ELSP(engine), upper_32_bits(desc[0])); + I915_WRITE_FW(elsp_reg, upper_32_bits(desc[0])); /* The context is automatically loaded after the following */ - I915_WRITE_FW(RING_ELSP(engine), lower_32_bits(desc[0])); + I915_WRITE_FW(elsp_reg, lower_32_bits(desc[0])); /* ELSP is a wo register, use another nearby reg for posting */ POSTING_READ_FW(RING_EXECLIST_STATUS_LO(engine));