diff mbox

[1/6] drm/i915: Replace hardcoded cacheline size with macro

Message ID 1396452971-25654-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson April 2, 2014, 3:36 p.m. UTC
For readibility and guess at the meaning behind the constants.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

Comments

Jesse Barnes April 2, 2014, 9:57 p.m. UTC | #1
On Wed,  2 Apr 2014 16:36:06 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> For readibility and guess at the meaning behind the constants.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 785f246d28a8..475391ce671a 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -33,6 +33,8 @@
>  #include "i915_trace.h"
>  #include "intel_drv.h"
>  
> +#define CACHELINE_BYTES 64
> +

Are you sure it's 64 on every gen?  It changes on the CPU side from
time to time... I thought it might have changed over time on the GPU
too but I haven't checked the specs.

Either way a doc ref would be nice here.
Chris Wilson April 3, 2014, 6:45 a.m. UTC | #2
On Wed, Apr 02, 2014 at 02:57:11PM -0700, Jesse Barnes wrote:
> On Wed,  2 Apr 2014 16:36:06 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > For readibility and guess at the meaning behind the constants.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 29 ++++++++++++++++-------------
> >  1 file changed, 16 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 785f246d28a8..475391ce671a 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -33,6 +33,8 @@
> >  #include "i915_trace.h"
> >  #include "intel_drv.h"
> >  
> > +#define CACHELINE_BYTES 64
> > +
> 
> Are you sure it's 64 on every gen?  It changes on the CPU side from
> time to time... I thought it might have changed over time on the GPU
> too but I haven't checked the specs.

The cacheline is 32bytes on gen2, 64 elsewhere. We've made a blanket
assumption of 64 and then some random factors on top. (Some cachelines
may be as large as 1024bytes elsewhere in the chip.) I'm not sure where
some of the values used in the code where plucked from, Jesse?
-Chris
Daniel Vetter April 3, 2014, 3:16 p.m. UTC | #3
On Thu, Apr 03, 2014 at 07:45:40AM +0100, Chris Wilson wrote:
> On Wed, Apr 02, 2014 at 02:57:11PM -0700, Jesse Barnes wrote:
> > On Wed,  2 Apr 2014 16:36:06 +0100
> > Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > 
> > > For readibility and guess at the meaning behind the constants.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 29 ++++++++++++++++-------------
> > >  1 file changed, 16 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 785f246d28a8..475391ce671a 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -33,6 +33,8 @@
> > >  #include "i915_trace.h"
> > >  #include "intel_drv.h"
> > >  
> > > +#define CACHELINE_BYTES 64
> > > +
> > 
> > Are you sure it's 64 on every gen?  It changes on the CPU side from
> > time to time... I thought it might have changed over time on the GPU
> > too but I haven't checked the specs.
> 
> The cacheline is 32bytes on gen2, 64 elsewhere. We've made a blanket
> assumption of 64 and then some random factors on top. (Some cachelines
> may be as large as 1024bytes elsewhere in the chip.) I'm not sure where
> some of the values used in the code where plucked from, Jesse?

Maybe a quick comment that this is the maximum and that gen2 has only
32bytes but aligning more isn't harmful?

Perhaps also mention that for actual cacheline flushing we _must_ use the
cl size of the cpu for otherwise we don't flush poperly. If your define
has some comment/warning to that effect I'm ok with this generalization.
And it nicely makes some of the additional lore (128bytes ?!) stick out
more.

btw the 1k thing at least on i865G is iirc just the writeout fifo between
the cpu and the gmch to paper over FSB latencies (or whatever irked hw
designers).
-Daniel
Chris Wilson April 3, 2014, 3:23 p.m. UTC | #4
On Thu, Apr 03, 2014 at 05:16:16PM +0200, Daniel Vetter wrote:
> btw the 1k thing at least on i865G is iirc just the writeout fifo between
> the cpu and the gmch to paper over FSB latencies (or whatever irked hw
> designers).

Isn't there a 1024 byte supercacheline for msaa as well? At least that
sticks out in my mind from the discussions on when not to use eDRAM etc.
-Chris
Daniel Vetter April 3, 2014, 9:09 p.m. UTC | #5
On Thu, Apr 03, 2014 at 04:23:05PM +0100, Chris Wilson wrote:
> On Thu, Apr 03, 2014 at 05:16:16PM +0200, Daniel Vetter wrote:
> > btw the 1k thing at least on i865G is iirc just the writeout fifo between
> > the cpu and the gmch to paper over FSB latencies (or whatever irked hw
> > designers).
> 
> Isn't there a 1024 byte supercacheline for msaa as well? At least that
> sticks out in my mind from the discussions on when not to use eDRAM etc.

I've thought that was just a recommendation to not use eDRAM with
compressed msaa buffers since for those you only use the pixels on the
edges of primitives. And since one msaa pixel is less than the 1024bytes
the supercacheline is actual edram utilization for common workloads would
be horrible.

Or at least that's how I've interpreted the graphs and diagrams in the
docs. Afaik the coherency mocs bits for edram are still available on a 64
byte boundary, it's just the address tag which is much larger.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 785f246d28a8..475391ce671a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -33,6 +33,8 @@ 
 #include "i915_trace.h"
 #include "intel_drv.h"
 
+#define CACHELINE_BYTES 64
+
 static inline int ring_space(struct intel_ring_buffer *ring)
 {
 	int space = (ring->head & HEAD_ADDR) - (ring->tail + I915_RING_FREE_SPACE);
@@ -175,7 +177,7 @@  gen4_render_ring_flush(struct intel_ring_buffer *ring,
 static int
 intel_emit_post_sync_nonzero_flush(struct intel_ring_buffer *ring)
 {
-	u32 scratch_addr = ring->scratch.gtt_offset + 128;
+	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 
@@ -212,7 +214,7 @@  gen6_render_ring_flush(struct intel_ring_buffer *ring,
                          u32 invalidate_domains, u32 flush_domains)
 {
 	u32 flags = 0;
-	u32 scratch_addr = ring->scratch.gtt_offset + 128;
+	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 	/* Force SNB workarounds for PIPE_CONTROL flushes */
@@ -306,7 +308,7 @@  gen7_render_ring_flush(struct intel_ring_buffer *ring,
 		       u32 invalidate_domains, u32 flush_domains)
 {
 	u32 flags = 0;
-	u32 scratch_addr = ring->scratch.gtt_offset + 128;
+	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 	/*
@@ -367,7 +369,7 @@  gen8_render_ring_flush(struct intel_ring_buffer *ring,
 		       u32 invalidate_domains, u32 flush_domains)
 {
 	u32 flags = 0;
-	u32 scratch_addr = ring->scratch.gtt_offset + 128;
+	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 	flags |= PIPE_CONTROL_CS_STALL;
@@ -772,7 +774,7 @@  do {									\
 static int
 pc_render_add_request(struct intel_ring_buffer *ring)
 {
-	u32 scratch_addr = ring->scratch.gtt_offset + 128;
+	u32 scratch_addr = ring->scratch.gtt_offset + 2 * CACHELINE_BYTES;
 	int ret;
 
 	/* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently
@@ -794,15 +796,15 @@  pc_render_add_request(struct intel_ring_buffer *ring)
 	intel_ring_emit(ring, ring->outstanding_lazy_seqno);
 	intel_ring_emit(ring, 0);
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
-	scratch_addr += 128; /* write to separate cachelines */
+	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
-	scratch_addr += 128;
+	scratch_addr += 2 * CACHELINE_BYTES;
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
-	scratch_addr += 128;
+	scratch_addr += 2 * CACHELINE_BYTES;
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
-	scratch_addr += 128;
+	scratch_addr += 2 * CACHELINE_BYTES;
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
-	scratch_addr += 128;
+	scratch_addr += 2 * CACHELINE_BYTES;
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
 
 	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
@@ -1411,7 +1413,7 @@  static int intel_init_ring_buffer(struct drm_device *dev,
 	 */
 	ring->effective_size = ring->size;
 	if (IS_I830(ring->dev) || IS_845G(ring->dev))
-		ring->effective_size -= 128;
+		ring->effective_size -= 2 * CACHELINE_BYTES;
 
 	i915_cmd_parser_init_ring(ring);
 
@@ -1672,12 +1674,13 @@  int intel_ring_begin(struct intel_ring_buffer *ring,
 /* Align the ring tail to a cacheline boundary */
 int intel_ring_cacheline_align(struct intel_ring_buffer *ring)
 {
-	int num_dwords = (64 - (ring->tail & 63)) / sizeof(uint32_t);
+	int num_dwords = (ring->tail & (CACHELINE_BYTES - 1)) / sizeof(uint32_t);
 	int ret;
 
 	if (num_dwords == 0)
 		return 0;
 
+	num_dwords = CACHELINE_BYTES / sizeof(uint32_t) - num_dwords;
 	ret = intel_ring_begin(ring, num_dwords);
 	if (ret)
 		return ret;
@@ -2034,7 +2037,7 @@  int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
 	ring->size = size;
 	ring->effective_size = ring->size;
 	if (IS_I830(ring->dev) || IS_845G(ring->dev))
-		ring->effective_size -= 128;
+		ring->effective_size -= 2 * CACHELINE_BYTES;
 
 	ring->virtual_start = ioremap_wc(start, size);
 	if (ring->virtual_start == NULL) {