[2/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaReadAfterWriteHazard'

Message ID 1390362310-15963-3-git-send-email-akash.goel@intel.com (mailing list archive)
State New, archived

Commit Message

akash.goel@intel.com Jan. 22, 2014, 3:45 a.m. UTC
From: Akash Goel <akash.goel@intel.com>

Added a new rendering-specific workaround, 'WaReadAfterWriteHazard'.
This WA requires adding 12 MI Store Dword commands to ensure a proper
flush of the h/w pipeline.

Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

Comments

Ville Syrjälä Jan. 22, 2014, 10:54 a.m. UTC | #1
On Wed, Jan 22, 2014 at 09:15:06AM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> Added a new rendering-specific workaround, 'WaReadAfterWriteHazard'.
> This WA requires adding 12 MI Store Dword commands to ensure a proper
> flush of the h/w pipeline.
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 133d273..e8ec536 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2167,6 +2167,31 @@ intel_ring_flush_all_caches(struct intel_ring_buffer *ring)
>  
>  	trace_i915_gem_ring_flush(ring, 0, I915_GEM_GPU_DOMAINS);
>  
> +	if (IS_VALLEYVIEW(ring->dev)) {
> +		/*
> +		 * WaReadAfterWriteHazard
> +		 * Send a number of Store Data commands here to finish
> +		 * flushing hardware pipeline. This is needed in the case
> +		 * where the next workload tries reading from the same
> +		 * surface that this batch writes to. Without these StoreDWs,
> +		 * not all of the data will actually be flushed to the surface
> +		 * by the time the next batch starts reading it, possibly
> +		 * causing a small amount of corruption.
> +		 */
> +		int i;
> +		ret = intel_ring_begin(ring, 4 * 12);

BSpec says 8 is enough. Is BSpec incorrect?

Also, this workaround is listed for everything SNB+.

> +		if (ret)
> +			return ret;
> +		for (i = 0; i < 12; i++) {
> +			intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
> +			intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_INDEX <<
> +							MI_STORE_DWORD_INDEX_SHIFT);
> +			intel_ring_emit(ring, 0);
> +			intel_ring_emit(ring, MI_NOOP);
> +		}
> +		intel_ring_advance(ring);
> +	}
> +
>  	ring->gpu_caches_dirty = false;
>  	return 0;
>  }
> -- 
> 1.8.5.2
Chris Wilson Jan. 22, 2014, 11:11 a.m. UTC | #2
On Wed, Jan 22, 2014 at 12:54:51PM +0200, Ville Syrjälä wrote:
> On Wed, Jan 22, 2014 at 09:15:06AM +0530, akash.goel@intel.com wrote:
> > From: Akash Goel <akash.goel@intel.com>
> > 
> > Added a new rendering specific Workaround 'WaReadAfterWriteHazard'.
> > In this WA, need to add 12 MI Store Dword commands to ensure proper
> > flush of h/w pipeline.
> > 
> > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 133d273..e8ec536 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -2167,6 +2167,31 @@ intel_ring_flush_all_caches(struct intel_ring_buffer *ring)
> >  
> >  	trace_i915_gem_ring_flush(ring, 0, I915_GEM_GPU_DOMAINS);
> >  
> > +	if (IS_VALLEYVIEW(ring->dev)) {
> > +		/*
> > +		 * WaReadAfterWriteHazard
> > +		 * Send a number of Store Data commands here to finish
> > +		 * flushing hardware pipeline.This is needed in the case
> > +		 * where the next workload tries reading from the same
> > +		 * surface that this batch writes to. Without these StoreDWs,
> > +		 * not all of the data will actually be flushd to the surface
> > +		 * by the time the next batch starts reading it, possibly
> > +		 * causing a small amount of corruption.
> > +		 */
> > +		int i;
> > +		ret = intel_ring_begin(ring, 4 * 12);
> 
> BSpec says 8 is enough. Is Bspec incorrect.

No, these are just figures they plucked out of the air. Last I heard
they were using 32...
 
> Also, this workaround is listed for everything SNB+.

And we already have a more effective workaround.
-Chris
sourab.gupta@intel.com March 21, 2014, 11:53 a.m. UTC | #3
On Wed, 2014-01-22 at 11:11 +0000, Chris Wilson wrote:
> On Wed, Jan 22, 2014 at 12:54:51PM +0200, Ville Syrjälä wrote:
> > On Wed, Jan 22, 2014 at 09:15:06AM +0530, akash.goel@intel.com wrote:
> > > From: Akash Goel <akash.goel@intel.com>
> > > 
> > > [snip patch]
> > 
> > BSpec says 8 is enough. Is BSpec incorrect?
> 
> No, these are just figures they plucked out of the air. Last I heard
> they were using 32...
> 
> > Also, this workaround is listed for everything SNB+.
> 
> And we already have a more effective workaround.
> -Chris

Can you please let us know whether this workaround patch is required or
not? If not, how is this currently handled?
Daniel Vetter March 21, 2014, 2:58 p.m. UTC | #4
On Fri, Mar 21, 2014 at 11:53:40AM +0000, Gupta, Sourab wrote:
> On Wed, 2014-01-22 at 11:11 +0000, Chris Wilson wrote:
> > [snip]
> > And we already have a more effective workaround.
> > -Chris
> 
> Can you please let us know whether this workaround patch is required or
> not? If not, how is this currently handled?

We emit XY_SETUP_BLT before certain blt operations to insert a
sufficiently long stall. The underlying bug this works around is that the
cache controller of the cpu falls over in certain very specific rmw
cycles. The official w/a is a full pipeline flush when switching between
reading and writing to a surface, which has a horrible perf impact on the
blitter.
-Daniel
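
For context, the XY_SETUP_BLT stall Daniel describes presumably lives in
the userspace 2D driver (the sna backend of xf86-video-intel emits
XY_SETUP_BLT in its blitter paths) rather than in the kernel. A minimal
sketch of the idea, with invented batch helpers and a dword layout
paraphrased from the blitter docs (this is not the actual driver code):

/*
 * Sketch only: emit an XY_SETUP_BLT packet before a blit purely for its
 * stalling side effect, so that writes from the previous operation have
 * drained before the next one reads the same surface. The struct and
 * batch_emit() helper are hypothetical; the opcode encoding follows the
 * usual (client << 29 | opcode << 22 | len) blitter command format.
 */
#include <stdint.h>

#define XY_SETUP_BLT ((0x2 << 29) | (0x01 << 22) | 6) /* 8-dword packet */

struct batch {
	uint32_t *cur;	/* next free dword in the batchbuffer */
};

static void batch_emit(struct batch *b, uint32_t dword)
{
	*b->cur++ = dword;
}

static void emit_setup_blt_stall(struct batch *b, uint32_t br13,
				 uint32_t dst_offset)
{
	batch_emit(b, XY_SETUP_BLT);
	batch_emit(b, br13);		/* pitch, raster op, colour depth */
	batch_emit(b, 0);		/* clip rect: top-left (unused) */
	batch_emit(b, 0);		/* clip rect: bottom-right (unused) */
	batch_emit(b, dst_offset);	/* destination base address */
	batch_emit(b, 0);		/* background colour */
	batch_emit(b, 0);		/* foreground colour */
	batch_emit(b, 0);		/* pattern base address */
}

The packet's contents are beside the point; what matters is that decoding
it keeps the blitter busy long enough for the preceding surface writes to
drain, which is much cheaper than the official full-pipeline-flush w/a.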
sourab.gupta@intel.com March 21, 2014, 4:50 p.m. UTC | #5
On Fri, 2014-03-21 at 14:58 +0000, Daniel Vetter wrote:
> On Fri, Mar 21, 2014 at 11:53:40AM +0000, Gupta, Sourab wrote:
> > [snip]
> > Can you please let us know whether this workaround patch is required or
> > not? If not, how is this currently handled?
> 
> We emit XY_SETUP_BLT before certain blt operations to insert a
> sufficiently long stall. The underlying bug this works around is that the
> cache controller of the cpu falls over in certain very specific rmw
> cycles. The official w/a is a full pipeline flush when switching between
> reading and writing to a surface, which has a horrible perf impact on the
> blitter.
> -Daniel

Hi Daniel,
In the kernel code, we are not able to see any such operation during
batchbuffer submission. Is the XY_SETUP_BLT emitted from userspace in
these specific use cases?
Does this mean that this patch is not admissible as such for this
underlying bug?

Regards,
Sourab
Patch

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 133d273..e8ec536 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2167,6 +2167,31 @@  intel_ring_flush_all_caches(struct intel_ring_buffer *ring)
 
 	trace_i915_gem_ring_flush(ring, 0, I915_GEM_GPU_DOMAINS);
 
+	if (IS_VALLEYVIEW(ring->dev)) {
+		/*
+		 * WaReadAfterWriteHazard
+		 * Send a number of Store Data commands here to finish
+		 * flushing hardware pipeline. This is needed in the case
+		 * where the next workload tries reading from the same
+		 * surface that this batch writes to. Without these StoreDWs,
+		 * not all of the data will actually be flushed to the surface
+		 * by the time the next batch starts reading it, possibly
+		 * causing a small amount of corruption.
+		 */
+		int i;
+		ret = intel_ring_begin(ring, 4 * 12);
+		if (ret)
+			return ret;
+		for (i = 0; i < 12; i++) {
+			intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
+			intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_INDEX <<
+							MI_STORE_DWORD_INDEX_SHIFT);
+			intel_ring_emit(ring, 0);
+			intel_ring_emit(ring, MI_NOOP);
+		}
+		intel_ring_advance(ring);
+	}
+
 	ring->gpu_caches_dirty = false;
 	return 0;
 }