drm/i915: Enable provoking vertex fix on Gen9 systems.

Message ID 20180615190605.16238-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson June 15, 2018, 7:06 p.m. UTC
From: Kenneth Graunke <kenneth@whitecape.org>

The SF and clipper units mishandle the provoking vertex in some cases,
which can cause misrendering with shaders that use flat shaded inputs.

There are chicken bits in 3D_CHICKEN3 (for SF) and FF_SLICE_CHICKEN
(for the clipper) that work around the issue.  These registers are
unfortunately not part of the logical context (even the power context),
and so we must reload them every time we start executing in a context.

Bugzilla: https://bugs.freedesktop.org/103047
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---

This is only being set for gen9 right now, do we need it for gen10+?
Ken, I also need your s-o-b.

An open question is how to record such w/a for easy checking from
userspace. Though something like this might be better as part of an
independent verification tool.
-Chris
---
 drivers/gpu/drm/i915/i915_reg.h  |  5 +++++
 drivers/gpu/drm/i915/intel_lrc.c | 12 +++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

Comments

Kenneth Graunke June 16, 2018, 5:07 a.m. UTC | #1
On Friday, June 15, 2018 12:06:05 PM PDT Chris Wilson wrote:
> From: Kenneth Graunke <kenneth@whitecape.org>
> 
> The SF and clipper units mishandle the provoking vertex in some cases,
> which can cause misrendering with shaders that use flat shaded inputs.
> 
> There are chicken bits in 3D_CHICKEN3 (for SF) and FF_SLICE_CHICKEN
> (for the clipper) that work around the issue.  These registers are
> unfortunately not part of the logical context (even the power context),
> and so we must reload them every time we start executing in a context.
> 
> Bugzilla: https://bugs.freedesktop.org/103047
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> 
> This is only being set for gen9 right now, do we need it for gen10+?

The 3D_CHICKEN3 bit is labeled as SKL and KBL.  The FF_SLICE_CHICKEN
bit is labeled as SKL, with no mention of KBL...but the Windows driver
appears to set both on all Gen 9.  It doesn't look like it sets them
on Gen 10, nor are they documented as necessary there.  (I haven't
brought up a Cannonlake to test it, though...)

It looks like some of these registers are gone or reworked on Icelake,
so Gen9-only sounds good to me.

> Ken, I also need your s-o-b.

Oops, sorry...got out of the habit of adding that...

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>

Feel free to take authorship if you prefer - you've done much more work
on it than I have at this point. :)  Your call.  Thanks for fixing it
up and getting it landed, by the way!

It's a bummer that we have to do it here...but it seems like the only
place that it can realistically happen, with the power context issue.

It would be nice to Cc this to stable, but much of the workaround code
has changed in the meantime, so I'm not sure how well it would apply...

> An open question is how to record such w/a for easy checking from
> userspace. Though something like this might be better as part of an
> independent verification tool.
> -Chris
> ---
>  drivers/gpu/drm/i915/i915_reg.h  |  5 +++++
>  drivers/gpu/drm/i915/intel_lrc.c | 12 +++++++++++-
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index b8c0ebd50889..54ec7ab57ce8 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -2432,12 +2432,17 @@ enum i915_power_well_id {
>  #define _3D_CHICKEN	_MMIO(0x2084)
>  #define  _3D_CHICKEN_HIZ_PLANE_DISABLE_MSAA_4X_SNB	(1 << 10)
>  #define _3D_CHICKEN2	_MMIO(0x208c)
> +
> +#define FF_SLICE_CHICKEN	_MMIO(0x2088)
> +#define  FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX	(1 << 1)
> +
>  /* Disables pipelining of read flushes past the SF-WIZ interface.
>   * Required on all Ironlake steppings according to the B-Spec, but the
>   * particular danger of not doing so is not specified.
>   */
>  # define _3D_CHICKEN2_WM_READ_PIPELINED			(1 << 14)
>  #define _3D_CHICKEN3	_MMIO(0x2090)
> +#define  _3D_CHICKEN_SF_PROVOKING_VERTEX_FIX		(1 << 12)
>  #define  _3D_CHICKEN_SF_DISABLE_OBJEND_CULL		(1 << 10)
>  #define  _3D_CHICKEN3_AA_LINE_QUALITY_FIX_ENABLE	(1 << 5)
>  #define  _3D_CHICKEN3_SF_DISABLE_FASTCLIP_CULL		(1 << 5)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 839cb1fc6a01..2b0ae552cc4e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1575,11 +1575,21 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
>  	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
>  	batch = gen8_emit_flush_coherentl3_wa(engine, batch);
>  
> +	*batch++ = MI_LOAD_REGISTER_IMM(3);
> +
>  	/* WaDisableGatherAtSetShaderCommonSlice:skl,bxt,kbl,glk */
> -	*batch++ = MI_LOAD_REGISTER_IMM(1);
>  	*batch++ = i915_mmio_reg_offset(COMMON_SLICE_CHICKEN2);
>  	*batch++ = _MASKED_BIT_DISABLE(
>  			GEN9_DISABLE_GATHER_AT_SET_SHADER_COMMON_SLICE);
> +
> +	/* BSpec: 11391 */
> +	*batch++ = i915_mmio_reg_offset(FF_SLICE_CHICKEN);
> +	*batch++ = _MASKED_BIT_ENABLE(FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX);
> +
> +	/* BSpec: 11299 */
> +	*batch++ = i915_mmio_reg_offset(_3D_CHICKEN3);
> +	*batch++ = _MASKED_BIT_ENABLE(_3D_CHICKEN_SF_PROVOKING_VERTEX_FIX);
> +
>  	*batch++ = MI_NOOP;
>  
>  	/* WaClearSlmSpaceAtContextSwitch:kbl */
>

Joonas Lahtinen June 18, 2018, 9:03 a.m. UTC | #2
Quoting Chris Wilson (2018-06-15 22:06:05)
> From: Kenneth Graunke <kenneth@whitecape.org>
> 
> The SF and clipper units mishandle the provoking vertex in some cases,
> which can cause misrendering with shaders that use flat shaded inputs.
> 
> There are chicken bits in 3D_CHICKEN3 (for SF) and FF_SLICE_CHICKEN
> (for the clipper) that work around the issue.  These registers are
> unfortunately not part of the logical context (even the power context),
> and so we must reload them every time we start executing in a context.
> 
> Bugzilla: https://bugs.freedesktop.org/103047
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

One note below.

> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1575,11 +1575,21 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
>         /* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
>         batch = gen8_emit_flush_coherentl3_wa(engine, batch);
>  
> +       *batch++ = MI_LOAD_REGISTER_IMM(3);
> +
>         /* WaDisableGatherAtSetShaderCommonSlice:skl,bxt,kbl,glk */
> -       *batch++ = MI_LOAD_REGISTER_IMM(1);
>         *batch++ = i915_mmio_reg_offset(COMMON_SLICE_CHICKEN2);
>         *batch++ = _MASKED_BIT_DISABLE(
>                         GEN9_DISABLE_GATHER_AT_SET_SHADER_COMMON_SLICE);
> +
> +       /* BSpec: 11391 */
> +       *batch++ = i915_mmio_reg_offset(FF_SLICE_CHICKEN);
> +       *batch++ = _MASKED_BIT_ENABLE(FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX);
> +
> +       /* BSpec: 11299 */
> +       *batch++ = i915_mmio_reg_offset(_3D_CHICKEN3);
> +       *batch++ = _MASKED_BIT_ENABLE(_3D_CHICKEN_SF_PROVOKING_VERTEX_FIX);

I'm almost betting that somebody will take one of these 3 off without
noticing the distant LOAD_REGISTER_IMM(). To perfect this, const table
of pairs and use ARRAY_SIZE()? Then we're also one step closer to the
const W/A tables...

Regards, Joonas
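
For illustration only, the table-of-pairs idea suggested above might look
roughly like the sketch below. It is a standalone userspace model, not the
driver code: struct lri, emit_lri() and emit_gen9_lri() are hypothetical
names, the MI_* and masked-bit macros are re-declared from my reading of the
i915 headers, and the COMMON_SLICE_CHICKEN2 offset (0x7014) and bit (1 << 12)
are assumptions, since the patch does not show them. The FF_SLICE_CHICKEN and
_3D_CHICKEN3 values are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed encodings, re-declared here so the sketch builds on its own. */
#define MI_INSTR(opcode, flags)  (((uint32_t)(opcode) << 23) | (flags))
#define MI_LOAD_REGISTER_IMM(n)  MI_INSTR(0x22, 2u * (uint32_t)(n) - 1u)
#define MI_NOOP                  MI_INSTR(0, 0)
#define MASKED_BIT_ENABLE(b)     (((uint32_t)(b) << 16) | (b))
#define MASKED_BIT_DISABLE(b)    ((uint32_t)(b) << 16)

/* Hypothetical pair type: one register offset plus the value to load. */
struct lri {
	uint32_t reg;
	uint32_t value;
};

static const struct lri gen9_lri[] = {
	/* COMMON_SLICE_CHICKEN2: offset and bit are assumptions */
	{ 0x7014, MASKED_BIT_DISABLE(1u << 12) },
	/* FF_SLICE_CHICKEN, clipper provoking vertex fix (from the patch) */
	{ 0x2088, MASKED_BIT_ENABLE(1u << 1) },
	/* _3D_CHICKEN3, SF provoking vertex fix (from the patch) */
	{ 0x2090, MASKED_BIT_ENABLE(1u << 12) },
};

/* Emit one LRI header sized from the table, then every (reg, value) pair. */
static uint32_t *emit_lri(uint32_t *batch, const struct lri *lri, size_t count)
{
	size_t i;

	assert(count > 0);
	*batch++ = MI_LOAD_REGISTER_IMM(count);
	for (i = 0; i < count; i++) {
		*batch++ = lri[i].reg;
		*batch++ = lri[i].value;
	}
	/* Mirrors the MI_NOOP that follows the register writes in the patch. */
	*batch++ = MI_NOOP;

	return batch;
}

static uint32_t *emit_gen9_lri(uint32_t *batch)
{
	return emit_lri(batch, gen9_lri, ARRAY_SIZE(gen9_lri));
}
```

Because the LRI header is derived from ARRAY_SIZE(), adding or removing a
table entry cannot leave the count stale, which is the failure mode the
review comment worries about.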

Chris Wilson June 18, 2018, 9:12 a.m. UTC | #3
Quoting Joonas Lahtinen (2018-06-18 10:03:24)
> Quoting Chris Wilson (2018-06-15 22:06:05)
> > From: Kenneth Graunke <kenneth@whitecape.org>
> > 
> > The SF and clipper units mishandle the provoking vertex in some cases,
> > which can cause misrendering with shaders that use flat shaded inputs.
> > 
> > There are chicken bits in 3D_CHICKEN3 (for SF) and FF_SLICE_CHICKEN
> > (for the clipper) that work around the issue.  These registers are
> > unfortunately not part of the logical context (even the power context),
> > and so we must reload them every time we start executing in a context.
> > 
> > Bugzilla: https://bugs.freedesktop.org/103047
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Thanks, fingers crossed we don't discover a reason why we shouldn't do
this for all contexts... Pushed.
 
> One note below.
> 
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1575,11 +1575,21 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
> >         /* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
> >         batch = gen8_emit_flush_coherentl3_wa(engine, batch);
> >  
> > +       *batch++ = MI_LOAD_REGISTER_IMM(3);
> > +
> >         /* WaDisableGatherAtSetShaderCommonSlice:skl,bxt,kbl,glk */
> > -       *batch++ = MI_LOAD_REGISTER_IMM(1);
> >         *batch++ = i915_mmio_reg_offset(COMMON_SLICE_CHICKEN2);
> >         *batch++ = _MASKED_BIT_DISABLE(
> >                         GEN9_DISABLE_GATHER_AT_SET_SHADER_COMMON_SLICE);
> > +
> > +       /* BSpec: 11391 */
> > +       *batch++ = i915_mmio_reg_offset(FF_SLICE_CHICKEN);
> > +       *batch++ = _MASKED_BIT_ENABLE(FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX);
> > +
> > +       /* BSpec: 11299 */
> > +       *batch++ = i915_mmio_reg_offset(_3D_CHICKEN3);
> > +       *batch++ = _MASKED_BIT_ENABLE(_3D_CHICKEN_SF_PROVOKING_VERTEX_FIX);
> 
> I'm almost betting that somebody will take one of these 3 off without
> noticing the distant LOAD_REGISTER_IMM(). To perfect this, const table
> of pairs and use ARRAY_SIZE()? Then we're also one step closer to the
> const W/A tables...

I'll be back later to tidy this up.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index b8c0ebd50889..54ec7ab57ce8 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -2432,12 +2432,17 @@  enum i915_power_well_id {
 #define _3D_CHICKEN	_MMIO(0x2084)
 #define  _3D_CHICKEN_HIZ_PLANE_DISABLE_MSAA_4X_SNB	(1 << 10)
 #define _3D_CHICKEN2	_MMIO(0x208c)
+
+#define FF_SLICE_CHICKEN	_MMIO(0x2088)
+#define  FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX	(1 << 1)
+
 /* Disables pipelining of read flushes past the SF-WIZ interface.
  * Required on all Ironlake steppings according to the B-Spec, but the
  * particular danger of not doing so is not specified.
  */
 # define _3D_CHICKEN2_WM_READ_PIPELINED			(1 << 14)
 #define _3D_CHICKEN3	_MMIO(0x2090)
+#define  _3D_CHICKEN_SF_PROVOKING_VERTEX_FIX		(1 << 12)
 #define  _3D_CHICKEN_SF_DISABLE_OBJEND_CULL		(1 << 10)
 #define  _3D_CHICKEN3_AA_LINE_QUALITY_FIX_ENABLE	(1 << 5)
 #define  _3D_CHICKEN3_SF_DISABLE_FASTCLIP_CULL		(1 << 5)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 839cb1fc6a01..2b0ae552cc4e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1575,11 +1575,21 @@  static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
 	batch = gen8_emit_flush_coherentl3_wa(engine, batch);
 
+	*batch++ = MI_LOAD_REGISTER_IMM(3);
+
 	/* WaDisableGatherAtSetShaderCommonSlice:skl,bxt,kbl,glk */
-	*batch++ = MI_LOAD_REGISTER_IMM(1);
 	*batch++ = i915_mmio_reg_offset(COMMON_SLICE_CHICKEN2);
 	*batch++ = _MASKED_BIT_DISABLE(
 			GEN9_DISABLE_GATHER_AT_SET_SHADER_COMMON_SLICE);
+
+	/* BSpec: 11391 */
+	*batch++ = i915_mmio_reg_offset(FF_SLICE_CHICKEN);
+	*batch++ = _MASKED_BIT_ENABLE(FF_SLICE_CHICKEN_CL_PROVOKING_VERTEX_FIX);
+
+	/* BSpec: 11299 */
+	*batch++ = i915_mmio_reg_offset(_3D_CHICKEN3);
+	*batch++ = _MASKED_BIT_ENABLE(_3D_CHICKEN_SF_PROVOKING_VERTEX_FIX);
+
 	*batch++ = MI_NOOP;
 
 	/* WaClearSlmSpaceAtContextSwitch:kbl */
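
As a reading aid for the _MASKED_BIT_ENABLE() writes above, here is a small
userspace sketch of how a masked register write is generally understood to
behave: the upper 16 bits of the written value select which of the lower 16
bits are updated, so other bits in the register are left alone. The helper
names and the apply_masked_write() model are hypothetical; only the two bit
positions are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the patch. */
#define CL_PROVOKING_VERTEX_FIX (1u << 1)   /* in FF_SLICE_CHICKEN */
#define SF_PROVOKING_VERTEX_FIX (1u << 12)  /* in _3D_CHICKEN3 */

/* Value that sets `bits`: write-enable mask in 31:16, new value in 15:0. */
static uint32_t masked_bit_enable(uint32_t bits)
{
	assert((bits & 0xffff0000u) == 0);
	return (bits << 16) | bits;
}

/* Value that clears `bits`: mask set, value left at zero. */
static uint32_t masked_bit_disable(uint32_t bits)
{
	assert((bits & 0xffff0000u) == 0);
	return bits << 16;
}

/* Hypothetical software model of the hardware applying a masked write. */
static uint32_t apply_masked_write(uint32_t reg, uint32_t write)
{
	uint32_t mask = write >> 16;
	uint32_t value = write & 0xffffu;

	/* Only bits selected by the mask change; the rest keep their state. */
	return (reg & ~mask) | (value & mask);
}
```

With this model, enabling the SF fix on a register that already has bit 10
set leaves bit 10 untouched, which is why the patch can program each chicken
bit without a read-modify-write.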