diff mbox

[v2] drm/i915/cnl: WaPipeControlBefore3DStateSamplePattern

Message ID 20180126230524.17429-1-rafael.antognolli@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Rafael Antognolli Jan. 26, 2018, 11:05 p.m. UTC
This workaround should prevent a bug that can be hit on a context
restore. To avoid the issue, we must emit a PIPE_CONTROL with CS stall
(0x7a000004 0x00100000 0x00000000 0x00000000) followed by 12DW's of
NOOP(0x0) in the indirect context batch buffer, to ensure the engine is
idle prior to programming 3DSTATE_SAMPLE_PATTERN.

References: HSD#1939868

 v2:
   - More descriptive changelog and comments.
   - Fix math when counting dwords.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

Comments

Chris Wilson Jan. 26, 2018, 11:17 p.m. UTC | #1
Quoting Rafael Antognolli (2018-01-26 23:05:24)
> This workaround should prevent a bug that can be hit on a context
> restore. To avoid the issue, we must emit a PIPE_CONTROL with CS stall
> (0x7a000004 0x00100000 0x00000000 0x00000000) followed by 12DW's of
> NOOP(0x0) in the indirect context batch buffer, to ensure the engine is
> idle prior to programming 3DSTATE_SAMPLE_PATTERN.
> 
> References: HSD#1939868
> 
>  v2:
>    - More descriptive changelog and comments.
>    - Fix math when counting dwords.
> 
> Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 51e61b04a555..46c130d106d7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1366,6 +1366,36 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
>         return batch;
>  }
>  
> +static u32 *gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
> +{
> +       int i;
> +
> +       /*
> +        * WaPipeControlBefore3DStateSamplePattern: cnl
> +        *
> +        * Ensure the engine is idle prior to programming a
> +        * 3DSTATE_SAMPLE_PATTERN during a context restore.
> +        */
> +       batch = gen8_emit_pipe_control(batch,
> +                                      PIPE_CONTROL_CS_STALL,
> +                                      0);
> +       /*
> +        * WaPipeControlBefore3DStateSamplePattern says we need 4 dwords for
> +        * the PIPE_CONTROL followed by 12 dwords of 0x0, so 16 dwords in
> +        * total. Since gen8_emit_pipe_control() already advances the batch by
> +        * 6 dwords, we advance the other 10 here.
> +        */
> +       for (i = 0; i < 10; i++)
> +               *batch++ = MI_NOOP;

If the spec says emit 12 nops, emit 12 nops. Otherwise if we add another
w/a here it may break this magic.

I am not going to ask what the magic nops do, it just gives a 72 cycle
delay in the CS parser. My guess is that they are trying to isolate a
cacheline, in which case they are expecting the pipecontrol to be
cacheline aligned. (I would get that clarified and add an assert to
document such a requirement.) Or they just mean that the indirect ctx is
cacheline aligned and the nops are just the normal padding and nothing
to do with the w/a.

s/10/12/ here and tweak the comment to remove the assumed cacheline
tail, but please see if someone can clarify if the nops are indeed part
of the w/a or just the normal filler.
With s/10/12/,
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Rafael Antognolli Jan. 26, 2018, 11:41 p.m. UTC | #2
On Fri, Jan 26, 2018 at 11:17:01PM +0000, Chris Wilson wrote:
> Quoting Rafael Antognolli (2018-01-26 23:05:24)
> > This workaround should prevent a bug that can be hit on a context
> > restore. To avoid the issue, we must emit a PIPE_CONTROL with CS stall
> > (0x7a000004 0x00100000 0x00000000 0x00000000) followed by 12DW's of
> > NOOP(0x0) in the indirect context batch buffer, to ensure the engine is
> > idle prior to programming 3DSTATE_SAMPLE_PATTERN.
> > 
> > References: HSD#1939868
> > 
> >  v2:
> >    - More descriptive changelog and comments.
> >    - Fix math when counting dwords.
> > 
> > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c | 34 +++++++++++++++++++++++++++++++++-
> >  1 file changed, 33 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 51e61b04a555..46c130d106d7 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1366,6 +1366,36 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
> >         return batch;
> >  }
> >  
> > +static u32 *gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
> > +{
> > +       int i;
> > +
> > +       /*
> > +        * WaPipeControlBefore3DStateSamplePattern: cnl
> > +        *
> > +        * Ensure the engine is idle prior to programming a
> > +        * 3DSTATE_SAMPLE_PATTERN during a context restore.
> > +        */
> > +       batch = gen8_emit_pipe_control(batch,
> > +                                      PIPE_CONTROL_CS_STALL,
> > +                                      0);
> > +       /*
> > +        * WaPipeControlBefore3DStateSamplePattern says we need 4 dwords for
> > +        * the PIPE_CONTROL followed by 12 dwords of 0x0, so 16 dwords in
> > +        * total. Since gen8_emit_pipe_control() already advances the batch by
> > +        * 6 dwords, we advance the other 10 here.
> > +        */
> > +       for (i = 0; i < 10; i++)
> > +               *batch++ = MI_NOOP;
> 
> If the spec says emit 12 nops, emit 12 nops. Otherwise if we add another
> w/a here it may break this magic.

So, in the bspec, there's the following text:

"The address above needs to be a GGTT address and contain a pipe control
with CS stall(0x7a000004 0x00100000 0x00000000 0x00000000)followed by
12DW’s of NOOP(0x00000000)"

while on an internal ticket pointed by it, it is:

"The address above needs to be a GGTT address.. It would contain the
following value: 7A000004 00100000 followed by 14 DW’s of zero."

That's why I think it is really 16 DW's in total. And
gen8_emit_pipe_control() already emits 6 of them. How are we supposed to
emit 12 after that?

I don't know if they expect the pipe control to be cacheline aligned,
but I can ask and try to find out.

> I am not going to ask what the magic nops do, it just gives a 72 cycle
> delay in the CS parser. My guess is that they are trying to isolate a
> cacheline, in which case they are expecting the pipecontrol to be
> cacheline aligned. (I would get that clarified and add an assert to
> document such a requirement.) Or they just mean that the indirect ctx is
> cacheline aligned and the nops are just the normal padding and nothing
> to do with the w/a.
> 
> s/10/12/ here and tweak the comment to remove the assumed cacheline
> tail, but please see if someone can clarify if the nops are indeed part
> of the w/a or just the normal filler.
> With s/10/12/,
> Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 51e61b04a555..46c130d106d7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1366,6 +1366,36 @@  static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 	return batch;
 }
 
+static u32 *gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
+{
+	int i;
+
+	/*
+	 * WaPipeControlBefore3DStateSamplePattern: cnl
+	 *
+	 * Ensure the engine is idle prior to programming a
+	 * 3DSTATE_SAMPLE_PATTERN during a context restore.
+	 */
+	batch = gen8_emit_pipe_control(batch,
+				       PIPE_CONTROL_CS_STALL,
+				       0);
+	/*
+	 * WaPipeControlBefore3DStateSamplePattern says we need 4 dwords for
+	 * the PIPE_CONTROL followed by 12 dwords of 0x0, so 16 dwords in
+	 * total. Since gen8_emit_pipe_control() already advances the batch by
+	 * 6 dwords, we advance the other 10 here.
+	 */
+	for (i = 0; i < 10; i++)
+		*batch++ = MI_NOOP;
+
+	/* Pad to end of cacheline */
+	while ((unsigned long)batch % CACHELINE_BYTES)
+		*batch++ = MI_NOOP;
+
+	return batch;
+}
+
+
 #define CTX_WA_BB_OBJ_SIZE (PAGE_SIZE)
 
 static int lrc_setup_wa_ctx(struct intel_engine_cs *engine)
@@ -1419,7 +1449,9 @@  static int intel_init_workaround_bb(struct intel_engine_cs *engine)
 
 	switch (INTEL_GEN(engine->i915)) {
 	case 10:
-		return 0;
+		wa_bb_fn[0] = gen10_init_indirectctx_bb;
+		wa_bb_fn[1] = NULL;
+		break;
 	case 9:
 		wa_bb_fn[0] = gen9_init_indirectctx_bb;
 		wa_bb_fn[1] = NULL;