diff mbox series

drm/i915: Fix indirect context size calculation

Message ID 20200414122000.19353-1-mika.kuoppala@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915: Fix indirect context size calculation | expand

Commit Message

Mika Kuoppala April 14, 2020, 12:20 p.m. UTC
Hardware needs cacheline count for indirect context size.
Count of zero means that the feature is disabled.
If we only divide size with cacheline bytes, we get
one cacheline short of execution.

Divide by rounding up to a cacheline size so that
hardware executes everything intended.

Bspec: 11739
Fixes: 17ee950df38b ("drm/i915/gen8: Add infrastructure to initialize WA batch buffers")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Mika Kuoppala April 14, 2020, 1:51 p.m. UTC | #1
Mika Kuoppala <mika.kuoppala@linux.intel.com> writes:

> Hardware needs cacheline count for indirect context size.
> Count of zero means that the feature is disabled.
> If we only divide size with cacheline bytes, we get
> one cacheline short of execution.
>
> Divide by rounding up to a cacheline size so that
> hardware executes everything intended.
>
> Bspec: 11739
> Fixes: 17ee950df38b ("drm/i915/gen8: Add infrastructure to initialize WA batch buffers")
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 6fbad5e2343f..acbb36ad17ff 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -4739,7 +4739,8 @@ static void init_wa_bb_reg_state(u32 * const regs,
>  
>  		regs[pos_bb_per_ctx + 2] =
>  			(ggtt_offset + wa_ctx->indirect_ctx.offset) |
> -			(wa_ctx->indirect_ctx.size / CACHELINE_BYTES);
> +			DIV_ROUND_UP(wa_ctx->indirect_ctx.size,
> +				     CACHELINE_BYTES);

The aligment to cacheline is checked on the emitting phase.

This patch can be ignored.
-Mika


>  
>  		regs[pos_bb_per_ctx + 4] =
>  			intel_lr_indirect_ctx_offset(engine) << 6;
> -- 
> 2.17.1
Chris Wilson April 14, 2020, 2:38 p.m. UTC | #2
Quoting Mika Kuoppala (2020-04-14 13:20:00)
> Hardware needs cacheline count for indirect context size.
> Count of zero means that the feature is disabled.
> If we only divide size with cacheline bytes, we get
> one cacheline short of execution.

I thought we only emitted cacheline chunks by design?

I see us checking for
GEM_DEBUG_WARN_ON(!IS_ALIGNED(wa_bb[i]->offset, CACHELINE_BYTES)))
so what's the reason? I expect that's in the next patch.
-Chris
Chris Wilson April 14, 2020, 2:39 p.m. UTC | #3
Quoting Mika Kuoppala (2020-04-14 14:51:32)
> Mika Kuoppala <mika.kuoppala@linux.intel.com> writes:
> 
> > Hardware needs cacheline count for indirect context size.
> > Count of zero means that the feature is disabled.
> > If we only divide size with cacheline bytes, we get
> > one cacheline short of execution.
> >
> > Divide by rounding up to a cacheline size so that
> > hardware executes everything intended.
> >
> > Bspec: 11739
> > Fixes: 17ee950df38b ("drm/i915/gen8: Add infrastructure to initialize WA batch buffers")
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_lrc.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 6fbad5e2343f..acbb36ad17ff 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -4739,7 +4739,8 @@ static void init_wa_bb_reg_state(u32 * const regs,
> >  
> >               regs[pos_bb_per_ctx + 2] =
> >                       (ggtt_offset + wa_ctx->indirect_ctx.offset) |
> > -                     (wa_ctx->indirect_ctx.size / CACHELINE_BYTES);
> > +                     DIV_ROUND_UP(wa_ctx->indirect_ctx.size,
> > +                                  CACHELINE_BYTES);
> 
> The aligment to cacheline is checked on the emitting phase.

My headache is screwing with my latency. I see I am superfluous and
should just call it a day.
-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 6fbad5e2343f..acbb36ad17ff 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -4739,7 +4739,8 @@  static void init_wa_bb_reg_state(u32 * const regs,
 
 		regs[pos_bb_per_ctx + 2] =
 			(ggtt_offset + wa_ctx->indirect_ctx.offset) |
-			(wa_ctx->indirect_ctx.size / CACHELINE_BYTES);
+			DIV_ROUND_UP(wa_ctx->indirect_ctx.size,
+				     CACHELINE_BYTES);
 
 		regs[pos_bb_per_ctx + 4] =
 			intel_lr_indirect_ctx_offset(engine) << 6;