Message ID | 20200414122000.19353-1-mika.kuoppala@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: Fix indirect context size calculation | expand |
Mika Kuoppala <mika.kuoppala@linux.intel.com> writes: > Hardware needs cacheline count for indirect context size. > Count of zero means that the feature is disabled. > If we only divide size with cacheline bytes, we get > one cacheline short of execution. > > Divide by rounding up to a cacheline size so that > hardware executes everything intended. > > Bspec: 11739 > Fixes: 17ee950df38b ("drm/i915/gen8: Add infrastructure to initialize WA batch buffers") > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> > --- > drivers/gpu/drm/i915/gt/intel_lrc.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > index 6fbad5e2343f..acbb36ad17ff 100644 > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > @@ -4739,7 +4739,8 @@ static void init_wa_bb_reg_state(u32 * const regs, > > regs[pos_bb_per_ctx + 2] = > (ggtt_offset + wa_ctx->indirect_ctx.offset) | > - (wa_ctx->indirect_ctx.size / CACHELINE_BYTES); > + DIV_ROUND_UP(wa_ctx->indirect_ctx.size, > + CACHELINE_BYTES); The aligment to cacheline is checked on the emitting phase. This patch can be ignored. -Mika > > regs[pos_bb_per_ctx + 4] = > intel_lr_indirect_ctx_offset(engine) << 6; > -- > 2.17.1
Quoting Mika Kuoppala (2020-04-14 13:20:00) > Hardware needs cacheline count for indirect context size. > Count of zero means that the feature is disabled. > If we only divide size with cacheline bytes, we get > one cacheline short of execution. I thought we only emitted cacheline chunks by design? I see us checking for GEM_DEBUG_WARN_ON(!IS_ALIGNED(wa_bb[i]->offset, CACHELINE_BYTES))) so what's the reason? I expect that's in the next patch. -Chris
Quoting Mika Kuoppala (2020-04-14 14:51:32) > Mika Kuoppala <mika.kuoppala@linux.intel.com> writes: > > > Hardware needs cacheline count for indirect context size. > > Count of zero means that the feature is disabled. > > If we only divide size with cacheline bytes, we get > > one cacheline short of execution. > > > > Divide by rounding up to a cacheline size so that > > hardware executes everything intended. > > > > Bspec: 11739 > > Fixes: 17ee950df38b ("drm/i915/gen8: Add infrastructure to initialize WA batch buffers") > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > --- > > drivers/gpu/drm/i915/gt/intel_lrc.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > > index 6fbad5e2343f..acbb36ad17ff 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > > @@ -4739,7 +4739,8 @@ static void init_wa_bb_reg_state(u32 * const regs, > > > > regs[pos_bb_per_ctx + 2] = > > (ggtt_offset + wa_ctx->indirect_ctx.offset) | > > - (wa_ctx->indirect_ctx.size / CACHELINE_BYTES); > > + DIV_ROUND_UP(wa_ctx->indirect_ctx.size, > > + CACHELINE_BYTES); > > The aligment to cacheline is checked on the emitting phase. My headache is screwing with my latency. I see I am superfluous and should just call it a day. -Chris
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 6fbad5e2343f..acbb36ad17ff 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -4739,7 +4739,8 @@ static void init_wa_bb_reg_state(u32 * const regs, regs[pos_bb_per_ctx + 2] = (ggtt_offset + wa_ctx->indirect_ctx.offset) | - (wa_ctx->indirect_ctx.size / CACHELINE_BYTES); + DIV_ROUND_UP(wa_ctx->indirect_ctx.size, + CACHELINE_BYTES); regs[pos_bb_per_ctx + 4] = intel_lr_indirect_ctx_offset(engine) << 6;
Hardware needs cacheline count for indirect context size. Count of zero means that the feature is disabled. If we only divide size with cacheline bytes, we get one cacheline short of execution. Divide by rounding up to a cacheline size so that hardware executes everything intended. Bspec: 11739 Fixes: 17ee950df38b ("drm/i915/gen8: Add infrastructure to initialize WA batch buffers") Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> --- drivers/gpu/drm/i915/gt/intel_lrc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)