[v3,1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

Message ID: 1433500446-26929-2-git-send-email-arun.siluvery@linux.intel.com

Commit Message

arun.siluvery@linux.intel.com June 5, 2015, 10:34 a.m. UTC
Some workarounds (WA) must be applied during context save but before restore,
and some at the end of context save/restore but before executing the
instructions in the ring. These WA cannot be applied through the normal means,
so WA batch buffers are created for this purpose. Each context has two
registers that hold the offsets of these batch buffers; if they are non-zero,
the HW knows that it needs to execute these batches.
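
As an illustrative sketch, these are the register writes that
populate_lr_context() performs later in this patch (indctx_ggtt_addr,
perctx_ggtt_addr and size_in_cachelines are placeholder names):

/* indirect ctx WA batch: its GGTT address, with the batch length
 * in cache lines in the low bits */
reg_state[CTX_RCS_INDIRECT_CTX+1] = indctx_ggtt_addr | size_in_cachelines;

/* per-ctx WA batch: its GGTT address, with bit 0 set as a valid bit */
reg_state[CTX_BB_PER_CTX_PTR+1] = perctx_ggtt_addr | 0x01;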

v1: In this version two separate ring_buffer objects were used to load WA
instructions for the indirect and per-context batch buffers, and they were
part of every context.

v2: Chris suggested including an additional page in the context and using it
to load these WA instead of creating separate objects. This simplifies a lot
of things as we need not explicitly pin/unpin them. Thomas Daniel further
pointed out that the GuC is planning to use a similar setup to share data
between the GuC and the driver, and the WA batch buffers could probably share
that page. However, after discussions with Dave, who is implementing the GuC
changes, he suggested using an independent page for these reasons: the GuC
area might grow, and these WA are initialized only once and are not changed
afterwards, so we can share the page across all contexts. This version uses
this approach.
(Thanks to Chris, Dave and Thomas for their inputs)
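
The resulting layout of the shared page, as a sketch (offsets and sizes
match lrc_setup_ctx_wa_obj() in this patch; sizes are in cache lines):

/*
 * One page, allocated once and pinned in GGTT:
 *
 * offset 0                  : indirect ctx WA batch, 4 cache lines; no
 *                             MI_BATCH_BUFFER_END needed since HW uses the
 *                             length programmed in CTX_RCS_INDIRECT_CTX
 * offset 4 * CACHELINE_BYTES: per-ctx WA batch, 2 cache lines, terminated
 *                             by MI_BATCH_BUFFER_END
 */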

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 168 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |   9 ++
 2 files changed, 174 insertions(+), 3 deletions(-)

Comments

Chris Wilson June 5, 2015, 10:56 a.m. UTC | #1
On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
> Some workarounds (WA) must be applied during context save but before restore,
> and some at the end of context save/restore but before executing the
> instructions in the ring. These WA cannot be applied through the normal means,
> so WA batch buffers are created for this purpose. Each context has two
> registers that hold the offsets of these batch buffers; if they are non-zero,
> the HW knows that it needs to execute these batches.
> 
> v1: In this version two separate ring_buffer objects were used to load WA
> instructions for the indirect and per-context batch buffers, and they were
> part of every context.
> 
> v2: Chris suggested including an additional page in the context and using it
> to load these WA instead of creating separate objects. This simplifies a lot
> of things as we need not explicitly pin/unpin them. Thomas Daniel further
> pointed out that the GuC is planning to use a similar setup to share data
> between the GuC and the driver, and the WA batch buffers could probably share
> that page. However, after discussions with Dave, who is implementing the GuC
> changes, he suggested using an independent page for these reasons: the GuC
> area might grow, and these WA are initialized only once and are not changed
> afterwards, so we can share the page across all contexts. This version uses
> this approach.
> (Thanks to Chris, Dave and Thomas for their inputs)

Having moved to a shared wa_ctx for all lrc, I think it makes sense to
then do the allocation during ring_init itself, next to the scratch/hws
status pages. The advantage is that we don't then need to add more
special cases to the default ctx on RCS, and its permanence is far more
prominent. It will also be more consistent with calling it ring->wa_ctx.

Since you have it already plumbed into ring init/fini, why is it partly
done during default ctx init? Maybe all that is required is a little bit
of code and changelog explanation.
-Chris
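
A rough sketch of the suggested alternative (hypothetical, reusing the
helpers this patch introduces):

/* in logical_ring_init(), next to the scratch/HWS status page setup */
if (ring->id == RCS) {
	ret = lrc_setup_ctx_wa_obj(ring, PAGE_SIZE);
	if (ret)
		return ret;

	ret = intel_init_workaround_bb(ring);
	if (ret) {
		lrc_destroy_ctx_wa_obj(ring);
		return ret;
	}
}
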
Chris Wilson June 5, 2015, 11 a.m. UTC | #2
On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
> +	/* FIXME: fill unused locations with NOOPs.
> +	 * Replace these instructions with WA
> +	 */
> +        while (index < end)
> +		reg_state[index++] = MI_NOOP;

I found calling it reg_state was very confusing. Maybe batch, bb, data or cmd?
-Chris
arun.siluvery@linux.intel.com June 5, 2015, 11:24 a.m. UTC | #3
On 05/06/2015 11:56, Chris Wilson wrote:
> On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
>> [commit message snipped]
>
> Having moved to a shared wa_ctx for all lrc, I think it makes sense to
> then do the allocation during ring_init itself, next to the scratch/hws
> status pages. The advantage is that we don't then need to add more
> special cases to the default ctx on RCS, and its permanence is far more
> prominent. It will also be more consistent with calling it ring->wa_ctx.
>
> Since you have it already plumbed into ring init/fini, why is it partly
> done during default ctx init? Maybe all that is required is a little bit
> of code and changelog explanation.
> -Chris
>
OK, it is possible to do the allocation and setup in logical_ring_init()
itself. I wanted to group it with the other WA which are set up in
init_context().

I will also change s/reg_state/cmd.

regards
Arun
Chris Wilson June 5, 2015, 11:36 a.m. UTC | #4
On Fri, Jun 05, 2015 at 12:24:58PM +0100, Siluvery, Arun wrote:
> OK, it is possible to do the allocation and setup in
> logical_ring_init() itself. I wanted to group it with the other WA
> which are set up in init_context().

Phew, I was worried I had missed something. The issue the current split
between ring init and default context init raises in my mind is that it
suggests the content has context dependencies - whereas the w/a so far
can be reused globally. So imo the current split is more confusing than
just creating the w/a buffer entirely during ring init.
-Chris
arun.siluvery@linux.intel.com June 5, 2015, 11:56 a.m. UTC | #5
On 05/06/2015 12:36, Chris Wilson wrote:
> On Fri, Jun 05, 2015 at 12:24:58PM +0100, Siluvery, Arun wrote:
>> ok, it is possible to do the allocation and setup in
>> logical_ring_init() itself. I wanted to group it with other wa which
>> are setup in init_context().
>
> Phew, I was worried I had missed something. The issue the current split
> between ring init and default context init raises in my mind is that it
> suggests the content has context dependencies - whereas the w/a so far
> can be reused globally. So imo the current split is more confusing than
> just creating the w/a buffer entirely during ring init.
> -Chris
>
I am moving it to logical_render_ring_init() as these are only
applicable to RCS.

regards
Arun
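
A minimal sketch of that placement (hypothetical; same helpers as above,
but hooked into the render-ring-specific init rather than the common
ring init):

/* in logical_render_ring_init(), so only RCS allocates the WA buffer */
ret = lrc_setup_ctx_wa_obj(ring, PAGE_SIZE);
if (ret == 0)
	ret = intel_init_workaround_bb(ring);
if (ret)
	return ret;
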
Daniel Vetter June 15, 2015, 3:22 p.m. UTC | #6
On Fri, Jun 05, 2015 at 12:00:54PM +0100, Chris Wilson wrote:
> On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
> > +	/* FIXME: fill unused locations with NOOPs.
> > +	 * Replace these instructions with WA
> > +	 */
> > +        while (index < end)
> > +		reg_state[index++] = MI_NOOP;
> 
> I found calling it reg_state was very confusing. Maybe batch, bb, data or cmd?

Concurred, reg_state sounds like an mmio dump, not a batchbuffer. wa_batch
would be my naming bikeshed, but I'd go with either. If this is all that
needs changing, I can do that while applying the patches.
-Daniel
arun.siluvery@linux.intel.com June 15, 2015, 3:23 p.m. UTC | #7
On 15/06/2015 16:22, Daniel Vetter wrote:
> On Fri, Jun 05, 2015 at 12:00:54PM +0100, Chris Wilson wrote:
>> On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
>>> +	/* FIXME: fill unused locations with NOOPs.
>>> +	 * Replace these instructions with WA
>>> +	 */
>>> +        while (index < end)
>>> +		reg_state[index++] = MI_NOOP;
>>
>> I found calling it reg_state was very confusing. Maybe batch, bb, data or cmd?
>
> Concurred, reg_state sounds like an mmio dump, not a batchbuffer. wa_batch
> would be my naming bikeshed, but I'd go with either. If this is all that
> needs changing, I can do that while applying the patches.
> -Daniel
>
I have already changed this to cmd.
There are some more comments from Dave which I am addressing now; I will
send the updated patches soon.

regards
Arun

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0413b8f..0b3422a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -211,9 +211,11 @@  enum {
 	FAULT_AND_CONTINUE /* Unsupported */
 };
 #define GEN8_CTX_ID_SHIFT 32
+#define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17
 
 static int intel_lr_context_pin(struct intel_engine_cs *ring,
 		struct intel_context *ctx);
+static void lrc_destroy_ctx_wa_obj(struct intel_engine_cs *ring);
 
 /**
  * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists
@@ -1077,6 +1079,96 @@  static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring,
 	return 0;
 }
 
+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
+{
+	int index;
+	int end;
+	struct page *page;
+	uint32_t *reg_state;
+
+	page = i915_gem_object_get_page(ring->ctx_wa.obj, 0);
+	reg_state = kmap_atomic(page);
+
+	index = ring->ctx_wa.indctx_batch_offset / sizeof(uint32_t);
+	end = index + (ring->ctx_wa.indctx_batch_size *
+		       CACHELINE_BYTES) / sizeof(uint32_t);
+
+	if ((end * sizeof(uint32_t)) > PAGE_SIZE) {
+		DRM_ERROR("context WA instruction exceeding allotted size\n");
+		kunmap_atomic(reg_state);
+		return -EINVAL;
+	}
+
+	/* FIXME: fill unused locations with NOOPs.
+	 * Replace these instructions with WA
+	 */
+        while (index < end)
+		reg_state[index++] = MI_NOOP;
+
+	/*
+	 * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
+	 * execution depends on the length specified in terms of cache lines
+	 * in the register CTX_RCS_INDIRECT_CTX
+	 */
+
+	kunmap_atomic(reg_state);
+
+	return 0;
+}
+
+static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
+{
+	int index;
+	int end;
+	struct page *page;
+	uint32_t *reg_state;
+
+	page = i915_gem_object_get_page(ring->ctx_wa.obj, 0);
+	reg_state = kmap_atomic(page);
+
+	index = ring->ctx_wa.perctx_batch_offset / sizeof(uint32_t);
+	end = index + (ring->ctx_wa.perctx_batch_size *
+		       CACHELINE_BYTES) / sizeof(uint32_t);
+
+	if ((end * sizeof(uint32_t)) > PAGE_SIZE) {
+		DRM_ERROR("context WA instruction exceeding allotted size\n");
+		kunmap_atomic(reg_state);
+		return -EINVAL;
+	}
+
+	/* FIXME: fill unused locations with NOOPs.
+	 * Replace these instructions with WA
+	 */
+        while (index < end)
+		reg_state[index++] = MI_NOOP;
+
+	reg_state[index - 1] = MI_BATCH_BUFFER_END;
+	kunmap_atomic(reg_state);
+
+	return 0;
+}
+
+static int intel_init_workaround_bb(struct intel_engine_cs *ring)
+{
+	int ret;
+	struct drm_device *dev = ring->dev;
+
+	if (IS_GEN8(dev)) {
+		ret = gen8_init_indirectctx_bb(ring);
+		if (ret)
+			return ret;
+
+		ret = gen8_init_perctx_bb(ring);
+		if (ret)
+			return ret;
+	} else {
+		WARN_ONCE(INTEL_INFO(ring->dev)->gen >= 9,
+			"WA batch buffer is not initialized\n");
+	}
+
+	return 0;
+}
+
 static int gen8_init_common_ring(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
@@ -1754,15 +1846,25 @@  populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
 	reg_state[CTX_SECOND_BB_STATE+1] = 0;
 	if (ring->id == RCS) {
-		/* TODO: according to BSpec, the register state context
-		 * for CHV does not have these. OTOH, these registers do
-		 * exist in CHV. I'm waiting for a clarification */
 		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
 		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
 		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
 		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
 		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
 		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
+		if (ring->ctx_wa.obj) {
+			reg_state[CTX_RCS_INDIRECT_CTX+1] =
+				(i915_gem_obj_ggtt_offset(ring->ctx_wa.obj) +
+				 ring->ctx_wa.indctx_batch_offset) |
+				ring->ctx_wa.indctx_batch_size;
+
+			reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] =
+				CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT << 6;
+
+			reg_state[CTX_BB_PER_CTX_PTR+1] =
+				(i915_gem_obj_ggtt_offset(ring->ctx_wa.obj) +
+				 ring->ctx_wa.perctx_batch_offset) | 0x01;
+		}
 	}
 	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
 	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
@@ -1822,6 +1924,8 @@  void intel_lr_context_free(struct intel_context *ctx)
 			if (ctx == ring->default_context) {
 				intel_unpin_ringbuffer_obj(ringbuf);
 				i915_gem_object_ggtt_unpin(ctx_obj);
+				if (ring->id == RCS)
+					lrc_destroy_ctx_wa_obj(ring);
 			}
 			WARN_ON(ctx->engine[ring->id].pin_count);
 			intel_destroy_ringbuffer_obj(ringbuf);
@@ -1872,6 +1976,46 @@  static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
 	POSTING_READ(RING_HWS_PGA(ring->mmio_base));
 }
 
+static int lrc_setup_ctx_wa_obj(struct intel_engine_cs *ring, u32 size)
+{
+	int ret;
+	struct drm_device *dev = ring->dev;
+
+	WARN_ON(ring->id != RCS);
+
+	size = roundup(size, PAGE_SIZE);
+	ring->ctx_wa.obj = i915_gem_alloc_object(dev, size);
+	if (!ring->ctx_wa.obj) {
+		DRM_DEBUG_DRIVER("Alloc LRC Ctx WA backing obj failed.\n");
+		return -ENOMEM;
+	}
+
+	ret = i915_gem_obj_ggtt_pin(ring->ctx_wa.obj, GEN8_LR_CONTEXT_ALIGN, 0);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Pin LRC Ctx WA backing obj failed: %d\n",
+				 ret);
+		drm_gem_object_unreference(&ring->ctx_wa.obj->base);
+		return ret;
+	}
+
+	ring->ctx_wa.indctx_batch_offset = 0;
+	ring->ctx_wa.indctx_batch_size = 4; /* in cache lines */
+	ring->ctx_wa.perctx_batch_offset =
+		ring->ctx_wa.indctx_batch_size * CACHELINE_BYTES;
+	ring->ctx_wa.perctx_batch_size = 2;
+
+	return 0;
+}
+
+static void lrc_destroy_ctx_wa_obj(struct intel_engine_cs *ring)
+{
+	WARN_ON(ring->id != RCS);
+
+	i915_gem_object_ggtt_unpin(ring->ctx_wa.obj);
+	drm_gem_object_unreference(&ring->ctx_wa.obj->base);
+	ring->ctx_wa.obj = NULL;
+}
+
 /**
  * intel_lr_context_deferred_create() - create the LRC specific bits of a context
  * @ctx: LR context to create.
@@ -1954,6 +2098,22 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 	}
 
+	if (ring->id == RCS && is_global_default_ctx) {
+		ret = lrc_setup_ctx_wa_obj(ring, PAGE_SIZE);
+		if (ret) {
+			DRM_DEBUG_DRIVER(
+				"Failed to setup context WA page: %d\n", ret);
+			goto error;
+		}
+
+		ret = intel_init_workaround_bb(ring);
+		if (ret) {
+			lrc_destroy_ctx_wa_obj(ring);
+			DRM_ERROR("WA batch buffers are not initialized: %d\n",
+				  ret);
+		}
+	}
+
 	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
@@ -1982,6 +2142,8 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 	return 0;
 
 error:
+	if (ring->id == RCS && is_global_default_ctx)
+		lrc_destroy_ctx_wa_obj(ring);
 	if (is_global_default_ctx)
 		intel_unpin_ringbuffer_obj(ringbuf);
 error_destroy_rbuf:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 39f6dfc..61c1402 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -119,6 +119,14 @@  struct intel_ringbuffer {
 
 struct	intel_context;
 
+struct  i915_ctx_workarounds {
+	u32 indctx_batch_offset;
+	u32 indctx_batch_size;
+	u32 perctx_batch_offset;
+	u32 perctx_batch_size;
+	struct drm_i915_gem_object *obj;
+};
+
 struct  intel_engine_cs {
 	const char	*name;
 	enum intel_ring_id {
@@ -142,6 +150,7 @@  struct  intel_engine_cs {
 	struct i915_gem_batch_pool batch_pool;
 
 	struct intel_hw_status_page status_page;
+	struct i915_ctx_workarounds ctx_wa;
 
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */