[V2] drm/i915: Use I915_MAP_WC for execlists context buffer on the platforms without LLC

Message ID 1529647750-25008-1-git-send-email-yakui.zhao@intel.com (mailing list archive)
State New, archived

Commit Message

Zhao, Yakui June 22, 2018, 6:09 a.m. UTC
Under execlists mode the context buffer is allocated in the global GTT region.
The I915_MAP_WB type is used to map the buffer so that the driver can
initialize the context buffer (ring registers, context control register and
so on), and then __context_pin is called to flush back the corresponding
contents. However, the driver also updates the context buffer (ring tail
offset) before writing the ELSP port, and this update has no explicit cache
flush. Maybe it is handled by the HW, but this is quite confusing as BXT has
no LLC. So WC is used to map the context buffer on platforms without LLC, and
updates of the context buffer are written into the physical pages directly.
This will be safer.

V1->V2: Remove the dirty flag of the execlists state buffer and fix one minor
typo in the commit log

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Comments

Chris Wilson June 22, 2018, 6:26 a.m. UTC | #1
Quoting Zhao Yakui (2018-06-22 07:09:10)
> Under execlists mode the context buffer is allocated in global Gtt region.
> The I915_MAP_WB type is used to map the buffer so that the driver can
> initialize the context buffer.(Ring reg, Context Ctrl reg and so on).
> And then __context_pin is called to flush back corresponding contents.
> In fact as it also tries to update context buffer (Ring Tail offset)
> before writing the ELSP port, it has no explicit cache flush. Maybe it is
> handled by HW. But this is quite confusing as BXT has no LLC. So the WC
> is used to map the context buffer on the platform without LLC and the
> update of context buffer is written into phys page directly. It will
> be safer.
> 
> V1->V2: Remove the dirty flag of execlists state buffer and one minor
> typo in commit log

The object's pages are still dirty, so why? It's not about CPU cache
dirt, here it is about whether the pages differ from any potential
swapcache.

I was anticipating there would be some type conflict with
engine->pinned_default_state, but that just happens to work out
correctly... so long as there is always a retirement during load and we
park before any reset. Hmm.
-Chris
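
The distinction drawn above, between CPU cache dirt and pages that differ from their backing store, can be sketched with a toy model. This is only an illustration under stated assumptions: `toy_object`, `toy_write` and `toy_needs_writeback` are hypothetical names, not i915 code, and `pages_dirty` merely stands in for the role that `mm.dirty` plays in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names exist in i915. */
enum toy_map_type { TOY_MAP_WB, TOY_MAP_WC };

struct toy_object {
	unsigned char backing[16]; /* stand-in for the swap/shmem copy */
	unsigned char pages[16];   /* stand-in for the object's pages */
	bool pages_dirty;          /* pages differ from the backing copy */
	bool cache_dirty;          /* CPU cache holds unflushed writes */
};

/* Write one byte through a CPU mapping of the given type. */
static void toy_write(struct toy_object *obj, enum toy_map_type type,
		      size_t offset, unsigned char value)
{
	obj->pages[offset] = value;
	obj->pages_dirty = true;         /* true for WB and WC alike */
	if (type == TOY_MAP_WB)
		obj->cache_dirty = true; /* only WB leaves cache dirt */
}

/* At eviction time, only pages_dirty decides whether to write back. */
static bool toy_needs_writeback(const struct toy_object *obj)
{
	return obj->pages_dirty;
}
```

In this model a write through a WC mapping leaves `cache_dirty` clear but still sets `pages_dirty`, which is why switching the map type alone would not justify dropping the dirty flag.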
Chris Wilson June 22, 2018, 6:36 a.m. UTC | #2
Quoting Zhao Yakui (2018-06-22 07:09:10)
> @@ -2728,6 +2729,7 @@ populate_lr_context(struct i915_gem_context *ctx,
>                     struct intel_engine_cs *engine,
>                     struct intel_ring *ring)
>  {
> +       enum i915_map_type map = HAS_LLC(ctx->i915) ? I915_MAP_WB : I915_MAP_WC;
>         void *vaddr;
>         u32 *regs;
>         int ret;
> @@ -2738,13 +2740,12 @@ populate_lr_context(struct i915_gem_context *ctx,
>                 return ret;
>         }
>  
> -       vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WB);
> +       vaddr = i915_gem_object_pin_map(ctx_obj, map);

As this uses the cpu domain and flushed afterwards, this one is correct
in its usage of MAP_WB.
-Chris
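
Why a WB map followed by a flush is fine for populating the context image can be sketched with a small simulation. It is a simplified model, assuming a non-coherent GPU reads only from memory while an LLC-sharing GPU also sees the CPU cache; every `sim_` name is hypothetical, and real WC writes additionally need a write barrier to drain the write-combining buffers, which the model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulation only: these are not i915 types or functions. */
enum sim_map_type { SIM_MAP_WB, SIM_MAP_WC };

struct sim_ctx_image {
	uint32_t memory;    /* what a non-coherent GPU would read */
	uint32_t cpu_cache; /* value held in a write-back cache line */
	bool cache_valid;   /* cache line holds an unflushed write */
};

/* Mirrors the selection in the patch: WB with LLC, WC without. */
static enum sim_map_type sim_pick_map(bool has_llc)
{
	return has_llc ? SIM_MAP_WB : SIM_MAP_WC;
}

/* WB writes land in the cache; WC writes go straight to memory. */
static void sim_cpu_write(struct sim_ctx_image *img,
			  enum sim_map_type type, uint32_t value)
{
	if (type == SIM_MAP_WB) {
		img->cpu_cache = value;
		img->cache_valid = true;
	} else {
		img->memory = value;
	}
}

/* Explicit flush: push the cached value out to memory. */
static void sim_flush(struct sim_ctx_image *img)
{
	if (img->cache_valid) {
		img->memory = img->cpu_cache;
		img->cache_valid = false;
	}
}

/* With LLC the GPU snoops the cache; without it, memory only. */
static uint32_t sim_gpu_read(const struct sim_ctx_image *img, bool has_llc)
{
	if (has_llc && img->cache_valid)
		return img->cpu_cache;
	return img->memory;
}
```

In the model, a WB write followed by `sim_flush()` is visible to the GPU on both kinds of platform, which matches the point that the populate path is already correct; the stale read only appears when a WB write on a non-LLC platform is never flushed.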
Zhao, Yakui June 22, 2018, 7:29 a.m. UTC | #3
>-----Original Message-----
>From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
>Sent: Friday, June 22, 2018 2:36 PM
>To: Zhao, Yakui <yakui.zhao@intel.com>; intel-gfx@lists.freedesktop.org
>Cc: Zhao, Yakui <yakui.zhao@intel.com>
>Subject: Re: [PATCH V2] drm/i915: Use I915_MAP_WC for execlists context
>buffer on the platforms without LLC
>
>Quoting Zhao Yakui (2018-06-22 07:09:10)
>> @@ -2728,6 +2729,7 @@ populate_lr_context(struct i915_gem_context *ctx,
>>                     struct intel_engine_cs *engine,
>>                     struct intel_ring *ring)  {
>> +       enum i915_map_type map = HAS_LLC(ctx->i915) ? I915_MAP_WB :
>> + I915_MAP_WC;
>>         void *vaddr;
>>         u32 *regs;
>>         int ret;
>> @@ -2738,13 +2740,12 @@ populate_lr_context(struct i915_gem_context
>*ctx,
>>                 return ret;
>>         }
>>
>> -       vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WB);
>> +       vaddr = i915_gem_object_pin_map(ctx_obj, map);
>
>As this uses the cpu domain and flushed afterwards, this one is correct in its
>usage of MAP_WB.

In this function the content of context state is flushed.

But the function of execlists_submit_ports will update it again before writing the ELSP port.
And there is no flush. In fact after the ELSP port is written, the HW will start to execute the submitted commands.

>-Chris
Zhao, Yakui June 22, 2018, 7:34 a.m. UTC | #4
>-----Original Message-----
>From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
>Sent: Friday, June 22, 2018 2:26 PM
>To: Zhao, Yakui <yakui.zhao@intel.com>; intel-gfx@lists.freedesktop.org
>Cc: Zhao, Yakui <yakui.zhao@intel.com>
>Subject: Re: [PATCH V2] drm/i915: Use I915_MAP_WC for execlists context
>buffer on the platforms without LLC
>
>Quoting Zhao Yakui (2018-06-22 07:09:10)
>> Under execlists mode the context buffer is allocated in global Gtt region.
>> The I915_MAP_WB type is used to map the buffer so that the driver can
>> initialize the context buffer.(Ring reg, Context Ctrl reg and so on).
>> And then __context_pin is called to flush back corresponding contents.
>> In fact as it also tries to update context buffer (Ring Tail offset)
>> before writing the ELSP port, it has no explicit cache flush. Maybe it
>> is handled by HW. But this is quite confusing as BXT has no LLC. So
>> the WC is used to map the context buffer on the platform without LLC
>> and the update of context buffer is written into phys page directly. It
>> will be safer.
>>
>> V1->V2: Remove the dirty flag of execlists state buffer and one minor
>> typo in commit log
>
>The object's pages are still dirty, so why? It's not about CPU cache dirt, here it
>is about whether the pages differ from any potential swapcache.
>

Based on testing, it seems that this patch still has some problems. More work is needed in order to change the MAP type.
Maybe this buffer should be handled like the intel_ring buffer.
I will check it later.

>I was anticipating there would be some type conflict with
>engine->pinned_default_state, but that just happens to work out
>correctly... so long as there is always a retirement during load and we park
>before any reset. Hmm.
>-Chris
Chris Wilson June 22, 2018, 7:36 a.m. UTC | #5
Quoting Zhao, Yakui (2018-06-22 08:29:15)
> 
> 
> >-----Original Message-----
> >From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> >Sent: Friday, June 22, 2018 2:36 PM
> >To: Zhao, Yakui <yakui.zhao@intel.com>; intel-gfx@lists.freedesktop.org
> >Cc: Zhao, Yakui <yakui.zhao@intel.com>
> >Subject: Re: [PATCH V2] drm/i915: Use I915_MAP_WC for execlists context
> >buffer on the platforms without LLC
> >
> >Quoting Zhao Yakui (2018-06-22 07:09:10)
> >> @@ -2728,6 +2729,7 @@ populate_lr_context(struct i915_gem_context *ctx,
> >>                     struct intel_engine_cs *engine,
> >>                     struct intel_ring *ring)  {
> >> +       enum i915_map_type map = HAS_LLC(ctx->i915) ? I915_MAP_WB :
> >> + I915_MAP_WC;
> >>         void *vaddr;
> >>         u32 *regs;
> >>         int ret;
> >> @@ -2738,13 +2740,12 @@ populate_lr_context(struct i915_gem_context
> >*ctx,
> >>                 return ret;
> >>         }
> >>
> >> -       vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WB);
> >> +       vaddr = i915_gem_object_pin_map(ctx_obj, map);
> >
> >As this uses the cpu domain and flushed afterwards, this one is correct in its
> >usage of MAP_WB.
> 
> In this function the content of context state is flushed.
> 
> But the function of execlists_submit_ports will update it again before writing the ELSP port.
> And there is no flush. In fact after the ELSP port is written, the HW will start to execute the submitted commands.

That's a different map.
-Chris
Zhao, Yakui June 22, 2018, 8:06 a.m. UTC | #6
>-----Original Message-----
>From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
>Sent: Friday, June 22, 2018 3:37 PM
>To: Zhao, Yakui <yakui.zhao@intel.com>; intel-gfx@lists.freedesktop.org
>Subject: RE: [PATCH V2] drm/i915: Use I915_MAP_WC for execlists context
>buffer on the platforms without LLC
>
>Quoting Zhao, Yakui (2018-06-22 08:29:15)
>>
>>
>> >-----Original Message-----
>> >From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
>> >Sent: Friday, June 22, 2018 2:36 PM
>> >To: Zhao, Yakui <yakui.zhao@intel.com>;
>> >intel-gfx@lists.freedesktop.org
>> >Cc: Zhao, Yakui <yakui.zhao@intel.com>
>> >Subject: Re: [PATCH V2] drm/i915: Use I915_MAP_WC for execlists
>> >context buffer on the platforms without LLC
>> >
>> >Quoting Zhao Yakui (2018-06-22 07:09:10)
>> >> @@ -2728,6 +2729,7 @@ populate_lr_context(struct i915_gem_context
>*ctx,
>> >>                     struct intel_engine_cs *engine,
>> >>                     struct intel_ring *ring)  {
>> >> +       enum i915_map_type map = HAS_LLC(ctx->i915) ? I915_MAP_WB :
>> >> + I915_MAP_WC;
>> >>         void *vaddr;
>> >>         u32 *regs;
>> >>         int ret;
>> >> @@ -2738,13 +2740,12 @@ populate_lr_context(struct
>i915_gem_context
>> >*ctx,
>> >>                 return ret;
>> >>         }
>> >>
>> >> -       vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WB);
>> >> +       vaddr = i915_gem_object_pin_map(ctx_obj, map);
>> >
>> >As this uses the cpu domain and flushed afterwards, this one is
>> >correct in its usage of MAP_WB.
>>
>> In this function the content of context state is flushed.
>>
>> But the function of execlists_submit_ports will update it again before writing
>the ELSP port.
>> And there is no flush. In fact after the ELSP port is written, the HW will start
>to execute the submitted commands.
>
>That's a different map.

Really?  It is allocated in one gem obj.

Will you please help to point out where to handle the different map?

Thanks
     Yakui

>-Chris
Chris Wilson June 22, 2018, 8:10 a.m. UTC | #7
Quoting Zhao, Yakui (2018-06-22 09:06:27)
> 
> 
> >-----Original Message-----
> >From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> >Sent: Friday, June 22, 2018 3:37 PM
> >To: Zhao, Yakui <yakui.zhao@intel.com>; intel-gfx@lists.freedesktop.org
> >Subject: RE: [PATCH V2] drm/i915: Use I915_MAP_WC for execlists context
> >buffer on the platforms without LLC
> >
> >Quoting Zhao, Yakui (2018-06-22 08:29:15)
> >>
> >>
> >> >-----Original Message-----
> >> >From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> >> >Sent: Friday, June 22, 2018 2:36 PM
> >> >To: Zhao, Yakui <yakui.zhao@intel.com>;
> >> >intel-gfx@lists.freedesktop.org
> >> >Cc: Zhao, Yakui <yakui.zhao@intel.com>
> >> >Subject: Re: [PATCH V2] drm/i915: Use I915_MAP_WC for execlists
> >> >context buffer on the platforms without LLC
> >> >
> >> >Quoting Zhao Yakui (2018-06-22 07:09:10)
> >> >> @@ -2728,6 +2729,7 @@ populate_lr_context(struct i915_gem_context
> >*ctx,
> >> >>                     struct intel_engine_cs *engine,
> >> >>                     struct intel_ring *ring)  {
> >> >> +       enum i915_map_type map = HAS_LLC(ctx->i915) ? I915_MAP_WB :
> >> >> + I915_MAP_WC;
> >> >>         void *vaddr;
> >> >>         u32 *regs;
> >> >>         int ret;
> >> >> @@ -2738,13 +2740,12 @@ populate_lr_context(struct
> >i915_gem_context
> >> >*ctx,
> >> >>                 return ret;
> >> >>         }
> >> >>
> >> >> -       vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WB);
> >> >> +       vaddr = i915_gem_object_pin_map(ctx_obj, map);
> >> >
> >> >As this uses the cpu domain and flushed afterwards, this one is
> >> >correct in its usage of MAP_WB.
> >>
> >> In this function the content of context state is flushed.
> >>
> >> But the function of execlists_submit_ports will update it again before writing
> >the ELSP port.
> >> And there is no flush. In fact after the ELSP port is written, the HW will start
> >to execute the submitted commands.
> >
> >That's a different map.
> 
> Really?  It is allocated in one gem obj.
> 
> Will you please help to point out where to handle the different map?

It is reallocated on demand. The map used to prepopulate the context
image is not the same map used to update the ring registers in flight.
-Chris
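
One way to picture "reallocated on demand" is a per-object cached CPU mapping that is rebuilt whenever a caller asks for a different map type. The sketch below is only one reading of the remark, not the driver's code: `demo_object` and `demo_pin_map` are hypothetical names, the two arrays merely stand in for distinct virtual mappings of the same pages, and pin counts and error handling are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only: these are not i915 types or functions. */
enum demo_map_type { DEMO_MAP_WB, DEMO_MAP_WC };

struct demo_object {
	void *mapping;            /* cached CPU mapping, or NULL */
	enum demo_map_type type;  /* type of the cached mapping */
	unsigned int remap_count; /* how many mappings were built */
	char wb_window[8];        /* stand-in for a WB virtual mapping */
	char wc_window[8];        /* stand-in for a WC virtual mapping */
};

/* Return a mapping of the requested type, rebuilding on mismatch. */
static void *demo_pin_map(struct demo_object *obj, enum demo_map_type type)
{
	/* A cached mapping of the wrong type is dropped first. */
	if (obj->mapping && obj->type != type)
		obj->mapping = NULL;

	/* Build the mapping on demand and remember its type. */
	if (!obj->mapping) {
		obj->mapping = (type == DEMO_MAP_WB) ? (void *)obj->wb_window
						     : (void *)obj->wc_window;
		obj->type = type;
		obj->remap_count++;
	}
	return obj->mapping;
}
```

Two requests of the same type share one mapping, while a request of a different type yields a new one, so a single gem object does not imply a single map over its lifetime.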
Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 10deebe..5ffd76e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1386,6 +1386,7 @@  __execlists_context_pin(struct intel_engine_cs *engine,
 {
 	void *vaddr;
 	int ret;
+	enum i915_map_type map = HAS_LLC(ctx->i915) ? I915_MAP_WB : I915_MAP_WC;
 
 	ret = execlists_context_deferred_alloc(ctx, engine, ce);
 	if (ret)
@@ -1396,7 +1397,7 @@  __execlists_context_pin(struct intel_engine_cs *engine,
 	if (ret)
 		goto err;
 
-	vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
+	vaddr = i915_gem_object_pin_map(ce->state->obj, map);
 	if (IS_ERR(vaddr)) {
 		ret = PTR_ERR(vaddr);
 		goto unpin_vma;
@@ -2728,6 +2729,7 @@  populate_lr_context(struct i915_gem_context *ctx,
 		    struct intel_engine_cs *engine,
 		    struct intel_ring *ring)
 {
+	enum i915_map_type map = HAS_LLC(ctx->i915) ? I915_MAP_WB : I915_MAP_WC;
 	void *vaddr;
 	u32 *regs;
 	int ret;
@@ -2738,13 +2740,12 @@  populate_lr_context(struct i915_gem_context *ctx,
 		return ret;
 	}
 
-	vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WB);
+	vaddr = i915_gem_object_pin_map(ctx_obj, map);
 	if (IS_ERR(vaddr)) {
 		ret = PTR_ERR(vaddr);
 		DRM_DEBUG_DRIVER("Could not map object pages! (%d)\n", ret);
 		return ret;
 	}
-	ctx_obj->mm.dirty = true;
 
 	if (engine->default_state) {
 		/*
@@ -2756,7 +2757,7 @@  populate_lr_context(struct i915_gem_context *ctx,
 		void *defaults;
 
 		defaults = i915_gem_object_pin_map(engine->default_state,
-						   I915_MAP_WB);
+						   map);
 		if (IS_ERR(defaults)) {
 			ret = PTR_ERR(defaults);
 			goto err_unpin_ctx;
@@ -2851,6 +2852,7 @@  void intel_lr_context_resume(struct drm_i915_private *dev_priv)
 	struct intel_engine_cs *engine;
 	struct i915_gem_context *ctx;
 	enum intel_engine_id id;
+	enum i915_map_type map = HAS_LLC(dev_priv) ? I915_MAP_WB : I915_MAP_WC;
 
 	/* Because we emit WA_TAIL_DWORDS there may be a disparity
 	 * between our bookkeeping in ce->ring->head and ce->ring->tail and
@@ -2872,7 +2874,7 @@  void intel_lr_context_resume(struct drm_i915_private *dev_priv)
 				continue;
 
 			reg = i915_gem_object_pin_map(ce->state->obj,
-						      I915_MAP_WB);
+						      map);
 			if (WARN_ON(IS_ERR(reg)))
 				continue;
 
@@ -2880,7 +2882,6 @@  void intel_lr_context_resume(struct drm_i915_private *dev_priv)
 			reg[CTX_RING_HEAD+1] = 0;
 			reg[CTX_RING_TAIL+1] = 0;
 
-			ce->state->obj->mm.dirty = true;
 			i915_gem_object_unpin_map(ce->state->obj);
 
 			intel_ring_reset(ce->ring, 0);