Message ID | 1460123698-16832-4-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive)
State      | New, archived
On Fri, Apr 08, 2016 at 02:54:57PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> We can use the new pin/lazy unpin API for more performance
> in the execlist submission paths.

The sticking point for me was the ringbuffer and Braswell having to
re-ioremap it every context pin. I hadn't noticed the kmap of the
context, but this does seem like a valid use for pin_map so I'm not
complaining... (Just saying this is half the solution ;)

Oh, and you don't have pin_map for your ringbuffer either yet.
-Chris
On 08/04/16 15:40, Chris Wilson wrote:
> On Fri, Apr 08, 2016 at 02:54:57PM +0100, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >
> > We can use the new pin/lazy unpin API for more performance
> > in the execlist submission paths.
>
> The sticking point for me was the ringbuffer and Braswell having to
> re-ioremap it every context pin. I hadn't noticed the kmap of the
> context, but this does seem like a valid use for pin_map so I'm not
> complaining... (Just saying this half the solution ;)

Pull in to the series "drm/i915: Move ioremap_wc tracking onto VMA"
as well then?

> Oh, and you don't have pin_map for your ringbuffer either yet.

Confused, I did base this on top of "drm/i915: Refactor duplicate
object vmap functions" and the half missing is the ioremap paths.
Which you commented on above. What other ringbuffer is missing?

Regards,
Tvrtko
On Mon, Apr 11, 2016 at 10:05:34AM +0100, Tvrtko Ursulin wrote:
> On 08/04/16 15:40, Chris Wilson wrote:
> > On Fri, Apr 08, 2016 at 02:54:57PM +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > >
> > > We can use the new pin/lazy unpin API for more performance
> > > in the execlist submission paths.
> >
> > The sticking point for me was the ringbuffer and Braswell having to
> > re-ioremap it every context pin. I hadn't noticed the kmap of the
> > context, but this does seem like a valid use for pin_map so I'm not
> > complaining... (Just saying this half the solution ;)
>
> Pull in to the series "drm/i915: Move ioremap_wc tracking onto VMA"
> as well then?

That depends on "io-mapping: Specify mapping size for io_mapping_map_wc"
and the vma->map_and_fenceable (though you can workaround that) iirc.

> > Oh, and you don't have pin_map for your ringbuffer either yet.
>
> Confused, I did base this on top of "drm/i915: Refactor duplicate
> object vmap functions" and the half missing is the ioremap paths.
> Which you commented on above. What other ringbuffer is missing?

No, I was just thinking it was still pending because I had a fixup in
my tree:

drm/i915: Treat ringbuffer writes as write to normal memory

That and a patch to use WC vmaps for Braswell as well (that's still in
preliminary form, but if you are interested
https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=3fa97e044f3b5a2c23e44ec659a8199f134d8fb5
it really helps with context creation stress tests - and not much else! ;)
-Chris
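For readers following the thread from outside the i915 tree: the pin/lazy-unpin idea being discussed boils down to paying for the expensive mapping (kmap/ioremap in the kernel) only on the first pin, having later pins just bump a reference count, and keeping the mapping cached across unpins until the object is finally released. A minimal userspace sketch of that pattern, with all names (toy_obj, toy_pin_map, and friends) purely illustrative stand-ins rather than the actual i915 API, and malloc standing in for the real mapping cost:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of pin/lazy-unpin: the first pin creates the (expensive)
 * mapping, later pins only bump a refcount, and unpin drops the
 * refcount but keeps the mapping cached so the next pin is cheap. */
struct toy_obj {
	int pin_count;
	void *map;      /* cached mapping, kept across unpins */
	int map_calls;  /* how many times we paid for the mapping */
};

static void *toy_pin_map(struct toy_obj *obj)
{
	if (obj->map == NULL) {
		obj->map = malloc(4096); /* stands in for kmap/ioremap */
		if (obj->map == NULL)
			return NULL;
		obj->map_calls++;
	}
	obj->pin_count++;
	return obj->map;
}

static void toy_unpin_map(struct toy_obj *obj)
{
	assert(obj->pin_count > 0);
	obj->pin_count--; /* mapping stays cached until final release */
}

static void toy_release(struct toy_obj *obj)
{
	assert(obj->pin_count == 0);
	free(obj->map);
	obj->map = NULL;
}
```

Exercising many pin/unpin cycles against one object shows the point of the lazy unpin: the mapping is set up exactly once, instead of once per pin as with the kmap/kunmap pairing the patch removes.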
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3d08dac0a9b0..5e967943ce49 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1111,8 +1111,8 @@ static int intel_lr_context_do_pin(struct intel_context *ctx,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[engine->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[engine->id].ringbuf;
-	struct page *lrc_state_page;
-	uint32_t *lrc_reg_state;
+	void *obj_addr;
+	u32 *lrc_reg_state;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&engine->dev->struct_mutex));
@@ -1122,11 +1122,11 @@ static int intel_lr_context_do_pin(struct intel_context *ctx,
 	if (ret)
 		return ret;
 
-	lrc_state_page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
-	if (WARN_ON(!lrc_state_page)) {
-		ret = -ENODEV;
+	obj_addr = i915_gem_object_pin_map(ctx_obj);
+	if (!obj_addr)
 		goto unpin_ctx_obj;
-	}
+
+	lrc_reg_state = obj_addr + LRC_STATE_PN * PAGE_SIZE;
 
 	ret = intel_pin_and_map_ringbuffer_obj(engine->dev, ringbuf);
 	if (ret)
@@ -1134,7 +1134,6 @@ static int intel_lr_context_do_pin(struct intel_context *ctx,
 
 	ctx->engine[engine->id].lrc_vma = i915_gem_obj_to_ggtt(ctx_obj);
 	intel_lr_context_descriptor_update(ctx, engine);
-	lrc_reg_state = kmap(lrc_state_page);
 	lrc_reg_state[CTX_RING_BUFFER_START+1] = ringbuf->vma->node.start;
 	ctx->engine[engine->id].lrc_reg_state = lrc_reg_state;
 	ctx_obj->dirty = true;
@@ -1177,7 +1176,7 @@ void intel_lr_context_unpin(struct intel_context *ctx,
 	WARN_ON(!mutex_is_locked(&ctx->i915->dev->struct_mutex));
 
 	if (--ctx->engine[engine->id].pin_count == 0) {
-		kunmap(kmap_to_page(ctx->engine[engine->id].lrc_reg_state));
+		i915_gem_object_unpin_map(ctx_obj);
 		intel_unpin_ringbuffer_obj(ctx->engine[engine->id].ringbuf);
 		i915_gem_object_ggtt_unpin(ctx_obj);
 		ctx->engine[engine->id].lrc_vma = NULL;