| Message ID | 20200501101900.22543-3-chris@chris-wilson.co.uk (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [1/3] drm/i915/gem: Use chained reloc batches |
On 01/05/2020 11:19, Chris Wilson wrote:
> If at first we don't succeed, try, try again.
>
> Not all engines may support the MI ops we need to perform asynchronous
> relocation patching, and so we end up falling back to a synchronous
> operation that has a liability of blocking. However, Tvrtko pointed out
> we don't need to use the same engine to perform the relocations as we
> are planning to execute the execbuf on, and so if we switch over to a
> working engine, we can perform the relocation asynchronously. The user
> execbuf will be queued after the relocations by virtue of fencing.
>
> This patch creates a new context per execbuf requiring asynchronous
> relocations on an unusable engine. This is perhaps a bit excessive and
> can be ameliorated by a small context cache, but for the moment we only
> need it for working around a little-used engine on Sandybridge, and only
> if relocations are actually required.
>
> Now we just need to teach the relocation code to handle physical
> addressing for gen2/3, and we should then have universal support!
>
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Testcase: igt/gem_exec_reloc/basic-spin # snb
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 32 ++++++++++++++++---
>  1 file changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index b224a453e2a3..6d649de3a796 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -1280,6 +1280,7 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
>  }
>
>  static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
> +			     struct intel_engine_cs *engine,
>  			     unsigned int len)
>  {
>  	struct reloc_cache *cache = &eb->reloc_cache;
> @@ -1289,7 +1290,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
>  	u32 *cmd;
>  	int err;
>
> -	pool = intel_gt_get_buffer_pool(eb->engine->gt, PAGE_SIZE);
> +	pool = intel_gt_get_buffer_pool(engine->gt, PAGE_SIZE);
>  	if (IS_ERR(pool))
>  		return PTR_ERR(pool);
>
> @@ -1312,7 +1313,23 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
>  	if (err)
>  		goto err_unmap;
>
> -	rq = i915_request_create(eb->context);
> +	if (engine == eb->context->engine) {
> +		rq = i915_request_create(eb->context);
> +	} else {
> +		struct intel_context *ce;
> +
> +		ce = intel_context_create(engine);
> +		if (IS_ERR(ce)) {
> +			err = PTR_ERR(ce);
> +			goto err_unpin;
> +		}
> +
> +		i915_vm_put(ce->vm);
> +		ce->vm = i915_vm_get(eb->context->vm);
> +
> +		rq = intel_context_create_request(ce);
> +		intel_context_put(ce);
> +	}
>  	if (IS_ERR(rq)) {
>  		err = PTR_ERR(rq);
>  		goto err_unpin;
> @@ -1363,10 +1380,15 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
>  	int err;
>
>  	if (unlikely(!cache->rq)) {
> -		if (!intel_engine_can_store_dword(eb->engine))
> -			return ERR_PTR(-ENODEV);
> +		struct intel_engine_cs *engine = eb->engine;
> +
> +		if (!intel_engine_can_store_dword(engine)) {
> +			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
> +			if (!engine || !intel_engine_can_store_dword(engine))
> +				return ERR_PTR(-ENODEV);
> +		}
>
> -		err = __reloc_gpu_alloc(eb, len);
> +		err = __reloc_gpu_alloc(eb, engine, len);
>  		if (unlikely(err))
>  			return ERR_PTR(err);
>  	}

If you are not worried about the context create dance on SNB, and it is
limited to VCS, then neither am I.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2020-05-01 13:47:36)
>
> On 01/05/2020 11:19, Chris Wilson wrote:
> If you are not worried about the context create dance on SNB, and it is
> limited to VCS, then neither am I.

In the short term, since it's limited to vcs on SNB, which means it is
just a plain kmalloc (as there is no logical state), I'm not worrying.
Longer term, I do intend on having a pool of logical states cached on
the engine.
-Chris
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index b224a453e2a3..6d649de3a796 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1280,6 +1280,7 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
 }
 
 static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
+			     struct intel_engine_cs *engine,
 			     unsigned int len)
 {
 	struct reloc_cache *cache = &eb->reloc_cache;
@@ -1289,7 +1290,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	u32 *cmd;
 	int err;
 
-	pool = intel_gt_get_buffer_pool(eb->engine->gt, PAGE_SIZE);
+	pool = intel_gt_get_buffer_pool(engine->gt, PAGE_SIZE);
 	if (IS_ERR(pool))
 		return PTR_ERR(pool);
 
@@ -1312,7 +1313,23 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	if (err)
 		goto err_unmap;
 
-	rq = i915_request_create(eb->context);
+	if (engine == eb->context->engine) {
+		rq = i915_request_create(eb->context);
+	} else {
+		struct intel_context *ce;
+
+		ce = intel_context_create(engine);
+		if (IS_ERR(ce)) {
+			err = PTR_ERR(ce);
+			goto err_unpin;
+		}
+
+		i915_vm_put(ce->vm);
+		ce->vm = i915_vm_get(eb->context->vm);
+
+		rq = intel_context_create_request(ce);
+		intel_context_put(ce);
+	}
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
 		goto err_unpin;
@@ -1363,10 +1380,15 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
 	int err;
 
 	if (unlikely(!cache->rq)) {
-		if (!intel_engine_can_store_dword(eb->engine))
-			return ERR_PTR(-ENODEV);
+		struct intel_engine_cs *engine = eb->engine;
+
+		if (!intel_engine_can_store_dword(engine)) {
+			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
+			if (!engine || !intel_engine_can_store_dword(engine))
+				return ERR_PTR(-ENODEV);
+		}
 
-		err = __reloc_gpu_alloc(eb, len);
+		err = __reloc_gpu_alloc(eb, engine, len);
 		if (unlikely(err))
 			return ERR_PTR(err);
 	}
If at first we don't succeed, try, try again.

Not all engines may support the MI ops we need to perform asynchronous
relocation patching, and so we end up falling back to a synchronous
operation that has a liability of blocking. However, Tvrtko pointed out
we don't need to use the same engine to perform the relocations as we
are planning to execute the execbuf on, and so if we switch over to a
working engine, we can perform the relocation asynchronously. The user
execbuf will be queued after the relocations by virtue of fencing.

This patch creates a new context per execbuf requiring asynchronous
relocations on an unusable engine. This is perhaps a bit excessive and
can be ameliorated by a small context cache, but for the moment we only
need it for working around a little-used engine on Sandybridge, and only
if relocations are actually required.

Now we just need to teach the relocation code to handle physical
addressing for gen2/3, and we should then have universal support!

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Testcase: igt/gem_exec_reloc/basic-spin # snb
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 32 ++++++++++++++++---
 1 file changed, 27 insertions(+), 5 deletions(-)