Message ID | 1441981893-28133-1-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, Sep 11, 2015 at 03:31:33PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Prevent leaking VMAs and PPGTT VMs when objects are imported
> via flink.
>
> Scenario is that any VMAs created by the importer will be left
> dangling after the importer exits, or destroys the PPGTT context
> with which they are associated.
>
> This is caused by object destruction not running when the
> importer closes the buffer object handle, due to the reference held
> by the exporter. This also leaks the VM since the VMA has a
> reference on it.
>
> In practice these leaks can be observed by stopping and starting
> the X server on a kernel with fbcon compiled in. Every time the
> X server exits another VMA will be leaked against the fbcon's
> frame buffer object.
>
> Also on systems where flink buffer sharing is used extensively,
> like Android, this leak has even more serious consequences.
>
> This version takes a general approach from the earlier work
> by Rafael Barabalho (drm/i915: Clean-up PPGTT on context
> destruction) and tries to incorporate the subsequent discussion
> between Chris Wilson and Daniel Vetter.
>
> On context destruction a VM is marked as closed and a worker
> thread is scheduled to unbind all inactive VMAs for this VM. At
> the same time, active VMAs retired on this closed VM are
> unbound immediately.

You don't need a worker, since you can just drop the vma from the
retirement.

http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=9d4020dce054cca23bd1fea72092d036f0a3ea13

That patch is as old as the test case, just waiting for some review on
earlier code.
-Chris
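For illustration, a minimal userspace sketch of the flink-and-exit scenario the commit message describes, seen from the importer's side. The device node, the hard-coded flink name and the elided execbuffer step are assumptions, error handling is omitted, and the headers are taken from libdrm's include path; it is a sketch of the scenario, not part of either patch.

/* Importer-side sketch of the flink-and-exit leak, assuming the exporter
 * has already published the object under FLINK_NAME. Build against
 * libdrm's include path (e.g. pkg-config --cflags libdrm). Error handling
 * and the execbuffer call that actually binds a VMA into this process'
 * PPGTT are omitted.
 */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <drm.h>

#define FLINK_NAME 1	/* hypothetical name received from the exporter */

int main(void)
{
	int fd = open("/dev/dri/card0", O_RDWR);
	struct drm_gem_open open_arg = { .name = FLINK_NAME };
	struct drm_gem_close close_arg = { 0 };

	/* Import the shared bo: the exporter still holds its own reference. */
	ioctl(fd, DRM_IOCTL_GEM_OPEN, &open_arg);

	/* ... submit work with open_arg.handle here, which creates a VMA in
	 * this process' PPGTT ... */

	/* Closing the handle (or simply exiting) does not run the object's
	 * destructor, because of the exporter's reference, so without a fix
	 * the importer's VMA and its PPGTT are leaked. */
	close_arg.handle = open_arg.handle;
	ioctl(fd, DRM_IOCTL_GEM_CLOSE, &close_arg);
	close(fd);

	return 0;
}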
On 09/11/2015 03:56 PM, Chris Wilson wrote:
> On Fri, Sep 11, 2015 at 03:31:33PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Prevent leaking VMAs and PPGTT VMs when objects are imported
>> via flink.
>>
>> Scenario is that any VMAs created by the importer will be left
>> dangling after the importer exits, or destroys the PPGTT context
>> with which they are associated.
>>
>> This is caused by object destruction not running when the
>> importer closes the buffer object handle, due to the reference held
>> by the exporter. This also leaks the VM since the VMA has a
>> reference on it.
>>
>> In practice these leaks can be observed by stopping and starting
>> the X server on a kernel with fbcon compiled in. Every time the
>> X server exits another VMA will be leaked against the fbcon's
>> frame buffer object.
>>
>> Also on systems where flink buffer sharing is used extensively,
>> like Android, this leak has even more serious consequences.
>>
>> This version takes a general approach from the earlier work
>> by Rafael Barabalho (drm/i915: Clean-up PPGTT on context
>> destruction) and tries to incorporate the subsequent discussion
>> between Chris Wilson and Daniel Vetter.
>>
>> On context destruction a VM is marked as closed and a worker
>> thread is scheduled to unbind all inactive VMAs for this VM. At
>> the same time, active VMAs retired on this closed VM are
>> unbound immediately.
>
> You don't need a worker, since you can just drop the vma from the
> retirement.

I was thinking that retirement does not necessarily happen - maybe both
VMAs are already inactive at the time of context destruction. That then
raises the question whether it is OK to wait for the next retirement on
the same object to clean it up. I wasn't sure, so I thought it was safer
to clean it up immediately since it is not a lot of code.

> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=9d4020dce054cca23bd1fea72092d036f0a3ea13
>
> That patch is as old as the test case, just waiting for some review on
> earlier code.

Plus my patch doesn't fix flink-and-close-vma-leak, but a new one I also
posted, flink-and-exit-vma-leak. The latter is what was affecting
Android, and it can be seen with X.org and fbcon.

Your cleanup at handle close is more complete, but the question is how
long "just waiting for some review on earlier code" will take :),
especially considering it depends on a bigger rewrite of the core.

Regards,

Tvrtko
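For comparison, a minimal sketch of the worker-free variant being discussed: unbind whatever is left on the closed VM synchronously at context destruction and leave anything still active to the retirement path (as in the i915_gem.c hunk below). It assumes context destruction already runs under struct_mutex; the helper name is made up here, and Chris's linked patch may structure this differently.

/* Illustrative only: synchronous, worker-free cleanup on context
 * destruction. Assumes dev->struct_mutex is held by the caller and that
 * still-active VMAs are unbound later by the retirement path.
 */
static void i915_gem_context_clean_sync(struct intel_context *ctx)
{
	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
	struct i915_vma *vma, *next;

	if (!ppgtt)
		return;

	/* New VMA creation against this VM is refused from here on. */
	ppgtt->base.closed = true;

	/* _safe iteration: i915_vma_unbind() unlinks the VMA as we go. */
	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
				 mm_list)
		WARN_ON(i915_vma_unbind(vma));
}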
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 41263cd4170c..1dc005758a96 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2372,7 +2372,7 @@ i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
 static void
 i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 {
-	struct i915_vma *vma;
+	struct i915_vma *vma, *next;
 
 	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
 	RQ_BUG_ON(!(obj->active & (1 << ring)));
@@ -2394,9 +2394,13 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 		list_move_tail(&obj->global_list,
 			       &to_i915(obj->base.dev)->mm.bound_list);
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
+	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
 		if (!list_empty(&vma->mm_list))
 			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
+
+		/* Unbind VMAs from contexts closed in the interim. */
+		if (vma->vm->closed)
+			WARN_ON(i915_vma_unbind(vma));
 	}
 
 	i915_gem_request_assign(&obj->last_fenced_req, NULL);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 74aa0c9929ba..6628b8fe1976 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -133,6 +133,55 @@ static int get_context_size(struct drm_device *dev)
 	return ret;
 }
 
+static void
+i915_gem_context_cleanup_worker(struct work_struct *work)
+{
+	struct i915_hw_ppgtt *ppgtt = container_of(work, typeof(*ppgtt),
+						   cleanup_work);
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_vma *vma, *next;
+
+	mutex_lock(&dev->struct_mutex);
+
+	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
+				 mm_list) {
+		bool was_interruptible;
+
+		was_interruptible = dev_priv->mm.interruptible;
+		dev_priv->mm.interruptible = false;
+		WARN_ON(i915_vma_unbind(vma));
+
+		dev_priv->mm.interruptible = was_interruptible;
+	}
+
+	i915_ppgtt_put(ppgtt);
+
+	mutex_unlock(&dev->struct_mutex);
+}
+
+static void i915_gem_context_clean(struct intel_context *ctx)
+{
+	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
+
+	if (!ppgtt)
+		return;
+
+	/*
+	 * Signal that this context, or better say the underlying VM, is
+	 * getting closed. Since the VM is not expected to outlive the context
+	 * this is safe to do.
+	 */
+	ppgtt->base.closed = true;
+
+	/*
+	 * Unbind all inactive VMAs for this VM, but do it asynchronously.
+	 */
+	INIT_WORK(&ppgtt->cleanup_work, i915_gem_context_cleanup_worker);
+	i915_ppgtt_get(ppgtt);
+	schedule_work(&ppgtt->cleanup_work);
+}
+
 void i915_gem_context_free(struct kref *ctx_ref)
 {
 	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
@@ -142,6 +191,13 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	if (i915.enable_execlists)
 		intel_lr_context_free(ctx);
 
+	/*
+	 * This context is going away and we need to remove all VMAs still
+	 * around. This is to handle imported shared objects for which the
+	 * destructor did not run when their handles were closed.
+	 */
+	i915_gem_context_clean(ctx);
+
 	i915_ppgtt_put(ctx->ppgtt);
 
 	if (ctx->legacy_hw_ctx.rcs_state)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 87862813cfde..b8e54ce579a7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3178,6 +3178,9 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	if (WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
 		return ERR_PTR(-EINVAL);
 
+	if (WARN_ON(vm->closed))
+		return ERR_PTR(-EINVAL);
+
 	vma = kmem_cache_zalloc(to_i915(obj->base.dev)->vmas, GFP_KERNEL);
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 82750073d5b3..aa079ee67c16 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -275,6 +275,9 @@ struct i915_address_space {
 	u64 start;		/* Start offset always 0 for dri2 */
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 
+	/** Address space is being destroyed? */
+	bool closed;
+
 	struct i915_page_scratch *scratch_page;
 	struct i915_page_table *scratch_pt;
 	struct i915_page_directory *scratch_pd;
@@ -371,6 +374,8 @@ struct i915_hw_ppgtt {
 
 	struct drm_i915_file_private *file_priv;
 
+	struct work_struct cleanup_work;
+
 	gen6_pte_t __iomem *pd_addr;
 
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);