From patchwork Tue Jul 1 18:17:49 2014
X-Patchwork-Submitter: Ben Widawsky
X-Patchwork-Id: 4461281
From: Ben Widawsky <benjamin.widawsky@intel.com>
To: Intel GFX <intel-gfx@lists.freedesktop.org>
Date: Tue, 1 Jul 2014 11:17:49 -0700
Message-Id: <1404238671-18760-15-git-send-email-benjamin.widawsky@intel.com>
X-Mailer: git-send-email 2.0.1
In-Reply-To: <1404238671-18760-1-git-send-email-benjamin.widawsky@intel.com>
References: <1404238671-18760-1-git-send-email-benjamin.widawsky@intel.com>
Cc: Ben Widawsky
Subject: [Intel-gfx] [PATCH 14/16] drm/i915: Defer PPGTT cleanup

The last patch made the PPGTT free cases correct. It left a major problem,
though: in many cases it was still possible to have to idle the GPU in order
to destroy a VM. This is really unfortunate, as it stalls the active GPU
process on behalf of the dying GPU process.

The workqueue grew very tricky. I left in my original wait-based version
under #if 0, and used Chris' recommendation with the seqno check. I haven't
measured one against the other, but I am in favor of the code as it is. I am
just leaving the old code for people to observe.

NOTE: I somewhat expect this patch to not get merged because of work in the
pipeline. I won't argue much against that, and I'll simply say that this
solves a real problem for platforms running with PPGTT. PPGTT is not enabled
by default yet, and I therefore see little harm in merging this. Because I
was expecting this to not get merged, I did not spend much time investigating
corner cases, in particular cases where I need to clean up the wq. If this
patch does head in the merge direction, I will take a closer look at that.
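The core of the approach is a classic deferred-free pattern: stamp the dying
resource with the last seqno seen on its ring, move it to a "to free" list,
and let a delayed work item free it once the GPU has provably retired past
that seqno. A minimal sketch of the pattern follows; struct pending_free,
struct pending_free_list, and their fields are illustrative names for this
sketch only, not symbols from the patch itself:

struct pending_free {
	struct list_head link;
	struct intel_engine_cs *ring;	/* ring that last touched the VM */
	u32 seqno;			/* seqno that must retire before freeing */
};

struct pending_free_list {
	struct list_head list;		/* entries awaiting free */
	struct delayed_work work;
	struct workqueue_struct *wq;
};

static void pending_free_handler(struct work_struct *work)
{
	struct pending_free_list *pf =
		container_of(work, typeof(*pf), work.work);
	struct pending_free *p, *next;

	list_for_each_entry_safe(p, next, &pf->list, link) {
		/* Free only what the GPU has provably finished with:
		 * compare the ring's current seqno against the seqno
		 * stamped when the free was deferred.
		 */
		if (!i915_seqno_passed(p->ring->get_seqno(p->ring, true),
				       p->seqno))
			continue;

		list_del(&p->link);
		kfree(p);
	}

	/* Something is still outstanding: poll again in about a second. */
	if (!list_empty(&pf->list))
		queue_delayed_work(pf->wq, &pf->work,
				   round_jiffies_up_relative(HZ));
}

The patch below applies this per PPGTT rather than per object, which is why
the handler stamps and checks the newest VMA in the VM and then frees the
whole address space in one go.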
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  10 +++
 drivers/gpu/drm/i915/i915_gem.c         | 110 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.c |  40 ++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.c     |   2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h     |   2 +
 5 files changed, 150 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6c252c3..c23213d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1077,6 +1077,15 @@ struct i915_gem_mm {
 	struct delayed_work idle_work;
 
 	/**
+	 * PPGTT freeing often happens during interruptible times (fd close,
+	 * execbuf, busy_ioctl). It therefore becomes difficult to clean up the
+	 * PPGTT when the refcount reaches 0 if a signal comes in. This
+	 * workqueue defers the cleanup of the VM to a later time, and allows
+	 * userspace to continue on.
+	 */
+	struct delayed_work ppgtt_work;
+
+	/**
 	 * Are we in a non-interruptible section of code like
 	 * modesetting?
 	 */
@@ -1461,6 +1470,7 @@ struct drm_i915_private {
 	struct mutex modeset_restore_lock;
 
 	struct list_head vm_list; /* Global list of all address spaces */
+	struct list_head ppgtt_free_list;
 	struct i915_gtt gtt; /* VM representing the global address space */
 
 	struct i915_gem_mm mm;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f6d1238..561bd64 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2707,6 +2707,112 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	intel_mark_idle(dev_priv->dev);
 }
 
+static void
+i915_gem_ppgtt_work_handler(struct work_struct *work)
+{
+	struct drm_i915_private *dev_priv =
+		container_of(work, typeof(*dev_priv), mm.ppgtt_work.work);
+	struct i915_address_space *vm, *v;
+#if 0
+	unsigned reset_counter;
+	int ret;
+#endif
+
+	if (!mutex_trylock(&dev_priv->dev->struct_mutex)) {
+		queue_delayed_work(dev_priv->wq, &dev_priv->mm.ppgtt_work,
+				   round_jiffies_up_relative(HZ));
+		return;
+	}
+
+	list_for_each_entry_safe(vm, v, &dev_priv->ppgtt_free_list, global_link) {
+		struct i915_hw_ppgtt *ppgtt =
+			container_of(vm, struct i915_hw_ppgtt, base);
+		struct i915_vma *vma = NULL, *vma_active = NULL, *vma_inactive = NULL;
+
+		/* The following attempts to find the newest (most recently
+		 * activated/inactivated) VMA.
+		 */
+		if (!list_empty(&ppgtt->base.active_list)) {
+			vma_active = list_last_entry(&ppgtt->base.active_list,
+						     typeof(*vma), mm_list);
+		}
+
+		if (!list_empty(&ppgtt->base.inactive_list)) {
+			vma_inactive = list_last_entry(&ppgtt->base.inactive_list,
+						       typeof(*vma), mm_list);
+		}
+
+		if (vma_active)
+			vma = vma_active;
+		else if (vma_inactive)
+			vma = vma_inactive;
+
+		/* Sanity check */
+		if (vma_active && vma_inactive) {
+			if (WARN_ON(vma_active->retire_seqno <= vma_inactive->retire_seqno))
+				vma = vma_inactive;
+		}
+
+		if (!vma)
+			goto finish;
+
+		/* Another sanity check */
+		if (WARN_ON(!vma->retire_seqno))
+			continue;
+
+		/* NOTE: We could wait here for the seqno, but that makes things
+		 * significantly more complex.
+		 */
+		if (!i915_seqno_passed(vma->retire_ring->get_seqno(vma->retire_ring, true),
+				       vma->retire_seqno))
+			break;
+
+#if 0
+		reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+		mutex_unlock(&dev_priv->dev->struct_mutex);
+		ret = __wait_seqno(vma->retire_ring, vma->retire_seqno,
+				   reset_counter, true, NULL, NULL);
+
+		if (i915_mutex_lock_interruptible(dev_priv->dev)) {
+			queue_delayed_work(dev_priv->wq, &dev_priv->mm.ppgtt_work,
+					   round_jiffies_up_relative(HZ));
+			return;
+		}
+
+		if (ret)
+			continue;
+
+#endif
+finish:
+		/* If the object was destroyed while our VMA was dangling, the
+		 * object free code did the synchronous cleanup for us.
+		 * Therefore every node on this list should have a living object
+		 * (and VMA), but let's try not to use it in case future code
+		 * allows a VMA to outlive an object.
+		 */
+		while (!list_empty(&ppgtt->base.mm.head_node.node_list)) {
+			struct drm_mm_node *node;
+			struct i915_vma *vma;
+			struct drm_i915_gem_object *obj;
+
+			node = list_first_entry(&ppgtt->base.mm.head_node.node_list,
+						struct drm_mm_node,
+						node_list);
+			vma = container_of(node, struct i915_vma, node);
+			obj = vma->obj;
+
+			drm_mm_remove_node(node);
+			i915_gem_vma_destroy(vma);
+			i915_gem_object_unpin_pages(obj);
+		}
+
+		ppgtt->base.cleanup(&ppgtt->base);
+		kfree(ppgtt);
+	}
+
+	if (!list_empty(&dev_priv->ppgtt_free_list))
+		queue_delayed_work(dev_priv->wq, &dev_priv->mm.ppgtt_work,
+				   round_jiffies_up_relative(HZ));
+
+	mutex_unlock(&dev_priv->dev->struct_mutex);
+}
+
 /**
  * Ensures that an object will eventually get non-busy by flushing any required
  * write domains, emitting any outstanding lazy request and retiring and
@@ -4551,6 +4657,7 @@ i915_gem_suspend(struct drm_device *dev)
 	mutex_unlock(&dev->struct_mutex);
 
 	del_timer_sync(&dev_priv->gpu_error.hangcheck_timer);
+	cancel_delayed_work_sync(&dev_priv->mm.ppgtt_work);
 	cancel_delayed_work_sync(&dev_priv->mm.retire_work);
 	cancel_delayed_work_sync(&dev_priv->mm.idle_work);
 
@@ -4892,6 +4999,7 @@ i915_gem_load(struct drm_device *dev)
 			  NULL);
 
 	INIT_LIST_HEAD(&dev_priv->vm_list);
+	INIT_LIST_HEAD(&dev_priv->ppgtt_free_list);
 	i915_init_vm(dev_priv, &dev_priv->gtt.base);
 
 	INIT_LIST_HEAD(&dev_priv->context_list);
@@ -4906,6 +5014,8 @@ i915_gem_load(struct drm_device *dev)
 			  i915_gem_retire_work_handler);
 	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
 			  i915_gem_idle_work_handler);
+	INIT_DELAYED_WORK(&dev_priv->mm.ppgtt_work,
+			  i915_gem_ppgtt_work_handler);
 	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
 
 	/* On GEN3 we really need to make sure the ARB C3 LP bit is set */
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index e1b5613..5b4a9a0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -96,7 +96,7 @@
 #define GEN6_CONTEXT_ALIGN (64<<10)
 #define GEN7_CONTEXT_ALIGN 4096
 
-static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
+static int do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -131,7 +131,7 @@ static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
 	if (ppgtt == dev_priv->mm.aliasing_ppgtt ||
 	    (list_empty(&vm->active_list) && list_empty(&vm->inactive_list))) {
 		ppgtt->base.cleanup(&ppgtt->base);
-		return;
+		return 0;
 	}
 
 	/*
@@ -153,17 +153,30 @@ static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
 
 	/* We have a problem here where VM teardown cannot be interrupted, or
 	 * else the ppgtt cleanup will fail. As an example, a precisely timed
-	 * SIGKILL could leads to an OOPS, or worse. There are two options:
-	 * 1. Make the eviction uninterruptible
-	 * 2. Defer the eviction if it was interrupted.
-	 *
-	 * Option #1 is not the friendliest, but it's the easiest to implement,
-	 * and least error prone.
-	 * TODO: Implement option 2
+	 * SIGKILL could lead to an OOPS, or worse. The real solution is to
+	 * properly track the VMA <-> OBJ relationship. This temporary bandaid
+	 * will simply defer the free until we know the seqno has passed for
+	 * this VMA.
 	 */
 	ret = i915_gem_evict_vm(&ppgtt->base, do_idle, !do_idle);
-	if (ret == -ERESTARTSYS)
-		ret = i915_gem_evict_vm(&ppgtt->base, do_idle, false);
+	if (ret == -ERESTARTSYS) {
+		struct drm_mm_node *entry;
+		/* First mark all VMAs */
+		drm_mm_for_each_node(entry, &ppgtt->base.mm) {
+			struct i915_vma *vma =
+				container_of(entry, struct i915_vma, node);
+			vma->retire_seqno = vma->obj->last_read_seqno;
+			vma->retire_ring = vma->obj->ring;
+			/* It's okay to lose the object, we just can't lose
+			 * the VMA. */
+		}
+
+		list_move_tail(&ppgtt->base.global_link,
+			       &dev_priv->ppgtt_free_list);
+		queue_delayed_work(dev_priv->wq,
+				   &dev_priv->mm.ppgtt_work,
+				   round_jiffies_up_relative(HZ));
+		return -EAGAIN;
+	}
+
 	WARN_ON(ret);
 	WARN_ON(!list_empty(&vm->active_list));
@@ -173,6 +186,7 @@ static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
 	}
 
 	ppgtt->base.cleanup(&ppgtt->base);
+	return 0;
 }
 
 static void ppgtt_release(struct kref *kref)
@@ -180,8 +194,8 @@ static void ppgtt_release(struct kref *kref)
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(kref, struct i915_hw_ppgtt, ref);
 
-	do_ppgtt_cleanup(ppgtt);
-	kfree(ppgtt);
+	if (!do_ppgtt_cleanup(ppgtt))
+		kfree(ppgtt);
 }
 
 static size_t get_context_alignment(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 88451dc..b2d14d7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1029,7 +1029,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 		container_of(vm, struct i915_hw_ppgtt, base);
 
 	list_del(&vm->global_link);
-	drm_mm_takedown(&ppgtt->base.mm);
+	drm_mm_takedown(&vm->mm);
 	drm_mm_remove_node(&ppgtt->node);
 
 	gen6_ppgtt_unmap_pages(ppgtt);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1d75801..d0dc567 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -137,6 +137,8 @@ struct i915_vma {
 	struct hlist_node exec_node;
 	unsigned long exec_handle;
 	struct drm_i915_gem_exec_object2 *exec_entry;
+	uint32_t retire_seqno; /* Last active seqno at context destruction */
+	struct intel_engine_cs *retire_ring; /* Last ring for retire_seqno */
 
 	/**
 	 * How many users have pinned this object in GTT space. The following
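
For completeness: the seqno test that both the new work handler and the
deferral path rely on, i915_seqno_passed(), is in this era of the driver a
wraparound-safe signed comparison, roughly the following (reproduced from
memory as a sketch; consult i915_drv.h for the authoritative definition):

/* True once seq1 has reached or passed seq2, tolerating u32 wraparound. */
static inline bool i915_seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

This is why stamping vma->retire_seqno once, at the moment the interrupted
free is deferred, is sufficient: the work handler never waits, it only
re-samples the ring's current seqno on each pass and compares.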