From patchwork Fri Aug 22 03:11:35 2014
X-Patchwork-Submitter: Ben Widawsky
X-Patchwork-Id: 4760581
From: Ben Widawsky
To: Intel GFX
Cc: Ben Widawsky, Ben Widawsky
Date: Thu, 21 Aug 2014 20:11:35 -0700
Message-Id: <1408677155-1840-13-git-send-email-benjamin.widawsky@intel.com>
X-Mailer: git-send-email 2.0.4
In-Reply-To: <1408677155-1840-1-git-send-email-benjamin.widawsky@intel.com>
References: <1408677155-1840-1-git-send-email-benjamin.widawsky@intel.com>
Subject: [Intel-gfx] [PATCH 12/68] drm/i915: Defer PPGTT cleanup

The last patch made the PPGTT free cases correct. It left a major
problem, though: in many cases it was still possible to have to idle the
GPU in order to destroy a VM, which stalls the active GPU process on
behalf of the dying one.

The workqueue grew very tricky. I left my original wait-based version in
as #if 0 and used Chris' recommendation with the seqno check. I haven't
measured one against the other, but I am in favor of the code as it is;
the old version is only left in for people to look over.

NOTE: I don't expect this patch to be merged, as the thing it fixes will
be superseded by PPGTT refcounting. However, at this time I do not have
a working and tested solution other than this one, so it stays for now.
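For reviewers skimming the diff, the deferral boils down to a
self-rearming delayed work item: back off if the lock is contended, free
whatever has retired, and re-queue while anything is still pending. A
minimal standalone sketch of that pattern follows; the names my_dev,
pending_list and cleanup_work are illustrative only, not the driver's:

#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/workqueue.h>

struct my_dev {
	struct mutex lock;
	struct list_head pending_list;	/* VMs awaiting teardown */
	struct delayed_work cleanup_work;
	struct workqueue_struct *wq;
};

static void cleanup_handler(struct work_struct *work)
{
	struct my_dev *dev =
		container_of(work, struct my_dev, cleanup_work.work);

	/* Never sleep on the lock in the worker; retry in ~1s instead. */
	if (!mutex_trylock(&dev->lock)) {
		queue_delayed_work(dev->wq, &dev->cleanup_work,
				   round_jiffies_up_relative(HZ));
		return;
	}

	/* ... free each entry whose last seqno has retired ... */

	/* Re-arm while work remains; otherwise the work item simply dies. */
	if (!list_empty(&dev->pending_list))
		queue_delayed_work(dev->wq, &dev->cleanup_work,
				   round_jiffies_up_relative(HZ));

	mutex_unlock(&dev->lock);
}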
Signed-off-by: Ben Widawsky
---
 drivers/gpu/drm/i915/i915_drv.h         |  10 +++
 drivers/gpu/drm/i915/i915_gem.c         | 110 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.c |  40 ++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.c     |   2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h     |   2 +
 5 files changed, 150 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c5c8753..56d193e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1129,6 +1129,15 @@ struct i915_gem_mm {
 	struct delayed_work idle_work;

 	/**
+	 * PPGTT freeing often happens during interruptible times (fd close,
+	 * execbuf, busy_ioctl). It therefore becomes difficult to clean up the
+	 * PPGTT when the refcount reaches 0 if a signal comes in. This
+	 * workqueue defers the cleanup of the VM to a later time, and allows
+	 * userspace to continue on.
+	 */
+	struct delayed_work ppgtt_work;
+
+	/**
 	 * Are we in a non-interruptible section of code like
 	 * modesetting?
 	 */
@@ -1520,6 +1529,7 @@ struct drm_i915_private {
 	struct mutex modeset_restore_lock;

 	struct list_head vm_list; /* Global list of all address spaces */
+	struct list_head ppgtt_free_list;
 	struct i915_gtt gtt; /* VM representing the global address space */

 	struct i915_gem_mm mm;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 20743bd..4ad2205 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2707,6 +2707,112 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	intel_mark_idle(dev_priv->dev);
 }

+static void
+i915_gem_ppgtt_work_handler(struct work_struct *work)
+{
+	struct drm_i915_private *dev_priv =
+		container_of(work, typeof(*dev_priv), mm.ppgtt_work.work);
+	struct i915_address_space *vm, *v;
+#if 0
+	unsigned reset_counter;
+	int ret;
+#endif
+
+	if (!mutex_trylock(&dev_priv->dev->struct_mutex)) {
+		queue_delayed_work(dev_priv->wq, &dev_priv->mm.ppgtt_work,
+				   round_jiffies_up_relative(HZ));
+		return;
+	}
+
+	list_for_each_entry_safe(vm, v, &dev_priv->ppgtt_free_list, global_link) {
+		struct i915_hw_ppgtt *ppgtt =
+			container_of(vm, struct i915_hw_ppgtt, base);
+		struct i915_vma *vma = NULL, *vma_active = NULL, *vma_inactive = NULL;
+
+		/* The following attempts to find the newest (most recently
+		 * activated/inactivated) VMA.
+		 */
+		if (!list_empty(&ppgtt->base.active_list)) {
+			vma_active = list_last_entry(&ppgtt->base.active_list,
+						     typeof(*vma), mm_list);
+		}
+
+		if (!list_empty(&ppgtt->base.inactive_list)) {
+			vma_inactive = list_last_entry(&ppgtt->base.inactive_list,
+						       typeof(*vma), mm_list);
+		}
+
+		if (vma_active)
+			vma = vma_active;
+		else if (vma_inactive)
+			vma = vma_inactive;
+
+		/* Sanity check */
+		if (vma_active && vma_inactive) {
+			if (WARN_ON(vma_active->retire_seqno <= vma_inactive->retire_seqno))
+				vma = vma_inactive;
+		}
+
+		if (!vma)
+			goto finish;
+
+		/* Another sanity check */
+		if (WARN_ON(!vma->retire_seqno))
+			continue;
+
+		/* NOTE: We could wait here for the seqno, but that makes things
+		 * significantly more complex.
+		 */
+		if (!i915_seqno_passed(vma->retire_ring->get_seqno(vma->retire_ring, true),
+				       vma->retire_seqno))
+			break;
+
+#if 0
+		reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+		mutex_unlock(&dev_priv->dev->struct_mutex);
+		ret = __wait_seqno(vma->retire_ring, vma->retire_seqno,
+				   reset_counter, true, NULL, NULL);
+
+		if (i915_mutex_lock_interruptible(dev_priv->dev)) {
+			queue_delayed_work(dev_priv->wq, &dev_priv->mm.ppgtt_work,
+					   round_jiffies_up_relative(HZ));
+			return;
+		}
+
+		if (ret)
+			continue;
+
+#endif
+finish:
+		/* If the object was destroyed while our VMA was dangling, the
+		 * object free code did the synchronous cleanup for us.
+		 * Therefore every node on this list should have a living object
+		 * (and VMA), but let's try not to use it in case future code
+		 * allows a VMA to outlive an object.
+		 */
+		while (!list_empty(&ppgtt->base.mm.head_node.node_list)) {
+			struct drm_mm_node *node;
+			struct i915_vma *vma;
+			struct drm_i915_gem_object *obj;
+
+			node = list_first_entry(&ppgtt->base.mm.head_node.node_list,
+						struct drm_mm_node,
+						node_list);
+			vma = container_of(node, struct i915_vma, node);
+			obj = vma->obj;
+
+			drm_mm_remove_node(node);
+			i915_gem_vma_destroy(vma);
+			i915_gem_object_unpin_pages(obj);
+		}
+
+		ppgtt->base.cleanup(&ppgtt->base);
+		kfree(ppgtt);
+	}
+
+	if (!list_empty(&dev_priv->ppgtt_free_list))
+		queue_delayed_work(dev_priv->wq, &dev_priv->mm.ppgtt_work,
+				   round_jiffies_up_relative(HZ));
+
+	mutex_unlock(&dev_priv->dev->struct_mutex);
+}
+
 /**
  * Ensures that an object will eventually get non-busy by flushing any required
  * write domains, emitting any outstanding lazy request and retiring and
@@ -4553,6 +4659,7 @@ i915_gem_suspend(struct drm_device *dev)
 	mutex_unlock(&dev->struct_mutex);

 	del_timer_sync(&dev_priv->gpu_error.hangcheck_timer);
+	cancel_delayed_work_sync(&dev_priv->mm.ppgtt_work);
 	cancel_delayed_work_sync(&dev_priv->mm.retire_work);
 	flush_delayed_work(&dev_priv->mm.idle_work);

@@ -4894,6 +5001,7 @@ i915_gem_load(struct drm_device *dev)
 			  NULL);

 	INIT_LIST_HEAD(&dev_priv->vm_list);
+	INIT_LIST_HEAD(&dev_priv->ppgtt_free_list);
 	i915_init_vm(dev_priv, &dev_priv->gtt.base);

 	INIT_LIST_HEAD(&dev_priv->context_list);
@@ -4908,6 +5016,8 @@ i915_gem_load(struct drm_device *dev)
 			  i915_gem_retire_work_handler);
 	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
 			  i915_gem_idle_work_handler);
+	INIT_DELAYED_WORK(&dev_priv->mm.ppgtt_work,
+			  i915_gem_ppgtt_work_handler);
 	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);

 	/* On GEN3 we really need to make sure the ARB C3 LP bit is set */
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index e48a3bb..61b36f9 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -96,7 +96,7 @@
 #define GEN6_CONTEXT_ALIGN (64<<10)
 #define GEN7_CONTEXT_ALIGN 4096

-static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
+static int do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -131,7 +131,7 @@ static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
 	if (ppgtt == dev_priv->mm.aliasing_ppgtt ||
 	    (list_empty(&vm->active_list) && list_empty(&vm->inactive_list))) {
 		ppgtt->base.cleanup(&ppgtt->base);
-		return;
+		return 0;
 	}

 	/*
@@ -153,17 +153,30 @@ static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)

 	/* We have a problem here where VM teardown cannot be interrupted, or
 	 * else the ppgtt cleanup will fail.
 	 * As an example, a precisely timed
-	 * SIGKILL could leads to an OOPS, or worse. There are two options:
-	 * 1. Make the eviction uninterruptible
-	 * 2. Defer the eviction if it was interrupted.
-	 *
-	 * Option #1 is not the friendliest, but it's the easiest to implement,
-	 * and least error prone.
-	 * TODO: Implement option 2
+	 * SIGKILL could lead to an OOPS, or worse. The real solution is to
+	 * properly track the VMA <-> OBJ relationship. This temporary bandaid
+	 * will simply defer the free until we know the seqno has passed for
+	 * this VMA.
 	 */
 	ret = i915_gem_evict_vm(&ppgtt->base, do_idle, !do_idle);
-	if (ret == -ERESTARTSYS)
-		ret = i915_gem_evict_vm(&ppgtt->base, do_idle, false);
+	if (ret == -ERESTARTSYS) {
+		struct drm_mm_node *entry;
+
+		/* First mark all VMAs */
+		drm_mm_for_each_node(entry, &ppgtt->base.mm) {
+			struct i915_vma *vma =
+				container_of(entry, struct i915_vma, node);
+			vma->retire_seqno = vma->obj->last_read_seqno;
+			vma->retire_ring = vma->obj->ring;
+			/* It's okay to lose the object, we just can't lose the
+			 * VMA */
+		}
+
+		list_move_tail(&ppgtt->base.global_link,
+			       &dev_priv->ppgtt_free_list);
+		queue_delayed_work(dev_priv->wq,
+				   &dev_priv->mm.ppgtt_work,
+				   round_jiffies_up_relative(HZ));
+		return -EAGAIN;
+	}
+
 	WARN_ON(ret);

 	WARN_ON(!list_empty(&vm->active_list));
@@ -173,6 +186,7 @@ static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
 	}

 	ppgtt->base.cleanup(&ppgtt->base);
+	return 0;
 }

 static void ppgtt_release(struct kref *kref)
@@ -180,8 +194,8 @@ static void ppgtt_release(struct kref *kref)
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(kref, struct i915_hw_ppgtt, ref);

-	do_ppgtt_cleanup(ppgtt);
-	kfree(ppgtt);
+	if (!do_ppgtt_cleanup(ppgtt))
+		kfree(ppgtt);
 }

 static size_t get_context_alignment(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f2ece5f..254895d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1019,7 +1019,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 		container_of(vm, struct i915_hw_ppgtt, base);

 	list_del(&vm->global_link);
-	drm_mm_takedown(&ppgtt->base.mm);
+	drm_mm_takedown(&vm->mm);
 	drm_mm_remove_node(&ppgtt->node);

 	gen6_ppgtt_unmap_pages(ppgtt);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 379cf16ea..593b5d0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -137,6 +137,8 @@ struct i915_vma {
 	struct hlist_node exec_node;
 	unsigned long exec_handle;
 	struct drm_i915_gem_exec_object2 *exec_entry;
+	uint32_t retire_seqno; /* Last active seqno at context destruction */
+	struct intel_engine_cs *retire_ring; /* Last ring for retire_seqno */

 	/**
 	 * How many users have pinned this object in GTT space. The following
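A closing note on the retire_seqno check in the worker: i915_seqno_passed()
is, at this point in the codebase, just a wrap-safe 32-bit comparison. A
standalone paraphrase (my naming, not the driver's exact code):

#include <stdbool.h>
#include <stdint.h>

/* True once hw has advanced to or past target, even across u32
 * wraparound: the signed difference stays correct as long as the two
 * values are within 2^31 of each other. */
static inline bool seqno_passed(uint32_t hw, uint32_t target)
{
	return (int32_t)(hw - target) >= 0;
}

So a VMA stamped with retire_seqno is only torn down once the ring's
current seqno satisfies this test; until then the worker breaks out of
its loop and re-queues itself.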