From patchwork Fri Aug 22 03:11:34 2014
From: Ben Widawsky
To: Intel GFX <intel-gfx@lists.freedesktop.org>
Date: Thu, 21 Aug 2014 20:11:34 -0700
Message-Id: <1408677155-1840-12-git-send-email-benjamin.widawsky@intel.com>
In-Reply-To: <1408677155-1840-1-git-send-email-benjamin.widawsky@intel.com>
References: <1408677155-1840-1-git-send-email-benjamin.widawsky@intel.com>
Subject: [Intel-gfx] [PATCH 11/68] drm/i915: More correct (slower) ppgtt cleanup

If a VM still has objects which are bound (exactly: have a node reserved
in the drm_mm) and we are in the middle of a reset, the standard methods
have no hope of fixing the situation (ring idle won't work). We must
therefore let the reset handler take its course, and then we can resume
tearing down the VM.

This logic very much duplicates^Wresembles the logic in our wait for
error code. I've decided to leave it open-coded because I expect this
bit of code to require tweaks and changes over time.

Interruption via signal causes a very similar problem.

This should deprecate the need for the as yet unmerged patch from Chris
(and an identical patch from me, which was first!!):
drm/i915: Prevent signals from interrupting close()

Before you complain: I have a followup patch to implement deferred free.
Signed-off-by: Ben Widawsky
---
 drivers/gpu/drm/i915/i915_gem_context.c | 51 +++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 4d47bcb..e48a3bb 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -101,6 +101,32 @@ static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_address_space *vm = &ppgtt->base;
+	bool do_idle = false;
+	int ret;
+
+	/* If we get here while in reset, we need to let the reset handler run
+	 * first, or else our VM teardown isn't going to go smoothly. There are
+	 * a couple of options at this point, but letting the reset handler do
+	 * its thing is the most desirable. The reset handler will take care of
+	 * retiring the stuck requests.
+	 */
+	if (i915_reset_in_progress(&dev_priv->gpu_error)) {
+		mutex_unlock(&dev->struct_mutex);
+#define EXIT_COND (!i915_reset_in_progress(&dev_priv->gpu_error) || \
+		    i915_terminally_wedged(&dev_priv->gpu_error))
+		ret = wait_event_timeout(dev_priv->gpu_error.reset_queue,
+					 EXIT_COND,
+					 10 * HZ);
+		if (!ret) {
+			/* it's unlikely idling will solve anything, but it
+			 * shouldn't hurt to try. */
+			do_idle = true;
+			/* TODO: go down kicking and screaming harder */
+		}
+#undef EXIT_COND
+
+		mutex_lock(&dev->struct_mutex);
+	}

 	if (ppgtt == dev_priv->mm.aliasing_ppgtt ||
 	    (list_empty(&vm->active_list) && list_empty(&vm->inactive_list))) {
@@ -117,14 +143,33 @@ static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
 	if (!list_empty(&vm->active_list)) {
 		struct i915_vma *vma;

+		do_idle = true;
 		list_for_each_entry(vma, &vm->active_list, mm_list)
 			if (WARN_ON(list_empty(&vma->vma_link) ||
 				    list_is_singular(&vma->vma_link)))
 				break;
-		i915_gem_evict_vm(&ppgtt->base, true, true);
-	} else {
+	} else
 		i915_gem_retire_requests(dev);
-		i915_gem_evict_vm(&ppgtt->base, false, true);
+
+	/* We have a problem here where VM teardown cannot be interrupted, or
+	 * else the ppgtt cleanup will fail. As an example, a precisely timed
+	 * SIGKILL could lead to an OOPS, or worse. There are two options:
+	 * 1. Make the eviction uninterruptible
+	 * 2. Defer the eviction if it was interrupted.
+	 *
+	 * Option #1 is not the friendliest, but it's the easiest to implement,
+	 * and least error prone.
+	 * TODO: Implement option 2
+	 */
+	ret = i915_gem_evict_vm(&ppgtt->base, do_idle, !do_idle);
+	if (ret == -ERESTARTSYS)
+		ret = i915_gem_evict_vm(&ppgtt->base, do_idle, false);
+	WARN_ON(ret);
+	WARN_ON(!list_empty(&vm->active_list));
+
+	/* This is going to blow up badly if the mm is unclean */
+	if (WARN_ON(!list_empty(&ppgtt->base.mm.head_node.node_list))) {
+		/* TODO: go down kicking and screaming harder++ */
 	}

 	ppgtt->base.cleanup(&ppgtt->base);