From patchwork Fri Sep  9 07:20:57 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson <chris@chris-wilson.co.uk>
X-Patchwork-Id: 9322511
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	BE4DB6077F for <patchwork-intel-gfx@patchwork.kernel.org>;
	Fri,  9 Sep 2016 07:21:28 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AB0CA29C4F
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Fri,  9 Sep 2016 07:21:28 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id A01BF29C91; Fri,  9 Sep 2016 07:21:28 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00,DKIM_SIGNED,
	RCVD_IN_DNSWL_MED,T_DKIM_INVALID autolearn=ham version=3.3.1
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 10E2D29C4F
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Fri,  9 Sep 2016 07:21:28 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 5C8216E6BE;
	Fri,  9 Sep 2016 07:21:27 +0000 (UTC)
X-Original-To: intel-gfx@lists.freedesktop.org
Delivered-To: intel-gfx@lists.freedesktop.org
Received: from mail-wm0-x244.google.com (mail-wm0-x244.google.com
	[IPv6:2a00:1450:400c:c09::244])
	by gabe.freedesktop.org (Postfix) with ESMTPS id 8A6946E6BE
	for <intel-gfx@lists.freedesktop.org>;
	Fri,  9 Sep 2016 07:21:24 +0000 (UTC)
Received: by mail-wm0-x244.google.com with SMTP id b187so1406188wme.0
	for <intel-gfx@lists.freedesktop.org>;
	Fri, 09 Sep 2016 00:21:24 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=sender:from:to:subject:date:message-id:in-reply-to:references;
	bh=NaUUePrZTMFrnxrhULE/BvznM7BAtts90OeDHbVdXcM=;
	b=GlaseyWVjel+iR6UXzvpoBYc2erAArd2yowiyAYV+Nsk6jwEyoOa1weGpEO7jqMD6R
	9ggjg+8rmQhkYdI8gctxUAy7/JSjlrxO0zUDXzCSJ3DJN8jua5um6YUX+vbZeZGyY7eO
	KZcNvGtp9ZytQvlylPYyF44K0sucN+z/wrufc6n620/9e73VgRVNvhP5Yw3la0jeSXrV
	TXBGloJWgttbwhGi98Z5le63hP0auoguDUF3+XNT1V9Ptiby1giQ+vMCJCh4ucNQf/y0
	G71cUehugxpPtU6ErQYmZAfqR1xkNa0GJQSxJQsmdeUbdOe0nPWEvSf+OmezNLna1eYm
	T32g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20130820;
	h=x-gm-message-state:sender:from:to:subject:date:message-id
	:in-reply-to:references;
	bh=NaUUePrZTMFrnxrhULE/BvznM7BAtts90OeDHbVdXcM=;
	b=TkJkIgZ8KABZisAi8lZ+QSI/IOboGCV/rql1Pb96XJIcJiZL9y6sa8ASYWwYF3USmu
	h60IDQeedVAcI/gBj5SBoscmcAeQr6TnMPHJLfQc7HCVyh9Qf3igV9ls3FCO+NqBhysH
	1KV0sApkGSk8f2n7v/77LRCe17wRaQyt8xqW7NabKAVAh3hrq5wBCJNEKvvnZDNN5bfB
	zzc3fFofqk9dMe91VamtxX237gGWFuEzJlL1c6yadwvU3QZEid+h6IZLUb41oDNXqpGO
	uHplSzfReAAlyd99R4WSw5nejKeuBw5X/ptflxNjHS47Rk4ALH19Yx13VaIcXht+VipQ
	owIg==
X-Gm-Message-State: 
 AE9vXwPH0dVmQp5lWjsDHXW2DoOysOhQy82m3oUTa7AQOnLhUuVXtWiVyObcRHLcN/pSoA==
X-Received: by 10.28.0.77 with SMTP id 74mr1321488wma.29.1473405682959;
	Fri, 09 Sep 2016 00:21:22 -0700 (PDT)
Received: from haswell.alporthouse.com ([78.156.65.138])
	by smtp.gmail.com with ESMTPSA id
	kk6sm1891592wjb.44.2016.09.09.00.21.20
	for <intel-gfx@lists.freedesktop.org>
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Fri, 09 Sep 2016 00:21:21 -0700 (PDT)
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Fri,  9 Sep 2016 08:20:57 +0100
Message-Id: <20160909072107.18861-11-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.9.3
In-Reply-To: <20160909072107.18861-1-chris@chris-wilson.co.uk>
References: <20160909072107.18861-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [CI 11/21] drm/i915: Perform a direct reset of the GPU
	from the waiter
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Intel graphics driver community testing & development
	<intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
MIME-Version: 1.0
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
X-Virus-Scanned: ClamAV using ClamSMTP

If a waiter is holding the struct_mutex, then the reset worker cannot
reset the GPU until the waiter returns. We do not want to return -EAGAIN
form i915_wait_request as that breaks delicate operations like
i915_vma_unbind() which often cannot be restarted easily, and returning
-EIO is just as useless (and has in the past proven dangerous). The
remaining WARN_ON(i915_wait_request) serve as a valuable reminder that
handling errors from an indefinite wait are tricky.

We can keep the current semantic that knowing after a reset is complete,
so is the request, by performing the reset ourselves if we hold the
mutex.

uevent emission is still handled by the reset worker, so it may appear
slightly out of order with respect to the actual reset (and concurrent
use of the device).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c         | 11 ++++++-----
 drivers/gpu/drm/i915/i915_drv.h         | 15 +++------------
 drivers/gpu/drm/i915/i915_gem_request.c | 29 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_irq.c         |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c |  3 ---
 5 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 47a676d859db..ff4173e6e298 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1729,6 +1729,8 @@ int i915_resume_switcheroo(struct drm_device *dev)
  * Reset the chip.  Useful if a hang is detected. Returns zero on successful
  * reset or otherwise an error code.
  *
+ * Caller must hold the struct_mutex.
+ *
  * Procedure is fairly simple:
  *   - reset the chip using the reset reg
  *   - re-init context state
@@ -1743,7 +1745,10 @@ int i915_reset(struct drm_i915_private *dev_priv)
 	struct i915_gpu_error *error = &dev_priv->gpu_error;
 	int ret;
 
-	mutex_lock(&dev->struct_mutex);
+	lockdep_assert_held(&dev->struct_mutex);
+
+	if (!test_and_clear_bit(I915_RESET_IN_PROGRESS, &error->flags))
+		return test_bit(I915_WEDGED, &error->flags) ? -EIO : 0;
 
 	/* Clear any previous failed attempts at recovery. Time to try again. */
 	__clear_bit(I915_WEDGED, &error->flags);
@@ -1784,9 +1789,6 @@ int i915_reset(struct drm_i915_private *dev_priv)
 		goto error;
 	}
 
-	clear_bit(I915_RESET_IN_PROGRESS, &error->flags);
-	mutex_unlock(&dev->struct_mutex);
-
 	/*
 	 * rps/rc6 re-init is necessary to restore state lost after the
 	 * reset and the re-install of gt irqs. Skip for ironlake per
@@ -1800,7 +1802,6 @@ int i915_reset(struct drm_i915_private *dev_priv)
 
 error:
 	set_bit(I915_WEDGED, &error->flags);
-	mutex_unlock(&dev->struct_mutex);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 20b7743f8ec5..15f1977e356a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3863,7 +3863,9 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
 			    schedule_timeout_uninterruptible(remaining_jiffies);
 	}
 }
-static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
+
+static inline bool
+__i915_request_irq_complete(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *engine = req->engine;
 
@@ -3925,17 +3927,6 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 			return true;
 	}
 
-	/* We need to check whether any gpu reset happened in between
-	 * the request being submitted and now. If a reset has occurred,
-	 * the seqno will have been advance past ours and our request
-	 * is complete. If we are in the process of handling a reset,
-	 * the request is effectively complete as the rendering will
-	 * be discarded, but we need to return in order to drop the
-	 * struct_mutex.
-	 */
-	if (i915_reset_in_progress(&req->i915->gpu_error))
-		return true;
-
 	return false;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 5f89801e6a16..64c370681a81 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -533,6 +533,16 @@ void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 	engine->submit_request(request);
 }
 
+static void reset_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->lock, flags);
+	if (list_empty(&wait->task_list))
+		__add_wait_queue(q, wait);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+
 static unsigned long local_clock_us(unsigned int *cpu)
 {
 	unsigned long t;
@@ -710,6 +720,25 @@ wakeup:
 		if (__i915_request_irq_complete(req))
 			break;
 
+		/* If the GPU is hung, and we hold the lock, reset the GPU
+		 * and then check for completion. On a full reset, the engine's
+		 * HW seqno will be advanced passed us and we are complete.
+		 * If we do a partial reset, we have to wait for the GPU to
+		 * resume and update the breadcrumb.
+		 *
+		 * If we don't hold the mutex, we can just wait for the worker
+		 * to come along and update the breadcrumb (either directly
+		 * itself, or indirectly by recovering the GPU).
+		 */
+		if (flags & I915_WAIT_LOCKED &&
+		    i915_reset_in_progress(&req->i915->gpu_error)) {
+			__set_current_state(TASK_RUNNING);
+			i915_reset(req->i915);
+			reset_wait_queue(&req->i915->gpu_error.wait_queue,
+					 &reset);
+			continue;
+		}
+
 		/* Only spin if we know the GPU is processing this request */
 		if (i915_spin_request(req, state, 2))
 			break;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index ed172d7beecb..2c7cb5041511 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2521,7 +2521,9 @@ static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
 	 * pending state and not properly drop locks, resulting in
 	 * deadlocks with the reset work.
 	 */
+	mutex_lock(&dev_priv->drm.struct_mutex);
 	ret = i915_reset(dev_priv);
+	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	intel_finish_reset(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 8687a84a7ff3..d2d85fc869e1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2229,9 +2229,6 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
 	if (ret)
 		return ret;
 
-	if (i915_reset_in_progress(&target->i915->gpu_error))
-		return -EAGAIN;
-
 	i915_gem_request_retire_upto(target);
 
 	intel_ring_update_space(ring);