From patchwork Thu Nov 15 16:17:23 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Vetter <daniel.vetter@ffwll.ch>
X-Patchwork-Id: 1750751
Return-Path: 
 <intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org>
X-Original-To: patchwork-intel-gfx@patchwork.kernel.org
Delivered-To: patchwork-process-083081@patchwork1.kernel.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	by patchwork1.kernel.org (Postfix) with ESMTP id 2DF063FC8A
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Thu, 15 Nov 2012 16:30:35 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 1EB56A0944
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Thu, 15 Nov 2012 08:30:35 -0800 (PST)
X-Original-To: intel-gfx@lists.freedesktop.org
Delivered-To: intel-gfx@lists.freedesktop.org
Received: from mail-ee0-f49.google.com (mail-ee0-f49.google.com
	[74.125.83.49])
	by gabe.freedesktop.org (Postfix) with ESMTP id 60374A091E
	for <intel-gfx@lists.freedesktop.org>;
	Thu, 15 Nov 2012 08:28:56 -0800 (PST)
Received: by mail-ee0-f49.google.com with SMTP id c1so1062357eek.36
	for <intel-gfx@lists.freedesktop.org>;
	Thu, 15 Nov 2012 08:28:55 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google;
	h=from:to:cc:subject:date:message-id:x-mailer:in-reply-to:references;
	bh=YRyVnCzNjnHzGqFIsVMn/3JO/5CRmmgU9HOCnCbAjw4=;
	b=UagV6NYsAh+P4TmarAgVYAbaGiNT/jzdodQAJSamPAYz3Z9H9CroNyyOdec0kxigrB
	TZnI1yDZN1I5inJTOsPX62dRraCs3r6c7hvp5RMyutx6rYx4BS8zo3d8b+Lp+xhAlYCl
	q/IZZ7eBQgtTAtBjOmyavCzsqwr4npQp8ngHA=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=from:to:cc:subject:date:message-id:x-mailer:in-reply-to:references
	:x-gm-message-state;
	bh=YRyVnCzNjnHzGqFIsVMn/3JO/5CRmmgU9HOCnCbAjw4=;
	b=m1yZzmR+TLzPX7sjY07YxIUFwDigW1C0H2+ZEyps/BdSNqV2HWi/tMBhQN6o0cDybk
	g1bS2uP+6whGb+cV530OfWDK3TrfY65NkrsGL3IKYk02GyvmYfcqLAjf6u2cks3jhry0
	uQJv2B7MriO1ec9TtsCxKsXaM3q7JHqH6g63qBedKV8KNJ2OozbD5bymApncFXPqh8ev
	SzYMJz9k2yNATwHHBHOmDtQYyYFW6PpIu2SzAInh95nRVCHFum1/wWolUm+bqjz/P/bZ
	bjiKK+gLz5OxqzcIcBewHpFxzJ9LvjwJSF8bQmPPb0xR/VIhryEVZQmtEfhZb361Zbfw
	ExmQ==
Received: by 10.14.223.199 with SMTP id v47mr5177116eep.45.1352996935481;
	Thu, 15 Nov 2012 08:28:55 -0800 (PST)
Received: from fliege.ffwll.local (178-83-130-250.dynamic.hispeed.ch.
	[178.83.130.250]) by mx.google.com with ESMTPS id
	z43sm37048046een.16.2012.11.15.08.28.54
	(version=TLSv1/SSLv3 cipher=OTHER);
	Thu, 15 Nov 2012 08:28:54 -0800 (PST)
From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Date: Thu, 15 Nov 2012 17:17:23 +0100
Message-Id: <1352996243-15590-2-git-send-email-daniel.vetter@ffwll.ch>
X-Mailer: git-send-email 1.7.11.4
In-Reply-To: <1352996243-15590-1-git-send-email-daniel.vetter@ffwll.ch>
References: <1352909648-21514-1-git-send-email-daniel.vetter@ffwll.ch>
	<1352996243-15590-1-git-send-email-daniel.vetter@ffwll.ch>
X-Gm-Message-State: 
 ALoCoQkmSQbuJ9QxQ09sbNTdcTfcnTOHnS6NkUPWlvW+ZK7EWVd+aOANuE9m9j6Tq6MG1SYfgHqG
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: [Intel-gfx] [PATCH 2/2] drm/i915: create a race-free reset detection
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Intel graphics driver community testing & development
	<intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
MIME-Version: 1.0
Sender: 
 intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org
Errors-To: 
 intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org

With the previous patch the state transition handling of the reset
code itself is now (hopefully) race free and solid. But that still
leaves out everyone else - with the various lock-free wait paths
we have there's the possibility that the reset happens between the
point where we read the seqno we should wait on and the actual wait.

And if __wait_seqno then never sees the RESET_IN_PROGRESS state, we'll
happily wait for a seqno which will in all likelyhood never signal.

In practice this is not a big problem since the X server gets
constantly interrupted, and can then submit more work (hopefully) to
unblock everyone else: As soon as a new seqno write lands, all waiters
will unblock. But running the i-g-t reset testcase ZZ_hangman can
expose this race, especially on slower hw with fewer cpu cores.

Now looking forward to ARB_robustness and friends that's not the best
possible behaviour, hence this patch adds a reset_counter to be able
to detect any reset, even if a given thread never observed the
in-progress state.

The important part is to correctly order things:
- The write side needs to increment the counter after any seqno gets
  reset.  Hence we need to do that at the end of the reset work, and
  again wake everyone up. We also need to place a barrier in between
  any possible seqno changes and the counter increment, since any
  unlock operations only guarantee that nothing leaks out, but not
  that at later load operation gets moved ahead.
- On the read side we need to ensure that no reset can sneak in and
  invalidate the seqno. In all cases we can use the one-sided barrier
  that unlock operations guarantee (of the lock protecting the
  respective seqno/ring pair) to ensure correct ordering. Hence it is
  sufficient to place the atomic read before the mutex/spin_unlock and
  no additional barriers are required.

The end-result of all this is that we need to wake up everyone twice
in a reset operation:
- First, before the reset starts, to get any lockholders of the locks,
  so that the reset can proceed.
- Second, after the reset is completed, to allow waiters to properly
  and reliably detect the reset condition and bail out.

I admit that this entire reset_counter thing smells a bit like
overkill, but I think it's justified since it makes it really explicit
what the bail-out condition is. And we need a reset counter anyway to
implement ARB_robustness, and imo with finer-grained locking on the
horizont this is the most resilient scheme I could think of.

v2: Drop spurious change in the wait_for_error EXIT_COND - we only
need to wait until we leave the reset-in-progress wedged state.

v3: Don't play tricks with barriers in the throttle ioctl, the
spin_unlock is barrier enough.

I've also considered using a little helper to grab the current
reset_counter, but then decided that hiding the atomic_read isn't a
great idea, since having it explicitly show up in the code is a nice
remainder to reviews to check the memory barriers.

v4: Add a comment to explain why we need to fall through in
__wait_seqno in the end variable assignments.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h | 11 +++++++++--
 drivers/gpu/drm/i915/i915_gem.c | 36 +++++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_irq.c | 21 ++++++++++++++++++---
 3 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f68de01..d1e03b6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -727,9 +727,16 @@ struct i915_gpu_error {
 	unsigned long last_reset;
 
 	/**
-	 * State variable controlling the reset flow
+	 * State variable and reset counter controlling the reset flow
 	 *
-	 * Upper bits are for the reset counter.
+	 * Upper bits are for the reset counter.  This counter is used by the
+	 * wait_seqno code to race-free noticed that a reset event happened and
+	 * that it needs to restart the entire ioctl (since most likely the
+	 * seqno it waited for won't ever signal anytime soon).
+	 *
+	 * This is important for lock-free wait paths, where no contended lock
+	 * naturally enforces the correct ordering between the bail-out of the
+	 * waiter and the gpu reset work code.
 	 *
 	 * Lowest bit controls the reset state machine: Set means a reset is in
 	 * progress. This state will (presuming we don't have any bugs) decay
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ddad3ed..bbfa887 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -977,13 +977,22 @@ i915_gem_check_olr(struct intel_ring_buffer *ring, u32 seqno)
  * __wait_seqno - wait until execution of seqno has finished
  * @ring: the ring expected to report seqno
  * @seqno: duh!
+ * @reset_counter: reset sequence associated with the given seqno
  * @interruptible: do an interruptible wait (normally yes)
  * @timeout: in - how long to wait (NULL forever); out - how much time remaining
  *
+ * Note: It is of utmost importance that the passed in seqno and reset_counter
+ * values have been read by the caller in an smb safe manner. Where read-side
+ * locks are involved, it is sufficient to read the reset_counter before
+ * unlocking the lock that protects the seqno. For lockless tricks, the
+ * reset_counter _must_ be read before, and an appropriate smb_rmb must be
+ * inserted.
+ *
  * Returns 0 if the seqno was found within the alloted time. Else returns the
  * errno with remaining time filled in timeout argument.
  */
 static int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno,
+			unsigned reset_counter,
 			bool interruptible, struct timespec *timeout)
 {
 	drm_i915_private_t *dev_priv = ring->dev->dev_private;
@@ -1013,7 +1022,8 @@ static int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno,
 
 #define EXIT_COND \
 	(i915_seqno_passed(ring->get_seqno(ring, false), seqno) || \
-	i915_reset_in_progress(&dev_priv->gpu_error))
+	 i915_reset_in_progress(&dev_priv->gpu_error) || \
+	 reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
 	do {
 		if (interruptible)
 			end = wait_event_interruptible_timeout(ring->irq_queue,
@@ -1023,6 +1033,13 @@ static int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno,
 			end = wait_event_timeout(ring->irq_queue, EXIT_COND,
 						 timeout_jiffies);
 
+		/* We need to check whether any gpu reset happened in between
+		 * the caller grabbing the seqno and now ... */
+		if (reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
+			end = -EAGAIN;
+
+		/* ... but upgrade the -EGAIN to an -EIO if the gpu is truely
+		 * gone. */
 		ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
 		if (ret)
 			end = ret;
@@ -1077,7 +1094,9 @@ i915_wait_seqno(struct intel_ring_buffer *ring, uint32_t seqno)
 	if (ret)
 		return ret;
 
-	return __wait_seqno(ring, seqno, interruptible, NULL);
+	return __wait_seqno(ring, seqno,
+			    atomic_read(&dev_priv->gpu_error.reset_counter),
+			    interruptible, NULL);
 }
 
 /**
@@ -1124,6 +1143,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring = obj->ring;
+	unsigned reset_counter;
 	u32 seqno;
 	int ret;
 
@@ -1142,8 +1162,9 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
+	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	mutex_unlock(&dev->struct_mutex);
-	ret = __wait_seqno(ring, seqno, true, NULL);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL);
 	mutex_lock(&dev->struct_mutex);
 
 	i915_gem_retire_requests_ring(ring);
@@ -2275,10 +2296,12 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 int
 i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
+	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct drm_i915_gem_wait *args = data;
 	struct drm_i915_gem_object *obj;
 	struct intel_ring_buffer *ring = NULL;
 	struct timespec timeout_stack, *timeout = NULL;
+	unsigned reset_counter;
 	u32 seqno = 0;
 	int ret = 0;
 
@@ -2319,9 +2342,10 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	}
 
 	drm_gem_object_unreference(&obj->base);
+	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	mutex_unlock(&dev->struct_mutex);
 
-	ret = __wait_seqno(ring, seqno, true, timeout);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, timeout);
 	if (timeout) {
 		WARN_ON(!timespec_valid(timeout));
 		args->timeout_ns = timespec_to_ns(timeout);
@@ -3386,6 +3410,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	unsigned long recent_enough = jiffies - msecs_to_jiffies(20);
 	struct drm_i915_gem_request *request;
 	struct intel_ring_buffer *ring = NULL;
+	unsigned reset_counter;
 	u32 seqno = 0;
 	int ret;
 
@@ -3405,12 +3430,13 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 		ring = request->ring;
 		seqno = request->seqno;
 	}
+	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	spin_unlock(&file_priv->mm.lock);
 
 	if (seqno == 0)
 		return 0;
 
-	ret = __wait_seqno(ring, seqno, true, NULL);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL);
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 08e1287..77de66c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -843,9 +843,11 @@ static void i915_error_work_func(struct work_struct *work)
 	drm_i915_private_t *dev_priv = container_of(error, drm_i915_private_t,
 						    gpu_error);
 	struct drm_device *dev = dev_priv->dev;
+	struct intel_ring_buffer *ring;
 	char *error_event[] = { "ERROR=1", NULL };
 	char *reset_event[] = { "RESET=1", NULL };
 	char *reset_done_event[] = { "ERROR=0", NULL };
+	int i;
 
 	kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, error_event);
 
@@ -854,12 +856,25 @@ static void i915_error_work_func(struct work_struct *work)
 		kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, reset_event);
 
 		if (!i915_reset(dev)) {
-			atomic_set(&error->reset_counter, 0);
 			kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, reset_done_event);
 		} else {
 			atomic_set(&error->reset_counter, I915_WEDGED);
 		}
 
+		/*
+		 * After all the gem state is reset, increment the reset counter
+		 * and wake up everyone waiting for the reset to complete.
+		 *
+		 * Since unlock operations are a one-sided barrier only, we need
+		 * to insert a barrier here to order any seqno updates before
+		 * the counter increment.
+		 */
+		smp_mb__before_atomic_inc();
+		atomic_inc(&dev_priv->gpu_error.reset_counter);
+
+		for_each_ring(ring, dev_priv, i)
+			wake_up_all(&ring->irq_queue);
+
 		wake_up_all(&dev_priv->gpu_error.reset_queue);
 	}
 }
@@ -1440,8 +1455,8 @@ void i915_handle_error(struct drm_device *dev, bool wedged)
 	i915_report_and_clear_eir(dev);
 
 	if (wedged) {
-		atomic_set(&dev_priv->gpu_error.reset_counter,
-			   I915_RESET_IN_PROGRESS_FLAG);
+		atomic_set_mask(I915_RESET_IN_PROGRESS_FLAG,
+				&dev_priv->gpu_error.reset_counter);
 
 		/*
 		 * Wakeup waiting processes so that the reset work item