From patchwork Thu Mar 22 11:17:12 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tvrtko Ursulin <tursulin@ursulin.net>
X-Patchwork-Id: 10301261
From: Tvrtko Ursulin <tursulin@ursulin.net>
X-Google-Original-From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: igt-dev@lists.freedesktop.org
Date: Thu, 22 Mar 2018 11:17:12 +0000
Message-Id: <20180322111712.9056-3-tvrtko.ursulin@linux.intel.com>
X-Mailer: git-send-email 2.14.1
In-Reply-To: <20180322111712.9056-1-tvrtko.ursulin@linux.intel.com>
References: <20180322111712.9056-1-tvrtko.ursulin@linux.intel.com>
Subject: [Intel-gfx] [PATCH i-g-t 3/3] tests/perf_pmu: Improve accuracy by
	waiting on spinner to start
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Intel graphics driver community testing & development
	<intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Cc: Intel-gfx@lists.freedesktop.org
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
X-Virus-Scanned: ClamAV using ClamSMTP

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

More than one test assumes that the spinner is running pretty much
immediately after we have created or submitted it.

In reality there is a variable delay, especially on execlists platforms,
between submission and the spin batch starting to run on the hardware.

To enable tests which care about this level of timing to account for this,
we add a new spin batch constructor which provides an output field which
can be polled to determine when the batch actually started running.

This is implemented via MI_STORE_DWORD_IMM from the spin batch, writing
into a memory mapped page shared with userspace.
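
As an illustration of the polling idea only (not the IGT implementation),
the sketch below replaces the GPU with a forked child process that writes
into a shared anonymous mapping after a delay, while the parent busy-polls
the flag and reports how long the start took. The names now_ns() and
submit_and_wait_for_start() are hypothetical and exist only for this
example.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical stand-in for "submit a spinner, then wait until it runs".
 * The child plays the role of the batch: after start_delay_us it performs
 * the equivalent of the store-dword into the shared page. Returns the
 * nanoseconds between submission and the flag being observed, or -1 on
 * failure.
 */
static int64_t submit_and_wait_for_start(unsigned int start_delay_us)
{
	volatile uint32_t *running;
	uint64_t start;
	int64_t waited;
	pid_t child;

	running = mmap(NULL, sizeof(*running), PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (running == MAP_FAILED)
		return -1;

	start = now_ns();

	child = fork();
	if (child < 0) {
		munmap((void *)running, sizeof(*running));
		return -1;
	}

	if (child == 0) {
		usleep(start_delay_us);	/* variable submission latency */
		*running = 1;		/* the "batch" announces itself */
		_exit(0);
	}

	while (!*running)
		; /* busy-poll the shared page, as the test would */

	waited = (int64_t)(now_ns() - start);

	waitpid(child, NULL, 0);
	munmap((void *)running, sizeof(*running));

	return waited;
}
```

Because the start timestamp is taken before the "submission", the returned
value is a lower bound on how long a fixed sleep would have needed to be,
which is the uncertainty the polling approach removes.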

Using this facility from perf_pmu, where applicable, should reduce the very
occasional test failures seen across the test set and platforms.

v2:
 Chris Wilson:
 * Use caching mapping if available.
 * Handle old gens better.
 * Use gem_can_store_dword.
 * Cache exec obj array in spin_batch_t for easier resubmit.

v3:
 * Forgot I915_EXEC_NO_RELOC. (Chris Wilson)

v4:
 * Mask out all non-engine flags in gem_can_store_dword.
 * Added some debug logging.

v5:
 * Fix relocs and batch munmap. (Chris)
 * Added assert that the idle spinner batch looks as expected.

v6:
 * Skip accuracy tests when !gem_can_store_dword.

v7:
 * Fix batch recursion reloc address.

v8:
 Chris Wilson:
 * Pull up gem_can_store_dword check before we start submitting.
 * Build spinner batch in a way we can skip store dword when not
   needed so we can run on SandyBridge.

v9:
 * Fix wait on spinner.
 * More tweaks to accuracy test.

v10:
 * Dropped accuracy subtest changes due to problems with RT thread and
   tasklet submission.
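
The engine-flag handling mentioned in v4, and used when the cached execbuf
is resubmitted on another engine, can be sketched in isolation as below.
This is a standalone illustration: the EXEC_* constants are local copies
assumed to match the i915 uAPI header (only the 0x3f ring mask appears in
this patch), and engine_bits() and retarget_flags() are hypothetical
helpers, not IGT functions. Unlike the patch, the sketch also masks the
incoming engine selector as a defensive step.

```c
#include <stdint.h>

/*
 * Local copies of execbuf flag values. Assumed to mirror i915_drm.h;
 * check them against the header you build with.
 */
#define EXEC_RING_MASK	0x3full
#define EXEC_NO_RELOC	(1ull << 11)
#define EXEC_BSD_SHIFT	13
#define EXEC_BSD_MASK	(3ull << EXEC_BSD_SHIFT)

/* Keep only the bits which select an engine. */
static uint64_t engine_bits(uint64_t flags)
{
	return flags & (EXEC_RING_MASK | EXEC_BSD_MASK);
}

/*
 * Take the flags of an already submitted execbuf, drop the old engine
 * selection, select the new engine and ask the kernel to skip relocation
 * processing since the batch has already been placed.
 */
static uint64_t retarget_flags(uint64_t cached, uint64_t engine)
{
	cached &= ~(EXEC_RING_MASK | EXEC_BSD_MASK);

	return cached | engine_bits(engine) | EXEC_NO_RELOC;
}
```

With these assumed values, a cached flags word of 0x1001 retargeted to an
engine selector of 0x2002 keeps bit 12, swaps the engine bits and gains the
no-reloc bit.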

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v9
---
 tests/perf_pmu.c | 151 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 103 insertions(+), 48 deletions(-)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 19fcc95ffc7f..203fe48955a8 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -170,6 +170,56 @@ static unsigned int e2ring(int gem_fd, const struct intel_execution_engine2 *e)
 #define FLAG_LONG (16)
 #define FLAG_HANG (32)
 
+static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
+{
+	if (gem_can_store_dword(fd, flags))
+		return __igt_spin_batch_new_poll(fd, ctx, flags);
+	else
+		return __igt_spin_batch_new(fd, ctx, flags, 0);
+}
+
+static unsigned long __spin_wait(int fd, igt_spin_t *spin)
+{
+	struct timespec start = { };
+
+	igt_nsec_elapsed(&start);
+
+	if (spin->running) {
+		unsigned long timeout = 0;
+
+		while (!*((volatile bool *)spin->running)) {
+			unsigned long t = igt_nsec_elapsed(&start);
+
+			if ((t - timeout) > 250e6) {
+				timeout = t;
+				igt_warn("Spinner not running after %.2fms\n",
+					 (double)t / 1e6);
+			}
+		}
+	} else {
+		igt_debug("__spin_wait - usleep mode\n");
+		usleep(500e3); /* Better than nothing! */
+	}
+
+	return igt_nsec_elapsed(&start);
+}
+
+static igt_spin_t * __spin_sync(int fd, uint32_t ctx, unsigned long flags)
+{
+	igt_spin_t *spin = __spin_poll(fd, ctx, flags);
+
+	__spin_wait(fd, spin);
+
+	return spin;
+}
+
+static igt_spin_t * spin_sync(int fd, uint32_t ctx, unsigned long flags)
+{
+	igt_require_gem(fd);
+
+	return __spin_sync(fd, ctx, flags);
+}
+
 static void end_spin(int fd, igt_spin_t *spin, unsigned int flags)
 {
 	if (!spin)
@@ -180,8 +230,25 @@ static void end_spin(int fd, igt_spin_t *spin, unsigned int flags)
 	if (flags & FLAG_SYNC)
 		gem_sync(fd, spin->handle);
 
-	if (flags & TEST_TRAILING_IDLE)
-		usleep(batch_duration_ns / 5000);
+	if (flags & TEST_TRAILING_IDLE) {
+		unsigned long t, timeout = 0;
+		struct timespec start = { };
+
+		igt_nsec_elapsed(&start);
+
+		do {
+			t = igt_nsec_elapsed(&start);
+
+			if (gem_bo_busy(fd, spin->handle) &&
+			    (t - timeout) > 10e6) {
+				timeout = t;
+				igt_warn("Spinner not idle after %.2fms\n",
+					 (double)t / 1e6);
+			}
+
+			usleep(1e3);
+		} while (t < batch_duration_ns / 5);
+	}
 }
 
 static void
@@ -195,7 +262,7 @@ single(int gem_fd, const struct intel_execution_engine2 *e, unsigned int flags)
 	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
 
 	if (flags & TEST_BUSY)
-		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		spin = spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 	else
 		spin = NULL;
 
@@ -251,13 +318,7 @@ busy_start(int gem_fd, const struct intel_execution_engine2 *e)
 	 */
 	sleep(2);
 
-	spin = __igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
-
-	/*
-	 * Sleep for a bit after making the engine busy to make sure the PMU
-	 * gets enabled when the batch is already running.
-	 */
-	usleep(500e3);
+	spin = __spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 
 	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
 
@@ -300,7 +361,7 @@ busy_double_start(int gem_fd, const struct intel_execution_engine2 *e)
 	 * re-submission in execlists mode. Make sure busyness is correctly
 	 * reported with the engine busy, and after the engine went idle.
 	 */
-	spin[0] = __igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	spin[0] = __spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 	usleep(500e3);
 	spin[1] = __igt_spin_batch_new(gem_fd, ctx, e2ring(gem_fd, e), 0);
 
@@ -386,7 +447,7 @@ busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 
 	igt_assert_eq(i, num_engines);
 
-	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	spin = spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 	pmu_read_multi(fd[0], num_engines, tval[0]);
 	slept = measured_usleep(batch_duration_ns / 1000);
 	if (flags & TEST_TRAILING_IDLE)
@@ -412,15 +473,15 @@ busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 }
 
 static void
-__submit_spin_batch(int gem_fd,
-		    struct drm_i915_gem_exec_object2 *obj,
-		    const struct intel_execution_engine2 *e)
+__submit_spin_batch(int gem_fd, igt_spin_t *spin,
+		    const struct intel_execution_engine2 *e,
+		    int offset)
 {
-	struct drm_i915_gem_execbuffer2 eb = {
-		.buffer_count = 1,
-		.buffers_ptr = to_user_pointer(obj),
-		.flags = e2ring(gem_fd, e),
-	};
+	struct drm_i915_gem_execbuffer2 eb = spin->execbuf;
+
+	eb.flags &= ~(0x3f | I915_EXEC_BSD_MASK);
+	eb.flags |= e2ring(gem_fd, e) | I915_EXEC_NO_RELOC;
+	eb.batch_start_offset += offset;
 
 	gem_execbuf(gem_fd, &eb);
 }
@@ -429,7 +490,6 @@ static void
 most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 		    const unsigned int num_engines, unsigned int flags)
 {
-	struct drm_i915_gem_exec_object2 obj = {};
 	const struct intel_execution_engine2 *e_;
 	uint64_t tval[2][num_engines];
 	uint64_t val[num_engines];
@@ -443,15 +503,12 @@ most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
 			continue;
 
-		if (e == e_) {
+		if (e == e_)
 			idle_idx = i;
-		} else if (spin) {
-			__submit_spin_batch(gem_fd, &obj, e_);
-		} else {
-			spin = igt_spin_batch_new(gem_fd, 0,
-						  e2ring(gem_fd, e_), 0);
-			obj.handle = spin->handle;
-		}
+		else if (spin)
+			__submit_spin_batch(gem_fd, spin, e_, 64);
+		else
+			spin = __spin_poll(gem_fd, 0, e2ring(gem_fd, e_));
 
 		val[i++] = I915_PMU_ENGINE_BUSY(e_->class, e_->instance);
 	}
@@ -461,6 +518,9 @@ most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 	for (i = 0; i < num_engines; i++)
 		fd[i] = open_group(val[i], fd[0]);
 
+	/* Small delay to allow engines to start. */
+	usleep(__spin_wait(gem_fd, spin) * num_engines / 1e3);
+
 	pmu_read_multi(fd[0], num_engines, tval[0]);
 	slept = measured_usleep(batch_duration_ns / 1000);
 	if (flags & TEST_TRAILING_IDLE)
@@ -489,7 +549,6 @@ static void
 all_busy_check_all(int gem_fd, const unsigned int num_engines,
 		   unsigned int flags)
 {
-	struct drm_i915_gem_exec_object2 obj = {};
 	const struct intel_execution_engine2 *e;
 	uint64_t tval[2][num_engines];
 	uint64_t val[num_engines];
@@ -503,13 +562,10 @@ all_busy_check_all(int gem_fd, const unsigned int num_engines,
 		if (!gem_has_engine(gem_fd, e->class, e->instance))
 			continue;
 
-		if (spin) {
-			__submit_spin_batch(gem_fd, &obj, e);
-		} else {
-			spin = igt_spin_batch_new(gem_fd, 0,
-						  e2ring(gem_fd, e), 0);
-			obj.handle = spin->handle;
-		}
+		if (spin)
+			__submit_spin_batch(gem_fd, spin, e, 64);
+		else
+			spin = __spin_poll(gem_fd, 0, e2ring(gem_fd, e));
 
 		val[i++] = I915_PMU_ENGINE_BUSY(e->class, e->instance);
 	}
@@ -519,6 +575,9 @@ all_busy_check_all(int gem_fd, const unsigned int num_engines,
 	for (i = 0; i < num_engines; i++)
 		fd[i] = open_group(val[i], fd[0]);
 
+	/* Small delay to allow engines to start. */
+	usleep(__spin_wait(gem_fd, spin) * num_engines / 1e3);
+
 	pmu_read_multi(fd[0], num_engines, tval[0]);
 	slept = measured_usleep(batch_duration_ns / 1000);
 	if (flags & TEST_TRAILING_IDLE)
@@ -550,7 +609,7 @@ no_sema(int gem_fd, const struct intel_execution_engine2 *e, unsigned int flags)
 	open_group(I915_PMU_ENGINE_WAIT(e->class, e->instance), fd);
 
 	if (flags & TEST_BUSY)
-		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		spin = spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 	else
 		spin = NULL;
 
@@ -884,7 +943,7 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
 	 */
 	fd[1] = open_pmu(config);
 
-	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	spin = spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 
 	val[0] = val[1] = __pmu_read_single(fd[0], &ts[0]);
 	slept[1] = measured_usleep(batch_duration_ns / 1000);
@@ -1248,7 +1307,7 @@ test_frequency(int gem_fd)
 	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == min_freq);
 
 	gem_quiescent_gpu(gem_fd); /* Idle to be sure the change takes effect */
-	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	spin = spin_sync(gem_fd, 0, I915_EXEC_RENDER);
 
 	slept = pmu_read_multi(fd, 2, start);
 	measured_usleep(batch_duration_ns / 1000);
@@ -1274,7 +1333,7 @@ test_frequency(int gem_fd)
 	igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == max_freq);
 
 	gem_quiescent_gpu(gem_fd);
-	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	spin = spin_sync(gem_fd, 0, I915_EXEC_RENDER);
 
 	slept = pmu_read_multi(fd, 2, start);
 	measured_usleep(batch_duration_ns / 1000);
@@ -1517,7 +1576,6 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 		const unsigned long timeout[] = {
 			pwm_calibration_us * 1000, test_us * 1000
 		};
-		struct drm_i915_gem_exec_object2 obj = {};
 		uint64_t total_busy_ns = 0, total_idle_ns = 0;
 		igt_spin_t *spin;
 		int ret;
@@ -1531,11 +1589,8 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 
 		/* Allocate our spin batch and idle it. */
 		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
-		obj.handle = spin->handle;
-		__submit_spin_batch(gem_fd, &obj, e); /* record its location */
 		igt_spin_batch_end(spin);
-		gem_sync(gem_fd, obj.handle);
-		obj.flags |= EXEC_OBJECT_PINNED;
+		gem_sync(gem_fd, spin->handle);
 
 		/* 1st pass is calibration, second pass is the test. */
 		for (int pass = 0; pass < ARRAY_SIZE(timeout); pass++) {
@@ -1549,7 +1604,7 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 
 				/* Restart the spinbatch. */
 				__rearm_spin_batch(spin);
-				__submit_spin_batch(gem_fd, &obj, e);
+				__submit_spin_batch(gem_fd, spin, e, 0);
 
 				/*
 				 * Note that the submission may be delayed to a
@@ -1559,7 +1614,7 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 
 				t_busy = measured_usleep(busy_us);
 				igt_spin_batch_end(spin);
-				gem_sync(gem_fd, obj.handle);
+				gem_sync(gem_fd, spin->handle);
 
 				total_busy_ns += t_busy;