From patchwork Thu Mar 15 12:56:17 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tvrtko Ursulin <tursulin@ursulin.net>
X-Patchwork-Id: 10284445
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	D65C760386 for <patchwork-intel-gfx@patchwork.kernel.org>;
	Thu, 15 Mar 2018 12:56:32 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CC224289F8
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Thu, 15 Mar 2018 12:56:32 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id C09B928A13; Thu, 15 Mar 2018 12:56:32 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00,DKIM_SIGNED,
	RCVD_IN_DNSWL_MED,T_DKIM_INVALID autolearn=ham version=3.3.1
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id BFD0E289F8
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Thu, 15 Mar 2018 12:56:31 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 4D1D66E84F;
	Thu, 15 Mar 2018 12:56:30 +0000 (UTC)
X-Original-To: Intel-gfx@lists.freedesktop.org
Delivered-To: Intel-gfx@lists.freedesktop.org
Received: from mail-wm0-x242.google.com (mail-wm0-x242.google.com
	[IPv6:2a00:1450:400c:c09::242])
	by gabe.freedesktop.org (Postfix) with ESMTPS id 61CCF6E3D7
	for <Intel-gfx@lists.freedesktop.org>;
	Thu, 15 Mar 2018 12:56:28 +0000 (UTC)
Received: by mail-wm0-x242.google.com with SMTP id h76so10307727wme.4
	for <Intel-gfx@lists.freedesktop.org>;
	Thu, 15 Mar 2018 05:56:28 -0700 (PDT)
X-Received: by 10.28.87.75 with SMTP id l72mr4460110wmb.48.1521118586475;
	Thu, 15 Mar 2018 05:56:26 -0700 (PDT)
Received: from localhost.localdomain ([95.146.144.186])
	by smtp.gmail.com with ESMTPSA id
	59sm4990310wro.57.2018.03.15.05.56.25
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Thu, 15 Mar 2018 05:56:25 -0700 (PDT)
From: Tvrtko Ursulin <tursulin@ursulin.net>
X-Google-Original-From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: igt-dev@lists.freedesktop.org
Date: Thu, 15 Mar 2018 12:56:17 +0000
Message-Id: <20180315125617.12062-1-tvrtko.ursulin@linux.intel.com>
X-Mailer: git-send-email 2.14.1
Subject: [Intel-gfx] [PATCH i-g-t] tests/perf_pmu: Improve accuracy by
	waiting on spinner to start
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Intel graphics driver community testing & development
	<intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Cc: Intel-gfx@lists.freedesktop.org
MIME-Version: 1.0
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
X-Virus-Scanned: ClamAV using ClamSMTP

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

More than one test assumes that the spinner is running pretty much
immediately after we have created or submitted it.

In reality there is a variable delay, especially on execlists platforms,
between submission and the spin batch starting to run on the hardware.

To let tests which care about this level of timing account for it, we add
a new spin batch constructor which provides an output field that can be
polled to determine when the batch has actually started running.
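A minimal userspace sketch of the poll-for-start pattern follows. It is not IGT code: a POSIX interval timer and a SIGALRM handler stand in for the GPU, and the handler's store of 1 plays the role of the spin batch's first command. The names `running`, `batch_started()` and `submit_and_wait_for_start()` are hypothetical. The caller takes a timestamp before "submitting" and then busy-polls a volatile flag, so the returned value is the submission-to-start latency a test would otherwise have to guess.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Stand-in for *spin->running; in IGT the flag lives in a GPU-written page. */
static volatile sig_atomic_t running;

/* Plays the role of the spin batch's first command: "store 1". */
static void batch_started(int sig)
{
	(void)sig;
	running = 1;
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * "Submit" a batch which starts submit_delay_us later, then busy-poll the
 * flag. Returns the submission-to-start latency in nanoseconds, or 0 if
 * the timer standing in for the GPU could not be armed.
 */
static uint64_t submit_and_wait_for_start(long submit_delay_us)
{
	struct sigaction sa;
	struct itimerval it;
	uint64_t submitted;

	assert(submit_delay_us > 0);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = batch_started;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return 0;

	memset(&it, 0, sizeof(it));
	it.it_value.tv_sec = submit_delay_us / 1000000;
	it.it_value.tv_usec = submit_delay_us % 1000000;

	running = 0;
	submitted = now_ns();	/* timestamp taken before "submission" */
	if (setitimer(ITIMER_REAL, &it, NULL) != 0)
		return 0;

	/* volatile: the flag is reloaded on every iteration */
	while (!running)
		;

	return now_ns() - submitted;
}
```

For example, `submit_and_wait_for_start(2000)` returns at least 2,000,000 ns, because POSIX timers do not expire before the requested time. The volatile qualifier matters: without it the compiler may hoist the load out of the loop and spin forever.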

This is implemented with an MI_STORE_DWORD_IMM from the spin batch, writing
into a memory-mapped page shared with userspace.
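The per-generation encoding of that store can be sketched as a standalone helper. `emit_store_dword()` and `SKETCH_MI_STORE_DWORD_IMM` are hypothetical names. The opcode value assumes MI_STORE_DATA_IMM is opcode 0x20 with a length field of 2, which is assumed to match the kernel's `MI_INSTR(0x20, 2)`. The helper returns the number of dwords written and reports which dword the address relocation has to patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed encoding: opcode 0x20 in bits 28:23, length field 2 (four dwords
 * in total on gen4+).
 */
#define SKETCH_MI_STORE_DWORD_IMM ((0x20u << 23) | 2)

/*
 * Emit a store of `value` to a not-yet-relocated address. Returns the number
 * of dwords written; *reloc_dw receives the dword index of the address that
 * the relocation must patch.
 */
static unsigned int emit_store_dword(uint32_t *batch, int gen, uint32_t value,
				     unsigned int *reloc_dw)
{
	uint32_t *cs = batch;

	assert(gen >= 2);

	/* Bit 22 selects the global GTT on older generations. */
	*cs++ = SKETCH_MI_STORE_DWORD_IMM | (gen < 6 ? 1u << 22 : 0);
	if (gen >= 8) {
		*reloc_dw = 1;	/* 64-bit address: low dword, then high */
		*cs++ = 0;
		*cs++ = 0;
	} else if (gen >= 4) {
		*reloc_dw = 2;	/* dword 1 is reserved, address in dword 2 */
		*cs++ = 0;
		*cs++ = 0;
	} else {
		*reloc_dw = 1;	/* no reserved dword: one dword shorter */
		cs[-1]--;	/* so decrement the length field */
		*cs++ = 0;
	}
	*cs++ = value;

	return (unsigned int)(cs - batch);
}
```

On gen8+ the address is 64 bits wide (dwords 1 and 2). On gen4 to gen7 dword 1 is reserved and the address is in dword 2. Before gen4 the command is one dword shorter, hence the decremented length field.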

Using this facility from perf_pmu, where applicable, should reduce the very
occasional test failures seen across the test set and platforms.
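The accuracy test's re-calibration arithmetic can be shown in isolation. `next_idle_us()` is a hypothetical helper. It returns how long to idle next so that the accumulated busy / (busy + idle) ratio converges on the target percentage, with the submission-to-start delay already counted as idle time. The test itself stores this result in an `unsigned int`. The sketch instead clamps a negative result to zero, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Next idle period in microseconds, given busy and idle time accumulated so
 * far (idle includes the time spent waiting for the batch to start).
 */
static uint64_t next_idle_us(uint64_t total_busy_ns, uint64_t total_idle_ns,
			     unsigned int target_busy_pct)
{
	uint64_t period_ns, elapsed_ns;

	assert(target_busy_pct > 0 && target_busy_pct <= 100);

	/* Total period in which the busy time is exactly target_busy_pct. */
	period_ns = 100 * total_busy_ns / target_busy_pct;
	elapsed_ns = total_busy_ns + total_idle_ns;

	/* Already past the ideal period: do not idle at all. */
	if (elapsed_ns >= period_ns)
		return 0;

	return (period_ns - elapsed_ns) / 1000;
}
```

For example, after 500 us busy and 100 us idle with a 50% target, the full period should be 1000 us, so the next idle slot is 400 us.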

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 lib/igt_dummyload.c |  99 +++++++++++++++++++++++++++++++----
 lib/igt_dummyload.h |   9 ++++
 tests/perf_pmu.c    | 145 +++++++++++++++++++++++++++++++++++-----------------
 3 files changed, 196 insertions(+), 57 deletions(-)

diff --git a/lib/igt_dummyload.c b/lib/igt_dummyload.c
index 4b20f23dfe26..0447d2f14d57 100644
--- a/lib/igt_dummyload.c
+++ b/lib/igt_dummyload.c
@@ -74,9 +74,12 @@ fill_reloc(struct drm_i915_gem_relocation_entry *reloc,
 	reloc->write_domain = write_domains;
 }
 
-static int emit_recursive_batch(igt_spin_t *spin,
-				int fd, uint32_t ctx, unsigned engine,
-				uint32_t dep, bool out_fence)
+#define OUT_FENCE	(1 << 0)
+#define POLL_RUN	(1 << 1)
+
+static int
+emit_recursive_batch(igt_spin_t *spin, int fd, uint32_t ctx, unsigned engine,
+		     uint32_t dep, unsigned int flags)
 {
 #define SCRATCH 0
 #define BATCH 1
@@ -116,6 +119,8 @@ static int emit_recursive_batch(igt_spin_t *spin,
 	execbuf.buffer_count++;
 
 	if (dep) {
+		igt_assert(!(flags & POLL_RUN));
+
 		/* dummy write to dependency */
 		obj[SCRATCH].handle = dep;
 		fill_reloc(&relocs[obj[BATCH].relocation_count++],
@@ -123,6 +128,41 @@ static int emit_recursive_batch(igt_spin_t *spin,
 			   I915_GEM_DOMAIN_RENDER,
 			   I915_GEM_DOMAIN_RENDER);
 		execbuf.buffer_count++;
+	} else if (flags & POLL_RUN) {
+		unsigned int offset;
+
+		igt_assert(!dep);
+
+		spin->poll_handle = gem_create(fd, 4096);
+		spin->running = __gem_mmap__wc(fd, spin->poll_handle,
+					       0, 4096, PROT_READ | PROT_WRITE);
+		igt_assert(spin->running);
+		igt_assert_eq(*spin->running, 0);
+
+		*batch++ = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
+
+		if (gen >= 8) {
+			offset = 1;
+			*batch++ = 0;
+			*batch++ = 0;
+		} else if (gen >= 4) {
+			offset = 2;
+			*batch++ = 0;
+			*batch++ = 0;
+		} else {
+			offset = 1;
+			batch[-1]--;
+			*batch++ = 0;
+		}
+
+		*batch++ = 1;
+
+		obj[SCRATCH].handle = spin->poll_handle;
+		fill_reloc(&relocs[obj[BATCH].relocation_count++],
+			   spin->poll_handle, offset,
+			   I915_GEM_DOMAIN_INSTRUCTION,
+			   I915_GEM_DOMAIN_INSTRUCTION);
+		execbuf.buffer_count++;
 	}
 
 	spin->batch = batch;
@@ -170,14 +210,14 @@ static int emit_recursive_batch(igt_spin_t *spin,
 	execbuf.buffers_ptr = to_user_pointer(obj + (2 - execbuf.buffer_count));
 	execbuf.rsvd1 = ctx;
 
-	if (out_fence)
+	if (flags & OUT_FENCE)
 		execbuf.flags |= I915_EXEC_FENCE_OUT;
 
 	for (i = 0; i < nengine; i++) {
 		execbuf.flags &= ~ENGINE_MASK;
 		execbuf.flags |= engines[i];
 		gem_execbuf_wr(fd, &execbuf);
-		if (out_fence) {
+		if (flags & OUT_FENCE) {
 			int _fd = execbuf.rsvd2 >> 32;
 
 			igt_assert(_fd >= 0);
@@ -199,7 +239,7 @@ static int emit_recursive_batch(igt_spin_t *spin,
 
 static igt_spin_t *
 ___igt_spin_batch_new(int fd, uint32_t ctx, unsigned engine, uint32_t dep,
-		      int out_fence)
+		      unsigned int flags)
 {
 	igt_spin_t *spin;
 
@@ -207,7 +247,7 @@ ___igt_spin_batch_new(int fd, uint32_t ctx, unsigned engine, uint32_t dep,
 	igt_assert(spin);
 
 	spin->out_fence = emit_recursive_batch(spin, fd, ctx, engine, dep,
-					       out_fence);
+					       flags);
 
 	pthread_mutex_lock(&list_lock);
 	igt_list_add(&spin->link, &spin_list);
@@ -219,7 +259,7 @@ ___igt_spin_batch_new(int fd, uint32_t ctx, unsigned engine, uint32_t dep,
 igt_spin_t *
 __igt_spin_batch_new(int fd, uint32_t ctx, unsigned engine, uint32_t dep)
 {
-	return ___igt_spin_batch_new(fd, ctx, engine, dep, false);
+	return ___igt_spin_batch_new(fd, ctx, engine, dep, 0);
 }
 
 /**
@@ -253,7 +293,7 @@ igt_spin_batch_new(int fd, uint32_t ctx, unsigned engine, uint32_t dep)
 igt_spin_t *
 __igt_spin_batch_new_fence(int fd, uint32_t ctx, unsigned engine)
 {
-	return ___igt_spin_batch_new(fd, ctx, engine, 0, true);
+	return ___igt_spin_batch_new(fd, ctx, engine, 0, OUT_FENCE);
 }
 
 /**
@@ -286,6 +326,42 @@ igt_spin_batch_new_fence(int fd, uint32_t ctx, unsigned engine)
 	return spin;
 }
 
+igt_spin_t *
+__igt_spin_batch_new_poll(int fd, uint32_t ctx, unsigned engine)
+{
+	return ___igt_spin_batch_new(fd, ctx, engine, 0, POLL_RUN);
+}
+
+/**
+ * igt_spin_batch_new_poll:
+ * @fd: open i915 drm file descriptor
+ * @engine: Ring to execute batch OR'd with execbuf flags. If value is less
+ *          than 0, execute on all available rings.
+ *
+ * Start a recursive batch on a ring. Immediately returns a #igt_spin_t that
+ * contains the batch's handle that can be waited upon. The returned structure
+ * must be passed to igt_spin_batch_free() for post-processing.
+ *
+ * igt_spin_t->running contains a pointer whose target changes from zero
+ * to one once the spinner actually starts executing on the GPU.
+ *
+ * Returns:
+ * Structure with helper internal state for igt_spin_batch_free().
+ */
+igt_spin_t *
+igt_spin_batch_new_poll(int fd, uint32_t ctx, unsigned engine)
+{
+	igt_spin_t *spin;
+
+	igt_require_gem(fd);
+	igt_require(gem_mmap__has_wc(fd));
+
+	spin = __igt_spin_batch_new_poll(fd, ctx, engine);
+	igt_assert(gem_bo_busy(fd, spin->handle));
+
+	return spin;
+}
+
 static void notify(union sigval arg)
 {
 	igt_spin_t *spin = arg.sival_ptr;
@@ -367,6 +443,11 @@ void igt_spin_batch_free(int fd, igt_spin_t *spin)
 	igt_spin_batch_end(spin);
 	gem_munmap(spin->batch, BATCH_SIZE);
 
+	if (spin->running) {
+		gem_munmap(spin->running, 4096);
+		gem_close(fd, spin->poll_handle);
+	}
+
 	gem_close(fd, spin->handle);
 
 	if (spin->out_fence >= 0)
diff --git a/lib/igt_dummyload.h b/lib/igt_dummyload.h
index 4103e4ab9e36..7ed93a3884b9 100644
--- a/lib/igt_dummyload.h
+++ b/lib/igt_dummyload.h
@@ -36,6 +36,8 @@ typedef struct igt_spin {
 	struct igt_list link;
 	uint32_t *batch;
 	int out_fence;
+	uint32_t poll_handle;
+	bool *running;
 } igt_spin_t;
 
 igt_spin_t *__igt_spin_batch_new(int fd,
@@ -55,6 +57,13 @@ igt_spin_t *igt_spin_batch_new_fence(int fd,
 				     uint32_t ctx,
 				     unsigned engine);
 
+igt_spin_t *__igt_spin_batch_new_poll(int fd,
+				       uint32_t ctx,
+				       unsigned engine);
+igt_spin_t *igt_spin_batch_new_poll(int fd,
+				    uint32_t ctx,
+				    unsigned engine);
+
 void igt_spin_batch_set_timeout(igt_spin_t *spin, int64_t ns);
 void igt_spin_batch_end(igt_spin_t *spin);
 void igt_spin_batch_free(int fd, igt_spin_t *spin);
diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 19fcc95ffc7f..d1b7b23bc646 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -184,6 +184,38 @@ static void end_spin(int fd, igt_spin_t *spin, unsigned int flags)
 		usleep(batch_duration_ns / 5000);
 }
 
+static igt_spin_t *__spin_poll(int fd, uint32_t ctx, unsigned long flags)
+{
+	return __igt_spin_batch_new_poll(fd, ctx, flags);
+}
+
+static unsigned long __spin_wait(igt_spin_t *spin)
+{
+	struct timespec start = { };
+
+	igt_nsec_elapsed(&start);
+
+	while (!*(volatile bool *)spin->running);
+
+	return igt_nsec_elapsed(&start);
+}
+
+static igt_spin_t *__spin_sync(int fd, uint32_t ctx, unsigned long flags)
+{
+	igt_spin_t *spin = __spin_poll(fd, ctx, flags);
+
+	__spin_wait(spin);
+
+	return spin;
+}
+
+static igt_spin_t *spin_sync(int fd, uint32_t ctx, unsigned long flags)
+{
+	igt_require_gem(fd);
+
+	return __spin_sync(fd, ctx, flags);
+}
+
 static void
 single(int gem_fd, const struct intel_execution_engine2 *e, unsigned int flags)
 {
@@ -195,7 +227,7 @@ single(int gem_fd, const struct intel_execution_engine2 *e, unsigned int flags)
 	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
 
 	if (flags & TEST_BUSY)
-		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		spin = spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 	else
 		spin = NULL;
 
@@ -251,13 +283,7 @@ busy_start(int gem_fd, const struct intel_execution_engine2 *e)
 	 */
 	sleep(2);
 
-	spin = __igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
-
-	/*
-	 * Sleep for a bit after making the engine busy to make sure the PMU
-	 * gets enabled when the batch is already running.
-	 */
-	usleep(500e3);
+	spin = __spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 
 	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
 
@@ -300,7 +326,7 @@ busy_double_start(int gem_fd, const struct intel_execution_engine2 *e)
 	 * re-submission in execlists mode. Make sure busyness is correctly
 	 * reported with the engine busy, and after the engine went idle.
 	 */
-	spin[0] = __igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	spin[0] = __spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 	usleep(500e3);
 	spin[1] = __igt_spin_batch_new(gem_fd, ctx, e2ring(gem_fd, e), 0);
 
@@ -386,7 +412,7 @@ busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 
 	igt_assert_eq(i, num_engines);
 
-	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	spin = spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 	pmu_read_multi(fd[0], num_engines, tval[0]);
 	slept = measured_usleep(batch_duration_ns / 1000);
 	if (flags & TEST_TRAILING_IDLE)
@@ -413,15 +439,25 @@ busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 
 static void
 __submit_spin_batch(int gem_fd,
+		    igt_spin_t *spin,
 		    struct drm_i915_gem_exec_object2 *obj,
 		    const struct intel_execution_engine2 *e)
 {
 	struct drm_i915_gem_execbuffer2 eb = {
-		.buffer_count = 1,
 		.buffers_ptr = to_user_pointer(obj),
 		.flags = e2ring(gem_fd, e),
 	};
 
+	if (spin->running) {
+		obj[0].handle = spin->poll_handle;
+		obj[0].flags |= EXEC_OBJECT_ASYNC;
+		obj[1].handle = spin->handle;
+		eb.buffer_count = 2;
+	} else {
+		obj[0].handle = spin->handle;
+		eb.buffer_count = 1;
+	}
+
 	gem_execbuf(gem_fd, &eb);
 }
 
@@ -429,7 +465,7 @@ static void
 most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 		    const unsigned int num_engines, unsigned int flags)
 {
-	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_exec_object2 obj[2];
 	const struct intel_execution_engine2 *e_;
 	uint64_t tval[2][num_engines];
 	uint64_t val[num_engines];
@@ -438,20 +474,19 @@ most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 	igt_spin_t *spin = NULL;
 	unsigned int idle_idx, i;
 
+	memset(obj, 0, sizeof(obj));
+
 	i = 0;
 	for_each_engine_class_instance(fd, e_) {
 		if (!gem_has_engine(gem_fd, e_->class, e_->instance))
 			continue;
 
-		if (e == e_) {
+		if (e == e_)
 			idle_idx = i;
-		} else if (spin) {
-			__submit_spin_batch(gem_fd, &obj, e_);
-		} else {
-			spin = igt_spin_batch_new(gem_fd, 0,
-						  e2ring(gem_fd, e_), 0);
-			obj.handle = spin->handle;
-		}
+		else if (spin)
+			__submit_spin_batch(gem_fd, spin, obj, e_);
+		else
+			spin = __spin_poll(gem_fd, 0, e2ring(gem_fd, e_));
 
 		val[i++] = I915_PMU_ENGINE_BUSY(e_->class, e_->instance);
 	}
@@ -461,6 +496,9 @@ most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 	for (i = 0; i < num_engines; i++)
 		fd[i] = open_group(val[i], fd[0]);
 
+	/* Small delay to allow engines to start. */
+	usleep(__spin_wait(spin) * num_engines / 1e3);
+
 	pmu_read_multi(fd[0], num_engines, tval[0]);
 	slept = measured_usleep(batch_duration_ns / 1000);
 	if (flags & TEST_TRAILING_IDLE)
@@ -489,7 +527,7 @@ static void
 all_busy_check_all(int gem_fd, const unsigned int num_engines,
 		   unsigned int flags)
 {
-	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_exec_object2 obj[2];
 	const struct intel_execution_engine2 *e;
 	uint64_t tval[2][num_engines];
 	uint64_t val[num_engines];
@@ -498,18 +536,17 @@ all_busy_check_all(int gem_fd, const unsigned int num_engines,
 	igt_spin_t *spin = NULL;
 	unsigned int i;
 
+	memset(obj, 0, sizeof(obj));
+
 	i = 0;
 	for_each_engine_class_instance(fd, e) {
 		if (!gem_has_engine(gem_fd, e->class, e->instance))
 			continue;
 
-		if (spin) {
-			__submit_spin_batch(gem_fd, &obj, e);
-		} else {
-			spin = igt_spin_batch_new(gem_fd, 0,
-						  e2ring(gem_fd, e), 0);
-			obj.handle = spin->handle;
-		}
+		if (spin)
+			__submit_spin_batch(gem_fd, spin, obj, e);
+		else
+			spin = __spin_poll(gem_fd, 0, e2ring(gem_fd, e));
 
 		val[i++] = I915_PMU_ENGINE_BUSY(e->class, e->instance);
 	}
@@ -519,6 +556,9 @@ all_busy_check_all(int gem_fd, const unsigned int num_engines,
 	for (i = 0; i < num_engines; i++)
 		fd[i] = open_group(val[i], fd[0]);
 
+	/* Small delay to allow engines to start. */
+	usleep(__spin_wait(spin) * num_engines / 1e3);
+
 	pmu_read_multi(fd[0], num_engines, tval[0]);
 	slept = measured_usleep(batch_duration_ns / 1000);
 	if (flags & TEST_TRAILING_IDLE)
@@ -550,7 +590,7 @@ no_sema(int gem_fd, const struct intel_execution_engine2 *e, unsigned int flags)
 	open_group(I915_PMU_ENGINE_WAIT(e->class, e->instance), fd);
 
 	if (flags & TEST_BUSY)
-		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+		spin = spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 	else
 		spin = NULL;
 
@@ -884,7 +924,7 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
 	 */
 	fd[1] = open_pmu(config);
 
-	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	spin = spin_sync(gem_fd, 0, e2ring(gem_fd, e));
 
 	val[0] = val[1] = __pmu_read_single(fd[0], &ts[0]);
 	slept[1] = measured_usleep(batch_duration_ns / 1000);
@@ -1248,7 +1288,7 @@ test_frequency(int gem_fd)
 	igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == min_freq);
 
 	gem_quiescent_gpu(gem_fd); /* Idle to be sure the change takes effect */
-	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	spin = spin_sync(gem_fd, 0, I915_EXEC_RENDER);
 
 	slept = pmu_read_multi(fd, 2, start);
 	measured_usleep(batch_duration_ns / 1000);
@@ -1274,7 +1314,7 @@ test_frequency(int gem_fd)
 	igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == max_freq);
 
 	gem_quiescent_gpu(gem_fd);
-	spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
+	spin = spin_sync(gem_fd, 0, I915_EXEC_RENDER);
 
 	slept = pmu_read_multi(fd, 2, start);
 	measured_usleep(batch_duration_ns / 1000);
@@ -1455,6 +1495,8 @@ static void __rearm_spin_batch(igt_spin_t *spin)
 {
 	const uint32_t mi_arb_chk = 0x5 << 23;
 
+	if (spin->running)
+		*spin->running = 0;
        *spin->batch = mi_arb_chk;
        __sync_synchronize();
 }
@@ -1517,7 +1559,7 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 		const unsigned long timeout[] = {
 			pwm_calibration_us * 1000, test_us * 1000
 		};
-		struct drm_i915_gem_exec_object2 obj = {};
+		struct drm_i915_gem_exec_object2 obj[2];
 		uint64_t total_busy_ns = 0, total_idle_ns = 0;
 		igt_spin_t *spin;
 		int ret;
@@ -1530,12 +1572,13 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 			igt_warn("Failed to set scheduling policy!\n");
 
 		/* Allocate our spin batch and idle it. */
-		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
-		obj.handle = spin->handle;
-		__submit_spin_batch(gem_fd, &obj, e); /* record its location */
+		spin = __spin_poll(gem_fd, 0, e2ring(gem_fd, e));
+		memset(obj, 0, sizeof(obj));
+		__submit_spin_batch(gem_fd, spin, obj, e); /* record its location */
 		igt_spin_batch_end(spin);
-		gem_sync(gem_fd, obj.handle);
-		obj.flags |= EXEC_OBJECT_PINNED;
+		gem_sync(gem_fd, spin->handle);
+		obj[0].flags |= EXEC_OBJECT_PINNED;
+		obj[1].flags |= EXEC_OBJECT_PINNED;
 
 		/* 1st pass is calibration, second pass is the test. */
 		for (int pass = 0; pass < ARRAY_SIZE(timeout); pass++) {
@@ -1545,24 +1588,30 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 
 			igt_nsec_elapsed(&test_start);
 			do {
-				unsigned int target_idle_us, t_busy;
+				unsigned int target_idle_us;
+				struct timespec start = { };
+				unsigned long prep_delay_ns;
 
 				/* Restart the spinbatch. */
+				igt_nsec_elapsed(&start);
 				__rearm_spin_batch(spin);
-				__submit_spin_batch(gem_fd, &obj, e);
+				__submit_spin_batch(gem_fd, spin, obj, e);
 
-				/*
-				 * Note that the submission may be delayed to a
-				 * tasklet (ksoftirqd) which cannot run until we
-				 * sleep as we hog the cpu (we are RT).
-				 */
+				/* Wait for batch to start executing. */
+				__spin_wait(spin);
+				prep_delay_ns = igt_nsec_elapsed(&start);
 
-				t_busy = measured_usleep(busy_us);
+				/* PWM busy sleep. */
+				memset(&start, 0, sizeof(start));
+				igt_nsec_elapsed(&start);
+				measured_usleep(busy_us);
 				igt_spin_batch_end(spin);
-				gem_sync(gem_fd, obj.handle);
+				gem_sync(gem_fd, spin->handle);
 
-				total_busy_ns += t_busy;
+				total_busy_ns += igt_nsec_elapsed(&start);
+				total_idle_ns += prep_delay_ns;
 
+				/* Re-calibrate. */
 				target_idle_us =
 					(100 * total_busy_ns / target_busy_pct - (total_busy_ns + total_idle_ns)) / 1000;
 				total_idle_ns += measured_usleep(target_idle_us);