Message ID | 20180619104920.17528-2-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Quoting Chris Wilson (2018-06-19 13:49:16)
> Apply a different sort of stress by timing how long it takes to sync a
> second nop batch in the pipeline. We first start a spinner on the
> engine, then when we know the GPU is active, we submit the second nop;
> start timing as we then release the spinner and wait for the nop to
> complete.
>
> As with every other gem_sync test, it serves two roles. The first is
> that it checks that we do not miss a wakeup under common stressful
> conditions (the more conditions we check, the happier we will be that
> they do not occur in practice). And the second role it fulfils, is that
> it provides a very crude estimate for how long it takes for a nop to
> execute from a running start (we already have a complimentary estimate
> for an idle start).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

<SNIP>

> +static void
> +wakeup_ring(int fd, unsigned ring, int timeout)
> +{

<SNIP>

> +	intel_detect_and_clear_missed_interrupts(fd);
> +	igt_fork(child, num_engines) {
> +		const uint32_t bbe = MI_BATCH_BUFFER_END;
> +		struct drm_i915_gem_exec_object2 object;
> +		struct drm_i915_gem_execbuffer2 execbuf;
> +		double end, this, elapsed, now;
> +		unsigned long cycles;
> +		uint32_t cmd;
> +		igt_spin_t *spin;
> +
> +		memset(&object, 0, sizeof(object));
> +		object.handle = gem_create(fd, 4096);
> +		gem_write(fd, object.handle, 0, &bbe, sizeof(bbe));
> +
> +		memset(&execbuf, 0, sizeof(execbuf));
> +		execbuf.buffers_ptr = to_user_pointer(&object);
> +		execbuf.buffer_count = 1;
> +		execbuf.flags = engines[child % num_engines];
> +
> +		spin = __igt_spin_batch_new_poll(fd, 0, execbuf.flags);
> +		igt_assert(spin->running);
> +		cmd = *spin->batch;
> +
> +		gem_execbuf(fd, &execbuf);
> +
> +		igt_spin_batch_end(spin);
> +		gem_sync(fd, object.handle);
> +
> +		end = gettime() + timeout;
> +		elapsed = 0;
> +		cycles = 0;
> +		do {
> +			*spin->batch = cmd;
> +			*spin->running = 0;

igt_spin_batch_reset/resume/whatever...

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
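[Editor's note] Joonas's "igt_spin_batch_reset/resume/whatever" remark points at the manual restart sequence in the loop body (`*spin->batch = cmd; *spin->running = 0; ...`). A minimal sketch of what such a helper could look like follows; the name `igt_spin_batch_restart()` is hypothetical (as Chris notes in his reply below, this part of the API was never formalised), and it assumes the same `igt_spin_t` fields (`batch`, `running`, `execbuf`) and helpers (`gem_execbuf()`, `READ_ONCE()`, `igt_spin_batch_end()`) that the patch itself uses.

```c
/*
 * Hypothetical helper wrapping the manual spinner restart from
 * wakeup_ring(): restore the spinning command, rearm the poll flag,
 * resubmit the spinner and busy-wait until the GPU is inside it again.
 */
static void
igt_spin_batch_restart(int fd, igt_spin_t *spin, uint32_t cmd)
{
	*spin->batch = cmd;	/* put back the command saved before the previous igt_spin_batch_end() */
	*spin->running = 0;	/* clear the "GPU has reached the spinner" flag */

	gem_execbuf(fd, &spin->execbuf);

	while (!READ_ONCE(*spin->running))
		;		/* wait until the GPU reports it is spinning again */
}
```

With something along those lines, the restart preamble at the top of the loop body in wakeup_ring() would collapse into a single call.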
Quoting Joonas Lahtinen (2018-06-19 14:36:42)
> Quoting Chris Wilson (2018-06-19 13:49:16)
> > Apply a different sort of stress by timing how long it takes to sync a
> > second nop batch in the pipeline. We first start a spinner on the
> > engine, then when we know the GPU is active, we submit the second nop;
> > start timing as we then release the spinner and wait for the nop to
> > complete.
> >
> > As with every other gem_sync test, it serves two roles. The first is
> > that it checks that we do not miss a wakeup under common stressful
> > conditions (the more conditions we check, the happier we will be that
> > they do not occur in practice). And the second role it fulfils, is that
> > it provides a very crude estimate for how long it takes for a nop to
> > execute from a running start (we already have a complimentary estimate
> > for an idle start).
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>
> <SNIP>
>
> > +static void
> > +wakeup_ring(int fd, unsigned ring, int timeout)
> > +{
>
> <SNIP>
>
> > +	intel_detect_and_clear_missed_interrupts(fd);
> > +	igt_fork(child, num_engines) {
> > +		const uint32_t bbe = MI_BATCH_BUFFER_END;
> > +		struct drm_i915_gem_exec_object2 object;
> > +		struct drm_i915_gem_execbuffer2 execbuf;
> > +		double end, this, elapsed, now;
> > +		unsigned long cycles;
> > +		uint32_t cmd;
> > +		igt_spin_t *spin;
> > +
> > +		memset(&object, 0, sizeof(object));
> > +		object.handle = gem_create(fd, 4096);
> > +		gem_write(fd, object.handle, 0, &bbe, sizeof(bbe));
> > +
> > +		memset(&execbuf, 0, sizeof(execbuf));
> > +		execbuf.buffers_ptr = to_user_pointer(&object);
> > +		execbuf.buffer_count = 1;
> > +		execbuf.flags = engines[child % num_engines];
> > +
> > +		spin = __igt_spin_batch_new_poll(fd, 0, execbuf.flags);
> > +		igt_assert(spin->running);
> > +		cmd = *spin->batch;
> > +
> > +		gem_execbuf(fd, &execbuf);
> > +
> > +		igt_spin_batch_end(spin);
> > +		gem_sync(fd, object.handle);
> > +
> > +		end = gettime() + timeout;
> > +		elapsed = 0;
> > +		cycles = 0;
> > +		do {
> > +			*spin->batch = cmd;
> > +			*spin->running = 0;
>
> igt_spin_batch_reset/resume/whatever...

And here you see why Tvrtko and myself never formalised that part of the
API.

Anyway, I just found why this was underperforming. Those 250 MI_NOOPs we
insert to stop the GPU eating itself cost around 5us, which considering
the target here is say 1us, is quite huge.
-Chris
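[Editor's note] Chris's diagnosis is that the roughly 250 MI_NOOPs padding the tail of the spinner batch execute inside the timed window, adding about 5us to a measurement whose interesting part is around 1us. One rough way to see that fixed cost, sketched below under the same assumptions as the patch (and additionally assuming `igt_spin_t` exposes the spinner's buffer object as `spin->handle`, as elsewhere in igt_dummyload), is to time the spinner's own release-to-sync latency with no trailing nop queued; subtracting such a baseline from wakeup_ring()'s per-cycle figure would isolate the wakeup cost itself. This is a hypothetical helper, not part of the patch.

```c
/*
 * Rough baseline sketch: measure only the spinner's release-to-completion
 * latency.  Most of the MI_NOOP padding cost Chris mentions should show up
 * here, since the padding runs between igt_spin_batch_end() and the
 * spinner batch actually retiring.
 */
static double
spinner_tail_baseline(int fd, igt_spin_t *spin, uint32_t cmd, int loops)
{
	double elapsed = 0;

	for (int n = 0; n < loops; n++) {
		double start;

		/* rearm and resubmit the spinner, wait for the GPU to enter it */
		*spin->batch = cmd;
		*spin->running = 0;
		gem_execbuf(fd, &spin->execbuf);
		while (!READ_ONCE(*spin->running))
			;

		/* release it; the nop padding still executes before it retires */
		start = gettime();
		igt_spin_batch_end(spin);
		gem_sync(fd, spin->handle);
		elapsed += gettime() - start;
	}

	return elapsed / loops; /* seconds per release, dominated by the padding */
}
```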
```diff
diff --git a/tests/gem_sync.c b/tests/gem_sync.c
index 1e2e089a1..4cd97c58b 100644
--- a/tests/gem_sync.c
+++ b/tests/gem_sync.c
@@ -177,6 +177,92 @@ idle_ring(int fd, unsigned ring, int timeout)
 	gem_close(fd, object.handle);
 }
 
+static void
+wakeup_ring(int fd, unsigned ring, int timeout)
+{
+	unsigned engines[16];
+	const char *names[16];
+	int num_engines = 0;
+
+	if (ring == ALL_ENGINES) {
+		for_each_physical_engine(fd, ring) {
+			if (!gem_can_store_dword(fd, ring))
+				continue;
+
+			names[num_engines] = e__->name;
+			engines[num_engines++] = ring;
+			if (num_engines == ARRAY_SIZE(engines))
+				break;
+		}
+		igt_require(num_engines);
+	} else {
+		gem_require_ring(fd, ring);
+		igt_require(gem_can_store_dword(fd, ring));
+		names[num_engines] = NULL;
+		engines[num_engines++] = ring;
+	}
+
+	intel_detect_and_clear_missed_interrupts(fd);
+	igt_fork(child, num_engines) {
+		const uint32_t bbe = MI_BATCH_BUFFER_END;
+		struct drm_i915_gem_exec_object2 object;
+		struct drm_i915_gem_execbuffer2 execbuf;
+		double end, this, elapsed, now;
+		unsigned long cycles;
+		uint32_t cmd;
+		igt_spin_t *spin;
+
+		memset(&object, 0, sizeof(object));
+		object.handle = gem_create(fd, 4096);
+		gem_write(fd, object.handle, 0, &bbe, sizeof(bbe));
+
+		memset(&execbuf, 0, sizeof(execbuf));
+		execbuf.buffers_ptr = to_user_pointer(&object);
+		execbuf.buffer_count = 1;
+		execbuf.flags = engines[child % num_engines];
+
+		spin = __igt_spin_batch_new_poll(fd, 0, execbuf.flags);
+		igt_assert(spin->running);
+		cmd = *spin->batch;
+
+		gem_execbuf(fd, &execbuf);
+
+		igt_spin_batch_end(spin);
+		gem_sync(fd, object.handle);
+
+		end = gettime() + timeout;
+		elapsed = 0;
+		cycles = 0;
+		do {
+			*spin->batch = cmd;
+			*spin->running = 0;
+			gem_execbuf(fd, &spin->execbuf);
+			while (!READ_ONCE(*spin->running))
+				;
+
+			gem_execbuf(fd, &execbuf);
+
+			this = gettime();
+			igt_spin_batch_end(spin);
+			gem_sync(fd, object.handle);
+			now = gettime();
+
+			elapsed += now - this;
+			cycles++;
+		} while (now < end);
+
+		igt_info("%s%sompleted %ld cycles: %.3f us\n",
+			 names[child % num_engines] ?: "",
+			 names[child % num_engines] ? " c" : "C",
+			 cycles, elapsed*1e6/cycles);
+
+		igt_spin_batch_free(fd, spin);
+		gem_close(fd, object.handle);
+	}
+	igt_waitchildren_timeout(timeout+10, NULL);
+	igt_assert_eq(intel_detect_and_clear_missed_interrupts(fd), 0);
+}
+
 static void
 store_ring(int fd, unsigned ring, int num_children, int timeout)
 {
@@ -762,6 +848,8 @@ igt_main
 			sync_ring(fd, e->exec_id | e->flags, 1, 150);
 		igt_subtest_f("idle-%s", e->name)
 			idle_ring(fd, e->exec_id | e->flags, 150);
+		igt_subtest_f("wakeup-%s", e->name)
+			wakeup_ring(fd, e->exec_id | e->flags, 150);
 		igt_subtest_f("store-%s", e->name)
 			store_ring(fd, e->exec_id | e->flags, 1, 150);
 		igt_subtest_f("many-%s", e->name)
@@ -782,6 +870,8 @@ igt_main
 		sync_ring(fd, ALL_ENGINES, ncpus, 150);
 	igt_subtest("forked-store-each")
 		store_ring(fd, ALL_ENGINES, ncpus, 150);
+	igt_subtest("wakeup-each")
+		wakeup_ring(fd, ALL_ENGINES, 150);
 	igt_subtest("basic-all")
 		sync_all(fd, 1, 5);
```
Apply a different sort of stress by timing how long it takes to sync a
second nop batch in the pipeline. We first start a spinner on the engine,
then when we know the GPU is active, we submit the second nop; start
timing as we then release the spinner and wait for the nop to complete.

As with every other gem_sync test, it serves two roles. The first is that
it checks that we do not miss a wakeup under common stressful conditions
(the more conditions we check, the happier we will be that they do not
occur in practice). And the second role it fulfils is that it provides a
very crude estimate for how long it takes for a nop to execute from a
running start (we already have a complementary estimate for an idle
start).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 tests/gem_sync.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
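[Editor's note] The "start timing as we then release the spinner and wait for the nop to complete" step corresponds to a narrow window inside the loop added by the patch. The excerpt below is that loop body from the diff above with comments added to make the timed window explicit; nothing else is new.

```c
*spin->batch = cmd;                     /* rearm the spinner batch */
*spin->running = 0;
gem_execbuf(fd, &spin->execbuf);
while (!READ_ONCE(*spin->running))
	;                               /* GPU is now busy inside the spinner */

gem_execbuf(fd, &execbuf);              /* queue the nop behind the spinner */

this = gettime();                       /* timing starts as the spinner is released... */
igt_spin_batch_end(spin);
gem_sync(fd, object.handle);            /* ...and stops when the nop has completed */
now = gettime();

elapsed += now - this;                  /* per-cycle running-start nop latency */
cycles++;
```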