drm/i915: Always wakeup the next breadcrumb waiter

Message ID	20170302230211.4134-1-chris@chris-wilson.co.uk (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Chris Wilson <chris@chris-wilson.co.uk> To: intel-gfx@lists.freedesktop.org Date: Thu, 2 Mar 2017 23:02:11 +0000 Message-Id: <20170302230211.4134-1-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH] drm/i915: Always wakeup the next breadcrumb waiter Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Message ID

20170302230211.4134-1-chris@chris-wilson.co.uk (mailing list archive)

State

New, archived

Headers

From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Thu,  2 Mar 2017 23:02:11 +0000
Message-Id: <20170302230211.4134-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH] drm/i915: Always wakeup the next breadcrumb
	waiter
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Commit Message

Chris Wilson March 2, 2017, 11:02 p.m. UTC

Whenever we advance from one completed waiter to the next, give it a
kick so that it can check to see if its seqno completed during the
switch. We used to rely on faking an interrupt to determine when the
wake up was required, but now the irq should always be enabled and so no
longer receive the kick when starting a new waiter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 46 ++++++++++++--------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

Comments

Chris Wilson March 3, 2017, 7:49 a.m. UTC | #1

On Thu, Mar 02, 2017 at 11:02:11PM +0000, Chris Wilson wrote:
> Whenever we advance from one completed waiter to the next, give it a
> kick so that it can check to see if its seqno completed during the
> switch. We used to rely on faking an interrupt to determine when the
> wake up was required, but now the irq should always be enabled and so no
> longer receive the kick when starting a new waiter.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I can't decide if there's a bug here or not. When we install a new
waiter, we wakeup any that are complete, but the question is do we need
to wakeup the next.

Thread:					IRQ:
spin_lock_irq:
	consume expired
	first_waiter = next
	if (irq)
		wake_up(next);
					irq = true
					spin_lock:
						wake_up(first_waiter);

Will be fine.

					irq = true
					spin_lock:
						wake_up(first_waiter);
spin_lock_irq:
	consume expired
	first_waiter = next
	if (irq)
		wake_up(next);

Either the thread consumes the irq to perform the barrier before
completing, or it leaves it set and the next waiter is woken. Either
way, because the wake_up in the IRQ is now serliased by the spin_lock we
should never loose a wakeup.

In the failure case in CI, irq = false, implying that the wakeup had
been consumed before it slept. I think I need to look elsewhere for the
missing wakeup - or it is a legimate missed breadcrumb on that machine,
which looks unlikely as the error appears very specific and an ordinary
miss is likely to show up in many different tests.
-Chris

Chris Wilson March 3, 2017, 8:31 a.m. UTC | #2

On Fri, Mar 03, 2017 at 08:19:43AM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915: Always wakeup the next breadcrumb waiter (rev3)
> URL   : https://patchwork.freedesktop.org/series/20590/
> State : failure
> 
> == Summary ==
> 
> Series 20590v3 drm/i915: Always wakeup the next breadcrumb waiter
> https://patchwork.freedesktop.org/api/1.0/series/20590/revisions/3/mbox/
> 
> Test drv_module_reload:
>         Subgroup basic-reload:
>                 dmesg-warn -> PASS       (fi-ilk-650)
> Test gem_exec_flush:
>         Subgroup basic-batch-kernel-default-uc:
>                 fail       -> PASS       (fi-snb-2600) fdo#100007
> Test gem_exec_parallel:
>         Subgroup basic:
>                 pass       -> FAIL       (fi-bxt-t5700)
>                 pass       -> FAIL       (fi-skl-6260u)
>                 pass       -> FAIL       (fi-bxt-j4205)
>                 pass       -> FAIL       (fi-bdw-5557u)
>                 pass       -> FAIL       (fi-skl-6700hq)
>                 pass       -> FAIL       (fi-kbl-7500u)
> Test gem_sync:
>         Subgroup basic-store-all:
>                 pass       -> FAIL       (fi-bxt-t5700)
>                 pass       -> FAIL       (fi-bxt-j4205)
>                 pass       -> FAIL       (fi-kbl-7500u)
>                 pass       -> FAIL       (fi-skl-6260u)
>                 pass       -> FAIL       (fi-skl-6700hq)
>                 pass       -> FAIL       (fi-bdw-5557u)
>                 pass       -> FAIL       (fi-skl-6700k)
>                 pass       -> FAIL       (fi-skl-6770hq)
>                 pass       -> FAIL       (fi-bsw-n3050)

Clearly I missed some reason as to why that wakeup *is* required. We are
not as serialised by the spinlock as I thought?
-Chris

diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 235d4645a5cf..d48cbd66ae42 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -287,6 +287,19 @@  static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
 	wake_up_process(wait->tsk); /* implicit smp_wmb() */
 }
 
+static inline void __intel_breadcrumbs_next(struct intel_breadcrumbs *b,
+					    struct rb_node *next)
+{
+	/* As there is a delay between reading the current
+	 * seqno, processing the completed tasks and selecting
+	 * the next waiter, we may have missed the interrupt
+	 * and so need for the next bottom-half to wakeup.
+	 */
+	GEM_BUG_ON(!b->irq_armed);
+	b->first_wait = to_wait(next);
+	wake_up_process(to_wait(next)->tsk);
+}
+
 static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 				    struct intel_wait *wait)
 {
@@ -357,21 +370,7 @@  static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 		GEM_BUG_ON(!next && !first);
 		if (next && next != &wait->node) {
 			GEM_BUG_ON(first);
-			b->first_wait = to_wait(next);
-			/* As there is a delay between reading the current
-			 * seqno, processing the completed tasks and selecting
-			 * the next waiter, we may have missed the interrupt
-			 * and so need for the next bottom-half to wakeup.
-			 *
-			 * Also as we enable the IRQ, we may miss the
-			 * interrupt for that seqno, so we have to wake up
-			 * the next bottom-half in order to do a coherent check
-			 * in case the seqno passed.
-			 */
-			__intel_breadcrumbs_enable_irq(b);
-			if (test_bit(ENGINE_IRQ_BREADCRUMB,
-				     &engine->irq_posted))
-				wake_up_process(to_wait(next)->tsk);
+			__intel_breadcrumbs_next(b, next);
 		}
 
 		do {
@@ -473,21 +472,10 @@  static void __intel_engine_remove_wait(struct intel_engine_cs *engine,
 			}
 		}
 
-		if (next) {
-			/* In our haste, we may have completed the first waiter
-			 * before we enabled the interrupt. Do so now as we
-			 * have a second waiter for a future seqno. Afterwards,
-			 * we have to wake up that waiter in case we missed
-			 * the interrupt, or if we have to handle an
-			 * exception rather than a seqno completion.
-			 */
-			b->first_wait = to_wait(next);
-			if (b->first_wait->seqno != wait->seqno)
-				__intel_breadcrumbs_enable_irq(b);
-			wake_up_process(b->first_wait->tsk);
-		} else {
+		if (next)
+			__intel_breadcrumbs_next(b, next);
+		else
 			b->first_wait = NULL;
-		}
 	} else {
 		GEM_BUG_ON(rb_first(&b->waiters) == &wait->node);
 	}