diff mbox

drm/i915: Always wakeup the next breadcrumb waiter

Message ID 20170302230211.4134-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson March 2, 2017, 11:02 p.m. UTC
Whenever we advance from one completed waiter to the next, give it a
kick so that it can check to see if its seqno completed during the
switch. We used to rely on faking an interrupt to determine when the
wake up was required, but now the irq should always be enabled and so no
longer receive the kick when starting a new waiter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 46 ++++++++++++--------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

Comments

Chris Wilson March 3, 2017, 7:49 a.m. UTC | #1
On Thu, Mar 02, 2017 at 11:02:11PM +0000, Chris Wilson wrote:
> Whenever we advance from one completed waiter to the next, give it a
> kick so that it can check to see if its seqno completed during the
> switch. We used to rely on faking an interrupt to determine when the
> wake up was required, but now the irq should always be enabled and so no
> longer receive the kick when starting a new waiter.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I can't decide if there's a bug here or not. When we install a new
waiter, we wakeup any that are complete, but the question is do we need
to wakeup the next.

Thread:					IRQ:
spin_lock_irq:
	consume expired
	first_waiter = next
	if (irq)
		wake_up(next);
					irq = true
					spin_lock:
						wake_up(first_waiter);

Will be fine.

					irq = true
					spin_lock:
						wake_up(first_waiter);
spin_lock_irq:
	consume expired
	first_waiter = next
	if (irq)
		wake_up(next);

Either the thread consumes the irq to perform the barrier before
completing, or it leaves it set and the next waiter is woken. Either
way, because the wake_up in the IRQ is now serliased by the spin_lock we
should never loose a wakeup.

In the failure case in CI, irq = false, implying that the wakeup had
been consumed before it slept. I think I need to look elsewhere for the
missing wakeup - or it is a legimate missed breadcrumb on that machine,
which looks unlikely as the error appears very specific and an ordinary
miss is likely to show up in many different tests.
-Chris
Chris Wilson March 3, 2017, 8:31 a.m. UTC | #2
On Fri, Mar 03, 2017 at 08:19:43AM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915: Always wakeup the next breadcrumb waiter (rev3)
> URL   : https://patchwork.freedesktop.org/series/20590/
> State : failure
> 
> == Summary ==
> 
> Series 20590v3 drm/i915: Always wakeup the next breadcrumb waiter
> https://patchwork.freedesktop.org/api/1.0/series/20590/revisions/3/mbox/
> 
> Test drv_module_reload:
>         Subgroup basic-reload:
>                 dmesg-warn -> PASS       (fi-ilk-650)
> Test gem_exec_flush:
>         Subgroup basic-batch-kernel-default-uc:
>                 fail       -> PASS       (fi-snb-2600) fdo#100007
> Test gem_exec_parallel:
>         Subgroup basic:
>                 pass       -> FAIL       (fi-bxt-t5700)
>                 pass       -> FAIL       (fi-skl-6260u)
>                 pass       -> FAIL       (fi-bxt-j4205)
>                 pass       -> FAIL       (fi-bdw-5557u)
>                 pass       -> FAIL       (fi-skl-6700hq)
>                 pass       -> FAIL       (fi-kbl-7500u)
> Test gem_sync:
>         Subgroup basic-store-all:
>                 pass       -> FAIL       (fi-bxt-t5700)
>                 pass       -> FAIL       (fi-bxt-j4205)
>                 pass       -> FAIL       (fi-kbl-7500u)
>                 pass       -> FAIL       (fi-skl-6260u)
>                 pass       -> FAIL       (fi-skl-6700hq)
>                 pass       -> FAIL       (fi-bdw-5557u)
>                 pass       -> FAIL       (fi-skl-6700k)
>                 pass       -> FAIL       (fi-skl-6770hq)
>                 pass       -> FAIL       (fi-bsw-n3050)

Clearly I missed some reason as to why that wakeup *is* required. We are
not as serialised by the spinlock as I thought?
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 235d4645a5cf..d48cbd66ae42 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -287,6 +287,19 @@  static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
 	wake_up_process(wait->tsk); /* implicit smp_wmb() */
 }
 
+static inline void __intel_breadcrumbs_next(struct intel_breadcrumbs *b,
+					    struct rb_node *next)
+{
+	/* As there is a delay between reading the current
+	 * seqno, processing the completed tasks and selecting
+	 * the next waiter, we may have missed the interrupt
+	 * and so need for the next bottom-half to wakeup.
+	 */
+	GEM_BUG_ON(!b->irq_armed);
+	b->first_wait = to_wait(next);
+	wake_up_process(to_wait(next)->tsk);
+}
+
 static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 				    struct intel_wait *wait)
 {
@@ -357,21 +370,7 @@  static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 		GEM_BUG_ON(!next && !first);
 		if (next && next != &wait->node) {
 			GEM_BUG_ON(first);
-			b->first_wait = to_wait(next);
-			/* As there is a delay between reading the current
-			 * seqno, processing the completed tasks and selecting
-			 * the next waiter, we may have missed the interrupt
-			 * and so need for the next bottom-half to wakeup.
-			 *
-			 * Also as we enable the IRQ, we may miss the
-			 * interrupt for that seqno, so we have to wake up
-			 * the next bottom-half in order to do a coherent check
-			 * in case the seqno passed.
-			 */
-			__intel_breadcrumbs_enable_irq(b);
-			if (test_bit(ENGINE_IRQ_BREADCRUMB,
-				     &engine->irq_posted))
-				wake_up_process(to_wait(next)->tsk);
+			__intel_breadcrumbs_next(b, next);
 		}
 
 		do {
@@ -473,21 +472,10 @@  static void __intel_engine_remove_wait(struct intel_engine_cs *engine,
 			}
 		}
 
-		if (next) {
-			/* In our haste, we may have completed the first waiter
-			 * before we enabled the interrupt. Do so now as we
-			 * have a second waiter for a future seqno. Afterwards,
-			 * we have to wake up that waiter in case we missed
-			 * the interrupt, or if we have to handle an
-			 * exception rather than a seqno completion.
-			 */
-			b->first_wait = to_wait(next);
-			if (b->first_wait->seqno != wait->seqno)
-				__intel_breadcrumbs_enable_irq(b);
-			wake_up_process(b->first_wait->tsk);
-		} else {
+		if (next)
+			__intel_breadcrumbs_next(b, next);
+		else
 			b->first_wait = NULL;
-		}
 	} else {
 		GEM_BUG_ON(rb_first(&b->waiters) == &wait->node);
 	}