Message ID | 20170302230211.4134-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Mar 02, 2017 at 11:02:11PM +0000, Chris Wilson wrote: > Whenever we advance from one completed waiter to the next, give it a > kick so that it can check to see if its seqno completed during the > switch. We used to rely on faking an interrupt to determine when the > wake up was required, but now the irq should always be enabled and so no > longer receive the kick when starting a new waiter. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> I can't decide if there's a bug here or not. When we install a new waiter, we wakeup any that are complete, but the question is do we need to wakeup the next. Thread: IRQ: spin_lock_irq: consume expired first_waiter = next if (irq) wake_up(next); irq = true spin_lock: wake_up(first_waiter); Will be fine. irq = true spin_lock: wake_up(first_waiter); spin_lock_irq: consume expired first_waiter = next if (irq) wake_up(next); Either the thread consumes the irq to perform the barrier before completing, or it leaves it set and the next waiter is woken. Either way, because the wake_up in the IRQ is now serliased by the spin_lock we should never loose a wakeup. In the failure case in CI, irq = false, implying that the wakeup had been consumed before it slept. I think I need to look elsewhere for the missing wakeup - or it is a legimate missed breadcrumb on that machine, which looks unlikely as the error appears very specific and an ordinary miss is likely to show up in many different tests. -Chris
On Fri, Mar 03, 2017 at 08:19:43AM -0000, Patchwork wrote: > == Series Details == > > Series: drm/i915: Always wakeup the next breadcrumb waiter (rev3) > URL : https://patchwork.freedesktop.org/series/20590/ > State : failure > > == Summary == > > Series 20590v3 drm/i915: Always wakeup the next breadcrumb waiter > https://patchwork.freedesktop.org/api/1.0/series/20590/revisions/3/mbox/ > > Test drv_module_reload: > Subgroup basic-reload: > dmesg-warn -> PASS (fi-ilk-650) > Test gem_exec_flush: > Subgroup basic-batch-kernel-default-uc: > fail -> PASS (fi-snb-2600) fdo#100007 > Test gem_exec_parallel: > Subgroup basic: > pass -> FAIL (fi-bxt-t5700) > pass -> FAIL (fi-skl-6260u) > pass -> FAIL (fi-bxt-j4205) > pass -> FAIL (fi-bdw-5557u) > pass -> FAIL (fi-skl-6700hq) > pass -> FAIL (fi-kbl-7500u) > Test gem_sync: > Subgroup basic-store-all: > pass -> FAIL (fi-bxt-t5700) > pass -> FAIL (fi-bxt-j4205) > pass -> FAIL (fi-kbl-7500u) > pass -> FAIL (fi-skl-6260u) > pass -> FAIL (fi-skl-6700hq) > pass -> FAIL (fi-bdw-5557u) > pass -> FAIL (fi-skl-6700k) > pass -> FAIL (fi-skl-6770hq) > pass -> FAIL (fi-bsw-n3050) Clearly I missed some reason as to why that wakeup *is* required. We are not as serialised by the spinlock as I thought? -Chris
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c index 235d4645a5cf..d48cbd66ae42 100644 --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c @@ -287,6 +287,19 @@ static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b, wake_up_process(wait->tsk); /* implicit smp_wmb() */ } +static inline void __intel_breadcrumbs_next(struct intel_breadcrumbs *b, + struct rb_node *next) +{ + /* As there is a delay between reading the current + * seqno, processing the completed tasks and selecting + * the next waiter, we may have missed the interrupt + * and so need for the next bottom-half to wakeup. + */ + GEM_BUG_ON(!b->irq_armed); + b->first_wait = to_wait(next); + wake_up_process(to_wait(next)->tsk); +} + static bool __intel_engine_add_wait(struct intel_engine_cs *engine, struct intel_wait *wait) { @@ -357,21 +370,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine, GEM_BUG_ON(!next && !first); if (next && next != &wait->node) { GEM_BUG_ON(first); - b->first_wait = to_wait(next); - /* As there is a delay between reading the current - * seqno, processing the completed tasks and selecting - * the next waiter, we may have missed the interrupt - * and so need for the next bottom-half to wakeup. - * - * Also as we enable the IRQ, we may miss the - * interrupt for that seqno, so we have to wake up - * the next bottom-half in order to do a coherent check - * in case the seqno passed. - */ - __intel_breadcrumbs_enable_irq(b); - if (test_bit(ENGINE_IRQ_BREADCRUMB, - &engine->irq_posted)) - wake_up_process(to_wait(next)->tsk); + __intel_breadcrumbs_next(b, next); } do { @@ -473,21 +472,10 @@ static void __intel_engine_remove_wait(struct intel_engine_cs *engine, } } - if (next) { - /* In our haste, we may have completed the first waiter - * before we enabled the interrupt. Do so now as we - * have a second waiter for a future seqno. Afterwards, - * we have to wake up that waiter in case we missed - * the interrupt, or if we have to handle an - * exception rather than a seqno completion. - */ - b->first_wait = to_wait(next); - if (b->first_wait->seqno != wait->seqno) - __intel_breadcrumbs_enable_irq(b); - wake_up_process(b->first_wait->tsk); - } else { + if (next) + __intel_breadcrumbs_next(b, next); + else b->first_wait = NULL; - } } else { GEM_BUG_ON(rb_first(&b->waiters) == &wait->node); }
Whenever we advance from one completed waiter to the next, give it a kick so that it can check to see if its seqno completed during the switch. We used to rely on faking an interrupt to determine when the wake up was required, but now the irq should always be enabled and so no longer receive the kick when starting a new waiter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/intel_breadcrumbs.c | 46 ++++++++++++-------------------- 1 file changed, 17 insertions(+), 29 deletions(-)