Message ID | 20180604073441.6737-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 04/06/2018 08:34, Chris Wilson wrote: > This looked to be a sensible precaution to flush an ongoing interrupt > before resetting the IRQ. However, it has a potential to trigger a lockup > and is not strictly required so drop it. > > [21411.603173] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1 > [21411.603174] Modules linked in: i915 intel_gtt drm_kms_helper > [21411.603181] CPU: 1 PID: 672 Comm: kworker/1:2 Tainted: G U 4.17.0-rc7+ #307 > [21411.603181] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016 > [21411.603227] Workqueue: events_long i915_hangcheck_elapsed [i915] > [21411.603234] RIP: 0010:__synchronize_hardirq+0x12/0x50 > [21411.603235] RSP: 0018:ffffc90000393c48 EFLAGS: 00000006 > [21411.603236] RAX: 0000000001440200 RBX: ffff880275b09200 RCX: 000000000000003d > [21411.603237] RDX: 0000000000000000 RSI: 000000000000007d RDI: ffff880275b09200 > [21411.603238] RBP: ffff880256e81440 R08: ffff88027647a910 R09: ffff880275b09200 > [21411.603239] R10: 0000000000000000 R11: 0000000000000040 R12: ffff880275b092a4 > [21411.603239] R13: 0000000000000286 R14: ffff880273a86000 R15: ffff88027004eac0 > [21411.603241] FS: 0000000000000000(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000 > [21411.603242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [21411.603243] CR2: 00007f9cc24d6018 CR3: 0000000002a09000 CR4: 00000000001406e0 > [21411.603243] Call Trace: > [21411.603250] synchronize_hardirq+0x1b/0x30 > [21411.603286] reset_irq+0x1f/0x180 [i915] > [21411.603323] execlists_reset+0x2f/0x1d0 [i915] > [21411.603351] i915_reset_engine+0xf0/0x140 [i915] > [21411.603381] i915_handle_error+0x11d/0x420 [i915] > [21411.603386] ? scnprintf+0x3a/0x70 > [21411.603421] i915_hangcheck_elapsed+0x310/0x490 [i915] > [21411.603425] process_one_work+0x195/0x310 > [21411.603427] worker_thread+0x26/0x3c0 > [21411.603429] ? wq_nice_store+0xd0/0xd0 > [21411.603431] kthread+0x107/0x120 > [21411.603433] ? _kthread_create_worker_on_cpu+0x40/0x40 > [21411.603435] ret_from_fork+0x1f/0x30 > [21411.603437] Code: 00 00 89 c5 e8 50 e1 41 00 48 8b 53 38 89 e8 81 22 ff ff fb ff 5b 5d c3 90 41 54 4c 8d a7 a4 00 00 00 55 53 48 89 fb eb 02 f3 90 <48> 8b 43 38 8b 00 a9 00 00 04 00 75 f1 4c 89 e7 e8 89 e0 41 00 > > Fixes: 46b3617dfec8 ("drm/i915: Actually flush interrupts on reset not just wedging") > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > Cc: Michel Thierry <michel.thierry@intel.com> > Cc: Michał Winiarski <michal.winiarski@intel.com> > --- > drivers/gpu/drm/i915/intel_lrc.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 517e92c6a70b..8d912d0c8fc1 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -869,7 +869,6 @@ static void reset_irq(struct intel_engine_cs *engine) > { > /* Mark all CS interrupts as complete */ > smp_store_mb(engine->execlists.active, 0); > - synchronize_hardirq(engine->i915->drm.irq); > > clear_gtiir(engine); > > Funny, I was just commenting on synchronize_hardirq in the direct submission series. In this one I don't immediately see what deadlocks. What's on the other side, I mean what is preventing interrupt handler from exiting? Regards, Tvrtko
Quoting Tvrtko Ursulin (2018-06-04 16:13:54) > > On 04/06/2018 08:34, Chris Wilson wrote: > > This looked to be a sensible precaution to flush an ongoing interrupt > > before resetting the IRQ. However, it has a potential to trigger a lockup > > and is not strictly required so drop it. > > > > [21411.603173] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1 > > [21411.603174] Modules linked in: i915 intel_gtt drm_kms_helper > > [21411.603181] CPU: 1 PID: 672 Comm: kworker/1:2 Tainted: G U 4.17.0-rc7+ #307 > > [21411.603181] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016 > > [21411.603227] Workqueue: events_long i915_hangcheck_elapsed [i915] > > [21411.603234] RIP: 0010:__synchronize_hardirq+0x12/0x50 > > [21411.603235] RSP: 0018:ffffc90000393c48 EFLAGS: 00000006 > > [21411.603236] RAX: 0000000001440200 RBX: ffff880275b09200 RCX: 000000000000003d > > [21411.603237] RDX: 0000000000000000 RSI: 000000000000007d RDI: ffff880275b09200 > > [21411.603238] RBP: ffff880256e81440 R08: ffff88027647a910 R09: ffff880275b09200 > > [21411.603239] R10: 0000000000000000 R11: 0000000000000040 R12: ffff880275b092a4 > > [21411.603239] R13: 0000000000000286 R14: ffff880273a86000 R15: ffff88027004eac0 > > [21411.603241] FS: 0000000000000000(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000 > > [21411.603242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [21411.603243] CR2: 00007f9cc24d6018 CR3: 0000000002a09000 CR4: 00000000001406e0 > > [21411.603243] Call Trace: > > [21411.603250] synchronize_hardirq+0x1b/0x30 > > [21411.603286] reset_irq+0x1f/0x180 [i915] > > [21411.603323] execlists_reset+0x2f/0x1d0 [i915] > > [21411.603351] i915_reset_engine+0xf0/0x140 [i915] > > [21411.603381] i915_handle_error+0x11d/0x420 [i915] > > [21411.603386] ? scnprintf+0x3a/0x70 > > [21411.603421] i915_hangcheck_elapsed+0x310/0x490 [i915] > > [21411.603425] process_one_work+0x195/0x310 > > [21411.603427] worker_thread+0x26/0x3c0 > > [21411.603429] ? wq_nice_store+0xd0/0xd0 > > [21411.603431] kthread+0x107/0x120 > > [21411.603433] ? _kthread_create_worker_on_cpu+0x40/0x40 > > [21411.603435] ret_from_fork+0x1f/0x30 > > [21411.603437] Code: 00 00 89 c5 e8 50 e1 41 00 48 8b 53 38 89 e8 81 22 ff ff fb ff 5b 5d c3 90 41 54 4c 8d a7 a4 00 00 00 55 53 48 89 fb eb 02 f3 90 <48> 8b 43 38 8b 00 a9 00 00 04 00 75 f1 4c 89 e7 e8 89 e0 41 00 > > > > Fixes: 46b3617dfec8 ("drm/i915: Actually flush interrupts on reset not just wedging") > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > Cc: Michel Thierry <michel.thierry@intel.com> > > Cc: Michał Winiarski <michal.winiarski@intel.com> > > --- > > drivers/gpu/drm/i915/intel_lrc.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > > index 517e92c6a70b..8d912d0c8fc1 100644 > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > @@ -869,7 +869,6 @@ static void reset_irq(struct intel_engine_cs *engine) > > { > > /* Mark all CS interrupts as complete */ > > smp_store_mb(engine->execlists.active, 0); > > - synchronize_hardirq(engine->i915->drm.irq); > > > > clear_gtiir(engine); > > > > > > Funny, I was just commenting on synchronize_hardirq in the direct > submission series. > > In this one I don't immediately see what deadlocks. What's on the other > side, I mean what is preventing interrupt handler from exiting? Hmm, right I probably just messed up over there. Knee jerk reaction in thinking it was a local CPU deadlock. -Chris
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 517e92c6a70b..8d912d0c8fc1 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -869,7 +869,6 @@ static void reset_irq(struct intel_engine_cs *engine) { /* Mark all CS interrupts as complete */ smp_store_mb(engine->execlists.active, 0); - synchronize_hardirq(engine->i915->drm.irq); clear_gtiir(engine);
This looked to be a sensible precaution to flush an ongoing interrupt before resetting the IRQ. However, it has a potential to trigger a lockup and is not strictly required so drop it. [21411.603173] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1 [21411.603174] Modules linked in: i915 intel_gtt drm_kms_helper [21411.603181] CPU: 1 PID: 672 Comm: kworker/1:2 Tainted: G U 4.17.0-rc7+ #307 [21411.603181] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016 [21411.603227] Workqueue: events_long i915_hangcheck_elapsed [i915] [21411.603234] RIP: 0010:__synchronize_hardirq+0x12/0x50 [21411.603235] RSP: 0018:ffffc90000393c48 EFLAGS: 00000006 [21411.603236] RAX: 0000000001440200 RBX: ffff880275b09200 RCX: 000000000000003d [21411.603237] RDX: 0000000000000000 RSI: 000000000000007d RDI: ffff880275b09200 [21411.603238] RBP: ffff880256e81440 R08: ffff88027647a910 R09: ffff880275b09200 [21411.603239] R10: 0000000000000000 R11: 0000000000000040 R12: ffff880275b092a4 [21411.603239] R13: 0000000000000286 R14: ffff880273a86000 R15: ffff88027004eac0 [21411.603241] FS: 0000000000000000(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000 [21411.603242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [21411.603243] CR2: 00007f9cc24d6018 CR3: 0000000002a09000 CR4: 00000000001406e0 [21411.603243] Call Trace: [21411.603250] synchronize_hardirq+0x1b/0x30 [21411.603286] reset_irq+0x1f/0x180 [i915] [21411.603323] execlists_reset+0x2f/0x1d0 [i915] [21411.603351] i915_reset_engine+0xf0/0x140 [i915] [21411.603381] i915_handle_error+0x11d/0x420 [i915] [21411.603386] ? scnprintf+0x3a/0x70 [21411.603421] i915_hangcheck_elapsed+0x310/0x490 [i915] [21411.603425] process_one_work+0x195/0x310 [21411.603427] worker_thread+0x26/0x3c0 [21411.603429] ? wq_nice_store+0xd0/0xd0 [21411.603431] kthread+0x107/0x120 [21411.603433] ? _kthread_create_worker_on_cpu+0x40/0x40 [21411.603435] ret_from_fork+0x1f/0x30 [21411.603437] Code: 00 00 89 c5 e8 50 e1 41 00 48 8b 53 38 89 e8 81 22 ff ff fb ff 5b 5d c3 90 41 54 4c 8d a7 a4 00 00 00 55 53 48 89 fb eb 02 f3 90 <48> 8b 43 38 8b 00 a9 00 00 04 00 75 f1 4c 89 e7 e8 89 e0 41 00 Fixes: 46b3617dfec8 ("drm/i915: Actually flush interrupts on reset not just wedging") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> --- drivers/gpu/drm/i915/intel_lrc.c | 1 - 1 file changed, 1 deletion(-)