diff mbox

[1/3] drm/i915/execlists: Remove synchronize_hardirq()

Message ID 20180604073441.6737-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson June 4, 2018, 7:34 a.m. UTC
This looked to be a sensible precaution to flush an ongoing interrupt
before resetting the IRQ. However, it has a potential to trigger a lockup
and is not strictly required so drop it.

[21411.603173] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
[21411.603174] Modules linked in: i915 intel_gtt drm_kms_helper
[21411.603181] CPU: 1 PID: 672 Comm: kworker/1:2 Tainted: G     U            4.17.0-rc7+ #307
[21411.603181] Hardware name:  /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016
[21411.603227] Workqueue: events_long i915_hangcheck_elapsed [i915]
[21411.603234] RIP: 0010:__synchronize_hardirq+0x12/0x50
[21411.603235] RSP: 0018:ffffc90000393c48 EFLAGS: 00000006
[21411.603236] RAX: 0000000001440200 RBX: ffff880275b09200 RCX: 000000000000003d
[21411.603237] RDX: 0000000000000000 RSI: 000000000000007d RDI: ffff880275b09200
[21411.603238] RBP: ffff880256e81440 R08: ffff88027647a910 R09: ffff880275b09200
[21411.603239] R10: 0000000000000000 R11: 0000000000000040 R12: ffff880275b092a4
[21411.603239] R13: 0000000000000286 R14: ffff880273a86000 R15: ffff88027004eac0
[21411.603241] FS:  0000000000000000(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000
[21411.603242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21411.603243] CR2: 00007f9cc24d6018 CR3: 0000000002a09000 CR4: 00000000001406e0
[21411.603243] Call Trace:
[21411.603250]  synchronize_hardirq+0x1b/0x30
[21411.603286]  reset_irq+0x1f/0x180 [i915]
[21411.603323]  execlists_reset+0x2f/0x1d0 [i915]
[21411.603351]  i915_reset_engine+0xf0/0x140 [i915]
[21411.603381]  i915_handle_error+0x11d/0x420 [i915]
[21411.603386]  ? scnprintf+0x3a/0x70
[21411.603421]  i915_hangcheck_elapsed+0x310/0x490 [i915]
[21411.603425]  process_one_work+0x195/0x310
[21411.603427]  worker_thread+0x26/0x3c0
[21411.603429]  ? wq_nice_store+0xd0/0xd0
[21411.603431]  kthread+0x107/0x120
[21411.603433]  ? _kthread_create_worker_on_cpu+0x40/0x40
[21411.603435]  ret_from_fork+0x1f/0x30
[21411.603437] Code: 00 00 89 c5 e8 50 e1 41 00 48 8b 53 38 89 e8 81 22 ff ff fb ff 5b 5d c3 90 41 54 4c 8d a7 a4 00 00 00 55 53 48 89 fb eb 02 f3 90 <48> 8b 43 38 8b 00 a9 00 00 04 00 75 f1 4c 89 e7 e8 89 e0 41 00

Fixes: 46b3617dfec8 ("drm/i915: Actually flush interrupts on reset not just wedging")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Tvrtko Ursulin June 4, 2018, 3:13 p.m. UTC | #1
On 04/06/2018 08:34, Chris Wilson wrote:
> This looked to be a sensible precaution to flush an ongoing interrupt
> before resetting the IRQ. However, it has a potential to trigger a lockup
> and is not strictly required so drop it.
> 
> [21411.603173] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> [21411.603174] Modules linked in: i915 intel_gtt drm_kms_helper
> [21411.603181] CPU: 1 PID: 672 Comm: kworker/1:2 Tainted: G     U            4.17.0-rc7+ #307
> [21411.603181] Hardware name:  /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016
> [21411.603227] Workqueue: events_long i915_hangcheck_elapsed [i915]
> [21411.603234] RIP: 0010:__synchronize_hardirq+0x12/0x50
> [21411.603235] RSP: 0018:ffffc90000393c48 EFLAGS: 00000006
> [21411.603236] RAX: 0000000001440200 RBX: ffff880275b09200 RCX: 000000000000003d
> [21411.603237] RDX: 0000000000000000 RSI: 000000000000007d RDI: ffff880275b09200
> [21411.603238] RBP: ffff880256e81440 R08: ffff88027647a910 R09: ffff880275b09200
> [21411.603239] R10: 0000000000000000 R11: 0000000000000040 R12: ffff880275b092a4
> [21411.603239] R13: 0000000000000286 R14: ffff880273a86000 R15: ffff88027004eac0
> [21411.603241] FS:  0000000000000000(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000
> [21411.603242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [21411.603243] CR2: 00007f9cc24d6018 CR3: 0000000002a09000 CR4: 00000000001406e0
> [21411.603243] Call Trace:
> [21411.603250]  synchronize_hardirq+0x1b/0x30
> [21411.603286]  reset_irq+0x1f/0x180 [i915]
> [21411.603323]  execlists_reset+0x2f/0x1d0 [i915]
> [21411.603351]  i915_reset_engine+0xf0/0x140 [i915]
> [21411.603381]  i915_handle_error+0x11d/0x420 [i915]
> [21411.603386]  ? scnprintf+0x3a/0x70
> [21411.603421]  i915_hangcheck_elapsed+0x310/0x490 [i915]
> [21411.603425]  process_one_work+0x195/0x310
> [21411.603427]  worker_thread+0x26/0x3c0
> [21411.603429]  ? wq_nice_store+0xd0/0xd0
> [21411.603431]  kthread+0x107/0x120
> [21411.603433]  ? _kthread_create_worker_on_cpu+0x40/0x40
> [21411.603435]  ret_from_fork+0x1f/0x30
> [21411.603437] Code: 00 00 89 c5 e8 50 e1 41 00 48 8b 53 38 89 e8 81 22 ff ff fb ff 5b 5d c3 90 41 54 4c 8d a7 a4 00 00 00 55 53 48 89 fb eb 02 f3 90 <48> 8b 43 38 8b 00 a9 00 00 04 00 75 f1 4c 89 e7 e8 89 e0 41 00
> 
> Fixes: 46b3617dfec8 ("drm/i915: Actually flush interrupts on reset not just wedging")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 517e92c6a70b..8d912d0c8fc1 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -869,7 +869,6 @@ static void reset_irq(struct intel_engine_cs *engine)
>   {
>   	/* Mark all CS interrupts as complete */
>   	smp_store_mb(engine->execlists.active, 0);
> -	synchronize_hardirq(engine->i915->drm.irq);
>   
>   	clear_gtiir(engine);
>   
> 

Funny, I was just commenting on synchronize_hardirq in the direct 
submission series.

In this one I don't immediately see what deadlocks. What's on the other 
side, I mean what is preventing interrupt handler from exiting?

Regards,

Tvrtko
Chris Wilson June 4, 2018, 3:21 p.m. UTC | #2
Quoting Tvrtko Ursulin (2018-06-04 16:13:54)
> 
> On 04/06/2018 08:34, Chris Wilson wrote:
> > This looked to be a sensible precaution to flush an ongoing interrupt
> > before resetting the IRQ. However, it has a potential to trigger a lockup
> > and is not strictly required so drop it.
> > 
> > [21411.603173] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> > [21411.603174] Modules linked in: i915 intel_gtt drm_kms_helper
> > [21411.603181] CPU: 1 PID: 672 Comm: kworker/1:2 Tainted: G     U            4.17.0-rc7+ #307
> > [21411.603181] Hardware name:  /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016
> > [21411.603227] Workqueue: events_long i915_hangcheck_elapsed [i915]
> > [21411.603234] RIP: 0010:__synchronize_hardirq+0x12/0x50
> > [21411.603235] RSP: 0018:ffffc90000393c48 EFLAGS: 00000006
> > [21411.603236] RAX: 0000000001440200 RBX: ffff880275b09200 RCX: 000000000000003d
> > [21411.603237] RDX: 0000000000000000 RSI: 000000000000007d RDI: ffff880275b09200
> > [21411.603238] RBP: ffff880256e81440 R08: ffff88027647a910 R09: ffff880275b09200
> > [21411.603239] R10: 0000000000000000 R11: 0000000000000040 R12: ffff880275b092a4
> > [21411.603239] R13: 0000000000000286 R14: ffff880273a86000 R15: ffff88027004eac0
> > [21411.603241] FS:  0000000000000000(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000
> > [21411.603242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [21411.603243] CR2: 00007f9cc24d6018 CR3: 0000000002a09000 CR4: 00000000001406e0
> > [21411.603243] Call Trace:
> > [21411.603250]  synchronize_hardirq+0x1b/0x30
> > [21411.603286]  reset_irq+0x1f/0x180 [i915]
> > [21411.603323]  execlists_reset+0x2f/0x1d0 [i915]
> > [21411.603351]  i915_reset_engine+0xf0/0x140 [i915]
> > [21411.603381]  i915_handle_error+0x11d/0x420 [i915]
> > [21411.603386]  ? scnprintf+0x3a/0x70
> > [21411.603421]  i915_hangcheck_elapsed+0x310/0x490 [i915]
> > [21411.603425]  process_one_work+0x195/0x310
> > [21411.603427]  worker_thread+0x26/0x3c0
> > [21411.603429]  ? wq_nice_store+0xd0/0xd0
> > [21411.603431]  kthread+0x107/0x120
> > [21411.603433]  ? _kthread_create_worker_on_cpu+0x40/0x40
> > [21411.603435]  ret_from_fork+0x1f/0x30
> > [21411.603437] Code: 00 00 89 c5 e8 50 e1 41 00 48 8b 53 38 89 e8 81 22 ff ff fb ff 5b 5d c3 90 41 54 4c 8d a7 a4 00 00 00 55 53 48 89 fb eb 02 f3 90 <48> 8b 43 38 8b 00 a9 00 00 04 00 75 f1 4c 89 e7 e8 89 e0 41 00
> > 
> > Fixes: 46b3617dfec8 ("drm/i915: Actually flush interrupts on reset not just wedging")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Michel Thierry <michel.thierry@intel.com>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 1 -
> >   1 file changed, 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 517e92c6a70b..8d912d0c8fc1 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -869,7 +869,6 @@ static void reset_irq(struct intel_engine_cs *engine)
> >   {
> >       /* Mark all CS interrupts as complete */
> >       smp_store_mb(engine->execlists.active, 0);
> > -     synchronize_hardirq(engine->i915->drm.irq);
> >   
> >       clear_gtiir(engine);
> >   
> > 
> 
> Funny, I was just commenting on synchronize_hardirq in the direct 
> submission series.
> 
> In this one I don't immediately see what deadlocks. What's on the other 
> side, I mean what is preventing interrupt handler from exiting?

Hmm, right I probably just messed up over there. Knee jerk reaction in
thinking it was a local CPU deadlock.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 517e92c6a70b..8d912d0c8fc1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -869,7 +869,6 @@  static void reset_irq(struct intel_engine_cs *engine)
 {
 	/* Mark all CS interrupts as complete */
 	smp_store_mb(engine->execlists.active, 0);
-	synchronize_hardirq(engine->i915->drm.irq);
 
 	clear_gtiir(engine);