drm/i915/gt: Save irqstate around virtual_context_destroy

Message ID 20191205145934.663183-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Series drm/i915/gt: Save irqstate around virtual_context_destroy

Commit Message

Chris Wilson Dec. 5, 2019, 2:59 p.m. UTC
As virtual_context_destroy() may be called from a request signal, it may
be called from inside an irq-off section, and so we need to do a full
save/restore of the irq state rather than blindly re-enable irqs upon
unlocking.
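
For reference, a minimal standalone sketch of the two locking patterns involved (the lock name example_lock and the two helper functions are hypothetical, not i915 code): spin_unlock_irq() unconditionally re-enables interrupts, whereas the irqsave/irqrestore pair records and restores whatever irq state the caller had.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);

/*
 * Unsafe if the caller may already have interrupts disabled: the unlock
 * unconditionally re-enables irqs, clobbering the caller's irq state.
 */
static void broken_pattern(void)
{
        spin_lock_irq(&example_lock);
        /* ... critical section ... */
        spin_unlock_irq(&example_lock);
}

/*
 * Safe from any context: the irq state at lock time is recorded in
 * 'flags' and restored verbatim on unlock.
 */
static void safe_pattern(void)
{
        unsigned long flags;

        spin_lock_irqsave(&example_lock, flags);
        /* ... critical section ... */
        spin_unlock_irqrestore(&example_lock, flags);
}

The patch makes exactly this substitution for sibling->active.lock.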

<4> [110.024262] WARNING: inconsistent lock state
<4> [110.024277] 5.4.0-rc8-CI-CI_DRM_7489+ #1 Tainted: G     U
<4> [110.024292] --------------------------------
<4> [110.024305] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
<4> [110.024323] kworker/0:0/5 [HC0[0]:SC0[0]:HE1:SE1] takes:
<4> [110.024338] ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.024592] {IN-HARDIRQ-W} state was registered at:
<4> [110.024612]   lock_acquire+0xa7/0x1c0
<4> [110.024627]   _raw_spin_lock_irqsave+0x33/0x50
<4> [110.024788]   intel_engine_breadcrumbs_irq+0x38c/0x600 [i915]
<4> [110.024808]   irq_work_run_list+0x49/0x70
<4> [110.024824]   irq_work_run+0x26/0x50
<4> [110.024839]   smp_irq_work_interrupt+0x44/0x1e0
<4> [110.024855]   irq_work_interrupt+0xf/0x20
<4> [110.024871]   __do_softirq+0xb7/0x47f
<4> [110.024885]   irq_exit+0xba/0xc0
<4> [110.024898]   do_IRQ+0x83/0x160
<4> [110.024910]   ret_from_intr+0x0/0x1d
<4> [110.024922] irq event stamp: 172864
<4> [110.024938] hardirqs last  enabled at (172863): [<ffffffff819ea214>] _raw_spin_unlock_irq+0x24/0x50
<4> [110.024963] hardirqs last disabled at (172864): [<ffffffff819e9fba>] _raw_spin_lock_irq+0xa/0x40
<4> [110.024988] softirqs last  enabled at (172812): [<ffffffff81c00385>] __do_softirq+0x385/0x47f
<4> [110.025012] softirqs last disabled at (172797): [<ffffffff810b829a>] irq_exit+0xba/0xc0
<4> [110.025031]
other info that might help us debug this:
<4> [110.025049]  Possible unsafe locking scenario:

<4> [110.025065]        CPU0
<4> [110.025075]        ----
<4> [110.025084]   lock(&(&rq->lock)->rlock);
<4> [110.025099]   <Interrupt>
<4> [110.025109]     lock(&(&rq->lock)->rlock);
<4> [110.025124]
 *** DEADLOCK ***

<4> [110.025144] 4 locks held by kworker/0:0/5:
<4> [110.025156]  #0: ffff88827588f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025187]  #1: ffffc9000006fe78 ((work_completion)(&engine->retire_work)){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025219]  #2: ffff88825605e270 (&kernel#2){+.+.}, at: engine_retire+0x57/0xe0 [i915]
<4> [110.025405]  #3: ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.025634]
stack backtrace:
<4> [110.025653] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G     U            5.4.0-rc8-CI-CI_DRM_7489+ #1
<4> [110.025675] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [110.025856] Workqueue: events engine_retire [i915]
<4> [110.025872] Call Trace:
<4> [110.025891]  dump_stack+0x71/0x9b
<4> [110.025907]  mark_lock+0x49a/0x500
<4> [110.025926]  ? print_shortest_lock_dependencies+0x200/0x200
<4> [110.025946]  mark_held_locks+0x49/0x70
<4> [110.025962]  ? _raw_spin_unlock_irq+0x24/0x50
<4> [110.025978]  lockdep_hardirqs_on+0xa2/0x1c0
<4> [110.025995]  _raw_spin_unlock_irq+0x24/0x50
<4> [110.026171]  virtual_context_destroy+0xc5/0x2e0 [i915]
<4> [110.026376]  __active_retire+0xb4/0x290 [i915]
<4> [110.026396]  dma_fence_signal_locked+0x9e/0x1b0
<4> [110.026613]  i915_request_retire+0x451/0x930 [i915]
<4> [110.026766]  retire_requests+0x4d/0x60 [i915]
<4> [110.026919]  engine_retire+0x63/0xe0 [i915]

Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Chris Wilson Dec. 5, 2019, 3:02 p.m. UTC | #1
Quoting Chris Wilson (2019-12-05 14:59:34)
> As virtual_context_destroy() may be called from a request signal, it may
> be called from inside an irq-off section, and so we need to do a full
> save/restore of the irq state rather than blindly re-enable irqs upon
> unlocking.
> 
> [snip lockdep splat]
> 
> Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")

Hmm,
Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex")
might be more accurate, but it's safe to backport the full irqsave.
-Chris
Tvrtko Ursulin Dec. 5, 2019, 5:12 p.m. UTC | #2
On 05/12/2019 14:59, Chris Wilson wrote:
> As virtual_context_destroy() may be called from a request signal, it may
> be called from inside an irq-off section, and so we need to do a full
> save/restore of the irq state rather than blindly re-enable irqs upon
> unlocking.
> 
> [snip lockdep splat]
> 
> Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index a74387664583..c7ea8a055005 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -4200,17 +4200,18 @@ static void virtual_context_destroy(struct kref *kref)
>   	for (n = 0; n < ve->num_siblings; n++) {
>   		struct intel_engine_cs *sibling = ve->siblings[n];
>   		struct rb_node *node = &ve->nodes[sibling->id].rb;
> +		unsigned long flags;
>   
>   		if (RB_EMPTY_NODE(node))
>   			continue;
>   
> -		spin_lock_irq(&sibling->active.lock);
> +		spin_lock_irqsave(&sibling->active.lock, flags);
>   
>   		/* Detachment is lazily performed in the execlists tasklet */
>   		if (!RB_EMPTY_NODE(node))
>   			rb_erase_cached(node, &sibling->execlists.virtual);
>   
> -		spin_unlock_irq(&sibling->active.lock);
> +		spin_unlock_irqrestore(&sibling->active.lock, flags);
>   	}
>   	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index a74387664583..c7ea8a055005 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -4200,17 +4200,18 @@ static void virtual_context_destroy(struct kref *kref)
 	for (n = 0; n < ve->num_siblings; n++) {
 		struct intel_engine_cs *sibling = ve->siblings[n];
 		struct rb_node *node = &ve->nodes[sibling->id].rb;
+		unsigned long flags;
 
 		if (RB_EMPTY_NODE(node))
 			continue;
 
-		spin_lock_irq(&sibling->active.lock);
+		spin_lock_irqsave(&sibling->active.lock, flags);
 
 		/* Detachment is lazily performed in the execlists tasklet */
 		if (!RB_EMPTY_NODE(node))
 			rb_erase_cached(node, &sibling->execlists.virtual);
 
-		spin_unlock_irq(&sibling->active.lock);
+		spin_unlock_irqrestore(&sibling->active.lock, flags);
 	}
 	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));