Message ID | 20191205145934.663183-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915/gt: Save irqstate around virtual_context_destroy |
Quoting Chris Wilson (2019-12-05 14:59:34)
> As virtual_context_destroy() may be called from a request signal, it may
> be called from inside an irq-off section, and so we need to do a full
> save/restore of the irq state rather than blindly re-enable irqs upon
> unlocking.
>
> [lockdep splat snipped]
>
> Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")

Hmm, Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own
mutex") might be more accurate, but it's safe to backport the full irqsave.
-Chris
On 05/12/2019 14:59, Chris Wilson wrote:
> As virtual_context_destroy() may be called from a request signal, it may
> be called from inside an irq-off section, and so we need to do a full
> save/restore of the irq state rather than blindly re-enable irqs upon
> unlocking.
>
> [lockdep splat and patch snipped]
>
> Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index a74387664583..c7ea8a055005 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -4200,17 +4200,18 @@ static void virtual_context_destroy(struct kref *kref)
 	for (n = 0; n < ve->num_siblings; n++) {
 		struct intel_engine_cs *sibling = ve->siblings[n];
 		struct rb_node *node = &ve->nodes[sibling->id].rb;
+		unsigned long flags;

 		if (RB_EMPTY_NODE(node))
 			continue;

-		spin_lock_irq(&sibling->active.lock);
+		spin_lock_irqsave(&sibling->active.lock, flags);

 		/* Detachment is lazily performed in the execlists tasklet */
 		if (!RB_EMPTY_NODE(node))
 			rb_erase_cached(node, &sibling->execlists.virtual);

-		spin_unlock_irq(&sibling->active.lock);
+		spin_unlock_irqrestore(&sibling->active.lock, flags);
 	}
 	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
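For readers less familiar with the two locking flavours, here is a minimal,
self-contained sketch of the pattern the patch switches to. It is illustrative
only; example_lock and example_detach are made-up names, not part of the i915
code:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);

/* May be called from process context or from an irq-off section. */
static void example_detach(void)
{
	unsigned long flags;

	/*
	 * spin_lock_irq()/spin_unlock_irq() would unconditionally re-enable
	 * interrupts on unlock, breaking a caller that entered with
	 * interrupts already disabled. spin_lock_irqsave() records the
	 * caller's irq state in 'flags' and spin_unlock_irqrestore()
	 * restores exactly that state.
	 */
	spin_lock_irqsave(&example_lock, flags);
	/* ... critical section ... */
	spin_unlock_irqrestore(&example_lock, flags);
}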
As virtual_context_destroy() may be called from a request signal, it may
be called from inside an irq-off section, and so we need to do a full
save/restore of the irq state rather than blindly re-enable irqs upon
unlocking.

<4> [110.024262] WARNING: inconsistent lock state
<4> [110.024277] 5.4.0-rc8-CI-CI_DRM_7489+ #1 Tainted: G U
<4> [110.024292] --------------------------------
<4> [110.024305] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
<4> [110.024323] kworker/0:0/5 [HC0[0]:SC0[0]:HE1:SE1] takes:
<4> [110.024338] ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.024592] {IN-HARDIRQ-W} state was registered at:
<4> [110.024612]   lock_acquire+0xa7/0x1c0
<4> [110.024627]   _raw_spin_lock_irqsave+0x33/0x50
<4> [110.024788]   intel_engine_breadcrumbs_irq+0x38c/0x600 [i915]
<4> [110.024808]   irq_work_run_list+0x49/0x70
<4> [110.024824]   irq_work_run+0x26/0x50
<4> [110.024839]   smp_irq_work_interrupt+0x44/0x1e0
<4> [110.024855]   irq_work_interrupt+0xf/0x20
<4> [110.024871]   __do_softirq+0xb7/0x47f
<4> [110.024885]   irq_exit+0xba/0xc0
<4> [110.024898]   do_IRQ+0x83/0x160
<4> [110.024910]   ret_from_intr+0x0/0x1d
<4> [110.024922] irq event stamp: 172864
<4> [110.024938] hardirqs last enabled at (172863): [<ffffffff819ea214>] _raw_spin_unlock_irq+0x24/0x50
<4> [110.024963] hardirqs last disabled at (172864): [<ffffffff819e9fba>] _raw_spin_lock_irq+0xa/0x40
<4> [110.024988] softirqs last enabled at (172812): [<ffffffff81c00385>] __do_softirq+0x385/0x47f
<4> [110.025012] softirqs last disabled at (172797): [<ffffffff810b829a>] irq_exit+0xba/0xc0
<4> [110.025031]
other info that might help us debug this:
<4> [110.025049]  Possible unsafe locking scenario:

<4> [110.025065]        CPU0
<4> [110.025075]        ----
<4> [110.025084]   lock(&(&rq->lock)->rlock);
<4> [110.025099]   <Interrupt>
<4> [110.025109]     lock(&(&rq->lock)->rlock);
<4> [110.025124]
*** DEADLOCK ***

<4> [110.025144] 4 locks held by kworker/0:0/5:
<4> [110.025156]  #0: ffff88827588f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025187]  #1: ffffc9000006fe78 ((work_completion)(&engine->retire_work)){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025219]  #2: ffff88825605e270 (&kernel#2){+.+.}, at: engine_retire+0x57/0xe0 [i915]
<4> [110.025405]  #3: ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.025634]
stack backtrace:
<4> [110.025653] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G U 5.4.0-rc8-CI-CI_DRM_7489+ #1
<4> [110.025675] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [110.025856] Workqueue: events engine_retire [i915]
<4> [110.025872] Call Trace:
<4> [110.025891]  dump_stack+0x71/0x9b
<4> [110.025907]  mark_lock+0x49a/0x500
<4> [110.025926]  ? print_shortest_lock_dependencies+0x200/0x200
<4> [110.025946]  mark_held_locks+0x49/0x70
<4> [110.025962]  ? _raw_spin_unlock_irq+0x24/0x50
<4> [110.025978]  lockdep_hardirqs_on+0xa2/0x1c0
<4> [110.025995]  _raw_spin_unlock_irq+0x24/0x50
<4> [110.026171]  virtual_context_destroy+0xc5/0x2e0 [i915]
<4> [110.026376]  __active_retire+0xb4/0x290 [i915]
<4> [110.026396]  dma_fence_signal_locked+0x9e/0x1b0
<4> [110.026613]  i915_request_retire+0x451/0x930 [i915]
<4> [110.026766]  retire_requests+0x4d/0x60 [i915]
<4> [110.026919]  engine_retire+0x63/0xe0 [i915]

Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)