Message ID | 20180508153514.20251-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 05/08/2018 08:35 AM, Chris Wilson wrote: > CI noticed > > <4>[ 23.430701] ============================================ > <4>[ 23.430706] WARNING: possible recursive locking detected > <4>[ 23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted > <4>[ 23.430720] -------------------------------------------- > <4>[ 23.430725] systemd-udevd/169 is trying to acquire lock: > <4>[ 23.430732] (ptrval) (&(&timeline->lock)->rlock){....}, at: move_to_timeline+0x48/0x12c [i915] > <4>[ 23.430888] > but task is already holding lock: > <4>[ 23.430894] (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] > <4>[ 23.430995] > other info that might help us debug this: > <4>[ 23.431002] Possible unsafe locking scenario: > > <4>[ 23.431007] CPU0 > <4>[ 23.431010] ---- > <4>[ 23.431013] lock(&(&timeline->lock)->rlock); > <4>[ 23.431021] lock(&(&timeline->lock)->rlock); > <4>[ 23.431028] > *** DEADLOCK *** > > <4>[ 23.431036] May be due to missing lock nesting notation > > <4>[ 23.431044] 5 locks held by systemd-udevd/169: > <4>[ 23.431049] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x42/0xe0 > <4>[ 23.431065] #1: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x50/0xe0 > <4>[ 23.431078] #2: (ptrval) (&dev->struct_mutex){+.+.}, at: i915_gem_init+0xca/0x630 [i915] > <4>[ 23.431174] #3: (ptrval) (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915] > <4>[ 23.431271] #4: (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] > <4>[ 23.431369] > stack backtrace: > <4>[ 23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted 4.17.0-rc4-CI-CI_DRM_4156+ #1 > <4>[ 23.431385] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005 > <4>[ 23.431394] Call Trace: > <4>[ 23.431403] dump_stack+0x67/0x9b ... > <4>[ 23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15: 0000561a472a4b30 > > but did not report it as an issue as it only occurred during the first > module on boot. This is due to the removal of the distinct global > timeline, and its separate lock class. So instead mark up the expected > nesting. An alternative would be to define a separate lock class for the > engine, but since we only expect to have a single point of nesting, we > can avoid having multiple lock classes for the struct. > > Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Tested-by: Michel Thierry <michel.thierry@intel.com> > --- > drivers/gpu/drm/i915/i915_request.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c > index f336942229cf..8928894dd9c7 100644 > --- a/drivers/gpu/drm/i915/i915_request.c > +++ b/drivers/gpu/drm/i915/i915_request.c > @@ -502,7 +502,7 @@ static void move_to_timeline(struct i915_request *request, > GEM_BUG_ON(request->timeline == &request->engine->timeline); > lockdep_assert_held(&request->engine->timeline.lock); > > - spin_lock(&request->timeline->lock); > + spin_lock_nested(&request->timeline->lock, SINGLE_DEPTH_NESTING); > list_move_tail(&request->link, &timeline->requests); > spin_unlock(&request->timeline->lock); > } >
On 08/05/2018 16:35, Chris Wilson wrote: > CI noticed > > <4>[ 23.430701] ============================================ > <4>[ 23.430706] WARNING: possible recursive locking detected > <4>[ 23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted > <4>[ 23.430720] -------------------------------------------- > <4>[ 23.430725] systemd-udevd/169 is trying to acquire lock: > <4>[ 23.430732] (ptrval) (&(&timeline->lock)->rlock){....}, at: move_to_timeline+0x48/0x12c [i915] > <4>[ 23.430888] > but task is already holding lock: > <4>[ 23.430894] (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] > <4>[ 23.430995] > other info that might help us debug this: > <4>[ 23.431002] Possible unsafe locking scenario: > > <4>[ 23.431007] CPU0 > <4>[ 23.431010] ---- > <4>[ 23.431013] lock(&(&timeline->lock)->rlock); > <4>[ 23.431021] lock(&(&timeline->lock)->rlock); > <4>[ 23.431028] > *** DEADLOCK *** > > <4>[ 23.431036] May be due to missing lock nesting notation > > <4>[ 23.431044] 5 locks held by systemd-udevd/169: > <4>[ 23.431049] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x42/0xe0 > <4>[ 23.431065] #1: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x50/0xe0 > <4>[ 23.431078] #2: (ptrval) (&dev->struct_mutex){+.+.}, at: i915_gem_init+0xca/0x630 [i915] > <4>[ 23.431174] #3: (ptrval) (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915] > <4>[ 23.431271] #4: (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] > <4>[ 23.431369] > stack backtrace: > <4>[ 23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted 4.17.0-rc4-CI-CI_DRM_4156+ #1 > <4>[ 23.431385] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005 > <4>[ 23.431394] Call Trace: > <4>[ 23.431403] dump_stack+0x67/0x9b > <4>[ 23.431411] __lock_acquire+0xc67/0x1b50 > <4>[ 23.431421] ? ring_buffer_lock_reserve+0x154/0x3f0 > <4>[ 23.431429] ? lock_acquire+0xa6/0x210 > <4>[ 23.431435] lock_acquire+0xa6/0x210 > <4>[ 23.431530] ? move_to_timeline+0x48/0x12c [i915] > <4>[ 23.431540] _raw_spin_lock+0x2a/0x40 > <4>[ 23.431634] ? move_to_timeline+0x48/0x12c [i915] > <4>[ 23.431730] move_to_timeline+0x48/0x12c [i915] > <4>[ 23.431826] __i915_request_submit+0xfa/0x280 [i915] > <4>[ 23.431923] i915_request_submit+0x25/0x40 [i915] > <4>[ 23.432024] i9xx_submit_request+0x11/0x140 [i915] > <4>[ 23.432120] submit_notify+0x8d/0x124 [i915] > <4>[ 23.432202] __i915_sw_fence_complete+0x81/0x250 [i915] > <4>[ 23.432300] __i915_request_add+0x31c/0x7c0 [i915] > <4>[ 23.432395] i915_gem_init+0x621/0x630 [i915] > <4>[ 23.432476] i915_driver_load+0xbee/0x10b0 [i915] > <4>[ 23.432485] ? trace_hardirqs_on_caller+0xe0/0x1b0 > <4>[ 23.432566] i915_pci_probe+0x29/0x90 [i915] > <4>[ 23.432574] pci_device_probe+0xa1/0x130 > <4>[ 23.432582] driver_probe_device+0x306/0x480 > <4>[ 23.432589] __driver_attach+0xb7/0xe0 > <4>[ 23.432596] ? driver_probe_device+0x480/0x480 > <4>[ 23.432602] ? driver_probe_device+0x480/0x480 > <4>[ 23.432609] bus_for_each_dev+0x74/0xc0 > <4>[ 23.432616] bus_add_driver+0x15f/0x250 > <4>[ 23.432623] ? 0xffffffffa02d7000 > <4>[ 23.432629] driver_register+0x52/0xc0 > <4>[ 23.432635] ? 0xffffffffa02d7000 > <4>[ 23.432642] do_one_initcall+0x58/0x370 > <4>[ 23.432653] ? do_init_module+0x1d/0x1ea > <4>[ 23.432660] ? rcu_read_lock_sched_held+0x6f/0x80 > <4>[ 23.432667] ? kmem_cache_alloc_trace+0x282/0x2e0 > <4>[ 23.432675] do_init_module+0x56/0x1ea > <4>[ 23.432682] load_module+0x2435/0x2b20 > <4>[ 23.432694] ? __se_sys_finit_module+0xd3/0xf0 > <4>[ 23.432701] __se_sys_finit_module+0xd3/0xf0 > <4>[ 23.432710] do_syscall_64+0x55/0x190 > <4>[ 23.432717] entry_SYSCALL_64_after_hwframe+0x49/0xbe > <4>[ 23.432724] RIP: 0033:0x7fa780782839 > <4>[ 23.432729] RSP: 002b:00007ffcea73e668 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 > <4>[ 23.432738] RAX: ffffffffffffffda RBX: 0000561a472a4b30 RCX: 00007fa780782839 > <4>[ 23.432745] RDX: 0000000000000000 RSI: 00007fa7804610e5 RDI: 000000000000000e > <4>[ 23.432752] RBP: 00007fa7804610e5 R08: 0000000000000000 R09: 00007ffcea73e780 > <4>[ 23.432758] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000 > <4>[ 23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15: 0000561a472a4b30 > > but did not report it as an issue as it only occurred during the first > module on boot. This is due to the removal of the distinct global > timeline, and its separate lock class. So instead mark up the expected > nesting. An alternative would be to define a separate lock class for the > engine, but since we only expect to have a single point of nesting, we > can avoid having multiple lock classes for the struct. > > Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > --- > drivers/gpu/drm/i915/i915_request.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c > index f336942229cf..8928894dd9c7 100644 > --- a/drivers/gpu/drm/i915/i915_request.c > +++ b/drivers/gpu/drm/i915/i915_request.c > @@ -502,7 +502,7 @@ static void move_to_timeline(struct i915_request *request, > GEM_BUG_ON(request->timeline == &request->engine->timeline); > lockdep_assert_held(&request->engine->timeline.lock); > > - spin_lock(&request->timeline->lock); > + spin_lock_nested(&request->timeline->lock, SINGLE_DEPTH_NESTING); > list_move_tail(&request->link, &timeline->requests); > spin_unlock(&request->timeline->lock); > } > Looks like I haven't been paying enough attention! Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Regards, Tvrtko
Quoting Tvrtko Ursulin (2018-05-08 18:47:42) > > On 08/05/2018 16:35, Chris Wilson wrote: > > CI noticed > > > > <4>[ 23.430701] ============================================ > > <4>[ 23.430706] WARNING: possible recursive locking detected > > <4>[ 23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted > > <4>[ 23.430720] -------------------------------------------- > > <4>[ 23.430725] systemd-udevd/169 is trying to acquire lock: > > <4>[ 23.430732] (ptrval) (&(&timeline->lock)->rlock){....}, at: move_to_timeline+0x48/0x12c [i915] > > <4>[ 23.430888] > > but task is already holding lock: > > <4>[ 23.430894] (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] > > <4>[ 23.430995] > > other info that might help us debug this: > > <4>[ 23.431002] Possible unsafe locking scenario: > > > > <4>[ 23.431007] CPU0 > > <4>[ 23.431010] ---- > > <4>[ 23.431013] lock(&(&timeline->lock)->rlock); > > <4>[ 23.431021] lock(&(&timeline->lock)->rlock); > > <4>[ 23.431028] > > *** DEADLOCK *** > > > > <4>[ 23.431036] May be due to missing lock nesting notation > > > > <4>[ 23.431044] 5 locks held by systemd-udevd/169: > > <4>[ 23.431049] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x42/0xe0 > > <4>[ 23.431065] #1: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x50/0xe0 > > <4>[ 23.431078] #2: (ptrval) (&dev->struct_mutex){+.+.}, at: i915_gem_init+0xca/0x630 [i915] > > <4>[ 23.431174] #3: (ptrval) (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915] > > <4>[ 23.431271] #4: (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] > > <4>[ 23.431369] > > stack backtrace: > > <4>[ 23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted 4.17.0-rc4-CI-CI_DRM_4156+ #1 > > <4>[ 23.431385] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005 > > <4>[ 23.431394] Call Trace: > > <4>[ 23.431403] dump_stack+0x67/0x9b > > <4>[ 23.431411] __lock_acquire+0xc67/0x1b50 > > <4>[ 23.431421] ? ring_buffer_lock_reserve+0x154/0x3f0 > > <4>[ 23.431429] ? lock_acquire+0xa6/0x210 > > <4>[ 23.431435] lock_acquire+0xa6/0x210 > > <4>[ 23.431530] ? move_to_timeline+0x48/0x12c [i915] > > <4>[ 23.431540] _raw_spin_lock+0x2a/0x40 > > <4>[ 23.431634] ? move_to_timeline+0x48/0x12c [i915] > > <4>[ 23.431730] move_to_timeline+0x48/0x12c [i915] > > <4>[ 23.431826] __i915_request_submit+0xfa/0x280 [i915] > > <4>[ 23.431923] i915_request_submit+0x25/0x40 [i915] > > <4>[ 23.432024] i9xx_submit_request+0x11/0x140 [i915] > > <4>[ 23.432120] submit_notify+0x8d/0x124 [i915] > > <4>[ 23.432202] __i915_sw_fence_complete+0x81/0x250 [i915] > > <4>[ 23.432300] __i915_request_add+0x31c/0x7c0 [i915] > > <4>[ 23.432395] i915_gem_init+0x621/0x630 [i915] > > <4>[ 23.432476] i915_driver_load+0xbee/0x10b0 [i915] > > <4>[ 23.432485] ? trace_hardirqs_on_caller+0xe0/0x1b0 > > <4>[ 23.432566] i915_pci_probe+0x29/0x90 [i915] > > <4>[ 23.432574] pci_device_probe+0xa1/0x130 > > <4>[ 23.432582] driver_probe_device+0x306/0x480 > > <4>[ 23.432589] __driver_attach+0xb7/0xe0 > > <4>[ 23.432596] ? driver_probe_device+0x480/0x480 > > <4>[ 23.432602] ? driver_probe_device+0x480/0x480 > > <4>[ 23.432609] bus_for_each_dev+0x74/0xc0 > > <4>[ 23.432616] bus_add_driver+0x15f/0x250 > > <4>[ 23.432623] ? 0xffffffffa02d7000 > > <4>[ 23.432629] driver_register+0x52/0xc0 > > <4>[ 23.432635] ? 0xffffffffa02d7000 > > <4>[ 23.432642] do_one_initcall+0x58/0x370 > > <4>[ 23.432653] ? do_init_module+0x1d/0x1ea > > <4>[ 23.432660] ? rcu_read_lock_sched_held+0x6f/0x80 > > <4>[ 23.432667] ? kmem_cache_alloc_trace+0x282/0x2e0 > > <4>[ 23.432675] do_init_module+0x56/0x1ea > > <4>[ 23.432682] load_module+0x2435/0x2b20 > > <4>[ 23.432694] ? __se_sys_finit_module+0xd3/0xf0 > > <4>[ 23.432701] __se_sys_finit_module+0xd3/0xf0 > > <4>[ 23.432710] do_syscall_64+0x55/0x190 > > <4>[ 23.432717] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > <4>[ 23.432724] RIP: 0033:0x7fa780782839 > > <4>[ 23.432729] RSP: 002b:00007ffcea73e668 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 > > <4>[ 23.432738] RAX: ffffffffffffffda RBX: 0000561a472a4b30 RCX: 00007fa780782839 > > <4>[ 23.432745] RDX: 0000000000000000 RSI: 00007fa7804610e5 RDI: 000000000000000e > > <4>[ 23.432752] RBP: 00007fa7804610e5 R08: 0000000000000000 R09: 00007ffcea73e780 > > <4>[ 23.432758] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000 > > <4>[ 23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15: 0000561a472a4b30 > > > > but did not report it as an issue as it only occurred during the first > > module on boot. This is due to the removal of the distinct global > > timeline, and its separate lock class. So instead mark up the expected > > nesting. An alternative would be to define a separate lock class for the > > engine, but since we only expect to have a single point of nesting, we > > can avoid having multiple lock classes for the struct. > > > > Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > --- > > drivers/gpu/drm/i915/i915_request.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c > > index f336942229cf..8928894dd9c7 100644 > > --- a/drivers/gpu/drm/i915/i915_request.c > > +++ b/drivers/gpu/drm/i915/i915_request.c > > @@ -502,7 +502,7 @@ static void move_to_timeline(struct i915_request *request, > > GEM_BUG_ON(request->timeline == &request->engine->timeline); > > lockdep_assert_held(&request->engine->timeline.lock); > > > > - spin_lock(&request->timeline->lock); > > + spin_lock_nested(&request->timeline->lock, SINGLE_DEPTH_NESTING); > > list_move_tail(&request->link, &timeline->requests); > > spin_unlock(&request->timeline->lock); > > } > > > > Looks like I haven't been paying enough attention! Nor me, and I've definitely run it through lockdep myself, at least I think I have! (though admittedly less often than simple kasan runs as I tend to be more concerned about accessing requests after they are freed). -Chris
Quoting Michel Thierry (2018-05-08 18:30:22) > On 05/08/2018 08:35 AM, Chris Wilson wrote: > > CI noticed > > > > <4>[ 23.430701] ============================================ > > <4>[ 23.430706] WARNING: possible recursive locking detected > > <4>[ 23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted > > <4>[ 23.430720] -------------------------------------------- > > <4>[ 23.430725] systemd-udevd/169 is trying to acquire lock: > > <4>[ 23.430732] (ptrval) (&(&timeline->lock)->rlock){....}, at: move_to_timeline+0x48/0x12c [i915] > > <4>[ 23.430888] > > but task is already holding lock: > > <4>[ 23.430894] (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] > > <4>[ 23.430995] > > other info that might help us debug this: > > <4>[ 23.431002] Possible unsafe locking scenario: > > > > <4>[ 23.431007] CPU0 > > <4>[ 23.431010] ---- > > <4>[ 23.431013] lock(&(&timeline->lock)->rlock); > > <4>[ 23.431021] lock(&(&timeline->lock)->rlock); > > <4>[ 23.431028] > > *** DEADLOCK *** > > > > <4>[ 23.431036] May be due to missing lock nesting notation > > > > <4>[ 23.431044] 5 locks held by systemd-udevd/169: > > <4>[ 23.431049] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x42/0xe0 > > <4>[ 23.431065] #1: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x50/0xe0 > > <4>[ 23.431078] #2: (ptrval) (&dev->struct_mutex){+.+.}, at: i915_gem_init+0xca/0x630 [i915] > > <4>[ 23.431174] #3: (ptrval) (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915] > > <4>[ 23.431271] #4: (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] > > <4>[ 23.431369] > > stack backtrace: > > <4>[ 23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted 4.17.0-rc4-CI-CI_DRM_4156+ #1 > > <4>[ 23.431385] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005 > > <4>[ 23.431394] Call Trace: > > <4>[ 23.431403] dump_stack+0x67/0x9b > ... > > <4>[ 23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15: 0000561a472a4b30 > > > > but did not report it as an issue as it only occurred during the first > > module on boot. This is due to the removal of the distinct global > > timeline, and its separate lock class. So instead mark up the expected > > nesting. An alternative would be to define a separate lock class for the > > engine, but since we only expect to have a single point of nesting, we > > can avoid having multiple lock classes for the struct. > > > > Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > Tested-by: Michel Thierry <michel.thierry@intel.com> Thanks for the review and testing; pushed. My apologies for letting an obvious bug in hindsight slip through, -Chris
Quoting Patchwork (2018-05-08 21:58:09) > == Series Details == > > Series: drm/i915: Annotate timeline lock nesting > URL : https://patchwork.freedesktop.org/series/42881/ > State : failure > > == Summary == > > = CI Bug Log - changes from CI_DRM_4158_full -> Patchwork_8946_full = > > == Summary - FAILURE == > > Serious unknown changes coming with Patchwork_8946_full absolutely need to be > verified manually. > > If you think the reported changes have nothing to do with the changes > introduced in Patchwork_8946_full, please notify your bug team to allow them > to document this new failure mode, which will reduce false positives in CI. > > External URL: https://patchwork.freedesktop.org/api/1.0/series/42881/revisions/1/mbox/ > > == Possible new issues == > > Here are the unknown changes that may have been introduced in Patchwork_8946_full: > > === IGT changes === > > ==== Possible regressions ==== > > igt@drv_selftest@live_hangcheck: > shard-kbl: PASS -> DMESG-FAIL Sigh, and that's exactly the problem we create for ourselves when lockdep fires once during boot and squashes all subsequent warnings. -Chris
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index f336942229cf..8928894dd9c7 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -502,7 +502,7 @@ static void move_to_timeline(struct i915_request *request, GEM_BUG_ON(request->timeline == &request->engine->timeline); lockdep_assert_held(&request->engine->timeline.lock); - spin_lock(&request->timeline->lock); + spin_lock_nested(&request->timeline->lock, SINGLE_DEPTH_NESTING); list_move_tail(&request->link, &timeline->requests); spin_unlock(&request->timeline->lock); }
CI noticed <4>[ 23.430701] ============================================ <4>[ 23.430706] WARNING: possible recursive locking detected <4>[ 23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted <4>[ 23.430720] -------------------------------------------- <4>[ 23.430725] systemd-udevd/169 is trying to acquire lock: <4>[ 23.430732] (ptrval) (&(&timeline->lock)->rlock){....}, at: move_to_timeline+0x48/0x12c [i915] <4>[ 23.430888] but task is already holding lock: <4>[ 23.430894] (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] <4>[ 23.430995] other info that might help us debug this: <4>[ 23.431002] Possible unsafe locking scenario: <4>[ 23.431007] CPU0 <4>[ 23.431010] ---- <4>[ 23.431013] lock(&(&timeline->lock)->rlock); <4>[ 23.431021] lock(&(&timeline->lock)->rlock); <4>[ 23.431028] *** DEADLOCK *** <4>[ 23.431036] May be due to missing lock nesting notation <4>[ 23.431044] 5 locks held by systemd-udevd/169: <4>[ 23.431049] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x42/0xe0 <4>[ 23.431065] #1: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x50/0xe0 <4>[ 23.431078] #2: (ptrval) (&dev->struct_mutex){+.+.}, at: i915_gem_init+0xca/0x630 [i915] <4>[ 23.431174] #3: (ptrval) (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915] <4>[ 23.431271] #4: (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] <4>[ 23.431369] stack backtrace: <4>[ 23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted 4.17.0-rc4-CI-CI_DRM_4156+ #1 <4>[ 23.431385] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005 <4>[ 23.431394] Call Trace: <4>[ 23.431403] dump_stack+0x67/0x9b <4>[ 23.431411] __lock_acquire+0xc67/0x1b50 <4>[ 23.431421] ? ring_buffer_lock_reserve+0x154/0x3f0 <4>[ 23.431429] ? lock_acquire+0xa6/0x210 <4>[ 23.431435] lock_acquire+0xa6/0x210 <4>[ 23.431530] ? move_to_timeline+0x48/0x12c [i915] <4>[ 23.431540] _raw_spin_lock+0x2a/0x40 <4>[ 23.431634] ? move_to_timeline+0x48/0x12c [i915] <4>[ 23.431730] move_to_timeline+0x48/0x12c [i915] <4>[ 23.431826] __i915_request_submit+0xfa/0x280 [i915] <4>[ 23.431923] i915_request_submit+0x25/0x40 [i915] <4>[ 23.432024] i9xx_submit_request+0x11/0x140 [i915] <4>[ 23.432120] submit_notify+0x8d/0x124 [i915] <4>[ 23.432202] __i915_sw_fence_complete+0x81/0x250 [i915] <4>[ 23.432300] __i915_request_add+0x31c/0x7c0 [i915] <4>[ 23.432395] i915_gem_init+0x621/0x630 [i915] <4>[ 23.432476] i915_driver_load+0xbee/0x10b0 [i915] <4>[ 23.432485] ? trace_hardirqs_on_caller+0xe0/0x1b0 <4>[ 23.432566] i915_pci_probe+0x29/0x90 [i915] <4>[ 23.432574] pci_device_probe+0xa1/0x130 <4>[ 23.432582] driver_probe_device+0x306/0x480 <4>[ 23.432589] __driver_attach+0xb7/0xe0 <4>[ 23.432596] ? driver_probe_device+0x480/0x480 <4>[ 23.432602] ? driver_probe_device+0x480/0x480 <4>[ 23.432609] bus_for_each_dev+0x74/0xc0 <4>[ 23.432616] bus_add_driver+0x15f/0x250 <4>[ 23.432623] ? 0xffffffffa02d7000 <4>[ 23.432629] driver_register+0x52/0xc0 <4>[ 23.432635] ? 0xffffffffa02d7000 <4>[ 23.432642] do_one_initcall+0x58/0x370 <4>[ 23.432653] ? do_init_module+0x1d/0x1ea <4>[ 23.432660] ? rcu_read_lock_sched_held+0x6f/0x80 <4>[ 23.432667] ? kmem_cache_alloc_trace+0x282/0x2e0 <4>[ 23.432675] do_init_module+0x56/0x1ea <4>[ 23.432682] load_module+0x2435/0x2b20 <4>[ 23.432694] ? __se_sys_finit_module+0xd3/0xf0 <4>[ 23.432701] __se_sys_finit_module+0xd3/0xf0 <4>[ 23.432710] do_syscall_64+0x55/0x190 <4>[ 23.432717] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 23.432724] RIP: 0033:0x7fa780782839 <4>[ 23.432729] RSP: 002b:00007ffcea73e668 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 <4>[ 23.432738] RAX: ffffffffffffffda RBX: 0000561a472a4b30 RCX: 00007fa780782839 <4>[ 23.432745] RDX: 0000000000000000 RSI: 00007fa7804610e5 RDI: 000000000000000e <4>[ 23.432752] RBP: 00007fa7804610e5 R08: 0000000000000000 R09: 00007ffcea73e780 <4>[ 23.432758] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000 <4>[ 23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15: 0000561a472a4b30 but did not report it as an issue as it only occurred during the first module on boot. This is due to the removal of the distinct global timeline, and its separate lock class. So instead mark up the expected nesting. An alternative would be to define a separate lock class for the engine, but since we only expect to have a single point of nesting, we can avoid having multiple lock classes for the struct. Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> --- drivers/gpu/drm/i915/i915_request.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)