Message ID | 20230607190350.287644-1-zhanjun.dong@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: Avoid circular locking dependency when flush delayed work on gt reset | expand |
On 6/7/2023 12:03, Zhanjun Dong wrote: > This attempts to avoid circular locking dependency between flush delayed work and intel_gt_reset. > Switched from cancel_delayed_work_sync to cancel_delayed_work, the non-sync version for reset path, it is safe as the worker has the trylock code to handle the lock; Meanwhile keep the sync version for park/fini to ensure the worker is not still running during suspend or shutdown. > > WARNING: possible circular locking dependency detected > 6.4.0-rc1-drmtip_1340-g31e3463b0edb+ #1 Not tainted > ------------------------------------------------------ > kms_pipe_crc_ba/6415 is trying to acquire lock: > ffff88813e6cc640 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}, at: __flush_work+0x42/0x530 > > but task is already holding lock: > ffff88813e6cce90 (>->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] > > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > > -> #3 (>->reset.mutex){+.+.}-{3:3}: > lock_acquire+0xd8/0x2d0 > i915_gem_shrinker_taints_mutex+0x31/0x50 [i915] > intel_gt_init_reset+0x65/0x80 [i915] > intel_gt_common_init_early+0xe1/0x170 [i915] > intel_root_gt_init_early+0x48/0x60 [i915] > i915_driver_probe+0x671/0xcb0 [i915] > i915_pci_probe+0xdc/0x210 [i915] > pci_device_probe+0x95/0x120 > really_probe+0x164/0x3c0 > __driver_probe_device+0x73/0x160 > driver_probe_device+0x19/0xa0 > __driver_attach+0xb6/0x180 > bus_for_each_dev+0x77/0xd0 > bus_add_driver+0x114/0x210 > driver_register+0x5b/0x110 > __pfx_vgem_open+0x3/0x10 [vgem] > do_one_initcall+0x57/0x270 > do_init_module+0x5f/0x220 > load_module+0x1ca4/0x1f00 > __do_sys_finit_module+0xb4/0x130 > do_syscall_64+0x3c/0x90 > entry_SYSCALL_64_after_hwframe+0x72/0xdc > > -> #2 (fs_reclaim){+.+.}-{0:0}: > lock_acquire+0xd8/0x2d0 > fs_reclaim_acquire+0xac/0xe0 > kmem_cache_alloc+0x32/0x260 > i915_vma_instance+0xb2/0xc60 [i915] > i915_gem_object_ggtt_pin_ww+0x175/0x370 [i915] > vm_fault_gtt+0x22d/0xf60 [i915] > __do_fault+0x2f/0x1d0 > do_pte_missing+0x4a/0xd20 > __handle_mm_fault+0x5b0/0x790 > handle_mm_fault+0xa2/0x230 > do_user_addr_fault+0x3ea/0xa10 > exc_page_fault+0x68/0x1a0 > asm_exc_page_fault+0x26/0x30 > > -> #1 (>->reset.backoff_srcu){++++}-{0:0}: > lock_acquire+0xd8/0x2d0 > _intel_gt_reset_lock+0x57/0x330 [i915] > guc_timestamp_ping+0x35/0x130 [i915] > process_one_work+0x250/0x510 > worker_thread+0x4f/0x3a0 > kthread+0xff/0x130 > ret_from_fork+0x29/0x50 > > -> #0 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}: > check_prev_add+0x90/0xc60 > __lock_acquire+0x1998/0x2590 > lock_acquire+0xd8/0x2d0 > __flush_work+0x74/0x530 > __cancel_work_timer+0x14f/0x1f0 > intel_guc_submission_reset_prepare+0x81/0x4b0 [i915] > intel_uc_reset_prepare+0x9c/0x120 [i915] > reset_prepare+0x21/0x60 [i915] > intel_gt_reset+0x1dd/0x470 [i915] > intel_gt_reset_global+0xfb/0x170 [i915] > intel_gt_handle_error+0x368/0x420 [i915] > intel_gt_debugfs_reset_store+0x5c/0xc0 [i915] > i915_wedged_set+0x29/0x40 [i915] > simple_attr_write_xsigned.constprop.0+0xb4/0x110 > full_proxy_write+0x52/0x80 > vfs_write+0xc5/0x4f0 > ksys_write+0x64/0xe0 > do_syscall_64+0x3c/0x90 > entry_SYSCALL_64_after_hwframe+0x72/0xdc > > other info that might help us debug this: > Chain exists of: > (work_completion)(&(&guc->timestamp.work)->work) --> fs_reclaim --> >->reset.mutex > Possible unsafe locking scenario: > CPU0 CPU1 > ---- ---- > lock(>->reset.mutex); > lock(fs_reclaim); > lock(>->reset.mutex); > lock((work_completion)(&(&guc->timestamp.work)->work)); > > *** DEADLOCK *** > 3 locks held by kms_pipe_crc_ba/6415: > #0: ffff888101541430 (sb_writers#15){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 > #1: ffff888136c7eab8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_write_xsigned.constprop.0+0x47/0x110 > #2: ffff88813e6cce90 (>->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] > > Signed-off-by: Zhanjun Dong<zhanjun.dong@intel.com> > --- > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > index a0e3ef1c65d2..cca6960d3490 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > @@ -1357,9 +1357,12 @@ static void guc_enable_busyness_worker(struct intel_guc *guc) > mod_delayed_work(system_highpri_wq, &guc->timestamp.work, guc->timestamp.ping_delay); > } > > -static void guc_cancel_busyness_worker(struct intel_guc *guc) > +static void guc_cancel_busyness_worker(struct intel_guc *guc, bool sync) > { > - cancel_delayed_work_sync(&guc->timestamp.work); > + if (sync) > + cancel_delayed_work_sync(&guc->timestamp.work); > + else > + cancel_delayed_work(&guc->timestamp.work); > } > > static void __reset_guc_busyness_stats(struct intel_guc *guc) > @@ -1370,7 +1373,7 @@ static void __reset_guc_busyness_stats(struct intel_guc *guc) > unsigned long flags; > ktime_t unused; > > - guc_cancel_busyness_worker(guc); > + guc_cancel_busyness_worker(guc, 0); Should use true/false rather than 1/0 for bool values. Also, this needs a comment actually in the code, not just in the patch description. E.g.: Attempting a synchronous cancel within the reset path leads to a circular mutex locking complaint by lockdep. However, it is safe to use an asynchronous cancel here. If the worker does actually run concurrently with a reset then it will early exit due to the mutex_trylock call rather than block. > > spin_lock_irqsave(&guc->timestamp.lock, flags); > > @@ -1485,7 +1488,7 @@ static int guc_init_engine_stats(struct intel_guc *guc) > > static void guc_fini_engine_stats(struct intel_guc *guc) > { > - guc_cancel_busyness_worker(guc); > + guc_cancel_busyness_worker(guc, 1); > } > > void intel_guc_busyness_park(struct intel_gt *gt) > @@ -1500,7 +1503,7 @@ void intel_guc_busyness_park(struct intel_gt *gt) > * and causes an unclaimed register access warning. Cancel the worker > * synchronously here. > */ > - guc_cancel_busyness_worker(guc); > + guc_cancel_busyness_worker(guc, 1); > > /* > * Before parking, we should sample engine busyness stats if we need to. > @@ -4503,7 +4506,7 @@ int intel_guc_submission_enable(struct intel_guc *guc) > /* Note: By the time we're here, GuC may have already been reset */ > void intel_guc_submission_disable(struct intel_guc *guc) > { > - guc_cancel_busyness_worker(guc); > + guc_cancel_busyness_worker(guc, 0); Hmm. I think this is going to need breaking up further back in the stack. We definitely want to be doing a synchronous cancel in the general case of disabling submission (e.g. due to suspend or driver unload). But if this is happening as part of a reset call stack, then it is a problem. AFAICT, the only way _submission_disable would be called within a reset is if __uc_init_hw() failed for some reason. So one option would be to add the sync/async flag to _submission_disable() as well and just make the init failure case async with other callers being sync. A better option might be to add an 'are busyness stats enabled' boolean to the guc structure. And inside the cancel function, early exit if the worker is not actually enabled (and set the flag anywhere and everywhere that does a mod_work to enable it, which is just guc_enable_busyness_worker() I think?). That would mean that multiple disables do nothing. So e.g. a failed reset will cancel the worker asynchronously in reset prepare but then not try to cancel it again synchronously in submission disable. Hmm, except that init_hw has already enabled the worker by that point :(. FYC: Umesh and Daniele... any thoughts? I would be tempted to say is there any way we can just add a lockdep annotation to ignore this issue? The lockdep splat described in the patch description above seems like a false positive to me. Sure the reset lock is held by the reset code which is now trying to synchronously flush the busyness worker thread which also takes the reset lock. But the busyness worker thread does a trylock and will abort if the lock is already held. So no issue... However. I think we do have a genuine issue with the internal delayed worker lock itself, which has maybe shown up in other lockdep splat reports. For example, if a worker thread triggers a reset (e.g. anything reset related coming in via a G2H) then the reset code is running inside a worker thread. Which is maybe holding an internal kernel worker thread lock? So if the reset path does a synchronous cancel of another worker thread, that also requires taking the worker thread lock and thus a deadlock occurs. John. > > /* Semaphore interrupt disable and route to host */ > guc_route_semaphores(guc, false);
Hi Dong, On Wed, Jun 07, 2023 at 12:03:50PM -0700, Zhanjun Dong wrote: > This attempts to avoid circular locking dependency between flush delayed work and intel_gt_reset. > Switched from cancel_delayed_work_sync to cancel_delayed_work, the non-sync version for reset path, it is safe as the worker has the trylock code to handle the lock; Meanwhile keep the sync version for park/fini to ensure the worker is not still running during suspend or shutdown. > > WARNING: possible circular locking dependency detected > 6.4.0-rc1-drmtip_1340-g31e3463b0edb+ #1 Not tainted > ------------------------------------------------------ > kms_pipe_crc_ba/6415 is trying to acquire lock: > ffff88813e6cc640 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}, at: __flush_work+0x42/0x530 > > but task is already holding lock: > ffff88813e6cce90 (>->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] > > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > > -> #3 (>->reset.mutex){+.+.}-{3:3}: > lock_acquire+0xd8/0x2d0 > i915_gem_shrinker_taints_mutex+0x31/0x50 [i915] > intel_gt_init_reset+0x65/0x80 [i915] > intel_gt_common_init_early+0xe1/0x170 [i915] > intel_root_gt_init_early+0x48/0x60 [i915] > i915_driver_probe+0x671/0xcb0 [i915] > i915_pci_probe+0xdc/0x210 [i915] > pci_device_probe+0x95/0x120 > really_probe+0x164/0x3c0 > __driver_probe_device+0x73/0x160 > driver_probe_device+0x19/0xa0 > __driver_attach+0xb6/0x180 > bus_for_each_dev+0x77/0xd0 > bus_add_driver+0x114/0x210 > driver_register+0x5b/0x110 > __pfx_vgem_open+0x3/0x10 [vgem] > do_one_initcall+0x57/0x270 > do_init_module+0x5f/0x220 > load_module+0x1ca4/0x1f00 > __do_sys_finit_module+0xb4/0x130 > do_syscall_64+0x3c/0x90 > entry_SYSCALL_64_after_hwframe+0x72/0xdc > > -> #2 (fs_reclaim){+.+.}-{0:0}: > lock_acquire+0xd8/0x2d0 > fs_reclaim_acquire+0xac/0xe0 > kmem_cache_alloc+0x32/0x260 > i915_vma_instance+0xb2/0xc60 [i915] > i915_gem_object_ggtt_pin_ww+0x175/0x370 [i915] > vm_fault_gtt+0x22d/0xf60 [i915] > __do_fault+0x2f/0x1d0 > do_pte_missing+0x4a/0xd20 > __handle_mm_fault+0x5b0/0x790 > handle_mm_fault+0xa2/0x230 > do_user_addr_fault+0x3ea/0xa10 > exc_page_fault+0x68/0x1a0 > asm_exc_page_fault+0x26/0x30 > > -> #1 (>->reset.backoff_srcu){++++}-{0:0}: > lock_acquire+0xd8/0x2d0 > _intel_gt_reset_lock+0x57/0x330 [i915] > guc_timestamp_ping+0x35/0x130 [i915] > process_one_work+0x250/0x510 > worker_thread+0x4f/0x3a0 > kthread+0xff/0x130 > ret_from_fork+0x29/0x50 > > -> #0 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}: > check_prev_add+0x90/0xc60 > __lock_acquire+0x1998/0x2590 > lock_acquire+0xd8/0x2d0 > __flush_work+0x74/0x530 > __cancel_work_timer+0x14f/0x1f0 > intel_guc_submission_reset_prepare+0x81/0x4b0 [i915] > intel_uc_reset_prepare+0x9c/0x120 [i915] > reset_prepare+0x21/0x60 [i915] > intel_gt_reset+0x1dd/0x470 [i915] > intel_gt_reset_global+0xfb/0x170 [i915] > intel_gt_handle_error+0x368/0x420 [i915] > intel_gt_debugfs_reset_store+0x5c/0xc0 [i915] > i915_wedged_set+0x29/0x40 [i915] > simple_attr_write_xsigned.constprop.0+0xb4/0x110 > full_proxy_write+0x52/0x80 > vfs_write+0xc5/0x4f0 > ksys_write+0x64/0xe0 > do_syscall_64+0x3c/0x90 > entry_SYSCALL_64_after_hwframe+0x72/0xdc > > other info that might help us debug this: > Chain exists of: > (work_completion)(&(&guc->timestamp.work)->work) --> fs_reclaim --> >->reset.mutex > Possible unsafe locking scenario: > CPU0 CPU1 > ---- ---- > lock(>->reset.mutex); > lock(fs_reclaim); > lock(>->reset.mutex); > lock((work_completion)(&(&guc->timestamp.work)->work)); > > *** DEADLOCK *** > 3 locks held by kms_pipe_crc_ba/6415: > #0: ffff888101541430 (sb_writers#15){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 > #1: ffff888136c7eab8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_write_xsigned.constprop.0+0x47/0x110 > #2: ffff88813e6cce90 (>->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] > > Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Andrzej's r-b is missing here. > --- Please add a version to your patch and a changelog. Thanks, Andi
Hi Andi, Thanks for comments. Info would be updated on next revision, which is on the way. Regards, Zhanjun Dong On 2023-06-07 8:19 p.m., Andi Shyti wrote: > Hi Dong, > > On Wed, Jun 07, 2023 at 12:03:50PM -0700, Zhanjun Dong wrote: >> This attempts to avoid circular locking dependency between flush delayed work and intel_gt_reset. >> Switched from cancel_delayed_work_sync to cancel_delayed_work, the non-sync version for reset path, it is safe as the worker has the trylock code to handle the lock; Meanwhile keep the sync version for park/fini to ensure the worker is not still running during suspend or shutdown. >> >> WARNING: possible circular locking dependency detected >> 6.4.0-rc1-drmtip_1340-g31e3463b0edb+ #1 Not tainted >> ------------------------------------------------------ >> kms_pipe_crc_ba/6415 is trying to acquire lock: >> ffff88813e6cc640 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}, at: __flush_work+0x42/0x530 >> >> but task is already holding lock: >> ffff88813e6cce90 (>->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] >> >> which lock already depends on the new lock. >> >> the existing dependency chain (in reverse order) is: >> >> -> #3 (>->reset.mutex){+.+.}-{3:3}: >> lock_acquire+0xd8/0x2d0 >> i915_gem_shrinker_taints_mutex+0x31/0x50 [i915] >> intel_gt_init_reset+0x65/0x80 [i915] >> intel_gt_common_init_early+0xe1/0x170 [i915] >> intel_root_gt_init_early+0x48/0x60 [i915] >> i915_driver_probe+0x671/0xcb0 [i915] >> i915_pci_probe+0xdc/0x210 [i915] >> pci_device_probe+0x95/0x120 >> really_probe+0x164/0x3c0 >> __driver_probe_device+0x73/0x160 >> driver_probe_device+0x19/0xa0 >> __driver_attach+0xb6/0x180 >> bus_for_each_dev+0x77/0xd0 >> bus_add_driver+0x114/0x210 >> driver_register+0x5b/0x110 >> __pfx_vgem_open+0x3/0x10 [vgem] >> do_one_initcall+0x57/0x270 >> do_init_module+0x5f/0x220 >> load_module+0x1ca4/0x1f00 >> __do_sys_finit_module+0xb4/0x130 >> do_syscall_64+0x3c/0x90 >> entry_SYSCALL_64_after_hwframe+0x72/0xdc >> >> -> #2 (fs_reclaim){+.+.}-{0:0}: >> lock_acquire+0xd8/0x2d0 >> fs_reclaim_acquire+0xac/0xe0 >> kmem_cache_alloc+0x32/0x260 >> i915_vma_instance+0xb2/0xc60 [i915] >> i915_gem_object_ggtt_pin_ww+0x175/0x370 [i915] >> vm_fault_gtt+0x22d/0xf60 [i915] >> __do_fault+0x2f/0x1d0 >> do_pte_missing+0x4a/0xd20 >> __handle_mm_fault+0x5b0/0x790 >> handle_mm_fault+0xa2/0x230 >> do_user_addr_fault+0x3ea/0xa10 >> exc_page_fault+0x68/0x1a0 >> asm_exc_page_fault+0x26/0x30 >> >> -> #1 (>->reset.backoff_srcu){++++}-{0:0}: >> lock_acquire+0xd8/0x2d0 >> _intel_gt_reset_lock+0x57/0x330 [i915] >> guc_timestamp_ping+0x35/0x130 [i915] >> process_one_work+0x250/0x510 >> worker_thread+0x4f/0x3a0 >> kthread+0xff/0x130 >> ret_from_fork+0x29/0x50 >> >> -> #0 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}: >> check_prev_add+0x90/0xc60 >> __lock_acquire+0x1998/0x2590 >> lock_acquire+0xd8/0x2d0 >> __flush_work+0x74/0x530 >> __cancel_work_timer+0x14f/0x1f0 >> intel_guc_submission_reset_prepare+0x81/0x4b0 [i915] >> intel_uc_reset_prepare+0x9c/0x120 [i915] >> reset_prepare+0x21/0x60 [i915] >> intel_gt_reset+0x1dd/0x470 [i915] >> intel_gt_reset_global+0xfb/0x170 [i915] >> intel_gt_handle_error+0x368/0x420 [i915] >> intel_gt_debugfs_reset_store+0x5c/0xc0 [i915] >> i915_wedged_set+0x29/0x40 [i915] >> simple_attr_write_xsigned.constprop.0+0xb4/0x110 >> full_proxy_write+0x52/0x80 >> vfs_write+0xc5/0x4f0 >> ksys_write+0x64/0xe0 >> do_syscall_64+0x3c/0x90 >> entry_SYSCALL_64_after_hwframe+0x72/0xdc >> >> other info that might help us debug this: >> Chain exists of: >> (work_completion)(&(&guc->timestamp.work)->work) --> fs_reclaim --> >->reset.mutex >> Possible unsafe locking scenario: >> CPU0 CPU1 >> ---- ---- >> lock(>->reset.mutex); >> lock(fs_reclaim); >> lock(>->reset.mutex); >> lock((work_completion)(&(&guc->timestamp.work)->work)); >> >> *** DEADLOCK *** >> 3 locks held by kms_pipe_crc_ba/6415: >> #0: ffff888101541430 (sb_writers#15){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 >> #1: ffff888136c7eab8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_write_xsigned.constprop.0+0x47/0x110 >> #2: ffff88813e6cce90 (>->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] >> >> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> > Andrzej's r-b is missing here. > >> --- > Please add a version to your patch and a changelog. > > Thanks, > Andi
John mentioned 2 options: 1. “add the sync/async flag to _submission_disable()” Consider to be a small change 1. add an 'are busyness stats enabled' boolean to the guc structure Seems effected area among the flow and is much more than option 1. I would like to discuss a bit more before moving forward. Regards, Zhanjun Dong From: Harrison, John C <john.c.harrison@intel.com> Sent: June 7, 2023 4:17 PM To: Dong, Zhanjun <zhanjun.dong@intel.com>; intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Nerlige Ramappa, Umesh <umesh.nerlige.ramappa@intel.com>; Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset On 6/7/2023 12:03, Zhanjun Dong wrote: This attempts to avoid circular locking dependency between flush delayed work and intel_gt_reset. Switched from cancel_delayed_work_sync to cancel_delayed_work, the non-sync version for reset path, it is safe as the worker has the trylock code to handle the lock; Meanwhile keep the sync version for park/fini to ensure the worker is not still running during suspend or shutdown. WARNING: possible circular locking dependency detected 6.4.0-rc1-drmtip_1340-g31e3463b0edb+ #1 Not tainted ------------------------------------------------------ kms_pipe_crc_ba/6415 is trying to acquire lock: ffff88813e6cc640 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}, at: __flush_work+0x42/0x530 but task is already holding lock: ffff88813e6cce90 (>->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (>->reset.mutex){+.+.}-{3:3}: lock_acquire+0xd8/0x2d0 i915_gem_shrinker_taints_mutex+0x31/0x50 [i915] intel_gt_init_reset+0x65/0x80 [i915] intel_gt_common_init_early+0xe1/0x170 [i915] intel_root_gt_init_early+0x48/0x60 [i915] i915_driver_probe+0x671/0xcb0 [i915] i915_pci_probe+0xdc/0x210 [i915] pci_device_probe+0x95/0x120 really_probe+0x164/0x3c0 __driver_probe_device+0x73/0x160 driver_probe_device+0x19/0xa0 __driver_attach+0xb6/0x180 bus_for_each_dev+0x77/0xd0 bus_add_driver+0x114/0x210 driver_register+0x5b/0x110 __pfx_vgem_open+0x3/0x10 [vgem] do_one_initcall+0x57/0x270 do_init_module+0x5f/0x220 load_module+0x1ca4/0x1f00 __do_sys_finit_module+0xb4/0x130 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc -> #2 (fs_reclaim){+.+.}-{0:0}: lock_acquire+0xd8/0x2d0 fs_reclaim_acquire+0xac/0xe0 kmem_cache_alloc+0x32/0x260 i915_vma_instance+0xb2/0xc60 [i915] i915_gem_object_ggtt_pin_ww+0x175/0x370 [i915] vm_fault_gtt+0x22d/0xf60 [i915] __do_fault+0x2f/0x1d0 do_pte_missing+0x4a/0xd20 __handle_mm_fault+0x5b0/0x790 handle_mm_fault+0xa2/0x230 do_user_addr_fault+0x3ea/0xa10 exc_page_fault+0x68/0x1a0 asm_exc_page_fault+0x26/0x30 -> #1 (>->reset.backoff_srcu){++++}-{0:0}: lock_acquire+0xd8/0x2d0 _intel_gt_reset_lock+0x57/0x330 [i915] guc_timestamp_ping+0x35/0x130 [i915] process_one_work+0x250/0x510 worker_thread+0x4f/0x3a0 kthread+0xff/0x130 ret_from_fork+0x29/0x50 -> #0 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}: check_prev_add+0x90/0xc60 __lock_acquire+0x1998/0x2590 lock_acquire+0xd8/0x2d0 __flush_work+0x74/0x530 __cancel_work_timer+0x14f/0x1f0 intel_guc_submission_reset_prepare+0x81/0x4b0 [i915] intel_uc_reset_prepare+0x9c/0x120 [i915] reset_prepare+0x21/0x60 [i915] intel_gt_reset+0x1dd/0x470 [i915] intel_gt_reset_global+0xfb/0x170 [i915] intel_gt_handle_error+0x368/0x420 [i915] intel_gt_debugfs_reset_store+0x5c/0xc0 [i915] i915_wedged_set+0x29/0x40 [i915] simple_attr_write_xsigned.constprop.0+0xb4/0x110 full_proxy_write+0x52/0x80 vfs_write+0xc5/0x4f0 ksys_write+0x64/0xe0 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc other info that might help us debug this: Chain exists of: (work_completion)(&(&guc->timestamp.work)->work) --> fs_reclaim --> >->reset.mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(>->reset.mutex); lock(fs_reclaim); lock(>->reset.mutex); lock((work_completion)(&(&guc->timestamp.work)->work)); *** DEADLOCK *** 3 locks held by kms_pipe_crc_ba/6415: #0: ffff888101541430 (sb_writers#15){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 #1: ffff888136c7eab8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_write_xsigned.constprop.0+0x47/0x110 #2: ffff88813e6cce90 (>->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com><mailto:zhanjun.dong@intel.com> --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index a0e3ef1c65d2..cca6960d3490 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1357,9 +1357,12 @@ static void guc_enable_busyness_worker(struct intel_guc *guc) mod_delayed_work(system_highpri_wq, &guc->timestamp.work, guc->timestamp.ping_delay); } -static void guc_cancel_busyness_worker(struct intel_guc *guc) +static void guc_cancel_busyness_worker(struct intel_guc *guc, bool sync) { - cancel_delayed_work_sync(&guc->timestamp.work); + if (sync) + cancel_delayed_work_sync(&guc->timestamp.work); + else + cancel_delayed_work(&guc->timestamp.work); } static void __reset_guc_busyness_stats(struct intel_guc *guc) @@ -1370,7 +1373,7 @@ static void __reset_guc_busyness_stats(struct intel_guc *guc) unsigned long flags; ktime_t unused; - guc_cancel_busyness_worker(guc); + guc_cancel_busyness_worker(guc, 0); Should use true/false rather than 1/0 for bool values. Also, this needs a comment actually in the code, not just in the patch description. E.g.: Attempting a synchronous cancel within the reset path leads to a circular mutex locking complaint by lockdep. However, it is safe to use an asynchronous cancel here. If the worker does actually run concurrently with a reset then it will early exit due to the mutex_trylock call rather than block. spin_lock_irqsave(&guc->timestamp.lock, flags); @@ -1485,7 +1488,7 @@ static int guc_init_engine_stats(struct intel_guc *guc) static void guc_fini_engine_stats(struct intel_guc *guc) { - guc_cancel_busyness_worker(guc); + guc_cancel_busyness_worker(guc, 1); } void intel_guc_busyness_park(struct intel_gt *gt) @@ -1500,7 +1503,7 @@ void intel_guc_busyness_park(struct intel_gt *gt) * and causes an unclaimed register access warning. Cancel the worker * synchronously here. */ - guc_cancel_busyness_worker(guc); + guc_cancel_busyness_worker(guc, 1); /* * Before parking, we should sample engine busyness stats if we need to. @@ -4503,7 +4506,7 @@ int intel_guc_submission_enable(struct intel_guc *guc) /* Note: By the time we're here, GuC may have already been reset */ void intel_guc_submission_disable(struct intel_guc *guc) { - guc_cancel_busyness_worker(guc); + guc_cancel_busyness_worker(guc, 0); Hmm. I think this is going to need breaking up further back in the stack. We definitely want to be doing a synchronous cancel in the general case of disabling submission (e.g. due to suspend or driver unload). But if this is happening as part of a reset call stack, then it is a problem. AFAICT, the only way _submission_disable would be called within a reset is if __uc_init_hw() failed for some reason. So one option would be to add the sync/async flag to _submission_disable() as well and just make the init failure case async with other callers being sync. A better option might be to add an 'are busyness stats enabled' boolean to the guc structure. And inside the cancel function, early exit if the worker is not actually enabled (and set the flag anywhere and everywhere that does a mod_work to enable it, which is just guc_enable_busyness_worker() I think?). That would mean that multiple disables do nothing. So e.g. a failed reset will cancel the worker asynchronously in reset prepare but then not try to cancel it again synchronously in submission disable. Hmm, except that init_hw has already enabled the worker by that point :(. FYC: Umesh and Daniele... any thoughts? I would be tempted to say is there any way we can just add a lockdep annotation to ignore this issue? The lockdep splat described in the patch description above seems like a false positive to me. Sure the reset lock is held by the reset code which is now trying to synchronously flush the busyness worker thread which also takes the reset lock. But the busyness worker thread does a trylock and will abort if the lock is already held. So no issue... However. I think we do have a genuine issue with the internal delayed worker lock itself, which has maybe shown up in other lockdep splat reports. For example, if a worker thread triggers a reset (e.g. anything reset related coming in via a G2H) then the reset code is running inside a worker thread. Which is maybe holding an internal kernel worker thread lock? So if the reset path does a synchronous cancel of another worker thread, that also requires taking the worker thread lock and thus a deadlock occurs. John. /* Semaphore interrupt disable and route to host */ guc_route_semaphores(guc, false);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index a0e3ef1c65d2..cca6960d3490 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1357,9 +1357,12 @@ static void guc_enable_busyness_worker(struct intel_guc *guc) mod_delayed_work(system_highpri_wq, &guc->timestamp.work, guc->timestamp.ping_delay); } -static void guc_cancel_busyness_worker(struct intel_guc *guc) +static void guc_cancel_busyness_worker(struct intel_guc *guc, bool sync) { - cancel_delayed_work_sync(&guc->timestamp.work); + if (sync) + cancel_delayed_work_sync(&guc->timestamp.work); + else + cancel_delayed_work(&guc->timestamp.work); } static void __reset_guc_busyness_stats(struct intel_guc *guc) @@ -1370,7 +1373,7 @@ static void __reset_guc_busyness_stats(struct intel_guc *guc) unsigned long flags; ktime_t unused; - guc_cancel_busyness_worker(guc); + guc_cancel_busyness_worker(guc, 0); spin_lock_irqsave(&guc->timestamp.lock, flags); @@ -1485,7 +1488,7 @@ static int guc_init_engine_stats(struct intel_guc *guc) static void guc_fini_engine_stats(struct intel_guc *guc) { - guc_cancel_busyness_worker(guc); + guc_cancel_busyness_worker(guc, 1); } void intel_guc_busyness_park(struct intel_gt *gt) @@ -1500,7 +1503,7 @@ void intel_guc_busyness_park(struct intel_gt *gt) * and causes an unclaimed register access warning. Cancel the worker * synchronously here. */ - guc_cancel_busyness_worker(guc); + guc_cancel_busyness_worker(guc, 1); /* * Before parking, we should sample engine busyness stats if we need to. @@ -4503,7 +4506,7 @@ int intel_guc_submission_enable(struct intel_guc *guc) /* Note: By the time we're here, GuC may have already been reset */ void intel_guc_submission_disable(struct intel_guc *guc) { - guc_cancel_busyness_worker(guc); + guc_cancel_busyness_worker(guc, 0); /* Semaphore interrupt disable and route to host */ guc_route_semaphores(guc, false);
This attempts to avoid circular locking dependency between flush delayed work and intel_gt_reset. Switched from cancel_delayed_work_sync to cancel_delayed_work, the non-sync version for reset path, it is safe as the worker has the trylock code to handle the lock; Meanwhile keep the sync version for park/fini to ensure the worker is not still running during suspend or shutdown. WARNING: possible circular locking dependency detected 6.4.0-rc1-drmtip_1340-g31e3463b0edb+ #1 Not tainted ------------------------------------------------------ kms_pipe_crc_ba/6415 is trying to acquire lock: ffff88813e6cc640 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}, at: __flush_work+0x42/0x530 but task is already holding lock: ffff88813e6cce90 (>->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (>->reset.mutex){+.+.}-{3:3}: lock_acquire+0xd8/0x2d0 i915_gem_shrinker_taints_mutex+0x31/0x50 [i915] intel_gt_init_reset+0x65/0x80 [i915] intel_gt_common_init_early+0xe1/0x170 [i915] intel_root_gt_init_early+0x48/0x60 [i915] i915_driver_probe+0x671/0xcb0 [i915] i915_pci_probe+0xdc/0x210 [i915] pci_device_probe+0x95/0x120 really_probe+0x164/0x3c0 __driver_probe_device+0x73/0x160 driver_probe_device+0x19/0xa0 __driver_attach+0xb6/0x180 bus_for_each_dev+0x77/0xd0 bus_add_driver+0x114/0x210 driver_register+0x5b/0x110 __pfx_vgem_open+0x3/0x10 [vgem] do_one_initcall+0x57/0x270 do_init_module+0x5f/0x220 load_module+0x1ca4/0x1f00 __do_sys_finit_module+0xb4/0x130 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc -> #2 (fs_reclaim){+.+.}-{0:0}: lock_acquire+0xd8/0x2d0 fs_reclaim_acquire+0xac/0xe0 kmem_cache_alloc+0x32/0x260 i915_vma_instance+0xb2/0xc60 [i915] i915_gem_object_ggtt_pin_ww+0x175/0x370 [i915] vm_fault_gtt+0x22d/0xf60 [i915] __do_fault+0x2f/0x1d0 do_pte_missing+0x4a/0xd20 __handle_mm_fault+0x5b0/0x790 handle_mm_fault+0xa2/0x230 do_user_addr_fault+0x3ea/0xa10 exc_page_fault+0x68/0x1a0 asm_exc_page_fault+0x26/0x30 -> #1 (>->reset.backoff_srcu){++++}-{0:0}: lock_acquire+0xd8/0x2d0 _intel_gt_reset_lock+0x57/0x330 [i915] guc_timestamp_ping+0x35/0x130 [i915] process_one_work+0x250/0x510 worker_thread+0x4f/0x3a0 kthread+0xff/0x130 ret_from_fork+0x29/0x50 -> #0 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}: check_prev_add+0x90/0xc60 __lock_acquire+0x1998/0x2590 lock_acquire+0xd8/0x2d0 __flush_work+0x74/0x530 __cancel_work_timer+0x14f/0x1f0 intel_guc_submission_reset_prepare+0x81/0x4b0 [i915] intel_uc_reset_prepare+0x9c/0x120 [i915] reset_prepare+0x21/0x60 [i915] intel_gt_reset+0x1dd/0x470 [i915] intel_gt_reset_global+0xfb/0x170 [i915] intel_gt_handle_error+0x368/0x420 [i915] intel_gt_debugfs_reset_store+0x5c/0xc0 [i915] i915_wedged_set+0x29/0x40 [i915] simple_attr_write_xsigned.constprop.0+0xb4/0x110 full_proxy_write+0x52/0x80 vfs_write+0xc5/0x4f0 ksys_write+0x64/0xe0 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc other info that might help us debug this: Chain exists of: (work_completion)(&(&guc->timestamp.work)->work) --> fs_reclaim --> >->reset.mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(>->reset.mutex); lock(fs_reclaim); lock(>->reset.mutex); lock((work_completion)(&(&guc->timestamp.work)->work)); *** DEADLOCK *** 3 locks held by kms_pipe_crc_ba/6415: #0: ffff888101541430 (sb_writers#15){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 #1: ffff888136c7eab8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_write_xsigned.constprop.0+0x47/0x110 #2: ffff88813e6cce90 (>->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)