Message ID | 20191004100025.70798-1-mika.westerberg@linux.intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Series | [1/2] bdi: Do not use freezable workqueue |
On 10/4/19 4:00 AM, Mika Westerberg wrote:
> A removable block device, such as an NVMe or SSD connected over
> Thunderbolt, can be hot-removed at any time, including while the
> system is suspended. When a device is hot-removed during suspend and
> the system is resumed, the kernel first resumes devices and then
> thaws userspace, including freezable workqueues. What happens in that
> case is that the NVMe driver notices that the device is unplugged and
> removes it from the system. This ends up calling bdi_unregister() for
> the gendisk, which then schedules wb_workfn() to be run one more time.
>
> However, since bdi_wq is still frozen, the flush_delayed_work() call
> in wb_shutdown() blocks forever, halting the system resume process.
> The user sees this as a hang, since nothing happens anymore.
>
> Triggering sysrq-w reveals this:
>
> Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
> Call Trace:
>  ? __schedule+0x2c5/0x630
>  ? wait_for_completion+0xa4/0x120
>  schedule+0x3e/0xc0
>  schedule_timeout+0x1c9/0x320
>  ? resched_curr+0x1f/0xd0
>  ? wait_for_completion+0xa4/0x120
>  wait_for_completion+0xc3/0x120
>  ? wake_up_q+0x60/0x60
>  __flush_work+0x131/0x1e0
>  ? flush_workqueue_prep_pwqs+0x130/0x130
>  bdi_unregister+0xb9/0x130
>  del_gendisk+0x2d2/0x2e0
>  nvme_ns_remove+0xed/0x110 [nvme_core]
>  nvme_remove_namespaces+0x96/0xd0 [nvme_core]
>  nvme_remove+0x5b/0x160 [nvme]
>  pci_device_remove+0x36/0x90
>  device_release_driver_internal+0xdf/0x1c0
>  nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
>  process_one_work+0x1c2/0x3f0
>  worker_thread+0x48/0x3e0
>  kthread+0x100/0x140
>  ? current_work+0x30/0x30
>  ? kthread_park+0x80/0x80
>  ret_from_fork+0x35/0x40
>
> This is not limited to NVMe; exactly the same issue can be reproduced
> by hot-removing an SSD (over Thunderbolt) while the system is
> suspended.
>
> Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.

This series looks good to me; I don't think there's a reason for
the workers to be marked freezable.
On Fri, Oct 4, 2019 at 3:22 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 10/4/19 4:00 AM, Mika Westerberg wrote:
> > A removable block device, such as an NVMe or SSD connected over
> > Thunderbolt, can be hot-removed at any time, including while the
> > system is suspended. When a device is hot-removed during suspend
> > and the system is resumed, the kernel first resumes devices and
> > then thaws userspace, including freezable workqueues. What happens
> > in that case is that the NVMe driver notices that the device is
> > unplugged and removes it from the system. This ends up calling
> > bdi_unregister() for the gendisk, which then schedules wb_workfn()
> > to be run one more time.
> >
> > However, since bdi_wq is still frozen, the flush_delayed_work()
> > call in wb_shutdown() blocks forever, halting the system resume
> > process. The user sees this as a hang, since nothing happens
> > anymore.
> >
> > Triggering sysrq-w reveals this:
> >
> > Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
> > Call Trace:
> >  ? __schedule+0x2c5/0x630
> >  ? wait_for_completion+0xa4/0x120
> >  schedule+0x3e/0xc0
> >  schedule_timeout+0x1c9/0x320
> >  ? resched_curr+0x1f/0xd0
> >  ? wait_for_completion+0xa4/0x120
> >  wait_for_completion+0xc3/0x120
> >  ? wake_up_q+0x60/0x60
> >  __flush_work+0x131/0x1e0
> >  ? flush_workqueue_prep_pwqs+0x130/0x130
> >  bdi_unregister+0xb9/0x130
> >  del_gendisk+0x2d2/0x2e0
> >  nvme_ns_remove+0xed/0x110 [nvme_core]
> >  nvme_remove_namespaces+0x96/0xd0 [nvme_core]
> >  nvme_remove+0x5b/0x160 [nvme]
> >  pci_device_remove+0x36/0x90
> >  device_release_driver_internal+0xdf/0x1c0
> >  nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
> >  process_one_work+0x1c2/0x3f0
> >  worker_thread+0x48/0x3e0
> >  kthread+0x100/0x140
> >  ? current_work+0x30/0x30
> >  ? kthread_park+0x80/0x80
> >  ret_from_fork+0x35/0x40
> >
> > This is not limited to NVMe; exactly the same issue can be
> > reproduced by hot-removing an SSD (over Thunderbolt) while the
> > system is suspended.
> >
> > Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.
>
> This series looks good to me; I don't think there's a reason for
> the workers to be marked freezable.

I was a bit concerned that the original idea might have been to prevent
writes to persistent storage from occurring after the hibernation image
has been created, but if that's not the case, the series is fine from
the general power management standpoint, so

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

for both patches.
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index d9daa3e422d0..c360f6a6c844 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -239,8 +239,8 @@ static int __init default_bdi_init(void)
 {
 	int err;
 
-	bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_FREEZABLE |
-				 WQ_UNBOUND | WQ_SYSFS, 0);
+	bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_UNBOUND |
+				 WQ_SYSFS, 0);
 	if (!bdi_wq)
 		return -ENOMEM;
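For reference, here is a minimal sketch of a module that allocates a workqueue with the flag combination bdi_wq keeps after this change. It is not part of the patch; all demo_* names are hypothetical, and the comments only restate the documented meaning of each flag.

```c
/* Hedged sketch, not from the patch: demo_* names are hypothetical. */
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;

static int __init demo_wq_init(void)
{
	/*
	 * WQ_MEM_RECLAIM - has a rescuer thread, so it can make forward
	 *                  progress even under memory pressure.
	 * WQ_UNBOUND     - work items are not pinned to the submitting CPU.
	 * WQ_SYSFS       - workqueue attributes are visible in sysfs.
	 *
	 * WQ_FREEZABLE is intentionally absent: a freezable workqueue stays
	 * frozen until late in resume, which is what made the flush in
	 * wb_shutdown() hang in the report above.
	 */
	demo_wq = alloc_workqueue("demo_wb", WQ_MEM_RECLAIM | WQ_UNBOUND |
				  WQ_SYSFS, 0);
	return demo_wq ? 0 : -ENOMEM;
}

static void __exit demo_wq_exit(void)
{
	destroy_workqueue(demo_wq);
}

module_init(demo_wq_init);
module_exit(demo_wq_exit);
MODULE_LICENSE("GPL");
```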
A removable block device, such as an NVMe or SSD connected over
Thunderbolt, can be hot-removed at any time, including while the system
is suspended. When a device is hot-removed during suspend and the
system is resumed, the kernel first resumes devices and then thaws
userspace, including freezable workqueues. What happens in that case is
that the NVMe driver notices that the device is unplugged and removes
it from the system. This ends up calling bdi_unregister() for the
gendisk, which then schedules wb_workfn() to be run one more time.

However, since bdi_wq is still frozen, the flush_delayed_work() call in
wb_shutdown() blocks forever, halting the system resume process. The
user sees this as a hang, since nothing happens anymore.

Triggering sysrq-w reveals this:

Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
Call Trace:
 ? __schedule+0x2c5/0x630
 ? wait_for_completion+0xa4/0x120
 schedule+0x3e/0xc0
 schedule_timeout+0x1c9/0x320
 ? resched_curr+0x1f/0xd0
 ? wait_for_completion+0xa4/0x120
 wait_for_completion+0xc3/0x120
 ? wake_up_q+0x60/0x60
 __flush_work+0x131/0x1e0
 ? flush_workqueue_prep_pwqs+0x130/0x130
 bdi_unregister+0xb9/0x130
 del_gendisk+0x2d2/0x2e0
 nvme_ns_remove+0xed/0x110 [nvme_core]
 nvme_remove_namespaces+0x96/0xd0 [nvme_core]
 nvme_remove+0x5b/0x160 [nvme]
 pci_device_remove+0x36/0x90
 device_release_driver_internal+0xdf/0x1c0
 nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
 process_one_work+0x1c2/0x3f0
 worker_thread+0x48/0x3e0
 kthread+0x100/0x140
 ? current_work+0x30/0x30
 ? kthread_park+0x80/0x80
 ret_from_fork+0x35/0x40

This is not limited to NVMe; exactly the same issue can be reproduced
by hot-removing an SSD (over Thunderbolt) while the system is
suspended.

Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.

Reported-by: AceLan Kao <acelan.kao@canonical.com>
Link: https://marc.info/?l=linux-kernel&m=138695698516487
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204385
Link: https://lore.kernel.org/lkml/20191002122136.GD2819@lahna.fi.intel.com/#t
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
I'm not too familiar with the bdi and block layers, so there may be a
good reason for having a freezable bdi_wq, in which case we need to
re-think how this could be solved.

This problem is easy to reproduce on Thunderbolt-capable systems with
the following steps:

1. Connect an NVMe or SSD over Thunderbolt
2. Suspend the system (mem, s2idle)
3. Detach the NVMe or SSD
4. Resume the system

 mm/backing-dev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
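To make the failure mode concrete, here is a hedged sketch of the generic queue-then-flush shutdown pattern the changelog describes. It is not the actual wb_shutdown() code, and all demo_* names are hypothetical. Had the workqueue been allocated with WQ_FREEZABLE and still been frozen when this exit path ran (the resume window described above), flush_delayed_work() would block just as in the sysrq-w trace.

```c
/* Hedged sketch, not the actual writeback code: demo_* names are hypothetical. */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

static struct workqueue_struct *demo_wq;
static struct delayed_work demo_dwork;

static void demo_workfn(struct work_struct *work)
{
	pr_info("demo: writeback-style work ran\n");
}

static int __init demo_init(void)
{
	/* Non-freezable, like the patched bdi_wq. */
	demo_wq = alloc_workqueue("demo", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
	if (!demo_wq)
		return -ENOMEM;

	INIT_DELAYED_WORK(&demo_dwork, demo_workfn);
	queue_delayed_work(demo_wq, &demo_dwork, 10 * HZ);
	return 0;
}

static void __exit demo_exit(void)
{
	/*
	 * Shutdown pattern similar in shape to wb_shutdown(): collapse any
	 * pending delay so the work runs now, then wait for it to finish.
	 * If demo_wq had been created with WQ_FREEZABLE and were still
	 * frozen at this point, flush_delayed_work() would never return.
	 */
	mod_delayed_work(demo_wq, &demo_dwork, 0);
	flush_delayed_work(&demo_dwork);
	destroy_workqueue(demo_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```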