Message ID: 20230523012727.3042247-1-yukuai1@huaweicloud.com (mailing list archive)
State: New, archived
Series: [v3] md: fix duplicate filename for rdev
On Mon, May 22, 2023 at 6:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Commit 5792a2856a63 ("[PATCH] md: avoid a deadlock when removing a device
> from an md array via sysfs") delays the deletion of rdev; however, this
> introduces a window in which the rdev can be added again while the deletion
> is not done yet, and sysfs will complain about a duplicate filename.
>
> Follow-up patches tried to fix this problem by flushing the workqueue;
> however, flush_rdev_wq() is just dead code. The sequence in
> md_kick_rdev_from_array() is:
>
> 1) list_del_rcu(&rdev->same_set);
> 2) synchronize_rcu();
> 3) queue_work(md_rdev_misc_wq, &rdev->del_work);
>
> So in flush_rdev_wq(), if the rdev is found in the list, work_pending()
> can never be true; and once the work is queued, the rdev can no longer
> be found in the list.
>
> flush_rdev_wq() could be replaced by flush_workqueue() directly, but that
> approach is not good either:
> - the workqueue is global, so synchronizing across all raid disks is
>   unnecessary.
> - flush_workqueue() can't be called under 'reconfig_mutex', and there is
>   still a small window between flush_workqueue() and mddev_lock() in which
>   other contexts can queue new work, hence the problem is not solved
>   completely.
>
> sysfs already has APIs that let an attribute delete itself from its own
> store method; these APIs, specifically sysfs_break/unbreak_active_protection(),
> are used to delete the rdev synchronously. Therefore, the above commit can
> be reverted, and the sysfs duplicate filename can be avoided.
>
> A new mdadm regression test is proposed as well ([1]).
>
> [1] https://lore.kernel.org/linux-raid/20230428062845.1975462-1-yukuai1@huaweicloud.com/
> Fixes: 5792a2856a63 ("[PATCH] md: avoid a deadlock when removing a device from an md array via sysfs")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Thanks for the fix! I made the following changes and applied it to md-next:

1. remove md_rdev->del_work, which is not used any more;
2. change list_empty_careful to list_empty protected by the mutex, as
   list_empty_careful doesn't seem safe here.

Please let me know if either change doesn't make sense.

Thanks,
Song
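The "delete itself from its own store method" mechanism referenced in the
commit message is the sysfs break/unbreak API. A minimal sketch of the
pattern, mirroring the rdev_attr_store() hunk in the patch at the end of
this thread (locals and error handling simplified), looks roughly like:

	struct kernfs_node *kn = NULL;

	/* Only "remove" needs this: the store handler is about to delete
	 * the very sysfs entry it was invoked through. */
	if (entry->store == state_store && cmd_match(page, "remove"))
		kn = sysfs_break_active_protection(kobj, attr);

	rv = mddev_lock(mddev);
	if (!rv) {
		rv = entry->store(rdev, page, length);
		mddev_unlock(mddev);	/* deferred kobject_del()/export_rdev() run here */
	}

	if (kn)
		sysfs_unbreak_active_protection(kn);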
Hi,

On 2023/05/24 2:05, Song Liu wrote:
> On Mon, May 22, 2023 at 6:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>> [...]
>
> Thanks for the fix! I made the following changes and applied it to md-next:
>
> 1. remove md_rdev->del_work, which is not used any more;
> 2. change list_empty_careful to list_empty protected by the mutex, as
>    list_empty_careful doesn't seem safe here.

Yes, this makes sense; we must make sure a caller can't see a stale rdev
through sysfs:

t1: remove rdev                      t2:
 mutex_lock(reconfig_mutex)
 list_add
 mutex_unlock(reconfig_mutex)
                                      mutex_lock(reconfig_mutex)
                                      mutex_unlock(reconfig_mutex)
 mutex_lock(delete_mutex)
 list_del_init
                                      list_empty_careful
                                      -> list is empty now, return; the
                                      caller will think the rdev is
                                      removed, but since export_rdev()
                                      has not been called yet, adding
                                      this rdev again will fail.
 kobject_del

Holding the mutex is safe, and I think performance should be OK because
removing an rdev is not a hot path. If we don't want to take a new mutex
in the hot path, perhaps list_empty_careful() can still work with the
following change, which removes the rdev from the list only after its
sysfs entry is removed:

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 296885798a2b..84dce5822f91 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -758,8 +758,8 @@ static void md_free_rdev(struct mddev *mddev)
 
        mutex_lock(&mddev->delete_mutex);
        list_for_each_entry_safe(rdev, tmp, &mddev->deleting, same_set) {
-               list_del_init(&rdev->same_set);
                kobject_del(&rdev->kobj);
+               list_del_init(&rdev->same_set);
                export_rdev(rdev);
        }
        mutex_unlock(&mddev->delete_mutex);

> Please let me know if either change doesn't make sense.
>
> Thanks,
> Song
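For context, the variant Song applied (the emptiness check done under the
mutex rather than with list_empty_careful()) can be reconstructed from the
'-' and context lines of the follow-up diff later in this thread, so it
reflects md-next at that point, where export_rdev() had grown an mddev
argument. A sketch of that md_free_rdev():

static void md_free_rdev(struct mddev *mddev)
{
	struct md_rdev *rdev;
	struct md_rdev *tmp;

	mutex_lock(&mddev->delete_mutex);
	if (list_empty(&mddev->deleting))
		goto out;

	/* Deletion is deferred to here, after reconfig_mutex is dropped. */
	list_for_each_entry_safe(rdev, tmp, &mddev->deleting, same_set) {
		list_del_init(&rdev->same_set);
		kobject_del(&rdev->kobj);
		export_rdev(rdev, mddev);
	}
out:
	mutex_unlock(&mddev->delete_mutex);
}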
On Tue, May 23, 2023 at 6:33 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/05/24 2:05, Song Liu wrote:
>> [...]
>>
>> 1. remove md_rdev->del_work, which is not used any more;
>> 2. change list_empty_careful to list_empty protected by the mutex, as
>>    list_empty_careful doesn't seem safe here.

Hmm.. it appears that I missed a circular locking dependency with mdadm
test 21raid5cache (delete_mutex and open_mutex).

Please take a look at this.

Thanks,
Song

[ 239.802277] ======================================================
[ 239.803650] WARNING: possible circular locking dependency detected
[ 239.804929] 6.4.0-rc2+ #772 Not tainted
[ 239.805569] ------------------------------------------------------
[ 239.806568] kworker/20:1/222 is trying to acquire lock:
[ 239.807406] ffff88815335b3f0 (&mddev->delete_mutex){+.+.}-{3:3}, at: mddev_unlock+0xe0/0x2d0
[ 239.808653] but task is already holding lock:
[ 239.809481] ffffc9000246fe00 ((work_completion)(&log->disable_writeback_work)){+.+.}-{0:0}, at: process_one_work+0x462/0xa50
[ 239.811049] which lock already depends on the new lock.
[ 239.812187] the existing dependency chain (in reverse order) is:
[ 239.813230] -> #3 ((work_completion)(&log->disable_writeback_work)){+.+.}-{0:0}:
[ 239.814468]        __flush_work+0xdb/0x690
[ 239.815068]        r5l_exit_log+0x59/0xc0
[ 239.815649]        free_conf+0x34/0x320
[ 239.816243]        raid5_free+0x11/0x40
[ 239.816788]        __md_stop+0x9f/0x140
[ 239.817336]        do_md_stop+0x2af/0xaf0
[ 239.817901]        md_ioctl+0xb34/0x1e30
[ 239.818469]        blkdev_ioctl+0x2bf/0x3d0
[ 239.819079]        __x64_sys_ioctl+0xbe/0x100
[ 239.819701]        do_syscall_64+0x3a/0x90
[ 239.820294]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 239.821076] -> #2 (&mddev->open_mutex){+.+.}-{3:3}:
[ 239.821971]        __mutex_lock+0x11d/0x13f0
[ 239.822592]        md_open+0xad/0x180
[ 239.822937] kobject: 'md0' (ffff88811c8e5498): kobject_uevent_env
[ 239.823113]        blkdev_get_whole+0x50/0x120
[ 239.824216] kobject: 'md0' (ffff88811c8e5498): fill_kobj_path: path = '/devices/virtual/block/md0'
[ 239.824725]        blkdev_get_by_dev+0x309/0x4f0
[ 239.826747]        blkdev_open+0x8a/0x110
[ 239.827319]        do_dentry_open+0x2a5/0x7b0
[ 239.827939]        path_openat+0xcee/0x1070
[ 239.828545]        do_filp_open+0x148/0x1d0
[ 239.829137]        do_sys_openat2+0x2ec/0x470
[ 239.829791]        do_sys_open+0x8a/0xd0
[ 239.830352]        do_syscall_64+0x3a/0x90
[ 239.830935]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 239.831728] -> #1 (&disk->open_mutex){+.+.}-{3:3}:
[ 239.832615]        __mutex_lock+0x11d/0x13f0
[ 239.833231]        blkdev_put+0x65/0x350
[ 239.833785]        export_rdev.isra.63+0x72/0xe0
[ 239.834456]        mddev_unlock+0x1b1/0x2d0
[ 239.835043]        md_ioctl+0x96c/0x1e30
[ 239.835616]        blkdev_ioctl+0x2bf/0x3d0
[ 239.836213]        __x64_sys_ioctl+0xbe/0x100
[ 239.836828]        do_syscall_64+0x3a/0x90
[ 239.837414]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 239.838225] -> #0 (&mddev->delete_mutex){+.+.}-{3:3}:
[ 239.839145]        __lock_acquire+0x1e42/0x34b0
[ 239.839797]        lock_acquire+0x161/0x3d0
[ 239.840395]        __mutex_lock+0x11d/0x13f0
[ 239.840999]        mddev_unlock+0xe0/0x2d0
[ 239.841582]        r5c_disable_writeback_async+0x261/0x270
[ 239.842354]        process_one_work+0x55f/0xa50
[ 239.842996]        worker_thread+0x69/0x660
[ 239.843595]        kthread+0x1a1/0x1e0
[ 239.844131]        ret_from_fork+0x2c/0x50
[ 239.844712] other info that might help us debug this:
[ 239.845813] Chain exists of: &mddev->delete_mutex --> &mddev->open_mutex --> (work_completion)(&log->disable_writeback_work)
[ 239.847776] Possible unsafe locking scenario:
[ 239.848623]        CPU0                    CPU1
[ 239.849262]        ----                    ----
[ 239.849927]   lock((work_completion)(&log->disable_writeback_work));
[ 239.850831]                                lock(&mddev->open_mutex);
[ 239.851719]                                lock((work_completion)(&log->disable_writeback_work));
[ 239.852945]   lock(&mddev->delete_mutex);
[ 239.853517]  *** DEADLOCK ***
[ 239.854345] 2 locks held by kworker/20:1/222:
[ 239.854960]  #0: ffff8881000b2748 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x462/0xa50
[ 239.856287]  #1: ffffc9000246fe00 ((work_completion)(&log->disable_writeback_work)){+.+.}-{0:0}, at: process_one_work+0x462/0xa50
[ 239.857939] stack backtrace:
[ 239.858567] CPU: 20 PID: 222 Comm: kworker/20:1 Not tainted 6.4.0-rc2+ #772
[ 239.859533] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[ 239.861135] Workqueue: events r5c_disable_writeback_async
[ 239.861903] Call Trace:
[ 239.862281]  <TASK>
[ 239.862606]  dump_stack_lvl+0x46/0x80
[ 239.863144]  check_noncircular+0x1ff/0x240
[ 239.863737]  ? __pfx_check_noncircular+0x10/0x10
[ 239.864401]  ? mark_lock.part.45+0x11a/0x1350
[ 239.865019]  ? add_chain_block+0x23b/0x310
[ 239.865624]  __lock_acquire+0x1e42/0x34b0
[ 239.866202]  ? select_task_rq_fair+0x2b0/0x1e50
[ 239.866845]  ? __pfx___lock_acquire+0x10/0x10
[ 239.867473]  ? ttwu_queue_wakelist+0x1cc/0x1f0
[ 239.868107]  ? __smp_call_single_queue+0x137/0x2a0
[ 239.868816]  ? __default_send_IPI_dest_field+0x2b/0xa0
[ 239.869548]  ? __lock_acquire+0xa5d/0x34b0
[ 239.870131]  lock_acquire+0x161/0x3d0
[ 239.870660]  ? mddev_unlock+0xe0/0x2d0
[ 239.871216]  ? __pfx_lock_acquire+0x10/0x10
[ 239.871808]  ? lock_is_held_type+0xd8/0x130
[ 239.872405]  __mutex_lock+0x11d/0x13f0
[ 239.872936]  ? mddev_unlock+0xe0/0x2d0
[ 239.873474]  ? mddev_unlock+0xe0/0x2d0
[ 239.874006]  ? __mutex_unlock_slowpath+0x12c/0x410
[ 239.874695]  ? __pfx___mutex_lock+0x10/0x10
[ 239.875297]  ? __pfx_rcu_read_lock_held+0x10/0x10
[ 239.875986]  ? mddev_unlock+0xe0/0x2d0
[ 239.876530]  mddev_unlock+0xe0/0x2d0
[ 239.877044]  r5c_disable_writeback_async+0x261/0x270
[ 239.877753]  ? __pfx_r5c_disable_writeback_async+0x10/0x10
[ 239.878531]  ? __switch_to+0x2d8/0x770
[ 239.879066]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 239.879813]  process_one_work+0x55f/0xa50
[ 239.880398]  ? __pfx_process_one_work+0x10/0x10
[ 239.881073]  ? _raw_spin_lock_irq+0x5c/0x90
[ 239.881675]  worker_thread+0x69/0x660
[ 239.882206]  ? __kthread_parkme+0xe4/0x100
[ 239.882786]  ? __pfx_worker_thread+0x10/0x10
[ 239.883395]  kthread+0x1a1/0x1e0
[ 239.883863]  ? __pfx_kthread+0x10/0x10
[ 239.884406]  ret_from_fork+0x2c/0x50
[ 239.884925]  </TASK>
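Read together, the four dependency chains above form the following cycle
(paths abridged from the splat; this is an interpretation, not additional
lockdep output):

  &mddev->delete_mutex   -> &disk->open_mutex        (mddev_unlock -> export_rdev -> blkdev_put)
  &disk->open_mutex      -> &mddev->open_mutex       (blkdev_get_whole -> md_open)
  &mddev->open_mutex     -> disable_writeback_work   (do_md_stop -> __md_stop -> r5l_exit_log -> __flush_work)
  disable_writeback_work -> &mddev->delete_mutex     (r5c_disable_writeback_async -> mddev_unlock)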
Hi,

On 2023/06/21 5:38, Song Liu wrote:
> On Tue, May 23, 2023 at 6:33 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>> [...]
>
> Hmm.. it appears that I missed a circular locking dependency with mdadm
> test 21raid5cache (delete_mutex and open_mutex).
>
> Please take a look at this.

Thanks. I think the following patch can fix this, but this test keeps
failing in my VM. Song, does the test pass for you?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1be64ab76f5c..62b7d2d0e873 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -766,17 +766,29 @@ static void md_free_rdev(struct mddev *mddev)
 {
        struct md_rdev *rdev;
        struct md_rdev *tmp;
+       LIST_HEAD(delete);
 
-       mutex_lock(&mddev->delete_mutex);
-       if (list_empty(&mddev->deleting))
-               goto out;
+retry:
+       if (list_empty_careful(&mddev->deleting))
+               return;
+
+       if (!mutex_trylock(&mddev->delete_mutex))
+               goto retry;
 
-       list_for_each_entry_safe(rdev, tmp, &mddev->deleting, same_set) {
+       /*
+        * Clear 'mddev->deleting' first, because export_rdev() can trigger
+        * mddev_unlock(), and grabbing 'delete_mutex' again would cause a
+        * deadlock. This makes sure md_free_rdev() will exit directly until
+        * all rdevs are deleted.
+        */
+       list_splice_init(&mddev->deleting, &delete);
+
+       list_for_each_entry_safe(rdev, tmp, &delete, same_set) {
                list_del_init(&rdev->same_set);
                kobject_del(&rdev->kobj);
                export_rdev(rdev, mddev);
        }
-out:
+
        mutex_unlock(&mddev->delete_mutex);
 }

> Thanks,
> Song
>
> [... lockdep report quoted in full, see above ...]
Hi,

On 2023/06/21 10:25, Yu Kuai wrote:
> Hi,
>
> On 2023/06/21 5:38, Song Liu wrote:
>> [...]
>>
>> Hmm.. it appears that I missed a circular locking dependency with mdadm
>> test 21raid5cache (delete_mutex and open_mutex).
>>
>> Please take a look at this.
>
> Thanks. I think the following patch can fix this, but this test keeps
> failing in my VM. Song, does the test pass for you?

I just figured out a better solution; please ignore this patch. I'll
send a fix soon.

Thanks,
Kuai

> [... earlier patch and lockdep report quoted in full, see above ...]
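The v3 patch itself, as posted in the archived message, follows: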
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7455bc9d8498..a0a2be43aabf 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -87,11 +87,11 @@ static struct module *md_cluster_mod;
 static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
 static struct workqueue_struct *md_wq;
 static struct workqueue_struct *md_misc_wq;
-static struct workqueue_struct *md_rdev_misc_wq;
 
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this);
 static void mddev_detach(struct mddev *mddev);
+static void export_rdev(struct md_rdev *rdev);
 
 /*
  * Default number of read corrections we'll attempt on an rdev
@@ -643,9 +643,11 @@ void mddev_init(struct mddev *mddev)
 {
 	mutex_init(&mddev->open_mutex);
 	mutex_init(&mddev->reconfig_mutex);
+	mutex_init(&mddev->delete_mutex);
 	mutex_init(&mddev->bitmap_info.mutex);
 	INIT_LIST_HEAD(&mddev->disks);
 	INIT_LIST_HEAD(&mddev->all_mddevs);
+	INIT_LIST_HEAD(&mddev->deleting);
 	timer_setup(&mddev->safemode_timer, md_safemode_timeout, 0);
 	atomic_set(&mddev->active, 1);
 	atomic_set(&mddev->openers, 0);
@@ -747,6 +749,23 @@ static void mddev_free(struct mddev *mddev)
 
 static const struct attribute_group md_redundancy_group;
 
+static void md_free_rdev(struct mddev *mddev)
+{
+	struct md_rdev *rdev;
+	struct md_rdev *tmp;
+
+	if (list_empty_careful(&mddev->deleting))
+		return;
+
+	mutex_lock(&mddev->delete_mutex);
+	list_for_each_entry_safe(rdev, tmp, &mddev->deleting, same_set) {
+		list_del_init(&rdev->same_set);
+		kobject_del(&rdev->kobj);
+		export_rdev(rdev);
+	}
+	mutex_unlock(&mddev->delete_mutex);
+}
+
 void mddev_unlock(struct mddev *mddev)
 {
 	if (mddev->to_remove) {
@@ -788,6 +807,8 @@ void mddev_unlock(struct mddev *mddev)
 	} else
 		mutex_unlock(&mddev->reconfig_mutex);
 
+	md_free_rdev(mddev);
+
 	/* As we've dropped the mutex we need a spinlock to
 	 * make sure the thread doesn't disappear
 	 */
@@ -2428,13 +2449,6 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
 	return err;
 }
 
-static void rdev_delayed_delete(struct work_struct *ws)
-{
-	struct md_rdev *rdev = container_of(ws, struct md_rdev, del_work);
-	kobject_del(&rdev->kobj);
-	kobject_put(&rdev->kobj);
-}
-
 void md_autodetect_dev(dev_t dev);
 
 static void export_rdev(struct md_rdev *rdev)
@@ -2452,6 +2466,8 @@ static void export_rdev(struct md_rdev *rdev)
 
 static void md_kick_rdev_from_array(struct md_rdev *rdev)
 {
+	struct mddev *mddev = rdev->mddev;
+
 	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
 	list_del_rcu(&rdev->same_set);
 	pr_debug("md: unbind<%pg>\n", rdev->bdev);
@@ -2465,15 +2481,17 @@ static void md_kick_rdev_from_array(struct md_rdev *rdev)
 	rdev->sysfs_unack_badblocks = NULL;
 	rdev->sysfs_badblocks = NULL;
 	rdev->badblocks.count = 0;
-	/* We need to delay this, otherwise we can deadlock when
-	 * writing to 'remove' to "dev/state". We also need
-	 * to delay it due to rcu usage.
-	 */
+
 	synchronize_rcu();
-	INIT_WORK(&rdev->del_work, rdev_delayed_delete);
-	kobject_get(&rdev->kobj);
-	queue_work(md_rdev_misc_wq, &rdev->del_work);
-	export_rdev(rdev);
+
+	/*
+	 * kobject_del() will wait for all in progress writers to be done, where
+	 * reconfig_mutex is held, hence it can't be called under
+	 * reconfig_mutex and it's delayed to mddev_unlock().
+	 */
+	mutex_lock(&mddev->delete_mutex);
+	list_add(&rdev->same_set, &mddev->deleting);
+	mutex_unlock(&mddev->delete_mutex);
 }
 
 static void export_array(struct mddev *mddev)
@@ -3541,6 +3559,7 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
 {
 	struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr);
 	struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj);
+	struct kernfs_node *kn = NULL;
 	ssize_t rv;
 	struct mddev *mddev = rdev->mddev;
 
@@ -3548,6 +3567,10 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
 		return -EIO;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
+
+	if (entry->store == state_store && cmd_match(page, "remove"))
+		kn = sysfs_break_active_protection(kobj, attr);
+
 	rv = mddev ? mddev_lock(mddev) : -ENODEV;
 	if (!rv) {
 		if (rdev->mddev == NULL)
@@ -3556,6 +3579,10 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
 			rv = entry->store(rdev, page, length);
 		mddev_unlock(mddev);
 	}
+
+	if (kn)
+		sysfs_unbreak_active_protection(kn);
+
 	return rv;
 }
 
@@ -4479,20 +4506,6 @@ null_show(struct mddev *mddev, char *page)
 	return -EINVAL;
 }
 
-/* need to ensure rdev_delayed_delete() has completed */
-static void flush_rdev_wq(struct mddev *mddev)
-{
-	struct md_rdev *rdev;
-
-	rcu_read_lock();
-	rdev_for_each_rcu(rdev, mddev)
-		if (work_pending(&rdev->del_work)) {
-			flush_workqueue(md_rdev_misc_wq);
-			break;
-		}
-	rcu_read_unlock();
-}
-
 static ssize_t
 new_dev_store(struct mddev *mddev, const char *buf, size_t len)
 {
@@ -4520,7 +4533,6 @@ new_dev_store(struct mddev *mddev, const char *buf, size_t len)
 	    minor != MINOR(dev))
 		return -EOVERFLOW;
 
-	flush_rdev_wq(mddev);
 	err = mddev_lock(mddev);
 	if (err)
 		return err;
@@ -5590,7 +5602,6 @@ struct mddev *md_alloc(dev_t dev, char *name)
 	 * removed (mddev_delayed_delete).
 	 */
 	flush_workqueue(md_misc_wq);
-	flush_workqueue(md_rdev_misc_wq);
 
 	mutex_lock(&disks_mutex);
 	mddev = mddev_alloc(dev);
@@ -7553,9 +7564,6 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
 
 	}
 
-	if (cmd == ADD_NEW_DISK || cmd == HOT_ADD_DISK)
-		flush_rdev_wq(mddev);
-
 	if (cmd == HOT_REMOVE_DISK)
 		/* need to ensure recovery thread has run */
 		wait_event_interruptible_timeout(mddev->sb_wait,
@@ -9618,10 +9626,6 @@ static int __init md_init(void)
 	if (!md_misc_wq)
 		goto err_misc_wq;
 
-	md_rdev_misc_wq = alloc_workqueue("md_rdev_misc", 0, 0);
-	if (!md_rdev_misc_wq)
-		goto err_rdev_misc_wq;
-
 	ret = __register_blkdev(MD_MAJOR, "md", md_probe);
 	if (ret < 0)
 		goto err_md;
@@ -9640,8 +9644,6 @@ static int __init md_init(void)
 err_mdp:
 	unregister_blkdev(MD_MAJOR, "md");
 err_md:
-	destroy_workqueue(md_rdev_misc_wq);
-err_rdev_misc_wq:
 	destroy_workqueue(md_misc_wq);
 err_misc_wq:
 	destroy_workqueue(md_wq);
@@ -9937,7 +9939,6 @@ static __exit void md_exit(void)
 	}
 	spin_unlock(&all_mddevs_lock);
 
-	destroy_workqueue(md_rdev_misc_wq);
 	destroy_workqueue(md_misc_wq);
 	destroy_workqueue(md_wq);
 }
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 1eec65cf783c..4d191db831da 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -531,6 +531,14 @@ struct mddev {
 	unsigned int			good_device_nr;	/* good device num within cluster raid */
 	unsigned int			noio_flag; /* for memalloc scope API */
 
+	/*
+	 * Temporarily store rdev that will be finally removed when
+	 * reconfig_mutex is unlocked.
+	 */
+	struct list_head		deleting;
+	/* Protect the deleting list */
+	struct mutex			delete_mutex;
+
 	bool	has_superblocks:1;
 	bool	fail_last_dev:1;
 	bool	serialize_policy:1;
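To summarize the flow the patch establishes (a condensed reading of the
hunks above, not additional code from the patch):

/*
 * Removal via sysfs, after this patch (condensed):
 *
 *   write "remove" to dev-XXX/state      rdev_attr_store()
 *     -> sysfs_break_active_protection()  allow the entry to delete itself
 *     -> mddev_lock()                     take reconfig_mutex
 *     -> md_kick_rdev_from_array()        list_del_rcu() + synchronize_rcu(),
 *                                         then move the rdev onto
 *                                         mddev->deleting under delete_mutex
 *     -> mddev_unlock()                   drop reconfig_mutex, then
 *          md_free_rdev()                 kobject_del() + export_rdev() for
 *                                         every rdev on mddev->deleting
 *     -> sysfs_unbreak_active_protection()
 *
 * The sysfs entry is therefore gone before the writer returns, so a
 * subsequent add no longer races with a still-pending delayed deletion.
 */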