Message ID | 20240130021843.3608859-14-yukuai1@huaweicloud.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mike Snitzer |
Headers | show |
Series | dm-raid: fix v6.7 regressions | expand |
On Tue, 30 Jan 2024, Yu Kuai wrote: > From: Yu Kuai <yukuai3@huawei.com> > > __dm_destroy() guarantee that device openers is zero, and then > only call 'presuspend' and 'postsuspend' for the target. For > request-based dm, 'md->holders' will be grabbed for each rq and > __dm_destroy() will wait for 'md->holders' to be zero. However, for > bio-based device, __dm_destroy() doesn't wait for all bios to be done. > > Fix this problem by calling dm_wait_for_completion() to wail for all > inflight IO to be done, like what dm_suspend() does. If the number of openers is zero, it is guaranteed that there are no bios in flight. Therefore, we don't have to wait for them. If there are bios in flight, it is a bug in the code that issues the bios. You can put WARN_ON(dm_in_flight_bios(md)) there. Mikulas > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- > drivers/md/dm.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > index 8dcabf84d866..2c0eae67d0f1 100644 > --- a/drivers/md/dm.c > +++ b/drivers/md/dm.c > @@ -58,6 +58,7 @@ static DEFINE_IDR(_minor_idr); > static DEFINE_SPINLOCK(_minor_lock); > > static void do_deferred_remove(struct work_struct *w); > +static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_state); > > static DECLARE_WORK(deferred_remove_work, do_deferred_remove); > > @@ -2495,6 +2496,8 @@ static void __dm_destroy(struct mapped_device *md, bool wait) > if (!dm_suspended_md(md)) { > dm_table_presuspend_targets(map); > set_bit(DMF_SUSPENDED, &md->flags); > + if (wait) > + dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE); > set_bit(DMF_POST_SUSPENDING, &md->flags); > dm_table_postsuspend_targets(map); > } > -- > 2.39.2 >
Hi, 在 2024/01/30 19:46, Mikulas Patocka 写道: > > > On Tue, 30 Jan 2024, Yu Kuai wrote: > >> From: Yu Kuai <yukuai3@huawei.com> >> >> __dm_destroy() guarantee that device openers is zero, and then >> only call 'presuspend' and 'postsuspend' for the target. For >> request-based dm, 'md->holders' will be grabbed for each rq and >> __dm_destroy() will wait for 'md->holders' to be zero. However, for >> bio-based device, __dm_destroy() doesn't wait for all bios to be done. >> >> Fix this problem by calling dm_wait_for_completion() to wail for all >> inflight IO to be done, like what dm_suspend() does. > > If the number of openers is zero, it is guaranteed that there are no bios > in flight. Therefore, we don't have to wait for them. > > If there are bios in flight, it is a bug in the code that issues the bios. > You can put WARN_ON(dm_in_flight_bios(md)) there. I add this patch because while testing, there is a problem that is hard to reporduce, as I mentioned in the other thread. I'll add BUG_ON() and try if I can still reporduce this problem without triggering it. Thanks, Kuai [12504.959682] BUG bio-296 (Not tainted): Object already free [12504.960239] ----------------------------------------------------------------------------- [12504.960239] [12504.961209] Allocated in mempool_alloc+0xe8/0x270 age=30 cpu=1 pid=203288 [12504.961905] kmem_cache_alloc+0x36a/0x3b0 [12504.962324] mempool_alloc+0xe8/0x270 [12504.962712] bio_alloc_bioset+0x3b5/0x920 [12504.963129] bio_alloc_clone+0x3e/0x160 [12504.963533] alloc_io+0x3d/0x1f0 [12504.963876] dm_submit_bio+0x12f/0xa30 [12504.964267] __submit_bio+0x9c/0xe0 [12504.964639] submit_bio_noacct_nocheck+0x25a/0x570 [12504.965136] submit_bio_wait+0xc2/0x160 [12504.965535] blkdev_issue_zeroout+0x19b/0x2e0 [12504.965991] ext4_init_inode_table+0x246/0x560 [12504.966462] ext4_lazyinit_thread+0x750/0xbe0 [12504.966922] kthread+0x1b4/0x1f0 > > Mikulas > >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >> --- >> drivers/md/dm.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/md/dm.c b/drivers/md/dm.c >> index 8dcabf84d866..2c0eae67d0f1 100644 >> --- a/drivers/md/dm.c >> +++ b/drivers/md/dm.c >> @@ -58,6 +58,7 @@ static DEFINE_IDR(_minor_idr); >> static DEFINE_SPINLOCK(_minor_lock); >> >> static void do_deferred_remove(struct work_struct *w); >> +static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_state); >> >> static DECLARE_WORK(deferred_remove_work, do_deferred_remove); >> >> @@ -2495,6 +2496,8 @@ static void __dm_destroy(struct mapped_device *md, bool wait) >> if (!dm_suspended_md(md)) { >> dm_table_presuspend_targets(map); >> set_bit(DMF_SUSPENDED, &md->flags); >> + if (wait) >> + dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE); >> set_bit(DMF_POST_SUSPENDING, &md->flags); >> dm_table_postsuspend_targets(map); >> } >> -- >> 2.39.2 >> > > . >
Hi, 在 2024/01/30 21:05, Yu Kuai 写道: > Hi, > > 在 2024/01/30 19:46, Mikulas Patocka 写道: >> >> >> On Tue, 30 Jan 2024, Yu Kuai wrote: >> >>> From: Yu Kuai <yukuai3@huawei.com> >>> >>> __dm_destroy() guarantee that device openers is zero, and then >>> only call 'presuspend' and 'postsuspend' for the target. For >>> request-based dm, 'md->holders' will be grabbed for each rq and >>> __dm_destroy() will wait for 'md->holders' to be zero. However, for >>> bio-based device, __dm_destroy() doesn't wait for all bios to be done. >>> >>> Fix this problem by calling dm_wait_for_completion() to wail for all >>> inflight IO to be done, like what dm_suspend() does. >> >> If the number of openers is zero, it is guaranteed that there are no bios >> in flight. Therefore, we don't have to wait for them. >> >> If there are bios in flight, it is a bug in the code that issues the >> bios. >> You can put WARN_ON(dm_in_flight_bios(md)) there. > > I add this patch because while testing, there is a problem that is > hard to reporduce, as I mentioned in the other thread. I'll add BUG_ON() > and try if I can still reporduce this problem without triggering it. > > Thanks, > Kuai > > [12504.959682] BUG bio-296 (Not tainted): Object already free > [12504.960239] > ----------------------------------------------------------------------------- > > [12504.960239] > [12504.961209] Allocated in mempool_alloc+0xe8/0x270 age=30 cpu=1 > pid=203288 > [12504.961905] kmem_cache_alloc+0x36a/0x3b0 > [12504.962324] mempool_alloc+0xe8/0x270 > [12504.962712] bio_alloc_bioset+0x3b5/0x920 > [12504.963129] bio_alloc_clone+0x3e/0x160 > [12504.963533] alloc_io+0x3d/0x1f0 > [12504.963876] dm_submit_bio+0x12f/0xa30 > [12504.964267] __submit_bio+0x9c/0xe0 > [12504.964639] submit_bio_noacct_nocheck+0x25a/0x570 > [12504.965136] submit_bio_wait+0xc2/0x160 > [12504.965535] blkdev_issue_zeroout+0x19b/0x2e0 > [12504.965991] ext4_init_inode_table+0x246/0x560 > [12504.966462] ext4_lazyinit_thread+0x750/0xbe0 > [12504.966922] kthread+0x1b4/0x1f0 After adding the BUG_ON(), I can still reporducing this BUG, this really looks like a BUG, and I don't think this is related to dm-raid. Perhaps you guys can take a look? Thanks, Kuai >> >> Mikulas >> >>> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >>> --- >>> drivers/md/dm.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c >>> index 8dcabf84d866..2c0eae67d0f1 100644 >>> --- a/drivers/md/dm.c >>> +++ b/drivers/md/dm.c >>> @@ -58,6 +58,7 @@ static DEFINE_IDR(_minor_idr); >>> static DEFINE_SPINLOCK(_minor_lock); >>> static void do_deferred_remove(struct work_struct *w); >>> +static int dm_wait_for_completion(struct mapped_device *md, unsigned >>> int task_state); >>> static DECLARE_WORK(deferred_remove_work, do_deferred_remove); >>> @@ -2495,6 +2496,8 @@ static void __dm_destroy(struct mapped_device >>> *md, bool wait) >>> if (!dm_suspended_md(md)) { >>> dm_table_presuspend_targets(map); >>> set_bit(DMF_SUSPENDED, &md->flags); >>> + if (wait) >>> + dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE); >>> set_bit(DMF_POST_SUSPENDING, &md->flags); >>> dm_table_postsuspend_targets(map); >>> } >>> -- >>> 2.39.2 >>> >> >> . >> > > . >
diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 8dcabf84d866..2c0eae67d0f1 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -58,6 +58,7 @@ static DEFINE_IDR(_minor_idr); static DEFINE_SPINLOCK(_minor_lock); static void do_deferred_remove(struct work_struct *w); +static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_state); static DECLARE_WORK(deferred_remove_work, do_deferred_remove); @@ -2495,6 +2496,8 @@ static void __dm_destroy(struct mapped_device *md, bool wait) if (!dm_suspended_md(md)) { dm_table_presuspend_targets(map); set_bit(DMF_SUSPENDED, &md->flags); + if (wait) + dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE); set_bit(DMF_POST_SUSPENDING, &md->flags); dm_table_postsuspend_targets(map); }