Message ID | 20221217030908.1261787-1-yukuai1@huaweicloud.com (mailing list archive)
Series     | blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
Hello,

On Sat, Dec 17, 2022 at 11:09:04AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> iocost is initialized when it's configured the first time, and iocost
> initializing can race with del_gendisk(), which will cause a null
> pointer dereference:
>
> t1                          t2
> ioc_qos_write
>  blk_iocost_init
>   rq_qos_add
>                             del_gendisk
>                              rq_qos_exit
>                              // iocost is removed from q->rq_qos
>   blkcg_activate_policy
>    pd_init_fn
>     ioc_pd_init
>      ioc = q_to_ioc(blkg->q)
>      // can't find iocost and return null
>
> And iolatency is about to switch to the same lazy initialization.
>
> This patchset fixes the problem by synchronizing rq_qos_add() and
> blkcg_activate_policy() with rq_qos_exit().

So, the patchset seems a bit overly complicated to me. What do you think
about the following?

* These init/exit paths are super cold paths; just protecting them with
  a global mutex is probably enough. If we encounter a scalability
  problem, it's easy to fix down the line.

* If we're synchronizing this with a mutex anyway, there's no need to
  grab queue_lock, right? rq_qos_add/del/exit() can all just hold the
  mutex.

* And we can keep the state tracking within rq_qos. When rq_qos_exit()
  is called, mark it so that future adds will fail - be that a special
  ->next value, a queue flag, or whatever.

Thanks.
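For illustration, a minimal sketch of the mutex approach described above. The mutex name and the use of blk_queue_dying() as the "future adds will fail" marker are assumptions, not from the patchset:

/*
 * Sketch only: one global mutex serializes add/exit, replacing
 * queue_lock here entirely. The "exited" marker is approximated
 * with blk_queue_dying(); a dedicated flag or special ->next
 * value would work the same way.
 */
static DEFINE_MUTEX(rq_qos_mutex);

int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
{
        int ret = 0;

        mutex_lock(&rq_qos_mutex);
        /* fail future adds once the queue is going away */
        if (blk_queue_dying(q)) {
                ret = -ENODEV;
                goto out;
        }
        rqos->next = q->rq_qos;
        q->rq_qos = rqos;
out:
        mutex_unlock(&rq_qos_mutex);
        return ret;
}

void rq_qos_exit(struct request_queue *q)
{
        mutex_lock(&rq_qos_mutex);
        while (q->rq_qos) {
                struct rq_qos *rqos = q->rq_qos;

                q->rq_qos = rqos->next;
                rqos->ops->exit(rqos);
        }
        mutex_unlock(&rq_qos_mutex);
}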
Hi,

On 2022/12/20 4:55, Tejun Heo wrote:
> Hello,
>
> On Sat, Dec 17, 2022 at 11:09:04AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> [...]
>
> So, the patchset seems a bit overly complicated to me. What do you
> think about the following?
>
> * These init/exit paths are super cold paths; just protecting them
>   with a global mutex is probably enough. If we encounter a
>   scalability problem, it's easy to fix down the line.
>
> * If we're synchronizing this with a mutex anyway, there's no need to
>   grab queue_lock, right? rq_qos_add/del/exit() can all just hold the
>   mutex.
>
> * And we can keep the state tracking within rq_qos. When rq_qos_exit()
>   is called, mark it so that future adds will fail - be that a special
>   ->next value, a queue flag, or whatever.

Yes, that sounds good. BTW, queue_lock is also used to protect
pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
problematic:

blkcg_activate_policy
 spin_lock_irq(&q->queue_lock);
 list_for_each_entry_reverse(blkg, &q->blkg_list)
  pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN, ...) -> failed

 spin_unlock_irq(&q->queue_lock);
 // Releasing queue_lock here is problematic: it can lead to
 // pd_offline_fn() being called without pd_init_fn() having run.
 pd_alloc_fn(__GFP_NOWARN, ...)

If we are using a mutex to protect the rq_qos ops, it seems right to
also use the mutex to protect the blkcg_policy ops, and this problem can
be fixed because the mutex can be held while allocating memory with
GFP_KERNEL. What do you think?

Thanks,
Kuai

> Thanks.
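As a sketch of the upside: if a mutex (rather than queue_lock) serialized activation against blkg destruction, the allocation loop could use GFP_KERNEL directly and never drop the lock mid-walk. The loop below is illustrative only; error unwinding for already-allocated pds is omitted:

/*
 * Sketch only: with a sleepable lock held, every pd can be
 * allocated with GFP_KERNEL in a single pass, so there is no
 * GFP_NOWAIT attempt and no unlock/retry window in which a blkg
 * can be torn down half-initialized.
 */
list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
        struct blkg_policy_data *pd;

        pd = pol->pd_alloc_fn(GFP_KERNEL, q, blkg->blkcg);
        if (!pd)
                return -ENOMEM; /* unwinding earlier pds omitted here */

        blkg->pd[pol->plid] = pd;
        pd->blkg = blkg;
        pd->plid = pol->plid;
        if (pol->pd_init_fn)
                pol->pd_init_fn(pd);
}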
Hello,

On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
> Yes, that sounds good. BTW, queue_lock is also used to protect
> pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
> problematic:
>
> blkcg_activate_policy
>  spin_lock_irq(&q->queue_lock);
>  list_for_each_entry_reverse(blkg, &q->blkg_list)
>   pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN, ...) -> failed
>
>  spin_unlock_irq(&q->queue_lock);
>  // Releasing queue_lock here is problematic: it can lead to
>  // pd_offline_fn() being called without pd_init_fn() having run.
>  pd_alloc_fn(__GFP_NOWARN, ...)

So, if a blkg is destroyed while a policy is being activated, right?

> If we are using a mutex to protect the rq_qos ops, it seems right to
> also use the mutex to protect the blkcg_policy ops, and this problem
> can be fixed because the mutex can be held while allocating memory
> with GFP_KERNEL. What do you think?

One worry is that switching to a mutex can be more of a headache due to
destroy path synchronization. Another approach would be using a per-blkg
flag to track whether a blkg has been initialized.

Thanks.
Hi,

On 2022/12/21 0:01, Tejun Heo wrote:
> Hello,
>
> On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
>> Yes, that sounds good. BTW, queue_lock is also used to protect
>> pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
>> problematic:
>>
>> blkcg_activate_policy
>>  spin_lock_irq(&q->queue_lock);
>>  list_for_each_entry_reverse(blkg, &q->blkg_list)
>>   pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN, ...) -> failed
>>
>>  spin_unlock_irq(&q->queue_lock);
>>  // Releasing queue_lock here is problematic: it can lead to
>>  // pd_offline_fn() being called without pd_init_fn() having run.
>>  pd_alloc_fn(__GFP_NOWARN, ...)
>
> So, if a blkg is destroyed while a policy is being activated, right?

Yes, removing a cgroup can race with this; for bfq a null pointer
dereference will be triggered in bfq_pd_offline().

>> If we are using a mutex to protect the rq_qos ops, it seems right to
>> also use the mutex to protect the blkcg_policy ops, and this problem
>> can be fixed because the mutex can be held while allocating memory
>> with GFP_KERNEL. What do you think?
>
> One worry is that switching to a mutex can be more of a headache due
> to destroy path synchronization. Another approach would be using a
> per-blkg flag to track whether a blkg has been initialized.

I think perhaps you mean a per-blkg_policy_data flag? A per-blkg flag
would not work in this case.

Thanks,
Kuai

> Thanks.
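For illustration, a minimal sketch of the per-blkg_policy_data tracking being discussed; the "initialized" field and the offline-path check are hypothetical, not an agreed design:

/*
 * Sketch only: record per-pd whether pd_init_fn() completed, so
 * the destroy path never offlines a pd that was allocated but
 * not initialized.
 */
struct blkg_policy_data {
        struct blkcg_gq         *blkg;
        int                     plid;
        bool                    initialized;    /* pd_init_fn() has run */
};

/* on the destroy path, skip pds that never completed init */
if (pd && pd->initialized && pol->pd_offline_fn)
        pol->pd_offline_fn(pd);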
Hi,

On 2022/12/21 9:10, Yu Kuai wrote:
> Hi,
>
> On 2022/12/21 0:01, Tejun Heo wrote:
>> [...]
>>
>> So, if a blkg is destroyed while a policy is being activated, right?
>
> Yes, removing a cgroup can race with this; for bfq a null pointer
> dereference will be triggered in bfq_pd_offline().

BTW, we just found that pd_online_fn() is missed in
blkcg_activate_policy()... Currently this is not a problem because only
blk-throttle implements it, and blk-throttle is activated while the blkg
is created.

Thanks,
Kuai
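For illustration, a sketch of where the missing call could go, after pd_init_fn() inside blkcg_activate_policy()'s per-blkg loop; the placement is an assumption, not a patch:

/*
 * Sketch only: once a pd has been initialized during policy
 * activation, it should also be brought online, mirroring what
 * blkg creation does for already-active policies.
 */
list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
        struct blkg_policy_data *pd = blkg->pd[pol->plid];

        if (pol->pd_init_fn)
                pol->pd_init_fn(pd);
        if (pol->pd_online_fn)
                pol->pd_online_fn(pd);
}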
On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
> If we are using a mutex to protect the rq_qos ops, it seems right to
> also use the mutex to protect the blkcg_policy ops, and this problem
> can be fixed because the mutex can be held while allocating memory
> with GFP_KERNEL. What do you think?

Getting rid of the atomic allocations would be awesome. FYI, I'm also in
favor of everything that moves things out of queue_lock into more
dedicated locks. queue_lock is such an undocumented mess of unrelated
things right now that splitting it up and removing it is becoming more
and more important.
From: Yu Kuai <yukuai3@huawei.com>

iocost is initialized when it's configured the first time, and iocost
initializing can race with del_gendisk(), which will cause a null
pointer dereference:

t1                          t2
ioc_qos_write
 blk_iocost_init
  rq_qos_add
                            del_gendisk
                             rq_qos_exit
                             // iocost is removed from q->rq_qos
  blkcg_activate_policy
   pd_init_fn
    ioc_pd_init
     ioc = q_to_ioc(blkg->q)
     // can't find iocost and return null

And iolatency is about to switch to the same lazy initialization.

This patchset fixes the problem by synchronizing rq_qos_add() and
blkcg_activate_policy() with rq_qos_exit().

Yu Kuai (4):
  block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit()
  block/rq_qos: fail rq_qos_add() after del_gendisk()
  blk-cgroup: add a new interface blkcg_conf_close_bdev()
  blk-cgroup: synchronize del_gendisk() with configuring cgroup policy

 block/blk-cgroup.c     | 12 ++++++++++--
 block/blk-cgroup.h     |  1 +
 block/blk-iocost.c     |  8 ++++----
 block/blk-rq-qos.c     | 25 ++++++++++++++++++++-----
 block/blk-rq-qos.h     | 17 +++++++++++++----
 include/linux/blkdev.h |  1 +
 6 files changed, 49 insertions(+), 15 deletions(-)
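For reference, a minimal sketch of what the first patch's title suggests (taking queue_lock around 'q->rq_qos' in rq_qos_exit()). This illustrates the idea only, not the actual patch; it drops the lock around ->exit() on the assumption that exit callbacks may sleep:

/*
 * Sketch only: queue_lock protects the q->rq_qos list walk so it
 * cannot race with rq_qos_add(), while each ->exit() callback runs
 * unlocked because it may sleep (e.g. freeing percpu data).
 */
void rq_qos_exit(struct request_queue *q)
{
        spin_lock_irq(&q->queue_lock);
        while (q->rq_qos) {
                struct rq_qos *rqos = q->rq_qos;

                q->rq_qos = rqos->next;
                spin_unlock_irq(&q->queue_lock);
                rqos->ops->exit(rqos);
                spin_lock_irq(&q->queue_lock);
        }
        spin_unlock_irq(&q->queue_lock);
}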