Message ID | 20221108181030.1611703-1-khazhy@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] bfq: fix waker_bfqq inconsistency crash |
On Tue 08-11-22 10:10:29, Khazhismel Kumykov wrote:
> This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> but woken_list_node still being hashed. This would happen when
> bfq_init_rq() expects a brand new allocated queue to be returned from
> bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> without resetting woken_list_node. Since we can always return oom_bfqq
> when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
>
> Avoid setting woken_bfqq for oom_bfqq entirely, as it's not useful.
>
> Crashes would have a stacktrace like:
> [160595.656560]  bfq_add_bfqq_busy+0x110/0x1ec
> [160595.661142]  bfq_add_request+0x6bc/0x980
> [160595.666602]  bfq_insert_request+0x8ec/0x1240
> [160595.671762]  bfq_insert_requests+0x58/0x9c
> [160595.676420]  blk_mq_sched_insert_request+0x11c/0x198
> [160595.682107]  blk_mq_submit_bio+0x270/0x62c
> [160595.686759]  __submit_bio_noacct_mq+0xec/0x178
> [160595.691926]  submit_bio+0x120/0x184
> [160595.695990]  ext4_mpage_readpages+0x77c/0x7c8
> [160595.701026]  ext4_readpage+0x60/0xb0
> [160595.705158]  filemap_read_page+0x54/0x114
> [160595.711961]  filemap_fault+0x228/0x5f4
> [160595.716272]  do_read_fault+0xe0/0x1f0
> [160595.720487]  do_fault+0x40/0x1c8
>
> Tested by injecting random failures into bfq_get_queue, crashes go away
> completely.
>
> Fixes: 8ef3fc3a043c ("block, bfq: make shared queues inherit wakers")
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>

Looks good. Thanks! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bfq-iosched.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 7ea427817f7f..ca04ec868c40 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -6784,6 +6784,12 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
>  				bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio,
>  								 true, is_sync,
>  								 NULL);
> +				if (unlikely(bfqq == &bfqd->oom_bfqq))
> +					bfqq_already_existing = true;
> +			} else
> +				bfqq_already_existing = true;
> +
> +			if (!bfqq_already_existing) {
>  				bfqq->waker_bfqq = old_bfqq->waker_bfqq;
>  				bfqq->tentative_waker_bfqq = NULL;
>
> @@ -6797,8 +6803,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
>  				if (bfqq->waker_bfqq)
>  					hlist_add_head(&bfqq->woken_list_node,
>  						       &bfqq->waker_bfqq->woken_list);
> -			} else
> -				bfqq_already_existing = true;
> +			}
>  		}
>  	}
>
> --
> 2.38.1.431.g37b22c650d-goog
>
On Tue, 8 Nov 2022 10:10:29 -0800, Khazhismel Kumykov wrote:
> This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> but woken_list_node still being hashed. This would happen when
> bfq_init_rq() expects a brand new allocated queue to be returned from
> bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> without resetting woken_list_node. Since we can always return oom_bfqq
> when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
>
> [...]

Applied, thanks!

[1/2] bfq: fix waker_bfqq inconsistency crash
      commit: a1795c2ccb1e4c49220d2a0d381540024d71647c
[2/2] bfq: ignore oom_bfqq in bfq_check_waker
      commit: 99771d73ff4539f2337b84917f4792abf0d8931b

Best regards,
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7ea427817f7f..ca04ec868c40 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6784,6 +6784,12 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 				bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio,
 								 true, is_sync,
 								 NULL);
+				if (unlikely(bfqq == &bfqd->oom_bfqq))
+					bfqq_already_existing = true;
+			} else
+				bfqq_already_existing = true;
+
+			if (!bfqq_already_existing) {
 				bfqq->waker_bfqq = old_bfqq->waker_bfqq;
 				bfqq->tentative_waker_bfqq = NULL;

@@ -6797,8 +6803,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 				if (bfqq->waker_bfqq)
 					hlist_add_head(&bfqq->woken_list_node,
 						       &bfqq->waker_bfqq->woken_list);
-			} else
-				bfqq_already_existing = true;
+			}
 		}
 	}
This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
but woken_list_node still being hashed. This would happen when
bfq_init_rq() expects a brand new allocated queue to be returned from
bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
without resetting woken_list_node. Since we can always return oom_bfqq
when attempting to allocate, we cannot assume waker_bfqq starts as NULL.

Avoid setting woken_bfqq for oom_bfqq entirely, as it's not useful.

Crashes would have a stacktrace like:
[160595.656560]  bfq_add_bfqq_busy+0x110/0x1ec
[160595.661142]  bfq_add_request+0x6bc/0x980
[160595.666602]  bfq_insert_request+0x8ec/0x1240
[160595.671762]  bfq_insert_requests+0x58/0x9c
[160595.676420]  blk_mq_sched_insert_request+0x11c/0x198
[160595.682107]  blk_mq_submit_bio+0x270/0x62c
[160595.686759]  __submit_bio_noacct_mq+0xec/0x178
[160595.691926]  submit_bio+0x120/0x184
[160595.695990]  ext4_mpage_readpages+0x77c/0x7c8
[160595.701026]  ext4_readpage+0x60/0xb0
[160595.705158]  filemap_read_page+0x54/0x114
[160595.711961]  filemap_fault+0x228/0x5f4
[160595.716272]  do_read_fault+0xe0/0x1f0
[160595.720487]  do_fault+0x40/0x1c8

Tested by injecting random failures into bfq_get_queue, crashes go away
completely.

Fixes: 8ef3fc3a043c ("block, bfq: make shared queues inherit wakers")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
---
 block/bfq-iosched.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
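
For readers who want to see the inconsistency in isolation, the following is a minimal user-space sketch, not kernel code. Every sk_ name is a hypothetical stand-in introduced here for illustration: struct sk_queue models only the waker_bfqq, woken_list_node and woken_list fields, and the list helpers mimic what hlist_add_head() and hlist_unhashed() do without using <linux/list.h>. It assumes the invariant implied by the commit message: a queue has a waker if and only if its woken node is hashed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's hlist types. */
struct sk_hlist_node { struct sk_hlist_node *next, **pprev; };
struct sk_hlist_head { struct sk_hlist_node *first; };

struct sk_queue {
	struct sk_queue *waker;           /* models bfqq->waker_bfqq */
	struct sk_hlist_node woken_node;  /* models bfqq->woken_list_node */
	struct sk_hlist_head woken_list;  /* models bfqq->woken_list */
};

/* A node is "unhashed" when it is not linked into any list. */
bool sk_unhashed(const struct sk_hlist_node *n)
{
	return n->pprev == NULL;
}

/* Link n at the head of h. */
void sk_add_head(struct sk_hlist_node *n, struct sk_hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Assumed invariant: waker set <=> woken_node is on a woken_list. */
bool sk_consistent(const struct sk_queue *q)
{
	return (q->waker != NULL) == !sk_unhashed(&q->woken_node);
}

/*
 * Models the pre-fix path: the waker is overwritten without first
 * unlinking woken_node, which is only safe for a brand new queue.
 */
void sk_inherit_unconditional(struct sk_queue *q, const struct sk_queue *old)
{
	assert(q && old);
	q->waker = old->waker;
	if (q->waker)
		sk_add_head(&q->woken_node, &q->waker->woken_list);
}

/*
 * Models the patched path: the shared fallback queue (oom) is treated
 * as already existing, so its waker state is left untouched.
 */
void sk_inherit_fixed(struct sk_queue *q, const struct sk_queue *old,
		      const struct sk_queue *oom)
{
	if (q == oom)
		return;
	sk_inherit_unconditional(q, old);
}
```

If the fallback queue already has a waker and the old queue has none, sk_inherit_unconditional() leaves waker NULL while woken_node stays linked, which is the state the commit message describes. sk_inherit_fixed() leaves the fallback queue as it was.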