| Message ID | 20240520033847.13533-1-yang.yang@vivo.com (mailing list archive) |
|---|---|
| State | New |
| Series | [RFC] blk-mq: fix potential I/O hang caused by batch wakeup |
On 5/19/24 20:38, Yang Yang wrote:
> The depth is 62, and the wake_batch is 8. In the following situation,
> the task would hang forever.
>
> t1:                t2:                         t3:
> blk_mq_get_tag     .                           .
> io_schedule        .                           .
>                    elevator_switch             .
>                    blk_mq_freeze_queue         .
>                    blk_freeze_queue_start      .
>                    blk_mq_freeze_queue_wait    .
>                                                blk_mq_submit_bio
>                                                __bio_queue_enter
>
> Fix this issue by waking up all the waiters sleeping on tags after
> freezing the queue.

Shouldn't blk_mq_alloc_request() be mentioned in t1 since that is the
function that calls blk_queue_enter()?

> diff --git a/block/blk-core.c b/block/blk-core.c
> index a16b5abdbbf5..e1eacfad6e5b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
>  	 * prevent I/O from crossing blk_queue_enter().
>  	 */
>  	blk_freeze_queue_start(q);
> -	if (queue_is_mq(q))
> -		blk_mq_wake_waiters(q);
>  	/* Make blk_queue_enter() reexamine the DYING flag. */
>  	wake_up_all(&q->mq_freeze_wq);
>  }

Why has blk_queue_start_drain() been modified? I don't see any reference
in the patch description to blk_queue_start_drain(). Am I perhaps missing
something?

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4ecb9db62337..9eb3139e713a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
>  	if (++q->mq_freeze_depth == 1) {
>  		percpu_ref_kill(&q->q_usage_counter);
>  		mutex_unlock(&q->mq_freeze_lock);
> -		if (queue_is_mq(q))
> +		if (queue_is_mq(q)) {
> +			blk_mq_wake_waiters(q);
>  			blk_mq_run_hw_queues(q, false);
> +		}
>  	} else {
>  		mutex_unlock(&q->mq_freeze_lock);
>  	}

Why would the above change be necessary? If the blk_queue_enter() call
by blk_mq_alloc_request() succeeds and blk_mq_get_tag() calls
io_schedule(), io_schedule() will be woken up indirectly by the
blk_mq_run_hw_queues() call because that call will free one of the tags
that the io_schedule() call is waiting for.

Thanks,

Bart.
On 2024/5/21 2:11, Bart Van Assche wrote:
> On 5/19/24 20:38, Yang Yang wrote:
>> The depth is 62, and the wake_batch is 8. In the following situation,
>> the task would hang forever.
>>
>> t1:                t2:                         t3:
>> blk_mq_get_tag     .                           .
>> io_schedule        .                           .
>>                    elevator_switch             .
>>                    blk_mq_freeze_queue         .
>>                    blk_freeze_queue_start      .
>>                    blk_mq_freeze_queue_wait    .
>>                                                blk_mq_submit_bio
>>                                                __bio_queue_enter
>>
>> Fix this issue by waking up all the waiters sleeping on tags after
>> freezing the queue.
>
> Shouldn't blk_mq_alloc_request() be mentioned in t1 since that is the
> function that calls blk_queue_enter()?

t1:                        t2:                           t3:
blk_mq_submit_bio          .                             .
__blk_mq_alloc_requests    .                             .
blk_mq_get_tag             .                             .
io_schedule                .                             .
                           elevator_switch               .
                           blk_mq_freeze_queue           .
                           blk_freeze_queue_start        .
                           q->mq_freeze_depth=1          .
                           blk_mq_freeze_queue_wait      .
                                                         blk_mq_submit_bio
                                                         __bio_queue_enter
                                                         wait_event(!q->mq_freeze_depth)

>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index a16b5abdbbf5..e1eacfad6e5b 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
>>  	 * prevent I/O from crossing blk_queue_enter().
>>  	 */
>>  	blk_freeze_queue_start(q);
>> -	if (queue_is_mq(q))
>> -		blk_mq_wake_waiters(q);
>>  	/* Make blk_queue_enter() reexamine the DYING flag. */
>>  	wake_up_all(&q->mq_freeze_wq);
>>  }
>
> Why has blk_queue_start_drain() been modified? I don't see any reference
> in the patch description to blk_queue_start_drain(). Am I perhaps missing
> something?

With this patch, blk_mq_wake_waiters() is already called from
blk_freeze_queue_start(), so I thought it could be removed from
blk_queue_start_drain().
>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 4ecb9db62337..9eb3139e713a 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
>>  	if (++q->mq_freeze_depth == 1) {
>>  		percpu_ref_kill(&q->q_usage_counter);
>>  		mutex_unlock(&q->mq_freeze_lock);
>> -		if (queue_is_mq(q))
>> +		if (queue_is_mq(q)) {
>> +			blk_mq_wake_waiters(q);
>>  			blk_mq_run_hw_queues(q, false);
>> +		}
>>  	} else {
>>  		mutex_unlock(&q->mq_freeze_lock);
>>  	}
>
> Why would the above change be necessary? If the blk_queue_enter() call
> by blk_mq_alloc_request() succeeds and blk_mq_get_tag() calls
> io_schedule(), io_schedule() will be woken up indirectly by the
> blk_mq_run_hw_queues() call because that call will free one of the tags
> that the io_schedule() call is waiting for.

This patch is a workaround. I believe the hang is caused by a lost
wakeup: even after blk_mq_run_hw_queues() runs, t1 is still waiting
for a tag.

bt = 0xFFFFFF802F9C6790 -> (
    sb = (
        depth = 62,
        shift = 6,
        map_nr = 1,
        round_robin = FALSE,
        map = 0xFFFFFF803BF97000,
        alloc_hint = 0x00000049119B3F4C),
    wake_batch = 6,
    wake_index = (counter = 0),
    ws = 0xFFFFFF803BEBCA00,
    ws_active = (counter = 1),
    min_shallow_depth = 48,
    completion_cnt = (counter = 1),
    wakeup_cnt = (counter = 0))

Analyzing the coredump shows sbq->completion_cnt = 1, and I can't
figure out why. blk_mq_put_tag() -> sbitmap_queue_clear() ->
sbitmap_queue_wake_up(sbq, 1) should have been called multiple times,
and given that sbq->ws_active = 1, sbq->completion_cnt should be
greater than 1.

Looking forward to some advice from block layer experts. Thanks.

>
> Thanks,
>
> Bart.
diff --git a/block/blk-core.c b/block/blk-core.c
index a16b5abdbbf5..e1eacfad6e5b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
 	 * prevent I/O from crossing blk_queue_enter().
 	 */
 	blk_freeze_queue_start(q);
-	if (queue_is_mq(q))
-		blk_mq_wake_waiters(q);
 	/* Make blk_queue_enter() reexamine the DYING flag. */
 	wake_up_all(&q->mq_freeze_wq);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4ecb9db62337..9eb3139e713a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
 	if (++q->mq_freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
 		mutex_unlock(&q->mq_freeze_lock);
-		if (queue_is_mq(q))
+		if (queue_is_mq(q)) {
+			blk_mq_wake_waiters(q);
 			blk_mq_run_hw_queues(q, false);
+		}
 	} else {
 		mutex_unlock(&q->mq_freeze_lock);
 	}
The depth is 62, and the wake_batch is 8. In the following situation,
the task would hang forever.

t1:                t2:                         t3:
blk_mq_get_tag     .                           .
io_schedule        .                           .
                   elevator_switch             .
                   blk_mq_freeze_queue         .
                   blk_freeze_queue_start      .
                   blk_mq_freeze_queue_wait    .
                                               blk_mq_submit_bio
                                               __bio_queue_enter

Fix this issue by waking up all the waiters sleeping on tags after
freezing the queue.

Signed-off-by: Yang Yang <yang.yang@vivo.com>
---
 block/blk-core.c | 2 --
 block/blk-mq.c   | 4 +++-
 2 files changed, 3 insertions(+), 3 deletions(-)