diff mbox series

[v2,1/3] block: fix missing dispatching request when queue is started or unquiesced

Message ID 20240903081653.65613-2-songmuchun@bytedance.com (mailing list archive)
State New, archived
Series Fix some starvation problems in block layer

Commit Message

Muchun Song Sept. 3, 2024, 8:16 a.m. UTC
Consider the following scenario with a virtio_blk driver.

CPU0                                    CPU1                                    CPU2

blk_mq_try_issue_directly()
    __blk_mq_issue_directly()
        q->mq_ops->queue_rq()
            virtio_queue_rq()
                blk_mq_stop_hw_queue()
                                        blk_mq_try_issue_directly()             virtblk_done()
                                            if (blk_mq_hctx_stopped())
    blk_mq_request_bypass_insert()                                                  blk_mq_start_stopped_hw_queue()
    blk_mq_run_hw_queue()                                                               blk_mq_run_hw_queue()
                                                blk_mq_insert_request()
                                                return // Who is responsible for dispatching this IO request?

After CPU0 has marked the queue as stopped, CPU1 sees that the queue is
stopped. But before CPU1 puts its request on the dispatch list, CPU2
receives the completion interrupt for a request, so it marks the queue
as non-stopped and runs the hardware queue. Meanwhile, CPU0 also runs
the same hardware queue. After both CPU0 and CPU2 have completed
blk_mq_run_hw_queue(), CPU1 finally puts its request on the same
hardware queue and returns, so nobody dispatches that request. Fix it
by running the hardware queue explicitly after the insertion.
blk_mq_request_issue_directly() has the same problem, so fix it as
well.

Fixes: d964f04a8fde8 ("blk-mq: fix direct issue")
Cc: stable@vger.kernel.org
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 2 ++
 1 file changed, 2 insertions(+)
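
The interleaving described above can be replayed in a small userspace
model. This is only a sketch under stated assumptions: struct toy_hctx
and the toy_*() helpers are hypothetical stand-ins invented for
illustration, not kernel code, and the three CPUs are replayed
sequentially in the order shown in the diagram rather than run
concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hardware queue; not struct blk_mq_hw_ctx. */
struct toy_hctx {
	bool stopped;    /* models the "stopped" state of the queue */
	int pending;     /* requests waiting to be dispatched */
	int dispatched;  /* requests handed to the driver */
};

/* Models blk_mq_run_hw_queue(): a stopped queue dispatches nothing. */
static void toy_run_hw_queue(struct toy_hctx *h)
{
	if (h->stopped)
		return;
	h->dispatched += h->pending;
	h->pending = 0;
}

/* Models an insert: the request is queued, but the queue is not run. */
static void toy_insert_request(struct toy_hctx *h)
{
	h->pending++;
}

/*
 * Replays the CPU0/CPU1/CPU2 interleaving in a single thread and
 * returns how many requests are left without anyone to dispatch them.
 */
static int toy_replay(bool run_after_insert)
{
	struct toy_hctx h = { .stopped = false, .pending = 0, .dispatched = 0 };

	/* CPU0: the driver's queue_rq() stops the queue. */
	h.stopped = true;

	/* CPU1: observes the stopped queue and commits to the insert path. */
	bool cpu1_saw_stopped = h.stopped;
	assert(cpu1_saw_stopped);

	/* CPU0: bypass-inserts its own request. */
	toy_insert_request(&h);

	/* CPU2: the completion handler restarts the queue and runs it. */
	h.stopped = false;
	toy_run_hw_queue(&h);

	/* CPU0: runs the queue as well; nothing is left to dispatch. */
	toy_run_hw_queue(&h);

	/* CPU1: only now inserts its request, after both runs finished. */
	toy_insert_request(&h);

	/* The fix: run the queue again after the late insert. */
	if (run_after_insert)
		toy_run_hw_queue(&h);

	return h.pending;
}
```

In this model toy_replay(false) leaves one request stranded, while
toy_replay(true), which mirrors the added blk_mq_run_hw_queue() call,
leaves none. If the queue were still stopped at that point, the extra
run would simply do nothing.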

Comments

Jens Axboe Sept. 10, 2024, 1:17 p.m. UTC | #1
On 9/3/24 2:16 AM, Muchun Song wrote:
> Supposing the following scenario with a virtio_blk driver.
> 
> CPU0                                    CPU1                                    CPU2
> 
> blk_mq_try_issue_directly()
>     __blk_mq_issue_directly()
>         q->mq_ops->queue_rq()
>             virtio_queue_rq()
>                 blk_mq_stop_hw_queue()
>                                         blk_mq_try_issue_directly()             virtblk_done()
>                                             if (blk_mq_hctx_stopped())
>     blk_mq_request_bypass_insert()                                                  blk_mq_start_stopped_hw_queue()
>     blk_mq_run_hw_queue()                                                               blk_mq_run_hw_queue()
>                                                 blk_mq_insert_request()
>                                                 return // Who is responsible for dispatching this IO request?
> 
> After CPU0 has marked the queue as stopped, CPU1 will see the queue is stopped.
> But before CPU1 puts the request on the dispatch list, CPU2 receives the interrupt
> of completion of request, so it will run the hardware queue and marks the queue
> as non-stopped. Meanwhile, CPU1 also runs the same hardware queue. After both CPU1
> and CPU2 complete blk_mq_run_hw_queue(), CPU1 just puts the request to the same
> hardware queue and returns. It misses dispatching a request. Fix it by running
> the hardware queue explicitly. And blk_mq_request_issue_directly() should handle
> a similar situation. Fix it as well.

Patch looks fine, but this commit message is waaaaay too wide. Please
limit it to 72-74 chars. Otherwise the ordering diagram above is going
to be unreadable when viewing git log in a terminal.

Muchun Song Sept. 11, 2024, 2:43 a.m. UTC | #2
> On Sep 10, 2024, at 21:17, Jens Axboe <axboe@kernel.dk> wrote:
> 
> Patch looks fine, but this commit message is waaaaay too wide. Please
> limit it to 72-74 chars. Otherwise the ordering diagram above is going
> to be unreadable when viewing git log in a terminal.

Thanks for your reply. I'll adjust those lines to make the diagram more
readable.

Muchun,
Thanks.

> 
> -- 
> Jens Axboe

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e3c3c0c21b553..b2d0f22de0c7f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2619,6 +2619,7 @@  static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 
 	if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
 		blk_mq_insert_request(rq, 0);
+		blk_mq_run_hw_queue(hctx, false);
 		return;
 	}
 
@@ -2649,6 +2650,7 @@  static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
 
 	if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
 		blk_mq_insert_request(rq, 0);
+		blk_mq_run_hw_queue(hctx, false);
 		return BLK_STS_OK;
 	}