Message ID: 20220718123528.178714-1-yuyufen@huawei.com (mailing list archive)
State: New, archived
Series: blk-mq: run queue after issuing the last request of the plug list
On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
> We ran a test on a virtio scsi device (/dev/sda) with the default mq
> scheduler 'none'. We found an IO hang as follows:
>
> blk_finish_plug
>   blk_mq_plug_issue_direct
>     scsi_mq_get_budget
>     // getting budget_token fails and sdev->restarts=1
>
> scsi_end_request
>   scsi_run_queue_async
>   // sdev->restarts=0 and run queue
>
> blk_mq_request_bypass_insert
>   // add request to hctx->dispatch list

Here the issue shouldn't be related to scsi's get budget or
scsi_run_queue_async.

If blk-mq adds a request to ->dispatch_list, it is the blk-mq core's
responsibility to re-run the queue to move things on. Can you investigate
a bit more why blk-mq doesn't run the queue after adding the request to
the hctx dispatch list?

Thanks,
Ming
On 2022/7/19 17:26, Ming Lei wrote:
> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
>> We ran a test on a virtio scsi device (/dev/sda) with the default mq
>> scheduler 'none'. We found an IO hang as follows:
>> [...]
>
> Here the issue shouldn't be related to scsi's get budget or
> scsi_run_queue_async.
>
> If blk-mq adds a request to ->dispatch_list, it is the blk-mq core's
> responsibility to re-run the queue to move things on. Can you investigate
> a bit more why blk-mq doesn't run the queue after adding the request to
> the hctx dispatch list?

In my IO hang scenario, nobody issues any IO after calling
blk_finish_plug(), so there is no chance to run the queue.

Thanks,
Yufen
Hi, Ming!

On 2022/07/19 17:26, Ming Lei wrote:
> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote:
>> [...]
>
> Here the issue shouldn't be related to scsi's get budget or
> scsi_run_queue_async.
>
> If blk-mq adds a request to ->dispatch_list, it is the blk-mq core's
> responsibility to re-run the queue to move things on. Can you investigate
> a bit more why blk-mq doesn't run the queue after adding the request to
> the hctx dispatch list?

I think Yufen is probably thinking about the following concurrent
scenario:

blk_mq_flush_plug_list
# assume there are three rqs
 blk_mq_plug_issue_direct
  blk_mq_request_issue_directly
  # dispatch rq1, succeeds
  blk_mq_request_issue_directly
  # dispatch rq2
   __blk_mq_try_issue_directly
    blk_mq_get_dispatch_budget
     scsi_mq_get_budget
      atomic_inc(&sdev->restarts);
      # rq2 fails to get budget
      # restarts is 1 now
                                    scsi_end_request
                                    # rq1 is completed
                                     scsi_run_queue_async
                                      atomic_cmpxchg(&sdev->restarts, old, 0) == old
                                      # set restarts to 0
                                      blk_mq_run_hw_queues
                                      # hctx->dispatch list is empty
  blk_mq_request_bypass_insert
  # insert rq2 to hctx->dispatch list
 blk_mq_dispatch_plug_list
 # continue to dispatch rq3
  blk_mq_sched_insert_requests
   blk_mq_try_issue_list_directly
   # blk_mq_run_hw_queue() won't be called
   # because dispatching succeeded
                                    scsi_end_request
                                    # rq3 is completed
                                     scsi_run_queue_async
                                     # restarts is 0, won't run queue

The root cause is that the rq whose dispatch failed is not the last rq,
and the last rq is dispatched successfully.

Thanks,
Kuai
On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
> Hi, Ming!
>
> On 2022/07/19 17:26, Ming Lei wrote:
>> [...]
>
> I think Yufen is probably thinking about the following concurrent
> scenario:
> [...]
>   blk_mq_request_bypass_insert
>   # insert rq2 to hctx->dispatch list

After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
no matter whether list_empty(list) is true or not, the queue will be run,
either from blk_mq_request_bypass_insert() or from
blk_mq_sched_insert_requests().

And rq2 should be visible to the run queue, so I am just wondering why
rq2 isn't issued in the end?

Thanks,
Ming
Hi, Ming!

On 2022/07/25 23:43, Ming Lei wrote:
> On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote:
>> [...]
>> I think Yufen is probably thinking about the following concurrent
>> scenario:
>> [...]
>>   blk_mq_request_bypass_insert
>>   # insert rq2 to hctx->dispatch list
>
> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
> no matter whether list_empty(list) is true or not, the queue will be run,
> either from blk_mq_request_bypass_insert() or from
> blk_mq_sched_insert_requests().

1) While inserting rq2 to the dispatch list, blk_mq_request_bypass_insert()
is called from blk_mq_try_issue_list_directly(); the list_empty() check
won't pass, thus blk_mq_request_bypass_insert() won't run the queue.

2) After blk_mq_sched_insert_requests() dispatches rq3, the list_empty()
check will pass, thus blk_mq_sched_insert_requests() won't run the queue.
(That's what Yufen tries to fix.)

> And rq2 should be visible to the run queue, just wondering why rq2 isn't
> issued finally?

Thanks,
Kuai
On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
> Hi, Ming!
>
> On 2022/07/25 23:43, Ming Lei wrote:
>> [...]
>> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(),
>> no matter whether list_empty(list) is true or not, the queue will be run,
>> either from blk_mq_request_bypass_insert() or from
>> blk_mq_sched_insert_requests().
>
> 1) While inserting rq2 to the dispatch list, blk_mq_request_bypass_insert()
> is called from blk_mq_try_issue_list_directly(); the list_empty() check
> won't pass, thus blk_mq_request_bypass_insert() won't run the queue.

Yeah, but in blk_mq_try_issue_list_directly(), after rq2 is inserted to
the dispatch list, the loop is broken and blk_mq_try_issue_list_directly()
returns to blk_mq_sched_insert_requests(), in which list_empty() is false,
so blk_mq_insert_requests() and blk_mq_run_hw_queue() are called; the
queue is still run.

Also I am not sure why you bring rq3 into it, since the list is a local
list on the stack and can't be operated on concurrently.

Thanks,
Ming
On 2022/07/26 9:46, Ming Lei wrote:
> On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote:
>> [...]
>> 1) While inserting rq2 to the dispatch list, blk_mq_request_bypass_insert()
>> is called from blk_mq_try_issue_list_directly(); the list_empty() check
>> won't pass, thus blk_mq_request_bypass_insert() won't run the queue.
>
> Yeah, but in blk_mq_try_issue_list_directly(), after rq2 is inserted to
> the dispatch list, the loop is broken and blk_mq_try_issue_list_directly()
> returns to blk_mq_sched_insert_requests(), in which list_empty() is false,
> so blk_mq_insert_requests() and blk_mq_run_hw_queue() are called; the
> queue is still run.
>
> Also I am not sure why you bring rq3 into it, since the list is a local
> list on the stack and can't be operated on concurrently.

I make rq3 involved because there are conditions under which
blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
blk_mq_sched_insert_requests(): this is the case I'm thinking of, where
blk_mq_try_issue_list_directly() succeeds from
blk_mq_sched_insert_requests() and the list becomes empty.

Thanks,
Kuai
On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
> On 2022/07/26 9:46, Ming Lei wrote:
>> [...]
>> Also I am not sure why you bring rq3 into it, since the list is a local
>> list on the stack and can't be operated on concurrently.
>
> I make rq3 involved because there are conditions under which
> blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
> blk_mq_sched_insert_requests():

The two won't be called if list_empty() is true, and will be called if
!list_empty(). That is why I mentioned that the run queue has been done
after rq2 is added to ->dispatch_list.

Can you show the debugfs log after the hang is caused?

Thanks,
Ming
Hi, Ming

On 2022/07/26 10:32, Ming Lei wrote:
> On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote:
>> [...]
>> I make rq3 involved because there are conditions under which
>> blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from
>> blk_mq_sched_insert_requests():
>
> The two won't be called if list_empty() is true, and will be called if
> !list_empty().
>
> That is why I mentioned that the run queue has been done after rq2 is
> added to ->dispatch_list.

I don't follow here. It's right that after rq2 is inserted to the dispatch
list, the list is not empty and blk_mq_sched_insert_requests() will be
called. However, do you think it's impossible that
blk_mq_sched_insert_requests() dispatches the rqs in the list so that the
list becomes empty?

> Can you show the debugfs log after the hang is caused?

I didn't reproduce the problem myself; perhaps Yufen can show the log.

Thanks,
Kuai
On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote:
> Hi, Ming
>
> On 2022/07/26 10:32, Ming Lei wrote:
>> [...]
>
> I don't follow here. It's right that after rq2 is inserted to the dispatch
> list, the list is not empty and blk_mq_sched_insert_requests() will be
> called. However, do you think it's impossible that
> blk_mq_sched_insert_requests() dispatches the rqs in the list so that the
> list becomes empty?

Please take a look at blk_mq_sched_insert_requests().

When the code runs into blk_mq_sched_insert_requests(), the following
blk_mq_run_hw_queue() will always be run; how does the list being empty
or not make a difference there?

In short, after rq2 is added into ->dispatch, the queue is guaranteed to
run; the handling logic isn't wrong. That said, the reported hang isn't
root-caused yet, is it?

Thanks,
Ming
Hi, Ming

On 2022/07/26 11:02, Ming Lei wrote:
> On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote:
>> [...]
>
> Please take a look at blk_mq_sched_insert_requests().
>
> When the code runs into blk_mq_sched_insert_requests(), the following
> blk_mq_run_hw_queue() will always be run; how does the list being empty
> or not make a difference there?

This is strange; always calling blk_mq_run_hw_queue() is exactly what
Yufen tries to do in this patch. Are we looking at different code? I'm
copying blk_mq_sched_insert_requests() here; the code is from the latest
linux-next:

461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
462 				  struct blk_mq_ctx *ctx,
463 				  struct list_head *list, bool run_queue_async)
464 {
465 	struct elevator_queue *e;
466 	struct request_queue *q = hctx->queue;
467
468 	/*
469 	 * blk_mq_sched_insert_requests() is called from flush plug
470 	 * context only, and hold one usage counter to prevent queue
471 	 * from being released.
472 	 */
473 	percpu_ref_get(&q->q_usage_counter);
474
475 	e = hctx->queue->elevator;
476 	if (e) {
477 		e->type->ops.insert_requests(hctx, list, false);
478 	} else {
479 		/*
480 		 * try to issue requests directly if the hw queue isn't
481 		 * busy in case of 'none' scheduler, and this way may save
482 		 * us one extra enqueue & dequeue to sw queue.
483 		 */
484 		if (!hctx->dispatch_busy && !run_queue_async) {
485 			blk_mq_run_dispatch_ops(hctx->queue,
486 				blk_mq_try_issue_list_directly(hctx, list));
487 			if (list_empty(list))
488 				goto out;
489 		}
490 		blk_mq_insert_requests(hctx, ctx, list);
491 	}
492
493 	blk_mq_run_hw_queue(hctx, run_queue_async);
494 out:
495 	percpu_ref_put(&q->q_usage_counter);
496 }

Here at line 487, if list_empty() is true, the out label will skip the
run queue.

> In short, after rq2 is added into ->dispatch, the queue is guaranteed to
> run; the handling logic isn't wrong. That said, the reported hang isn't
> root-caused yet, is it?

Thanks,
Kuai
On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote: > Hi, Ming > > 在 2022/07/26 11:02, Ming Lei 写道: > > On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote: > > > Hi, Ming > > > 在 2022/07/26 10:32, Ming Lei 写道: > > > > On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote: > > > > > 在 2022/07/26 9:46, Ming Lei 写道: > > > > > > On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote: > > > > > > > Hi, Ming! > > > > > > > > > > > > > > 在 2022/07/25 23:43, Ming Lei 写道: > > > > > > > > On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote: > > > > > > > > > Hi, Ming! > > > > > > > > > > > > > > > > > > 在 2022/07/19 17:26, Ming Lei 写道: > > > > > > > > > > On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote: > > > > > > > > > > > We do test on a virtio scsi device (/dev/sda) and the default mq > > > > > > > > > > > scheduler is 'none'. We found a IO hung as following: > > > > > > > > > > > > > > > > > > > > > > blk_finish_plug > > > > > > > > > > > blk_mq_plug_issue_direct > > > > > > > > > > > scsi_mq_get_budget > > > > > > > > > > > //get budget_token fail and sdev->restarts=1 > > > > > > > > > > > > > > > > > > > > > > scsi_end_request > > > > > > > > > > > scsi_run_queue_async > > > > > > > > > > > //sdev->restart=0 and run queue > > > > > > > > > > > > > > > > > > > > > > blk_mq_request_bypass_insert > > > > > > > > > > > //add request to hctx->dispatch list > > > > > > > > > > > > > > > > > > > > Here the issue shouldn't be related with scsi's get budget or > > > > > > > > > > scsi_run_queue_async. > > > > > > > > > > > > > > > > > > > > If blk-mq adds request into ->dispatch_list, it is blk-mq core's > > > > > > > > > > responsibility to re-run queue for moving on. Can you investigate a > > > > > > > > > > bit more why blk-mq doesn't run queue after adding request to > > > > > > > > > > hctx dispatch list? 
> > > > > > > > > > > > > > > > > > I think Yufen is probably thinking about the following Concurrent > > > > > > > > > scenario: > > > > > > > > > > > > > > > > > > blk_mq_flush_plug_list > > > > > > > > > # assume there are three rq > > > > > > > > > blk_mq_plug_issue_direct > > > > > > > > > blk_mq_request_issue_directly > > > > > > > > > # dispatch rq1, succeed > > > > > > > > > blk_mq_request_issue_directly > > > > > > > > > # dispatch rq2 > > > > > > > > > __blk_mq_try_issue_directly > > > > > > > > > blk_mq_get_dispatch_budget > > > > > > > > > scsi_mq_get_budget > > > > > > > > > atomic_inc(&sdev->restarts); > > > > > > > > > # rq2 failed to get budget > > > > > > > > > # restarts is 1 now > > > > > > > > > scsi_end_request > > > > > > > > > # rq1 is completed > > > > > > > > > ┊scsi_run_queue_async > > > > > > > > > ┊ atomic_cmpxchg(&sdev->restarts, > > > > > > > > > old, 0) == old > > > > > > > > > ┊ # set restarts to 0 > > > > > > > > > ┊ blk_mq_run_hw_queues > > > > > > > > > ┊ # hctx->dispatch list is empty > > > > > > > > > blk_mq_request_bypass_insert > > > > > > > > > # insert rq2 to hctx->dispatch list > > > > > > > > > > > > > > > > After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(), > > > > > > > > no matter if list_empty(list) is empty or not, queue will be run either from > > > > > > > > blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests(). > > > > > > > > > > > > > > 1) while inserting rq2 to dispatch list, blk_mq_request_bypass_insert() > > > > > > > is called from blk_mq_try_issue_list_directly(), list_empty() won't > > > > > > > pass, thus thus blk_mq_request_bypass_insert() won't run queue. 
> > > > > > > > > > > > Yeah, but in blk_mq_try_issue_list_directly() after rq2 is inserted to dispatch > > > > > > list, the loop is broken and blk_mq_try_issue_list_directly() returns to > > > > > > blk_mq_sched_insert_requests() in which list_empty() is false, so > > > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() are called, queue > > > > > > is still run. > > > > > > > > > > > > Also not sure why you make rq3 involved, since the list is local list on > > > > > > stack, and it can be operated concurrently. > > > > > > > > > > I make rq3 involved because there are some conditions that > > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from > > > > > blk_mq_sched_insert_requests(): > > > > > > > > The two won't be called if list_empty() is true, and will be called if > > > > !list_empty(). > > > > > > > > That is why I mentioned run queue has been done after rq2 is added to > > > > ->dispatch_list. > > > > > > I don't follow here, it's right after rq2 is inserted to dispatch list, > > > list is not empty, and blk_mq_sched_insert_requests() will be called. > > > However, do you think that it's impossible that > > > blk_mq_sched_insert_requests() can dispatch rq in the list and list > > > will become empty? > > > > Please take a look at blk_mq_sched_insert_requests(). > > > > When codes runs into blk_mq_sched_insert_requests(), the following > > blk_mq_run_hw_queue() will be run always, how does list empty or not > > make a difference there? > > This is strange, always blk_mq_run_hw_queue() is exactly what Yufen > tries to do in this patch, are we look at different code? No. 
> > I'm copying blk_mq_sched_insert_requests() here, the code is from > latest linux-next: > > 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx, > 462 ┊ struct blk_mq_ctx *ctx, > 463 ┊ struct list_head *list, bool > run_queue_async) > 464 { > 465 struct elevator_queue *e; > 466 struct request_queue *q = hctx->queue; > 467 > 468 /* > 469 ┊* blk_mq_sched_insert_requests() is called from flush plug > 470 ┊* context only, and hold one usage counter to prevent queue > 471 ┊* from being released. > 472 ┊*/ > 473 percpu_ref_get(&q->q_usage_counter); > 474 > 475 e = hctx->queue->elevator; > 476 if (e) { > 477 e->type->ops.insert_requests(hctx, list, false); > 478 } else { > 479 /* > 480 ┊* try to issue requests directly if the hw queue isn't > 481 ┊* busy in case of 'none' scheduler, and this way may > save > 482 ┊* us one extra enqueue & dequeue to sw queue. > 483 ┊*/ > 484 if (!hctx->dispatch_busy && !run_queue_async) { > 485 blk_mq_run_dispatch_ops(hctx->queue, > 486 blk_mq_try_issue_list_directly(hctx, > list)); > 487 if (list_empty(list)) > 488 goto out; > 489 } > 490 blk_mq_insert_requests(hctx, ctx, list); > 491 } > 492 > 493 blk_mq_run_hw_queue(hctx, run_queue_async); > 494 out: > 495 percpu_ref_put(&q->q_usage_counter); > 496 } > > Here in line 487, if list_empty() is true, out label will skip > run_queue(). If list_empty() is true, run queue is guaranteed to run in blk_mq_try_issue_list_directly() in case that BLK_STS_*RESOURCE is returned from blk_mq_request_issue_directly(). ret = blk_mq_request_issue_directly(rq, list_empty(list)); if (ret != BLK_STS_OK) { if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) { blk_mq_request_bypass_insert(rq, false, list_empty(list)); //run queue break; } blk_mq_end_request(rq, ret); errors++; } else queued++; So why do you try to add one extra run queue? Thanks, Ming
On 2022/7/26 11:21, Ming Lei wrote: > On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote: >> Hi, Ming >> >> 在 2022/07/26 11:02, Ming Lei 写道: >>> On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote: >>>> Hi, Ming >>>> 在 2022/07/26 10:32, Ming Lei 写道: >>>>> On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote: >>>>>> 在 2022/07/26 9:46, Ming Lei 写道: >>>>>>> On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote: >>>>>>>> Hi, Ming! >>>>>>>> >>>>>>>> 在 2022/07/25 23:43, Ming Lei 写道: >>>>>>>>> On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote: >>>>>>>>>> Hi, Ming! >>>>>>>>>> >>>>>>>>>> 在 2022/07/19 17:26, Ming Lei 写道: >>>>>>>>>>> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote: >>>>>>>>>>>> We do test on a virtio scsi device (/dev/sda) and the default mq >>>>>>>>>>>> scheduler is 'none'. We found a IO hung as following: >>>>>>>>>>>> >>>>>>>>>>>> blk_finish_plug >>>>>>>>>>>> blk_mq_plug_issue_direct >>>>>>>>>>>> scsi_mq_get_budget >>>>>>>>>>>> //get budget_token fail and sdev->restarts=1 >>>>>>>>>>>> >>>>>>>>>>>> scsi_end_request >>>>>>>>>>>> scsi_run_queue_async >>>>>>>>>>>> //sdev->restart=0 and run queue >>>>>>>>>>>> >>>>>>>>>>>> blk_mq_request_bypass_insert >>>>>>>>>>>> //add request to hctx->dispatch list >>>>>>>>>>> >>>>>>>>>>> Here the issue shouldn't be related with scsi's get budget or >>>>>>>>>>> scsi_run_queue_async. >>>>>>>>>>> >>>>>>>>>>> If blk-mq adds request into ->dispatch_list, it is blk-mq core's >>>>>>>>>>> responsibility to re-run queue for moving on. Can you investigate a >>>>>>>>>>> bit more why blk-mq doesn't run queue after adding request to >>>>>>>>>>> hctx dispatch list? 
>>>>>>>>>> >>>>>>>>>> I think Yufen is probably thinking about the following Concurrent >>>>>>>>>> scenario: >>>>>>>>>> >>>>>>>>>> blk_mq_flush_plug_list >>>>>>>>>> # assume there are three rq >>>>>>>>>> blk_mq_plug_issue_direct >>>>>>>>>> blk_mq_request_issue_directly >>>>>>>>>> # dispatch rq1, succeed >>>>>>>>>> blk_mq_request_issue_directly >>>>>>>>>> # dispatch rq2 >>>>>>>>>> __blk_mq_try_issue_directly >>>>>>>>>> blk_mq_get_dispatch_budget >>>>>>>>>> scsi_mq_get_budget >>>>>>>>>> atomic_inc(&sdev->restarts); >>>>>>>>>> # rq2 failed to get budget >>>>>>>>>> # restarts is 1 now >>>>>>>>>> scsi_end_request >>>>>>>>>> # rq1 is completed >>>>>>>>>> ┊scsi_run_queue_async >>>>>>>>>> ┊ atomic_cmpxchg(&sdev->restarts, >>>>>>>>>> old, 0) == old >>>>>>>>>> ┊ # set restarts to 0 >>>>>>>>>> ┊ blk_mq_run_hw_queues >>>>>>>>>> ┊ # hctx->dispatch list is empty >>>>>>>>>> blk_mq_request_bypass_insert >>>>>>>>>> # insert rq2 to hctx->dispatch list >>>>>>>>> >>>>>>>>> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(), >>>>>>>>> no matter if list_empty(list) is empty or not, queue will be run either from >>>>>>>>> blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests(). >>>>>>>> >>>>>>>> 1) while inserting rq2 to dispatch list, blk_mq_request_bypass_insert() >>>>>>>> is called from blk_mq_try_issue_list_directly(), list_empty() won't >>>>>>>> pass, thus thus blk_mq_request_bypass_insert() won't run queue. >>>>>>> >>>>>>> Yeah, but in blk_mq_try_issue_list_directly() after rq2 is inserted to dispatch >>>>>>> list, the loop is broken and blk_mq_try_issue_list_directly() returns to >>>>>>> blk_mq_sched_insert_requests() in which list_empty() is false, so >>>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() are called, queue >>>>>>> is still run. >>>>>>> >>>>>>> Also not sure why you make rq3 involved, since the list is local list on >>>>>>> stack, and it can be operated concurrently. 
>>>>>> >>>>>> I make rq3 involved because there are some conditions that >>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from >>>>>> blk_mq_sched_insert_requests(): >>>>> >>>>> The two won't be called if list_empty() is true, and will be called if >>>>> !list_empty(). >>>>> >>>>> That is why I mentioned run queue has been done after rq2 is added to >>>>> ->dispatch_list. >>>> >>>> I don't follow here, it's right after rq2 is inserted to dispatch list, >>>> list is not empty, and blk_mq_sched_insert_requests() will be called. >>>> However, do you think that it's impossible that >>>> blk_mq_sched_insert_requests() can dispatch rq in the list and list >>>> will become empty? >>> >>> Please take a look at blk_mq_sched_insert_requests(). >>> >>> When codes runs into blk_mq_sched_insert_requests(), the following >>> blk_mq_run_hw_queue() will be run always, how does list empty or not >>> make a difference there? >> >> This is strange, always blk_mq_run_hw_queue() is exactly what Yufen >> tries to do in this patch, are we look at different code? > > No. > >> >> I'm copying blk_mq_sched_insert_requests() here, the code is from >> latest linux-next: >> >> 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx, >> 462 ┊ struct blk_mq_ctx *ctx, >> 463 ┊ struct list_head *list, bool >> run_queue_async) >> 464 { >> 465 struct elevator_queue *e; >> 466 struct request_queue *q = hctx->queue; >> 467 >> 468 /* >> 469 ┊* blk_mq_sched_insert_requests() is called from flush plug >> 470 ┊* context only, and hold one usage counter to prevent queue >> 471 ┊* from being released. >> 472 ┊*/ >> 473 percpu_ref_get(&q->q_usage_counter); >> 474 >> 475 e = hctx->queue->elevator; >> 476 if (e) { >> 477 e->type->ops.insert_requests(hctx, list, false); >> 478 } else { >> 479 /* >> 480 ┊* try to issue requests directly if the hw queue isn't >> 481 ┊* busy in case of 'none' scheduler, and this way may >> save >> 482 ┊* us one extra enqueue & dequeue to sw queue. 
>> 483 ┊*/
>> 484 if (!hctx->dispatch_busy && !run_queue_async) {
>> 485 blk_mq_run_dispatch_ops(hctx->queue,
>> 486 blk_mq_try_issue_list_directly(hctx,
>> list));
>> 487 if (list_empty(list))
>> 488 goto out;
>> 489 }
>> 490 blk_mq_insert_requests(hctx, ctx, list);
>> 491 }
>> 492
>> 493 blk_mq_run_hw_queue(hctx, run_queue_async);
>> 494 out:
>> 495 percpu_ref_put(&q->q_usage_counter);
>> 496 }
>>
>> Here in line 487, if list_empty() is true, out label will skip
>> run_queue().
>
> If list_empty() is true, run queue is guaranteed to run
> in blk_mq_try_issue_list_directly() in case that BLK_STS_*RESOURCE
> is returned from blk_mq_request_issue_directly().

If the request is issued successfully here, that is exactly the case I
described in my patch: then no one runs the queue.

>
> ret = blk_mq_request_issue_directly(rq, list_empty(list));
> if (ret != BLK_STS_OK) {
> if (ret == BLK_STS_RESOURCE ||
> ret == BLK_STS_DEV_RESOURCE) {
> blk_mq_request_bypass_insert(rq, false,
> list_empty(list)); //run queue
> break;
> }
> blk_mq_end_request(rq, ret);
> errors++;
> } else
> queued++;
>
> So why do you try to add one extra run queue?
>
>
> Thanks,
> Ming
>
> .
在 2022/07/26 11:21, Ming Lei 写道: > On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote: >> Hi, Ming >> >> 在 2022/07/26 11:02, Ming Lei 写道: >>> On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote: >>>> Hi, Ming >>>> 在 2022/07/26 10:32, Ming Lei 写道: >>>>> On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote: >>>>>> 在 2022/07/26 9:46, Ming Lei 写道: >>>>>>> On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote: >>>>>>>> Hi, Ming! >>>>>>>> >>>>>>>> 在 2022/07/25 23:43, Ming Lei 写道: >>>>>>>>> On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote: >>>>>>>>>> Hi, Ming! >>>>>>>>>> >>>>>>>>>> 在 2022/07/19 17:26, Ming Lei 写道: >>>>>>>>>>> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote: >>>>>>>>>>>> We do test on a virtio scsi device (/dev/sda) and the default mq >>>>>>>>>>>> scheduler is 'none'. We found a IO hung as following: >>>>>>>>>>>> >>>>>>>>>>>> blk_finish_plug >>>>>>>>>>>> blk_mq_plug_issue_direct >>>>>>>>>>>> scsi_mq_get_budget >>>>>>>>>>>> //get budget_token fail and sdev->restarts=1 >>>>>>>>>>>> >>>>>>>>>>>> scsi_end_request >>>>>>>>>>>> scsi_run_queue_async >>>>>>>>>>>> //sdev->restart=0 and run queue >>>>>>>>>>>> >>>>>>>>>>>> blk_mq_request_bypass_insert >>>>>>>>>>>> //add request to hctx->dispatch list >>>>>>>>>>> >>>>>>>>>>> Here the issue shouldn't be related with scsi's get budget or >>>>>>>>>>> scsi_run_queue_async. >>>>>>>>>>> >>>>>>>>>>> If blk-mq adds request into ->dispatch_list, it is blk-mq core's >>>>>>>>>>> responsibility to re-run queue for moving on. Can you investigate a >>>>>>>>>>> bit more why blk-mq doesn't run queue after adding request to >>>>>>>>>>> hctx dispatch list? 
>>>>>>>>>> >>>>>>>>>> I think Yufen is probably thinking about the following Concurrent >>>>>>>>>> scenario: >>>>>>>>>> >>>>>>>>>> blk_mq_flush_plug_list >>>>>>>>>> # assume there are three rq >>>>>>>>>> blk_mq_plug_issue_direct >>>>>>>>>> blk_mq_request_issue_directly >>>>>>>>>> # dispatch rq1, succeed >>>>>>>>>> blk_mq_request_issue_directly >>>>>>>>>> # dispatch rq2 >>>>>>>>>> __blk_mq_try_issue_directly >>>>>>>>>> blk_mq_get_dispatch_budget >>>>>>>>>> scsi_mq_get_budget >>>>>>>>>> atomic_inc(&sdev->restarts); >>>>>>>>>> # rq2 failed to get budget >>>>>>>>>> # restarts is 1 now >>>>>>>>>> scsi_end_request >>>>>>>>>> # rq1 is completed >>>>>>>>>> ┊scsi_run_queue_async >>>>>>>>>> ┊ atomic_cmpxchg(&sdev->restarts, >>>>>>>>>> old, 0) == old >>>>>>>>>> ┊ # set restarts to 0 >>>>>>>>>> ┊ blk_mq_run_hw_queues >>>>>>>>>> ┊ # hctx->dispatch list is empty >>>>>>>>>> blk_mq_request_bypass_insert >>>>>>>>>> # insert rq2 to hctx->dispatch list >>>>>>>>> >>>>>>>>> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(), >>>>>>>>> no matter if list_empty(list) is empty or not, queue will be run either from >>>>>>>>> blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests(). >>>>>>>> >>>>>>>> 1) while inserting rq2 to dispatch list, blk_mq_request_bypass_insert() >>>>>>>> is called from blk_mq_try_issue_list_directly(), list_empty() won't >>>>>>>> pass, thus thus blk_mq_request_bypass_insert() won't run queue. >>>>>>> >>>>>>> Yeah, but in blk_mq_try_issue_list_directly() after rq2 is inserted to dispatch >>>>>>> list, the loop is broken and blk_mq_try_issue_list_directly() returns to >>>>>>> blk_mq_sched_insert_requests() in which list_empty() is false, so >>>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() are called, queue >>>>>>> is still run. >>>>>>> >>>>>>> Also not sure why you make rq3 involved, since the list is local list on >>>>>>> stack, and it can be operated concurrently. 
>>>>>> >>>>>> I make rq3 involved because there are some conditions that >>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from >>>>>> blk_mq_sched_insert_requests(): >>>>> >>>>> The two won't be called if list_empty() is true, and will be called if >>>>> !list_empty(). >>>>> >>>>> That is why I mentioned run queue has been done after rq2 is added to >>>>> ->dispatch_list. >>>> >>>> I don't follow here, it's right after rq2 is inserted to dispatch list, >>>> list is not empty, and blk_mq_sched_insert_requests() will be called. >>>> However, do you think that it's impossible that >>>> blk_mq_sched_insert_requests() can dispatch rq in the list and list >>>> will become empty? >>> >>> Please take a look at blk_mq_sched_insert_requests(). >>> >>> When codes runs into blk_mq_sched_insert_requests(), the following >>> blk_mq_run_hw_queue() will be run always, how does list empty or not >>> make a difference there? >> >> This is strange, always blk_mq_run_hw_queue() is exactly what Yufen >> tries to do in this patch, are we look at different code? > > No. > >> >> I'm copying blk_mq_sched_insert_requests() here, the code is from >> latest linux-next: >> >> 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx, >> 462 ┊ struct blk_mq_ctx *ctx, >> 463 ┊ struct list_head *list, bool >> run_queue_async) >> 464 { >> 465 struct elevator_queue *e; >> 466 struct request_queue *q = hctx->queue; >> 467 >> 468 /* >> 469 ┊* blk_mq_sched_insert_requests() is called from flush plug >> 470 ┊* context only, and hold one usage counter to prevent queue >> 471 ┊* from being released. >> 472 ┊*/ >> 473 percpu_ref_get(&q->q_usage_counter); >> 474 >> 475 e = hctx->queue->elevator; >> 476 if (e) { >> 477 e->type->ops.insert_requests(hctx, list, false); >> 478 } else { >> 479 /* >> 480 ┊* try to issue requests directly if the hw queue isn't >> 481 ┊* busy in case of 'none' scheduler, and this way may >> save >> 482 ┊* us one extra enqueue & dequeue to sw queue. 
>> 483 ┊*/
>> 484 if (!hctx->dispatch_busy && !run_queue_async) {
>> 485 blk_mq_run_dispatch_ops(hctx->queue,
>> 486 blk_mq_try_issue_list_directly(hctx,
>> list));
>> 487 if (list_empty(list))
>> 488 goto out;
>> 489 }
>> 490 blk_mq_insert_requests(hctx, ctx, list);
>> 491 }
>> 492
>> 493 blk_mq_run_hw_queue(hctx, run_queue_async);
>> 494 out:
>> 495 percpu_ref_put(&q->q_usage_counter);
>> 496 }
>>
>> Here in line 487, if list_empty() is true, out label will skip
>> run_queue().
>
> If list_empty() is true, run queue is guaranteed to run
> in blk_mq_try_issue_list_directly() in case that BLK_STS_*RESOURCE
> is returned from blk_mq_request_issue_directly().
>
> ret = blk_mq_request_issue_directly(rq, list_empty(list));
> if (ret != BLK_STS_OK) {
> if (ret == BLK_STS_RESOURCE ||
> ret == BLK_STS_DEV_RESOURCE) {
> blk_mq_request_bypass_insert(rq, false,
> list_empty(list)); //run queue
> break;
> }
> blk_mq_end_request(rq, ret);
> errors++;
> } else
> queued++;
>
> So why do you try to add one extra run queue?

Hi, Ming

Perhaps I didn't explain the scenario clearly; please note that a list
containing three requests is required:

1) rq1 is dispatched successfully.
2) rq2 fails to dispatch because it gets no budget. In this case:
 - rq2 is inserted into the dispatch list;
 - the list is not empty yet, so run queue won't be called.
3) Finally, blk_mq_sched_insert_requests() dispatches rq3 successfully
and the list becomes empty, so run queue still won't be called.

Thanks,
Kuai
On Tue, Jul 26, 2022 at 11:31:34AM +0800, Yu Kuai wrote: > 在 2022/07/26 11:21, Ming Lei 写道: > > On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote: > > > Hi, Ming > > > > > > 在 2022/07/26 11:02, Ming Lei 写道: > > > > On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote: > > > > > Hi, Ming > > > > > 在 2022/07/26 10:32, Ming Lei 写道: > > > > > > On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote: > > > > > > > 在 2022/07/26 9:46, Ming Lei 写道: > > > > > > > > On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote: > > > > > > > > > Hi, Ming! > > > > > > > > > > > > > > > > > > 在 2022/07/25 23:43, Ming Lei 写道: > > > > > > > > > > On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote: > > > > > > > > > > > Hi, Ming! > > > > > > > > > > > > > > > > > > > > > > 在 2022/07/19 17:26, Ming Lei 写道: > > > > > > > > > > > > On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote: > > > > > > > > > > > > > We do test on a virtio scsi device (/dev/sda) and the default mq > > > > > > > > > > > > > scheduler is 'none'. We found a IO hung as following: > > > > > > > > > > > > > > > > > > > > > > > > > > blk_finish_plug > > > > > > > > > > > > > blk_mq_plug_issue_direct > > > > > > > > > > > > > scsi_mq_get_budget > > > > > > > > > > > > > //get budget_token fail and sdev->restarts=1 > > > > > > > > > > > > > > > > > > > > > > > > > > scsi_end_request > > > > > > > > > > > > > scsi_run_queue_async > > > > > > > > > > > > > //sdev->restart=0 and run queue > > > > > > > > > > > > > > > > > > > > > > > > > > blk_mq_request_bypass_insert > > > > > > > > > > > > > //add request to hctx->dispatch list > > > > > > > > > > > > > > > > > > > > > > > > Here the issue shouldn't be related with scsi's get budget or > > > > > > > > > > > > scsi_run_queue_async. > > > > > > > > > > > > > > > > > > > > > > > > If blk-mq adds request into ->dispatch_list, it is blk-mq core's > > > > > > > > > > > > responsibility to re-run queue for moving on. 
Can you investigate a > > > > > > > > > > > > bit more why blk-mq doesn't run queue after adding request to > > > > > > > > > > > > hctx dispatch list? > > > > > > > > > > > > > > > > > > > > > > I think Yufen is probably thinking about the following Concurrent > > > > > > > > > > > scenario: > > > > > > > > > > > > > > > > > > > > > > blk_mq_flush_plug_list > > > > > > > > > > > # assume there are three rq > > > > > > > > > > > blk_mq_plug_issue_direct > > > > > > > > > > > blk_mq_request_issue_directly > > > > > > > > > > > # dispatch rq1, succeed > > > > > > > > > > > blk_mq_request_issue_directly > > > > > > > > > > > # dispatch rq2 > > > > > > > > > > > __blk_mq_try_issue_directly > > > > > > > > > > > blk_mq_get_dispatch_budget > > > > > > > > > > > scsi_mq_get_budget > > > > > > > > > > > atomic_inc(&sdev->restarts); > > > > > > > > > > > # rq2 failed to get budget > > > > > > > > > > > # restarts is 1 now > > > > > > > > > > > scsi_end_request > > > > > > > > > > > # rq1 is completed > > > > > > > > > > > ┊scsi_run_queue_async > > > > > > > > > > > ┊ atomic_cmpxchg(&sdev->restarts, > > > > > > > > > > > old, 0) == old > > > > > > > > > > > ┊ # set restarts to 0 > > > > > > > > > > > ┊ blk_mq_run_hw_queues > > > > > > > > > > > ┊ # hctx->dispatch list is empty > > > > > > > > > > > blk_mq_request_bypass_insert > > > > > > > > > > > # insert rq2 to hctx->dispatch list > > > > > > > > > > > > > > > > > > > > After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(), > > > > > > > > > > no matter if list_empty(list) is empty or not, queue will be run either from > > > > > > > > > > blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests(). > > > > > > > > > > > > > > > > > > 1) while inserting rq2 to dispatch list, blk_mq_request_bypass_insert() > > > > > > > > > is called from blk_mq_try_issue_list_directly(), list_empty() won't > > > > > > > > > pass, thus thus blk_mq_request_bypass_insert() won't run queue. 
> > > > > > > > > > > > > > > > Yeah, but in blk_mq_try_issue_list_directly() after rq2 is inserted to dispatch > > > > > > > > list, the loop is broken and blk_mq_try_issue_list_directly() returns to > > > > > > > > blk_mq_sched_insert_requests() in which list_empty() is false, so > > > > > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() are called, queue > > > > > > > > is still run. > > > > > > > > > > > > > > > > Also not sure why you make rq3 involved, since the list is local list on > > > > > > > > stack, and it can be operated concurrently. > > > > > > > > > > > > > > I make rq3 involved because there are some conditions that > > > > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from > > > > > > > blk_mq_sched_insert_requests(): > > > > > > > > > > > > The two won't be called if list_empty() is true, and will be called if > > > > > > !list_empty(). > > > > > > > > > > > > That is why I mentioned run queue has been done after rq2 is added to > > > > > > ->dispatch_list. > > > > > > > > > > I don't follow here, it's right after rq2 is inserted to dispatch list, > > > > > list is not empty, and blk_mq_sched_insert_requests() will be called. > > > > > However, do you think that it's impossible that > > > > > blk_mq_sched_insert_requests() can dispatch rq in the list and list > > > > > will become empty? > > > > > > > > Please take a look at blk_mq_sched_insert_requests(). > > > > > > > > When codes runs into blk_mq_sched_insert_requests(), the following > > > > blk_mq_run_hw_queue() will be run always, how does list empty or not > > > > make a difference there? > > > > > > This is strange, always blk_mq_run_hw_queue() is exactly what Yufen > > > tries to do in this patch, are we look at different code? > > > > No. 
> > > > > > > > I'm copying blk_mq_sched_insert_requests() here, the code is from > > > latest linux-next: > > > > > > 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx, > > > 462 ┊ struct blk_mq_ctx *ctx, > > > 463 ┊ struct list_head *list, bool > > > run_queue_async) > > > 464 { > > > 465 struct elevator_queue *e; > > > 466 struct request_queue *q = hctx->queue; > > > 467 > > > 468 /* > > > 469 ┊* blk_mq_sched_insert_requests() is called from flush plug > > > 470 ┊* context only, and hold one usage counter to prevent queue > > > 471 ┊* from being released. > > > 472 ┊*/ > > > 473 percpu_ref_get(&q->q_usage_counter); > > > 474 > > > 475 e = hctx->queue->elevator; > > > 476 if (e) { > > > 477 e->type->ops.insert_requests(hctx, list, false); > > > 478 } else { > > > 479 /* > > > 480 ┊* try to issue requests directly if the hw queue isn't > > > 481 ┊* busy in case of 'none' scheduler, and this way may > > > save > > > 482 ┊* us one extra enqueue & dequeue to sw queue. > > > 483 ┊*/ > > > 484 if (!hctx->dispatch_busy && !run_queue_async) { > > > 485 blk_mq_run_dispatch_ops(hctx->queue, > > > 486 blk_mq_try_issue_list_directly(hctx, > > > list)); > > > 487 if (list_empty(list)) > > > 488 goto out; > > > 489 } > > > 490 blk_mq_insert_requests(hctx, ctx, list); > > > 491 } > > > 492 > > > 493 blk_mq_run_hw_queue(hctx, run_queue_async); > > > 494 out: > > > 495 percpu_ref_put(&q->q_usage_counter); > > > 496 } > > > > > > Here in line 487, if list_empty() is true, out label will skip > > > run_queue(). > > > > If list_empty() is true, run queue is guaranteed to run > > in blk_mq_try_issue_list_directly() in case that BLK_STS_*RESOURCE > > is returned from blk_mq_request_issue_directly(). 
> >
> > ret = blk_mq_request_issue_directly(rq, list_empty(list));
> > if (ret != BLK_STS_OK) {
> > if (ret == BLK_STS_RESOURCE ||
> > ret == BLK_STS_DEV_RESOURCE) {
> > blk_mq_request_bypass_insert(rq, false,
> > list_empty(list)); //run queue
> > break;
> > }
> > blk_mq_end_request(rq, ret);
> > errors++;
> > } else
> > queued++;
> >
> > So why do you try to add one extra run queue?
>
> Hi, Ming
>
> Perhaps I didn't explain the scenario clearly, please notice that list
> contain three rq is required.
>
> 1) rq1 is dispatched successfuly
> 2) rq2 failed to dispatch due to no budget, in this case
> - rq2 will insert to dispatch list
> - list is not emply yet, run queue won't called

In that case, blk_mq_try_issue_list_directly() returns to
blk_mq_sched_insert_requests() immediately, and then blk_mq_insert_requests()
and blk_mq_run_hw_queue() will be run from blk_mq_sched_insert_requests()
because the list isn't empty. Right?


Thanks,
Ming
On 2022/7/26 12:16, Ming Lei wrote: > On Tue, Jul 26, 2022 at 11:31:34AM +0800, Yu Kuai wrote: >> 在 2022/07/26 11:21, Ming Lei 写道: >>> On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote: >>>> Hi, Ming >>>> >>>> 在 2022/07/26 11:02, Ming Lei 写道: >>>>> On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote: >>>>>> Hi, Ming >>>>>> 在 2022/07/26 10:32, Ming Lei 写道: >>>>>>> On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote: >>>>>>>> 在 2022/07/26 9:46, Ming Lei 写道: >>>>>>>>> On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote: >>>>>>>>>> Hi, Ming! >>>>>>>>>> >>>>>>>>>> 在 2022/07/25 23:43, Ming Lei 写道: >>>>>>>>>>> On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote: >>>>>>>>>>>> Hi, Ming! >>>>>>>>>>>> >>>>>>>>>>>> 在 2022/07/19 17:26, Ming Lei 写道: >>>>>>>>>>>>> On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote: >>>>>>>>>>>>>> We do test on a virtio scsi device (/dev/sda) and the default mq >>>>>>>>>>>>>> scheduler is 'none'. We found a IO hung as following: >>>>>>>>>>>>>> >>>>>>>>>>>>>> blk_finish_plug >>>>>>>>>>>>>> blk_mq_plug_issue_direct >>>>>>>>>>>>>> scsi_mq_get_budget >>>>>>>>>>>>>> //get budget_token fail and sdev->restarts=1 >>>>>>>>>>>>>> >>>>>>>>>>>>>> scsi_end_request >>>>>>>>>>>>>> scsi_run_queue_async >>>>>>>>>>>>>> //sdev->restart=0 and run queue >>>>>>>>>>>>>> >>>>>>>>>>>>>> blk_mq_request_bypass_insert >>>>>>>>>>>>>> //add request to hctx->dispatch list >>>>>>>>>>>>> >>>>>>>>>>>>> Here the issue shouldn't be related with scsi's get budget or >>>>>>>>>>>>> scsi_run_queue_async. >>>>>>>>>>>>> >>>>>>>>>>>>> If blk-mq adds request into ->dispatch_list, it is blk-mq core's >>>>>>>>>>>>> responsibility to re-run queue for moving on. Can you investigate a >>>>>>>>>>>>> bit more why blk-mq doesn't run queue after adding request to >>>>>>>>>>>>> hctx dispatch list? 
>>>>>>>>>>>> >>>>>>>>>>>> I think Yufen is probably thinking about the following Concurrent >>>>>>>>>>>> scenario: >>>>>>>>>>>> >>>>>>>>>>>> blk_mq_flush_plug_list >>>>>>>>>>>> # assume there are three rq >>>>>>>>>>>> blk_mq_plug_issue_direct >>>>>>>>>>>> blk_mq_request_issue_directly >>>>>>>>>>>> # dispatch rq1, succeed >>>>>>>>>>>> blk_mq_request_issue_directly >>>>>>>>>>>> # dispatch rq2 >>>>>>>>>>>> __blk_mq_try_issue_directly >>>>>>>>>>>> blk_mq_get_dispatch_budget >>>>>>>>>>>> scsi_mq_get_budget >>>>>>>>>>>> atomic_inc(&sdev->restarts); >>>>>>>>>>>> # rq2 failed to get budget >>>>>>>>>>>> # restarts is 1 now >>>>>>>>>>>> scsi_end_request >>>>>>>>>>>> # rq1 is completed >>>>>>>>>>>> ┊scsi_run_queue_async >>>>>>>>>>>> ┊ atomic_cmpxchg(&sdev->restarts, >>>>>>>>>>>> old, 0) == old >>>>>>>>>>>> ┊ # set restarts to 0 >>>>>>>>>>>> ┊ blk_mq_run_hw_queues >>>>>>>>>>>> ┊ # hctx->dispatch list is empty >>>>>>>>>>>> blk_mq_request_bypass_insert >>>>>>>>>>>> # insert rq2 to hctx->dispatch list >>>>>>>>>>> >>>>>>>>>>> After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(), >>>>>>>>>>> no matter if list_empty(list) is empty or not, queue will be run either from >>>>>>>>>>> blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests(). >>>>>>>>>> >>>>>>>>>> 1) while inserting rq2 to dispatch list, blk_mq_request_bypass_insert() >>>>>>>>>> is called from blk_mq_try_issue_list_directly(), list_empty() won't >>>>>>>>>> pass, thus thus blk_mq_request_bypass_insert() won't run queue. >>>>>>>>> >>>>>>>>> Yeah, but in blk_mq_try_issue_list_directly() after rq2 is inserted to dispatch >>>>>>>>> list, the loop is broken and blk_mq_try_issue_list_directly() returns to >>>>>>>>> blk_mq_sched_insert_requests() in which list_empty() is false, so >>>>>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() are called, queue >>>>>>>>> is still run. 
>>>>>>>>> >>>>>>>>> Also not sure why you make rq3 involved, since the list is local list on >>>>>>>>> stack, and it can be operated concurrently. >>>>>>>> >>>>>>>> I make rq3 involved because there are some conditions that >>>>>>>> blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from >>>>>>>> blk_mq_sched_insert_requests(): >>>>>>> >>>>>>> The two won't be called if list_empty() is true, and will be called if >>>>>>> !list_empty(). >>>>>>> >>>>>>> That is why I mentioned run queue has been done after rq2 is added to >>>>>>> ->dispatch_list. >>>>>> >>>>>> I don't follow here, it's right after rq2 is inserted to dispatch list, >>>>>> list is not empty, and blk_mq_sched_insert_requests() will be called. >>>>>> However, do you think that it's impossible that >>>>>> blk_mq_sched_insert_requests() can dispatch rq in the list and list >>>>>> will become empty? >>>>> >>>>> Please take a look at blk_mq_sched_insert_requests(). >>>>> >>>>> When codes runs into blk_mq_sched_insert_requests(), the following >>>>> blk_mq_run_hw_queue() will be run always, how does list empty or not >>>>> make a difference there? >>>> >>>> This is strange, always blk_mq_run_hw_queue() is exactly what Yufen >>>> tries to do in this patch, are we look at different code? >>> >>> No. >>> >>>> >>>> I'm copying blk_mq_sched_insert_requests() here, the code is from >>>> latest linux-next: >>>> >>>> 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx, >>>> 462 ┊ struct blk_mq_ctx *ctx, >>>> 463 ┊ struct list_head *list, bool >>>> run_queue_async) >>>> 464 { >>>> 465 struct elevator_queue *e; >>>> 466 struct request_queue *q = hctx->queue; >>>> 467 >>>> 468 /* >>>> 469 ┊* blk_mq_sched_insert_requests() is called from flush plug >>>> 470 ┊* context only, and hold one usage counter to prevent queue >>>> 471 ┊* from being released. 
>>>> 472 ┊*/ >>>> 473 percpu_ref_get(&q->q_usage_counter); >>>> 474 >>>> 475 e = hctx->queue->elevator; >>>> 476 if (e) { >>>> 477 e->type->ops.insert_requests(hctx, list, false); >>>> 478 } else { >>>> 479 /* >>>> 480 ┊* try to issue requests directly if the hw queue isn't >>>> 481 ┊* busy in case of 'none' scheduler, and this way may >>>> save >>>> 482 ┊* us one extra enqueue & dequeue to sw queue. >>>> 483 ┊*/ >>>> 484 if (!hctx->dispatch_busy && !run_queue_async) { >>>> 485 blk_mq_run_dispatch_ops(hctx->queue, >>>> 486 blk_mq_try_issue_list_directly(hctx, >>>> list)); >>>> 487 if (list_empty(list)) >>>> 488 goto out; >>>> 489 } >>>> 490 blk_mq_insert_requests(hctx, ctx, list); >>>> 491 } >>>> 492 >>>> 493 blk_mq_run_hw_queue(hctx, run_queue_async); >>>> 494 out: >>>> 495 percpu_ref_put(&q->q_usage_counter); >>>> 496 } >>>> >>>> Here in line 487, if list_empty() is true, out label will skip >>>> run_queue(). >>> >>> If list_empty() is true, run queue is guaranteed to run >>> in blk_mq_try_issue_list_directly() in case that BLK_STS_*RESOURCE >>> is returned from blk_mq_request_issue_directly(). >>> >>> ret = blk_mq_request_issue_directly(rq, list_empty(list)); >>> if (ret != BLK_STS_OK) { >>> if (ret == BLK_STS_RESOURCE || >>> ret == BLK_STS_DEV_RESOURCE) { >>> blk_mq_request_bypass_insert(rq, false, >>> list_empty(list)); //run queue >>> break; >>> } >>> blk_mq_end_request(rq, ret); >>> errors++; >>> } else >>> queued++; >>> >>> So why do you try to add one extra run queue? >> >> Hi, Ming >> >> Perhaps I didn't explain the scenario clearly, please notice that list >> contain three rq is required. 
>> 
>> 1) rq1 is dispatched successfuly
>> 2) rq2 failed to dispatch due to no budget, in this case
>>    - rq2 will insert to dispatch list
>>    - list is not emply yet, run queue won't called
> 
> In the case, blk_mq_try_issue_list_directly() returns to
> blk_mq_sched_insert_requests() immediately, then blk_mq_insert_requests()
> and blk_mq_run_hw_queue() will be run from blk_mq_sched_insert_requests()
> because the list isn't empty.
> 
> Right?
> 

hi Ming,

Here rq2 fails from blk_mq_plug_issue_direct() in blk_mq_flush_plug_list(),
not from blk_mq_sched_insert_requests():

blk_mq_flush_plug_list
	if (!plug->multiple_queues && !plug->has_elevator && !from_schedule) {
		struct request_queue *q;

		rq = rq_list_peek(&plug->mq_list);
		q = rq->q;

		/*
		 * Peek first request and see if we have a ->queue_rqs() hook.
		 * If we do, we can dispatch the whole plug list in one go. We
		 * already know at this point that all requests belong to the
		 * same queue, caller must ensure that's the case.
		 *
		 * Since we pass off the full list to the driver at this point,
		 * we do not increment the active request count for the queue.
		 * Bypass shared tags for now because of that.
		 */
		if (q->mq_ops->queue_rqs &&
		    !(rq->mq_hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
			blk_mq_run_dispatch_ops(q,
				__blk_mq_flush_plug_list(q, plug));
			if (rq_list_empty(plug->mq_list))
				return;
		}

		blk_mq_run_dispatch_ops(q,
				blk_mq_plug_issue_direct(plug, false));
				//rq2 is inserted into the dispatch list here
		if (rq_list_empty(plug->mq_list))
			return;
	}

	do {
		blk_mq_dispatch_plug_list(plug, from_schedule);
		//continues to issue rq3, which succeeds
	} while (!rq_list_empty(plug->mq_list));

> 
> Thanks,
> Ming
> 
> .
On Tue, Jul 26, 2022 at 01:01:41PM +0800, Yufen Yu wrote: > > > On 2022/7/26 12:16, Ming Lei wrote: > > On Tue, Jul 26, 2022 at 11:31:34AM +0800, Yu Kuai wrote: > > > 在 2022/07/26 11:21, Ming Lei 写道: > > > > On Tue, Jul 26, 2022 at 11:14:23AM +0800, Yu Kuai wrote: > > > > > Hi, Ming > > > > > > > > > > 在 2022/07/26 11:02, Ming Lei 写道: > > > > > > On Tue, Jul 26, 2022 at 10:52:56AM +0800, Yu Kuai wrote: > > > > > > > Hi, Ming > > > > > > > 在 2022/07/26 10:32, Ming Lei 写道: > > > > > > > > On Tue, Jul 26, 2022 at 10:08:13AM +0800, Yu Kuai wrote: > > > > > > > > > 在 2022/07/26 9:46, Ming Lei 写道: > > > > > > > > > > On Tue, Jul 26, 2022 at 09:08:19AM +0800, Yu Kuai wrote: > > > > > > > > > > > Hi, Ming! > > > > > > > > > > > > > > > > > > > > > > 在 2022/07/25 23:43, Ming Lei 写道: > > > > > > > > > > > > On Sat, Jul 23, 2022 at 10:50:03AM +0800, Yu Kuai wrote: > > > > > > > > > > > > > Hi, Ming! > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2022/07/19 17:26, Ming Lei 写道: > > > > > > > > > > > > > > On Mon, Jul 18, 2022 at 08:35:28PM +0800, Yufen Yu wrote: > > > > > > > > > > > > > > > We do test on a virtio scsi device (/dev/sda) and the default mq > > > > > > > > > > > > > > > scheduler is 'none'. 
We found a IO hung as following: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > blk_finish_plug > > > > > > > > > > > > > > > blk_mq_plug_issue_direct > > > > > > > > > > > > > > > scsi_mq_get_budget > > > > > > > > > > > > > > > //get budget_token fail and sdev->restarts=1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > scsi_end_request > > > > > > > > > > > > > > > scsi_run_queue_async > > > > > > > > > > > > > > > //sdev->restart=0 and run queue > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > blk_mq_request_bypass_insert > > > > > > > > > > > > > > > //add request to hctx->dispatch list > > > > > > > > > > > > > > > > > > > > > > > > > > > > Here the issue shouldn't be related with scsi's get budget or > > > > > > > > > > > > > > scsi_run_queue_async. > > > > > > > > > > > > > > > > > > > > > > > > > > > > If blk-mq adds request into ->dispatch_list, it is blk-mq core's > > > > > > > > > > > > > > responsibility to re-run queue for moving on. Can you investigate a > > > > > > > > > > > > > > bit more why blk-mq doesn't run queue after adding request to > > > > > > > > > > > > > > hctx dispatch list? 
> > > > > > > > > > > > > > > > > > > > > > > > > > I think Yufen is probably thinking about the following Concurrent > > > > > > > > > > > > > scenario: > > > > > > > > > > > > > > > > > > > > > > > > > > blk_mq_flush_plug_list > > > > > > > > > > > > > # assume there are three rq > > > > > > > > > > > > > blk_mq_plug_issue_direct > > > > > > > > > > > > > blk_mq_request_issue_directly > > > > > > > > > > > > > # dispatch rq1, succeed > > > > > > > > > > > > > blk_mq_request_issue_directly > > > > > > > > > > > > > # dispatch rq2 > > > > > > > > > > > > > __blk_mq_try_issue_directly > > > > > > > > > > > > > blk_mq_get_dispatch_budget > > > > > > > > > > > > > scsi_mq_get_budget > > > > > > > > > > > > > atomic_inc(&sdev->restarts); > > > > > > > > > > > > > # rq2 failed to get budget > > > > > > > > > > > > > # restarts is 1 now > > > > > > > > > > > > > scsi_end_request > > > > > > > > > > > > > # rq1 is completed > > > > > > > > > > > > > ┊scsi_run_queue_async > > > > > > > > > > > > > ┊ atomic_cmpxchg(&sdev->restarts, > > > > > > > > > > > > > old, 0) == old > > > > > > > > > > > > > ┊ # set restarts to 0 > > > > > > > > > > > > > ┊ blk_mq_run_hw_queues > > > > > > > > > > > > > ┊ # hctx->dispatch list is empty > > > > > > > > > > > > > blk_mq_request_bypass_insert > > > > > > > > > > > > > # insert rq2 to hctx->dispatch list > > > > > > > > > > > > > > > > > > > > > > > > After rq2 is added to ->dispatch_list in blk_mq_try_issue_list_directly(), > > > > > > > > > > > > no matter if list_empty(list) is empty or not, queue will be run either from > > > > > > > > > > > > blk_mq_request_bypass_insert() or blk_mq_sched_insert_requests(). > > > > > > > > > > > > > > > > > > > > > > 1) while inserting rq2 to dispatch list, blk_mq_request_bypass_insert() > > > > > > > > > > > is called from blk_mq_try_issue_list_directly(), list_empty() won't > > > > > > > > > > > pass, thus thus blk_mq_request_bypass_insert() won't run queue. 
> > > > > > > > > > > > > > > > > > > > Yeah, but in blk_mq_try_issue_list_directly() after rq2 is inserted to dispatch > > > > > > > > > > list, the loop is broken and blk_mq_try_issue_list_directly() returns to > > > > > > > > > > blk_mq_sched_insert_requests() in which list_empty() is false, so > > > > > > > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() are called, queue > > > > > > > > > > is still run. > > > > > > > > > > > > > > > > > > > > Also not sure why you make rq3 involved, since the list is local list on > > > > > > > > > > stack, and it can be operated concurrently. > > > > > > > > > > > > > > > > > > I make rq3 involved because there are some conditions that > > > > > > > > > blk_mq_insert_requests() and blk_mq_run_hw_queue() won't be called from > > > > > > > > > blk_mq_sched_insert_requests(): > > > > > > > > > > > > > > > > The two won't be called if list_empty() is true, and will be called if > > > > > > > > !list_empty(). > > > > > > > > > > > > > > > > That is why I mentioned run queue has been done after rq2 is added to > > > > > > > > ->dispatch_list. > > > > > > > > > > > > > > I don't follow here, it's right after rq2 is inserted to dispatch list, > > > > > > > list is not empty, and blk_mq_sched_insert_requests() will be called. > > > > > > > However, do you think that it's impossible that > > > > > > > blk_mq_sched_insert_requests() can dispatch rq in the list and list > > > > > > > will become empty? > > > > > > > > > > > > Please take a look at blk_mq_sched_insert_requests(). > > > > > > > > > > > > When codes runs into blk_mq_sched_insert_requests(), the following > > > > > > blk_mq_run_hw_queue() will be run always, how does list empty or not > > > > > > make a difference there? > > > > > > > > > > This is strange, always blk_mq_run_hw_queue() is exactly what Yufen > > > > > tries to do in this patch, are we look at different code? > > > > > > > > No. 
> > > > > > > > > > > > > > I'm copying blk_mq_sched_insert_requests() here, the code is from > > > > > latest linux-next: > > > > > > > > > > 461 void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx, > > > > > 462 ┊ struct blk_mq_ctx *ctx, > > > > > 463 ┊ struct list_head *list, bool > > > > > run_queue_async) > > > > > 464 { > > > > > 465 struct elevator_queue *e; > > > > > 466 struct request_queue *q = hctx->queue; > > > > > 467 > > > > > 468 /* > > > > > 469 ┊* blk_mq_sched_insert_requests() is called from flush plug > > > > > 470 ┊* context only, and hold one usage counter to prevent queue > > > > > 471 ┊* from being released. > > > > > 472 ┊*/ > > > > > 473 percpu_ref_get(&q->q_usage_counter); > > > > > 474 > > > > > 475 e = hctx->queue->elevator; > > > > > 476 if (e) { > > > > > 477 e->type->ops.insert_requests(hctx, list, false); > > > > > 478 } else { > > > > > 479 /* > > > > > 480 ┊* try to issue requests directly if the hw queue isn't > > > > > 481 ┊* busy in case of 'none' scheduler, and this way may > > > > > save > > > > > 482 ┊* us one extra enqueue & dequeue to sw queue. > > > > > 483 ┊*/ > > > > > 484 if (!hctx->dispatch_busy && !run_queue_async) { > > > > > 485 blk_mq_run_dispatch_ops(hctx->queue, > > > > > 486 blk_mq_try_issue_list_directly(hctx, > > > > > list)); > > > > > 487 if (list_empty(list)) > > > > > 488 goto out; > > > > > 489 } > > > > > 490 blk_mq_insert_requests(hctx, ctx, list); > > > > > 491 } > > > > > 492 > > > > > 493 blk_mq_run_hw_queue(hctx, run_queue_async); > > > > > 494 out: > > > > > 495 percpu_ref_put(&q->q_usage_counter); > > > > > 496 } > > > > > > > > > > Here in line 487, if list_empty() is true, out label will skip > > > > > run_queue(). > > > > > > > > If list_empty() is true, run queue is guaranteed to run > > > > in blk_mq_try_issue_list_directly() in case that BLK_STS_*RESOURCE > > > > is returned from blk_mq_request_issue_directly(). 
> > > > > > > > ret = blk_mq_request_issue_directly(rq, list_empty(list)); > > > > if (ret != BLK_STS_OK) { > > > > if (ret == BLK_STS_RESOURCE || > > > > ret == BLK_STS_DEV_RESOURCE) { > > > > blk_mq_request_bypass_insert(rq, false, > > > > list_empty(list)); //run queue > > > > break; > > > > } > > > > blk_mq_end_request(rq, ret); > > > > errors++; > > > > } else > > > > queued++; > > > > > > > > So why do you try to add one extra run queue? > > > > > > Hi, Ming > > > > > > Perhaps I didn't explain the scenario clearly, please notice that list > > > contain three rq is required. > > > > > > 1) rq1 is dispatched successfuly > > > 2) rq2 failed to dispatch due to no budget, in this case > > > - rq2 will insert to dispatch list > > > - list is not emply yet, run queue won't called > > > > In the case, blk_mq_try_issue_list_directly() returns to > > blk_mq_sched_insert_requests() immediately, then blk_mq_insert_requests() > > and blk_mq_run_hw_queue() will be run from blk_mq_sched_insert_requests() > > because the list isn't empty. > > > > Right? > > > > hi Ming, > > Here rq2 fail from blk_mq_plug_issue_direct() in blk_mq_flush_plug_list(), > not blk_mq_sched_insert_requests OK, just wondering why Yufen's patch touches blk_mq_sched_insert_requests(). Here the issue is in blk_mq_plug_issue_direct() itself, it is wrong to use last request of plug list to decide if run queue is needed since all the remained requests in plug list may be from other hctxs, and the simplest fix could be pass run_queue as true always to blk_mq_request_bypass_insert(). Thanks, Ming
On 2022/7/26 15:39, Ming Lei wrote:
> On Tue, Jul 26, 2022 at 01:01:41PM +0800, Yufen Yu wrote:
>>
>>>
>>
>> hi Ming,
>>
>> Here rq2 fail from blk_mq_plug_issue_direct() in blk_mq_flush_plug_list(),
>> not blk_mq_sched_insert_requests
> 
> OK, just wondering why Yufen's patch touches
> blk_mq_sched_insert_requests().
> 
> Here the issue is in blk_mq_plug_issue_direct() itself, it is wrong to use last
> request of plug list to decide if run queue is needed since all the remained
> requests in plug list may be from other hctxs, and the simplest fix could be pass
> run_queue as true always to blk_mq_request_bypass_insert().
> 

OK, thanks for your suggestion and I will send v2.

Thanks,
Yufen

> 
> Thanks,
> Ming
> 
> .
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index a4f7c101b53b..c3ad97ca2753 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -490,8 +490,8 @@ void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
 		blk_mq_insert_requests(hctx, ctx, list);
 	}
 
-	blk_mq_run_hw_queue(hctx, run_queue_async);
 out:
+	blk_mq_run_hw_queue(hctx, run_queue_async);
 	percpu_ref_put(&q->q_usage_counter);
 }
 
We tested on a virtio scsi device (/dev/sda) with the default mq
scheduler 'none' and found an IO hang as follows:

blk_finish_plug
  blk_mq_plug_issue_direct
    scsi_mq_get_budget
    //get budget_token fails and sdev->restarts=1

			scsi_end_request
			  scsi_run_queue_async
			  //sdev->restarts=0 and run queue

    blk_mq_request_bypass_insert
    //add request to hctx->dispatch list

  //continue to dispatch the plug list
  blk_mq_dispatch_plug_list
    blk_mq_try_issue_list_directly
    //successfully issue all remaining requests from the plug list

After .get_budget fails, scsi_mq_get_budget increases 'restarts'.
Normally, the hw queue is re-run when an IO completes and 'restarts' is
reset to 0. But if that re-run happens before the failed request is
added to the dispatch list, and blk_mq_dispatch_plug_list then
successfully issues all remaining requests, no one will run the queue
again: the request stalls on the dispatch list and can never complete.

Fix the bug by running the queue after issuing the last request, in
blk_mq_sched_insert_requests().

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 block/blk-mq-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)