| Message ID | 20200818090728.2696802-1-ming.lei@redhat.com |
|---|---|
| State | New, archived |
| Series | [RESEND] blk-mq: insert request not through ->queue_rq into sw/scheduler queue |
On 8/18/20 2:07 AM, Ming Lei wrote:
> c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list") was
> supposed to add requests which have been through ->queue_rq() to the hw
> queue dispatch list; however, it adds requests that have run out of
> budget or driver tag to the hw queue too. This basically bypasses
> request merging, causes too many requests to be dispatched to the LLD,
> and unnecessarily increases %system.
>
> Fix this issue by adding a request that has not been through
> ->queue_rq() to the sw/scheduler queue instead; this is safe because
> ->queue_rq() has not been called on the request yet.
>
> High %system can be observed on Azure storvsc devices, and even soft
> lockups are observed. This patch reduces %system during heavy
> sequential IO and decreases the soft lockup risk.

Applied, thanks Ming.
On Tue, Aug 18 2020 at 10:50am -0400, Jens Axboe <axboe@kernel.dk> wrote:

> On 8/18/20 2:07 AM, Ming Lei wrote:
> > c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list")
> > [ ... ]
>
> Applied, thanks Ming.

Hmm, it strikes me as strange that this is occurring, given that direct insertion into the blk-mq queue (bypassing the scheduler) is meant to avoid two layers of IO merging when dm-multipath is stacked on blk-mq path(s). The dm-mpath IO scheduler does all the merging, and the underlying paths' blk-mq request_queues are meant to just dispatch the top-level requests.

So this change concerns me; it feels like this design has broken down. Could it be that some other entry point was added for the __blk_mq_try_issue_directly() code, and it needs to be untangled from the dm-multipath use-case?

Apologies for not responding to this patch until now.

Mike
On Tue, Aug 18, 2020 at 11:20:22AM -0400, Mike Snitzer wrote:
> On Tue, Aug 18 2020 at 10:50am -0400, Jens Axboe <axboe@kernel.dk> wrote:
>
> > [ ... ]
> >
> > Applied, thanks Ming.
>
> [ ... ]
>
> So this change concerns me. Feels like this design has broken down.

'bypass_insert' is 'true' when blk_insert_cloned_request() is called from device mapper code, so this patch doesn't affect dm.

> Could be that some other entry point was added for the
> __blk_mq_try_issue_directly() code? And it needs to be untangled away
> from the dm-multipath use-case?

__blk_mq_try_issue_directly() can also be called from blk-mq directly, and that is the case this patch addresses: if a request can't be queued to the LLD because it ran out of budget or driver tag, it should be added to the scheduler queue to improve IO merging; meanwhile, we avoid dispatching too many requests to the hardware.

Thanks,
Ming
On Tue, Aug 18 2020 at 7:52pm -0400, Ming Lei <ming.lei@redhat.com> wrote:

> On Tue, Aug 18, 2020 at 11:20:22AM -0400, Mike Snitzer wrote:
> > [ ... ]
> >
> > So this change concerns me. Feels like this design has broken down.
>
> 'bypass_insert' is 'true' when blk_insert_cloned_request() is called
> from device mapper code, so this patch doesn't affect dm.

Great.

> > Could be that some other entry point was added for the
> > __blk_mq_try_issue_directly() code? And it needs to be untangled away
> > from the dm-multipath use-case?
>
> __blk_mq_try_issue_directly() can be called from blk-mq directly, that
> is the case this patch is addressing: if one request can't be queued to
> the LLD because of running out of budget or driver tag, it should be
> added to the scheduler queue to improve IO merging; meantime we can
> avoid too many requests being dispatched to the hardware.

I see, so if a retry is needed it's best to attempt the merge again. Thanks for the explanation.

Mike
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5ac80bfac325..f50c38ccac3c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2039,7 +2039,8 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	if (bypass_insert)
 		return BLK_STS_RESOURCE;
-	blk_mq_request_bypass_insert(rq, false, run_queue);
+	blk_mq_sched_insert_request(rq, false, run_queue, false);
+
 	return BLK_STS_OK;
 }
c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list") was supposed to add requests which have been through ->queue_rq() to the hw queue dispatch list; however, it adds requests that have run out of budget or driver tag to the hw queue too. This basically bypasses request merging, causes too many requests to be dispatched to the LLD, and unnecessarily increases %system.

Fix this issue by adding a request that has not been through ->queue_rq() to the sw/scheduler queue instead; this is safe because ->queue_rq() has not been called on the request yet.

High %system can be observed on Azure storvsc devices, and even soft lockups are observed. This patch reduces %system during heavy sequential IO and decreases the soft lockup risk.

Fixes: c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-mq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)