
[RESEND] blk-mq: insert request not through ->queue_rq into sw/scheduler queue

Message ID 20200818090728.2696802-1-ming.lei@redhat.com (mailing list archive)
State New, archived
Series [RESEND] blk-mq: insert request not through ->queue_rq into sw/scheduler queue

Commit Message

Ming Lei Aug. 18, 2020, 9:07 a.m. UTC
c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list") was
supposed to add only requests which have already been through ->queue_rq()
to the hw queue dispatch list; however, it also adds requests that merely
ran out of budget or driver tag. That basically bypasses request merging,
causes too many requests to be dispatched to the LLD, and increases
%system unnecessarily.

Fix this issue by inserting requests that have not been through
->queue_rq() into the sw/scheduler queue instead. This is safe because
->queue_rq() has not been called on such a request yet.

High %system can be observed on Azure storvsc devices, and even soft
lockups have been observed. This patch reduces %system during heavy
sequential IO and decreases the soft lockup risk.

Fixes: c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-mq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Jens Axboe Aug. 18, 2020, 2:50 p.m. UTC | #1
On 8/18/20 2:07 AM, Ming Lei wrote:
> c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list") was
> supposed to add only requests which have already been through ->queue_rq()
> to the hw queue dispatch list; however, it also adds requests that merely
> ran out of budget or driver tag. That basically bypasses request merging,
> causes too many requests to be dispatched to the LLD, and increases
> %system unnecessarily.
> 
> Fix this issue by inserting requests that have not been through
> ->queue_rq() into the sw/scheduler queue instead. This is safe because
> ->queue_rq() has not been called on such a request yet.
> 
> High %system can be observed on Azure storvsc devices, and even soft
> lockups have been observed. This patch reduces %system during heavy
> sequential IO and decreases the soft lockup risk.

Applied, thanks Ming.
Mike Snitzer Aug. 18, 2020, 3:20 p.m. UTC | #2
On Tue, Aug 18 2020 at 10:50am -0400,
Jens Axboe <axboe@kernel.dk> wrote:

> On 8/18/20 2:07 AM, Ming Lei wrote:
> > c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list") was
> > supposed to add only requests which have already been through ->queue_rq()
> > to the hw queue dispatch list; however, it also adds requests that merely
> > ran out of budget or driver tag. That basically bypasses request merging,
> > causes too many requests to be dispatched to the LLD, and increases
> > %system unnecessarily.
> > 
> > Fix this issue by inserting requests that have not been through
> > ->queue_rq() into the sw/scheduler queue instead. This is safe because
> > ->queue_rq() has not been called on such a request yet.
> > 
> > High %system can be observed on Azure storvsc devices, and even soft
> > lockups have been observed. This patch reduces %system during heavy
> > sequential IO and decreases the soft lockup risk.
> 
> Applied, thanks Ming.

Hmm, it strikes me as strange that this is occurring, given that direct
insertion into the blk-mq queue (bypassing the scheduler) is meant to
avoid two layers of IO merging when dm-multipath is stacked on blk-mq
path(s).  The dm-mpath IO scheduler does all the merging, and the
underlying paths' blk-mq request_queues are meant to just dispatch the
top-level's requests.

So this change concerns me.  It feels like this design has broken down.

Could it be that some other entry point was added for the
__blk_mq_try_issue_directly() code?  And that it needs to be untangled
from the dm-multipath use-case?

Apologies for not responding to this patch until now.

Mike
Ming Lei Aug. 18, 2020, 11:52 p.m. UTC | #3
On Tue, Aug 18, 2020 at 11:20:22AM -0400, Mike Snitzer wrote:
> On Tue, Aug 18 2020 at 10:50am -0400,
> Jens Axboe <axboe@kernel.dk> wrote:
> 
> > On 8/18/20 2:07 AM, Ming Lei wrote:
> > > c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list") was
> > > supposed to add only requests which have already been through ->queue_rq()
> > > to the hw queue dispatch list; however, it also adds requests that merely
> > > ran out of budget or driver tag. That basically bypasses request merging,
> > > causes too many requests to be dispatched to the LLD, and increases
> > > %system unnecessarily.
> > > 
> > > Fix this issue by inserting requests that have not been through
> > > ->queue_rq() into the sw/scheduler queue instead. This is safe because
> > > ->queue_rq() has not been called on such a request yet.
> > > 
> > > High %system can be observed on Azure storvsc devices, and even soft
> > > lockups have been observed. This patch reduces %system during heavy
> > > sequential IO and decreases the soft lockup risk.
> > 
> > Applied, thanks Ming.
> 
> Hmm, it strikes me as strange that this is occurring, given that direct
> insertion into the blk-mq queue (bypassing the scheduler) is meant to
> avoid two layers of IO merging when dm-multipath is stacked on blk-mq
> path(s).  The dm-mpath IO scheduler does all the merging, and the
> underlying paths' blk-mq request_queues are meant to just dispatch the
> top-level's requests.
> 
> So this change concerns me.  It feels like this design has broken down.
> 

'bypass_insert' is 'true' when blk_insert_cloned_request() is
called from device mapper code, so this patch doesn't affect dm.
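
For reference, a simplified sketch of that cloned-request path as I read
the v5.9-era code (the call chain and arguments are approximate and worth
double-checking against the exact tree):

	/*
	 * dm-rq clone dispatch (drivers/md/dm-rq.c)
	 *   -> blk_insert_cloned_request(q, clone)            [block/blk-core.c]
	 *        -> blk_mq_request_issue_directly(rq, true)   [block/blk-mq.c]
	 *             -> __blk_mq_try_issue_directly(hctx, rq, &cookie,
	 *                                            true, last)
	 *                                            ^^^^ bypass_insert
	 *
	 * With bypass_insert == true, a budget/tag failure returns
	 * BLK_STS_RESOURCE to the caller (dm-rq requeues the clone itself),
	 * so the scheduler insert added by this patch is never reached on
	 * the dm-multipath path.
	 */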

> Could it be that some other entry point was added for the
> __blk_mq_try_issue_directly() code?  And that it needs to be untangled
> from the dm-multipath use-case?

__blk_mq_try_issue_directly() can also be called from blk-mq directly,
and that is the case this patch addresses: if a request can't be queued
to the LLD because we ran out of budget or driver tag, it should be
added to the scheduler queue to improve IO merging; that also avoids
dispatching too many requests to the hardware.
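
To make that concrete, here is a simplified sketch of
__blk_mq_try_issue_directly() as I read it around v5.9 (locking and the
stopped/quiesced handling are trimmed, and the helper arguments are
approximate, so verify against the exact tree):

	static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
							struct request *rq,
							blk_qc_t *cookie,
							bool bypass_insert, bool last)
	{
		struct request_queue *q = rq->q;
		bool run_queue = true;

		/* ... stopped/quiesced check omitted ... */

		if (q->elevator && !bypass_insert)
			goto insert;

		/* Neither of these failure cases has called ->queue_rq() yet: */
		if (!blk_mq_get_dispatch_budget(q))
			goto insert;

		if (!blk_mq_get_driver_tag(rq)) {
			blk_mq_put_dispatch_budget(q);
			goto insert;
		}

		/* Only this path actually invokes ->queue_rq(): */
		return __blk_mq_issue_directly(hctx, rq, cookie, last);
	insert:
		if (bypass_insert)
			return BLK_STS_RESOURCE;

		/* Before: blk_mq_request_bypass_insert(rq, false, run_queue),
		 * which skips the elevator, so requests that merely lost the
		 * budget/tag race pile up on the hw dispatch list unmerged.
		 * After this patch they go back through the scheduler: */
		blk_mq_sched_insert_request(rq, false, run_queue, false);
		return BLK_STS_OK;
	}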


Thanks,
Ming
Mike Snitzer Aug. 19, 2020, 12:20 a.m. UTC | #4
On Tue, Aug 18 2020 at  7:52pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> On Tue, Aug 18, 2020 at 11:20:22AM -0400, Mike Snitzer wrote:
> > On Tue, Aug 18 2020 at 10:50am -0400,
> > Jens Axboe <axboe@kernel.dk> wrote:
> > 
> > > On 8/18/20 2:07 AM, Ming Lei wrote:
> > > > c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list") was
> > > > supposed to add only requests which have already been through ->queue_rq()
> > > > to the hw queue dispatch list; however, it also adds requests that merely
> > > > ran out of budget or driver tag. That basically bypasses request merging,
> > > > causes too many requests to be dispatched to the LLD, and increases
> > > > %system unnecessarily.
> > > > 
> > > > Fix this issue by inserting requests that have not been through
> > > > ->queue_rq() into the sw/scheduler queue instead. This is safe because
> > > > ->queue_rq() has not been called on such a request yet.
> > > > 
> > > > High %system can be observed on Azure storvsc devices, and even soft
> > > > lockups have been observed. This patch reduces %system during heavy
> > > > sequential IO and decreases the soft lockup risk.
> > > 
> > > Applied, thanks Ming.
> > 
> > Hmm, it strikes me as strange that this is occurring, given that direct
> > insertion into the blk-mq queue (bypassing the scheduler) is meant to
> > avoid two layers of IO merging when dm-multipath is stacked on blk-mq
> > path(s).  The dm-mpath IO scheduler does all the merging, and the
> > underlying paths' blk-mq request_queues are meant to just dispatch the
> > top-level's requests.
> > 
> > So this change concerns me.  It feels like this design has broken down.
> > 
> 
> 'bypass_insert' is 'true' when blk_insert_cloned_request() is
> called from device mapper code, so this patch doesn't affect dm.

Great.
 
> > Could it be that some other entry point was added for the
> > __blk_mq_try_issue_directly() code?  And that it needs to be untangled
> > from the dm-multipath use-case?
> 
> __blk_mq_try_issue_directly() can also be called from blk-mq directly,
> and that is the case this patch addresses: if a request can't be queued
> to the LLD because we ran out of budget or driver tag, it should be
> added to the scheduler queue to improve IO merging; that also avoids
> dispatching too many requests to the hardware.

I see, so if a retry is needed it's best to attempt merging again.

Thanks for the explanation.

Mike

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5ac80bfac325..f50c38ccac3c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2039,7 +2039,8 @@  static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	if (bypass_insert)
 		return BLK_STS_RESOURCE;
 
-	blk_mq_request_bypass_insert(rq, false, run_queue);
+	blk_mq_sched_insert_request(rq, false, run_queue, false);
+
 	return BLK_STS_OK;
 }
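
For readers without the tree at hand, my understanding of the two insert
helpers involved (prototypes roughly as in the v5.9-era block/blk-mq.h and
block/blk-mq-sched.h; treat this as an approximation and verify against
the exact kernel version):

	/* Old call: put the request straight onto the hctx dispatch list,
	 * skipping the elevator entirely. */
	void blk_mq_request_bypass_insert(struct request *rq, bool at_head,
					  bool run_queue);

	/* New call: insert through the I/O scheduler (or the per-cpu sw queue
	 * when no elevator is attached), so the request can still be merged.
	 * The patch passes at_head = false, run_queue as computed earlier in
	 * __blk_mq_try_issue_directly(), and async = false. */
	void blk_mq_sched_insert_request(struct request *rq, bool at_head,
					 bool run_queue, bool async);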