Message ID | 20220408073916.1428590-5-yukuai3@huawei.com |
---|---|
State | New, archived |
Series | improve tag allocation under heavy load |
On 4/8/22 00:39, Yu Kuai wrote:
> The idle way to disable tag preemption is to track how many tags are

idle -> ideal?

> available, and wait directly in blk_mq_get_tag() if free tags are
> very little. However, this is out of reality because fast path is
> affected.
>
> As 'ws_active' is only updated in slow path, this patch disable tag
> preemption if 'ws_active' is greater than 8, which means there are many
> threads waiting for tags already.
>
> Once tag preemption is disabled, there is a situation that can cause
> performance degration(or io hung in extreme scenarios): the waitqueue

degration -> degradation?

> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 2615bd58bad3..b49b20e11350 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -156,6 +156,7 @@ struct blk_mq_alloc_data {
>
>  	/* allocate multiple requests/tags in one go */
>  	unsigned int nr_tags;
> +	bool preemption;
>  	struct request **cached_rq;
>

Please change "preemption" into "preempt".

Thanks,

Bart.
On 2022/04/08 22:24, Bart Van Assche wrote:
> On 4/8/22 00:39, Yu Kuai wrote:
>> The idle way to disable tag preemption is to track how many tags are
>
> idle -> ideal?
>
>> available, and wait directly in blk_mq_get_tag() if free tags are
>> very little. However, this is out of reality because fast path is
>> affected.
>>
>> As 'ws_active' is only updated in slow path, this patch disable tag
>> preemption if 'ws_active' is greater than 8, which means there are many
>> threads waiting for tags already.
>>
>> Once tag preemption is disabled, there is a situation that can cause
>> performance degration(or io hung in extreme scenarios): the waitqueue
>
> degration -> degradation?
>
>> diff --git a/block/blk-mq.h b/block/blk-mq.h
>> index 2615bd58bad3..b49b20e11350 100644
>> --- a/block/blk-mq.h
>> +++ b/block/blk-mq.h
>> @@ -156,6 +156,7 @@ struct blk_mq_alloc_data {
>>  	/* allocate multiple requests/tags in one go */
>>  	unsigned int nr_tags;
>> +	bool preemption;
>>  	struct request **cached_rq;
>
> Please change "preemption" into "preempt".

Thanks for your advice; I will make that change and fix the earlier spelling mistakes.

Kuai

>
> Thanks,
>
> Bart.
```diff
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 228a0001694f..be2d49e6d69e 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -127,6 +127,13 @@ unsigned long blk_mq_get_tags(struct blk_mq_alloc_data *data, int nr_tags,
 	return ret;
 }
 
+static inline bool preempt_tag(struct blk_mq_alloc_data *data,
+			       struct sbitmap_queue *bt)
+{
+	return data->preemption ||
+	       atomic_read(&bt->ws_active) <= SBQ_WAIT_QUEUES;
+}
+
 unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 {
 	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
@@ -148,12 +155,14 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		tag_offset = tags->nr_reserved_tags;
 	}
 
-	tag = __blk_mq_get_tag(data, bt);
-	if (tag != BLK_MQ_NO_TAG)
-		goto found_tag;
+	if (data->flags & BLK_MQ_REQ_NOWAIT || preempt_tag(data, bt)) {
+		tag = __blk_mq_get_tag(data, bt);
+		if (tag != BLK_MQ_NO_TAG)
+			goto found_tag;
 
-	if (data->flags & BLK_MQ_REQ_NOWAIT)
-		return BLK_MQ_NO_TAG;
+		if (data->flags & BLK_MQ_REQ_NOWAIT)
+			return BLK_MQ_NO_TAG;
+	}
 
 	do {
 		struct sbitmap_queue *bt_prev;
@@ -169,20 +178,25 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		 * Retry tag allocation after running the hardware queue,
 		 * as running the queue may also have found completions.
 		 */
-		tag = __blk_mq_get_tag(data, bt);
-		if (tag != BLK_MQ_NO_TAG)
-			break;
+		if (preempt_tag(data, bt)) {
+			tag = __blk_mq_get_tag(data, bt);
+			if (tag != BLK_MQ_NO_TAG)
+				break;
+		}
 
 		ws = bt_wait_ptr(bt, data->hctx);
 		sbitmap_prepare_to_wait(bt, ws, &wait, TASK_UNINTERRUPTIBLE);
 
-		tag = __blk_mq_get_tag(data, bt);
-		if (tag != BLK_MQ_NO_TAG)
-			break;
+		if (preempt_tag(data, bt)) {
+			tag = __blk_mq_get_tag(data, bt);
+			if (tag != BLK_MQ_NO_TAG)
+				break;
+		}
 
 		bt_prev = bt;
 		io_schedule();
 
+		data->preemption = true;
 		sbitmap_finish_wait(bt, ws, &wait);
 
 		data->ctx = blk_mq_get_ctx(data->q);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 2615bd58bad3..b49b20e11350 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -156,6 +156,7 @@ struct blk_mq_alloc_data {
 
 	/* allocate multiple requests/tags in one go */
 	unsigned int nr_tags;
+	bool preemption;
 	struct request **cached_rq;
 
 	/* input & output parameter */
```
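To make the new predicate easier to reason about outside the kernel, here is a minimal userspace model of `preempt_tag()` from the diff. It is a sketch under stated assumptions, not kernel code: `struct model_sbq`, `struct model_alloc_data` and `model_preempt_tag()` are hypothetical stand-ins for `struct sbitmap_queue`, `struct blk_mq_alloc_data` and `preempt_tag()`, the kernel's `atomic_t ws_active` is modelled as a C11 `atomic_int`, and `SBQ_WAIT_QUEUES` is taken to be 8, as the commit message states.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed value, matching "greater than 8" in the commit message. */
#define SBQ_WAIT_QUEUES 8

/* Hypothetical stand-in for struct sbitmap_queue: only ws_active is modelled. */
struct model_sbq {
	atomic_int ws_active;	/* number of tasks sleeping on the waitqueues */
};

/* Hypothetical stand-in for struct blk_mq_alloc_data. */
struct model_alloc_data {
	bool preemption;	/* set after the task has slept once */
};

/*
 * Mirrors the patch's preempt_tag(): a task may grab a free tag directly
 * if it has already waited, or if no more than SBQ_WAIT_QUEUES tasks are
 * currently waiting for tags.
 */
static inline bool model_preempt_tag(const struct model_alloc_data *data,
				     struct model_sbq *bt)
{
	int waiters;

	if (data->preemption)
		return true;

	waiters = atomic_load(&bt->ws_active);
	assert(waiters >= 0);	/* a waiter count can never go negative */
	return waiters <= SBQ_WAIT_QUEUES;
}
```

A fresh task with 9 waiters ahead of it is refused, while the same task after its first `io_schedule()` (modelled by setting `preemption = true`) is allowed to take a tag, which is how the patch keeps already-waiting tasks from being starved.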
Tag preemption is the default behaviour: blk_mq_get_tag() tries to get a tag unconditionally, which means a new IO can preempt a tag even when many IOs are already waiting for one. This behaviour makes little sense when the disk is under heavy load, because it intensifies competition without improving performance, especially for large IOs, whose split IOs are then unlikely to be issued back to back.

The ideal way to disable tag preemption is to track how many tags are available and wait directly in blk_mq_get_tag() when very few free tags remain. However, this is impractical because it would affect the fast path.

As 'ws_active' is only updated in the slow path, this patch disables tag preemption if 'ws_active' is greater than 8 (SBQ_WAIT_QUEUES), which means many threads are already waiting for tags.

Once tag preemption is disabled, there is a situation that can cause performance degradation (or an IO hang in extreme scenarios): the waitqueue may hold fewer than 'wake_batch' threads, so a wakeup on this waitqueue might reduce IO concurrency. The next patch will fix this problem.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>

---
 block/blk-mq-tag.c | 36 +++++++++++++++++++++++++-----------
 block/blk-mq.h     |  1 +
 2 files changed, 26 insertions(+), 11 deletions(-)
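The rule for the first allocation attempt in blk_mq_get_tag() can be summarised as a small decision function. This is a sketch under stated assumptions: `model_first_step()`, `MODEL_REQ_NOWAIT` and `enum model_first_step` are hypothetical names (not kernel APIs), the threshold is assumed to be 8 as stated above, and the model ignores reserved tags, hardware-queue runs and the retry loop.

```c
#include <assert.h>
#include <stdbool.h>

#define SBQ_WAIT_QUEUES 8		/* assumed value, per the commit message */
#define MODEL_REQ_NOWAIT (1u << 0)	/* hypothetical stand-in for BLK_MQ_REQ_NOWAIT */

enum model_first_step {
	TRY_GET_TAG,	/* attempt a tag before entering the wait loop */
	QUEUE_FIRST,	/* enter the wait loop without attempting a tag */
};

/*
 * Models the patched condition
 *   data->flags & BLK_MQ_REQ_NOWAIT || preempt_tag(data, bt)
 * for a caller's first attempt.
 */
static inline enum model_first_step
model_first_step(unsigned int flags, bool preemption, int ws_active)
{
	assert(ws_active >= 0);	/* a waiter count can never go negative */

	/* NOWAIT callers can never sleep, so they always try exactly once. */
	if (flags & MODEL_REQ_NOWAIT)
		return TRY_GET_TAG;

	/* Otherwise only take a tag directly when few tasks are waiting,
	 * or when this task has already slept once. */
	if (preemption || ws_active <= SBQ_WAIT_QUEUES)
		return TRY_GET_TAG;

	return QUEUE_FIRST;
}
```

The NOWAIT check comes first because such callers have no slow path to fall back on; gating them would turn every heavy-load allocation into an immediate failure.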