
[-next,RFC,1/6] blk-mq: add a new flag 'BLK_MQ_F_NO_TAG_PREEMPTION'

Message ID 20220329094048.2107094-2-yukuai3@huawei.com
State New, archived
Series improve large random io for HDD

Commit Message

Yu Kuai March 29, 2022, 9:40 a.m. UTC
Tag preemption is the default behaviour: blk_mq_get_tag() will try to
get a tag unconditionally, which means a new I/O can preempt a tag even
if there are lots of I/Os already waiting for tags.

This patch introduces a new flag in preparation for disabling such
behaviour, in order to optimize I/O performance for large random I/O
on HDD.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-mq-debugfs.c | 1 +
 block/blk-mq.h         | 5 +++++
 include/linux/blk-mq.h | 7 ++++++-
 3 files changed, 12 insertions(+), 1 deletion(-)
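
For context, a rough sketch of where such a gate could sit in the tag
allocation path. blk_mq_is_tag_preemptive() is the helper this patch
adds; blk_mq_may_get_tag() itself is an illustrative assumption, not
code from this series:

/*
 * Illustrative sketch. With preemption allowed (the default), new I/O
 * always attempts allocation; with BLK_MQ_F_NO_TAG_PREEMPTION set, it
 * backs off whenever other waiters are already queued on the tags.
 */
static inline bool blk_mq_may_get_tag(struct sbitmap_queue *bt,
				      unsigned int flags)
{
	if (blk_mq_is_tag_preemptive(flags))
		return true;

	/* ws_active counts wait queues that currently have sleepers */
	return !atomic_read(&bt->ws_active);
}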

Comments

Jens Axboe March 29, 2022, 12:44 p.m. UTC | #1
On 3/29/22 3:40 AM, Yu Kuai wrote:
> Tag preemption is the default behaviour: blk_mq_get_tag() will try to
> get a tag unconditionally, which means a new I/O can preempt a tag even
> if there are lots of I/Os already waiting for tags.
> 
> This patch introduces a new flag in preparation for disabling such
> behaviour, in order to optimize I/O performance for large random I/O
> on HDD.

Not sure why we need a flag for this behavior. Does it ever make sense
to allow preempting waiters, jumping the queue?
Yu Kuai March 30, 2022, 1:18 a.m. UTC | #2
On 2022/03/29 20:44, Jens Axboe wrote:
> On 3/29/22 3:40 AM, Yu Kuai wrote:
>> Tag preemption is the default behaviour: blk_mq_get_tag() will try to
>> get a tag unconditionally, which means a new I/O can preempt a tag even
>> if there are lots of I/Os already waiting for tags.
>>
>> This patch introduces a new flag in preparation for disabling such
>> behaviour, in order to optimize I/O performance for large random I/O
>> on HDD.
> 
> Not sure why we need a flag for this behavior. Does it ever make sense
> to allow preempting waiters, jumping the queue?
> 

Hi,

I was thinking of using the flag to control the new behaviour, in order
to reduce the impact on the general path.

If the wakeup path is handled properly, I think it's OK to disable
tag preemption.

Thanks
Kuai
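
The wakeup-path concern is the crux here: once new I/O no longer
preempts, a freed tag must reliably wake a waiter or I/O stalls. Below
is a schematic of the lost-wakeup-safe pattern, loosely modeled on the
existing blk_mq_get_tag() wait loop rather than copied from it; ws and
wait are assumed to be a previously looked-up sbq_wait_state and an
on-stack wait queue entry:

/*
 * Schematic only: the waiter re-checks the bitmap *after* queueing
 * itself, so a tag freed between the failed attempt and
 * prepare_to_wait_exclusive() cannot be missed.
 */
do {
	prepare_to_wait_exclusive(&ws->wait, &wait, TASK_UNINTERRUPTIBLE);

	tag = __blk_mq_get_tag(data, bt);	/* re-check after queueing */
	if (tag != BLK_MQ_NO_TAG)
		break;

	io_schedule();
} while (1);
finish_wait(&ws->wait, &wait);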
Jens Axboe March 30, 2022, 1:20 a.m. UTC | #3
On 3/29/22 7:18 PM, yukuai (C) wrote:
> On 2022/03/29 20:44, Jens Axboe wrote:
>> On 3/29/22 3:40 AM, Yu Kuai wrote:
>>> Tag preemption is the default behaviour: blk_mq_get_tag() will try to
>>> get a tag unconditionally, which means a new I/O can preempt a tag even
>>> if there are lots of I/Os already waiting for tags.
>>>
>>> This patch introduces a new flag in preparation for disabling such
>>> behaviour, in order to optimize I/O performance for large random I/O
>>> on HDD.
>>
>> Not sure why we need a flag for this behavior. Does it ever make sense
>> to allow preempting waiters, jumping the queue?
>>
> 
> Hi,
> 
> I was thinking of using the flag to control the new behaviour, in order
> to reduce the impact on the general path.
> 
> If the wakeup path is handled properly, I think it's OK to disable
> tag preemption.

If we hit tag starvation, we are by definition out of the fast path.
That doesn't mean that scalability should drop to the floor, something
that often happened before blk-mq and without the rolling wakeups. But
it does mean that we can throw a bit more smarts at it, if it improves
fairness/performance in that situation.

Patch

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index aa0349e9f083..f4228532ee3d 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -226,6 +226,7 @@  static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(NO_SCHED),
 	HCTX_FLAG_NAME(STACKING),
 	HCTX_FLAG_NAME(TAG_HCTX_SHARED),
+	HCTX_FLAG_NAME(NO_TAG_PREEMPTION),
 };
 #undef HCTX_FLAG_NAME
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 2615bd58bad3..1a084b3b6097 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -168,6 +168,11 @@  static inline bool blk_mq_is_shared_tags(unsigned int flags)
 	return flags & BLK_MQ_F_TAG_HCTX_SHARED;
 }
 
+static inline bool blk_mq_is_tag_preemptive(unsigned int flags)
+{
+	return !(flags & BLK_MQ_F_NO_TAG_PREEMPTION);
+}
+
 static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
 {
 	if (!(data->rq_flags & RQF_ELV))
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7aa5c54901a9..c9434162acc5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -656,7 +656,12 @@  enum {
 	 * or shared hwqs instead of 'mq-deadline'.
 	 */
 	BLK_MQ_F_NO_SCHED_BY_DEFAULT	= 1 << 7,
-	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
+	/*
+	 * If the disk is under heavy I/O pressure, new I/O will wait
+	 * directly instead of trying to preempt a tag.
+	 */
+	BLK_MQ_F_NO_TAG_PREEMPTION	= 1 << 8,
+	BLK_MQ_F_ALLOC_POLICY_START_BIT = 9,
 	BLK_MQ_F_ALLOC_POLICY_BITS = 1,
 
 	BLK_MQ_S_STOPPED	= 0,
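
As a usage note, a driver for a rotational disk would opt in when
setting up its tag set. Everything prefixed my_ below is a hypothetical
placeholder; only the flag and blk_mq_alloc_tag_set() are kernel API:

/* Hypothetical HDD driver opting out of tag preemption. */
static int my_hdd_init_tag_set(struct my_hdd_dev *dev)
{
	struct blk_mq_tag_set *set = &dev->tag_set;

	memset(set, 0, sizeof(*set));
	set->ops = &my_hdd_mq_ops;
	set->nr_hw_queues = 1;		/* single hw queue, typical for an HDD */
	set->queue_depth = 32;
	set->numa_node = NUMA_NO_NODE;
	/* under pressure, new I/O queues up fairly instead of preempting */
	set->flags = BLK_MQ_F_NO_TAG_PREEMPTION;

	return blk_mq_alloc_tag_set(set);
}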