[v4,3/7] block: Send requeued requests to the I/O scheduler

Message ID 20230621201237.796902-4-bvanassche@acm.org (mailing list archive)
State New, archived
Series: Submit zoned writes in order

Commit Message

Bart Van Assche June 21, 2023, 8:12 p.m. UTC
Send requeued requests to the I/O scheduler if the dispatch order
matters, so that the I/O scheduler can control the order in which
requests are dispatched.

This patch reworks commit aef1897cd36d ("blk-mq: insert rq with DONTPREP
to hctx dispatch list when requeue"). Instead of sending DONTPREP
requests to the dispatch list, send them to the I/O scheduler and
prevent the I/O scheduler from merging these requests by adding
RQF_DONTPREP to the set of flags that prevent merging
(RQF_NOMERGE_FLAGS).

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c         | 10 +++++-----
 include/linux/blk-mq.h |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)
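
For context, the merge-prevention half of this change relies on the block
layer refusing to merge into any request that has one of the
RQF_NOMERGE_FLAGS bits set. Below is a simplified sketch of that check, a
paraphrase of rq_mergeable() rather than the exact kernel code (the real
function tests several more conditions):

#include <linux/blk-mq.h>

/* Simplified paraphrase of rq_mergeable(); not the exact kernel code. */
static inline bool rq_mergeable_sketch(struct request *rq)
{
        if (blk_rq_is_passthrough(rq))
                return false;

        /*
         * With RQF_DONTPREP added to RQF_NOMERGE_FLAGS, a requeued request
         * that may already carry driver-specific data is never merged with
         * other requests, even when it is inserted into the I/O scheduler
         * instead of the hctx dispatch list.
         */
        if (rq->rq_flags & RQF_NOMERGE_FLAGS)
                return false;

        return true;
}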

Comments

Damien Le Moal June 22, 2023, 1:19 a.m. UTC | #1
On 6/22/23 05:12, Bart Van Assche wrote:
> Send requeued requests to the I/O scheduler if the dispatch order
> matters, so that the I/O scheduler can control the order in which
> requests are dispatched.
> 
> This patch reworks commit aef1897cd36d ("blk-mq: insert rq with DONTPREP
> to hctx dispatch list when requeue"). Instead of sending DONTPREP
> requests to the dispatch list, send them to the I/O scheduler and
> prevent the I/O scheduler from merging these requests by adding
> RQF_DONTPREP to the set of flags that prevent merging
> (RQF_NOMERGE_FLAGS).
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c         | 10 +++++-----
>  include/linux/blk-mq.h |  4 ++--
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f440e4aaaae3..453a90767f7a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1453,13 +1453,13 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  	while (!list_empty(&requeue_list)) {
>  		rq = list_entry(requeue_list.next, struct request, queuelist);
>  		/*
> -		 * If RQF_DONTPREP ist set, the request has been started by the
> -		 * driver already and might have driver-specific data allocated
> -		 * already.  Insert it into the hctx dispatch list to avoid
> -		 * block layer merges for the request.
> +		 * Only send those RQF_DONTPREP requests to the dispatch list
> +		 * that may be reordered freely. If the request order matters,
> +		 * send the request to the I/O scheduler.
>  		 */
>  		list_del_init(&rq->queuelist);
> -		if (rq->rq_flags & RQF_DONTPREP)
> +		if (rq->rq_flags & RQF_DONTPREP &&
> +		    !op_needs_zoned_write_locking(req_op(rq)))

Why? I still do not understand the need for this. There is always only a single
in-flight write per sequential zone. Requeuing that in-flight write directly to
the dispatch list will not reorder writes and it will be better for the command
latency.

>  			blk_mq_request_bypass_insert(rq, 0);
>  		else
>  			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index f401067ac03a..2610b299ec77 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -62,8 +62,8 @@ typedef __u32 __bitwise req_flags_t;
>  #define RQF_RESV		((__force req_flags_t)(1 << 23))
>  
>  /* flags that prevent us from merging requests: */
> -#define RQF_NOMERGE_FLAGS \
> -	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
> +#define RQF_NOMERGE_FLAGS                                               \
> +	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_DONTPREP | RQF_SPECIAL_PAYLOAD)
>  
>  enum mq_rq_state {
>  	MQ_RQ_IDLE		= 0,
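
For readers following the hunk above: op_needs_zoned_write_locking() is a
helper introduced earlier in this series. A sketch of what it covers follows
(the exact definition in the series may differ):

/*
 * Sketch, assuming the helper from earlier in this series: zone write
 * locking is only needed for operations that must complete in order
 * within a sequential zone.
 */
static inline bool op_needs_zoned_write_locking(enum req_op op)
{
        return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
}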
Bart Van Assche June 22, 2023, 5:23 p.m. UTC | #2
On 6/21/23 18:19, Damien Le Moal wrote:
> Why? I still do not understand the need for this. There is always only a single
> in-flight write per sequential zone. Requeuing that in-flight write directly to
> the dispatch list will not reorder writes and it will be better for the command
> latency.
Hi Damien,

After taking a closer look, I see that blk_req_zone_write_unlock()
is called from inside dd_insert_request() when requeuing a request.
Hence, there is no reordering risk when requeuing a zoned write. I will
drop this patch.
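
For reference, the code path referred to above looks roughly like this, a
trimmed sketch of dd_insert_request() in block/mq-deadline.c (the parameter
list and the surrounding code are approximate):

static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
                              blk_insert_t flags)
{
        /*
         * This may be a requeue of a write request that has locked its
         * target zone. If so, release the zone write lock so that the
         * request can be dispatched again without blocking on the lock.
         */
        blk_req_zone_write_unlock(rq);

        /* ... merge attempt and insertion into the deadline FIFOs ... */
}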

Do you agree with having one requeue list per hctx instead of one per
request queue? This change makes it possible to eliminate
blk_mq_kick_requeue_list(). I think that's an interesting simplification
of the block layer API.

Thanks,

Bart.
Damien Le Moal June 22, 2023, 9:50 p.m. UTC | #3
On 6/23/23 02:23, Bart Van Assche wrote:
> On 6/21/23 18:19, Damien Le Moal wrote:
>> Why? I still do not understand the need for this. There is always only a single
>> in-flight write per sequential zone. Requeuing that in-flight write directly to
>> the dispatch list will not reorder writes and it will be better for the command
>> latency.
> Hi Damien,
> 
> After taking a closer look, I see that blk_req_zone_write_unlock()
> is called from inside dd_insert_request() when requeuing a request.
> Hence, there is no reordering risk when requeuing a zoned write. I will
> drop this patch.

OK. Thanks.

> 
> Do you agree with having one requeue list per hctx instead of one per
> request queue? This change makes it possible to eliminate
> blk_mq_kick_requeue_list(). I think that's an interesting simplification
> of the block layer API.

I do not see any issue with that. Indeed, it does simplify the code nicely.
Reading patch 5, though, I wondered whether it is really worth keeping the
helpers blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list().
Maybe calling blk_mq_run_hw_queues() and blk_mq_delay_run_hw_queues() is
better? No strong opinion though.
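
To make the comparison concrete, here is a hedged sketch of the driver-side
pattern under each option (hypothetical driver function; only
blk_mq_requeue_request(), blk_mq_kick_requeue_list() and
blk_mq_run_hw_queues() are existing API calls):

#include <linux/blk-mq.h>

/* Hypothetical error path in a block driver that requeues a request. */
static void my_driver_requeue(struct request *rq)
{
        struct request_queue *q = rq->q;

        /* Put the request back on the requeue list without kicking it. */
        blk_mq_requeue_request(rq, false);

        /* Option 1: today's API kicks the requeue work explicitly. */
        blk_mq_kick_requeue_list(q);

        /*
         * Option 2: with per-hctx requeue lists processed from the regular
         * queue-run path, as proposed in this series, rerunning the
         * hardware queues would be sufficient:
         *
         *      blk_mq_run_hw_queues(q, true);
         */
}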

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f440e4aaaae3..453a90767f7a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1453,13 +1453,13 @@  static void blk_mq_requeue_work(struct work_struct *work)
 	while (!list_empty(&requeue_list)) {
 		rq = list_entry(requeue_list.next, struct request, queuelist);
 		/*
-		 * If RQF_DONTPREP ist set, the request has been started by the
-		 * driver already and might have driver-specific data allocated
-		 * already.  Insert it into the hctx dispatch list to avoid
-		 * block layer merges for the request.
+		 * Only send those RQF_DONTPREP requests to the dispatch list
+		 * that may be reordered freely. If the request order matters,
+		 * send the request to the I/O scheduler.
 		 */
 		list_del_init(&rq->queuelist);
-		if (rq->rq_flags & RQF_DONTPREP)
+		if (rq->rq_flags & RQF_DONTPREP &&
+		    !op_needs_zoned_write_locking(req_op(rq)))
 			blk_mq_request_bypass_insert(rq, 0);
 		else
 			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f401067ac03a..2610b299ec77 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -62,8 +62,8 @@  typedef __u32 __bitwise req_flags_t;
 #define RQF_RESV		((__force req_flags_t)(1 << 23))
 
 /* flags that prevent us from merging requests: */
-#define RQF_NOMERGE_FLAGS \
-	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
+#define RQF_NOMERGE_FLAGS                                               \
+	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_DONTPREP | RQF_SPECIAL_PAYLOAD)
 
 enum mq_rq_state {
 	MQ_RQ_IDLE		= 0,