[v4,3/7] block: Send requeued requests to the I/O scheduler

Message ID 20230621201237.796902-4-bvanassche@acm.org (mailing list archive)
State New, archived
Series: Submit zoned writes in order

Commit Message

Bart Van Assche June 21, 2023, 8:12 p.m. UTC
Send requeued requests to the I/O scheduler if the dispatch order
matters, so that the I/O scheduler can control the order in which
requests are dispatched.

This patch reworks commit aef1897cd36d ("blk-mq: insert rq with DONTPREP
to hctx dispatch list when requeue"). Instead of sending DONTPREP
requests to the dispatch list, send them to the I/O scheduler and
prevent the I/O scheduler from merging these requests by adding
RQF_DONTPREP to the set of flags that prevent merging
(RQF_NOMERGE_FLAGS).

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c         | 10 +++++-----
 include/linux/blk-mq.h |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)
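
For context, the merge-prevention half of this change relies on the block
layer refusing to merge into any request that has one of the
RQF_NOMERGE_FLAGS bits set. Below is a simplified sketch of that check, a
paraphrase of rq_mergeable() rather than the exact kernel code (the real
function tests several more conditions):

#include <linux/blk-mq.h>

/* Simplified paraphrase of rq_mergeable(); not the exact kernel code. */
static inline bool rq_mergeable_sketch(struct request *rq)
{
        if (blk_rq_is_passthrough(rq))
                return false;

        /*
         * With RQF_DONTPREP added to RQF_NOMERGE_FLAGS, a requeued request
         * that may already carry driver-specific data is never merged with
         * other requests, even when it is inserted into the I/O scheduler
         * instead of the hctx dispatch list.
         */
        if (rq->rq_flags & RQF_NOMERGE_FLAGS)
                return false;

        return true;
}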

Comments

Damien Le Moal June 22, 2023, 1:19 a.m. UTC | #1
On 6/22/23 05:12, Bart Van Assche wrote:
> Send requeued requests to the I/O scheduler if the dispatch order
> matters, so that the I/O scheduler can control the order in which
> requests are dispatched.
> 
> This patch reworks commit aef1897cd36d ("blk-mq: insert rq with DONTPREP
> to hctx dispatch list when requeue"). Instead of sending DONTPREP
> requests to the dispatch list, send them to the I/O scheduler and
> prevent the I/O scheduler from merging these requests by adding
> RQF_DONTPREP to the set of flags that prevent merging
> (RQF_NOMERGE_FLAGS).
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c         | 10 +++++-----
>  include/linux/blk-mq.h |  4 ++--
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f440e4aaaae3..453a90767f7a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1453,13 +1453,13 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  	while (!list_empty(&requeue_list)) {
>  		rq = list_entry(requeue_list.next, struct request, queuelist);
>  		/*
> -		 * If RQF_DONTPREP ist set, the request has been started by the
> -		 * driver already and might have driver-specific data allocated
> -		 * already.  Insert it into the hctx dispatch list to avoid
> -		 * block layer merges for the request.
> +		 * Only send those RQF_DONTPREP requests to the dispatch list
> +		 * that may be reordered freely. If the request order matters,
> +		 * send the request to the I/O scheduler.
>  		 */
>  		list_del_init(&rq->queuelist);
> -		if (rq->rq_flags & RQF_DONTPREP)
> +		if (rq->rq_flags & RQF_DONTPREP &&
> +		    !op_needs_zoned_write_locking(req_op(rq)))

Why? I still do not understand the need for this. There is always only a single
in-flight write per sequential zone. Requeuing that in-flight write directly to
the dispatch list will not reorder writes and it will be better for the command
latency.

>  			blk_mq_request_bypass_insert(rq, 0);
>  		else
>  			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index f401067ac03a..2610b299ec77 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -62,8 +62,8 @@ typedef __u32 __bitwise req_flags_t;
>  #define RQF_RESV		((__force req_flags_t)(1 << 23))
>  
>  /* flags that prevent us from merging requests: */
> -#define RQF_NOMERGE_FLAGS \
> -	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
> +#define RQF_NOMERGE_FLAGS                                               \
> +	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_DONTPREP | RQF_SPECIAL_PAYLOAD)
>  
>  enum mq_rq_state {
>  	MQ_RQ_IDLE		= 0,
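
For readers following the hunk above: op_needs_zoned_write_locking() is a
helper introduced earlier in this series. A sketch of what it covers follows
(the exact definition in the series may differ):

/*
 * Sketch, assuming the helper from earlier in this series: zone write
 * locking is only needed for operations that must complete in order
 * within a sequential zone.
 */
static inline bool op_needs_zoned_write_locking(enum req_op op)
{
        return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
}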
Bart Van Assche June 22, 2023, 5:23 p.m. UTC | #2
On 6/21/23 18:19, Damien Le Moal wrote:
> Why? I still do not understand the need for this. There is always only a single
> in-flight write per sequential zone. Requeuing that in-flight write directly to
> the dispatch list will not reorder writes and it will be better for the command
> latency.
Hi Damien,

After taking a closer look, I see that blk_req_zone_write_unlock()
is called from inside dd_insert_request() when requeuing a request.
Hence, there is no reordering risk when requeuing a zoned write. I will
drop this patch.
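
For reference, the code path referred to above looks roughly like this, a
trimmed sketch of dd_insert_request() in block/mq-deadline.c (the parameter
list and the surrounding code are approximate):

static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
                              blk_insert_t flags)
{
        /*
         * This may be a requeue of a write request that has locked its
         * target zone. If so, release the zone write lock so that the
         * request can be dispatched again without blocking on the lock.
         */
        blk_req_zone_write_unlock(rq);

        /* ... merge attempt and insertion into the deadline FIFOs ... */
}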

Do you agree with having one requeue list per hctx instead of one per
request queue? This change makes it possible to eliminate
blk_mq_kick_requeue_list(). I think that's an interesting simplification
of the block layer API.

Thanks,

Bart.
Damien Le Moal June 22, 2023, 9:50 p.m. UTC | #3
On 6/23/23 02:23, Bart Van Assche wrote:
> On 6/21/23 18:19, Damien Le Moal wrote:
>> Why? I still do not understand the need for this. There is always only a single
>> in-flight write per sequential zone. Requeuing that in-flight write directly to
>> the dispatch list will not reorder writes and it will be better for the command
>> latency.
> Hi Damien,
> 
> After taking a closer look, I see that blk_req_zone_write_unlock()
> is called from inside dd_insert_request() when requeuing a request.
> Hence, there is no reordering risk when requeuing a zoned write. I will
> drop this patch.

OK. Thanks.

> 
> Do you agree with having one requeue list per hctx instead of one per
> request queue? This change makes it possible to eliminate
> blk_mq_kick_requeue_list(). I think that's an interesting simplification
> of the block layer API.

I do not see any issue with that. Indeed, it does simplify the code nicely.
Reading patch 5, though, I wondered whether it is really worth keeping the
helpers blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list().
Maybe calling blk_mq_run_hw_queues() and blk_mq_delay_run_hw_queues() is
better? No strong opinion though.
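
To make the comparison concrete, here is a hedged sketch of the driver-side
pattern under each option (hypothetical driver function; only
blk_mq_requeue_request(), blk_mq_kick_requeue_list() and
blk_mq_run_hw_queues() are existing API calls):

#include <linux/blk-mq.h>

/* Hypothetical error path in a block driver that requeues a request. */
static void my_driver_requeue(struct request *rq)
{
        struct request_queue *q = rq->q;

        /* Put the request back on the requeue list without kicking it. */
        blk_mq_requeue_request(rq, false);

        /* Option 1: today's API kicks the requeue work explicitly. */
        blk_mq_kick_requeue_list(q);

        /*
         * Option 2: with per-hctx requeue lists processed from the regular
         * queue-run path, as proposed in this series, rerunning the
         * hardware queues would be sufficient:
         *
         *      blk_mq_run_hw_queues(q, true);
         */
}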

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f440e4aaaae3..453a90767f7a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1453,13 +1453,13 @@  static void blk_mq_requeue_work(struct work_struct *work)
 	while (!list_empty(&requeue_list)) {
 		rq = list_entry(requeue_list.next, struct request, queuelist);
 		/*
-		 * If RQF_DONTPREP ist set, the request has been started by the
-		 * driver already and might have driver-specific data allocated
-		 * already.  Insert it into the hctx dispatch list to avoid
-		 * block layer merges for the request.
+		 * Only send those RQF_DONTPREP requests to the dispatch list
+		 * that may be reordered freely. If the request order matters,
+		 * send the request to the I/O scheduler.
 		 */
 		list_del_init(&rq->queuelist);
-		if (rq->rq_flags & RQF_DONTPREP)
+		if (rq->rq_flags & RQF_DONTPREP &&
+		    !op_needs_zoned_write_locking(req_op(rq)))
 			blk_mq_request_bypass_insert(rq, 0);
 		else
 			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f401067ac03a..2610b299ec77 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -62,8 +62,8 @@  typedef __u32 __bitwise req_flags_t;
 #define RQF_RESV		((__force req_flags_t)(1 << 23))
 
 /* flags that prevent us from merging requests: */
-#define RQF_NOMERGE_FLAGS \
-	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
+#define RQF_NOMERGE_FLAGS                                               \
+	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_DONTPREP | RQF_SPECIAL_PAYLOAD)
 
 enum mq_rq_state {
 	MQ_RQ_IDLE		= 0,