
[v3,4/7] block: Make it easier to debug zoned write reordering

Message ID 20230522183845.354920-5-bvanassche@acm.org (mailing list archive)
State New, archived
Series Submit zoned writes in order

Commit Message

Bart Van Assche May 22, 2023, 6:38 p.m. UTC
Issue a kernel warning if a zoned write is passed directly to the block
driver instead of to the I/O scheduler, since bypassing the I/O scheduler
may cause zoned writes to be reordered.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Christoph Hellwig May 23, 2023, 7:19 a.m. UTC | #1
On Mon, May 22, 2023 at 11:38:39AM -0700, Bart Van Assche wrote:
> @@ -2429,6 +2429,9 @@ static void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags)
>  {
>  	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>  
> +	WARN_ON_ONCE(rq->rq_flags & RQF_USE_SCHED &&
> +		     blk_rq_is_seq_zoned_write(rq));
> +
>  	spin_lock(&hctx->lock);
>  	if (flags & BLK_MQ_INSERT_AT_HEAD)
>  		list_add(&rq->queuelist, &hctx->dispatch);
> @@ -2562,6 +2565,9 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
>  	};
>  	blk_status_t ret;
>  
> +	WARN_ON_ONCE(rq->rq_flags & RQF_USE_SCHED &&
> +		     blk_rq_is_seq_zoned_write(rq));

What makes sequential writes here special vs other requests that are
supposed to be using the scheduler and not a bypass?
Bart Van Assche May 23, 2023, 7:34 p.m. UTC | #2
On 5/23/23 00:19, Christoph Hellwig wrote:
> On Mon, May 22, 2023 at 11:38:39AM -0700, Bart Van Assche wrote:
>> @@ -2429,6 +2429,9 @@ static void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags)
>>   {
>>   	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>>   
>> +	WARN_ON_ONCE(rq->rq_flags & RQF_USE_SCHED &&
>> +		     blk_rq_is_seq_zoned_write(rq));
>> +
>>   	spin_lock(&hctx->lock);
>>   	if (flags & BLK_MQ_INSERT_AT_HEAD)
>>   		list_add(&rq->queuelist, &hctx->dispatch);
>> @@ -2562,6 +2565,9 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
>>   	};
>>   	blk_status_t ret;
>>   
>> +	WARN_ON_ONCE(rq->rq_flags & RQF_USE_SCHED &&
>> +		     blk_rq_is_seq_zoned_write(rq));
> 
> What makes sequential writes here special vs other requests that are
> supposed to be using the scheduler and not a bypass?

Hi Christoph,

If some REQ_OP_WRITE or REQ_OP_WRITE_ZEROES requests are submitted to 
the I/O scheduler while others bypass it, this may lead to reordering. 
Hence this patch, which triggers a kernel warning if any REQ_OP_WRITE 
or REQ_OP_WRITE_ZEROES request bypasses the I/O scheduler.

Thanks,

Bart.
Christoph Hellwig May 24, 2023, 6:13 a.m. UTC | #3
On Tue, May 23, 2023 at 12:34:38PM -0700, Bart Van Assche wrote:
>> What makes sequential writes here special vs other requests that are
>> supposed to be using the scheduler and not a bypass?
>
> Hi Christoph,
>
> If some REQ_OP_WRITE or REQ_OP_WRITE_ZEROES requests are submitted to the 
> I/O scheduler while others bypass it, this may lead to reordering. 
> Hence this patch, which triggers a kernel warning if any REQ_OP_WRITE 
> or REQ_OP_WRITE_ZEROES request bypasses the I/O scheduler.

But why would we ever do a direct insert when using a scheduler for
write (and read, for that matter) commands when not on a zoned device?
Bart Van Assche May 24, 2023, 6:25 p.m. UTC | #4
On 5/23/23 23:13, Christoph Hellwig wrote:
> On Tue, May 23, 2023 at 12:34:38PM -0700, Bart Van Assche wrote:
>>> What makes sequential writes here special vs other requests that are
>>> supposed to be using the scheduler and not a bypass?
>>
>> Hi Christoph,
>>
>> If some REQ_OP_WRITE or REQ_OP_WRITE_ZEROES requests are submitted to the
>> I/O scheduler while others bypass it, this may lead to reordering.
>> Hence this patch, which triggers a kernel warning if any REQ_OP_WRITE
>> or REQ_OP_WRITE_ZEROES request bypasses the I/O scheduler.
> 
> But why would we ever do a direct insert when using a scheduler for
> write (and read, for that matter) commands when not on a zoned device?

Patch 2/7 of this series removes a blk_mq_request_bypass_insert() call 
from blk_mq_requeue_work(). I think this shows that, without this patch 
series, blk_mq_request_bypass_insert() could get called for REQ_OP_WRITE 
/ REQ_OP_WRITE_ZEROES requests.

Thanks,

Bart.

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bc52a57641e2..9ef6fa5d7471 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2429,6 +2429,9 @@  static void blk_mq_request_bypass_insert(struct request *rq, blk_insert_t flags)
 {
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
+	WARN_ON_ONCE(rq->rq_flags & RQF_USE_SCHED &&
+		     blk_rq_is_seq_zoned_write(rq));
+
 	spin_lock(&hctx->lock);
 	if (flags & BLK_MQ_INSERT_AT_HEAD)
 		list_add(&rq->queuelist, &hctx->dispatch);
@@ -2562,6 +2565,9 @@  static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 	};
 	blk_status_t ret;
 
+	WARN_ON_ONCE(rq->rq_flags & RQF_USE_SCHED &&
+		     blk_rq_is_seq_zoned_write(rq));
+
 	/*
 	 * For OK queue, we are done. For error, caller may kill it.
 	 * Any other error (busy), just add it to our list as we