diff mbox series

[v2,06/11] block: mq-deadline: Disable head insertion for zoned writes

Message ID 20230418224002.1195163-7-bvanassche@acm.org (mailing list archive)
State New, archived
Headers show
Series mq-deadline: Improve support for zoned block devices | expand

Commit Message

Bart Van Assche April 18, 2023, 10:39 p.m. UTC
Make sure that zoned writes (REQ_OP_WRITE and REQ_OP_WRITE_ZEROES) are
submitted in LBA order. This patch does not affect REQ_OP_WRITE_APPEND
requests.

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christoph Hellwig April 19, 2023, 4:30 a.m. UTC | #1
On Tue, Apr 18, 2023 at 03:39:57PM -0700, Bart Van Assche wrote:
> Make sure that zoned writes (REQ_OP_WRITE and REQ_OP_WRITE_ZEROES) are
> submitted in LBA order. This patch does not affect REQ_OP_WRITE_APPEND
> requests.

As said before this is not correct.  What we need to instead is to
support proper patch insertation when the at_head flag is set so
that the requests get inserted before the existing requests, but
in ordered they are passed to the I/O scheduler.

This also needs to be done for the other two I/O schedulers.
Bart Van Assche April 19, 2023, 10:43 p.m. UTC | #2
On 4/18/23 21:30, Christoph Hellwig wrote:
> On Tue, Apr 18, 2023 at 03:39:57PM -0700, Bart Van Assche wrote:
>> Make sure that zoned writes (REQ_OP_WRITE and REQ_OP_WRITE_ZEROES) are
>> submitted in LBA order. This patch does not affect REQ_OP_WRITE_APPEND
>> requests.
> 
> As said before this is not correct.  What we need to instead is to
> support proper patch insertation when the at_head flag is set so
> that the requests get inserted before the existing requests, but
> in ordered they are passed to the I/O scheduler.

It is not clear to me how this approach should work if the AT_HEAD flag 
is set for some zoned writes but not for other zoned writes for the same 
zone? The mq-deadline scheduler uses separate lists for at-head and for 
other requests. Having to check both lists before dispatching a request 
would significantly complicate the mq-deadline scheduler.

> This also needs to be done for the other two I/O schedulers.

As far as I know neither Kyber nor BFQ support zoned storage so we don't 
need to worry about how these schedulers handle zoned writes?

Thanks,

Bart.
Christoph Hellwig April 20, 2023, 5:06 a.m. UTC | #3
On Wed, Apr 19, 2023 at 03:43:41PM -0700, Bart Van Assche wrote:
>> As said before this is not correct.  What we need to instead is to
>> support proper patch insertation when the at_head flag is set so
>> that the requests get inserted before the existing requests, but
>> in ordered they are passed to the I/O scheduler.
>
> It is not clear to me how this approach should work if the AT_HEAD flag is 
> set for some zoned writes but not for other zoned writes for the same zone? 
> The mq-deadline scheduler uses separate lists for at-head and for other 
> requests. Having to check both lists before dispatching a request would 
> significantly complicate the mq-deadline scheduler.
>
>> This also needs to be done for the other two I/O schedulers.
>
> As far as I know neither Kyber nor BFQ support zoned storage so we don't 
> need to worry about how these schedulers handle zoned writes?

The problem is that we now run into different handling depending
on which kind of write is coming in.  So we need to find a policy
that works for everyone.  Remember that as of the current for-6.4/block
branch the only at_head inservations remaining are:

 - blk_execute* for passthrough requests that never enter the I/O
   scheduler
 - REQ_OP_FLUSH that never enter the I/O scheduler
 - requeues using blk_mq_requeue_request
 - processing of the actual writes in the flush state machine

with the last one going away in my RFC series.

So if we come to the conclusion that requeues from the driver do not
actually need at_head insertations to avoid starvation or similar
we should just remove at_head insertations from the I/O scheduler.
If we can't do that, we need to handle them for zoned writes as well.
Bart Van Assche April 20, 2023, 5 p.m. UTC | #4
On 4/19/23 22:06, Christoph Hellwig wrote:
> The problem is that we now run into different handling depending
> on which kind of write is coming in.  So we need to find a policy
> that works for everyone.  Remember that as of the current for-6.4/block
> branch the only at_head inservations remaining are:
> 
>   - blk_execute* for passthrough requests that never enter the I/O
>     scheduler
>   - REQ_OP_FLUSH that never enter the I/O scheduler
>   - requeues using blk_mq_requeue_request
>   - processing of the actual writes in the flush state machine
> 
> with the last one going away in my RFC series.
> 
> So if we come to the conclusion that requeues from the driver do not
> actually need at_head insertations to avoid starvation or similar
> we should just remove at_head insertations from the I/O scheduler.
> If we can't do that, we need to handle them for zoned writes as well.

Hi Christoph,

I'm fine with not inserting requeued requests at the head of the queue. 
Inserting requeued requests at the head of the queue only preserves the 
original submission order if a single request is requeued. If multiple 
requests are requeued inserting at the head of the queue will cause 
inversion of the order of the requeued requests.

This implies that the I/O scheduler or disk controller (if no I/O 
scheduler is configured) will become responsible for optimizing the 
request order if requeuing happens.

Bart.
Christoph Hellwig April 24, 2023, 7 a.m. UTC | #5
On Thu, Apr 20, 2023 at 10:00:07AM -0700, Bart Van Assche wrote:
> I'm fine with not inserting requeued requests at the head of the queue. 
> Inserting requeued requests at the head of the queue only preserves the 
> original submission order if a single request is requeued.

Yes.

> If multiple 
> requests are requeued inserting at the head of the queue will cause 
> inversion of the order of the requeued requests.
>
> This implies that the I/O scheduler or disk controller (if no I/O scheduler 
> is configured) will become responsible for optimizing the request order if 
> requeuing happens.

I think we need to understand why these requeues even happen.
Unfortunately blk_mq_requeue_request has quite a few callers, so they'll
need a bit of an audit.

Quite a few are about resource constraints in the hardware and or driver.
In this case I suspect it is essential that they are prioritized over
incoming new commands in the way I suggested before.

A different case is nvme_retry_req with the CRD field set, in which
case we want to wait some time before retrying this particular command,
so having new command bypass it makes sense.

Another one is the SCSI ALUA transition delay, in which case we want
to wait before sending commands to the LU again.  In this case we
really don't want to resend any commands until the delay in kicking
the requeue list.

So I'm really not sure what the right thing to is here.  But I'm
pretty sure just skipping head inserts for zoned locked writes is
even more wrong than what we do right now.  And I also don't see
what it would be useful for.  All zoned writes should either be
locked by higher layers, or even better just use zone append and
just get a new new location assigned when requeing as discussed
before.
diff mbox series

Patch

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 4a0a269db382..a73e16d3f8ac 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -796,7 +796,7 @@  static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	trace_block_rq_insert(rq);
 
-	if (flags & BLK_MQ_INSERT_AT_HEAD) {
+	if ((flags & BLK_MQ_INSERT_AT_HEAD) && !blk_rq_is_seq_zoned_write(rq)) {
 		list_add(&rq->queuelist, &per_prio->dispatch);
 		rq->fifo_time = jiffies;
 	} else {