Message ID | 20230418224002.1195163-7-bvanassche@acm.org (mailing list archive) |
---|---|
State | New, archived |
Series | mq-deadline: Improve support for zoned block devices |
On Tue, Apr 18, 2023 at 03:39:57PM -0700, Bart Van Assche wrote:
> Make sure that zoned writes (REQ_OP_WRITE and REQ_OP_WRITE_ZEROES) are
> submitted in LBA order. This patch does not affect REQ_OP_ZONE_APPEND
> requests.

As said before this is not correct. What we need instead is to support
proper insertion when the at_head flag is set, so that the requests get
inserted before the existing requests, but in the order they are passed
to the I/O scheduler. This also needs to be done for the other two I/O
schedulers.
On 4/18/23 21:30, Christoph Hellwig wrote:
> On Tue, Apr 18, 2023 at 03:39:57PM -0700, Bart Van Assche wrote:
>> Make sure that zoned writes (REQ_OP_WRITE and REQ_OP_WRITE_ZEROES) are
>> submitted in LBA order. This patch does not affect REQ_OP_ZONE_APPEND
>> requests.
>
> As said before this is not correct. What we need instead is to
> support proper insertion when the at_head flag is set so
> that the requests get inserted before the existing requests, but
> in the order they are passed to the I/O scheduler.

It is not clear to me how this approach should work if the AT_HEAD flag
is set for some zoned writes but not for other zoned writes for the same
zone. The mq-deadline scheduler uses separate lists for at-head and for
other requests. Having to check both lists before dispatching a request
would significantly complicate the mq-deadline scheduler.

> This also needs to be done for the other two I/O schedulers.

As far as I know neither Kyber nor BFQ supports zoned storage, so we
don't need to worry about how these schedulers handle zoned writes?

Thanks,

Bart.
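The complication Bart describes can be sketched in plain userspace C. This is an illustrative model only, not the kernel's actual structures: with an at-head "dispatch" list separate from the normal FIFO list, a dispatcher that has to keep zoned writes in LBA order would need to compare the heads of both lists on every pick. All names here (`zreq`, `sched_lists`, `pick_next_zoned_write`) are hypothetical.

```c
#include <stddef.h>

/* Illustrative model, not kernel code: two singly linked lists standing
 * in for mq-deadline's at-head dispatch list and its normal FIFO list. */

struct zreq {
	unsigned long lba;	/* start LBA of the zoned write */
	struct zreq *next;
};

struct sched_lists {
	struct zreq *dispatch;	/* requests inserted at head */
	struct zreq *fifo;	/* requests inserted normally */
};

/* Pick the zoned write with the lowest LBA across both lists, so writes
 * for a zone stay in LBA order even when some were inserted at head.
 * This cross-list comparison is the extra complexity the thread is
 * about: with a single list, the head alone would suffice. */
static struct zreq *pick_next_zoned_write(struct sched_lists *s)
{
	struct zreq *d = s->dispatch, *f = s->fifo;

	if (d && (!f || d->lba <= f->lba)) {
		s->dispatch = d->next;
		return d;
	}
	if (f) {
		s->fifo = f->next;
		return f;
	}
	return NULL;
}
```

With writes for the same zone split across the two lists (say LBAs 150 on the dispatch list and 100, 200 on the FIFO list), only this two-list comparison yields the order 100, 150, 200.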
On Wed, Apr 19, 2023 at 03:43:41PM -0700, Bart Van Assche wrote:
>> As said before this is not correct. What we need instead is to
>> support proper insertion when the at_head flag is set so
>> that the requests get inserted before the existing requests, but
>> in the order they are passed to the I/O scheduler.
>
> It is not clear to me how this approach should work if the AT_HEAD flag is
> set for some zoned writes but not for other zoned writes for the same zone.
> The mq-deadline scheduler uses separate lists for at-head and for other
> requests. Having to check both lists before dispatching a request would
> significantly complicate the mq-deadline scheduler.
>
>> This also needs to be done for the other two I/O schedulers.
>
> As far as I know neither Kyber nor BFQ supports zoned storage, so we don't
> need to worry about how these schedulers handle zoned writes?

The problem is that we now run into different handling depending on
which kind of write is coming in. So we need to find a policy that
works for everyone. Remember that as of the current for-6.4/block
branch the only at_head insertions remaining are:

 - blk_execute* for passthrough requests that never enter the I/O
   scheduler
 - REQ_OP_FLUSH, which never enters the I/O scheduler
 - requeues using blk_mq_requeue_request
 - processing of the actual writes in the flush state machine

with the last one going away in my RFC series.

So if we come to the conclusion that requeues from the driver do not
actually need at_head insertions to avoid starvation or similar, we
should just remove at_head insertions from the I/O scheduler. If we
can't do that, we need to handle them for zoned writes as well.
On 4/19/23 22:06, Christoph Hellwig wrote:
> The problem is that we now run into different handling depending
> on which kind of write is coming in. So we need to find a policy
> that works for everyone. Remember that as of the current for-6.4/block
> branch the only at_head insertions remaining are:
>
>  - blk_execute* for passthrough requests that never enter the I/O
>    scheduler
>  - REQ_OP_FLUSH, which never enters the I/O scheduler
>  - requeues using blk_mq_requeue_request
>  - processing of the actual writes in the flush state machine
>
> with the last one going away in my RFC series.
>
> So if we come to the conclusion that requeues from the driver do not
> actually need at_head insertions to avoid starvation or similar,
> we should just remove at_head insertions from the I/O scheduler.
> If we can't do that, we need to handle them for zoned writes as well.

Hi Christoph,

I'm fine with not inserting requeued requests at the head of the queue.
Inserting requeued requests at the head of the queue only preserves the
original submission order if a single request is requeued. If multiple
requests are requeued, inserting at the head of the queue will cause
inversion of the order of the requeued requests.

This implies that the I/O scheduler or the disk controller (if no I/O
scheduler is configured) becomes responsible for optimizing the request
order if requeuing happens.

Bart.
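The order inversion Bart points out can be demonstrated with a minimal userspace C sketch (illustrative names, not kernel code): requeuing several requests one by one at the head of a list reverses their relative order, whereas a single requeued request keeps its place.

```c
#include <stddef.h>

/* Minimal model of a dispatch queue as a singly linked list.
 * Not kernel code; the names are illustrative. */

struct req {
	int lba;		/* start LBA of the write */
	struct req *next;
};

/* Insert one request at the head of the queue
 * (models an at-head insertion). */
static struct req *insert_at_head(struct req *head, struct req *rq)
{
	rq->next = head;
	return rq;
}

/* Requeue n requests, given in their original submission order, each at
 * the head of the queue. Because every insertion pushes the previous one
 * back, the requests end up in reverse submission order. */
static struct req *requeue_all_at_head(struct req *head,
				       struct req **rqs, int n)
{
	for (int i = 0; i < n; i++)
		head = insert_at_head(head, rqs[i]);
	return head;
}
```

Requeuing writes with LBAs 1, 2, 3 this way leaves the queue as 3, 2, 1 — exactly the inversion that would break the LBA ordering required for sequential zones.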
On Thu, Apr 20, 2023 at 10:00:07AM -0700, Bart Van Assche wrote:
> I'm fine with not inserting requeued requests at the head of the queue.
> Inserting requeued requests at the head of the queue only preserves the
> original submission order if a single request is requeued.

Yes.

> If multiple
> requests are requeued, inserting at the head of the queue will cause
> inversion of the order of the requeued requests.
>
> This implies that the I/O scheduler or the disk controller (if no I/O
> scheduler is configured) becomes responsible for optimizing the request
> order if requeuing happens.

I think we need to understand why these requeues even happen.
Unfortunately blk_mq_requeue_request has quite a few callers, so
they'll need a bit of an audit.

Quite a few are about resource constraints in the hardware and/or
driver. In this case I suspect it is essential that they are
prioritized over incoming new commands in the way I suggested before.

A different case is nvme_retry_req with the CRD field set, in which
case we want to wait some time before retrying this particular command,
so having new commands bypass it makes sense.

Another one is the SCSI ALUA transition delay, in which case we want to
wait before sending commands to the LU again. In this case we really
don't want to resend any commands until the delay in kicking the
requeue list.

So I'm really not sure what the right thing to do is here. But I'm
pretty sure just skipping head inserts for zone-locked writes is even
more wrong than what we do right now. And I also don't see what it
would be useful for. All zoned writes should either be locked by
higher layers, or even better just use zone append and get a new
location assigned when requeuing as discussed before.
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 4a0a269db382..a73e16d3f8ac 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -796,7 +796,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	trace_block_rq_insert(rq);
 
-	if (flags & BLK_MQ_INSERT_AT_HEAD) {
+	if ((flags & BLK_MQ_INSERT_AT_HEAD) && !blk_rq_is_seq_zoned_write(rq)) {
 		list_add(&rq->queuelist, &per_prio->dispatch);
 		rq->fifo_time = jiffies;
 	} else {
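The predicate the one-line change hinges on can be restated as a userspace C sketch. This is an assumption-laden re-statement, not the kernel's implementation: the enum values and the helper name `is_seq_zoned_write` below are stand-ins used only to show which operations the patch exempts from at-head insertion.

```c
#include <stdbool.h>

/* Stand-in op codes; the real values live in include/linux/blk_types.h. */
enum req_op {
	OP_READ,
	OP_WRITE,
	OP_WRITE_ZEROES,
	OP_ZONE_APPEND,
};

/* Sketch of the condition the patch keys off: a write targeting a
 * sequential zone must be dispatched in LBA order (REQ_OP_WRITE and
 * REQ_OP_WRITE_ZEROES), while zone-append requests are exempt because
 * the device assigns their write location itself. */
static bool is_seq_zoned_write(enum req_op op, bool seq_zone)
{
	return seq_zone && (op == OP_WRITE || op == OP_WRITE_ZEROES);
}
```

Under this reading, the patched `dd_insert_request()` sends such writes through the normal (sorted) insertion path even when `BLK_MQ_INSERT_AT_HEAD` is set, so they cannot jump ahead of earlier writes to the same zone.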
Make sure that zoned writes (REQ_OP_WRITE and REQ_OP_WRITE_ZEROES) are
submitted in LBA order. This patch does not affect REQ_OP_ZONE_APPEND
requests.

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)