[RFC] blk-mq: don't plug for HIPRI IO

Message ID 20201027132951.121812-1-xiaoguang.wang@linux.alibaba.com (mailing list archive)
State New, archived

Commit Message

Xiaoguang Wang Oct. 27, 2020, 1:29 p.m. UTC
Commit cb700eb3faa4 ("block: don't plug for aio/O_DIRECT HIPRI IO")
only avoids calling blk_start_plug() and blk_finish_plug() for HIPRI IO
in __blkdev_direct_IO(). If an upper-layer subsystem such as io_uring
still initializes a valid plug, the block layer may still plug HIPRI IO.
To disable plugging for HIPRI IO completely, do it in blk_mq_plug().

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 block/blk-mq.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
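
The change in plug selection can be illustrated with a small userspace model. This is only a sketch of the condition described above, not kernel code: `struct plug`, `struct io`, `pick_plug_old()` and `pick_plug_new()` are hypothetical stand-ins, and the boolean fields merely model `bio->bi_opf & REQ_HIPRI`, `op_is_write(bio_op(bio))` and `blk_queue_is_zoned(q)`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct blk_plug; not the kernel definition. */
struct plug {
	int queued;
};

/* Hypothetical stand-in for the parts of a bio this decision looks at. */
struct io {
	bool hipri;    /* models bio->bi_opf & REQ_HIPRI */
	bool is_write; /* models op_is_write(bio_op(bio)) */
};

/*
 * Selection before the patch: only writes to a zoned device bypass the
 * context plug. ctx_plug models current->plug and may be NULL.
 */
static struct plug *pick_plug_old(struct plug *ctx_plug, bool zoned,
				  const struct io *io)
{
	if (!zoned || !io->is_write)
		return ctx_plug;
	return NULL;
}

/*
 * Selection with the patch: HIPRI IO bypasses the plug as well, even when
 * an upper layer has already set up a valid context plug.
 */
static struct plug *pick_plug_new(struct plug *ctx_plug, bool zoned,
				  const struct io *io)
{
	if (!io->hipri && (!zoned || !io->is_write))
		return ctx_plug;
	return NULL;
}
```

In this model, a polled read on a non-zoned device with a valid context plug gets that plug from `pick_plug_old()` and NULL from `pick_plug_new()`, while a non-polled read gets the plug from both.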

Comments

Christoph Hellwig Oct. 27, 2020, 7:33 p.m. UTC | #1
On Tue, Oct 27, 2020 at 09:29:51PM +0800, Xiaoguang Wang wrote:
> Commit cb700eb3faa4 ("block: don't plug for aio/O_DIRECT HIPRI IO")
> only avoids calling blk_start_plug() and blk_finish_plug() for HIPRI IO
> in __blkdev_direct_IO(). If an upper-layer subsystem such as io_uring
> still initializes a valid plug, the block layer may still plug HIPRI IO.
> To disable plugging for HIPRI IO completely, do it in blk_mq_plug().

This creates a somewhat awkward layering.  Why can't io_uring just
stop creating a plug?
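
For comparison, the caller-side alternative raised in this comment can be modeled in userspace as well. This is a rough sketch under stated assumptions, not io_uring code: `current_plug`, `start_plug()`, `finish_plug()`, `submit_one()` and `submit_batch()` are hypothetical names, and `current_plug` only models `current->plug`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct blk_plug; not the kernel definition. */
struct plug {
	int queued;
};

/* Models current->plug: NULL means the submitter has not started a plug. */
static struct plug *current_plug;

static void start_plug(struct plug *p)
{
	p->queued = 0;
	current_plug = p;
}

static void finish_plug(void)
{
	current_plug = NULL;
}

/* Models one submission: returns true if the IO was held in the plug. */
static bool submit_one(void)
{
	if (current_plug) {
		current_plug->queued++;
		return true;
	}
	return false;
}

/*
 * Hypothetical upper-layer submitter: it does not create a plug for
 * polled batches, so the lower layer needs no special case.
 * Returns how many of the n IOs ended up plugged.
 */
static int submit_batch(int n, bool polled)
{
	struct plug p;
	int plugged = 0;

	if (!polled)
		start_plug(&p);
	for (int i = 0; i < n; i++)
		plugged += submit_one() ? 1 : 0;
	if (!polled)
		finish_plug();
	return plugged;
}
```

Here the decision lives entirely in the submitter: a polled batch never sees a plug, and a regular batch is plugged as before.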

Jens Axboe Oct. 28, 2020, 2:42 p.m. UTC | #2
On 10/27/20 7:29 AM, Xiaoguang Wang wrote:
> Commit cb700eb3faa4 ("block: don't plug for aio/O_DIRECT HIPRI IO")
> only avoids calling blk_start_plug() and blk_finish_plug() for HIPRI IO
> in __blkdev_direct_IO(). If an upper-layer subsystem such as io_uring
> still initializes a valid plug, the block layer may still plug HIPRI IO.
> To disable plugging for HIPRI IO completely, do it in blk_mq_plug().

There's something funky going on with plugging and polled IO. I tried
to improve the io_uring plugging, so we don't plug for polled IO (or for
non-bdev IO in general), and it tanked performance here from ~2.5M IOPS
to ~1.4M IOPS. Thinking I had made some sort of mistake, I just tried
your patch alone, and I see the same performance drop.

This doesn't make a lot of sense, so some investigation is needed.

Patch

diff --git a/block/blk-mq.h b/block/blk-mq.h
index a52703c..5453d14 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -272,9 +272,11 @@  static inline struct blk_plug *blk_mq_plug(struct request_queue *q,
 {
 	/*
 	 * For regular block devices or read operations, use the context plug
-	 * which may be NULL if blk_start_plug() was not executed.
+	 * which may be NULL if blk_start_plug() was not executed, and don't
+	 * plug for HIPRI/polled IO, as those should go straight to issue.
 	 */
-	if (!blk_queue_is_zoned(q) || !op_is_write(bio_op(bio)))
+	if (!(bio->bi_opf & REQ_HIPRI) &&
+	    (!blk_queue_is_zoned(q) || !op_is_write(bio_op(bio))))
 		return current->plug;
 
 	/* Zoned block device write operation case: do not plug the BIO */