diff mbox series

[v2] io_uring: Avoid polling configuration errors

Message ID 20240711082430.609597-1-xue01.he@samsung.com (mailing list archive)
State New
Headers show
Series [v2] io_uring: Avoid polling configuration errors | expand

Commit Message

hexue July 11, 2024, 8:24 a.m. UTC
If user doesn't configured poll queue but do the polled IO, it will get
a low performeance than regular poll, but 100% CPU uage. And there's no
prompts or warnings.

This patch aims to help users more easily verify their configurations
correctly, avoiding time and performance losses.

--
changes from v1:
- without disrupting the original I/O process.
- move judgement from block to io_uring.

Signed-off-by: hexue <xue01.he@samsung.com>
---
 include/linux/io_uring_types.h |  1 +
 io_uring/io_uring.c            |  4 +++-
 io_uring/rw.c                  | 18 ++++++++++++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig July 12, 2024, 5:20 a.m. UTC | #1
On Thu, Jul 11, 2024 at 04:24:30PM +0800, hexue wrote:
> +	if (!ctx->check_poll_queue) {
> +		struct block_device *bdev;
> +		struct request_queue *q;
> +		struct inode *inode = req->file->f_inode;
> +
> +		if (inode->i_rdev) {
> +			bdev = blkdev_get_no_open(inode->i_rdev);
> +			q = bdev->bd_queue;
> +			if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
> +				pr_warn("the device does't configured with poll queues\n");
> +		}
> +		ctx->check_poll_queue = true;
> +	}

This is wrong for multiple reasons.  One is that we can't simply poke
into block device internals like this in a higher layer like io_uring.
Second blkdev_get_no_open is in no way available for use outside the
block layer.  The fact that the even exist as separate helpers that
aren't entirely hidden is a decade old layering violation in blk-cgroup.

If you want to advertize this properly we'll need a flag in struct
file or something similar.
hexue July 12, 2024, 6:57 a.m. UTC | #2
>This is wrong for multiple reasons.  One is that we can't simply poke
>into block device internals like this in a higher layer like io_uring.
>Second blkdev_get_no_open is in no way available for use outside the
>block layer.  The fact that the even exist as separate helpers that
>aren't entirely hidden is a decade old layering violation in blk-cgroup.

Got it, thanks.

>If you want to advertize this properly we'll need a flag in struct
>file or something similar.

Thanks, I will try to do this.

--
hexue
Jens Axboe July 12, 2024, 3:36 p.m. UTC | #3
On 7/12/24 12:57 AM, hexue wrote:
>> This is wrong for multiple reasons.  One is that we can't simply poke
>> into block device internals like this in a higher layer like io_uring.
>> Second blkdev_get_no_open is in no way available for use outside the
>> block layer.  The fact that the even exist as separate helpers that
>> aren't entirely hidden is a decade old layering violation in blk-cgroup.
> 
> Got it, thanks.
> 
>> If you want to advertize this properly we'll need a flag in struct
>> file or something similar.
> 
> Thanks, I will try to do this.

My stance is still the same - why add all of this junk just to detect a
misuse of polled IO? It doesn't make sense to me, it's the very
definition of "doctor it hurts when I do this" - don't do it.

So unless this has _zero_ overhead or extra code, which obviously isn't
possible, or extraordinary arguments exists for why this should be
added, I don't see this going anywhere.
hexue July 15, 2024, 2:39 a.m. UTC | #4
>My stance is still the same - why add all of this junk just to detect a
>misuse of polled IO? It doesn't make sense to me, it's the very
>definition of "doctor it hurts when I do this" - don't do it.

>So unless this has _zero_ overhead or extra code, which obviously isn't
>possible, or extraordinary arguments exists for why this should be
>added, I don't see this going anywhere.

Actually, I just want users to know why they got wrong data, just a warning of an error,
like doctor tell you why you do this will hurt. I think it's helpful for users to use tools
accurately.
and yes, this should be as simple as possible, I'll working on it. I'm not sure if I made
myself clear and make sense to you?

--
hexue
Jens Axboe July 15, 2024, 10:59 a.m. UTC | #5
On 7/14/24 8:39 PM, hexue wrote:
>> My stance is still the same - why add all of this junk just to detect a
>> misuse of polled IO? It doesn't make sense to me, it's the very
>> definition of "doctor it hurts when I do this" - don't do it.
> 
>> So unless this has _zero_ overhead or extra code, which obviously isn't
>> possible, or extraordinary arguments exists for why this should be
>> added, I don't see this going anywhere.
> 
> Actually, I just want users to know why they got wrong data, just a
> warning of an error, like doctor tell you why you do this will hurt. I
> think it's helpful for users to use tools accurately. and yes, this
> should be as simple as possible, I'll working on it. I'm not sure if I
> made myself clear and make sense to you?

Certainly agree that that is an issue and a much more worthy reason for
the addition. It's the main reason why -EOPNOTSUPP return would be more
useful, and I'd probably argue the better way then to do it. It may
indeed break existing use cases, but probably only because they are
misconfigured.

That then means that it'd be saner to do this on the block layer side,
imho, as that's when the queue is resolved anyway, rather than attempt
to hack around this on the issuing side.
hexue July 16, 2024, 7:16 a.m. UTC | #6
On 7/15/24 10:59 AM, Jens Axboe wrote:
>On 7/14/24 8:39 PM, hexue wrote:
>>> My stance is still the same - why add all of this junk just to detect a
>>> misuse of polled IO? It doesn't make sense to me, it's the very
>>> definition of "doctor it hurts when I do this" - don't do it.
>>>
>>> So unless this has _zero_ overhead or extra code, which obviously isn't
>>> possible, or extraordinary arguments exists for why this should be
>>> added, I don't see this going anywhere.
>>
>> Actually, I just want users to know why they got wrong data, just a
>> warning of an error, like doctor tell you why you do this will hurt. I
>> think it's helpful for users to use tools accurately. and yes, this
>> should be as simple as possible, I'll working on it. I'm not sure if I
>> made myself clear and make sense to you?
>
>Certainly agree that that is an issue and a much more worthy reason for
>the addition. It's the main reason why -EOPNOTSUPP return would be more
>useful, and I'd probably argue the better way then to do it. It may
>indeed break existing use cases, but probably only because they are
>misconfigured.
>
>That then means that it'd be saner to do this on the block layer side,
>imho, as that's when the queue is resolved anyway, rather than attempt
>to hack around this on the issuing side.

Implementing it at the block layer is indeed more reasonable, thanks for
your affirmation and suggestion, I will look for an appropriate place in
the path to perform the check. Thanks.
diff mbox series

Patch

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 91224bbcfa73..270c3edbbf21 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -428,6 +428,7 @@  struct io_ring_ctx {
 	unsigned short			n_sqe_pages;
 	struct page			**ring_pages;
 	struct page			**sqe_pages;
+	bool				check_poll_queue;
 };
 
 struct io_tw_state {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 816e93e7f949..1b45b4c52ae0 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3463,8 +3463,10 @@  static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	    !(ctx->flags & IORING_SETUP_SQPOLL))
 		ctx->task_complete = true;
 
-	if (ctx->task_complete || (ctx->flags & IORING_SETUP_IOPOLL))
+	if (ctx->task_complete || (ctx->flags & IORING_SETUP_IOPOLL)) {
 		ctx->lockless_cq = true;
+		ctx->check_poll_queue = false;
+	}
 
 	/*
 	 * lazy poll_wq activation relies on ->task_complete for synchronisation
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1a2128459cb4..20f417152a17 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -772,6 +772,23 @@  static bool need_complete_io(struct io_kiocb *req)
 		S_ISBLK(file_inode(req->file)->i_mode);
 }
 
+static void check_poll_queue_state(struct io_ring_ctx *ctx, struct io_kiocb *req)
+{
+	if (!ctx->check_poll_queue) {
+		struct block_device *bdev;
+		struct request_queue *q;
+		struct inode *inode = req->file->f_inode;
+
+		if (inode->i_rdev) {
+			bdev = blkdev_get_no_open(inode->i_rdev);
+			q = bdev->bd_queue;
+			if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+				pr_warn("the device does't configured with poll queues\n");
+		}
+		ctx->check_poll_queue = true;
+	}
+}
+
 static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 {
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
@@ -804,6 +821,7 @@  static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 	if (ctx->flags & IORING_SETUP_IOPOLL) {
 		if (!(kiocb->ki_flags & IOCB_DIRECT) || !file->f_op->iopoll)
 			return -EOPNOTSUPP;
+		check_poll_queue_state(ctx, req);
 
 		kiocb->private = NULL;
 		kiocb->ki_flags |= IOCB_HIPRI;