diff mbox series

[for-next,2/2] nvme: optimise io_uring passthrough completion

Message ID ecdfacd0967a22d88b7779e2efd09e040825d0f8.1684154817.git.asml.silence@gmail.com (mailing list archive)
State New
Headers show
Series Enable IOU_F_TWQ_LAZY_WAKE for passthrough | expand

Commit Message

Pavel Begunkov May 15, 2023, 12:54 p.m. UTC
Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough
commands completion. It further delays the execution of task_work for
DEFER_TASKRUN until there are enough of task_work items queued to meet
the waiting criteria, which reduces the number of wake ups we issue.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/nvme/host/ioctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig May 17, 2023, 7:23 a.m. UTC | #1
On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
> Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough
> commands completion. It further delays the execution of task_work for
> DEFER_TASKRUN until there are enough of task_work items queued to meet
> the waiting criteria, which reduces the number of wake ups we issue.

Why wouldn't you just do that unconditionally for
io_uring_cmd_complete_in_task?
Pavel Begunkov May 17, 2023, 12:32 p.m. UTC | #2
On 5/17/23 08:23, Christoph Hellwig wrote:
> On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
>> Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough
>> commands completion. It further delays the execution of task_work for
>> DEFER_TASKRUN until there are enough of task_work items queued to meet
>> the waiting criteria, which reduces the number of wake ups we issue.
> 
> Why wouldn't you just do that unconditionally for
> io_uring_cmd_complete_in_task?

1) ublk does secondary batching and so may produce multiple cqes,
that's not supported. I believe Ming sent patches removing it,
but I'd rather not deal with conflicts for now.

2) Some users may have dependencies b/w requests, i.e. a request
will only complete when another request's task_work is executed.

3) There might be use cases when you don't wont it to be delayed,
IO retries would be a good example. I wouldn't also use it for
control paths like ublk_ctrl_uring_cmd.

Let's better keep io_uring_cmd_complete_in_task() as the default
option for the interface.
Christoph Hellwig May 17, 2023, 12:39 p.m. UTC | #3
On Wed, May 17, 2023 at 01:32:53PM +0100, Pavel Begunkov wrote:
> 1) ublk does secondary batching and so may produce multiple cqes,
> that's not supported. I believe Ming sent patches removing it,
> but I'd rather not deal with conflicts for now.
>
> 2) Some users may have dependencies b/w requests, i.e. a request
> will only complete when another request's task_work is executed.
>
> 3) There might be use cases when you don't wont it to be delayed,
> IO retries would be a good example. I wouldn't also use it for
> control paths like ublk_ctrl_uring_cmd.

You speak a lot of some users and some cases when the only users
are ublk and nvme, both of which would obviously benefit.

If you don't want conflicts wait for Ming to finish his work
and then we can do this cleanly and without leaving dead code
around.
Pavel Begunkov May 17, 2023, 1:30 p.m. UTC | #4
On 5/17/23 13:39, Christoph Hellwig wrote:
> On Wed, May 17, 2023 at 01:32:53PM +0100, Pavel Begunkov wrote:
>> 1) ublk does secondary batching and so may produce multiple cqes,
>> that's not supported. I believe Ming sent patches removing it,
>> but I'd rather not deal with conflicts for now.
>>
>> 2) Some users may have dependencies b/w requests, i.e. a request
>> will only complete when another request's task_work is executed.
>>
>> 3) There might be use cases when you don't wont it to be delayed,
>> IO retries would be a good example. I wouldn't also use it for
>> control paths like ublk_ctrl_uring_cmd.
> 
> You speak a lot of some users and some cases when the only users
> are ublk and nvme, both of which would obviously benefit.
> 
> If you don't want conflicts wait for Ming to finish his work
> and then we can do this cleanly and without leaving dead code
> around.

Aside that you decided to ignore the third point, that's a
generic interface, not nvme specific, there are patches for
net cmds, someone even tried to use it for drm. How do you
think new users are supposed to appear if the only helper
doing the job can hang the userspace for their use case?
Well, then maybe it'll remain nvme/ublk specific with such
an approach.

It is clean, and it's not dead code, and we should not
remove the simpler and more straightforward helper.
Christoph Hellwig May 17, 2023, 1:53 p.m. UTC | #5
On Wed, May 17, 2023 at 02:30:47PM +0100, Pavel Begunkov wrote:
> Aside that you decided to ignore the third point, that's a
> generic interface, not nvme specific, there are patches for
> net cmds, someone even tried to use it for drm. How do you
> think new users are supposed to appear if the only helper
> doing the job can hang the userspace for their use case?
> Well, then maybe it'll remain nvme/ublk specific with such
> an approach.

New users can add new code when it's actualy needed.  We don't
bloat the kernel for maybe in the future crap as a policy.
Jens Axboe May 17, 2023, 7:31 p.m. UTC | #6
On 5/17/23 6:32 AM, Pavel Begunkov wrote:
> On 5/17/23 08:23, Christoph Hellwig wrote:
>> On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
>>> Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough
>>> commands completion. It further delays the execution of task_work for
>>> DEFER_TASKRUN until there are enough of task_work items queued to meet
>>> the waiting criteria, which reduces the number of wake ups we issue.
>>
>> Why wouldn't you just do that unconditionally for
>> io_uring_cmd_complete_in_task?
> 
> 1) ublk does secondary batching and so may produce multiple cqes,
> that's not supported. I believe Ming sent patches removing it,
> but I'd rather not deal with conflicts for now.

Ming, what's the status of those patches? Looks like we'll end up
with a dependency regardless of the ordering of these. Since these
patches are here now, sanest approach seems to move forward with
this series and defer the conflict resolving to the ublk side.
Pavel Begunkov May 17, 2023, 8:11 p.m. UTC | #7
On 5/17/23 14:53, Christoph Hellwig wrote:
> On Wed, May 17, 2023 at 02:30:47PM +0100, Pavel Begunkov wrote:
>> Aside that you decided to ignore the third point, that's a
>> generic interface, not nvme specific, there are patches for
>> net cmds, someone even tried to use it for drm. How do you
>> think new users are supposed to appear if the only helper
>> doing the job can hang the userspace for their use case?
>> Well, then maybe it'll remain nvme/ublk specific with such
>> an approach.
> 
> New users can add new code when it's actualy needed.  We don't
> bloat the kernel for maybe in the future crap as a policy.

Let me put it for you this way, it's an absolutely horrendous
idea to leave the old innocently looking name, i.e.
io_uring_cmd_complete_in_task(), and add there a bunch of
restrictions no new user would care about, that's called
shooting yourself in the leg.

So, we need to rename the function, which, again, for absolutely
no reason adds dependency on ublk. Why doing that instead of
waiting until ublk is converted? That's a big mystery.
Ming Lei May 18, 2023, 2:15 a.m. UTC | #8
On Wed, May 17, 2023 at 01:31:00PM -0600, Jens Axboe wrote:
> On 5/17/23 6:32 AM, Pavel Begunkov wrote:
> > On 5/17/23 08:23, Christoph Hellwig wrote:
> >> On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
> >>> Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough
> >>> commands completion. It further delays the execution of task_work for
> >>> DEFER_TASKRUN until there are enough of task_work items queued to meet
> >>> the waiting criteria, which reduces the number of wake ups we issue.
> >>
> >> Why wouldn't you just do that unconditionally for
> >> io_uring_cmd_complete_in_task?
> > 
> > 1) ublk does secondary batching and so may produce multiple cqes,
> > that's not supported. I believe Ming sent patches removing it,
> > but I'd rather not deal with conflicts for now.
> 
> Ming, what's the status of those patches? Looks like we'll end up
> with a dependency regardless of the ordering of these. Since these
> patches are here now, sanest approach seems to move forward with
> this series and defer the conflict resolving to the ublk side.

I didn't send patch to remove the batch in ublk, such as, the following
line code:

ublk_queue_cmd():
	...
	if (!llist_add(&data->node, &ubq->io_cmds))
		return;
	...

But I did want to kill it given __io_req_task_work_add() can do batching
process, but we have to re-order request in this list, so can't remove it
now simply, see commit:

	7d4a93176e01 ("ublk_drv: don't forward io commands in reserve order")

Pavel must have misunderstood the following one as the batch removal:

https://lore.kernel.org/linux-block/20230427124414.319945-2-ming.lei@redhat.com/

	

thanks,
Ming
diff mbox series

Patch

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 81c5c9e38477..52ed1094ccbb 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -521,7 +521,7 @@  static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
 	if (cookie != NULL && blk_rq_is_poll(req))
 		nvme_uring_task_cb(ioucmd, IO_URING_F_UNLOCKED);
 	else
-		io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
+		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
 
 	return RQ_END_IO_FREE;
 }
@@ -543,7 +543,7 @@  static enum rq_end_io_ret nvme_uring_cmd_end_io_meta(struct request *req,
 	if (cookie != NULL && blk_rq_is_poll(req))
 		nvme_uring_task_meta_cb(ioucmd, IO_URING_F_UNLOCKED);
 	else
-		io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_meta_cb);
+		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_meta_cb);
 
 	return RQ_END_IO_NONE;
 }