
[1/3] io_uring: split req init from submit

Message ID 20230728201449.3350962-1-kbusch@meta.com (mailing list archive)
State New
Series [1/3] io_uring: split req init from submit

Commit Message

Keith Busch July 28, 2023, 8:14 p.m. UTC
From: Keith Busch <kbusch@kernel.org>

Split the req initialization and link handling from the submit. This
simplifies the submit path since everything that can fail is separate
from it, and makes it easier to create batched submissions later.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 io_uring/io_uring.c | 66 +++++++++++++++++++++++++--------------------
 1 file changed, 37 insertions(+), 29 deletions(-)

Comments

Pavel Begunkov July 31, 2023, 12:53 p.m. UTC | #1
On 7/28/23 21:14, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Split the req initialization and link handling from the submit. This
> simplifies the submit path since everything that can fail is separate
> from it, and makes it easier to create batched submissions later.

Keith, I don't think this prep patch does us any good; I'd rather
shove the link assembling code further out of the common path. I like
the first version more (see [1]). I'd suggest merging it and cleaning
up after.

I'll also say that IMHO the overhead is well justified. It's not only
about having multiple nvmes; the problem also slows down cases mixing
storage with net and the rest of IO in a single ring.

[1] https://lore.kernel.org/io-uring/20230504162427.1099469-1-kbusch@meta.com/
Jens Axboe July 31, 2023, 9 p.m. UTC | #2
On 7/31/23 6:53 AM, Pavel Begunkov wrote:
> On 7/28/23 21:14, Keith Busch wrote:
>> From: Keith Busch <kbusch@kernel.org>
>>
>> Split the req initialization and link handling from the submit. This
>> simplifies the submit path since everything that can fail is separate
>> from it, and makes it easier to create batched submissions later.
> 
> Keith, I don't think this prep patch does us any good; I'd rather
> shove the link assembling code further out of the common path. I like
> the first version more (see [1]). I'd suggest merging it and cleaning
> up after.
> 
> I'll also say that IMHO the overhead is well justified. It's not only
> about having multiple nvmes; the problem also slows down cases mixing
> storage with net and the rest of IO in a single ring.
> 
> [1] https://lore.kernel.org/io-uring/20230504162427.1099469-1-kbusch@meta.com/

The downside of that one, to me, is that it just serializes all of it
and we end up looping over the submission list twice. With alloc+init
split, at least we get some locality wins by grouping the setup side of
the requests.
Pavel Begunkov Aug. 1, 2023, 2:13 p.m. UTC | #3
On 7/31/23 22:00, Jens Axboe wrote:
> On 7/31/23 6:53 AM, Pavel Begunkov wrote:
>> On 7/28/23 21:14, Keith Busch wrote:
>>> From: Keith Busch <kbusch@kernel.org>
>>>
>>> Split the req initialization and link handling from the submit. This
>>> simplifies the submit path since everything that can fail is separate
>>> from it, and makes it easier to create batched submissions later.
>>
>> Keith, I don't think this prep patch does us any good; I'd rather
>> shove the link assembling code further out of the common path. I like
>> the first version more (see [1]). I'd suggest merging it and cleaning
>> up after.
>>
>> I'll also say that IMHO the overhead is well justified. It's not only
>> about having multiple nvmes; the problem also slows down cases mixing
>> storage with net and the rest of IO in a single ring.
>>
>> [1] https://lore.kernel.org/io-uring/20230504162427.1099469-1-kbusch@meta.com/
> 
> The downside of that one, to me, is that it just serializes all of it
> and we end up looping over the submission list twice.

Right, and there is nothing that can be done if we want to know about
all requests in advance, at least without changing the uapi and/or
adding userspace hints.

> With alloc+init
> split, at least we get some locality wins by grouping the setup side of
> the requests.

I don't think I follow, what grouping do you mean? As far as I can
see, v1 and v2 are essentially the same, with the difference of whether
you have a helper for setting up links or not, see io_setup_link() from
v2. In both cases it's executed in the same sequence:

1) init (generic init + opcode init + link setup) each request and put
    into a temporary list.
2) go over the list and submit them one by one

And after inlining they should look pretty close.
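The two-phase sequence enumerated above can be modeled in a small self-contained sketch. This is purely illustrative: the types, helper names (`init_req`, `submit_req`, `submit_all`), and the "negative opcode fails init" rule are stand-ins, not the real io_uring internals.

```c
/* Two-phase submission model: phase 1 does everything that can fail
 * (init + setup) and collects the requests; phase 2 only dispatches.
 * All names here are illustrative; this is not the io_uring API. */
#include <assert.h>

enum { MAX_REQS = 8 };

struct req {
	int opcode;
};

static int submitted;	/* counts dispatched requests */

/* Phase 1: init can fail; a negative opcode models a bad SQE. */
static int init_req(struct req *r, int opcode)
{
	if (opcode < 0)
		return -1;
	r->opcode = opcode;
	return 0;
}

/* Phase 2: pure dispatch; nothing here can fail. */
static void submit_req(struct req *r)
{
	(void)r;
	submitted++;	/* the real code would queue the request */
}

/* Driver mirroring the "init all, then submit the list" sequence. */
static int submit_all(const int *opcodes, int nr)
{
	struct req reqs[MAX_REQS];
	int ok[MAX_REQS];
	int i;

	submitted = 0;
	if (nr > MAX_REQS)
		nr = MAX_REQS;
	for (i = 0; i < nr; i++)		/* pass 1: init */
		ok[i] = (init_req(&reqs[i], opcodes[i]) == 0);
	for (i = 0; i < nr; i++)		/* pass 2: dispatch */
		if (ok[i])
			submit_req(&reqs[i]);
	return submitted;
}
```

Whether v1 and v2 differ is then only a question of where the link-setup helper is called inside pass 1, which is the point being made above.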
Keith Busch Aug. 1, 2023, 3:17 p.m. UTC | #4
On Tue, Aug 01, 2023 at 03:13:59PM +0100, Pavel Begunkov wrote:
> On 7/31/23 22:00, Jens Axboe wrote:
> > On 7/31/23 6:53 AM, Pavel Begunkov wrote:
> > > On 7/28/23 21:14, Keith Busch wrote:
> > > > From: Keith Busch <kbusch@kernel.org>
> > > > 
> > > > Split the req initialization and link handling from the submit. This
> > > > simplifies the submit path since everything that can fail is separate
> > > > from it, and makes it easier to create batched submissions later.
> > > 
> > > Keith, I don't think this prep patch does us any good; I'd rather
> > > shove the link assembling code further out of the common path. I like
> > > the first version more (see [1]). I'd suggest merging it and cleaning
> > > up after.
> > > 
> > > I'll also say that IMHO the overhead is well justified. It's not only
> > > about having multiple nvmes; the problem also slows down cases mixing
> > > storage with net and the rest of IO in a single ring.
> > > 
> > > [1] https://lore.kernel.org/io-uring/20230504162427.1099469-1-kbusch@meta.com/
> > 
> > The downside of that one, to me, is that it just serializes all of it
> > and we end up looping over the submission list twice.
> 
> Right, and there is nothing that can be done if we want to know about
> all requests in advance, at least without changing the uapi and/or
> adding userspace hints.
> 
> > With alloc+init
> > split, at least we get some locality wins by grouping the setup side of
> > the requests.
> 
> I don't think I follow, what grouping do you mean? As far as I can
> see, v1 and v2 are essentially the same, with the difference of whether
> you have a helper for setting up links or not, see io_setup_link() from
> v2. In both cases it's executed in the same sequence:
> 
> 1) init (generic init + opcode init + link setup) each request and put
>    into a temporary list.
> 2) go over the list and submit them one by one
> 
> And after inlining they should look pretty close.

The main difference in this one compared to the original version is that
everything in the 2nd loop is just for the final dispatch. Anything that
can fail, fall back, or defer to async happens in the first loop. I'm not
sure that makes a difference in runtime, but having the 2nd loop handle
only fast-path requests was what I set out to do for this version.
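The fast-path-only dispatch described here corresponds to the patch's post-split io_submit_sqe(), which only checks the unlikely fallback flags. A minimal stand-alone model of that branch (the flag values, struct, and counters are simplified stand-ins for the kernel's):

```c
/* Model of the dispatch-only io_submit_sqe() after the split: the
 * flags were already decided during init, so dispatch has a single
 * unlikely branch. Flag values and helpers are illustrative. */
#include <assert.h>

#define REQ_F_FORCE_ASYNC	(1u << 0)	/* stand-in flag values */
#define REQ_F_FAIL		(1u << 1)

struct kreq {
	unsigned int flags;
};

static int queued_fast, queued_fallback;

static void queue_sqe(struct kreq *r)		{ (void)r; queued_fast++; }
static void queue_sqe_fallback(struct kreq *r)	{ (void)r; queued_fallback++; }

/* Dispatch only: requests flagged during init take the slow path. */
static void dispatch_sqe(struct kreq *r)
{
	if (r->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL))
		queue_sqe_fallback(r);
	else
		queue_sqe(r);
}
```

The fast path never sees a failing or deferring request, which is exactly the property the 2nd loop is meant to have in this version.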
Pavel Begunkov Aug. 1, 2023, 4:05 p.m. UTC | #5
On 8/1/23 16:17, Keith Busch wrote:
> On Tue, Aug 01, 2023 at 03:13:59PM +0100, Pavel Begunkov wrote:
>> On 7/31/23 22:00, Jens Axboe wrote:
>>> On 7/31/23 6:53 AM, Pavel Begunkov wrote:
>>>> On 7/28/23 21:14, Keith Busch wrote:
>>>>> From: Keith Busch <kbusch@kernel.org>
>>>>>
>>>>> Split the req initialization and link handling from the submit. This
>>>>> simplifies the submit path since everything that can fail is separate
>>>>> from it, and makes it easier to create batched submissions later.
>>>>
>>>> Keith, I don't think this prep patch does us any good; I'd rather
>>>> shove the link assembling code further out of the common path. I like
>>>> the first version more (see [1]). I'd suggest merging it and cleaning
>>>> up after.
>>>>
>>>> I'll also say that IMHO the overhead is well justified. It's not only
>>>> about having multiple nvmes; the problem also slows down cases mixing
>>>> storage with net and the rest of IO in a single ring.
>>>>
>>>> [1] https://lore.kernel.org/io-uring/20230504162427.1099469-1-kbusch@meta.com/
>>>
>>> The downside of that one, to me, is that it just serializes all of it
>>> and we end up looping over the submission list twice.
>>
>> Right, and there is nothing that can be done if we want to know about
>> all requests in advance, at least without changing the uapi and/or
>> adding userspace hints.
>>
>>> With alloc+init
>>> split, at least we get some locality wins by grouping the setup side of
>>> the requests.
>>
>> I don't think I follow, what grouping do you mean? As far as I can
>> see, v1 and v2 are essentially the same, with the difference of whether
>> you have a helper for setting up links or not, see io_setup_link() from
>> v2. In both cases it's executed in the same sequence:
>>
>> 1) init (generic init + opcode init + link setup) each request and put
>>     into a temporary list.
>> 2) go over the list and submit them one by one
>>
>> And after inlining they should look pretty close.
> 
> The main difference in this one compared to the original version is that
> everything in the 2nd loop is just for the final dispatch. Anything that
> can fail, fallback, or defer to async happens in the first loop. I'm not
> sure that makes a difference in runtime, but having the 2nd loop handle
> only fast-path requests was what I set out to do for this version.

For performance it doesn't matter; it's a very slow path and we should
not be hitting it. And it only smears single-request submission over
multiple places, for instance it won't be legal to use io_submit_sqe()
without those extra checks. Those are all minor points, but I don't
think it's any better than v1 in this aspect.

Patch

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index d585171560ce5..818b2d1661c5e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2279,18 +2279,20 @@  static __cold int io_submit_fail_init(const struct io_uring_sqe *sqe,
 	return 0;
 }
 
-static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
-			 const struct io_uring_sqe *sqe)
-	__must_hold(&ctx->uring_lock)
+static inline void io_submit_sqe(struct io_kiocb *req)
 {
-	struct io_submit_link *link = &ctx->submit_state.link;
-	int ret;
+	trace_io_uring_submit_req(req);
 
-	ret = io_init_req(ctx, req, sqe);
-	if (unlikely(ret))
-		return io_submit_fail_init(sqe, req, ret);
+	if (unlikely(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL)))
+		io_queue_sqe_fallback(req);
+	else
+		io_queue_sqe(req);
+}
 
-	trace_io_uring_submit_req(req);
+static int io_setup_link(struct io_submit_link *link, struct io_kiocb **orig)
+{
+	struct io_kiocb *req = *orig;
+	int ret;
 
 	/*
 	 * If we already have a head request, queue this one for async
@@ -2300,35 +2302,28 @@  static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	 * conditions are true (normal request), then just queue it.
 	 */
 	if (unlikely(link->head)) {
+		*orig = NULL;
+
 		ret = io_req_prep_async(req);
 		if (unlikely(ret))
-			return io_submit_fail_init(sqe, req, ret);
+			return ret;
 
 		trace_io_uring_link(req, link->head);
 		link->last->link = req;
 		link->last = req;
-
 		if (req->flags & IO_REQ_LINK_FLAGS)
 			return 0;
+
 		/* last request of the link, flush it */
-		req = link->head;
+		*orig = link->head;
 		link->head = NULL;
-		if (req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL))
-			goto fallback;
-
-	} else if (unlikely(req->flags & (IO_REQ_LINK_FLAGS |
-					  REQ_F_FORCE_ASYNC | REQ_F_FAIL))) {
-		if (req->flags & IO_REQ_LINK_FLAGS) {
-			link->head = req;
-			link->last = req;
-		} else {
-fallback:
-			io_queue_sqe_fallback(req);
-		}
-		return 0;
+	} else if (unlikely(req->flags & IO_REQ_LINK_FLAGS)) {
+	        link->head = req;
+	        link->last = req;
+		*orig = NULL;
+	        return 0;
 	}
 
-	io_queue_sqe(req);
 	return 0;
 }
 
@@ -2412,9 +2407,10 @@  static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
 int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	__must_hold(&ctx->uring_lock)
 {
+	struct io_submit_link *link = &ctx->submit_state.link;
 	unsigned int entries = io_sqring_entries(ctx);
 	unsigned int left;
-	int ret;
+	int ret, err;
 
 	if (unlikely(!entries))
 		return 0;
@@ -2434,12 +2430,24 @@  int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 			break;
 		}
 
+		err = io_init_req(ctx, req, sqe);
+		if (unlikely(err))
+			goto error;
+
+		err = io_setup_link(link, &req);
+		if (unlikely(err))
+			goto error;
+
+		if (likely(req))
+			io_submit_sqe(req);
+		continue;
+error:
 		/*
 		 * Continue submitting even for sqe failure if the
 		 * ring was setup with IORING_SETUP_SUBMIT_ALL
 		 */
-		if (unlikely(io_submit_sqe(ctx, req, sqe)) &&
-		    !(ctx->flags & IORING_SETUP_SUBMIT_ALL)) {
+		err = io_submit_fail_init(sqe, req, err);
+		if (err && !(ctx->flags & IORING_SETUP_SUBMIT_ALL)) {
 			left--;
 			break;
 		}