Message ID: 378d3aba69ea2b6a8b14624810a551c2ae011791.1655213915.git.asml.silence@gmail.com
State: New
Series: 5.20 cleanups and poll optimisations
On 6/14/22 22:37, Pavel Begunkov wrote:
> REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
> completion list to io_queue_sqe() as __io_req_complete() is inlined and
> we don't want to bloat the kernel.
>
> Now that we complete in a more centralised fashion in io_issue_sqe() we
> can get rid of the flag and queue to the list directly.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  io_uring/io_uring.c       | 20 ++++++++------------
>  io_uring/io_uring.h       |  5 -----
>  io_uring/io_uring_types.h |  3 ---
>  3 files changed, 8 insertions(+), 20 deletions(-)
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 1fb93fdcfbab..fcee58c6c35e 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
>
>  inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
>  {
> -	if (issue_flags & IO_URING_F_COMPLETE_DEFER)
> -		io_req_complete_state(req);
> -	else
> -		io_req_complete_post(req);
> +	io_req_complete_post(req);
>  }

io_read/write and provide_buffers/remove_buffers are still using
io_req_complete() in their own handlers. With the
IO_URING_F_COMPLETE_DEFER branch removed, they will always end up in the
complete_post path, which we shouldn't allow.
On 6/15/22 09:20, Hao Xu wrote:
> On 6/14/22 22:37, Pavel Begunkov wrote:
>> REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
>> completion list to io_queue_sqe() as __io_req_complete() is inlined and
>> we don't want to bloat the kernel.
>>
>> Now that we complete in a more centralised fashion in io_issue_sqe() we
>> can get rid of the flag and queue to the list directly.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>  io_uring/io_uring.c       | 20 ++++++++------------
>>  io_uring/io_uring.h       |  5 -----
>>  io_uring/io_uring_types.h |  3 ---
>>  3 files changed, 8 insertions(+), 20 deletions(-)
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 1fb93fdcfbab..fcee58c6c35e 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
>>
>>  inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
>>  {
>> -	if (issue_flags & IO_URING_F_COMPLETE_DEFER)
>> -		io_req_complete_state(req);
>> -	else
>> -		io_req_complete_post(req);
>> +	io_req_complete_post(req);
>>  }
>
> io_read/write and provide_buffers/remove_buffers are still using
> io_req_complete() in their own handlers. With the
> IO_URING_F_COMPLETE_DEFER branch removed, they will always end up in the
> complete_post path, which we shouldn't allow.

Old provided buffers are such a useful feature that Jens is adding a new
ring-based version of it, so I couldn't care less about those two.

In any case, let's leave it to follow-ups. That locking is a weird
construct and shouldn't be done in this ad-hoc way, it's a potential
bug nest.
On 6/15/22 11:18, Pavel Begunkov wrote:
> On 6/15/22 09:20, Hao Xu wrote:
>> On 6/14/22 22:37, Pavel Begunkov wrote:
>>> REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
>>> completion list to io_queue_sqe() as __io_req_complete() is inlined and
>>> we don't want to bloat the kernel.
>>>
>>> Now that we complete in a more centralised fashion in io_issue_sqe() we
>>> can get rid of the flag and queue to the list directly.
>>>
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> ---
>>>  io_uring/io_uring.c       | 20 ++++++++------------
>>>  io_uring/io_uring.h       |  5 -----
>>>  io_uring/io_uring_types.h |  3 ---
>>>  3 files changed, 8 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index 1fb93fdcfbab..fcee58c6c35e 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
>>>
>>>  inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
>>>  {
>>> -	if (issue_flags & IO_URING_F_COMPLETE_DEFER)
>>> -		io_req_complete_state(req);
>>> -	else
>>> -		io_req_complete_post(req);
>>> +	io_req_complete_post(req);
>>>  }
>>
>> io_read/write and provide_buffers/remove_buffers are still using
>> io_req_complete() in their own handlers. With the
>> IO_URING_F_COMPLETE_DEFER branch removed, they will always end up in the
>> complete_post path, which we shouldn't allow.
>
> Old provided buffers are such a useful feature that Jens is adding a new
> ring-based version of it, so I couldn't care less about those two.
>
> In any case, let's leave it to follow-ups. That locking is a weird
> construct and shouldn't be done in this ad-hoc way, it's a potential
> bug nest.

Ok, I missed the io_read/write part, that's a problem, agreed.
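To make the concern in this exchange concrete, the following simplified sketch contrasts the two completion paths before and after the change. It is illustrative only, not verbatim kernel code; it reuses only names that appear in the patch and abridges the function bodies.

/*
 * Illustrative sketch, not verbatim kernel code.
 *
 * Before the patch, __io_req_complete() honoured the defer hint passed
 * down from the issue path:
 */
static inline void __io_req_complete_before(struct io_kiocb *req,
					    unsigned issue_flags)
{
	if (issue_flags & IO_URING_F_COMPLETE_DEFER)
		io_req_complete_state(req);	/* mark it; io_queue_sqe() batches it later */
	else
		io_req_complete_post(req);	/* post the CQE under the completion lock now */
}

/* After the patch, it unconditionally posts: */
static inline void __io_req_complete_after(struct io_kiocb *req,
					   unsigned issue_flags)
{
	io_req_complete_post(req);
}

/*
 * An opcode handler that completes the request from inside its own code
 * (as io_read/write and provide_buffers/remove_buffers do via
 * io_req_complete()) never reaches the new centralised IOU_OK branch in
 * io_issue_sqe(), so with the "after" version it always takes the locked
 * post path, even when the issuer passed IO_URING_F_COMPLETE_DEFER.
 * That is the regression Hao points out above.
 */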
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1fb93fdcfbab..fcee58c6c35e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
 
 inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
 {
-	if (issue_flags & IO_URING_F_COMPLETE_DEFER)
-		io_req_complete_state(req);
-	else
-		io_req_complete_post(req);
+	io_req_complete_post(req);
 }
 
 void __io_req_complete32(struct io_kiocb *req, unsigned int issue_flags,
 			 u64 extra1, u64 extra2)
 {
 	if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
-		io_req_complete_state(req);
+		io_req_add_compl_list(req);
 		req->extra1 = extra1;
 		req->extra2 = extra2;
 	} else {
@@ -2132,9 +2129,12 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	if (creds)
 		revert_creds(creds);
 
-	if (ret == IOU_OK)
-		__io_req_complete(req, issue_flags);
-	else if (ret != IOU_ISSUE_SKIP_COMPLETE)
+	if (ret == IOU_OK) {
+		if (issue_flags & IO_URING_F_COMPLETE_DEFER)
+			io_req_add_compl_list(req);
+		else
+			io_req_complete_post(req);
+	} else if (ret != IOU_ISSUE_SKIP_COMPLETE)
 		return ret;
 
 	/* If the op doesn't have a file, we're not polling for it */
@@ -2299,10 +2299,6 @@ static inline void io_queue_sqe(struct io_kiocb *req)
 
 	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
 
-	if (req->flags & REQ_F_COMPLETE_INLINE) {
-		io_req_add_compl_list(req);
-		return;
-	}
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
 	 * doesn't support non-blocking read/write attempts
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 26b669746d61..2141519e995a 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -193,11 +193,6 @@ static inline bool io_run_task_work(void)
 	return false;
 }
 
-static inline void io_req_complete_state(struct io_kiocb *req)
-{
-	req->flags |= REQ_F_COMPLETE_INLINE;
-}
-
 static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
 {
 	if (!*locked) {
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index ca8e25992ece..3228872c199b 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -299,7 +299,6 @@ enum {
 	REQ_F_POLLED_BIT,
 	REQ_F_BUFFER_SELECTED_BIT,
 	REQ_F_BUFFER_RING_BIT,
-	REQ_F_COMPLETE_INLINE_BIT,
 	REQ_F_REISSUE_BIT,
 	REQ_F_CREDS_BIT,
 	REQ_F_REFCOUNT_BIT,
@@ -353,8 +352,6 @@ enum {
 	REQ_F_BUFFER_SELECTED	= BIT(REQ_F_BUFFER_SELECTED_BIT),
 	/* buffer selected from ring, needs commit */
 	REQ_F_BUFFER_RING	= BIT(REQ_F_BUFFER_RING_BIT),
-	/* completion is deferred through io_comp_state */
-	REQ_F_COMPLETE_INLINE	= BIT(REQ_F_COMPLETE_INLINE_BIT),
 	/* caller should reissue async */
 	REQ_F_REISSUE		= BIT(REQ_F_REISSUE_BIT),
 	/* supports async reads/writes */
REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list to io_queue_sqe() as __io_req_complete() is inlined and
we don't want to bloat the kernel.

Now that we complete in a more centralised fashion in io_issue_sqe() we
can get rid of the flag and queue to the list directly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c       | 20 ++++++++------------
 io_uring/io_uring.h       |  5 -----
 io_uring/io_uring_types.h |  3 ---
 3 files changed, 8 insertions(+), 20 deletions(-)
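For reference, a condensed sketch of the control-flow change the commit message describes. This is illustrative only, not verbatim kernel code; the helper names are taken from the patch, and the two wrapper functions exist purely to show the old and new sequences side by side.

/* Illustrative sketch, not verbatim kernel code. */

/*
 * Old flow: the issue path marked the request with REQ_F_COMPLETE_INLINE
 * (via io_req_complete_state()) and io_queue_sqe() moved it to the
 * completion list afterwards.
 */
static void old_flow_sketch(struct io_kiocb *req)
{
	/* during issue, via io_req_complete_state(): */
	req->flags |= REQ_F_COMPLETE_INLINE;

	/* later, back in io_queue_sqe(): */
	if (req->flags & REQ_F_COMPLETE_INLINE)
		io_req_add_compl_list(req);
}

/*
 * New flow: io_issue_sqe() queues to the completion list (or posts the
 * CQE directly), so the flag and the extra check disappear.
 */
static void new_flow_sketch(struct io_kiocb *req, int ret,
			    unsigned issue_flags)
{
	if (ret == IOU_OK) {
		if (issue_flags & IO_URING_F_COMPLETE_DEFER)
			io_req_add_compl_list(req);
		else
			io_req_complete_post(req);
	}
}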