Message ID | 20230419162552.576489-4-axboe@kernel.dk (mailing list archive)
---|---
State | New |
Series | Enable NO_OFFLOAD support
On 4/19/23 17:25, Jens Axboe wrote:
> Some applications don't necessarily care about io_uring not blocking for
> request issue, they simply want to use io_uring for batched submission
> of IO. However, io_uring will always do non-blocking issues, and for
> some request types, there's simply no support for doing non-blocking
> issue and hence they get punted to io-wq unconditionally. If the
> application doesn't care about issue potentially blocking, this causes
> a performance slowdown as thread offload is not nearly as efficient as
> inline issue.
[...]
> @@ -2385,6 +2389,9 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  		return 0;
>  	}
>
> +	if (no_offload)
> +		req->flags |= REQ_F_NO_OFFLOAD;

Shouldn't it be a part of the initial "in syscall" submission
but not extended to tw? I'd say it should, otherwise it risks
making !DEFER_TASKRUN totally unpredictable. E.g. any syscall
can try to execute tw and get stuck waiting in there.
On 4/19/23 7:01 PM, Pavel Begunkov wrote:
> On 4/19/23 17:25, Jens Axboe wrote:
[...]
>> +	if (no_offload)
>> +		req->flags |= REQ_F_NO_OFFLOAD;
>
> Shouldn't it be a part of the initial "in syscall" submission
> but not extended to tw? I'd say it should, otherwise it risks
> making !DEFER_TASKRUN totally unpredictable. E.g. any syscall
> can try to execute tw and get stuck waiting in there.

Yeah, it should probably be ignored outside of io_uring_enter(2)
submissions. If we do that, we could drop the flag too (and the flags
extension).
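For context on why the task-work ("tw") point matters: with !DEFER_TASKRUN, deferred task work can be executed by any io_uring_enter(2) call, not only the one that submitted the request. The sketch below is purely illustrative (a generic liburing completion-reaper loop, not code from this series); it marks the call site where a no-offload hint persisted in per-request state would make latency unpredictable:

#include <liburing.h>
#include <stdio.h>

/*
 * Illustrative completion-reaper loop. With !DEFER_TASKRUN, the
 * io_uring_wait_cqe() call below may also execute pending task work for
 * requests submitted elsewhere. If the "no offload" hint were kept in
 * req->flags (rather than limited to the submitting syscall), that task
 * work could issue blocking I/O right here, making this supposedly cheap
 * wait unpredictable -- the concern raised above.
 */
static int reap_completions(struct io_uring *ring, unsigned count)
{
	struct io_uring_cqe *cqe;
	unsigned i;

	for (i = 0; i < count; i++) {
		int ret = io_uring_wait_cqe(ring, &cqe);	/* enters the kernel */
		if (ret < 0)
			return ret;
		printf("cqe res=%d\n", cqe->res);
		io_uring_cqe_seen(ring, cqe);
	}
	return 0;
}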
On 4/20/23 16:03, Jens Axboe wrote:
> On 4/19/23 7:01 PM, Pavel Begunkov wrote:
>> On 4/19/23 17:25, Jens Axboe wrote:
[...]
>>> +	if (no_offload)
>>> +		req->flags |= REQ_F_NO_OFFLOAD;
>>
>> Shouldn't it be a part of the initial "in syscall" submission
>> but not extended to tw? I'd say it should, otherwise it risks
>> making !DEFER_TASKRUN totally unpredictable. E.g. any syscall
>> can try to execute tw and get stuck waiting in there.
>
> Yeah, it should probably be ignored outside of io_uring_enter(2)
> submissions. If we do that, we could drop the flag too (and the flags
> extension).

issue_flags instead of req->flags might be a better place for it.
On 4/20/23 9:16 AM, Pavel Begunkov wrote:
> On 4/20/23 16:03, Jens Axboe wrote:
>> On 4/19/23 7:01 PM, Pavel Begunkov wrote:
>>> On 4/19/23 17:25, Jens Axboe wrote:
[...]
>>>> +	if (no_offload)
>>>> +		req->flags |= REQ_F_NO_OFFLOAD;
>>>
>>> Shouldn't it be a part of the initial "in syscall" submission
>>> but not extended to tw? I'd say it should, otherwise it risks
>>> making !DEFER_TASKRUN totally unpredictable. E.g. any syscall
>>> can try to execute tw and get stuck waiting in there.
>>
>> Yeah, it should probably be ignored outside of io_uring_enter(2)
>> submissions. If we do that, we could drop the flag too (and the flags
>> extension).
>
> issue_flags instead of req->flags might be a better place for it

For sure, if we're not carrying it in io_kiocb state, then an issue flag
is the way to go. FWIW, I already rebased and did that.

I then ran the full test suite with a modification to the queue init
helpers that sets IORING_SETUP_NO_OFFLOAD for all queue creations, except
the ones where IORING_SETUP_IOPOLL is set. Ran into one minor issue:
test/fallocate.c will trigger SIGXFSZ for the process itself (rather than
io-wq, where it gets ignored) when exceeding the file size. That's to be
expected. Outside of that, everything worked and nothing odd was observed.
Obviously not a comprehensive test for potential issues, but it does show
that we're not THAT much in trouble here.

Didn't drop the uring_lock yet; outside of dropping the REQ_F_NO_OFFLOAD
flag and making it an issue flag, it's all pretty much the same as before.
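For illustration, a rough sketch of what carrying the hint in issue_flags might look like; the IO_URING_F_NO_OFFLOAD name, its bit value, and the helper below are assumptions for this sketch, not the actual rebased patch. The patch as posted (below) still uses REQ_F_NO_OFFLOAD.

/*
 * Hypothetical sketch only: pass the no-offload hint via issue_flags so it
 * exists purely for the duration of the io_uring_enter(2) submission and is
 * never persisted in io_kiocb state. Flag name, bit value, and helper are
 * assumptions; IO_URING_F_NONBLOCK and file_can_poll() are existing kernel
 * symbols.
 */
#define IO_URING_F_NO_OFFLOAD	(1U << 12)	/* assumed free bit */

static inline unsigned int io_adjust_nonblock(struct io_kiocb *req,
					      unsigned int issue_flags)
{
	/* Only matters for files we cannot arm poll on (case 2b above). */
	if ((issue_flags & IO_URING_F_NO_OFFLOAD) &&
	    (!req->file || !file_can_poll(req->file)))
		issue_flags &= ~IO_URING_F_NONBLOCK;
	return issue_flags;
}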
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 4dd54d2173e1..c54f3fb7ab1a 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -403,6 +403,7 @@ enum {
 	REQ_F_APOLL_MULTISHOT_BIT,
 	REQ_F_CLEAR_POLLIN_BIT,
 	REQ_F_HASH_LOCKED_BIT,
+	REQ_F_NO_OFFLOAD_BIT,
 	/* keep async read/write and isreg together and in order */
 	REQ_F_SUPPORT_NOWAIT_BIT,
 	REQ_F_ISREG_BIT,
@@ -475,6 +476,8 @@ enum {
 	REQ_F_CLEAR_POLLIN	= BIT_ULL(REQ_F_CLEAR_POLLIN_BIT),
 	/* hashed into ->cancel_hash_locked, protected by ->uring_lock */
 	REQ_F_HASH_LOCKED	= BIT_ULL(REQ_F_HASH_LOCKED_BIT),
+	/* don't offload to io-wq, issue blocking if needed */
+	REQ_F_NO_OFFLOAD	= BIT_ULL(REQ_F_NO_OFFLOAD_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 0716cb17e436..ea903a677ce9 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -173,6 +173,12 @@ enum {
  */
 #define IORING_SETUP_DEFER_TASKRUN	(1U << 13)
 
+/*
+ * Don't attempt non-blocking issue on file types that would otherwise
+ * punt to io-wq if they cannot be completed non-blocking.
+ */
+#define IORING_SETUP_NO_OFFLOAD		(1U << 14)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
@@ -443,6 +449,7 @@ struct io_cqring_offsets {
 #define IORING_ENTER_SQ_WAIT		(1U << 2)
 #define IORING_ENTER_EXT_ARG		(1U << 3)
 #define IORING_ENTER_REGISTERED_RING	(1U << 4)
+#define IORING_ENTER_NO_OFFLOAD		(1U << 5)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9568b5e4cf87..04770b06de16 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1947,6 +1947,10 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(!io_assign_file(req, def, issue_flags)))
 		return -EBADF;
 
+	if (req->flags & REQ_F_NO_OFFLOAD &&
+	    (!req->file || !file_can_poll(req->file)))
+		issue_flags &= ~IO_URING_F_NONBLOCK;
+
 	if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
 		creds = override_creds(req->creds);
 
@@ -2337,7 +2341,7 @@ static __cold int io_submit_fail_init(const struct io_uring_sqe *sqe,
 }
 
 static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
-			 const struct io_uring_sqe *sqe)
+			 const struct io_uring_sqe *sqe, bool no_offload)
 	__must_hold(&ctx->uring_lock)
 {
 	struct io_submit_link *link = &ctx->submit_state.link;
@@ -2385,6 +2389,9 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 		return 0;
 	}
 
+	if (no_offload)
+		req->flags |= REQ_F_NO_OFFLOAD;
+
 	io_queue_sqe(req);
 	return 0;
 }
@@ -2466,7 +2473,7 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
 	return false;
 }
 
-int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
+int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr, bool no_offload)
 	__must_hold(&ctx->uring_lock)
 {
 	unsigned int entries = io_sqring_entries(ctx);
@@ -2495,7 +2502,7 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 		 * Continue submitting even for sqe failure if the
 		 * ring was setup with IORING_SETUP_SUBMIT_ALL
 		 */
-		if (unlikely(io_submit_sqe(ctx, req, sqe)) &&
+		if (unlikely(io_submit_sqe(ctx, req, sqe, no_offload)) &&
 		    !(ctx->flags & IORING_SETUP_SUBMIT_ALL)) {
 			left--;
 			break;
@@ -3524,7 +3531,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
 			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
-			       IORING_ENTER_REGISTERED_RING)))
+			       IORING_ENTER_REGISTERED_RING |
+			       IORING_ENTER_NO_OFFLOAD)))
 		return -EINVAL;
 
 	/*
@@ -3575,12 +3583,17 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		ret = to_submit;
 	} else if (to_submit) {
+		bool no_offload;
+
 		ret = io_uring_add_tctx_node(ctx);
 		if (unlikely(ret))
 			goto out;
 
+		no_offload = flags & IORING_ENTER_NO_OFFLOAD ||
+			     ctx->flags & IORING_SETUP_NO_OFFLOAD;
+
 		mutex_lock(&ctx->uring_lock);
-		ret = io_submit_sqes(ctx, to_submit);
+		ret = io_submit_sqes(ctx, to_submit, no_offload);
 		if (ret != to_submit) {
 			mutex_unlock(&ctx->uring_lock);
 			goto out;
@@ -3969,7 +3982,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
 			IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
 			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
-			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN))
+			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
+			IORING_SETUP_NO_OFFLOAD))
 		return -EINVAL;
 
 	return io_uring_create(entries, &p, params);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 25515d69d205..c5c0db7232c0 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -76,7 +76,7 @@ int io_uring_alloc_task_context(struct task_struct *task,
 				struct io_ring_ctx *ctx);
 
 int io_poll_issue(struct io_kiocb *req, struct io_tw_state *ts);
-int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr);
+int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr, bool no_offload);
 int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin);
 void io_free_batch_list(struct io_ring_ctx *ctx, struct io_wq_work_node *node);
 int io_req_prep_async(struct io_kiocb *req);
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 9db4bc1f521a..9a9417bf9e3f 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -166,6 +166,7 @@ static inline bool io_sqd_events_pending(struct io_sq_data *sqd)
 
 static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 {
+	bool no_offload = ctx->flags & IORING_SETUP_NO_OFFLOAD;
 	unsigned int to_submit;
 	int ret = 0;
 
@@ -190,7 +191,7 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 	 */
 	if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
 	    !(ctx->flags & IORING_SETUP_R_DISABLED))
-		ret = io_submit_sqes(ctx, to_submit);
+		ret = io_submit_sqes(ctx, to_submit, no_offload);
 
 	mutex_unlock(&ctx->uring_lock);
 
 	if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
Some applications don't necessarily care about io_uring not blocking for
request issue, they simply want to use io_uring for batched submission
of IO. However, io_uring will always do non-blocking issues, and for
some request types, there's simply no support for doing non-blocking
issue and hence they get punted to io-wq unconditionally. If the
application doesn't care about issue potentially blocking, this causes
a performance slowdown as thread offload is not nearly as efficient as
inline issue.

Add support for configuring the ring with IORING_SETUP_NO_OFFLOAD, and
add an IORING_ENTER_NO_OFFLOAD flag to io_uring_enter(2). If either one
of these is set, then io_uring will ignore the non-block issue attempt
for any file which we cannot poll for readiness. The simplified io_uring
issue model looks as follows:

1) Non-blocking issue is attempted for IO. If successful, we're done for
   now.

2) Case 1 failed. Now we have two options
   a) We can poll the file. We arm poll, and we're done for now
      until that triggers.
   b) File cannot be polled, we punt to io-wq which then does a
      blocking attempt.

If either of the NO_OFFLOAD flags is set, we should never hit case
2b. Instead, case 1 would issue the IO without the non-blocking flag
being set and perform an inline completion.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/io_uring_types.h |  3 +++
 include/uapi/linux/io_uring.h  |  7 +++++++
 io_uring/io_uring.c            | 26 ++++++++++++++++++++------
 io_uring/io_uring.h            |  2 +-
 io_uring/sqpoll.c              |  3 ++-
 5 files changed, 33 insertions(+), 8 deletions(-)
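As a usage illustration (not part of the patch itself): a minimal liburing sketch that creates a ring with the proposed IORING_SETUP_NO_OFFLOAD flag and submits a read. The flag value matches this series; the #ifndef fallback is only needed with headers that don't carry it, and on kernels without this series ring setup is expected to fail with -EINVAL.

#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef IORING_SETUP_NO_OFFLOAD
#define IORING_SETUP_NO_OFFLOAD	(1U << 14)	/* value from this series */
#endif

int main(void)
{
	struct io_uring_params p;
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	char buf[4096];
	int fd, ret;

	memset(&p, 0, sizeof(p));
	p.flags = IORING_SETUP_NO_OFFLOAD;	/* issue may block at submit time */
	ret = io_uring_queue_init_params(8, &ring, &p);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %s\n", strerror(-ret));
		return 1;	/* e.g. -EINVAL on kernels without this patch */
	}

	fd = open("/etc/hostname", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);

	/* With NO_OFFLOAD, a request that cannot complete non-blocking is
	 * issued inline (blocking) here instead of being punted to io-wq. */
	io_uring_submit(&ring);

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!ret) {
		printf("read %d bytes\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	close(fd);
	io_uring_queue_exit(&ring);
	return 0;
}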