Message ID | 20230420183135.119618-5-axboe@kernel.dk (mailing list archive)
---|---
State | New
Series | Enable NO_OFFLOAD support
On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: > Add an opdef bit for them, and set it for the opcodes where we always > need io-wq punt. With that done, exclude them from the file_can_poll() > check in terms of whether or not we need to punt them if any of the > NO_OFFLOAD flags are set. > > Signed-off-by: Jens Axboe <axboe@kernel.dk> > --- > io_uring/io_uring.c | 2 +- > io_uring/opdef.c | 22 ++++++++++++++++++++-- > io_uring/opdef.h | 2 ++ > 3 files changed, 23 insertions(+), 3 deletions(-) > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > index fee3e461e149..420cfd35ebc6 100644 > --- a/io_uring/io_uring.c > +++ b/io_uring/io_uring.c > @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) > return -EBADF; > > if (issue_flags & IO_URING_F_NO_OFFLOAD && > - (!req->file || !file_can_poll(req->file))) > + (!req->file || !file_can_poll(req->file) || def->always_iowq)) > issue_flags &= ~IO_URING_F_NONBLOCK; I guess the check should be !def->always_iowq? Thanks, Ming
On 4/24/23 1:30?AM, Ming Lei wrote: > On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: >> Add an opdef bit for them, and set it for the opcodes where we always >> need io-wq punt. With that done, exclude them from the file_can_poll() >> check in terms of whether or not we need to punt them if any of the >> NO_OFFLOAD flags are set. >> >> Signed-off-by: Jens Axboe <axboe@kernel.dk> >> --- >> io_uring/io_uring.c | 2 +- >> io_uring/opdef.c | 22 ++++++++++++++++++++-- >> io_uring/opdef.h | 2 ++ >> 3 files changed, 23 insertions(+), 3 deletions(-) >> >> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >> index fee3e461e149..420cfd35ebc6 100644 >> --- a/io_uring/io_uring.c >> +++ b/io_uring/io_uring.c >> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) >> return -EBADF; >> >> if (issue_flags & IO_URING_F_NO_OFFLOAD && >> - (!req->file || !file_can_poll(req->file))) >> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) >> issue_flags &= ~IO_URING_F_NONBLOCK; > > I guess the check should be !def->always_iowq? How so? Nobody that takes pollable files should/is setting ->always_iowq. If we can poll the file, we should not force inline submission. Basically the ones setting ->always_iowq always do -EAGAIN returns if nonblock == true.
On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: > On 4/24/23 1:30?AM, Ming Lei wrote: > > On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: > >> Add an opdef bit for them, and set it for the opcodes where we always > >> need io-wq punt. With that done, exclude them from the file_can_poll() > >> check in terms of whether or not we need to punt them if any of the > >> NO_OFFLOAD flags are set. > >> > >> Signed-off-by: Jens Axboe <axboe@kernel.dk> > >> --- > >> io_uring/io_uring.c | 2 +- > >> io_uring/opdef.c | 22 ++++++++++++++++++++-- > >> io_uring/opdef.h | 2 ++ > >> 3 files changed, 23 insertions(+), 3 deletions(-) > >> > >> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > >> index fee3e461e149..420cfd35ebc6 100644 > >> --- a/io_uring/io_uring.c > >> +++ b/io_uring/io_uring.c > >> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) > >> return -EBADF; > >> > >> if (issue_flags & IO_URING_F_NO_OFFLOAD && > >> - (!req->file || !file_can_poll(req->file))) > >> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) > >> issue_flags &= ~IO_URING_F_NONBLOCK; > > > > I guess the check should be !def->always_iowq? > > How so? Nobody that takes pollable files should/is setting > ->always_iowq. If we can poll the file, we should not force inline > submission. Basically the ones setting ->always_iowq always do -EAGAIN > returns if nonblock == true. I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and these OPs won't return -EAGAIN, then run in the current task context directly. Thanks, Ming
On 4/24/23 6:57?PM, Ming Lei wrote: > On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: >> On 4/24/23 1:30?AM, Ming Lei wrote: >>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: >>>> Add an opdef bit for them, and set it for the opcodes where we always >>>> need io-wq punt. With that done, exclude them from the file_can_poll() >>>> check in terms of whether or not we need to punt them if any of the >>>> NO_OFFLOAD flags are set. >>>> >>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>> --- >>>> io_uring/io_uring.c | 2 +- >>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- >>>> io_uring/opdef.h | 2 ++ >>>> 3 files changed, 23 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >>>> index fee3e461e149..420cfd35ebc6 100644 >>>> --- a/io_uring/io_uring.c >>>> +++ b/io_uring/io_uring.c >>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) >>>> return -EBADF; >>>> >>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && >>>> - (!req->file || !file_can_poll(req->file))) >>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) >>>> issue_flags &= ~IO_URING_F_NONBLOCK; >>> >>> I guess the check should be !def->always_iowq? >> >> How so? Nobody that takes pollable files should/is setting >> ->always_iowq. If we can poll the file, we should not force inline >> submission. Basically the ones setting ->always_iowq always do -EAGAIN >> returns if nonblock == true. > > I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and > these OPs won't return -EAGAIN, then run in the current task context > directly. Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of it :-)
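To spell out the logic being debated, here is the hunk again with annotation added; the code is exactly what the quoted patch does, the comments are not part of it:

    /*
     * From io_issue_sqe() in the posted patch.  The test is def->always_iowq,
     * not its negation: pollable files are served by poll arming rather than
     * io-wq, so they are deliberately left issuing nonblocking.
     */
    if (issue_flags & IO_URING_F_NO_OFFLOAD &&
        (!req->file ||                       /* no file at all */
         !file_can_poll(req->file) ||        /* file cannot be polled */
         def->always_iowq))                  /* op would always punt to io-wq */
            issue_flags &= ~IO_URING_F_NONBLOCK;   /* issue inline, may block */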
On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: > On 4/24/23 6:57?PM, Ming Lei wrote: > > On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: > >> On 4/24/23 1:30?AM, Ming Lei wrote: > >>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: > >>>> Add an opdef bit for them, and set it for the opcodes where we always > >>>> need io-wq punt. With that done, exclude them from the file_can_poll() > >>>> check in terms of whether or not we need to punt them if any of the > >>>> NO_OFFLOAD flags are set. > >>>> > >>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> > >>>> --- > >>>> io_uring/io_uring.c | 2 +- > >>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- > >>>> io_uring/opdef.h | 2 ++ > >>>> 3 files changed, 23 insertions(+), 3 deletions(-) > >>>> > >>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > >>>> index fee3e461e149..420cfd35ebc6 100644 > >>>> --- a/io_uring/io_uring.c > >>>> +++ b/io_uring/io_uring.c > >>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) > >>>> return -EBADF; > >>>> > >>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && > >>>> - (!req->file || !file_can_poll(req->file))) > >>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) > >>>> issue_flags &= ~IO_URING_F_NONBLOCK; > >>> > >>> I guess the check should be !def->always_iowq? > >> > >> How so? Nobody that takes pollable files should/is setting > >> ->always_iowq. If we can poll the file, we should not force inline > >> submission. Basically the ones setting ->always_iowq always do -EAGAIN > >> returns if nonblock == true. > > > > I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and > > these OPs won't return -EAGAIN, then run in the current task context > > directly. > > Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of > it :-) But ->always_iowq isn't actually _always_ since fallocate/fsync/... are not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of ->always_iowq is a bit confusing? Thanks, Ming
On 4/24/23 8:13?PM, Ming Lei wrote: > On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: >> On 4/24/23 6:57?PM, Ming Lei wrote: >>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: >>>> On 4/24/23 1:30?AM, Ming Lei wrote: >>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: >>>>>> Add an opdef bit for them, and set it for the opcodes where we always >>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() >>>>>> check in terms of whether or not we need to punt them if any of the >>>>>> NO_OFFLOAD flags are set. >>>>>> >>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>>>> --- >>>>>> io_uring/io_uring.c | 2 +- >>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- >>>>>> io_uring/opdef.h | 2 ++ >>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >>>>>> index fee3e461e149..420cfd35ebc6 100644 >>>>>> --- a/io_uring/io_uring.c >>>>>> +++ b/io_uring/io_uring.c >>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) >>>>>> return -EBADF; >>>>>> >>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && >>>>>> - (!req->file || !file_can_poll(req->file))) >>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) >>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; >>>>> >>>>> I guess the check should be !def->always_iowq? >>>> >>>> How so? Nobody that takes pollable files should/is setting >>>> ->always_iowq. If we can poll the file, we should not force inline >>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN >>>> returns if nonblock == true. >>> >>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and >>> these OPs won't return -EAGAIN, then run in the current task context >>> directly. >> >> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of >> it :-) > > But ->always_iowq isn't actually _always_ since fallocate/fsync/... are > not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of > ->always_iowq is a bit confusing? Yeah naming isn't that great, I can see how that's bit confusing. I'll be happy to take suggestions on what would make it clearer.
On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: > On 4/24/23 8:13?PM, Ming Lei wrote: > > On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: > >> On 4/24/23 6:57?PM, Ming Lei wrote: > >>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: > >>>> On 4/24/23 1:30?AM, Ming Lei wrote: > >>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: > >>>>>> Add an opdef bit for them, and set it for the opcodes where we always > >>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() > >>>>>> check in terms of whether or not we need to punt them if any of the > >>>>>> NO_OFFLOAD flags are set. > >>>>>> > >>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> > >>>>>> --- > >>>>>> io_uring/io_uring.c | 2 +- > >>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- > >>>>>> io_uring/opdef.h | 2 ++ > >>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) > >>>>>> > >>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > >>>>>> index fee3e461e149..420cfd35ebc6 100644 > >>>>>> --- a/io_uring/io_uring.c > >>>>>> +++ b/io_uring/io_uring.c > >>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) > >>>>>> return -EBADF; > >>>>>> > >>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && > >>>>>> - (!req->file || !file_can_poll(req->file))) > >>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) > >>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; > >>>>> > >>>>> I guess the check should be !def->always_iowq? > >>>> > >>>> How so? Nobody that takes pollable files should/is setting > >>>> ->always_iowq. If we can poll the file, we should not force inline > >>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN > >>>> returns if nonblock == true. > >>> > >>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and > >>> these OPs won't return -EAGAIN, then run in the current task context > >>> directly. > >> > >> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of > >> it :-) > > > > But ->always_iowq isn't actually _always_ since fallocate/fsync/... are > > not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of > > ->always_iowq is a bit confusing? > > Yeah naming isn't that great, I can see how that's bit confusing. I'll > be happy to take suggestions on what would make it clearer. Except for the naming, I am also wondering why these ->always_iowq OPs aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it shouldn't improve performance by doing so because these OPs are supposed to be slow and always slept, not like others(buffered writes, ...), can you provide one hint about not offloading these OPs? Or is it just that NO_OFFLOAD needs to not offload every OPs? Or can we rename IORING_SETUP_NO_OFFLOAD as IORING_SETUP_SUBMIT_MAY_WAIT and still punt ->always_iowq OPs to iowq? Thanks, Ming
On 4/24/23 8:50?PM, Ming Lei wrote: > On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: >> On 4/24/23 8:13?PM, Ming Lei wrote: >>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: >>>> On 4/24/23 6:57?PM, Ming Lei wrote: >>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: >>>>>> On 4/24/23 1:30?AM, Ming Lei wrote: >>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: >>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always >>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() >>>>>>>> check in terms of whether or not we need to punt them if any of the >>>>>>>> NO_OFFLOAD flags are set. >>>>>>>> >>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>>>>>> --- >>>>>>>> io_uring/io_uring.c | 2 +- >>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- >>>>>>>> io_uring/opdef.h | 2 ++ >>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) >>>>>>>> >>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >>>>>>>> index fee3e461e149..420cfd35ebc6 100644 >>>>>>>> --- a/io_uring/io_uring.c >>>>>>>> +++ b/io_uring/io_uring.c >>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) >>>>>>>> return -EBADF; >>>>>>>> >>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && >>>>>>>> - (!req->file || !file_can_poll(req->file))) >>>>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) >>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; >>>>>>> >>>>>>> I guess the check should be !def->always_iowq? >>>>>> >>>>>> How so? Nobody that takes pollable files should/is setting >>>>>> ->always_iowq. If we can poll the file, we should not force inline >>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN >>>>>> returns if nonblock == true. >>>>> >>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and >>>>> these OPs won't return -EAGAIN, then run in the current task context >>>>> directly. >>>> >>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of >>>> it :-) >>> >>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are >>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of >>> ->always_iowq is a bit confusing? >> >> Yeah naming isn't that great, I can see how that's bit confusing. I'll >> be happy to take suggestions on what would make it clearer. > > Except for the naming, I am also wondering why these ->always_iowq OPs > aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it > shouldn't improve performance by doing so because these OPs are supposed > to be slow and always slept, not like others(buffered writes, ...), > can you provide one hint about not offloading these OPs? Or is it just that > NO_OFFLOAD needs to not offload every OPs? The whole point of NO_OFFLOAD is that items that would normally be passed to io-wq are just run inline. This provides a way to reap the benefits of batched submissions and syscall reductions. Some opcodes will just never be async, and io-wq offloads are not very fast. Some of them can eventually be migrated to async support, if the underlying mechanics support it. You'll note that none of the ->always_iowq opcodes are pollable. If NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK cleared, as you'd just get -EAGAIN and then need to call them again with NONBLOCK cleared from the same context. For naming, maybe ->always_iowq is better as ->no_nonblock or something like that. 
Or perhaps get rid of the double negation and just call it ->blocking, or maybe ->no_async_support to make it clearer? > Or can we rename IORING_SETUP_NO_OFFLOAD as IORING_SETUP_SUBMIT_MAY_WAIT > and still punt ->always_iowq OPs to iowq? I think NO_OFFLOAD better explains that we'll never offload to io-wq. I would've called it NO_IOWQ, but I don't think that's understandable to users in the same way. The problem is that the user does need some knowledge of how IOs are issued and completed in io_uring to fully grok what it does, which I'll put in the man pages.
On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: > On 4/24/23 8:50?PM, Ming Lei wrote: > > On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: > >> On 4/24/23 8:13?PM, Ming Lei wrote: > >>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: > >>>> On 4/24/23 6:57?PM, Ming Lei wrote: > >>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: > >>>>>> On 4/24/23 1:30?AM, Ming Lei wrote: > >>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: > >>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always > >>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() > >>>>>>>> check in terms of whether or not we need to punt them if any of the > >>>>>>>> NO_OFFLOAD flags are set. > >>>>>>>> > >>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> > >>>>>>>> --- > >>>>>>>> io_uring/io_uring.c | 2 +- > >>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- > >>>>>>>> io_uring/opdef.h | 2 ++ > >>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > >>>>>>>> index fee3e461e149..420cfd35ebc6 100644 > >>>>>>>> --- a/io_uring/io_uring.c > >>>>>>>> +++ b/io_uring/io_uring.c > >>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) > >>>>>>>> return -EBADF; > >>>>>>>> > >>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && > >>>>>>>> - (!req->file || !file_can_poll(req->file))) > >>>>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) > >>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; > >>>>>>> > >>>>>>> I guess the check should be !def->always_iowq? > >>>>>> > >>>>>> How so? Nobody that takes pollable files should/is setting > >>>>>> ->always_iowq. If we can poll the file, we should not force inline > >>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN > >>>>>> returns if nonblock == true. > >>>>> > >>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and > >>>>> these OPs won't return -EAGAIN, then run in the current task context > >>>>> directly. > >>>> > >>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of > >>>> it :-) > >>> > >>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are > >>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of > >>> ->always_iowq is a bit confusing? > >> > >> Yeah naming isn't that great, I can see how that's bit confusing. I'll > >> be happy to take suggestions on what would make it clearer. > > > > Except for the naming, I am also wondering why these ->always_iowq OPs > > aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it > > shouldn't improve performance by doing so because these OPs are supposed > > to be slow and always slept, not like others(buffered writes, ...), > > can you provide one hint about not offloading these OPs? Or is it just that > > NO_OFFLOAD needs to not offload every OPs? > > The whole point of NO_OFFLOAD is that items that would normally be > passed to io-wq are just run inline. This provides a way to reap the > benefits of batched submissions and syscall reductions. Some opcodes > will just never be async, and io-wq offloads are not very fast. Some of Yeah, seems io-wq is much slower than inline issue, maybe it needs to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. > them can eventually be migrated to async support, if the underlying > mechanics support it. 
> > You'll note that none of the ->always_iowq opcodes are pollable. If True, then looks the ->always_iowq flag doesn't make a difference here because your patch clears IO_URING_F_NONBLOCK for !file_can_poll(req->file). Also almost all these ->always_iowq OPs are slow and blocked, if they are issued inline, the submission pipeline will be blocked. > NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK > cleared, as you'd just get -EAGAIN and then need to call them again with > NONBLOCK cleared from the same context. My point is that these OPs are slow and slept, so inline issue won't improve performance actually for them, and punting to io-wq couldn't be worse too. On the other side, inline issue may hurt perf because submission pipeline is blocked when issuing these OPs. Thanks, Ming
On 4/25/23 8:42?AM, Ming Lei wrote: > On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: >> On 4/24/23 8:50?PM, Ming Lei wrote: >>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: >>>> On 4/24/23 8:13?PM, Ming Lei wrote: >>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: >>>>>> On 4/24/23 6:57?PM, Ming Lei wrote: >>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: >>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote: >>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: >>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always >>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() >>>>>>>>>> check in terms of whether or not we need to punt them if any of the >>>>>>>>>> NO_OFFLOAD flags are set. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>>>>>>>> --- >>>>>>>>>> io_uring/io_uring.c | 2 +- >>>>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- >>>>>>>>>> io_uring/opdef.h | 2 ++ >>>>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644 >>>>>>>>>> --- a/io_uring/io_uring.c >>>>>>>>>> +++ b/io_uring/io_uring.c >>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) >>>>>>>>>> return -EBADF; >>>>>>>>>> >>>>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && >>>>>>>>>> - (!req->file || !file_can_poll(req->file))) >>>>>>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) >>>>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; >>>>>>>>> >>>>>>>>> I guess the check should be !def->always_iowq? >>>>>>>> >>>>>>>> How so? Nobody that takes pollable files should/is setting >>>>>>>> ->always_iowq. If we can poll the file, we should not force inline >>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN >>>>>>>> returns if nonblock == true. >>>>>>> >>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and >>>>>>> these OPs won't return -EAGAIN, then run in the current task context >>>>>>> directly. >>>>>> >>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of >>>>>> it :-) >>>>> >>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are >>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of >>>>> ->always_iowq is a bit confusing? >>>> >>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll >>>> be happy to take suggestions on what would make it clearer. >>> >>> Except for the naming, I am also wondering why these ->always_iowq OPs >>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it >>> shouldn't improve performance by doing so because these OPs are supposed >>> to be slow and always slept, not like others(buffered writes, ...), >>> can you provide one hint about not offloading these OPs? Or is it just that >>> NO_OFFLOAD needs to not offload every OPs? >> >> The whole point of NO_OFFLOAD is that items that would normally be >> passed to io-wq are just run inline. This provides a way to reap the >> benefits of batched submissions and syscall reductions. Some opcodes >> will just never be async, and io-wq offloads are not very fast. Some of > > Yeah, seems io-wq is much slower than inline issue, maybe it needs > to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. 
Indeed, depending on what is being linked, you may see io-wq activity which is not ideal. >> them can eventually be migrated to async support, if the underlying >> mechanics support it. >> >> You'll note that none of the ->always_iowq opcodes are pollable. If > > True, then looks the ->always_iowq flag doesn't make a difference here > because your patch clears IO_URING_F_NONBLOCK for !file_can_poll(req->file). Yep, we may be able to just get rid of it. The important bit is really getting rid of the forced setting of REQ_F_FORCE_ASYNC which the previous reverts take care of. So we can probably just drop this one, let me give it a spin. > Also almost all these ->always_iowq OPs are slow and blocked, if they are > issued inline, the submission pipeline will be blocked. That is true, but that's very much the tradeoff you make by using NO_OFFLOAD. I would expect any users of this to have two rings, one for just batched submissions, and one for "normal" usage. Or maybe they only do the batched submissions and one is fine. >> NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK >> cleared, as you'd just get -EAGAIN and then need to call them again with >> NONBLOCK cleared from the same context. > > My point is that these OPs are slow and slept, so inline issue won't > improve performance actually for them, and punting to io-wq couldn't > be worse too. On the other side, inline issue may hurt perf because > submission pipeline is blocked when issuing these OPs. That is definitely not true, it really depends on which ops it is. For a lot of them, they don't generally block, but we have to be prepared for them blocking. This is why they are offloaded. If they don't block and execute fast, then the io-wq offload is easily a 2-3x slowdown, while the batched submission can make it more efficient than simply doing the normal syscalls as you avoid quite a few syscalls. But you should not mix and match issue of these slower ops with "normal" issues, you should separate them.
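A minimal sketch of that two-ring pattern from userspace, assuming the IORING_SETUP_NO_OFFLOAD flag proposed in this series (it only builds against headers from the patched tree; ring sizes are arbitrary placeholders):

    #include <liburing.h>

    static int setup_rings(struct io_uring *batch_ring, struct io_uring *fast_ring)
    {
            struct io_uring_params p = { 0 };
            int ret;

            /* Ring for batched submission of ops that would otherwise hit io-wq;
             * submit() may block while they run inline in this task. */
            p.flags = IORING_SETUP_NO_OFFLOAD;
            ret = io_uring_queue_init_params(64, batch_ring, &p);
            if (ret < 0)
                    return ret;

            /* "Normal" ring for fast, nonblocking/pollable IO. */
            ret = io_uring_queue_init(256, fast_ring, 0);
            if (ret < 0)
                    io_uring_queue_exit(batch_ring);
            return ret;
    }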
On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote: > On 4/25/23 8:42?AM, Ming Lei wrote: > > On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: > >> On 4/24/23 8:50?PM, Ming Lei wrote: > >>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: > >>>> On 4/24/23 8:13?PM, Ming Lei wrote: > >>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: > >>>>>> On 4/24/23 6:57?PM, Ming Lei wrote: > >>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: > >>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote: > >>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: > >>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always > >>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() > >>>>>>>>>> check in terms of whether or not we need to punt them if any of the > >>>>>>>>>> NO_OFFLOAD flags are set. > >>>>>>>>>> > >>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> > >>>>>>>>>> --- > >>>>>>>>>> io_uring/io_uring.c | 2 +- > >>>>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- > >>>>>>>>>> io_uring/opdef.h | 2 ++ > >>>>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) > >>>>>>>>>> > >>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > >>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644 > >>>>>>>>>> --- a/io_uring/io_uring.c > >>>>>>>>>> +++ b/io_uring/io_uring.c > >>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) > >>>>>>>>>> return -EBADF; > >>>>>>>>>> > >>>>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && > >>>>>>>>>> - (!req->file || !file_can_poll(req->file))) > >>>>>>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) > >>>>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; > >>>>>>>>> > >>>>>>>>> I guess the check should be !def->always_iowq? > >>>>>>>> > >>>>>>>> How so? Nobody that takes pollable files should/is setting > >>>>>>>> ->always_iowq. If we can poll the file, we should not force inline > >>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN > >>>>>>>> returns if nonblock == true. > >>>>>>> > >>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and > >>>>>>> these OPs won't return -EAGAIN, then run in the current task context > >>>>>>> directly. > >>>>>> > >>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of > >>>>>> it :-) > >>>>> > >>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are > >>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of > >>>>> ->always_iowq is a bit confusing? > >>>> > >>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll > >>>> be happy to take suggestions on what would make it clearer. > >>> > >>> Except for the naming, I am also wondering why these ->always_iowq OPs > >>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it > >>> shouldn't improve performance by doing so because these OPs are supposed > >>> to be slow and always slept, not like others(buffered writes, ...), > >>> can you provide one hint about not offloading these OPs? Or is it just that > >>> NO_OFFLOAD needs to not offload every OPs? > >> > >> The whole point of NO_OFFLOAD is that items that would normally be > >> passed to io-wq are just run inline. This provides a way to reap the > >> benefits of batched submissions and syscall reductions. Some opcodes > >> will just never be async, and io-wq offloads are not very fast. 
Some of > > > > Yeah, seems io-wq is much slower than inline issue, maybe it needs > > to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. > > Indeed, depending on what is being linked, you may see io-wq activity > which is not ideal. That is why I prefer to fused command for ublk zero copy, because the registering buffer approach suggested by Pavel and Ziyang has to link register buffer OP with the actual IO OP, and it is observed that IOPS drops to 1/2 in 4k random io test with registered buffer approach. > > >> them can eventually be migrated to async support, if the underlying > >> mechanics support it. > >> > >> You'll note that none of the ->always_iowq opcodes are pollable. If > > > > True, then looks the ->always_iowq flag doesn't make a difference here > > because your patch clears IO_URING_F_NONBLOCK for !file_can_poll(req->file). > > Yep, we may be able to just get rid of it. The important bit is really > getting rid of the forced setting of REQ_F_FORCE_ASYNC which the > previous reverts take care of. So we can probably just drop this one, > let me give it a spin. > > > Also almost all these ->always_iowq OPs are slow and blocked, if they are > > issued inline, the submission pipeline will be blocked. > > That is true, but that's very much the tradeoff you make by using > NO_OFFLOAD. I would expect any users of this to have two rings, one for > just batched submissions, and one for "normal" usage. Or maybe they only > do the batched submissions and one is fine. I guess that NO_OFFLOAD probably should be used for most of usecase, cause it does avoid slow io-wq if io-wq perf won't be improved. Also there is other issue for two rings, such as sync/communication between two rings, and single ring should be the easiest way. > > >> NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK > >> cleared, as you'd just get -EAGAIN and then need to call them again with > >> NONBLOCK cleared from the same context. > > > > My point is that these OPs are slow and slept, so inline issue won't > > improve performance actually for them, and punting to io-wq couldn't > > be worse too. On the other side, inline issue may hurt perf because > > submission pipeline is blocked when issuing these OPs. > > That is definitely not true, it really depends on which ops it is. For a > lot of them, they don't generally block, but we have to be prepared for OK, but fsync/fallocate does block. > them blocking. This is why they are offloaded. If they don't block and Got it. > execute fast, then the io-wq offload is easily a 2-3x slowdown, while > the batched submission can make it more efficient than simply doing the > normal syscalls as you avoid quite a few syscalls. > > But you should not mix and match issue of these slower ops with "normal" > issues, you should separate them. OK, I will keep it in mind when writing io_uring applications. thanks, Ming
On 4/25/23 9:07?AM, Ming Lei wrote: > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote: >> On 4/25/23 8:42?AM, Ming Lei wrote: >>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: >>>> On 4/24/23 8:50?PM, Ming Lei wrote: >>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: >>>>>> On 4/24/23 8:13?PM, Ming Lei wrote: >>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: >>>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote: >>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: >>>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote: >>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: >>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always >>>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() >>>>>>>>>>>> check in terms of whether or not we need to punt them if any of the >>>>>>>>>>>> NO_OFFLOAD flags are set. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>>>>>>>>>> --- >>>>>>>>>>>> io_uring/io_uring.c | 2 +- >>>>>>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- >>>>>>>>>>>> io_uring/opdef.h | 2 ++ >>>>>>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644 >>>>>>>>>>>> --- a/io_uring/io_uring.c >>>>>>>>>>>> +++ b/io_uring/io_uring.c >>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) >>>>>>>>>>>> return -EBADF; >>>>>>>>>>>> >>>>>>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && >>>>>>>>>>>> - (!req->file || !file_can_poll(req->file))) >>>>>>>>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) >>>>>>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; >>>>>>>>>>> >>>>>>>>>>> I guess the check should be !def->always_iowq? >>>>>>>>>> >>>>>>>>>> How so? Nobody that takes pollable files should/is setting >>>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline >>>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN >>>>>>>>>> returns if nonblock == true. >>>>>>>>> >>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and >>>>>>>>> these OPs won't return -EAGAIN, then run in the current task context >>>>>>>>> directly. >>>>>>>> >>>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of >>>>>>>> it :-) >>>>>>> >>>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are >>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of >>>>>>> ->always_iowq is a bit confusing? >>>>>> >>>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll >>>>>> be happy to take suggestions on what would make it clearer. >>>>> >>>>> Except for the naming, I am also wondering why these ->always_iowq OPs >>>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it >>>>> shouldn't improve performance by doing so because these OPs are supposed >>>>> to be slow and always slept, not like others(buffered writes, ...), >>>>> can you provide one hint about not offloading these OPs? Or is it just that >>>>> NO_OFFLOAD needs to not offload every OPs? >>>> >>>> The whole point of NO_OFFLOAD is that items that would normally be >>>> passed to io-wq are just run inline. This provides a way to reap the >>>> benefits of batched submissions and syscall reductions. 
Some opcodes >>>> will just never be async, and io-wq offloads are not very fast. Some of >>> >>> Yeah, seems io-wq is much slower than inline issue, maybe it needs >>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. >> >> Indeed, depending on what is being linked, you may see io-wq activity >> which is not ideal. > > That is why I prefer to fused command for ublk zero copy, because the > registering buffer approach suggested by Pavel and Ziyang has to link > register buffer OP with the actual IO OP, and it is observed that > IOPS drops to 1/2 in 4k random io test with registered buffer approach. It'd be worth looking into if we can avoid io-wq for link execution, as that'd be a nice win overall too. IIRC, there's no reason why it can't be done like initial issue rather than just a lazy punt to io-wq. That's not really related to fused command support or otherwise for that, it'd just be a generic improvement. But it may indeed make the linekd approach viable for that too. >>>> them can eventually be migrated to async support, if the underlying >>>> mechanics support it. >>>> >>>> You'll note that none of the ->always_iowq opcodes are pollable. If >>> >>> True, then looks the ->always_iowq flag doesn't make a difference here >>> because your patch clears IO_URING_F_NONBLOCK for !file_can_poll(req->file). Actually not sure that's the case, as we have plenty of ops that are not pollable, yet are perfectly fine for a nonblocking issue. Things like any read/write on a regular file or block device. >> Yep, we may be able to just get rid of it. The important bit is really >> getting rid of the forced setting of REQ_F_FORCE_ASYNC which the >> previous reverts take care of. So we can probably just drop this one, >> let me give it a spin. >> >>> Also almost all these ->always_iowq OPs are slow and blocked, if they are >>> issued inline, the submission pipeline will be blocked. >> >> That is true, but that's very much the tradeoff you make by using >> NO_OFFLOAD. I would expect any users of this to have two rings, one for >> just batched submissions, and one for "normal" usage. Or maybe they only >> do the batched submissions and one is fine. > > I guess that NO_OFFLOAD probably should be used for most of usecase, > cause it does avoid slow io-wq if io-wq perf won't be improved. > > Also there is other issue for two rings, such as sync/communication > between two rings, and single ring should be the easiest way. I think some use cases may indeed just use that and be fine with it, also because it is probably not uncommon to bundle the issues and hence not really mix and match for issue. But this is a vastly different use case than fast IO cases, for storage and networking. Though those will bypass that anyway as they can do nonblocking issue just fine. >>>> NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK >>>> cleared, as you'd just get -EAGAIN and then need to call them again with >>>> NONBLOCK cleared from the same context. >>> >>> My point is that these OPs are slow and slept, so inline issue won't >>> improve performance actually for them, and punting to io-wq couldn't >>> be worse too. On the other side, inline issue may hurt perf because >>> submission pipeline is blocked when issuing these OPs. >> >> That is definitely not true, it really depends on which ops it is. For a >> lot of them, they don't generally block, but we have to be prepared for > > OK, but fsync/fallocate does block. 
They do, but statx, fadvise, madvise, rename, shutdown, etc (basically all the rest of them) do not for a lot of cases.
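For illustration, this is the kind of batch those opcodes make attractive: several normally-quick requests queued and submitted with a single syscall (a liburing sketch; fd, path and mapping are placeholders, and NULL-SQE/error handling is omitted):

    #define _GNU_SOURCE
    #include <liburing.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    static int batch_maintenance_ops(struct io_uring *ring, int fd, const char *path,
                                     void *map, size_t maplen, struct statx *stx)
    {
            struct io_uring_sqe *sqe;

            sqe = io_uring_get_sqe(ring);
            io_uring_prep_statx(sqe, AT_FDCWD, path, 0, STATX_BASIC_STATS, stx);

            sqe = io_uring_get_sqe(ring);
            io_uring_prep_fadvise(sqe, fd, 0, 0, POSIX_FADV_DONTNEED);

            sqe = io_uring_get_sqe(ring);
            io_uring_prep_madvise(sqe, map, maplen, MADV_DONTNEED);

            /* one io_uring_enter() for all three requests */
            return io_uring_submit(ring);
    }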
On 4/25/23 15:42, Ming Lei wrote: > On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: >> On 4/24/23 8:50?PM, Ming Lei wrote: >>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: >>>> On 4/24/23 8:13?PM, Ming Lei wrote: >>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: >>>>>> On 4/24/23 6:57?PM, Ming Lei wrote: >>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: >>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote: >>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: >>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always >>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() >>>>>>>>>> check in terms of whether or not we need to punt them if any of the >>>>>>>>>> NO_OFFLOAD flags are set. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>>>>>>>> --- >>>>>>>>>> io_uring/io_uring.c | 2 +- >>>>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- >>>>>>>>>> io_uring/opdef.h | 2 ++ >>>>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644 >>>>>>>>>> --- a/io_uring/io_uring.c >>>>>>>>>> +++ b/io_uring/io_uring.c >>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) >>>>>>>>>> return -EBADF; >>>>>>>>>> >>>>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && >>>>>>>>>> - (!req->file || !file_can_poll(req->file))) >>>>>>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) >>>>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; >>>>>>>>> >>>>>>>>> I guess the check should be !def->always_iowq? >>>>>>>> >>>>>>>> How so? Nobody that takes pollable files should/is setting >>>>>>>> ->always_iowq. If we can poll the file, we should not force inline >>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN >>>>>>>> returns if nonblock == true. >>>>>>> >>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and >>>>>>> these OPs won't return -EAGAIN, then run in the current task context >>>>>>> directly. >>>>>> >>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of >>>>>> it :-) >>>>> >>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are >>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of >>>>> ->always_iowq is a bit confusing? >>>> >>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll >>>> be happy to take suggestions on what would make it clearer. >>> >>> Except for the naming, I am also wondering why these ->always_iowq OPs >>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it >>> shouldn't improve performance by doing so because these OPs are supposed >>> to be slow and always slept, not like others(buffered writes, ...), >>> can you provide one hint about not offloading these OPs? Or is it just that >>> NO_OFFLOAD needs to not offload every OPs? >> >> The whole point of NO_OFFLOAD is that items that would normally be >> passed to io-wq are just run inline. This provides a way to reap the >> benefits of batched submissions and syscall reductions. Some opcodes >> will just never be async, and io-wq offloads are not very fast. Some of > > Yeah, seems io-wq is much slower than inline issue, maybe it needs > to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. 
There were attempts like this one from Hao (CC'ed) https://lore.kernel.org/io-uring/20220627133541.15223-5-hao.xu@linux.dev/t/ Not sure why it got stalled, but maybe Hao would be willing to pick it up again.
On 4/25/23 16:25, Jens Axboe wrote: > On 4/25/23 9:07?AM, Ming Lei wrote: >> On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote: >>> On 4/25/23 8:42?AM, Ming Lei wrote: >>>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: >>>>> On 4/24/23 8:50?PM, Ming Lei wrote: >>>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: >>>>>>> On 4/24/23 8:13?PM, Ming Lei wrote: >>>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: >>>>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote: >>>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: >>>>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote: >>>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: >>>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always >>>>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() >>>>>>>>>>>>> check in terms of whether or not we need to punt them if any of the >>>>>>>>>>>>> NO_OFFLOAD flags are set. >>>>>>>>>>>>> >>>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>>>>>>>>>>> --- >>>>>>>>>>>>> io_uring/io_uring.c | 2 +- >>>>>>>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- >>>>>>>>>>>>> io_uring/opdef.h | 2 ++ >>>>>>>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >>>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644 >>>>>>>>>>>>> --- a/io_uring/io_uring.c >>>>>>>>>>>>> +++ b/io_uring/io_uring.c >>>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) >>>>>>>>>>>>> return -EBADF; >>>>>>>>>>>>> >>>>>>>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && >>>>>>>>>>>>> - (!req->file || !file_can_poll(req->file))) >>>>>>>>>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) >>>>>>>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; >>>>>>>>>>>> >>>>>>>>>>>> I guess the check should be !def->always_iowq? >>>>>>>>>>> >>>>>>>>>>> How so? Nobody that takes pollable files should/is setting >>>>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline >>>>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN >>>>>>>>>>> returns if nonblock == true. >>>>>>>>>> >>>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and >>>>>>>>>> these OPs won't return -EAGAIN, then run in the current task context >>>>>>>>>> directly. >>>>>>>>> >>>>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of >>>>>>>>> it :-) >>>>>>>> >>>>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are >>>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of >>>>>>>> ->always_iowq is a bit confusing? >>>>>>> >>>>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll >>>>>>> be happy to take suggestions on what would make it clearer. >>>>>> >>>>>> Except for the naming, I am also wondering why these ->always_iowq OPs >>>>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it >>>>>> shouldn't improve performance by doing so because these OPs are supposed >>>>>> to be slow and always slept, not like others(buffered writes, ...), >>>>>> can you provide one hint about not offloading these OPs? Or is it just that >>>>>> NO_OFFLOAD needs to not offload every OPs? >>>>> >>>>> The whole point of NO_OFFLOAD is that items that would normally be >>>>> passed to io-wq are just run inline. 
This provides a way to reap the >>>>> benefits of batched submissions and syscall reductions. Some opcodes >>>>> will just never be async, and io-wq offloads are not very fast. Some of >>>> >>>> Yeah, seems io-wq is much slower than inline issue, maybe it needs >>>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. >>> >>> Indeed, depending on what is being linked, you may see io-wq activity >>> which is not ideal. >> >> That is why I prefer to fused command for ublk zero copy, because the >> registering buffer approach suggested by Pavel and Ziyang has to link >> register buffer OP with the actual IO OP, and it is observed that >> IOPS drops to 1/2 in 4k random io test with registered buffer approach. > > It'd be worth looking into if we can avoid io-wq for link execution, as > that'd be a nice win overall too. IIRC, there's no reason why it can't > be done like initial issue rather than just a lazy punt to io-wq. I might've missed a part of the discussion, but links are _usually_ executed by task_work, e.g. io_submit_flush_completions() -> io_queue_next() -> io_req_task_queue() There is one optimisation where if we're already in io-wq, it'll try to serve the next linked request by the same io-wq worker with no overhead on requeueing, but otherwise it'll only get there if the request can't be executed nowait / async as per usual rules. The tw execution part might be further optimised, it can be executed almost in place instead of queueing a tw. It saved quite a lot of CPU when I tried it with BPF requests.
On 4/25/23 16:07, Ming Lei wrote: > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote: >> On 4/25/23 8:42?AM, Ming Lei wrote: >>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: >>>> On 4/24/23 8:50?PM, Ming Lei wrote: >>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: >>>>>> On 4/24/23 8:13?PM, Ming Lei wrote: >>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: >>>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote: >>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: >>>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote: >>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: >>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always >>>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() >>>>>>>>>>>> check in terms of whether or not we need to punt them if any of the >>>>>>>>>>>> NO_OFFLOAD flags are set. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>>>>>>>>>> --- >>>>>>>>>>>> io_uring/io_uring.c | 2 +- >>>>>>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- >>>>>>>>>>>> io_uring/opdef.h | 2 ++ >>>>>>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644 >>>>>>>>>>>> --- a/io_uring/io_uring.c >>>>>>>>>>>> +++ b/io_uring/io_uring.c >>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) >>>>>>>>>>>> return -EBADF; >>>>>>>>>>>> >>>>>>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && >>>>>>>>>>>> - (!req->file || !file_can_poll(req->file))) >>>>>>>>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) >>>>>>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; >>>>>>>>>>> >>>>>>>>>>> I guess the check should be !def->always_iowq? >>>>>>>>>> >>>>>>>>>> How so? Nobody that takes pollable files should/is setting >>>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline >>>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN >>>>>>>>>> returns if nonblock == true. >>>>>>>>> >>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and >>>>>>>>> these OPs won't return -EAGAIN, then run in the current task context >>>>>>>>> directly. >>>>>>>> >>>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of >>>>>>>> it :-) >>>>>>> >>>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are >>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of >>>>>>> ->always_iowq is a bit confusing? >>>>>> >>>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll >>>>>> be happy to take suggestions on what would make it clearer. >>>>> >>>>> Except for the naming, I am also wondering why these ->always_iowq OPs >>>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it >>>>> shouldn't improve performance by doing so because these OPs are supposed >>>>> to be slow and always slept, not like others(buffered writes, ...), >>>>> can you provide one hint about not offloading these OPs? Or is it just that >>>>> NO_OFFLOAD needs to not offload every OPs? >>>> >>>> The whole point of NO_OFFLOAD is that items that would normally be >>>> passed to io-wq are just run inline. This provides a way to reap the >>>> benefits of batched submissions and syscall reductions. 
Some opcodes >>>> will just never be async, and io-wq offloads are not very fast. Some of >>> >>> Yeah, seems io-wq is much slower than inline issue, maybe it needs >>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. >> >> Indeed, depending on what is being linked, you may see io-wq activity >> which is not ideal. > > That is why I prefer to fused command for ublk zero copy, because the > registering buffer approach suggested by Pavel and Ziyang has to link > register buffer OP with the actual IO OP, and it is observed that > IOPS drops to 1/2 in 4k random io test with registered buffer approach. What's good about it is that you can use linked requests with it, but you _don't have to_. Curiously, I just recently compared submitting 8 two-request links (16 reqs in total) vs submit(8)+submit(8), all that in a loop. The latter was faster. It wasn't a clean experiment, but it shows that links are not super fast and it would be nice to make them better. As for the register buf approach, I tried it out and it looked good to me. It outperforms splice requests (with a hack that removes forced io-wq execution) by 5-10% in a synthetic benchmark, and works better than splice(2) for QD>=2. Let me send it out, perhaps today, so we can figure out how it compares against ublk/fused and see what the margin is.
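For reference, the two submission patterns being compared look roughly like this; NOPs stand in for the real requests since the experiment's opcodes aren't given here:

    #include <liburing.h>

    /* Pattern A: 8 two-request links, 16 requests submitted together. */
    static void submit_linked_pairs(struct io_uring *ring)
    {
            for (int i = 0; i < 8; i++) {
                    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

                    io_uring_prep_nop(sqe);
                    sqe->flags |= IOSQE_IO_LINK;    /* next SQE waits for this one */
                    sqe = io_uring_get_sqe(ring);
                    io_uring_prep_nop(sqe);         /* end of the two-request chain */
            }
            io_uring_submit(ring);
    }

    /* Pattern B: the same 16 requests as two plain batches of 8. */
    static void submit_two_batches(struct io_uring *ring)
    {
            for (int batch = 0; batch < 2; batch++) {
                    for (int i = 0; i < 8; i++)
                            io_uring_prep_nop(io_uring_get_sqe(ring));
                    io_uring_submit(ring);
            }
    }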
On Tue, Apr 25, 2023 at 09:25:35AM -0600, Jens Axboe wrote: > On 4/25/23 9:07?AM, Ming Lei wrote: > > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote: > >> On 4/25/23 8:42?AM, Ming Lei wrote: > >>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: > >>>> On 4/24/23 8:50?PM, Ming Lei wrote: > >>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: > >>>>>> On 4/24/23 8:13?PM, Ming Lei wrote: > >>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: > >>>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote: > >>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: > >>>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote: > >>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: > >>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always > >>>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() > >>>>>>>>>>>> check in terms of whether or not we need to punt them if any of the > >>>>>>>>>>>> NO_OFFLOAD flags are set. > >>>>>>>>>>>> > >>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> > >>>>>>>>>>>> --- > >>>>>>>>>>>> io_uring/io_uring.c | 2 +- > >>>>>>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- > >>>>>>>>>>>> io_uring/opdef.h | 2 ++ > >>>>>>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > >>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644 > >>>>>>>>>>>> --- a/io_uring/io_uring.c > >>>>>>>>>>>> +++ b/io_uring/io_uring.c > >>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) > >>>>>>>>>>>> return -EBADF; > >>>>>>>>>>>> > >>>>>>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && > >>>>>>>>>>>> - (!req->file || !file_can_poll(req->file))) > >>>>>>>>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) > >>>>>>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; > >>>>>>>>>>> > >>>>>>>>>>> I guess the check should be !def->always_iowq? > >>>>>>>>>> > >>>>>>>>>> How so? Nobody that takes pollable files should/is setting > >>>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline > >>>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN > >>>>>>>>>> returns if nonblock == true. > >>>>>>>>> > >>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and > >>>>>>>>> these OPs won't return -EAGAIN, then run in the current task context > >>>>>>>>> directly. > >>>>>>>> > >>>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of > >>>>>>>> it :-) > >>>>>>> > >>>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are > >>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of > >>>>>>> ->always_iowq is a bit confusing? > >>>>>> > >>>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll > >>>>>> be happy to take suggestions on what would make it clearer. > >>>>> > >>>>> Except for the naming, I am also wondering why these ->always_iowq OPs > >>>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it > >>>>> shouldn't improve performance by doing so because these OPs are supposed > >>>>> to be slow and always slept, not like others(buffered writes, ...), > >>>>> can you provide one hint about not offloading these OPs? Or is it just that > >>>>> NO_OFFLOAD needs to not offload every OPs? 
> >>>> > >>>> The whole point of NO_OFFLOAD is that items that would normally be > >>>> passed to io-wq are just run inline. This provides a way to reap the > >>>> benefits of batched submissions and syscall reductions. Some opcodes > >>>> will just never be async, and io-wq offloads are not very fast. Some of > >>> > >>> Yeah, seems io-wq is much slower than inline issue, maybe it needs > >>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. > >> > >> Indeed, depending on what is being linked, you may see io-wq activity > >> which is not ideal. > > > > That is why I prefer to fused command for ublk zero copy, because the > > registering buffer approach suggested by Pavel and Ziyang has to link > > register buffer OP with the actual IO OP, and it is observed that > > IOPS drops to 1/2 in 4k random io test with registered buffer approach. > > It'd be worth looking into if we can avoid io-wq for link execution, as > that'd be a nice win overall too. IIRC, there's no reason why it can't > be done like initial issue rather than just a lazy punt to io-wq. > > That's not really related to fused command support or otherwise for > that, it'd just be a generic improvement. But it may indeed make the > linekd approach viable for that too. Performance degradation with io-wq is just one reason; another is that the current link model doesn't support 1:N dependencies. For example, if one buffer is registered and the following N OPs all depend on that registered buffer, the N OPs (requests) still have to run one by one, and that is one limit of the current io_uring link model. Fused command actually performs pretty well because it: 1) involves no io-wq, 2) allows the following N OPs to be issued concurrently, and 3) avoids registering the buffer in per-context data (we can ignore this part for now, since the uring_lock should help us avoid the contention). > > >>>> them can eventually be migrated to async support, if the underlying > >>>> mechanics support it. > >>>> > >>>> You'll note that none of the ->always_iowq opcodes are pollable. If > >>> > >>> True, then looks the ->always_iowq flag doesn't make a difference here > >>> because your patch clears IO_URING_F_NONBLOCK for !file_can_poll(req->file). > > Actually not sure that's the case, as we have plenty of ops that are not > pollable, yet are perfectly fine for a nonblocking issue. Things like > any read/write on a regular file or block device. But you mentioned that "none of the ->always_iowq opcodes are pollable"; if that isn't the case, it is fine to add ->always_iowq. > > >> Yep, we may be able to just get rid of it. The important bit is really > >> getting rid of the forced setting of REQ_F_FORCE_ASYNC which the > >> previous reverts take care of. So we can probably just drop this one, > >> let me give it a spin. > >> > >>> Also almost all these ->always_iowq OPs are slow and blocked, if they are > >>> issued inline, the submission pipeline will be blocked. > >> > >> That is true, but that's very much the tradeoff you make by using > >> NO_OFFLOAD. I would expect any users of this to have two rings, one for > >> just batched submissions, and one for "normal" usage. Or maybe they only > >> do the batched submissions and one is fine. > > > > I guess that NO_OFFLOAD probably should be used for most of usecase, > > cause it does avoid slow io-wq if io-wq perf won't be improved. > > > > Also there is other issue for two rings, such as sync/communication > > between two rings, and single ring should be the easiest way.
> > I think some use cases may indeed just use that and be fine with it, > also because it is probably not uncommon to bundle the issues and hence > not really mix and match for issue. But this is a vastly different use > case than fast IO cases, for storage and networking. Though those will > bypass that anyway as they can do nonblocking issue just fine. > > >>>> NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK > >>>> cleared, as you'd just get -EAGAIN and then need to call them again with > >>>> NONBLOCK cleared from the same context. > >>> > >>> My point is that these OPs are slow and slept, so inline issue won't > >>> improve performance actually for them, and punting to io-wq couldn't > >>> be worse too. On the other side, inline issue may hurt perf because > >>> submission pipeline is blocked when issuing these OPs. > >> > >> That is definitely not true, it really depends on which ops it is. For a > >> lot of them, they don't generally block, but we have to be prepared for > > > > OK, but fsync/fallocate does block. > > They do, but statx, fadvise, madvise, rename, shutdown, etc (basically > all the rest of them) do not for a lot of cases. OK, but fsync/fallocate is often mixed with normal IOs. Thanks, Ming
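To make the 1:N point concrete, here is a rough liburing sketch (purely illustrative; the NOP is a stand-in for the hypothetical buffer-register command and is not the actual ublk interface). With IOSQE_IO_LINK, dependencies can only form a linear chain, so the N reads execute strictly one after another even though each of them only depends on the first request.

#include <liburing.h>

static void issue_chain(struct io_uring *ring, int fd, void *buf,
			unsigned len, int n)
{
	struct io_uring_sqe *sqe;
	int i;

	/* Stand-in for the "register buffer" step (hypothetical). */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_nop(sqe);
	sqe->flags |= IOSQE_IO_LINK;

	for (i = 0; i < n; i++) {
		sqe = io_uring_get_sqe(ring);
		io_uring_prep_read(sqe, fd, buf, len, (__u64)i * len);
		/*
		 * Keeping the chain alive means read i+1 waits for read i,
		 * even though both only depend on the first SQE.
		 */
		if (i != n - 1)
			sqe->flags |= IOSQE_IO_LINK;
	}

	io_uring_submit(ring);
}

A fused command sidesteps this because the N dependent operations are not expressed as a chain at all.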
On Tue, Apr 25, 2023 at 04:46:03PM +0100, Pavel Begunkov wrote: > On 4/25/23 16:25, Jens Axboe wrote: > > On 4/25/23 9:07?AM, Ming Lei wrote: > > > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote: > > > > On 4/25/23 8:42?AM, Ming Lei wrote: > > > > > On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: > > > > > > On 4/24/23 8:50?PM, Ming Lei wrote: > > > > > > > On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: > > > > > > > > On 4/24/23 8:13?PM, Ming Lei wrote: > > > > > > > > > On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: > > > > > > > > > > On 4/24/23 6:57?PM, Ming Lei wrote: > > > > > > > > > > > On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: > > > > > > > > > > > > On 4/24/23 1:30?AM, Ming Lei wrote: > > > > > > > > > > > > > On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: > > > > > > > > > > > > > > Add an opdef bit for them, and set it for the opcodes where we always > > > > > > > > > > > > > > need io-wq punt. With that done, exclude them from the file_can_poll() > > > > > > > > > > > > > > check in terms of whether or not we need to punt them if any of the > > > > > > > > > > > > > > NO_OFFLOAD flags are set. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Jens Axboe <axboe@kernel.dk> > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > io_uring/io_uring.c | 2 +- > > > > > > > > > > > > > > io_uring/opdef.c | 22 ++++++++++++++++++++-- > > > > > > > > > > > > > > io_uring/opdef.h | 2 ++ > > > > > > > > > > > > > > 3 files changed, 23 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > > > > > > > > > > > > > > index fee3e461e149..420cfd35ebc6 100644 > > > > > > > > > > > > > > --- a/io_uring/io_uring.c > > > > > > > > > > > > > > +++ b/io_uring/io_uring.c > > > > > > > > > > > > > > @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) > > > > > > > > > > > > > > return -EBADF; > > > > > > > > > > > > > > if (issue_flags & IO_URING_F_NO_OFFLOAD && > > > > > > > > > > > > > > - (!req->file || !file_can_poll(req->file))) > > > > > > > > > > > > > > + (!req->file || !file_can_poll(req->file) || def->always_iowq)) > > > > > > > > > > > > > > issue_flags &= ~IO_URING_F_NONBLOCK; > > > > > > > > > > > > > > > > > > > > > > > > > > I guess the check should be !def->always_iowq? > > > > > > > > > > > > > > > > > > > > > > > > How so? Nobody that takes pollable files should/is setting > > > > > > > > > > > > ->always_iowq. If we can poll the file, we should not force inline > > > > > > > > > > > > submission. Basically the ones setting ->always_iowq always do -EAGAIN > > > > > > > > > > > > returns if nonblock == true. > > > > > > > > > > > > > > > > > > > > > > I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and > > > > > > > > > > > these OPs won't return -EAGAIN, then run in the current task context > > > > > > > > > > > directly. > > > > > > > > > > > > > > > > > > > > Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of > > > > > > > > > > it :-) > > > > > > > > > > > > > > > > > > But ->always_iowq isn't actually _always_ since fallocate/fsync/... are > > > > > > > > > not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of > > > > > > > > > ->always_iowq is a bit confusing? > > > > > > > > > > > > > > > > Yeah naming isn't that great, I can see how that's bit confusing. 
I'll > > > > > > > > be happy to take suggestions on what would make it clearer. > > > > > > > > > > > > > > Except for the naming, I am also wondering why these ->always_iowq OPs > > > > > > > aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it > > > > > > > shouldn't improve performance by doing so because these OPs are supposed > > > > > > > to be slow and always slept, not like others(buffered writes, ...), > > > > > > > can you provide one hint about not offloading these OPs? Or is it just that > > > > > > > NO_OFFLOAD needs to not offload every OPs? > > > > > > > > > > > > The whole point of NO_OFFLOAD is that items that would normally be > > > > > > passed to io-wq are just run inline. This provides a way to reap the > > > > > > benefits of batched submissions and syscall reductions. Some opcodes > > > > > > will just never be async, and io-wq offloads are not very fast. Some of > > > > > > > > > Yeah, seems io-wq is much slower than inline issue, maybe it needs > > > > > to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. > > > > > > > > Indeed, depending on what is being linked, you may see io-wq activity > > > > which is not ideal. > > > > > > That is why I prefer to fused command for ublk zero copy, because the > > > registering buffer approach suggested by Pavel and Ziyang has to link > > > register buffer OP with the actual IO OP, and it is observed that > > > IOPS drops to 1/2 in 4k random io test with registered buffer approach. > > > > It'd be worth looking into if we can avoid io-wq for link execution, as > > that'd be a nice win overall too. IIRC, there's no reason why it can't > > be done like initial issue rather than just a lazy punt to io-wq. > > I might've missed a part of the discussion, but links are _usually_ > executed by task_work, e.g. > > io_submit_flush_completions() -> io_queue_next() -> io_req_task_queue() Good catch, I just figured out that /dev/ublkcN & the backing file aren't opened with O_NONBLOCK. But -EAGAIN is still returned from io_write() even though the regular file is opened with O_DIRECT, at least on btrfs & xfs, so io-wq is still scheduled. I haven't looked into the exact reason yet, and I don't see such an issue for block devices. Anyway, it isn't related to io-wq. However, in case multiple OPs depend on this registered buffer, the other OPs (from the 2nd to the last one) are still run one by one and can't be submitted concurrently, while fused command does not have such a limit. > > There is one optimisation where if we're already in io-wq, it'll try > to serve the next linked request by the same io-wq worker with no > overhead on requeueing, but otherwise it'll only get there if > the request can't be executed nowait / async as per usual rules. > > The tw execution part might be further optimised, it can be executed > almost in place instead of queueing a tw. It saved quite a lot of CPU > when I tried it with BPF requests. OK, once it is done, I am happy to test it. Thanks, Ming
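For reference, the task_work path Pavel mentions looks roughly like this (paraphrased from io_uring/io_uring.c of that era; names and details may differ): when a request completes, the next request in its link is handed to task_work on the submitting task rather than to an io-wq worker.

static inline void io_queue_next(struct io_kiocb *req)
{
	struct io_kiocb *nxt = io_req_find_next(req);

	/* queue the next linked request via task_work, not io-wq */
	if (nxt)
		io_req_task_queue(nxt);
}

void io_req_task_queue(struct io_kiocb *req)
{
	/* io_req_task_submit re-issues the request from the original task */
	req->io_task_work.func = io_req_task_submit;
	io_req_task_work_add(req);
}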
On Tue, Apr 25, 2023 at 05:10:41PM +0100, Pavel Begunkov wrote: > On 4/25/23 16:07, Ming Lei wrote: > > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote: > > > On 4/25/23 8:42?AM, Ming Lei wrote: > > > > On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: > > > > > On 4/24/23 8:50?PM, Ming Lei wrote: > > > > > > On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: > > > > > > > On 4/24/23 8:13?PM, Ming Lei wrote: > > > > > > > > On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: > > > > > > > > > On 4/24/23 6:57?PM, Ming Lei wrote: > > > > > > > > > > On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: > > > > > > > > > > > On 4/24/23 1:30?AM, Ming Lei wrote: > > > > > > > > > > > > On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: > > > > > > > > > > > > > Add an opdef bit for them, and set it for the opcodes where we always > > > > > > > > > > > > > need io-wq punt. With that done, exclude them from the file_can_poll() > > > > > > > > > > > > > check in terms of whether or not we need to punt them if any of the > > > > > > > > > > > > > NO_OFFLOAD flags are set. > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Jens Axboe <axboe@kernel.dk> > > > > > > > > > > > > > --- > > > > > > > > > > > > > io_uring/io_uring.c | 2 +- > > > > > > > > > > > > > io_uring/opdef.c | 22 ++++++++++++++++++++-- > > > > > > > > > > > > > io_uring/opdef.h | 2 ++ > > > > > > > > > > > > > 3 files changed, 23 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > > > > > > > > > > > > > index fee3e461e149..420cfd35ebc6 100644 > > > > > > > > > > > > > --- a/io_uring/io_uring.c > > > > > > > > > > > > > +++ b/io_uring/io_uring.c > > > > > > > > > > > > > @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) > > > > > > > > > > > > > return -EBADF; > > > > > > > > > > > > > if (issue_flags & IO_URING_F_NO_OFFLOAD && > > > > > > > > > > > > > - (!req->file || !file_can_poll(req->file))) > > > > > > > > > > > > > + (!req->file || !file_can_poll(req->file) || def->always_iowq)) > > > > > > > > > > > > > issue_flags &= ~IO_URING_F_NONBLOCK; > > > > > > > > > > > > > > > > > > > > > > > > I guess the check should be !def->always_iowq? > > > > > > > > > > > > > > > > > > > > > > How so? Nobody that takes pollable files should/is setting > > > > > > > > > > > ->always_iowq. If we can poll the file, we should not force inline > > > > > > > > > > > submission. Basically the ones setting ->always_iowq always do -EAGAIN > > > > > > > > > > > returns if nonblock == true. > > > > > > > > > > > > > > > > > > > > I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and > > > > > > > > > > these OPs won't return -EAGAIN, then run in the current task context > > > > > > > > > > directly. > > > > > > > > > > > > > > > > > > Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of > > > > > > > > > it :-) > > > > > > > > > > > > > > > > But ->always_iowq isn't actually _always_ since fallocate/fsync/... are > > > > > > > > not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of > > > > > > > > ->always_iowq is a bit confusing? > > > > > > > > > > > > > > Yeah naming isn't that great, I can see how that's bit confusing. I'll > > > > > > > be happy to take suggestions on what would make it clearer. 
> > > > > > > > > > > > Except for the naming, I am also wondering why these ->always_iowq OPs > > > > > > aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it > > > > > > shouldn't improve performance by doing so because these OPs are supposed > > > > > > to be slow and always slept, not like others(buffered writes, ...), > > > > > > can you provide one hint about not offloading these OPs? Or is it just that > > > > > > NO_OFFLOAD needs to not offload every OPs? > > > > > > > > > > The whole point of NO_OFFLOAD is that items that would normally be > > > > > passed to io-wq are just run inline. This provides a way to reap the > > > > > benefits of batched submissions and syscall reductions. Some opcodes > > > > > will just never be async, and io-wq offloads are not very fast. Some of > > > > > > > > Yeah, seems io-wq is much slower than inline issue, maybe it needs > > > > to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. > > > > > > Indeed, depending on what is being linked, you may see io-wq activity > > > which is not ideal. > > > > That is why I prefer to fused command for ublk zero copy, because the > > registering buffer approach suggested by Pavel and Ziyang has to link > > register buffer OP with the actual IO OP, and it is observed that > > IOPS drops to 1/2 in 4k random io test with registered buffer approach. > > What's good about it is that you can use linked requests with it > but you _don't have to_. Can you explain it a bit? The register buffer OP has to be done before using it since the buffer register is done by calling ->uring_cmd(). > > Curiously, I just recently compared submitting 8 two-request links > (16 reqs in total) vs submit(8)+submit(8), all that in a loop. > The latter was faster. It wasn't a clean experiment, but shows > that links are not super fast and would be nice to get them better. > > For the register buf approach, I tried it out, looked good to me. > It outperforms splice requests (with a hack that removes force > iowq execution) by 5-10% with synthetic benchmark. Works better than > splice(2) for QD>=2. Let me send it out, perhaps today, so we can > figure out how it compares against ublk/fused and see the margin is. Cool! I will take a look and see if it can be used for ublk zero copy. Thanks, Ming
On Wed, Apr 26, 2023 at 11:25:15AM +0800, Ming Lei wrote: > On Tue, Apr 25, 2023 at 04:46:03PM +0100, Pavel Begunkov wrote: > > On 4/25/23 16:25, Jens Axboe wrote: > > > On 4/25/23 9:07?AM, Ming Lei wrote: > > > > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote: > > > > > On 4/25/23 8:42?AM, Ming Lei wrote: > > > > > > On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: > > > > > > > On 4/24/23 8:50?PM, Ming Lei wrote: > > > > > > > > On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: > > > > > > > > > On 4/24/23 8:13?PM, Ming Lei wrote: > > > > > > > > > > On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: > > > > > > > > > > > On 4/24/23 6:57?PM, Ming Lei wrote: > > > > > > > > > > > > On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: > > > > > > > > > > > > > On 4/24/23 1:30?AM, Ming Lei wrote: > > > > > > > > > > > > > > On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: > > > > > > > > > > > > > > > Add an opdef bit for them, and set it for the opcodes where we always > > > > > > > > > > > > > > > need io-wq punt. With that done, exclude them from the file_can_poll() > > > > > > > > > > > > > > > check in terms of whether or not we need to punt them if any of the > > > > > > > > > > > > > > > NO_OFFLOAD flags are set. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Jens Axboe <axboe@kernel.dk> > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > io_uring/io_uring.c | 2 +- > > > > > > > > > > > > > > > io_uring/opdef.c | 22 ++++++++++++++++++++-- > > > > > > > > > > > > > > > io_uring/opdef.h | 2 ++ > > > > > > > > > > > > > > > 3 files changed, 23 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > > > > > > > > > > > > > > > index fee3e461e149..420cfd35ebc6 100644 > > > > > > > > > > > > > > > --- a/io_uring/io_uring.c > > > > > > > > > > > > > > > +++ b/io_uring/io_uring.c > > > > > > > > > > > > > > > @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) > > > > > > > > > > > > > > > return -EBADF; > > > > > > > > > > > > > > > if (issue_flags & IO_URING_F_NO_OFFLOAD && > > > > > > > > > > > > > > > - (!req->file || !file_can_poll(req->file))) > > > > > > > > > > > > > > > + (!req->file || !file_can_poll(req->file) || def->always_iowq)) > > > > > > > > > > > > > > > issue_flags &= ~IO_URING_F_NONBLOCK; > > > > > > > > > > > > > > > > > > > > > > > > > > > > I guess the check should be !def->always_iowq? > > > > > > > > > > > > > > > > > > > > > > > > > > How so? Nobody that takes pollable files should/is setting > > > > > > > > > > > > > ->always_iowq. If we can poll the file, we should not force inline > > > > > > > > > > > > > submission. Basically the ones setting ->always_iowq always do -EAGAIN > > > > > > > > > > > > > returns if nonblock == true. > > > > > > > > > > > > > > > > > > > > > > > > I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and > > > > > > > > > > > > these OPs won't return -EAGAIN, then run in the current task context > > > > > > > > > > > > directly. > > > > > > > > > > > > > > > > > > > > > > Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of > > > > > > > > > > > it :-) > > > > > > > > > > > > > > > > > > > > But ->always_iowq isn't actually _always_ since fallocate/fsync/... 
are > > > > > > > > > > not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of > > > > > > > > > > ->always_iowq is a bit confusing? > > > > > > > > > > > > > > > > > > Yeah naming isn't that great, I can see how that's bit confusing. I'll > > > > > > > > > be happy to take suggestions on what would make it clearer. > > > > > > > > > > > > > > > > Except for the naming, I am also wondering why these ->always_iowq OPs > > > > > > > > aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it > > > > > > > > shouldn't improve performance by doing so because these OPs are supposed > > > > > > > > to be slow and always slept, not like others(buffered writes, ...), > > > > > > > > can you provide one hint about not offloading these OPs? Or is it just that > > > > > > > > NO_OFFLOAD needs to not offload every OPs? > > > > > > > > > > > > > > The whole point of NO_OFFLOAD is that items that would normally be > > > > > > > passed to io-wq are just run inline. This provides a way to reap the > > > > > > > benefits of batched submissions and syscall reductions. Some opcodes > > > > > > > will just never be async, and io-wq offloads are not very fast. Some of > > > > > > > > > > > > Yeah, seems io-wq is much slower than inline issue, maybe it needs > > > > > > to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. > > > > > > > > > > Indeed, depending on what is being linked, you may see io-wq activity > > > > > which is not ideal. > > > > > > > > That is why I prefer to fused command for ublk zero copy, because the > > > > registering buffer approach suggested by Pavel and Ziyang has to link > > > > register buffer OP with the actual IO OP, and it is observed that > > > > IOPS drops to 1/2 in 4k random io test with registered buffer approach. > > > > > > It'd be worth looking into if we can avoid io-wq for link execution, as > > > that'd be a nice win overall too. IIRC, there's no reason why it can't > > > be done like initial issue rather than just a lazy punt to io-wq. > > > > I might've missed a part of the discussion, but links are _usually_ > > executed by task_work, e.g. > > > > io_submit_flush_completions() -> io_queue_next() -> io_req_task_queue() > > Good catch, just figured out that /dev/ublkcN & backing file isn't opened by > O_NONBLOCK. > > But -EAGAIN is still returned from io_write() even though the regular > file is opened with O_DIRECT, at least on btrfs & xfs, so io wq is still > scheduled. Not look into the exact reason yet, and not see such issue for > block device. Anyway, it isn't related with io wq. It is because -EAGAIN is returned from call_write_iter() in case of IOCB_NOWAIT, so it is exactly what Jens's patchset is addressing. Thanks, Ming
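The IOCB_NOWAIT behaviour described here can be observed from userspace as well. Below is a small standalone sketch (an illustrative example, not code from this thread) using pwritev2() with RWF_NOWAIT, which is the userspace spelling of the same nowait attempt; on filesystems such as btrfs or XFS the call may fail with EAGAIN when the write cannot complete without blocking, which is the same condition that makes io_uring fall back to a blocking retry (io-wq, or an inline retry with NO_OFFLOAD). It assumes a glibc and kernel that expose pwritev2() and RWF_NOWAIT.

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/uio.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
	struct iovec iov;
	void *buf;
	ssize_t ret;
	int fd;

	if (argc != 2)
		return 1;
	/* O_DIRECT so the write goes down the direct I/O path */
	fd = open(argv[1], O_WRONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	memset(buf, 0, 4096);
	iov.iov_base = buf;
	iov.iov_len = 4096;

	/* RWF_NOWAIT: fail with EAGAIN instead of blocking */
	ret = pwritev2(fd, &iov, 1, 0, RWF_NOWAIT);
	if (ret < 0 && errno == EAGAIN)
		printf("nowait write would block; a blocking retry is needed\n");
	else if (ret < 0)
		perror("pwritev2");
	return 0;
}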
On 4/25/23 23:28, Pavel Begunkov wrote: > On 4/25/23 15:42, Ming Lei wrote: >> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: >>> On 4/24/23 8:50?PM, Ming Lei wrote: >>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: >>>>> On 4/24/23 8:13?PM, Ming Lei wrote: >>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: >>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote: >>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: >>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote: >>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: >>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where >>>>>>>>>>> we always >>>>>>>>>>> need io-wq punt. With that done, exclude them from the >>>>>>>>>>> file_can_poll() >>>>>>>>>>> check in terms of whether or not we need to punt them if any >>>>>>>>>>> of the >>>>>>>>>>> NO_OFFLOAD flags are set. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>>>>>>>>> --- >>>>>>>>>>> io_uring/io_uring.c | 2 +- >>>>>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- >>>>>>>>>>> io_uring/opdef.h | 2 ++ >>>>>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644 >>>>>>>>>>> --- a/io_uring/io_uring.c >>>>>>>>>>> +++ b/io_uring/io_uring.c >>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct >>>>>>>>>>> io_kiocb *req, unsigned int issue_flags) >>>>>>>>>>> return -EBADF; >>>>>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && >>>>>>>>>>> - (!req->file || !file_can_poll(req->file))) >>>>>>>>>>> + (!req->file || !file_can_poll(req->file) || >>>>>>>>>>> def->always_iowq)) >>>>>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; >>>>>>>>>> >>>>>>>>>> I guess the check should be !def->always_iowq? >>>>>>>>> >>>>>>>>> How so? Nobody that takes pollable files should/is setting >>>>>>>>> ->always_iowq. If we can poll the file, we should not force >>>>>>>>> inline >>>>>>>>> submission. Basically the ones setting ->always_iowq always do >>>>>>>>> -EAGAIN >>>>>>>>> returns if nonblock == true. >>>>>>>> >>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and >>>>>>>> these OPs won't return -EAGAIN, then run in the current task >>>>>>>> context >>>>>>>> directly. >>>>>>> >>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the >>>>>>> point of >>>>>>> it :-) >>>>>> >>>>>> But ->always_iowq isn't actually _always_ since >>>>>> fallocate/fsync/... are >>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the >>>>>> naming of >>>>>> ->always_iowq is a bit confusing? >>>>> >>>>> Yeah naming isn't that great, I can see how that's bit confusing. >>>>> I'll >>>>> be happy to take suggestions on what would make it clearer. >>>> >>>> Except for the naming, I am also wondering why these ->always_iowq OPs >>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it >>>> shouldn't improve performance by doing so because these OPs are >>>> supposed >>>> to be slow and always slept, not like others(buffered writes, ...), >>>> can you provide one hint about not offloading these OPs? Or is it >>>> just that >>>> NO_OFFLOAD needs to not offload every OPs? >>> >>> The whole point of NO_OFFLOAD is that items that would normally be >>> passed to io-wq are just run inline. This provides a way to reap the >>> benefits of batched submissions and syscall reductions. Some opcodes >>> will just never be async, and io-wq offloads are not very fast. 
Some of >> >> Yeah, seems io-wq is much slower than inline issue, maybe it needs >> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. > > There were attempts like this one from Hao (CC'ed) > > https://lore.kernel.org/io-uring/20220627133541.15223-5-hao.xu@linux.dev/t/ > > > Not sure why it got stalled, but maybe Hao would be willing > to pick it up again. Hi folks, I'd like to pick it up again, but I just didn't get any reply at that time after sending several versions of it... so before I restart that series, I'd like to ask Jens to comment on the idea of that patchset (fixed worker). Thanks, Hao
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index fee3e461e149..420cfd35ebc6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 		return -EBADF;
 
 	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
-	    (!req->file || !file_can_poll(req->file)))
+	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
 		issue_flags &= ~IO_URING_F_NONBLOCK;
 
 	if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index cca7c5b55208..686d46001622 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -82,6 +82,7 @@ const struct io_issue_def io_issue_defs[] = {
 	[IORING_OP_FSYNC] = {
 		.needs_file		= 1,
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_fsync_prep,
 		.issue			= io_fsync,
 	},
@@ -125,6 +126,7 @@ const struct io_issue_def io_issue_defs[] = {
 	[IORING_OP_SYNC_FILE_RANGE] = {
 		.needs_file		= 1,
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_sfr_prep,
 		.issue			= io_sync_file_range,
 	},
@@ -202,6 +204,7 @@ const struct io_issue_def io_issue_defs[] = {
 	},
 	[IORING_OP_FALLOCATE] = {
 		.needs_file		= 1,
+		.always_iowq		= 1,
 		.prep			= io_fallocate_prep,
 		.issue			= io_fallocate,
 	},
@@ -221,6 +224,7 @@ const struct io_issue_def io_issue_defs[] = {
 	},
 	[IORING_OP_STATX] = {
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_statx_prep,
 		.issue			= io_statx,
 	},
@@ -253,11 +257,13 @@ const struct io_issue_def io_issue_defs[] = {
 	[IORING_OP_FADVISE] = {
 		.needs_file		= 1,
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_fadvise_prep,
 		.issue			= io_fadvise,
 	},
 	[IORING_OP_MADVISE] = {
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_madvise_prep,
 		.issue			= io_madvise,
 	},
@@ -308,6 +314,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.hash_reg_file		= 1,
 		.unbound_nonreg_file	= 1,
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_splice_prep,
 		.issue			= io_splice,
 	},
@@ -328,11 +335,13 @@ const struct io_issue_def io_issue_defs[] = {
 		.hash_reg_file		= 1,
 		.unbound_nonreg_file	= 1,
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_tee_prep,
 		.issue			= io_tee,
 	},
 	[IORING_OP_SHUTDOWN] = {
 		.needs_file		= 1,
+		.always_iowq		= 1,
 #if defined(CONFIG_NET)
 		.prep			= io_shutdown_prep,
 		.issue			= io_shutdown,
@@ -343,22 +352,27 @@ const struct io_issue_def io_issue_defs[] = {
 	[IORING_OP_RENAMEAT] = {
 		.prep			= io_renameat_prep,
 		.issue			= io_renameat,
+		.always_iowq		= 1,
 	},
 	[IORING_OP_UNLINKAT] = {
 		.prep			= io_unlinkat_prep,
 		.issue			= io_unlinkat,
+		.always_iowq		= 1,
 	},
 	[IORING_OP_MKDIRAT] = {
 		.prep			= io_mkdirat_prep,
 		.issue			= io_mkdirat,
+		.always_iowq		= 1,
 	},
 	[IORING_OP_SYMLINKAT] = {
 		.prep			= io_symlinkat_prep,
 		.issue			= io_symlinkat,
+		.always_iowq		= 1,
 	},
 	[IORING_OP_LINKAT] = {
 		.prep			= io_linkat_prep,
 		.issue			= io_linkat,
+		.always_iowq		= 1,
 	},
 	[IORING_OP_MSG_RING] = {
 		.needs_file		= 1,
@@ -367,20 +381,24 @@ const struct io_issue_def io_issue_defs[] = {
 		.issue			= io_msg_ring,
 	},
 	[IORING_OP_FSETXATTR] = {
-		.needs_file = 1,
+		.needs_file		= 1,
+		.always_iowq		= 1,
 		.prep			= io_fsetxattr_prep,
 		.issue			= io_fsetxattr,
 	},
 	[IORING_OP_SETXATTR] = {
+		.always_iowq		= 1,
 		.prep			= io_setxattr_prep,
 		.issue			= io_setxattr,
 	},
 	[IORING_OP_FGETXATTR] = {
-		.needs_file = 1,
+		.needs_file		= 1,
+		.always_iowq		= 1,
 		.prep			= io_fgetxattr_prep,
 		.issue			= io_fgetxattr,
 	},
 	[IORING_OP_GETXATTR] = {
+		.always_iowq		= 1,
 		.prep			= io_getxattr_prep,
 		.issue			= io_getxattr,
 	},
diff --git a/io_uring/opdef.h b/io_uring/opdef.h
index c22c8696e749..657a831249ff 100644
--- a/io_uring/opdef.h
+++ b/io_uring/opdef.h
@@ -29,6 +29,8 @@ struct io_issue_def {
 	unsigned		iopoll_queue : 1;
 	/* opcode specific path will handle ->async_data allocation if needed */
 	unsigned		manual_alloc : 1;
+	/* op always needs io-wq offload */
+	unsigned		always_iowq : 1;
 
 	int (*issue)(struct io_kiocb *, unsigned int);
 	int (*prep)(struct io_kiocb *, const struct io_uring_sqe *);
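For context on why these opcodes get the new bit: their ->issue handlers refuse a nonblocking attempt outright, so without the bit a NO_OFFLOAD submission would first be issued with IO_URING_F_NONBLOCK, get -EAGAIN back, and only then be retried blocking from the same task. The fsync handler is representative; the following is paraphrased from io_uring/sync.c (details such as the end-of-range clamping are simplified):

int io_fsync(struct io_kiocb *req, unsigned int issue_flags)
{
	struct io_sync *sync = io_kiocb_to_cmd(req, struct io_sync);
	int ret;

	/* fsync always requires a blocking context */
	if (issue_flags & IO_URING_F_NONBLOCK)
		return -EAGAIN;

	ret = vfs_fsync_range(req->file, sync->off, sync->off + sync->len,
			      sync->flags & IORING_FSYNC_DATASYNC);
	io_req_set_res(req, ret, 0);
	return IOU_OK;
}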
Add an opdef bit for them, and set it for the opcodes where we always need io-wq punt. With that done, exclude them from the file_can_poll() check in terms of whether or not we need to punt them if any of the NO_OFFLOAD flags are set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c |  2 +-
 io_uring/opdef.c    | 22 ++++++++++++++++++++--
 io_uring/opdef.h    |  2 ++
 3 files changed, 23 insertions(+), 3 deletions(-)