Message ID: 20200107170034.16165-1-axboe@kernel.dk (mailing list archive)
Series: io_uring: add support for open/close
Am 07.01.20 um 18:00 schrieb Jens Axboe:
> Sending this out separately, as I rebased it on top of the work.openat2
> branch from Al to resolve some of the conflicts with the differences in
> how open flags are built.

Now that you rebased on top of openat2, wouldn't it be better to add
openat2 to io_uring instead of the old openat call?

metze
On 1/8/20 2:17 PM, Stefan Metzmacher wrote:
> Am 07.01.20 um 18:00 schrieb Jens Axboe:
>> Sending this out separately, as I rebased it on top of the work.openat2
>> branch from Al to resolve some of the conflicts with the differences in
>> how open flags are built.
>
> Now that you rebased on top of openat2, wouldn't it be better to add
> openat2 to io_uring instead of the old openat call?

The IORING_OP_OPENAT already exists, so it would probably make more sense
to add IORING_OP_OPENAT2 alongside that. Or I could just change it. I don't
really feel that strongly about it; I'll probably just add openat2 and
leave openat alone, since openat will just be a wrapper around openat2
anyway.
Am 08.01.20 um 23:57 schrieb Jens Axboe:
> The IORING_OP_OPENAT already exists, so it would probably make more sense
> to add IORING_OP_OPENAT2 alongside that. [...] I'll probably just add
> openat2 and leave openat alone, openat will just be a wrapper around
> openat2 anyway.

Great, thanks!

metze
On 1/8/20 4:05 PM, Stefan Metzmacher wrote:
> Great, thanks!

Here:

https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs

Not tested yet, will wire this up in liburing and write a test case
as well.
On 1/8/20 6:02 PM, Jens Axboe wrote:
> Here:
>
> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs
>
> Not tested yet, will wire this up in liburing and write a test case
> as well.

Wrote a basic test case, and used my openbench as well. Seems to work
fine for me. Pushed prep etc. support to liburing.
Am 09.01.20 um 03:03 schrieb Jens Axboe:
> Wrote a basic test case, and used my openbench as well. Seems to work
> fine for me. Pushed prep etc. support to liburing.

Thanks!

Another great feature would be the possibility to make use of the
generated fd in the following request.

This is a feature that's also available in the SMB3 protocol, called
compound related requests.

The client can compound a chain of open, getinfo, read, close. The
getinfo, read and close requests pass a file handle of -1 and implicitly
get the fd generated/used by the previous request.

metze
On 1/16/20 3:50 PM, Stefan Metzmacher wrote:
> Another great feature would be the possibility to make use of the
> generated fd in the following request.
>
> This is a feature that's also available in the SMB3 protocol, called
> compound related requests.
>
> The client can compound a chain of open, getinfo, read, close. The
> getinfo, read and close requests pass a file handle of -1 and implicitly
> get the fd generated/used by the previous request.

Right, the "plan" there is to utilize BPF to make this programmable.
We really need something more expressive to be able to pass information
between SQEs that are linked, or even to decide which link to run
depending on the outcome of the parent.

There's a lot of potential there!
On Thu, Jan 16, 2020, at 5:50 PM, Stefan Metzmacher wrote:
> The client can compound a chain of open, getinfo, read, close. The
> getinfo, read and close requests pass a file handle of -1 and implicitly
> get the fd generated/used by the previous request.

Sounds similar to https://capnproto.org/rpc.html too. But that seems
most valuable in a situation with nontrivial latency.
On 1/16/20 5:44 PM, Colin Walters wrote:
> Sounds similar to https://capnproto.org/rpc.html too.

Never heard of it, but I don't see how you can do any of this in an
efficient manner without kernel support. Unless it's just wrapping the
communication in its own protocol and using that as the basis for the
on-wire part. It has "infinitely faster" in a yellow sticker, so it must
work :-)

> But that seems most valuable in a situation with nontrivial latency.

Which is basically what Stefan is looking at; over the network, open,
read, close etc. is pretty nondeterministic. But even for local storage,
being able to set up a bundle like that and have it automagically work
makes for easier programming and more efficient communication with the
kernel.
On 1/17/2020 3:44 AM, Colin Walters wrote:
> Sounds similar to https://capnproto.org/rpc.html too.

Looks like just grouping a pack of operations for RPC. With io_uring we
could implement more interesting stuff. I've been thinking about eBPF in
io_uring for a while as well, and apparently it could be _really_
powerful, and would allow almost zero context switches for some use cases.

1. full flow control with eBPF
   - dropping requests (links)
   - emitting reqs/links (e.g. after completions of another req)
   - chaining/redirecting
   of course, all of that with fast intermediate computations in between

2. do long eBPF programs by introducing a new opcode (punted to async)
   (though, there would be problems with that)

Could even allow to dynamically register new opcodes within the kernel
and extend it to eBPF, if there will be demand for such things.
On 1/17/20 2:32 AM, Pavel Begunkov wrote:
> Looks like just grouping a pack of operations for RPC. With io_uring we
> could implement more interesting stuff. I've been thinking about eBPF in
> io_uring for a while as well, and apparently it could be _really_
> powerful, and would allow almost zero context switches for some use cases.

We're also looking into exactly that at Facebook, nothing concrete yet
though. But it's clear we need it to take full advantage of links at
least, and it's also clear that it would unlock a lot of really cool
functionality once we do.

Pavel, I'd strongly urge you to submit a talk to LSF/MM/BPF about this.
It's the perfect venue to have some concrete planning around this topic
and get things rolling.

https://lore.kernel.org/bpf/20191122172502.vffyfxlqejthjib6@macbook-pro-91.dhcp.thefacebook.com/
On 17/01/2020 18:21, Jens Axboe wrote:
> Pavel, I'd strongly urge you to submit a talk to LSF/MM/BPF about this.
> It's the perfect venue to have some concrete planning around this topic
> and get things rolling.

Sounds interesting, I'll try this, but didn't you intend to do it
yourself? And thanks for the tip!
On 1/17/20 3:27 PM, Pavel Begunkov wrote:
> Sounds interesting, I'll try this, but didn't you intend to do it
> yourself? And thanks for the tip!

Just trying to delegate a bit, and I think you'd be a great candidate to
drive this. I'll likely do some other io_uring related topic there.
Hi Jens,

> Right, the "plan" there is to utilize BPF to make this programmable.
> We really need something more expressive to be able to pass information
> between SQEs that are linked, or even to decide which link to run
> depending on the outcome of the parent.
>
> There's a lot of potential there!

I guess so, but I don't yet understand how BPF works in real life.

Is it possible to do that as a normal user without special privileges?

My naive way would be using some flags, getting the result, and passing
the fd by reference.

metze
On 1/20/2020 3:15 PM, Stefan Metzmacher wrote:
> I guess so, but I don't yet understand how BPF works in real life.
>
> Is it possible to do that as a normal user without special privileges?
>
> My naive way would be using some flags, getting the result, and passing
> the fd by reference.

Just have been discussing related stuff; see the link if curious:

https://github.com/axboe/liburing/issues/58

To summarise, there won't be enough flags to cover all use cases, and it
will slow down the common path. There should be something with zero
overhead if the feature is not used, and that's not the case with flags.
That's why it'd be great to have a custom eBPF program (in-kernel)
controlling what and how to do next. I don't know much about eBPF
internals, but probably we will be able to attach an eBPF program to an
io_uring instance. Though, not sure whether it could be done without
privileges.