Message ID | 20250324134905.766777-6-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | ublk: cleanup & improvement & zc follow-up | expand |
Context | Check | Description |
---|---|---|
shin/vmtest-for-next-PR | success | PR summary |
shin/vmtest-for-next-VM_Test-1 | success | Logs for build-kernel |
shin/vmtest-for-next-VM_Test-0 | success | Logs for build-kernel |
On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote: > > Add words to explain how zero copy feature works, and why it has to be > trusted for handling IO read command. > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > Documentation/block/ublk.rst | 28 ++++++++++++++++++++-------- > 1 file changed, 20 insertions(+), 8 deletions(-) > > diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst > index 1e0e7358e14a..33efff25b54d 100644 > --- a/Documentation/block/ublk.rst > +++ b/Documentation/block/ublk.rst > @@ -309,18 +309,30 @@ with specified IO tag in the command data: > ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy > the server buffer (pages) read to the IO request pages. > > -Future development > -================== > - > Zero copy > --------- > > -Zero copy is a generic requirement for nbd, fuse or similar drivers. A > -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace > -can't be remapped any more in kernel with existing mm interfaces. This can > -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that > -big requests (IO size >= 256 KB) may benefit a lot from zero copy. > +ublk zero copy relies on io_uring's fixed kernel buffer, which provides > +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`. nit: missing parentheses after io_buffer_unregister_bvec > + > +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call > +`io_buffer_register_bvec()` for ublk server to register client request > +buffer into io_uring buffer table, then ublk server can submit io_uring > +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF` > +calls `io_buffer_unregister_bvec` to unregister the buffer. Parentheses missing here too. It might be worth adding a note that the registered buffer and any I/O that uses it will hold a reference on the ublk request. For a ublk server implementer, I think it's useful to know that the buffer needs to be unregistered in order to complete the ublk request, and that the zero-copy I/O won't corrupt any data even if it completes after the buffer is unregistered. > + > +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted, > +because it is ublk server's responsibility to make sure IO buffer filled > +with data, and ublk server has to handle short read correctly by returning > +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer > +will be exposed to client application. This isn't specific to zero-copy, right? ublk devices configured with UBLK_F_USER_COPY also need to initialize the full request buffer. I would also drop the "handle short read" part; ublk servers don't necessarily issue read operations in the backend, so "short read" may not be applicable. Best, Caleb > + > +ublk server needs to align the parameter of `struct ublk_param_dma_align` > +with backend for zero copy to work correctly. > > +For reaching best IO performance, ublk server should align its segment > +parameter of `struct ublk_param_segment` with backend for avoiding > +unnecessary IO split. > > References > ========== > -- > 2.47.0 >
On Tue, Mar 25, 2025 at 08:26:18AM -0700, Caleb Sander Mateos wrote: > On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote: > > > > Add words to explain how zero copy feature works, and why it has to be > > trusted for handling IO read command. > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > --- > > Documentation/block/ublk.rst | 28 ++++++++++++++++++++-------- > > 1 file changed, 20 insertions(+), 8 deletions(-) > > > > diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst > > index 1e0e7358e14a..33efff25b54d 100644 > > --- a/Documentation/block/ublk.rst > > +++ b/Documentation/block/ublk.rst > > @@ -309,18 +309,30 @@ with specified IO tag in the command data: > > ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy > > the server buffer (pages) read to the IO request pages. > > > > -Future development > > -================== > > - > > Zero copy > > --------- > > > > -Zero copy is a generic requirement for nbd, fuse or similar drivers. A > > -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace > > -can't be remapped any more in kernel with existing mm interfaces. This can > > -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that > > -big requests (IO size >= 256 KB) may benefit a lot from zero copy. > > +ublk zero copy relies on io_uring's fixed kernel buffer, which provides > > +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`. > > nit: missing parentheses after io_buffer_unregister_bvec OK > > > + > > +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call > > +`io_buffer_register_bvec()` for ublk server to register client request > > +buffer into io_uring buffer table, then ublk server can submit io_uring > > +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF` > > +calls `io_buffer_unregister_bvec` to unregister the buffer. > > Parentheses missing here too. > It might be worth adding a note that the registered buffer and any I/O > that uses it will hold a reference on the ublk request. For a ublk > server implementer, I think it's useful to know that the buffer needs > to be unregistered in order to complete the ublk request, and that the > zero-copy I/O won't corrupt any data even if it completes after the > buffer is unregistered. Good point, how about the following words? ``` The ublk client IO request buffer is guaranteed to be live between calling `io_buffer_register_bvec()` and `io_buffer_unregister_bvec()`. And any io_uring operation which supports this kind of kernel buffer will grab one reference of the buffer until the operation is completed. ``` > > > + > > +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted, > > +because it is ublk server's responsibility to make sure IO buffer filled > > +with data, and ublk server has to handle short read correctly by returning > > +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer > > +will be exposed to client application. > > This isn't specific to zero-copy, right? ublk devices configured with > UBLK_F_USER_COPY also need to initialize the full request buffer. I Right, but it is important for zero copy. > would also drop the "handle short read" part; ublk servers don't > necessarily issue read operations in the backend, so "short read" may > not be applicable. Here 'read' in 'short read' means ublk front READ command, not actual read done in ublk server. Maybe we can make it more accurate: ``` ..., and ublk server has to return correct result to ublk driver when handling READ command, and the result has to match with how many bytes filled to the IO buffer. Otherwise, uninitialized kernel IO buffer will be exposed to client application. ``` Thanks, Ming
diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst index 1e0e7358e14a..33efff25b54d 100644 --- a/Documentation/block/ublk.rst +++ b/Documentation/block/ublk.rst @@ -309,18 +309,30 @@ with specified IO tag in the command data: ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy the server buffer (pages) read to the IO request pages. -Future development -================== - Zero copy --------- -Zero copy is a generic requirement for nbd, fuse or similar drivers. A -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace -can't be remapped any more in kernel with existing mm interfaces. This can -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that -big requests (IO size >= 256 KB) may benefit a lot from zero copy. +ublk zero copy relies on io_uring's fixed kernel buffer, which provides +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`. + +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call +`io_buffer_register_bvec()` for ublk server to register client request +buffer into io_uring buffer table, then ublk server can submit io_uring +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF` +calls `io_buffer_unregister_bvec` to unregister the buffer. + +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted, +because it is ublk server's responsibility to make sure IO buffer filled +with data, and ublk server has to handle short read correctly by returning +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer +will be exposed to client application. + +ublk server needs to align the parameter of `struct ublk_param_dma_align` +with backend for zero copy to work correctly. +For reaching best IO performance, ublk server should align its segment +parameter of `struct ublk_param_segment` with backend for avoiding +unnecessary IO split. References ==========
Add words to explain how zero copy feature works, and why it has to be trusted for handling IO read command. Signed-off-by: Ming Lei <ming.lei@redhat.com> --- Documentation/block/ublk.rst | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-)