diff mbox series

[5/8] ublk: document zero copy feature

Message ID 20250324134905.766777-6-ming.lei@redhat.com (mailing list archive)
State New
Headers show
Series ublk: cleanup & improvement & zc follow-up | expand

Checks

Context Check Description
shin/vmtest-for-next-PR success PR summary
shin/vmtest-for-next-VM_Test-1 success Logs for build-kernel
shin/vmtest-for-next-VM_Test-0 success Logs for build-kernel

Commit Message

Ming Lei March 24, 2025, 1:49 p.m. UTC
Add words to explain how zero copy feature works, and why it has to be
trusted for handling IO read command.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 Documentation/block/ublk.rst | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

Comments

Caleb Sander Mateos March 25, 2025, 3:26 p.m. UTC | #1
On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> Add words to explain how zero copy feature works, and why it has to be
> trusted for handling IO read command.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  Documentation/block/ublk.rst | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> index 1e0e7358e14a..33efff25b54d 100644
> --- a/Documentation/block/ublk.rst
> +++ b/Documentation/block/ublk.rst
> @@ -309,18 +309,30 @@ with specified IO tag in the command data:
>    ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
>    the server buffer (pages) read to the IO request pages.
>
> -Future development
> -==================
> -
>  Zero copy
>  ---------
>
> -Zero copy is a generic requirement for nbd, fuse or similar drivers. A
> -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
> -can't be remapped any more in kernel with existing mm interfaces. This can
> -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
> -big requests (IO size >= 256 KB) may benefit a lot from zero copy.
> +ublk zero copy relies on io_uring's fixed kernel buffer, which provides
> +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`.

nit: missing parentheses after io_buffer_unregister_bvec

> +
> +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call
> +`io_buffer_register_bvec()` for ublk server to register client request
> +buffer into io_uring buffer table, then ublk server can submit io_uring
> +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF`
> +calls `io_buffer_unregister_bvec` to unregister the buffer.

Parentheses missing here too.
It might be worth adding a note that the registered buffer and any I/O
that uses it will hold a reference on the ublk request. For a ublk
server implementer, I think it's useful to know that the buffer needs
to be unregistered in order to complete the ublk request, and that the
zero-copy I/O won't corrupt any data even if it completes after the
buffer is unregistered.

> +
> +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted,
> +because it is ublk server's responsibility to make sure IO buffer filled
> +with data, and ublk server has to handle short read correctly by returning
> +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer
> +will be exposed to client application.

This isn't specific to zero-copy, right? ublk devices configured with
UBLK_F_USER_COPY also need to initialize the full request buffer. I
would also drop the "handle short read" part; ublk servers don't
necessarily issue read operations in the backend, so "short read" may
not be applicable.

Best,
Caleb

> +
> +ublk server needs to align the parameter of `struct ublk_param_dma_align`
> +with backend for zero copy to work correctly.
>
> +For reaching best IO performance, ublk server should align its segment
> +parameter of `struct ublk_param_segment` with backend for avoiding
> +unnecessary IO split.
>
>  References
>  ==========
> --
> 2.47.0
>
Ming Lei March 26, 2025, 2:29 a.m. UTC | #2
On Tue, Mar 25, 2025 at 08:26:18AM -0700, Caleb Sander Mateos wrote:
> On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > Add words to explain how zero copy feature works, and why it has to be
> > trusted for handling IO read command.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  Documentation/block/ublk.rst | 28 ++++++++++++++++++++--------
> >  1 file changed, 20 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> > index 1e0e7358e14a..33efff25b54d 100644
> > --- a/Documentation/block/ublk.rst
> > +++ b/Documentation/block/ublk.rst
> > @@ -309,18 +309,30 @@ with specified IO tag in the command data:
> >    ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
> >    the server buffer (pages) read to the IO request pages.
> >
> > -Future development
> > -==================
> > -
> >  Zero copy
> >  ---------
> >
> > -Zero copy is a generic requirement for nbd, fuse or similar drivers. A
> > -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
> > -can't be remapped any more in kernel with existing mm interfaces. This can
> > -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
> > -big requests (IO size >= 256 KB) may benefit a lot from zero copy.
> > +ublk zero copy relies on io_uring's fixed kernel buffer, which provides
> > +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`.
> 
> nit: missing parentheses after io_buffer_unregister_bvec

OK

> 
> > +
> > +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call
> > +`io_buffer_register_bvec()` for ublk server to register client request
> > +buffer into io_uring buffer table, then ublk server can submit io_uring
> > +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF`
> > +calls `io_buffer_unregister_bvec` to unregister the buffer.
> 
> Parentheses missing here too.
> It might be worth adding a note that the registered buffer and any I/O
> that uses it will hold a reference on the ublk request. For a ublk
> server implementer, I think it's useful to know that the buffer needs
> to be unregistered in order to complete the ublk request, and that the
> zero-copy I/O won't corrupt any data even if it completes after the
> buffer is unregistered.

Good point, how about the following words?

```
The ublk client IO request buffer is guaranteed to be live between calling
`io_buffer_register_bvec()` and `io_buffer_unregister_bvec()`.

And any io_uring operation which supports this kind of kernel buffer will
grab one reference of the buffer until the operation is completed.
```

> 
> > +
> > +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted,
> > +because it is ublk server's responsibility to make sure IO buffer filled
> > +with data, and ublk server has to handle short read correctly by returning
> > +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer
> > +will be exposed to client application.
> 
> This isn't specific to zero-copy, right? ublk devices configured with
> UBLK_F_USER_COPY also need to initialize the full request buffer. I

Right, but it is important for zero copy.

> would also drop the "handle short read" part; ublk servers don't
> necessarily issue read operations in the backend, so "short read" may
> not be applicable.

Here 'read' in 'short read' means ublk front READ command, not actual read
done in ublk server. Maybe we can make it more accurate:

```
..., and ublk server has to return correct result to ublk driver when handling
READ command, and the result has to match with how many bytes filled to the IO
buffer. Otherwise, uninitialized kernel IO buffer will be exposed to client
application.
```

Thanks,
Ming
diff mbox series

Patch

diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index 1e0e7358e14a..33efff25b54d 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -309,18 +309,30 @@  with specified IO tag in the command data:
   ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
   the server buffer (pages) read to the IO request pages.
 
-Future development
-==================
-
 Zero copy
 ---------
 
-Zero copy is a generic requirement for nbd, fuse or similar drivers. A
-problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
-can't be remapped any more in kernel with existing mm interfaces. This can
-occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
-big requests (IO size >= 256 KB) may benefit a lot from zero copy.
+ublk zero copy relies on io_uring's fixed kernel buffer, which provides
+two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`.
+
+ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call
+`io_buffer_register_bvec()` for ublk server to register client request
+buffer into io_uring buffer table, then ublk server can submit io_uring
+IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF`
+calls `io_buffer_unregister_bvec` to unregister the buffer.
+
+ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted,
+because it is ublk server's responsibility to make sure IO buffer filled
+with data, and ublk server has to handle short read correctly by returning
+correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer
+will be exposed to client application.
+
+ublk server needs to align the parameter of `struct ublk_param_dma_align`
+with backend for zero copy to work correctly.
 
+For reaching best IO performance, ublk server should align its segment
+parameter of `struct ublk_param_segment` with backend for avoiding
+unnecessary IO split.
 
 References
 ==========